QPlus7 Posted July 13, 2021

Hi, all. I've been having problems with my system for a while. I hadn't powered the server on in about six months, but it was having problems back then, too. Today I powered it on and was able to fix one of my issues, where Docker wasn't starting. My array started fine, but I lost my parity disk. It showed up in the Unassigned Devices section, though. I stopped the array, added the drive back as a parity disk, and the sync started. Now I'm getting tons of errors on Disk 2 and it is going extremely slowly, to the point where the estimated completion time is fluctuating into the thousands of days (it's gone up to over 3,500). Is there anything I can do, or am I screwed since I lost my parity drive? I've attached my diagnostics for reference. Any help is greatly appreciated - thanks in advance!

via-nas01-diagnostics-20210713-0032.zip
JorgeB Posted July 13, 2021

SMART report for disk2 is incomplete - see if you can get a manual SMART report, but the disk does appear to be failing:

smartctl -x /dev/sdf
QPlus7 Posted July 13, 2021 Author (edited)

4 hours ago, JorgeB said:
SMART report for disk2 is incomplete, see if you can get a manual SMART report, but the disk does appear to be failing: smartctl -x /dev/sdf

smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.10.28-Unraid] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Blue
Device Model:     WDC WD10EZEX-08WN4A0
Serial Number:    WD-WCC6Y0AER31Z
LU WWN Device Id: 5 0014ee 26679ccf1
Firmware Version: 02.01A02
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue Jul 13 09:21:52 2021 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM level is:     128 (minimum power consumption without standby)
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, frozen [SEC2]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: (11040) seconds.
Offline data collection capabilities:    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine recommended polling time:      (   2) minutes.
Extended self-test routine recommended polling time:   ( 114) minutes.
Conveyance self-test routine recommended polling time: (   5) minutes.
SCT capabilities:              (0x3035) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS   VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K  096   001   051    Past 5444
  3 Spin_Up_Time            POS--K  182   173   021    -    1900
  4 Start_Stop_Count        -O--CK  100   100   000    -    35
  5 Reallocated_Sector_Ct   PO--CK  200   200   140    -    0
  7 Seek_Error_Rate         -OSR-K  200   200   000    -    0
  9 Power_On_Hours          -O--CK  094   092   000    -    4905
 10 Spin_Retry_Count        -O--CK  100   253   000    -    0
 11 Calibration_Retry_Count -O--CK  100   253   000    -    0
 12 Power_Cycle_Count       -O--CK  100   100   000    -    34
192 Power-Off_Retract_Count -O--CK  200   200   000    -    20
193 Load_Cycle_Count        -O--CK  200   200   000    -    179
194 Temperature_Celsius     -O---K  099   084   000    -    44
196 Reallocated_Event_Count -O--CK  200   200   000    -    0
197 Current_Pending_Sector  -O--CK  182   182   000    -    3009
198 Offline_Uncorrectable   ----CK  182   182   000    -    2999
199 UDMA_CRC_Error_Count    -O--CK  200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R--  001   001   000    -    875695
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01       SL      R/O      1  Summary SMART error log
0x02       SL      R/O      5  Comprehensive SMART error log
0x03       GPL     R/O      6  Ext. Comprehensive SMART error log
0x06       SL      R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x09       SL      R/W      1  Selective self-test log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa0-0xa7  GPL,SL  VS      16  Device vendor specific log
0xa8-0xb6  GPL,SL  VS       1  Device vendor specific log
0xb7       GPL,SL  VS      48  Device vendor specific log
0xbd       GPL,SL  VS       1  Device vendor specific log
0xc0       GPL,SL  VS       1  Device vendor specific log
0xc1       GPL     VS      93  Device vendor specific log
0xdf       GPL,SL  VS       1  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
Device Error Count: 64757 (device log contains only the most recent 24 errors)
    CR     = Command Register
    FEATR  = Features Register
    COUNT  = Count (was: Sector Count) Register
    LBA_48 = Upper bytes of LBA High/Mid/Low Registers ]  ATA-8
    LH     = LBA High (was: Cylinder High) Register    ]   LBA
    LM     = LBA Mid (was: Cylinder Low) Register      ] Register
    LL     = LBA Low (was: Sector Number) Register     ]
    DV     = Device (was: Device/Head) Register
    DC     = Device Control Register
    ER     = Error register
    ST     = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 64757 [4] occurred at disk power-on lifetime: 4899 hours (204 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 00 0e 5c f8 40 00  Error: UNC at LBA = 0x000e5cf8 = 941304

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 18 00 e0 00 00 00 0e 5c f8 40 08     04:24:20.395  READ FPDMA QUEUED
  60 02 d0 00 d8 00 00 00 0e f9 40 40 08     04:24:20.395  READ FPDMA QUEUED
  60 01 20 00 d0 00 00 00 0e ac 58 40 08     04:24:20.395  READ FPDMA QUEUED
  60 02 78 00 c8 00 00 00 0e e9 d0 40 08     04:24:20.395  READ FPDMA QUEUED
  60 00 28 00 c0 00 00 00 0e f9 18 40 08     04:24:20.395  READ FPDMA QUEUED

Error 64756 [3] occurred at disk power-on lifetime: 4899 hours (204 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 00 0e e9 08 40 00  Error: UNC at LBA = 0x000ee908 = 977160

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 f0 00 40 00 00 00 0e f7 a8 40 08     04:24:14.636  READ FPDMA QUEUED
  60 00 80 00 38 00 00 00 0e f8 98 40 08     04:24:14.636  READ FPDMA QUEUED
  60 00 28 00 30 00 00 00 0e f9 18 40 08     04:24:14.636  READ FPDMA QUEUED
  60 02 78 00 28 00 00 00 0e e9 d0 40 08     04:24:14.636  READ FPDMA QUEUED
  60 01 20 00 20 00 00 00 0e ac 58 40 08     04:24:14.636  READ FPDMA QUEUED

Error 64755 [2] occurred at disk power-on lifetime: 4899 hours (204 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 00 0e 5b 70 40 00  Error: UNC at LBA = 0x000e5b70 = 940912

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 02 68 00 98 00 00 00 0e e7 68 40 08     04:24:09.176  READ FPDMA QUEUED
  60 00 18 00 90 00 00 00 0e 5c f8 40 08     04:24:09.176  READ FPDMA QUEUED
  60 01 a0 00 88 00 00 00 0e 5b 58 40 08     04:24:09.176  READ FPDMA QUEUED
  60 02 d0 00 80 00 00 00 0e f9 40 40 08     04:24:09.175  READ FPDMA QUEUED
  60 03 08 00 78 00 00 00 0e fc 10 40 08     04:24:09.175  READ FPDMA QUEUED

Error 64754 [1] occurred at disk power-on lifetime: 4899 hours (204 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 00 0e 5a a8 40 00  Error: UNC at LBA = 0x000e5aa8 = 940712

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 48 00 98 00 00 00 0f 02 78 40 08     04:24:05.281  READ FPDMA QUEUED
  60 00 f0 00 90 00 00 00 0e f7 a8 40 08     04:24:05.280  READ FPDMA QUEUED
  60 00 80 00 88 00 00 00 0e f8 98 40 08     04:24:05.280  READ FPDMA QUEUED
  60 00 28 00 80 00 00 00 0e f9 18 40 08     04:24:05.280  READ FPDMA QUEUED
  60 04 50 00 78 00 00 00 0f 02 c0 40 08     04:24:05.280  READ FPDMA QUEUED

Error 64753 [0] occurred at disk power-on lifetime: 4899 hours (204 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 00 0e 65 f0 40 00  Error: UNC at LBA = 0x000e65f0 = 943600

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 28 00 18 00 00 00 0e f9 18 40 08     04:24:01.385  READ FPDMA QUEUED
  60 00 80 00 10 00 00 00 0e f8 98 40 08     04:24:01.385  READ FPDMA QUEUED
  60 00 f0 00 08 00 00 00 0e f7 a8 40 08     04:24:01.385  READ FPDMA QUEUED
  60 00 48 00 00 00 00 00 0f 02 78 40 08     04:24:01.385  READ FPDMA QUEUED
  60 02 68 00 f8 00 00 00 0e e7 68 40 08     04:24:01.385  READ FPDMA QUEUED

Error 64752 [23] occurred at disk power-on lifetime: 4899 hours (204 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 00 0e 65 60 40 00  Error: UNC at LBA = 0x000e6560 = 943456

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 88 00 a8 00 00 00 0f 01 f0 40 08     04:23:55.728  READ FPDMA QUEUED
  60 02 78 00 a0 00 00 00 0e e9 d0 40 08     04:23:55.728  READ FPDMA QUEUED
  60 01 20 00 98 00 00 00 0e ac 58 40 08     04:23:55.728  READ FPDMA QUEUED
  60 03 08 00 90 00 00 00 0e fc 10 40 08     04:23:55.728  READ FPDMA QUEUED
  60 00 b0 00 88 00 00 00 0e 5a a8 40 08     04:23:55.728  READ FPDMA QUEUED

Error 64751 [22] occurred at disk power-on lifetime: 4899 hours (204 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 00 0e f3 78 40 00  Error: UNC at LBA = 0x000ef378 = 979832

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 04 50 00 d8 00 00 00 0f 02 c0 40 08     04:23:44.965  READ FPDMA QUEUED
  60 00 28 00 c0 00 00 00 0e f9 18 40 08     04:23:44.965  READ FPDMA QUEUED
  60 00 80 00 b8 00 00 00 0e f8 98 40 08     04:23:44.965  READ FPDMA QUEUED
  60 00 f0 00 b0 00 00 00 0e f7 a8 40 08     04:23:44.965  READ FPDMA QUEUED
  60 00 48 00 a8 00 00 00 0f 02 78 40 08     04:23:44.965  READ FPDMA QUEUED

Error 64750 [21] occurred at disk power-on lifetime: 4899 hours (204 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 00 0f 01 80 40 00  Error: UNC at LBA = 0x000f0180 = 983424

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 08 00 d0 00 00 00 0f 07 10 40 08     04:23:40.549  READ FPDMA QUEUED
  60 00 88 00 c8 00 00 00 0f 01 f0 40 08     04:23:40.549  READ FPDMA QUEUED
  60 02 78 00 b8 00 00 00 0e e9 d0 40 08     04:23:40.548  READ FPDMA QUEUED
  60 01 20 00 b0 00 00 00 0e ac 58 40 08     04:23:40.548  READ FPDMA QUEUED
  60 03 08 00 a8 00 00 00 0e fc 10 40 08     04:23:40.548  READ FPDMA QUEUED

SMART Extended Self-test Log Version: 1 (1 sectors)
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       258 (0x0102)
Device State:                        Active (0)
Current Temperature:                    44 Celsius
Power Cycle Min/Max Temperature:     44/49 Celsius
Lifetime    Min/Max Temperature:     26/59 Celsius
Under/Over Temperature Limit Count:   0/0
Vendor specific:
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

SCT Temperature History Version:     2
Temperature Sampling Period:         1 minute
Temperature Logging Interval:        1 minute
Min/Max recommended Temperature:      0/60 Celsius
Min/Max Temperature Limit:          -41/85 Celsius
Temperature History Size (Index):    478 (270)

Index    Estimated Time   Temperature Celsius
 271    2021-07-13 01:24    44  *************************
 ...    ..( 85 skipped).    ..  *************************
 357    2021-07-13 02:50    44  *************************
 358    2021-07-13 02:51     ?  -
 359    2021-07-13 02:52    44  *************************
 ...    ..(  7 skipped).    ..  *************************
 367    2021-07-13 03:00    44  *************************
 368    2021-07-13 03:01    45  **************************
 ...    ..(  4 skipped).    ..  **************************
 373    2021-07-13 03:06    45  **************************
 374    2021-07-13 03:07    46  ***************************
 375    2021-07-13 03:08    47  ****************************
 ...    ..(  4 skipped).    ..  ****************************
 380    2021-07-13 03:13    47  ****************************
 381    2021-07-13 03:14    48  *****************************
 ...    ..(  3 skipped).    ..  *****************************
 385    2021-07-13 03:18    48  *****************************
 386    2021-07-13 03:19    49  ******************************
 ...    ..( 13 skipped).    ..  ******************************
 400    2021-07-13 03:33    49  ******************************
 401    2021-07-13 03:34    48  *****************************
 ...    ..(  3 skipped).    ..  *****************************
 405    2021-07-13 03:38    48  *****************************
 406    2021-07-13 03:39    49  ******************************
 407    2021-07-13 03:40    49  ******************************
 408    2021-07-13 03:41    49  ******************************
 409    2021-07-13 03:42    48  *****************************
 ...    ..(  8 skipped).    ..  *****************************
 418    2021-07-13 03:51    48  *****************************
 419    2021-07-13 03:52    47  ****************************
 ...    ..(  5 skipped).    ..  ****************************
 425    2021-07-13 03:58    47  ****************************
 426    2021-07-13 03:59    48  *****************************
 ...    ..(  8 skipped).    ..  *****************************
 435    2021-07-13 04:08    48  *****************************
 436    2021-07-13 04:09    47  ****************************
 ...    ..( 12 skipped).    ..  ****************************
 449    2021-07-13 04:22    47  ****************************
 450    2021-07-13 04:23    46  ***************************
 ...    ..(  9 skipped).    ..  ***************************
 460    2021-07-13 04:33    46  ***************************
 461    2021-07-13 04:34    45  **************************
 462    2021-07-13 04:35    46  ***************************
 463    2021-07-13 04:36    45  **************************
 ...    ..( 35 skipped).    ..  **************************
  21    2021-07-13 05:12    45  **************************
  22    2021-07-13 05:13    44  *************************
 ...    ..(247 skipped).    ..  *************************
 270    2021-07-13 09:21    44  *************************

SCT Error Recovery Control command not supported
Device Statistics (GP/SMART Log 0x04) not supported
Pending Defects log (GP Log 0x0c) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0002  2            0  R_ERR response for data FIS
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0005  2            0  R_ERR response for non-data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0008  2            0  Device-to-host non-data FIS retries
0x0009  2            2  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2            3  Device-to-host register FISes sent due to a COMRESET
0x000b  2            0  CRC errors within host-to-device FIS
0x000d  2            0  Non-CRC errors within host-to-device FIS
0x000f  2            0  R_ERR response for host-to-device data FIS, CRC
0x0012  2            0  R_ERR response for host-to-device non-data FIS, CRC
0x8000  4        39599  Vendor specific

I'm creating an image using ddrescue to save any data I can off of Disk 2 right now and it seems to be going well. I've got about 208GB out of 850GB so far.

Edited July 13, 2021 by QPlus7
JorgeB Posted July 13, 2021

4 minutes ago, QPlus7 said:
I'm creating an image using ddrescue to save any data I can off of Disk 2 right now and it seems to be going well.

Yep, that's what I was going to suggest, disk appears to be terminal.
QPlus7 Posted July 13, 2021

1 minute ago, JorgeB said:
Yep, that's what I was going to suggest, disk appears to be terminal.

Thanks for the confirmation. Once I replace the drive, what should my plan of action be to restore the .img file, since my parity drive is out of sync and I can't rebuild the array?
QPlus7 Posted July 13, 2021

By the way, this is the command I'm running:

ddrescue -d -f -r3 /dev/sdb /mnt/disks/easystore/Disk2.img &

...hopefully that does the trick in creating a restorable image.
JorgeB Posted July 13, 2021

I would run ddrescue with another disk (of the same size) as the destination, then after it's done use that disk and re-sync parity.
QPlus7 Posted July 13, 2021

2 minutes ago, JorgeB said:
I would run ddrescue with another disk (of the same size) as the destination, then after it's done use that disk and re-sync parity.

Would I add it to the array before or after running ddrescue on the new disk?
JorgeB Posted July 13, 2021

After, you need to do a new config, keep all the other disk assignments and assign the cloned disk in place of old disk1.
QPlus7 Posted July 15, 2021

On 7/13/2021 at 10:10 AM, JorgeB said:
After, you need to do a new config, keep all the other disk assignments and assign the cloned disk in place of old disk1.

I'm not sure I understand this exactly, but I will follow up on that later. Currently ddrescue is still running. I have a 1TB ISO file (my drive is 1TB) but ddrescue is still going. This is what I am currently seeing on the console:

root@VIA-NAS01:~# ddrescue -d -f -r3 /dev/sdf /mnt/disks/easystore/Disk2.img rescue.log
GNU ddrescue 1.23
Press Ctrl-C to interrupt
     ipos:    3721 MB, non-trimmed:  938803 kB,  current rate:       0 B/s
     opos:    3721 MB, non-scraped:        0 B,  average rate:  11015 kB/s
non-tried:   12042 MB,  bad-sector:        0 B,    error rate:  16384 B/s
  rescued:  987223 MB,   bad areas:        0,        run time:  1d 53m 43s
pct rescued:  98.70%,  read errors:    14325,  remaining time:   412d 18h
                              time since last successful read:         33s
Copying non-tried blocks... Pass 5 (forwards)

The remaining time is fluctuating, but never to anything manageable. It just went down to 90 days and then back up to 275 days while I was typing this out. Since this is Pass 5, is it safe to cancel the ddrescue operation and see what I can do with my ISO file? Thanks in advance!
JorgeB Posted July 15, 2021

You can cancel, but it's still not finished. Pass 5 is not repeating a previous one - it is trying untried blocks, and it might recover a little more data if you let it finish. It's in this part:

17 minutes ago, QPlus7 said:
non-trimmed: 938803 kB,

Then it still goes through the "non-scraped" phase; these can take days for a badly damaged disk. Also, you used -r3, which means each error is retried 3 times. I usually don't use that - if the first read doesn't succeed, it's unlikely the later ones will.
QPlus7 Posted July 18, 2021

Well, I now have an .img file that I'm trying to mount, but am getting the following error:

root@VIA-NAS01:/mnt/disks/easystore# mount -t xfs -o loop SDEi.img /mnt/baddisk/
mount: /mnt/baddisk: wrong fs type, bad option, bad superblock on /dev/loop4, missing codepage or helper program, or other error.

How can I mount the .img file ddrescue created? Thanks in advance!
JorgeB Posted July 18, 2021

Didn't notice this before since I usually use a disk as the destination, not an image. For an image you need to clone only the partition, as Linux will fail to read the partition table from a regular file. There might be a way of doing it by passing the partition offset in the mount command, but I can't help with that. Alternatively, clone to a disk, or if cloning to an image again, use /dev/sdX1 as the source.
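For reference, a hedged sketch of the offset approach mentioned above. The image name and the start sector of 64 here are assumptions (64 is a common alignment for Unraid array disks) - read the real "Start" value for partition 1 from `fdisk -l` against your own image before mounting anything:

```shell
# Hypothetical sketch: mount a whole-disk image by partition offset.
# First find the partition's start sector:
#   fdisk -l Disk2.img        # look at the "Start" column for partition 1

START_SECTOR=64               # assumption - replace with the value fdisk reports
SECTOR_SIZE=512               # logical sector size of the source disk
OFFSET=$((START_SECTOR * SECTOR_SIZE))
echo "byte offset: $OFFSET"

# Then mount the partition inside the image (run as root):
#   mount -t xfs -o loop,offset=$OFFSET Disk2.img /mnt/baddisk
```

The `offset=` mount option tells the loop driver to skip the partition table and start the loop device at the partition's first byte, which is exactly what the "wrong fs type / bad superblock" error above is complaining about.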
QPlus7 Posted July 20, 2021 (edited)

First of all, thank you so much for all of your help up to this point! I now have a new 1 TB drive (sdb) in the computer that is not part of the array. The array is currently stopped. I am currently pre-clearing the new drive with hopes of mounting it after the pre-clear has completed. Once mounted, I think my next step is to run:

ddrescue -d -f /dev/sdf /mnt/disks/

Am I right up to this point? If so, once ddrescue completes, what should my next step be? Keep in mind my parity is out of sync. Thanks again!

Edited July 20, 2021 by QPlus7
JorgeB Posted July 20, 2021

3 minutes ago, QPlus7 said:
ddrescue -d -f /dev/sdb /mnt/disks/1TBNEW

You can't use a mount point as the destination. I use this:

ddrescue -d /dev/sdX /dev/sdY /boot/ddrescue.log

Replace X with the source, Y with the destination. The log can be useful if you need to interrupt the copy, or if it gets interrupted - if you type the command again with the same log it will resume from where it was.

5 minutes ago, QPlus7 said:
If so, once ddrescue completes, what should my next step be?

See if the cloned disk mounts with UD; if yes, you can do a new config with it and re-sync parity.
QPlus7 Posted July 20, 2021

2 minutes ago, JorgeB said:
You can't use a mount point as dest, I use this:
ddrescue -d /dev/sdX /dev/sdY /boot/ddrescue.log
Replace X with source, Y with dest., log can be useful if you need to interrupt the copy, or if it gets interrupted, if you type the command with the same log it will resume from where it was.
See if the cloned disk mounts with UD, if yes you can do a new config with it and re-sync parity.

Got it - thanks! Since I won't be able to mount the drive, should I stop the pre-clear?
QPlus7 Posted July 20, 2021 (edited)

One more thing: does the drive need to be formatted before I start ddrescue?

Edit: a quick Google search indicated that it doesn't need to be formatted, as the process will overwrite whatever is on the target drive anyway.

Edited July 20, 2021 by QPlus7
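A quick file-level illustration of that point, using throwaway scratch files (nothing to do with the real disks): a raw block copy simply overwrites whatever bytes the destination already held, filesystem or not, which is why pre-formatting the target is pointless.

```shell
# Demo with scratch files: "OLDDATA" stands in for whatever filesystem or
# leftover data the destination drive held; the raw copy overwrites it.
printf 'OLDDATA' > target.bin    # pretend this is the pre-existing target
printf 'newdisk' > source.bin    # pretend this is the failing source
dd if=source.bin of=target.bin conv=notrunc status=none
cat target.bin                   # -> newdisk
```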
QPlus7 Posted July 26, 2021

Ok, ddrescue finally recovered the data to a new drive. I stopped the array and replaced the failed drive in my array. I then went to do a new config but didn't click preserve assignments. Did I just lose my data? Some of my drives don't appear to mount so I can't check using Unassigned Devices right now. I've just restarted the server to see if they mount automatically.
QPlus7 Posted July 26, 2021

My drives didn't automatically mount after the restart. Attached is what I see now. Any help is greatly appreciated!
JorgeB Posted July 27, 2021

These were the assignments from the previous diags:

Jul 12 20:23:10 VIA-NAS01 kernel: md: import disk1: (sdh) ST1000DM010-2EP102_W9A5XR84 size: 976762552
Jul 12 20:23:10 VIA-NAS01 kernel: md: import disk2: (sdf) WDC_WD10EZEX-08WN4A0_WD-WCC6Y0AER31Z size: 976762552
Jul 12 20:23:10 VIA-NAS01 kernel: md: import disk3: (sde) WDC_WD10EZEX-08WN4A0_WD-WCC6Y7DEJER5 size: 976762552
Jul 12 20:23:10 VIA-NAS01 kernel: md: import disk4: (sdg) WDC_WD1003FZEX-00MK2A0_WD-WCC3F5KH9HU3 size: 976762552
Jul 12 20:23:10 VIA-NAS01 kernel: md: import disk5: (sdi) WDC_WD10EFRX-68JCSN0_WD-WCC1U3983233 size: 976762552
Jul 12 20:23:10 VIA-NAS01 kernel: md: import disk6: (sdd) WDC_WD80EFAX-68LHPN0_7SGH6MTC size: 7814026532

Re-assign them all, except with the new disk1; don't assign parity, start the array, and post new diags.
QPlus7 Posted July 27, 2021

12 hours ago, JorgeB said:
These were the assignments from the previous diags:
Jul 12 20:23:10 VIA-NAS01 kernel: md: import disk1: (sdh) ST1000DM010-2EP102_W9A5XR84 size: 976762552
Jul 12 20:23:10 VIA-NAS01 kernel: md: import disk2: (sdf) WDC_WD10EZEX-08WN4A0_WD-WCC6Y0AER31Z size: 976762552
Jul 12 20:23:10 VIA-NAS01 kernel: md: import disk3: (sde) WDC_WD10EZEX-08WN4A0_WD-WCC6Y7DEJER5 size: 976762552
Jul 12 20:23:10 VIA-NAS01 kernel: md: import disk4: (sdg) WDC_WD1003FZEX-00MK2A0_WD-WCC3F5KH9HU3 size: 976762552
Jul 12 20:23:10 VIA-NAS01 kernel: md: import disk5: (sdi) WDC_WD10EFRX-68JCSN0_WD-WCC1U3983233 size: 976762552
Jul 12 20:23:10 VIA-NAS01 kernel: md: import disk6: (sdd) WDC_WD80EFAX-68LHPN0_7SGH6MTC size: 7814026532
Re-assign them all except with new disk1, don't assign parity, start array and post new diags.

Thanks for posting that - it's a huge help! I assume you mean Disk 2, though? The ddrescue command I ran was against /dev/sdf, which is Disk 2.
JorgeB Posted July 28, 2021

11 hours ago, QPlus7 said:
I assume you mean Disk 2, though?

Yes, the disk that was cloned.
QPlus7 Posted July 29, 2021

Ok, so I got all the disks (except the bad Disk 2) installed and assigned to a new array. However, the sdX assignments are different, but I mapped everything out as it was in relation to Disk 1, Disk 2 (new), Disk 3, Disk 4, Disk 5, and Disk 6. I'm getting an error that says Disks 2, 4, and 5 are unmountable with no file system. Not sure if this is accurate or not, but I am going to pull the drives and put them in a Windows machine where I am more comfortable and see if I can read the data there (I have LinuxFS installed). While I figure that out, attached are the diags as requested, and a screenshot of my array configuration.

via-nas01-diagnostics-20210728-2242.zip
QPlus7 Posted July 29, 2021

2 minutes ago, QPlus7 said:
I'm getting an error that says Disk 2, 4, and 5 are unmountable with no file system. Not sure if this is accurate or not but I am going to pull the drives and put them in a Windows machine where I am more comfortable and see if I can read the data (I have LinuxFS installed) there.

There is definitely data on Disk 2, Disk 4, and Disk 5, as I am able to view everything on the Windows machine. I have a 14TB drive where I could copy the contents of all the drives, then just build a brand-new array and transfer everything back from the 14TB drive. Is there hope, or is that my best strategy? I know it will take time, but I'm fine with that if that is what it takes to get my data consolidated again. Thanks again for all the help!
JorgeB Posted July 29, 2021

3 hours ago, QPlus7 said:
However, the sdX assignments are different

That doesn't matter.

3 hours ago, QPlus7 said:
I'm getting an error that says Disk 2, 4, and 5 are unmountable with no file system

And it's not just filesystem corruption - no filesystem is being detected on those disks, which is very strange.

2 hours ago, QPlus7 said:
There is definitely data on Disk 2, Disk 4, and Disk 5 as I am able to view everything on the Windows machine.

Don't know what you mean - you are only seeing the data from the disks that mount. If that's everything you should have, good.