Next Steps Regarding Drive Issues

thefly · March 19, 2018

After suffering numerous disk issues, I have completed a non-correcting parity check and am left with what I believe are numerous disk irregularities. Can someone suggest the next steps for dealing with them? A screenshot of the Main page is attached. Thanks.

BradJ · March 19, 2018

You should always upload your log files so people can help you troubleshoot.

JorgeB · March 19, 2018

Looks like multiple disks dropped offline, likely a controller issue. Grab current diagnostics, reboot, grab new diagnostics, and post both.

bonienl · March 19, 2018

Once your array is back in good shape, you should consider converting your disks from ReiserFS to XFS. ReiserFS is no longer developed and, in your situation with nearly full disks, will perform very poorly.

thefly · March 20, 2018

Restarted. See the pre- and post-restart diagnostics. Disk 5 and Disk 10 are now gone.

tower-diagnostics-current.zip

tower-diagnostics-restart.zip

JorgeB · March 20, 2018

Disk 5, Disk 10 now gone

You didn't have a disk5.

Disk10 completely dropped offline:

Mar 16 15:59:59 Tower kernel: ata12: hard resetting link
Mar 16 15:59:59 Tower kernel: ata12.00: failed to read native max address (err_mask=0x1)
Mar 16 15:59:59 Tower kernel: ata12.00: HPA support seems broken, skipping HPA handling
Mar 16 15:59:59 Tower kernel: ata12.00: both IDENTIFYs aborted, assuming NODEV
Mar 16 15:59:59 Tower kernel: ata12.00: revalidation failed (errno=-2)
Mar 16 16:00:04 Tower kernel: ata12: hard resetting link
Mar 16 16:00:04 Tower kernel: ata12.00: both IDENTIFYs aborted, assuming NODEV
Mar 16 16:00:04 Tower kernel: ata12.00: revalidation failed (errno=-2)
Mar 16 16:00:04 Tower kernel: ata12.00: disabled

Check the connections or try it in a different PC. If it's still not detected, it's likely dead.
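
To spot the same pattern in other logs, the "disabled" line is the one to look for. A minimal sketch, assuming kernel messages in the format quoted above; `dropped_ata_ports` is a hypothetical helper name, and the log path in the usage line is an assumption about where the syslog lives.

```shell
# dropped_ata_ports: hypothetical helper. Reads kernel log lines on stdin
# and prints each ATA device the kernel marked "disabled", once.
dropped_ata_ports() {
  grep -oE 'ata[0-9]+\.[0-9]+: disabled' | cut -d: -f1 | sort -u
}

# Usage (log path is an assumption):
#   dropped_ata_ports < /var/log/syslog
```

This only names the ATA link, not the disk behind it, so it still has to be matched to a physical drive by hand.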

JorgeB · March 20, 2018

Also, these disks are likely failing. Do you have notifications enabled?

Run an extended SMART test to confirm:

Device Model:     ST2000DM001-9YN164
Serial Number:    S1E0562C
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       8

Device Model:     ST2000DM001-9YN164
Serial Number:    S240BXD9
187 Reported_Uncorrect      0x0032   066   066   000    Old_age   Always       -       34
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       40
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       40

Device Model:     ST3000DM001-9YN166
Serial Number:    Z1F0VLHS
  5 Reallocated_Sector_Ct   0x0033   067   052   036    Pre-fail  Always       -       44240
183 Runtime_Bad_Block       0x0032   097   097   000    Old_age   Always       -       3
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       3117
197 Current_Pending_Sector  0x0012   001   001   000    Old_age   Always       -       20672
198 Offline_Uncorrectable   0x0010   001   001   000    Old_age   Offline      -       20672

Device Model:     WDC WD20EARS-00MVWB0
Serial Number:    WD-WMAZ20246473
197 Current_Pending_Sector  0x0032   196   196   000    Old_age   Always       -       1462

Device Model:     WDC WD20EARS-00MVWB0
Serial Number:    WD-WMAZ20266575
  5 Reallocated_Sector_Ct   0x0033   041   041   140    Pre-fail  Always   FAILING_NOW 1265
196 Reallocated_Event_Count 0x0032   001   001   000    Old_age   Always       -       1265
197 Current_Pending_Sector  0x0032   196   196   000    Old_age   Always       -       1394
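
In the reports above, the raw value in the last column is what matters for attributes 5, 187, 197 and 198. A rough sketch of filtering for them, assuming `smartctl -A` output in the same column layout as shown; `flag_smart` is a hypothetical helper name and `sdX` is a placeholder, not a device from this thread.

```shell
# flag_smart: hypothetical helper. Reads `smartctl -A`-style attribute rows
# on stdin and prints ID, name and raw value for attributes 5, 187, 197
# and 198 whenever the raw value (last column) is greater than zero.
flag_smart() {
  awk '$1 ~ /^(5|187|197|198)$/ && $NF + 0 > 0 { print $1, $2, $NF }'
}

# Usage (sdX is a placeholder for the real device):
#   smartctl -A /dev/sdX | flag_smart
#   smartctl -t long /dev/sdX    # start the extended self-test
```

Some drives print extra text after the raw number, so treat the output as a hint and read the full report before deciding anything.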

JorgeB · March 20, 2018

These disks looked familiar, and now I see why. You could have mentioned your previous thread and saved me the time of going through all the SMART reports again. I also see you didn't follow my earlier advice to run extended SMART tests, but the errors in your first screenshot line up with the suspect disks, so that's confirmation enough that they are failing.

I'm still not sure what you're trying to accomplish. unRAID can't keep working with more bad disks than parity disks (one in your case), so you basically have 2 options:

1) do a new config with all the good disks plus new disks to replace the failing ones, re-sync parity and then copy everything you can from the failing disks to the new ones, e.g., by mounting one at a time with the UD plugin.

2) clone all failing disks to new ones using ddrescue and then do a new config and re-sync parity.

trurl · March 20, 2018

4 hours ago, johnnie.black said:

you could have mentioned your previous thread,

In fact, you should have just used your previous thread since all this is just a continuation of that dire situation you have allowed to happen.

4 hours ago, johnnie.black said:

Still not sure on what you're trying to accomplish, unRAID can't keep working with more bad disks than parity disks, one in your case

I don't recall, and I'm not going to go back and read it all again. Do you have backups? Maybe before doing anything else you should copy whatever irreplaceable and important files you may be able to access to your PC. You are in serious danger of losing a lot of data. A single parity disk cannot help with multiple drive failures.

thefly · March 21, 2018

I'm stuck with Disk 2 showing missing, contents emulated and Disk 10 disabled, contents emulated. Obviously I cannot start the array. I am resigned to losing these disks but would like some direction on the order of new drive replacements. Is there any way to get a directory of what was on the drives? Do I replace Disk 2 or 10 first? Depressing...

trurl · March 21, 2018

6 minutes ago, thefly said:

I'm stuck with Disk 2 showing missing, contents emulated and Disk 10 disabled, contents emulated.

No disks are emulated, since you have single parity but 2 missing or disabled disks.

The course of action suggested by johnnie.black seems like the best idea:

17 hours ago, johnnie.black said:

1) do a new config with all the good disks plus new disks to replace the failing ones, re-sync parity and then copy everything you can from the failing disks to the new ones, e.g., by mounting one at a time with the UD plugin.

2) clone all failing disks to new ones using ddrescue and then do a new config and re-sync parity.

The 1st option is probably going to be the simplest for you. The 2nd option would actually have fewer steps but requires working carefully at the command line.

JorgeB · March 21, 2018

7 hours ago, trurl said:

The 1st option is probably going to be the simplest for you. The 2nd option would actually have fewer steps but requires working carefully at the command line.

The 2nd option could be useful if, for example, the data is mostly videos and you don't have backups or source files. With the 1st option, any file with a single read error won't be copied. The 2nd option should copy most files while skipping the unreadable sectors. The affected files will obviously be corrupt, but video files with just a few errors should still be playable, perhaps with some glitches, which is still better than nothing when there are no backups.

thefly · March 21, 2018

Both Disk 2 and Disk 10 are mostly videos, and I have no backups or source files. I have chosen ddrescue and have installed the plugin. As I understand it, if I place a new drive in free slot 19 and want to clone the disabled Disk 10, would I enter the following terminal command to attempt the clone?

ddrescue -f /dev/sd19 /dev/sd10 /boot/ddrescue.log

Edited March 21, 2018 by thefly

trurl · March 21, 2018

1 minute ago, thefly said:

Both Disk 2 and 10 are mostly videos. I have no source back-up files. I have chosen ddrescue and have installed the plug in. As I understand it, if I place a new drive in free slot 19 and wish to clone disable device Disk 10, I would enter the following terminal command to attempt a clone??:

ddrescue -f /dev/sd19 /dev/sd10 /boot/ddrescue.log

Those devices don't exist. You need to use the drive letters corresponding to those disk numbers. Be very very careful here, since the drive letters assigned to a particular disk can change on each boot.
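
Since the letters can move between boots, one way to pin down the right device is to look it up by serial number just before cloning. A minimal sketch, assuming the usual Linux `/dev/disk/by-id` symlinks with an `ata-` prefix; `dev_for_serial` is a hypothetical helper name, not part of any plugin.

```shell
# dev_for_serial: hypothetical helper. Prints the device node currently
# behind a disk serial number by resolving its /dev/disk/by-id symlink.
# $1 = serial number, $2 = symlink directory (defaults to /dev/disk/by-id).
dev_for_serial() {
  local serial="$1" dir="${2:-/dev/disk/by-id}" link
  for link in "$dir"/ata-*"$serial"; do
    # Whole-disk links end in the serial; partition links add "-partN".
    [ -L "$link" ] && readlink -f "$link" && return 0
  done
  echo "no disk with serial $serial found" >&2
  return 1
}

# Usage, with a serial from the SMART reports earlier in the thread:
#   dev_for_serial S1E0562C
```

Running `lsblk -o NAME,SERIAL,MODEL,SIZE` gives the same mapping at a glance and is a good cross-check.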

JorgeB · March 21, 2018

2 minutes ago, thefly said:

As I understand it, if I place a new drive in free slot 19 and wish to clone disable device Disk 10, I would enter the following terminal command to attempt a clone??:

That's not correct. Since you need to do a new config anyway, I would recommend cloning to a disk outside the array, since it's faster.

trurl · March 21, 2018

Also, I have never used this but it looks like you have the source and destination backwards. BE CAREFUL!

JorgeB · March 21, 2018

11 minutes ago, trurl said:

BE CAREFUL!

Yes, from the link:

ddrescue -f /dev/sdX /dev/sdY /boot/ddrescue.log

Neither the source nor the destination disk can be mounted. Replace X with the source disk and Y with the destination. Always triple-check these: if the wrong disk is used as the destination, it will be overwritten and all its data deleted.

X and Y are the disk identifiers, like sdb, sdc, etc.
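
The triple check can be partly scripted. A cautious sketch that only prints the command from the link after two sanity checks, assuming mounts are listed in `/proc/mounts`; `build_clone_cmd` is a hypothetical helper name, and the check only sees disks mounted under their sdX name.

```shell
# build_clone_cmd: hypothetical helper. Prints (does not run) the ddrescue
# command from the thread after two basic sanity checks.
# $1 = source (failing disk), $2 = destination (new disk, will be overwritten),
# $3 = mount table to consult (defaults to /proc/mounts).
build_clone_cmd() {
  local src="$1" dst="$2" mounts="${3:-/proc/mounts}" dev
  if [ -z "$src" ] || [ -z "$dst" ] || [ "$src" = "$dst" ]; then
    echo "source and destination must be two different disks" >&2
    return 1
  fi
  for dev in "$src" "$dst"; do
    # Refuse if any mounted device name starts with this disk name,
    # which covers the disk itself and its partitions (e.g. /dev/sdb1).
    if grep -q "^$dev" "$mounts"; then
      echo "$dev is mounted; unmount it first" >&2
      return 1
    fi
  done
  echo "ddrescue -f $src $dst /boot/ddrescue.log"
}

# Usage (sdX = source, sdY = destination, both placeholders):
#   build_clone_cmd /dev/sdX /dev/sdY
```

Printing instead of executing leaves a last chance to read the source and destination back before anything is overwritten.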

thefly · March 21, 2018

Got it. So I just add a disk using the UD plugin and clone to it?

JorgeB · March 21, 2018

Yes, both disks just need to be connected to the server.
