Millions of read errors during a disk rebuild

January 2, 2016

Hey all,

So during my monthly parity check, one of my disks (disk14) started having read errors. Since I had a spare precleared drive ready to go (old: 2TB, new: 4TB), I stopped the parity check, pulled the drive, replaced it, and started rebuilding the disk. During this rebuild, however, another one of my drives (disk3) started throwing millions of read errors. The rebuild completed, but I'm pretty sure the data is bad (directory names don't match their actual contents, files can't be accessed, etc.).

So here's the deal. I'm fairly certain that disk3 is fully dead. When it's in a bay, it sounds like it's repeatedly trying to spin up, and then I get a "click." The original disk14 may not be a total loss, however. But I don't think there's much I can do to salvage disk3 now, since parity is probably bonked as well.

Does anyone have any recommendations on how I should proceed? Is it pretty safe to say that disk3's data is a total loss? I've never run into anything like this in the years I've run unRAID, so I guess I was due. Any and all suggestions are much appreciated. Thanks!

Edit: Well, I put that 2TB drive back in and ran a SMART test. It passed, so it seems to be fine? Attaching the report here. So at least I should have disk14's data intact, but that doesn't really help me with disk3, right? And I don't even know what I could do while trying to keep whatever parity I have left, if any. Am I even making sense?

titan-smart-20160102-1134.zip


January 2, 2016

Community Expert

What version of unRAID are you running, exactly? It might be possible to do a New Config without rebuilding parity, then use whatever parity you have to rebuild enough of disk3 to try some file recovery methods.


January 2, 2016

Author

6.1.6.

How would that process go? Pull the rebuilt disk14, put the old disk14 back in, replace the missing disk3, and do a New Config?

Thanks!


January 2, 2016

Community Expert

That good SMART was for original disk14?

I haven't really tried this, but according to the 6.1.5 release notes, New Config with Trust Parity now works as expected. So: put the original disk14 in, since it is OK and should have good data, put the new disk3 in, then do a New Config with Trust Parity. Start the array, then stop it. Unassign disk3. Start the array so it sees disk3 as unassigned. Stop the array and reassign disk3. Start the array and it should rebuild disk3. After it finishes rebuilding there may still be problems, but maybe something can be recovered. What filesystem was on it?
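For anyone wondering why rebuilding from parity can work at all, here is a minimal sketch, assuming unRAID's single parity behaves as a plain bytewise XOR across all data disks at each offset. The tiny byte strings and the helper names (`xor_blocks`, `compute_parity`, `rebuild_missing`) are hypothetical, for illustration only.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Bytewise XOR of equal-length byte strings (one column per offset)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def compute_parity(data_disks: list[bytes]) -> bytes:
    """Single parity: the XOR of every data disk at each offset."""
    return xor_blocks(*data_disks)


def rebuild_missing(parity: bytes, surviving_disks: list[bytes]) -> bytes:
    """XOR is its own inverse, so the missing disk is parity XOR all the others."""
    return xor_blocks(parity, *surviving_disks)


# Hypothetical tiny "disks" standing in for disk3, disk14 and one other data disk.
disk3 = bytes([0x10, 0x20, 0x30, 0x40])
disk14 = bytes([0x01, 0x02, 0x03, 0x04])
other = bytes([0xAA, 0xBB, 0xCC, 0xDD])

parity = compute_parity([disk3, disk14, other])

# Swap disk3 for a blank drive: its contents come back from parity plus
# every *other* disk, all of which must still read back correctly.
print(rebuild_missing(parity, [disk14, other]) == disk3)  # True
```

The catch is in that last comment: a rebuild needs parity and every other disk to be readable and consistent with each other. It is also presumably why Trust Parity matters here, since a normal parity sync after New Config would recompute parity from the blank disk3 and overwrite exactly the information needed to reconstruct it.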


January 2, 2016

Author

ReiserFS. I haven't migrated to XFS, except when I add new disks. Yes, the SMART report was from the old disk14. Not sure why I got read errors during the parity check, but I should have run a SMART report before deciding to pull it. Jumped the gun there.

Ok, so let me repeat this back to you just to ensure I'm fully understanding this procedure.

- Pull newly rebuilt disk14 (4TB), replace with original disk14 (2TB)

- Put in a blank precleared drive for disk3

- Keep the array stopped

- Run "new config" with the trust parity option

- Pretty sure I'd have to reassign all the drives though with new config, right?

- Start array, stop array

- Unassign disk3

- Start array, stop array

- Reassign disk3 and let it rebuild

Essentially this is tricking unRAID into thinking disk3 initially has data on it when it in fact doesn't, keeping parity, and thus allowing that drive to hopefully be rebuilt successfully?

Thanks again!

Edit: And the fact that the old and new disk14 are different sizes won't cause any issues with parity? I have no clue how drive sizes and parity work as they relate to rebuilding data.


January 2, 2016

I had a similar issue; it was my power supply. I would do a power supply isolation test if I were you.


January 3, 2016

Community Expert

> - Pull newly rebuilt disk14 (4TB), replace with original disk14 (2TB)

We want to do this because we don't think the disk14 rebuild is OK.

> Essentially this is tricking unRAID into thinking disk3 initially has data on it when it in fact doesn't, keeping parity, and thus allowing that drive to hopefully be rebuilt successfully?

Right.

> Edit: And the fact that the old and new disk14 are different sizes won't cause any issues with the parity? I have no clue how drive sizes and parity work as it relates to the rebuilding of data.

The disk14 rebuild would only write to disk14, not to parity, so parity should not have been affected.
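On the size question, a sketch assuming the parity disk is at least as large as the largest data disk and that the area past the end of a smaller disk counts as zeros in the parity calculation. Sizes are scaled down to a few bytes, and `pad`, `PARITY_SIZE` and `other_disk` are hypothetical names.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def pad(disk: bytes, size: int) -> bytes:
    """Assumption: space beyond a smaller disk's end contributes zeros to parity."""
    return disk + bytes(size - len(disk))


PARITY_SIZE = 8                               # stand-in for the parity disk's capacity
old_disk14 = bytes([1, 2, 3, 4])              # stand-in for the original 2TB disk14
other_disk = bytes([9, 8, 7, 6, 5, 4, 3, 2])  # some other, full-size data disk

parity = xor_blocks(pad(old_disk14, PARITY_SIZE), other_disk)

# Rebuilding disk14 onto the larger 4TB drive only *reads* parity and the
# other disks; the area past the old 2TB end comes back as zeros.
new_disk14 = xor_blocks(parity, other_disk)
print(new_disk14)  # b'\x01\x02\x03\x04\x00\x00\x00\x00'

# Parity stays consistent whether the old or the new disk14 is installed.
print(xor_blocks(new_disk14, other_disk) == parity)  # True
```

In this model, swapping between the 2TB and 4TB disk14 leaves parity valid either way, as long as the rebuilt copy actually matches the original in the first 2TB.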

Maybe it will all work, or maybe some additional file recovery will have to be attempted after the rebuild, but it seems like this is all we have to work with if the original disk3 is dead.
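To make the doubt about the disk14 rebuild concrete, here is a sketch under the assumption that disk3's unreadable sectors effectively fed wrong bytes into the disk14 reconstruction. All disk contents are hypothetical, and parity is again modeled as a plain bytewise XOR.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical contents before anything failed.
disk3 = bytes([0x10, 0x20, 0x30, 0x40])
orig_disk14 = bytes([0x01, 0x02, 0x03, 0x04])
other = bytes([0xAA, 0xBB, 0xCC, 0xDD])
parity = xor_blocks(disk3, orig_disk14, other)

# Assumption: during the disk14 rebuild, disk3's last two bytes read back wrong.
bad_disk3_reads = disk3[:2] + bytes([0x00, 0x00])
rebuilt_disk14 = xor_blocks(parity, bad_disk3_reads, other)
print(rebuilt_disk14 == orig_disk14)  # False: damaged where disk3 misread

# Rebuilding disk3 from the rebuilt disk14 just reproduces the bad reads...
print(xor_blocks(parity, rebuilt_disk14, other) == disk3)  # False
# ...while the original disk14 (parity was never written) recovers disk3 intact.
print(xor_blocks(parity, orig_disk14, other) == disk3)  # True
```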


January 5, 2016

Author

Hi trurl (or anyone else who'd be so kind to chime in),

So I finally got the replacement drive precleared. Before swapping back in an old drive that did give me read errors at one point (though the SMART report looks fine) and doing a New Config, I figured, why not just see what happens if I rebuild the failed disk without swapping the old disk14 back in? So I rebuilt disk3, and surprisingly enough, everything appears to be working fine. I believe all of my files on that drive are still intact (unRAID rocks!).

The weird thing is, though, I don't see any errors on the webgui, but my email notification looked like this:

Event: unRAID Data rebuild:
Subject: Notice [TITAN] - Data rebuild: finished (411641108 errors)
Description: Duration: unavailable (no parity-check entries logged)
Importance: warning

Could those errors be a carryover from when I was having read errors that for some reason never got cleared out, or is this something I should be concerned about?

Thanks for all of your help!

