January 2, 2016
Hey all,
So during my monthly parity check, one of my disks (disk14) started having read errors. Having an extra precleared drive ready to go (old: 2TB, new: 4TB), I stopped the parity check, pulled the drive, replaced it, and started to rebuild the disk. During this rebuild, however, another one of my drives (disk3) started experiencing millions of read errors. The rebuild completed and I'm pretty sure the data is bad (directory names don't match up with their actual contents, files can't be accessed, etc.).
So here's the deal. I'm fairly certain that disk3 is fully dead. When it's in a bay, it just sounds like it's repeatedly trying to spin up and then I get a "click." The original disk14 may not be a total loss, however. But I don't think there's much I can do to salvage disk3 now, since parity is probably bonked as well.
Does anyone have any recommendations on how I should proceed? Is it pretty safe to say that disk3's data is a total loss? I've never run into anything like this before in the years I've run unRAID, so I guess I was due.
Any and all suggestions are much appreciated. Thanks!
Edit: Well, I put that 2TB back in and ran a SMART test. It seems to be fine? It passed. Attaching it here. So at least I should have disk14's data intact, but that doesn't really help me with disk3, right? And I don't even know what I could do while trying to keep whatever parity I have left, if any. Am I even making sense?
Attachment: titan-smart-20160102-1134.zip
January 2, 2016, Community Expert
What version of unRAID are you running exactly? It might be possible to New Config without rebuilding parity, then use whatever parity you have to try to get something of disk3 rebuilt well enough to try some file recovery methods.
January 2, 2016, Author
6.1.6. How would that process go? Pull the rebuilt disk14, put back in the old disk14, replace missing disk3, and do a new config? Thanks!
January 2, 2016, Community Expert
That good SMART was for original disk14? Haven't really tried this, but according to 6.1.5 release notes, New Config with Trust Parity works as expected now. So:
- Put original disk14 in, since it is OK and should have good data
- Put new disk3 in
- New Config with Trust Parity
- Start then Stop array
- Unassign disk3
- Start array so it sees disk3 unassigned
- Stop array and reassign disk3
- Start array and it should rebuild disk3
After done rebuilding it may be there are still problems but maybe something can be recovered. What filesystem was on it?
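Why the original disk14 has to go back in before disk3 is rebuilt can be shown with a toy model. This is only a sketch, assuming single XOR parity; the disk contents, `disk7`, and the helper `xor_blocks` are made up for illustration and are not anything unRAID exposes.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Toy array: three data disks of equal size (hypothetical contents).
disk3 = b"movies"
disk14 = b"photos"
disk7 = b"backup"
parity = xor_blocks([disk3, disk14, disk7])

# disk3 is lost. Rebuild it from parity plus every surviving data disk.
rebuilt_ok = xor_blocks([parity, disk14, disk7])

# Same rebuild, but with a disk14 whose contents no longer match parity
# (for example, one that was itself rebuilt while disk3 had read errors).
bad_disk14 = b"ph?t?s"
rebuilt_bad = xor_blocks([parity, bad_disk14, disk7])

print(rebuilt_ok == disk3)   # rebuild from consistent disks
print(rebuilt_bad == disk3)  # rebuild from an inconsistent disk14
```

In this model the rebuilt disk3 is only as good as parity and every other data disk, which is the reason for swapping the known-good 2TB disk14 back in first.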
January 2, 2016, Author
ReiserFS. I haven't done a migration to xfs, except for when I add new disks. Yes, the SMART report was from the old disk14. Not sure why I got read errors during parity check, but I should have run a SMART report before deciding to pull it. Jumped the gun there.
Ok, so let me repeat this back to you just to ensure I'm fully understanding this procedure.
- Pull newly rebuilt disk14 (4TB), replace with original disk14 (2TB)
- Put in a blank precleared drive for disk3
- Keep the array stopped
- Run "new config" with the trust parity option
- Pretty sure I'd have to reassign all the drives though with new config, right?
- Start array, stop array
- Unassign disk3
- Start array, stop array
- Reassign disk3 and let it rebuild
Essentially this is tricking unRAID into thinking disk3 initially has data on it when it in fact doesn't, keeping parity, and thus allowing that drive to hopefully be rebuilt successfully?
Thanks again!
Edit: And the fact that the old and new disk14 are different sizes won't cause any issues with the parity? I have no clue how drive sizes and parity work as it relates to the rebuilding of data.
January 2, 2016
I had a similar issue; it was my power supply. I would do a power supply isolation test if I were you.
January 3, 2016, Community Expert
> ReiserFS. I haven't done a migration to xfs, except for when I add new disks. Yes, the smart report was from the old disk14. Not sure why I got read errors during parity check, but I should have run a smart report before deciding to pull it. Jumped the gun there.
> Ok, so let me repeat this back to you just to ensure I'm fully understanding this procedure.
> - Pull newly rebuilt disk14 (4TB), replace with original disk14 (2TB)
We want to do this because we don't think disk14 rebuild is OK.
> - Put in a blank precleared drive for disk3
> - Keep the array stopped
> - Run "new config" with the trust parity option
> - Pretty sure I'd have to reassign all the drives though with new config, right?
> - Start array, stop array
> - Unassign disk3
> - Start array, stop array
> - Reassign disk3 and let it rebuild
> Essentially this is tricking unRAID into thinking disk3 initially has data on it when it in fact doesn't, keeping parity, and thus allowing that drive to hopefully be rebuilt successfully?
Right
> Thanks again!
> Edit: And the fact that the old and new disk14 are different sizes won't cause any issues with the parity? I have no clue how drive sizes and parity work as it relates to the rebuilding of data.
The disk14 rebuild would only write to disk14 and not parity so parity should not have been affected.
Maybe it will all work, and maybe some additional file recovery will have to be attempted after the rebuild but seems like this is all we have to work with if the original disk3 is dead.
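On the drive-size question, a small sketch may help. It assumes single XOR parity where a shorter disk counts as zeros past its end, and it uses 4 and 8 bytes to stand in for 2TB and 4TB. The names `compute_parity` and `rebuild` and the byte contents are hypothetical, chosen only for illustration.

```python
from itertools import zip_longest

def compute_parity(disks):
    """XOR all disks byte by byte; shorter disks count as zeros past their end."""
    parity = bytearray()
    for column in zip_longest(*disks, fillvalue=0):
        value = 0
        for byte in column:
            value ^= byte
        parity.append(value)
    return bytes(parity)

def rebuild(parity, other_disks, size):
    """Reconstruct a missing disk of `size` bytes from parity and the other disks."""
    return compute_parity([parity, *other_disks])[:size]

disk3 = b"ABCDEFGH"   # 8 bytes, stands in for a larger data disk
disk14 = b"wxyz"      # 4 bytes, stands in for the original 2TB disk14
parity = compute_parity([disk3, disk14])

# Rebuild disk14 onto a bigger replacement (8 bytes, the "4TB" drive).
# Only the replacement is written; parity is read, never modified.
new_disk14 = rebuild(parity, [disk3], size=8)

print(new_disk14 == disk14 + bytes(4))                # old data, then zeros
print(compute_parity([disk3, new_disk14]) == parity)  # parity still matches
```

Under these assumptions, either the 2TB original or a correctly rebuilt 4TB copy lines up with the same parity, so the size difference alone should not matter. Whether the real 4TB rebuild was correct is a separate question, given the disk3 read errors during it.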
January 5, 2016, Author
Hi trurl (or anyone else who'd be so kind as to chime in),
So I finally got the replacement drive precleared, and I figured that before I swap back in an old drive that did give me read errors at one point (though the SMART report looks fine) and do a New Config, why not just try and see what happens if I rebuild the failed disk without swapping back in the old disk14.
So I rebuilt disk3, and surprisingly enough, everything appears to be working fine. I believe all of my files on that drive are still intact (unRAID rocks!). The weird thing is, though, I don't see any errors on the webGui, but my email notification looked like this:
Event: unRAID Data rebuild:
Subject: Notice [TITAN] - Data rebuild: finished (411641108 errors)
Description: Duration: unavailable (no parity-check entries logged)
Importance: warning
Could those errors be a carryover from when I was having read errors that for some reason never got cleared out, or is this something I should be concerned about?
Thanks for all of your help!