huladaddy Posted October 4, 2019
A couple of days ago I had a 2 TB drive fail. I received a new 8 TB drive today and installed it. The array was in the process of rebuilding when the rebuild was suddenly cancelled. A different drive (5 TB) is now reported to be in an error state. As soon as I saw this, I stopped the array. Is there a way to recover from this situation? What should I do next?
huladaddy Posted October 5, 2019
(Replying to jpowell8672) Thanks for the quick response. I had a read through those threads, and their situation is a little different. I also have some new information. The two drives that failed for me are Disk 8 and Disk 10. As I mentioned above, after I replaced Disk 8 and while it was rebuilding, Disk 10 failed. Currently, this is what Unraid is reporting:
Disk 8: Orange Triangle, "Device contents emulated", "Unmountable: No file system"
Disk 10: Red X, "Device is disabled, contents disabled", "Unmountable: No file system"
I could really use some help...
jpowell8672 Posted October 5, 2019
Tools > Diagnostics. Attach the zip.
huladaddy Posted October 5, 2019
Here you go. Thanks for helping! undrobo-diagnostics-20191004-0434.zip
JorgeB Posted October 5, 2019
You have two invalid disks with single parity, and Unraid can't rebuild a disk like that. SMART for both disks looks fine. Do you have the diags from when disk10 failed?
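Editor's note: the point about two invalid disks and single parity can be illustrated with a toy model. The sketch below assumes plain XOR parity across the data disks; the disk contents and the helper names `parity_of` and `rebuild_one` are made up for illustration and are not Unraid code.

```python
from functools import reduce

def parity_of(blocks):
    """XOR all blocks together byte by byte (toy single-parity model)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild_one(parity, surviving_blocks):
    """XOR parity with every surviving block to get what is missing."""
    return parity_of([parity, *surviving_blocks])

# Three toy "disks" of four bytes each (hypothetical contents).
disk8 = b"\x11\x22\x33\x44"
disk9 = b"\xa0\xb0\xc0\xd0"
disk10 = b"\x0f\x0e\x0d\x0c"
parity = parity_of([disk8, disk9, disk10])

# One disk missing: parity plus the two survivors gives it back exactly.
recovered_disk8 = rebuild_one(parity, [disk9, disk10])

# Two disks missing: the best we can compute is disk8 XOR disk10,
# a single equation with two unknowns, so neither disk is recoverable.
xor_of_missing = rebuild_one(parity, [disk9])
```

With both disk8 and disk10 invalid, the parity data only pins down their combination, which is why one of the two has to be trusted again before the other can be rebuilt.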
huladaddy Posted October 5, 2019
8 hours ago, johnnie.black said: "You have two invalid disks with single parity, Unraid can't rebuild a disk like that, SMART for both disks looks fine, do you have the diags from when disk10 failed?"
Here's the SMART report: undrobo-smart-20191004-1847.zip
huladaddy Posted October 5, 2019
At this point what are my options? I assume the worst-case scenario is that I will have to build the array from scratch. Will I be able to start with a very small array (parity and one data drive), mount a drive from the old array, and copy files from the mounted old drive to the new array? Then, once copied, add the old drive to the array and repeat with the next old drive? Am I correct? Is there a better option?
Is there any way to recover files from old Disk 8 and 10? It seems that there is no longer a file system on them. They were XFS. Can I perform some sort of recovery or undelete? Is there a way to recover the old directory structure and file lists?
itimpi Posted October 5, 2019
The data on all the ‘good’ disks will be safe and will not need any sort of recovery. At the very least you will be able to create an array of those disks and keep their data intact. The question is what to do about the disks that are in a suspect state, to try to recover all their data. There is a good chance the data on the other disks that ‘failed’ is also recoverable, so keep those disks intact. Do you still have the 2TB disk that you replaced with another disk? If so, its data is probably intact if the disk has not physically failed, and thus recoverable. Have you taken any other action that needs to be taken into account?
huladaddy Posted October 6, 2019
Your assumptions are spot on. When I replaced Disk 8 (2 TB) with the new drive (8 TB), I set the old drive aside, so like you, I assume the data on that old 2 TB drive is still intact. Part way through rebuilding Disk 8, Unraid reported that Disk 10 had failed and cancelled the rebuild. I let it sit for a few minutes while I scratched my head. After a while, a bunch of other disks were reporting failures, so I shut down. Assuming it was a cabling or power supply issue, I cracked open the case to make sure all cables were seated properly. Then I started up the machine and noticed that Disk 8 now seemed to be mounted (even though the rebuild only got to around 3%) and that it now said it was rebuilding Disk 10. WHAT! I cancelled that rebuild (it only got to around 1-2%). That is the state I am now in.
JorgeB Posted October 6, 2019
15 hours ago, huladaddy said: "Here's the SMART report:"
I meant the diags from when it failed, so we could see the syslog. Assuming disk10 is OK, you can re-enable it with the invalid slot command to rebuild disk8.
JorgeB Posted October 6, 2019
To do that, follow the instructions below carefully:
- Tools -> New Config -> Retain current configuration: All -> Apply
- Assign any missing disk(s)
- Important: after checking the assignments, leave the browser on that page, the "Main" page.
- Open an SSH session or use the console and type (don't copy/paste directly from the forum, as sometimes it can insert extra characters):
mdcmd set invalidslot 8 29
- Back in the GUI, without refreshing the page, just start the array. Do not check the "parity is already valid" box. The GUI will still show that data on the parity disk(s) will be overwritten; this is normal, as it doesn't account for the invalid slot command, and parity won't be overwritten as long as the procedure was done correctly.
- Disk8 will start rebuilding. The disk should mount immediately, but if it's unmountable don't format it; wait for the rebuild to finish and then run a filesystem check.
huladaddy Posted October 6, 2019
1 hour ago, johnnie.black said: "I meant the diags from when it failed, so we could see the syslog. Assuming disk10 is OK you can re-enable it with the invalid slot command to rebuild disk8, disk8 can be rebuilt on top of the old one or to be safer using a spare."
I don't have the diags from when it failed. I don't think I can assume Disk 10 is OK. It reports as having no filesystem. It seems that a rebuild of that drive started, as I mentioned above. Is there a way to see if the drive is OK so that I can re-enable it as you mention?
JorgeB Posted October 6, 2019
Just now, huladaddy said: "It reports as having no filesystem."
That's expected, since single parity can't emulate two disks. See the instructions above for rebuilding disk8; when you do that, disk10 should also mount immediately after array start.
JorgeB Posted October 6, 2019
Oh, it's also a good idea to replace/swap cables on disk10 before doing it, just to rule them out.
huladaddy Posted October 6, 2019
So, "mdcmd set invalidslot 8 29" enables Disk 10? Just want to verify this, since I see "8" and not "10" in the command.
JorgeB Posted October 6, 2019
Yes, it enables all disks except disk8, which will start rebuilding at array start, and disk29 (that slot is parity2, which you don't have).
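Editor's note: the answer above can be read as a simple rule: every assigned slot is treated as valid except the two named in the command. The sketch below is only a toy model of that rule as described in this thread, not the actual md driver logic. The function name `plan_after_invalidslot` and the list of assigned slots are hypothetical; the thread only confirms that disk 8, disk 10, and a single parity disk exist, and treating slot 0 as parity is an assumption.

```python
PARITY2_SLOT = 29  # per the thread, slot 29 is parity2

def plan_after_invalidslot(assigned_slots, invalid_a, invalid_b):
    """Split assigned slots into (trusted, rebuilt) for a toy invalidslot model.

    Slots named in the command that are not assigned (for example an
    absent parity2) simply drop out of both lists.
    """
    invalid = {invalid_a, invalid_b}
    trusted = sorted(s for s in assigned_slots if s not in invalid)
    rebuilt = sorted(s for s in assigned_slots if s in invalid)
    return trusted, rebuilt

# Hypothetical layout: slot 0 = parity (assumption), slots 1-10 = data disks.
assigned = list(range(0, 11))
trusted, rebuilt = plan_after_invalidslot(assigned, 8, PARITY2_SLOT)
```

In this model disk 10 ends up in `trusted` precisely because it is not named in the command, which is why the command mentions 8 and not 10.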
huladaddy Posted October 6, 2019
And to confirm, Parity 1 will not get overwritten, and will be used to rebuild Disk 8?
JorgeB Posted October 6, 2019
Correct, just make sure to follow the instructions carefully.
huladaddy Posted October 6, 2019
OK. So far two notifications:
- Disk 8, drive not ready, content being reconstructed
- Parity sync / Data rebuild started
Does that look good?
JorgeB Posted October 6, 2019
Yes. Are all disks mounted?
huladaddy Posted October 6, 2019
Yes. So I'm assuming you were able to determine that Disk10 didn't actually get overwritten at all, even though it was reported that it was rebuilding Disk10?
JorgeB Posted October 6, 2019
Disk10 got disabled; it wasn't rebuilding, so the original data is all there.
huladaddy Posted October 6, 2019
OK. Wow. So it looks like I might be OK! I can't believe it. You have been so unbelievably helpful, I can't thank you enough. I will update you when there is something else to report. The data rebuild is at 0.4%. I will let it run overnight and check in on it in the morning. Again, thank you very much.
huladaddy Posted October 8, 2019
Unraid Parity sync / Data rebuild: 2019-10-07 09:33 AM
Notice [UNDROBO] - Parity sync / Data rebuild finished (0 errors)
Duration: 1 day, 23 hours, 20 seconds. Average speed: 47.3 MB/s
Unraid Disk 8 message: 2019-10-07 09:33 AM
Notice [UNDROBO] - Disk 8 returned to normal operation
ST8000DM004-2CX188_ZCT0AQQK (sdc)
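Editor's note: the reported duration and average speed are consistent with rebuilding a nominal 8 TB disk. The quick check below assumes 8 TB means 8e12 bytes and that MB/s means 1e6 bytes per second; the helper name `average_speed_mb_s` is made up for this sketch.

```python
def average_speed_mb_s(capacity_bytes, days=0, hours=0, minutes=0, seconds=0):
    """Average throughput in MB/s (1 MB = 1e6 bytes) over the given duration."""
    elapsed_seconds = ((days * 24 + hours) * 60 + minutes) * 60 + seconds
    return capacity_bytes / elapsed_seconds / 1e6

# Nominal 8 TB disk (assumed 8e12 bytes), duration taken from the notification.
speed = average_speed_mb_s(8e12, days=1, hours=23, seconds=20)
print(f"{speed:.1f} MB/s")
```

Under those assumptions the result rounds to the 47.3 MB/s shown in the notification.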