bjreich Posted September 26, 2018

Hi everyone, I've found myself in a potential data-loss situation and I'd appreciate some expert help before I do anything that might make it worse.

Disk 6 went missing after I moved my Unraid box, because a cable came loose. Regrettably, I didn't deal with it immediately, and then a real disk fault occurred on Disk 2 (an unrecoverable write, after which Unraid declared the filesystem unmountable). The moment that happened, Disk 6 went from missing to unassigned. Parity hasn't been rechecked since then, so it is still calculated with Disk 6 in the array.

My plan was to use the "Trust my array" procedure to re-add Disk 6. It was just a cabling issue that occurred while the system was turned off, so I know the data on it hasn't changed and that parity is still calculated with it as part of the array. Then I was going to replace Disk 2 and let parity rebuild it.

But if I unassign the parity drive, I can't start the array (too many missing disks) to make it forget parity, which I'd need to do before I could re-add it with "My parity is valid" ticked. If I just try to re-add Disk 6, it's detected as a new disk, even though I'm fairly certain it's fine.

I didn't start the array with any of these experimental changes, and I put everything back the way it was: the parity drive is still assigned, Disk 6 is still unassigned, and the array is stopped (I don't let it auto-start, so it always starts stopped).

I've attached my diagnostics. I'm hoping someone can guide me down the safest path, if there is one.

TL;DR: Disk 2 is faulty and Disk 6 is unassigned. Parity should still be valid, except for the failed write to Disk 2 that started this scenario. Unraid reports too many missing disks to start the array without the parity drive.

Thanks in advance for any help anyone can provide.
storage-diagnostics-20180926-1800.zip
JorgeB Posted September 26, 2018

You can try this, though it will only work if parity is still valid with the missing disk6:

- Tools -> New Config -> Retain current configuration: All -> Apply
- Assign any missing disk(s), including the new disk2.
- Important: after checking the assignments, leave the browser on that page, the "Main" page.
- Open an SSH session or use the console and type: mdcmd set invalidslot 2 29
- Back in the GUI, and without refreshing the page, just start the array. Do not check the "parity is already valid" box. Disk2 will start rebuilding and should mount immediately, but if it's unmountable, don't format it; wait for the rebuild to finish and then run a filesystem check.
bjreich Posted September 26, 2018

Thanks for your help, Johnnie. I'll give that a go tomorrow afternoon when I have the replacement disk in hand. That sounds like a promising process, because I get to keep the old disk 2 untouched in case I need to try recovery on it later. Do you mind if I ask what mdcmd set invalidslot 2 29 does? It sounds like it tells the Unraid layer to invalidate and rebuild disk 2, but what about the "29" part? Regardless of what happens, I'll let everyone know how I go.
JorgeB Posted September 26, 2018

25 minutes ago, bjreich said:
but what about the "29" part?

Since you don't have dual parity, it's there so that parity2 (slot 29) is also set as invalid.
JonathanM Posted September 26, 2018

3 hours ago, bjreich said:
Regrettably I didn't immediately deal with the situation because then a real disk fault occurred on Disk 2

How long was the array running between the first and second event?
trurl Posted September 26, 2018

3 minutes ago, jonathanm said:
How long was the array running between the first and second event?

And do you have Notifications set up?
bjreich Posted September 26, 2018

No, I don't have notifications set up, which is bad bad bad (I'm an IT guy). The array was running for about two weeks (I was busy and totally forgot about it), but the day before the incident I logged in and saw the Dashboard still said missing. The next day Disk 2 had an error and Disk 6 changed to unassigned. It's been one day since then.
JonathanM Posted September 26, 2018

So the array was in use for two weeks with disk6 missing?
JorgeB Posted September 26, 2018

That's a long time. Like I mentioned, this will only work if parity remains valid with the missing disk installed, i.e., no data changed on that disk since it got disabled.
trurl Posted September 26, 2018

24 minutes ago, johnnie.black said:
no data changed on that disk since it got disabled

And in case you don't know how it works: reads and writes to the emulated disk continue while the actual disk is disabled. All the other disks plus parity are read to calculate the missing disk's data for reading, and similarly to calculate the parity update for writing. Writes to the emulated disk are recorded in that parity update, so they can be read back through emulation and become part of the rebuild. The original disk can therefore be out of sync, because parity got the writes the disabled disk missed.
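To make that concrete, here is a minimal Python sketch, assuming single parity is a plain byte-wise XOR across the data disks. The disk contents, slot numbers, and the helpers xor_blocks, read_emulated and write_emulated are all hypothetical, not Unraid code. It shows a write to emulated disk6 landing only in parity, and what happens if the stale physical disk6 is then trusted while rebuilding disk2.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks byte by byte (assumed single-parity math)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def read_emulated(disks: dict[int, bytes], parity: bytes, missing: int) -> bytes:
    """Reconstruct a missing disk from parity plus every other data disk."""
    others = [data for slot, data in disks.items() if slot != missing]
    return xor_blocks(parity, *others)


def write_emulated(disks: dict[int, bytes], parity: bytes, missing: int,
                   new_data: bytes) -> bytes:
    """Write to an emulated disk: only parity changes (read-modify-write)."""
    old_data = read_emulated(disks, parity, missing)
    return xor_blocks(parity, old_data, new_data)


# Hypothetical array: slot number -> one 4-byte block per data disk.
disks = {1: b"\x01\x02\x03\x04", 2: b"\x10\x20\x30\x40", 6: b"\xaa\xbb\xcc\xdd"}
parity = xor_blocks(*disks.values())

# Disk 6 drops out; the physical disk keeps its old contents.
stale_disk6 = disks[6]
assert read_emulated(disks, parity, 6) == stale_disk6  # emulation is correct

# A write lands on emulated disk 6: parity records it, the real disk does not.
parity = write_emulated(disks, parity, 6, b"\x00\x00\x00\x00")
print("real disk6 matches emulated disk6:",
      read_emulated(disks, parity, 6) == stale_disk6)

# The stale disk 6 (still in `disks`) is re-added with parity "trusted",
# then disk 2 is rebuilt from parity plus the other disks.
rebuilt_disk2 = read_emulated(disks, parity, 2)
print("rebuilt disk2 correct:", rebuilt_disk2 == disks[2])
```

Both prints report False: the real disk6 no longer matches what parity expects, and a disk2 rebuilt on top of it picks up the difference. That is why the procedure above only works if nothing was written to disk6 while it was missing.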
bjreich Posted September 27, 2018

I'm about to perform the process above, fingers crossed. Luckily, disk 6 is 100% full, so there shouldn't have been any emulated writes to it at all. When I upgrade my disks next, I'm going to spread the data out so that no disk is more than 90-95% full. Two weeks was a long time, and I'm kicking myself for not dealing with it immediately. My Unraid box is just SO stable and problem-free that I became too complacent. Lesson learnt!
bjreich Posted September 27, 2018

Just waiting for the array to mount now to see what happens. It's very nerve-wracking ignoring all the warnings that I'm doing the opposite of what I want. If disk 2 is lost, at least I have the old one to put into a caddy, run a filesystem check on, and hopefully recover data from.
bjreich Posted September 27, 2018

Well, it's not rebuilding my parity (great) and it is rebuilding disk 2 (even better), so far so good. Thank you VERY much for your help, Johnnie Black, it's definitely appreciated. I'm just having a look through the emulated data now. I'll let you know the end result when it's done.
bjreich Posted September 29, 2018

Quick update. Disk 2 still has filesystem issues after its rebuild: it's online, but files complain about being read-only and have permission issues. Unfortunately, reiserfsck recommends a --rebuild-tree to fix some of the issues. That is still running now, so we'll see. I have the old Disk 2 safe and sound in case I need to repair it and compare results. Many of the issues with the current Disk 2 may come from differences in parity rather than from the original filesystem problem.
JorgeB Posted September 29, 2018

38 minutes ago, bjreich said:
Many of the issues with the current Disk 2 may be from differences in Parity rather part of the initial filesystem issue it had.

Very likely. If you don't have checksums, then after the filesystem is fixed you should copy every file you can from the old disk, overwriting the existing ones; every file that copies successfully can be assumed OK. You can also run a binary or checksum compare between the two disks to find the differences.
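A checksum compare like the one suggested above could be sketched in Python as follows, assuming both disks are mounted read-accessible somewhere (the mount points in the usage comment are hypothetical). It walks the old disk, hashes each file with SHA-256, and reports files that differ, are missing from the rebuilt disk, or could not be read from the old one. It is only a report; deciding which copy to keep is still up to you.

```python
import hashlib
import os
import sys


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large files don't fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_trees(old_root: str, new_root: str) -> dict[str, list[str]]:
    """Compare each file under old_root with the same relative path under new_root."""
    result: dict[str, list[str]] = {"differ": [], "missing": [], "unreadable": []}
    for dirpath, _dirnames, filenames in os.walk(old_root):
        for name in filenames:
            old_path = os.path.join(dirpath, name)
            rel = os.path.relpath(old_path, old_root)
            new_path = os.path.join(new_root, rel)
            if not os.path.isfile(new_path):
                result["missing"].append(rel)
                continue
            try:
                if file_digest(old_path) != file_digest(new_path):
                    result["differ"].append(rel)
            except OSError:
                # e.g. a read error on the failing old disk
                result["unreadable"].append(rel)
    return result


if __name__ == "__main__":
    # Hypothetical usage: python compare.py /mnt/disks/old_disk2 /mnt/disk2
    if len(sys.argv) == 3:
        report = compare_trees(sys.argv[1], sys.argv[2])
        for kind, paths in report.items():
            for rel in sorted(paths):
                print(f"{kind}: {rel}")
```

Hashing both sides costs a full read of each disk, but it avoids trusting file sizes or timestamps, which parity-induced corruption would not change.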
This topic is now archived and is closed to further replies.