joshstrange Posted April 1, 2020

Let me start off by saying: no, I did not do regular parity checks; no, I didn't take good inventory of my data to know what I've lost; yes, I understand that if you don't have a backup then that's your own fault; and yes, I am an idiot. Now that that's out of the way:

I had a data disk go bad in my array. I got a new drive put in mid-day yesterday, and right before I went to sleep I checked the progress one last time and saw this (first image). I knew I was screwed but couldn't deal with it last night. This morning it looks like this (second image).

I'm not going to beg for ways to save the data (I understand it's gone). What I want to do is stem the tide of damage. Should I kill the rebuild and just write off 10TB (the old drive I replaced was 5TB)? How can I do that while saving the remaining data?

Again, I know this is my own fault and I've lost a good chunk of data, but I would really appreciate any help in saving whatever I can.
JorgeB Posted April 1, 2020

Please post the diagnostics: Tools -> Diagnostics
joshstrange Posted April 1, 2020

Well, I was going to attach them, but it's been working on it for 20 minutes now. I'm going to leave the page open, but should I expect it to ever finish?

Also, here is an update on the rebuild process:

Total size: 8 TB
Elapsed time: 20 hours, 21 minutes
Current position: 2.20 TB (27.5 %)
Estimated speed: 398.3 KB/sec
Estimated finish: 168 days, 13 hours, 21 minutes
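For anyone wondering where an estimate like that comes from: the rebuild figures above are roughly consistent with dividing the remaining data by the current speed. The sketch below is only a sanity check, not how Unraid computes it. It assumes decimal units (1 TB = 10**12 bytes, 1 KB = 10**3 bytes) and a constant speed, and the function names are made up for illustration.

```python
# Rough check of a rebuild estimate from the figures shown in the GUI.
# Assumptions: decimal units and a constant transfer speed.

TB = 10**12  # bytes per terabyte (assumed decimal)
KB = 10**3   # bytes per kilobyte (assumed decimal)


def rebuild_eta_seconds(total_bytes, done_bytes, bytes_per_sec):
    """Seconds remaining if the current speed stays constant."""
    if bytes_per_sec <= 0:
        raise ValueError("speed must be positive")
    return (total_bytes - done_bytes) / bytes_per_sec


def split_duration(seconds):
    """Break a duration in seconds into whole (days, hours, minutes)."""
    total_minutes = int(seconds // 60)
    days, rem = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(rem, 60)
    return days, hours, minutes


# Figures from the post: 8 TB total, 2.20 TB done, 398.3 KB/sec.
eta = rebuild_eta_seconds(8 * TB, 2.20 * TB, 398.3 * KB)
days, hours, minutes = split_duration(eta)
print(f"about {days} days, {hours} hours, {minutes} minutes remaining")
```

The result lands within about half an hour of the GUI's figure, which is as close as the rounded inputs allow. A healthy rebuild normally runs orders of magnitude faster than a few hundred KB/sec, so a number like this points at read errors rather than a slow disk.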
joshstrange Posted April 1, 2020

Crap, I just looked and /var/log is full, so I don't know if that's what is causing the issue (my /var/log is 2GB in size, for reference).
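If you want to confirm how full /var/log is and which files are eating the space, something like the following would do it. This is a minimal sketch that assumes Python 3 is available on the server, which is not a given on Unraid. The helper names `usage_percent` and `largest_files` are invented for this example.

```python
import os
import shutil


def usage_percent(path):
    """Return how full the filesystem holding `path` is, as a percentage."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0  # some pseudo-filesystems report a zero size
    return 100.0 * usage.used / usage.total


def largest_files(root, count=5):
    """Return up to `count` (size_in_bytes, path) pairs, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                sizes.append((os.path.getsize(full), full))
            except OSError:
                continue  # file vanished or is unreadable; skip it
    sizes.sort(reverse=True)
    return sizes[:count]


if __name__ == "__main__":
    log_dir = "/var/log"
    if os.path.isdir(log_dir):
        print(f"{log_dir} is {usage_percent(log_dir):.1f}% full")
        for size, path in largest_files(log_dir):
            print(f"{size:>12} bytes  {path}")
```

With a disk throwing constant read errors, the syslog itself is the most likely culprit, since every failed read gets logged.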
trurl Posted April 1, 2020

May not be as bad as you think. Best case, you disturbed the connection to disk5 when replacing disk3, and fixing that before restarting the rebuild will result in complete success. Might even be possible that the disk you replaced wasn't bad either, but something else caused it to be disabled.

Or it could indeed be bad due to all the neglect, though it would be unlikely to lose more than just those 1 or 2 disks.

We won't know without more information than just those screenshots.

Absolutely no point in continuing with that rebuild with all of the disk5 errors. Stop it and see if you can get diagnostics.
JorgeB Posted April 1, 2020

The rebuild should be canceled, but there's a chance data can be salvaged; disk5 might have dropped offline. Power down, check/replace cables on disk5, power back on and post diags.
joshstrange Posted April 1, 2020

On it! Thank you guys. I really appreciate the responses while I'm in panic mode, it helps a lot!
joshstrange Posted April 1, 2020

Hmm, ok, I can't stop the rebuild. When I click to cancel and then confirm, it makes a request to /update.htm with the following form data:

startState:STARTED
file:
csrf_token: A202<REMOVED>1BDA43
cmdNoCheck: Cancel

but it just hangs ("pending") and never completes. What is the safest way for me to take down this machine, or is pulling the power my only option?

EDIT: It times out (504) after a while.
JorgeB Posted April 1, 2020

Type "poweroff" on the console
joshstrange Posted April 1, 2020

Tried that, but after 8 minutes it doesn't appear to be shutting down. Normally I would tail the syslog to figure out what the issue was, but it's not updating due to it not having any space...

root@Tower:~# poweroff
Broadcast message from root@Tower (pts/1) (Wed Apr 1 11:52:26 2020):
The system is going down for system halt NOW!
JorgeB Posted April 1, 2020

You'll need to force it.
joshstrange Posted April 1, 2020

Ok, I tried:

root@Tower:~# poweroff -f

and got no output/response, and the machine is still up (pingable/sshable) a few minutes later. Am I not waiting long enough or is it hung? Using htop, I can see that 2 instances of "shutdown -h 0 w", 1 instance of "poweroff -f", and 1 instance of "/usr/local/sbin/emhttpd" are all stuck in uninterruptible sleep ("D" state).
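For reference, processes in "D" state are usually blocked inside the kernel waiting on I/O and won't respond to signals until that I/O returns, which would explain why neither poweroff nor a kill gets anywhere here. If htop isn't handy, the same information can be read straight from /proc on Linux. The sketch below is just one way to do it, assuming Python 3 is installed; `parse_stat` and `uninterruptible_processes` are names made up for this example.

```python
import os


def parse_stat(text):
    """Parse the start of a /proc/<pid>/stat line into (pid, comm, state)."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the first '(' and the last ')'.
    left = text.index("(")
    right = text.rindex(")")
    pid = int(text[:left].strip())
    comm = text[left + 1:right]
    state = text[right + 1:].split()[0]
    return pid, comm, state


def uninterruptible_processes(proc_root="/proc"):
    """Return (pid, comm) for every process currently in state 'D'."""
    found = []
    if not os.path.isdir(proc_root):
        return found  # not Linux, or /proc is not mounted
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-scan or the line was unreadable
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in uninterruptible_processes():
        print(f"{pid:>7}  {comm}")
```

If the list includes the shutdown commands themselves, as it did here, waiting longer is unlikely to help.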
joshstrange Posted April 1, 2020

Ok, I ended up just holding down the power button to kill it. Here are the diagnostics.

When I opened up the machine, the SATA power cable on drive 5 (the second failing drive) looked like it was ajar, but I can't be sure whether that was from me removing the SATA data cable. After boot the drive isn't showing up at all now, so I'm shutting it back down to replace the SATA data cable, but I grabbed diagnostics first.

Attachment: tower-diagnostics-20200401-1257.zip
joshstrange Posted April 1, 2020

Wow, I could kiss you all. I was 100% ready to write off this data (and that still may need to happen), but after replacing the SATA cable, disk 5 is showing up and I can see data on it. I'm starting another rebuild, so wish me luck.

THANK YOU, THANK YOU, THANK YOU. I've had SATA cables go bad before, but I was sure this was 100% my fault (still a true statement) and that I had 2 drives on their last legs (again, still could be the case, but at least now I have some hope). Thank you again, and I hope in a day or so I'll have it all rebuilt!