I screwed up



Let me start off by saying: no, I did not do regular parity checks; no, I didn't take good inventory of my data to know what I've lost; yes, I understand that if you don't have a backup then that's your own fault; and yes, I am an idiot.

 

Now that that's out of the way: I had a data disk go bad in my array. I got a new drive put in mid-day yesterday, and right before I went to sleep I checked the rebuild progress one last time and saw this (first image). I knew I was screwed but couldn't deal with it last night. This morning it looks like this (second image).

 

I'm not going to beg for ways to save the data (I understand it's gone). What I want to do is stem the tide of damage. Should I kill the rebuild and just write off 10TB (the old drive I replaced was 5TB)? How can I do that while saving the remaining data? Again, I know this is my own fault and I've lost a good chunk of data, but I would really appreciate any help in saving whatever I can.

Screenshot 2020-03-31 22.34.50.png

Screenshot 2020-04-01 11.02.17.png

Link to comment

Well, I was going to attach them, but it's been working on it for 20 minutes now. I'm going to leave the page open, but should I expect it to ever finish?

 

Also here is an update on the rebuild process

Quote

 

Total size: 8 TB

Elapsed time: 20 hours, 21 minutes

Current position: 2.20 TB (27.5 %)

Estimated speed: 398.3 KB/sec

Estimated finish: 168 days, 13 hours, 21 minutes
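For anyone wondering where an estimate like 168 days comes from, here is a rough sanity check of the numbers quoted above. It is only a sketch: it assumes the web UI reports decimal units (1 TB = 10^12 bytes, 1 KB = 1000 bytes) and that the estimate is simply remaining data divided by current speed. The function names are made up for illustration.

```python
def rebuild_eta_seconds(total_tb, position_tb, speed_kb_per_s):
    """Remaining bytes divided by current speed, assuming decimal units."""
    remaining_bytes = (total_tb - position_tb) * 1e12
    return remaining_bytes / (speed_kb_per_s * 1e3)


def split_duration(seconds):
    """Break a number of seconds into whole (days, hours, minutes)."""
    days, rem = divmod(int(seconds), 86400)
    hours, rem = divmod(rem, 3600)
    return days, hours, rem // 60


# Values taken from the quoted rebuild status.
eta = rebuild_eta_seconds(total_tb=8.0, position_tb=2.20, speed_kb_per_s=398.3)
days, hours, minutes = split_duration(eta)
print(f"about {days} days, {hours} hours, {minutes} minutes remaining")
print(f"progress: {2.20 / 8.0:.1%}")
```

Under those assumptions the result lands within an hour of the quoted estimate, so the estimate is just the read-error crawl extrapolated over the remaining 5.8 TB, not a sign that the rebuild will ever speed up on its own.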

 

 

Link to comment

May not be as bad as you think. Best case, you disturbed the connection to disk5 when replacing disk3, and fixing that before restarting the rebuild will result in complete success. Might even be possible that the disk you replaced wasn't bad either but something else caused it to be disabled.

 

Or it could indeed be bad due to all the neglect, though it would be unlikely to lose more than just those 1 or 2 disks.

 

We won't know without more information than just those screenshots. Absolutely no point in continuing with that rebuild with all of the disk5 errors. Stop it and see if you can get diagnostics.

Link to comment

Hmm, OK, I can't stop the rebuild. When I click to cancel and then confirm, it makes a request to /update.htm with the following form data:

startState:STARTED

file:

csrf_token: A202<REMOVED>1BDA43

cmdNoCheck: Cancel

but it just hangs ("pending") and never completes. What is the safest way for me to take down this machine or is pulling the power my only option?

 

 

EDIT: It times out (504) after a while

Edited by joshstrange
Update
Link to comment

Tried that, but after 8 minutes it doesn't appear to be shutting down. Normally I would tail the syslog to figure out what the issue was, but it's not updating because it has no space left...

root@Tower:~# poweroff

Broadcast message from root@Tower (pts/1) (Wed Apr  1 11:52:26 2020):

The system is going down for system halt NOW!
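If the syslog has stopped updating for lack of space, a quick way to confirm that from an SSH session is to check how full the filesystem holding it is. This is a minimal sketch; the `/var/log` path is an assumption about where the syslog lives on this box, and `usage_percent` is just an illustrative helper name.

```python
import shutil


def usage_percent(path):
    """Return how full the filesystem containing `path` is, as a percentage."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


if __name__ == "__main__":
    # Assumed log location; change it if the syslog lives elsewhere.
    log_dir = "/var/log"
    try:
        print(f"{log_dir} is {usage_percent(log_dir):.1f}% full")
    except OSError as exc:
        print(f"could not check {log_dir}: {exc}")
```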

 

Link to comment

Ok I tried:

 

root@Tower:~# poweroff -f

 

And got no output/response and the machine is still up (pingable/sshable) a few minutes later. Am I not waiting long enough or is it hung?

 

Using htop, I can see that 2 instances of "shutdown -h 0 w", 1 instance of "poweroff -f", and 1 instance of "/usr/local/sbin/emhttpd" are all stuck in uninterruptible sleep (state "D").
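The same check can be done without htop by reading the process state field from `/proc/<pid>/stat` on Linux. This is a rough sketch rather than a polished tool, and `parse_stat` and `d_state_processes` are names made up for this example.

```python
import os


def parse_stat(stat_text):
    """Extract (pid, command name, state) from the text of /proc/<pid>/stat."""
    # The command name sits in parentheses and may itself contain spaces or
    # parentheses, so split on the first "(" and the last ")".
    lpar = stat_text.index("(")
    rpar = stat_text.rindex(")")
    pid = int(stat_text[:lpar].strip())
    comm = stat_text[lpar + 1:rpar]
    state = stat_text[rpar + 1:].split()[0]
    return pid, comm, state


def d_state_processes(proc_root="/proc"):
    """List (pid, command name) for processes in uninterruptible sleep."""
    if not os.path.isdir(proc_root):
        return []
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        stat_path = os.path.join(proc_root, entry, "stat")
        try:
            with open(stat_path, errors="replace") as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited in the meantime, or is unreadable
        if state == "D":
            stuck.append((pid, comm))
    return stuck


if __name__ == "__main__":
    for pid, comm in d_state_processes():
        print(pid, comm)
```

Processes in state "D" are blocked inside the kernel, usually waiting on I/O, and cannot be killed until that wait ends. That fits a shutdown hanging while a disk is not responding.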

Link to comment

OK, I ended up just holding down the power button to kill it. Here are the diagnostics. When I opened up the machine, the SATA power cable on drive 5 (the second failing drive) looked like it was ajar, but I can't be sure whether that was from me removing the SATA data cable. After boot the drive isn't showing up at all, so I'm shutting the machine back down to replace the SATA data cable, but I grabbed diagnostics first.

tower-diagnostics-20200401-1257.zip

Link to comment

Wow, I could kiss you all. I was 100% ready to write off this data (and that still may need to happen), but after replacing the SATA cable, disk 5 is showing up and I can see data on it. I'm starting another rebuild, so wish me luck. THANK YOU, THANK YOU, THANK YOU. I've had SATA cables go bad before, but I was sure this was 100% my fault (still a true statement) and that I had 2 drives on their last legs (again, that could still be the case, but at least now I have some hope). Thank you again, and I hope in a day or so I'll have it all rebuilt!

Link to comment
