Recovery from Power Outage


pantner


Hey all, I need some help/advice recovering from a power outage with what may be 2 faulty HDDs.

 

My server had been running perfectly for months, but almost 2 weeks ago the PSU died.

Finally got a replacement yesterday and booted it up.

My Unraid server is a VM on ESXi. I don't think that affects anything, just mentioning it in case it does.

Running version 6.6.6 (ha ha...)

 

The first time I booted the VM I had the SAS cables plugged into the wrong HBA, and all my large HDDs showed as 2.2TB drives.

Shut down, swapped the cables over and booted again.

All drives are recognised; however, my Parity disk and "Disk 2" are showing alerts.

 

Parity Drive is showing reallocated sectors, etc.

Disk 2 is showing UDMA CRC errors.

 

Screenshots of both attached.

 

On top of that (I don't know if it is related), the drive array is taking a very long time to start. I left it for at least 30 minutes and it was still on Disk 4.

Also attached screenshot of that.

 

I didn't want to leave it overnight in case the array did start, kicked off a parity check, and the errors on the Parity disk ended up corrupting data.

 

I had a look in the log and couldn't see any errors. Sorry, I don't have a copy of it ATM.

 

I was thinking I should just pull the parity disk, start the array without parity, manually move the data off Disk 2, remove it from the array, and then test each of the two disks independently.

 

Or is there anything else I should be doing/checking?

 

Thanks in advance :)

Unraid Parity.PNG

Unraid Disk 2.PNG

Unraid mounting.PNG


CRC errors are not drive errors. They indicate that a CRC check failed on data transmitted over the SATA data cable between the hard drive and the SATA controller; the CRC is verified at the receiving end to confirm that nothing was corrupted in the transfer. (By the way, this count is permanently stored on the disk and cannot be reset. Monitor it to make sure it does not increase.) It is usually caused by a bad SATA cable or a loose cable connection. The data on the disk is most likely fine; the data is simply retransmitted until it is received without a CRC failure.
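If you want to automate the "monitor that it does not increase" part, here is a rough Python sketch. It assumes you have saved the text of a `smartctl -A` style attribute table for the disk and noted the CRC count from an earlier report. The sample table and its values are made up for illustration, and the function names are hypothetical, not part of Unraid or smartmontools.

```python
# SMART attribute 199 is the UDMA CRC error counter on most SATA drives.
UDMA_CRC_ERROR_COUNT = 199

# Made-up excerpt in the layout that `smartctl -A` prints (illustrative values).
SAMPLE_REPORT = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       12
"""


def parse_smart_attributes(report):
    """Return {attribute ID: raw value} for rows with a plain integer raw value."""
    attributes = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the header line and blank lines are skipped by this check.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            attributes[int(fields[0])] = int(fields[9])
    return attributes


def crc_errors_increased(report, baseline):
    """True if the CRC error count in `report` is higher than `baseline`."""
    current = parse_smart_attributes(report).get(UDMA_CRC_ERROR_COUNT)
    if current is None:
        raise ValueError("UDMA CRC error attribute not found in report")
    return current > baseline


if __name__ == "__main__":
    # Baseline of 12 is the (hypothetical) count noted after reseating the cable.
    print(crc_errors_increased(SAMPLE_REPORT, baseline=12))
```

A count that stays at the baseline after reseating or replacing the cable suggests the problem is fixed; a count that keeps climbing means the cable or connection still needs attention.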

 

You should post a Diagnostics report (Tools >>> Diagnostics) so that someone can have a better look at the parity drive. (Personally, I would replace the parity disk and rebuild parity on the new disk. Then I would have a better look at the old parity drive to see what its issues are. With the limited data that I see here, I would suspect that it is toast.)

 

EDIT: By the way, 227 days between parity checks is much too long. You should be doing it (at least) monthly. If you have not already done so, you should also set up the notification system and have it send you status reports on the health of your server.
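As a quick illustration of the "at least monthly" rule, here is a minimal Python sketch that flags an overdue parity check. The 31-day limit is my reading of "monthly", the dates are hypothetical, and the function name is made up for this example.

```python
from datetime import date, timedelta


def parity_check_overdue(last_check, today, max_age_days=31):
    """True if more than `max_age_days` have passed since the last parity check."""
    return (today - last_check).days > max_age_days


if __name__ == "__main__":
    today = date(2019, 12, 4)                  # hypothetical "today"
    last_check = today - timedelta(days=227)   # the 227-day gap mentioned above
    print(parity_check_overdue(last_check, today))
```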

Edited by Frank1940

Thanks for the reply.

Just booted up the VM to get the diagnostics for you, got distracted by something for a couple of minutes, and when I looked back the array was already mounted and a parity check was running. I stopped it immediately. It says it found 2 errors in 45 seconds. I'm not sure whether that would attempt to fix the parity or fix the data?

 

Here are the diagnostics, though I suspect you are correct: the parity drive is most likely toast, and the other disk might be fine.

 

I'll remove the parity drive and test it independently.

 

I'm just happy the array mounted OK and I can access my files again.

 

Yeah, I have gotten very bad with that. I used to run it manually. I actually thought I had set up e-mail alerts, but obviously not, or not properly.

Will sort that out and set up a scheduled parity check too.

tower-diagnostics-20191204-1916.zip


I would replace that parity drive immediately. With over 13,000 remapped sectors and over 2400 sectors needing to be remapped, it is a disaster waiting to happen. Even if you could get it to remap those 2400 pending (and, apparently, completely unreadable) sectors, when would the next one pop up? Probably right in the middle of a rebuild of another disk, which would cause that rebuild to fail.
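To make that kind of judgement repeatable, here is a small Python sketch that flags a drive from its two sector counts (on most drives these are SMART attributes 5, Reallocated_Sector_Ct, and 197, Current_Pending_Sector). The zero-tolerance default limits are an assumption chosen for illustration, not an official threshold, and the function name is hypothetical.

```python
def smart_warnings(reallocated, pending, max_reallocated=0, max_pending=0):
    """Return a list of warnings for sector counts above the given limits."""
    warnings = []
    if reallocated > max_reallocated:
        warnings.append(f"{reallocated} reallocated sectors (limit {max_reallocated})")
    if pending > max_pending:
        warnings.append(f"{pending} pending sectors (limit {max_pending})")
    return warnings


if __name__ == "__main__":
    # Rounded-down figures from the post above: "over 13,000" and "over 2400".
    for warning in smart_warnings(reallocated=13000, pending=2400):
        print(warning)
```

Any non-empty result is a reason to look closely at the drive; counts of the size quoted above are far beyond any sensible limit.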
