Ryan_M

Some help figuring out what's going on here?


This morning I came down to find unRAID had started a parity check. Power had not gone out, and the unRAID box is on a UPS. All drives were showing healthy. I figured I'd let it do its thing, but when I got home from work the system had locked up, so unfortunately I had to do a reset, which started a new parity check. It's running painfully slowly, and about every 30s I hear a high-pitched chirp coming from a drive in that box. All drives are still showing healthy on the main screen, though it has found about 7 parity errors in the first 30GB of data checked.

 

I'd like some guidance before I jump in not really knowing what I'm doing and mess something up.

 

Thanks for any help,

Ryan

syslog-2015-12-07.zip


If the "unexpected" parity check was not a scheduled one, then the only other explanation is that it was started after an unclean shutdown so there must have been a power problem you weren't aware of. A few parity errors after an unclean shutdown is not entirely unexpected and is the reason unRAID does an automatic parity check after an unclean shutdown.


Disk 6 (WDC WD20EARS-00MVWB0, WD-WMAZ20137354) and disk 7 (ST2000DL003-9VT166, 6YD0RZ6F) are both having heart attacks.

 

Wouldn't be a bad idea to power off the server, reseat all the cables to the drives and try a parity check again.


Looks like I'm in a bigger mess than I thought. I replaced the cables for both those drives, which seemed to fix the issue for a short while, then they started throwing fits again. I had a spare controller around so I swapped that in, and it made no difference. Then disk 6 redballed. Both of those drives are 2TB; I'm going to pick up a new 4TB drive to swap them out. Any advice on how to replace 1 dead and 1 failing drive with one bigger one? I still find it weird to have two drives fail at the exact same time, though.


Something triggered the initial parity check, and if that resulted in some sort of power surge then that could explain multiple disks failing at the same time.  A good reason to consider adding a UPS to an unRAID system is to protect against exactly this sort of problem.

 

One thing it might be worth doing is starting the array in maintenance mode and running a file system check on each drive individually.  I have seen cases where file system corruption can have strange side-effects.  Also providing the diagnostics (via Tools->Diagnostics) may give additional information that will help decide if the disks are physically failing.

 

In terms of replacing the failing drives, do you have their contents backed up elsewhere? The answer to that is likely to affect the advice given. There is no standard way to reduce the number of drives in an unRAID array except by using the 'New Config' method, and that discards the parity information, so you lose any chance of recovering a failed drive. If you have the data backed up then that is probably the easiest way forward; if not, more thought will be needed on how best to minimise any data loss.


...  A good reason to consider adding a UPS to an unRAID system is to protect against exactly this sort of problem.

...  Also providing the diagnostics (via Tools->Diagnostics) may give additional information that will help decide if the disks are physically failing.

Has an UPS according to OP.

 

On v5 so instead of Diagnostics, provide SMART reports for all drives.

 


...  A good reason to consider adding a UPS to an unRAID system is to protect against exactly this sort of problem.

...  Also providing the diagnostics (via Tools->Diagnostics) may give additional information that will help decide if the disks are physically failing.

Has an UPS according to OP.

You are right - I missed that.

 

On v5 so instead of Diagnostics, provide SMART reports for all drives.

I wonder if the 'diagnostics' script runs on v5? If so it might be worth making it available as a download. One very quickly gets used to the additional material it provides when trying to diagnose issues.


I don't have the data backed up elsewhere. It's not critical data, but it would be a HUGE pain to get it all set back up again, so obviously I'd like to avoid losing it. So what is the best way to proceed?

 

The redballed HDD was a 2TB drive as was the other drive that was acting up. I unplugged the redballed drive and started the computer just to see exactly what was on those drives and so far it's not throwing any fits. I have a new 4TB drive that is being pre-cleared as we speak though that will probably take a couple days. Also I see I have ~2.75TB free on another drive in the array so I have the capacity to copy that much data off of the suspect drives. So what now?

 

FWIW I found an older 2TB drive that I had hanging around. IIRC it gave a couple of errors during a pre-clear, so I set it aside to run other tests and never got around to it. I'm putting it in another system now just to see what's up with it and run a few tests... maybe not good enough to put in the array, but it will work short term as a place to temporarily put some data.

 

Thanks for the help!


So how many drives are actually redballed? Post a screenshot, and SMART attributes for all drives. We need to have a better understanding of your situation before we can make good recommendations.
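For anyone collecting those SMART reports, here is a rough sketch of how the attribute table from `smartctl -A` could be skimmed once it has been saved to a text file. It is only an illustration: the list of attributes to watch is my own assumption about what is usually looked at first, the function names are made up for this example, and the sample values are invented, not taken from the drives in this thread. A nonzero pending or reallocated sector count often points at the disk itself, while CRC errors often point at cabling or the controller, but neither is conclusive on its own.

```python
import re

# Attributes commonly checked first when a disk misbehaves.
# This selection is an assumption of the sketch, not a rule from the thread.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
}

# One attribute row of `smartctl -A` output has the columns:
# ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"\S+\s+\S+\s+\S+\s+(\d+)"
)


def watched_raw_values(report_text):
    """Return {attribute_name: raw_value} for the watched attributes."""
    found = {}
    for line in report_text.splitlines():
        match = ROW.match(line)
        if match and match.group(2) in WATCHED:
            found[match.group(2)] = int(match.group(6))
    return found


def nonzero_attributes(report_text):
    """Keep only the watched attributes whose raw value is above zero."""
    values = watched_raw_values(report_text)
    return {name: raw for name, raw in values.items() if raw > 0}


# Invented sample rows, for illustration only.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       12
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       3
"""

if __name__ == "__main__":
    for name, raw in sorted(nonzero_attributes(SAMPLE).items()):
        print(f"{name}: {raw}")
```

The full report for each drive is still what should be posted; a filter like this only helps decide which drive to look at first.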


Sorry for the scattered info, I'll try and condense it here:

 

- The log file in post #1 shows disk6 and disk7 throwing fits while the system was doing an unscheduled parity check.

- Aside from the errors I could hear some pinging about every 30s or so.

- I tried new cables for those two drives - same result.

- I tried a spare controller with the same result again, then shortly after only disk6 redballed. I shut down the array to prevent any further damage. The main screen on the web gui was showing errors only on disk6 and it was into the millions.

- I removed only disk6 and started the array to see what was on those drives to make life a little easier trying to figure out what I lost should I lose the drives. So far it seems to be functioning normally.

 

I don't know if SATA drives will share channels but disk6 and disk 7 were plugged in next to each other on the controller... just in case one might affect the other.

 

I'll get the SMART reports and post them. Unfortunately I panicked and shut down the array when disk6 redballed, and I didn't get a screenshot or log file at that time.

 

Thanks


You should be able to unassign disk6 and start the array without it. Then you can browse the emulated disk6 to see if it looks like all the files are there. The emulated disk6 contents are what would be rebuilt onto the actual disk6.

 

If you have a spare disk, you could rebuild onto that instead and keep the original as it is in case something goes wrong with the rebuild.
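To make the idea of an "emulated" disk a bit more concrete, here is a toy model of single XOR parity. It is a sketch under simplifying assumptions, not unRAID's actual code: each "disk" is just four invented bytes, and the helper names `xor_blocks` and `emulate` are made up for the example. It shows why one missing disk can be recomputed from parity plus the surviving disks, and why the same trick cannot separate two missing disks.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Toy "disks" with invented 4-byte contents (illustration only).
data_disks = {
    "disk5": bytes([0x10, 0x20, 0x30, 0x40]),
    "disk6": bytes([0xDE, 0xAD, 0xBE, 0xEF]),
    "disk7": bytes([0x01, 0x02, 0x03, 0x04]),
}

# Parity is the XOR of all data disks.
parity = xor_blocks(list(data_disks.values()))


def emulate(missing, disks, parity_block):
    """Recompute one missing disk from parity and every other disk."""
    survivors = [content for name, content in disks.items() if name != missing]
    return xor_blocks(survivors + [parity_block])


# With only disk6 missing, its contents can be recomputed exactly.
emulated_disk6 = emulate("disk6", data_disks, parity)

# With disk6 AND disk7 missing, parity plus the survivors only yields
# disk6 XOR disk7, which cannot be split back into the two originals.
mixed = xor_blocks([data_disks["disk5"], parity])

if __name__ == "__main__":
    print(emulated_disk6 == data_disks["disk6"])
    print(mixed == xor_blocks([data_disks["disk6"], data_disks["disk7"]]))
```

In this model, a rebuild amounts to writing the emulated contents onto a replacement disk, which is why every other disk has to stay readable for the whole rebuild.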


I haven't done anything yet to try to fix this problem as I was waiting for my new drive to finish its preclear - which it did successfully.

 

I came down to find my computer beeping like crazy. It really sounded like it was coming from somewhere in the bank of drives. I figured disk 6 had finally quit hard. I shut down the system and unplugged disk 6. When I tried to boot, the beeping was still there and unRAID was having trouble booting. I shut down again and unplugged disk 7; this time there was no beeping and unRAID booted fine. I tried booting again with disk 6 back in, and there was still no beeping and unRAID booted fine again. So it seems to me disk 7 is a goner.

 

So what are my options here? As far as I can see, the only option I have is to replace disk 7, un-redball disk 6, sacrifice a chicken, and hope it rebuilds the data onto disk 7. The reality of it, I'm guessing, is that both those drives are gonzo and so is the data. Thoughts?

 

[edit] I wanted to add that I had some issues with disk 7 HERE a few months ago. It appeared that changing the cable fixed the issue but it seems there were deeper issues.[/edit]


To un-redball disk6 you have to either rebuild it, or set a new config and rebuild parity. Obviously rebuilding disk7 doesn't fit into either scenario.

 

Search the whole forum for "invalidslot". I think it's something you run at the command line to tell unRAID which disk you want it to consider bad. Maybe someone else will chime in.
