6.6.5 - Parity Drive Bad - errors across other drives



Hello,

 

I woke up this morning to a warning on my dashboard that my parity drive was in failed state. I also noticed that seven of my other drives were showing errors. Screenshot below:

 

[Screenshot: Unraid dashboard showing the parity drive in a failed state and errors on seven other drives]

 

I am running 6.6.5 as I am afraid to upgrade to the 6.7.x series due to the ongoing SQLite corruption bug.

 

Parity and drives 1-7 are on a Dell PERC H310; drives 8-10 are on the onboard SATA ports on my motherboard.

 

I'm running a few things in Docker and I have one VM. Otherwise this is just straight NAS.

 

Thoughts on where I go from here? Did I lose my data across the first seven drives?

Link to comment
53 minutes ago, johnnie.black said:

Based on the screenshot it looks like a controller problem, specifically the controller connected to the 8 drives with errors. Make sure it's well seated and cooled, and if possible try a different slot. If it keeps happening, the card could be going bad.

That was my first thought as well (re: controller problem).

 

The server has not been physically touched in nearly six months, which leads me to suspect a cooling issue or outright failure rather than the card coming unseated. I know there was high I/O last night.

 

The server is currently off. Are you suggesting I reboot it after it has had time to cool down and see where I am? 

Link to comment
6 minutes ago, johnnie.black said:

After rebooting, all drives should come online unless the HBA is really dead; either way, parity will need to be re-synced.

Ok - once I'm back at the server I'll try powering it back on.

 

Should I force maintenance mode? I wasn't able to check the box when powering down; can I force it by setting startArray="no" in disk.cfg?
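
In other words, something like this on the flash drive before powering back up (my guess at the exact file and placement; I'd only be touching the one setting):

    # /boot/config/disk.cfg - assuming this is the right file; set before boot so the array does not auto-start
    startArray="no"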

 

I'm concerned about all of the errors on the seven data drives. Are those "false" errors that will clear after powering back on (assuming the controller isn't dead), or am I looking at data loss?

 

I have diagnostics and syslogs. Not sure if those are helpful at this point.
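
If a fresher capture would be more useful once the box is back up, I believe I can grab one from Tools > Diagnostics in the webGUI or from the console (my understanding of where the zip lands, happy to be corrected):

    diagnostics
    # should write a zip under /boot/logs on the flash drive, which I can attach here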

Link to comment

Those errors are normal if the HBA stopped working, but Unraid disables at most one disk with single parity (two with dual parity). Luckily parity was the first drive to error, so it was the one that got disabled. The errors will reset after a power cycle, the data on all disks should be fine, and only parity will need re-syncing.

Link to comment
4 hours ago, autumnwalker said:

 

The server has not been physically touched in nearly six months, which leads me to suspect a cooling issue or outright failure rather than the card coming unseated. I know there was high I/O last night.

And clean the inside of the case if it is dirty.  Check that all fans are working and that cooling fins on heat sinks, air intakes, and exhaust ports are not blocked.  

Link to comment
3 minutes ago, johnnie.black said:

If you're not sure, it's best to rebuild the disk, either onto the same drive or a new one, but make sure it's mounting correctly before overwriting the old one.

 

If it goes bad again, you'll need to restart the rebuild from the beginning.

I'll try re-seating the LSI card, power it back on, and see if everything mounts OK.

 

The last time, it ran for just over a day before going bad again. My fear is that whatever this is has become intermittent: it powers on fine initially but dies after a few hours. If it powers on OK (the disk looks mounted properly) and starts the parity rebuild, but then dies mid-rebuild, is my data trashed, or am I in exactly the same spot I'm in now (one "bad" drive)?

Link to comment
