autumnwalker Posted August 22, 2019

Hello, I woke up this morning to a warning on my dashboard that my parity drive was in a failed state. I also noticed that seven of my other drives were showing errors. Screenshot below. I am running 6.6.5, as I am afraid to upgrade to the 6.7.x series due to the ongoing SQLite corruption bug. Parity and drives 1-7 are on a Dell PERC H310; drives 8-10 are on the onboard SATA of my motherboard. I'm running a few things in Docker and I have one VM. Otherwise this is just a straight NAS. Thoughts on where I go from here? Did I lose my data across the first seven drives?
JorgeB Posted August 22, 2019

Based on the screenshot it looks like a controller problem, the one connected to the eight drives with errors. Make sure it's well seated and cooled, and if possible try a different slot; if it keeps happening, it could be going bad.
autumnwalker (Author) Posted August 22, 2019

53 minutes ago, johnnie.black said: Based on the screenshot looks like a controller problem, the one connected to the 8 drives with errors, make sure it's well seated and cooled, if possible try a different slot, if it keeps happening could be going bad.

That was my first thought as well (re: controller problem). The server has not been physically touched in nearly six months, which leads me to suspect a cooling issue or outright failure rather than the card being unseated. I know there was high I/O last night. The server is currently off. Are you suggesting I reboot it after it has had time to cool down and see where I am?
JorgeB Posted August 22, 2019

After rebooting all drives should come online, unless the HBA is really dead; either way, parity will need to be re-synced.
autumnwalker (Author) Posted August 22, 2019

6 minutes ago, johnnie.black said: After rebooting all drives should come online, unless the HBA is really dead, either way parity will need to be re-synched.

Ok - once I'm back at the server I'll try powering it back on. Should I force maintenance mode? I wasn't able to check the box when powering down; can I force it with startArray="no" in disk.cfg? I'm concerned about all of the errors on the seven data drives. Are those "false" errors that will clear on power-up (assuming the controller isn't dead), or am I looking at data loss? I have diagnostics and syslogs; not sure if those are helpful at this point.
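[Editor's note: a minimal sketch of the disk.cfg edit being asked about here. On a stock Unraid flash drive the file usually lives at /boot/config/disk.cfg (an assumption to verify on your own system); the demo below works on a local stand-in copy so it is safe to run anywhere.]

```shell
# Demo on a stand-in copy; on the real server the file is assumed to be
# /boot/config/disk.cfg on the flash drive.
CFG=./disk.cfg
printf 'startArray="yes"\n' > "$CFG"              # stand-in for the real config
sed -i 's/^startArray="yes"/startArray="no"/' "$CFG"
grep '^startArray' "$CFG"                         # now shows startArray="no"
```

With startArray="no" set, the array should stay stopped after boot, leaving it to be started manually (e.g. in maintenance mode) from the GUI.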
JorgeB Posted August 22, 2019

Those errors are normal if the HBA stopped working, but Unraid only disables one disk at most with single parity (two with dual parity), and luckily parity was the first one to error, so it was the one that got disabled. The errors will be reset after a power cycle; data on all disks should be fine, and only parity will need re-syncing.
autumnwalker (Author) Posted August 22, 2019 (edited)

Ok! So just power it back on and see what happens? (Afraid of data loss ... and of dealing with restoring 18 TB from CrashPlan.)

Edited August 22, 2019 by autumnwalker
JorgeB Posted August 22, 2019

4 hours ago, johnnie.black said: make sure it's well seated and cooled, if possible try a different slot
Frank1940 Posted August 22, 2019

4 hours ago, autumnwalker said: The server has not been physically touched in nearly six months so it's leading me to think cooling issue or failure versus being unseated. I know there was high I/O last night.

And clean the inside of the case if it is dirty. Check that all fans are working and that the cooling fins on heat sinks, air intakes, and exhaust ports are not blocked.
autumnwalker (Author) Posted August 22, 2019

52 minutes ago, Frank1940 said: And clean the inside of the case if it is dirty. Check that all fans are working and that cooling fins on heat sinks, air intakes, and exhaust ports are not blocked.

Agree - I'll take the opportunity to do that now that it's offline.
autumnwalker (Author) Posted August 22, 2019

Ok, I opened it up to take a look. Surprisingly little dust, nothing clogged or blocked, and no obvious damage (blown capacitors, scorch marks, etc.). I powered it back up in safe mode. Parity is still disabled and two disks are missing: disk 5 and disk 6. Thoughts?
Frank1940 Posted August 22, 2019

Get the Diagnostics file (Tools >>> Diagnostics) and upload it in a new post.
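[Editor's note: once the diagnostics are in hand, the bundled syslog is where controller and disk errors show up. A rough sketch of the kind of filter used; the sample lines below are illustrative stand-ins, not output from this server.]

```shell
# Build a small stand-in syslog; a real one is bundled inside the
# diagnostics zip (logs/syslog.txt in the usual layout). Sample lines
# below are made up for illustration.
cat > sample-syslog.txt <<'EOF'
Aug 22 03:14:01 nas01 kernel: mpt2sas_cm0: SAS host is non-operational
Aug 22 03:14:02 nas01 kernel: sd 1:0:0:0: [sdb] tag#0 I/O error
Aug 22 03:14:03 nas01 kernel: md: disk0 write error, sector=12345
Aug 22 03:15:00 nas01 sshd[999]: session opened for user root
EOF
# Keep only the lines that point at a dying HBA or a disabled disk.
grep -Ei 'i/o error|write error|non-operational' sample-syslog.txt
```

On this kind of failure, a burst of I/O errors across every drive on one HBA at the same timestamp is what distinguishes a controller drop from a single failing disk.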
autumnwalker (Author) Posted August 22, 2019 (edited)

Before seeing your post I powered it down, removed and reseated all of the SATA cables, and powered it back on. Now all disks are showing; parity is still disabled. Dare I take this out of safe mode and let it attempt to rebuild parity? @Frank1940 I assume the diagnostics are no good now that all drives are visible?

Edited August 22, 2019 by autumnwalker
Frank1940 Posted August 22, 2019

To the best of my knowledge, you can rebuild parity while in Safe Mode. Having said that, I would still grab the Diagnostics file and post it up. Then folks can have a look at the SMART data on that parity drive.
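[Editor's note: a sketch of the SMART check being suggested here. The diagnostics zip bundles a smartctl report per drive; the failure-predicting attributes should all have a raw value of zero. The report excerpt below is an illustrative stand-in, not this server's actual data, and /dev/sdX is a placeholder device name.]

```shell
# Stand-in excerpt of a smartctl attribute table; on a live system this
# would come from `smartctl -A /dev/sdX` (placeholder device name).
cat > parity-smart.txt <<'EOF'
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
EOF
# Column 10 is the raw value; any nonzero count here is a red flag
# before trusting the drive with a parity rebuild.
awk '$10 != 0 { bad = 1; print "suspect:", $2 }
     END { if (!bad) print "no reallocated/pending/uncorrectable sectors" }' parity-smart.txt
```

A disk disabled because its HBA dropped out, as suspected in this thread, will typically show clean attributes like these; reallocated or pending sectors would instead point at the drive itself.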
autumnwalker (Author) Posted August 22, 2019

Current diagnostics attached. System still in safe mode, array offline, parity disk disabled. nas01-diagnostics-20190822-2050.zip
Frank1940 Posted August 23, 2019

The parity disk seems to be fine. Go ahead and see if you can rebuild parity. See this how-to: http://lime-technology.com/wiki/index.php?title=FAQ#How_do_I_re-enable_a_failed_disk.3F
autumnwalker (Author) Posted August 23, 2019

Thank you @Frank1940 and @johnnie.black. Parity rebuild just finished: 12h 2m, 92.3 MB/s, 0 errors. Safe to pull this out of safe mode now?
autumnwalker (Author) Posted August 25, 2019

All was fine for about a day, and then the same eight drives (controller) went bad again. This time disk 4 was disabled. I suspect the controller is bad (or the PCIe slot). Should I do anything different this time, given that disk 4 was disabled rather than parity? Diagnostics attached. nas01-diagnostics-20190825-1409.zip
itimpi Posted August 25, 2019

It might be worth checking that the card is properly seated in the PCIe slot. I once had similar problems that I eventually tracked down to the card not being properly seated; it would every so often drop all drives for no apparent reason.
autumnwalker (Author) Posted August 25, 2019

I'll remove and reseat the card to be sure. When powering back on, will I need to rebuild the array? Can I "trust" it?
JorgeB Posted August 26, 2019

Yes; if you're sure no writes were done to the disabled disk you can trust it, but you'll still need to run a parity check.
autumnwalker (Author) Posted August 26, 2019

4 hours ago, johnnie.black said: Yes, if you're sure no writes were done to the disabled disk, you'll still need to run a parity check.

I'm not sure. Parity rebuild then? What happens if it goes bad again in the middle of a rebuild?
JorgeB Posted August 26, 2019

If you're not sure, best to rebuild the disk, to the same drive or a new one, but make sure it's mounting correctly before overwriting the old one. If it goes bad again you'll need to restart the rebuild from the beginning.
autumnwalker (Author) Posted August 26, 2019

3 minutes ago, johnnie.black said: If you're not sure best to rebuild the disk, to the same or a new one, but make sure it's mounting correctly before overwriting the old one. If it goes bad again you'll need to re-start the rebuild from the beginning.

I'll try re-seating the LSI card, power it back on, and see if everything mounts ok. The last time it ran for just over a day before going bad again, so my fear is that whatever this is is now intermittent: it initially powers on fine but dies after a few hours. If it powers on ok (disk looks mounted properly) and starts the rebuild, but dies mid-rebuild, is my data trashed, or am I in exactly the same spot I am now (one "bad" drive)?