Monthly parity ended due to 718 errors, disk8 in error state, 4 disk read errors

visionmaster · February 1, 2017

Hi guys,

I need help on how best to proceed. My monthly parity check started at midnight as usual. The system was fully up to date and had been working well without any errors. I'm on 6.2.4, and it had been up for 51 days since the last reboot. Around 9:35am this morning, I got 3 email notifications reporting problems: 1) Disk 8 in error state; 2) Disk 6 - 78 errors, Disk 9 - 78 errors, Disk 11 - 78 errors, Disk 8 - 690 errors; 3) Parity check ended with 718 errors. I'm holding off on powering down, rebooting, disabling or rebuilding disks, etc. until I get your advice. Let me know if I forgot anything or if more info is needed. Thanks!

tower-diagnostics-20170201-1818.zip

trurl · February 2, 2017

The disabled disk8 isn't reporting SMART, so it has lost its connection. Check the connections and try again with another diagnostic.

With that many disks you really should consider dual parity.

visionmaster · February 2, 2017

Silly question, but should I power down and try reseating all the cables? I just looked and everything seems to be well connected. Am I in danger of losing any data by powering down at this point?

trurl · February 2, 2017

Yes, power down and reseat the cables. Many people like to bundle their cables, and that can cause problems: bundling puts stress on the connectors, which can work them loose or leave them not seated squarely.

Whether you will wind up with any data loss isn't completely clear, since you didn't have a good parity check. But whatever damage there is has probably already been done at this point, and powering down shouldn't affect that.

visionmaster · February 2, 2017

It took a while to unmount everything, and a bunch of errors popped up on several other disks. I went to the command console, forced the unmount, then powered down. All cables seemed fine, but I double-checked and replugged everything anyway. I powered up and everything came online OK, with disk 8 in red ball state but everything else fine. I checked SMART on all the drives and they all pass and look good: 0 reallocated sectors and 0 pending sectors on all 23 drives.

The parity check was going well for 9.5 hours before the disks errored. I'm wondering if a cable or connection came loose and caused the issues. I'm sure nothing was written to the server between yesterday, when everything was working fine, and the point 9.5 hours into the parity check. At this point, should I rebuild disk 8 onto a new disk, or bring it back online and run another non-correcting parity check?

trurl · February 2, 2017

Was it a correcting or non-correcting parity check?

visionmaster · February 2, 2017

According to the scheduler in the unRAID GUI, it was set to write corrections. So does that mean parity is now incorrect, and rebuilding from parity will put errors on the rebuilt disk? My gut says the data disks are actually good (even disk 8, which is red-balled).

visionmaster · February 2, 2017

According to the GUI, the parity check ended with "error code: user abort" after finding 718 errors. But I didn't stop it; it stopped by itself, 9.5 hours in. I guess this happened when the cable to disk 8 and several other cables got unseated? Since it was a monthly parity check with "write corrections" set to yes, did incorrect parity get written?

My question at this point is: is there a way to restart the array with disk 8 back in a green state and recheck parity? If it comes back all OK, I'd chalk it up to loose cables; if there are errors this time, I'd assume disk 8 is bad and rebuild it onto a new disk.

I'm not sure I'm explaining my question correctly but it makes sense to me in my delirious state.

Thanks.

JorgeB · February 2, 2017

Another case of the SASLP crashing and dropping disks offline. Assuming all SMART reports are fine and there were no writes to the disabled disk, do a new config instead of a rebuild, since parity was incorrectly updated and is now corrupt.

I recommend changing the scheduled parity check to non-correcting.
