Monthly parity ended due to 718 errors, disk8 in error state, 4 disk read errors



Hi guys,

I need advice on how best to proceed. My monthly parity check started at midnight as usual. The system was fully up to date and had been running without any errors. I'm on 6.2.4, and it had been up for 51 days since the last reboot. Around 9:35am this morning, I got 3 email notifications reporting problems:

1) Disk 8 in error state
2) Disk 6 - 78 errors, Disk 9 - 78 errors, Disk 11 - 78 errors, Disk 8 - 690 errors
3) Parity check ended with 718 errors

I'm holding off on powering down, rebooting, disabling disks, rebuilding disks, etc. until I get your good advice. Let me know if I forgot anything or if more info is needed. Thanks!

tower-diagnostics-20170201-1818.zip


Silly question, but should I power down and try reseating all the cables? I just looked and everything seems to be well connected. Would powering down at this point put any data at risk?

Yes, power down and reseat the cables. Many people like to bundle their cables, and that can cause problems because it puts stress on the connector, which can make it work loose or sit crooked on the connector.

Whether you will wind up with any data loss isn't completely clear, since you didn't get a good parity check. But whatever damage there is has probably already been done at this point, and powering down shouldn't affect that.


It took a while to unmount everything, and a bunch of errors popped up on several other disks. I went to the command console, forced the unmount, and then powered down. All cables seemed fine, but I double-checked and replugged everything. I powered up and everything came online OK, with disk 8 in red ball state and everything else fine. I checked SMART on all the drives; all pass and look good, with 0 reallocated sectors and 0 pending sectors on all 23 drives.
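For anyone who wants to repeat that SMART check across a lot of drives, here is a minimal Python sketch. It assumes smartmontools is installed and that the drives report the usual ATA attribute names `Reallocated_Sector_Ct` and `Current_Pending_Sector` in the `smartctl -A` table. The function names are made up for illustration, and any device path you pass in is your own; this is not something Unraid itself provides.

```python
import re
import subprocess

# Attribute names as printed in the `smartctl -A` table for ATA drives.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector")


def parse_smart_attributes(text, watched=WATCHED):
    """Return {attribute_name: raw_value} for the watched attributes."""
    found = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID, then the name;
        # the raw value is the tenth column of the table.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in watched:
            match = re.match(r"\d+", fields[9])
            if match:
                found[fields[1]] = int(match.group())
    return found


def has_bad_sectors(text):
    """True if any watched attribute that is present has a non-zero raw value.

    Attributes the drive does not report are simply not counted.
    """
    return any(value > 0 for value in parse_smart_attributes(text).values())


def read_smart(device):
    """Run `smartctl -A` on a device path (hypothetical helper, needs root)."""
    # smartctl's exit status is a bitmask and can be non-zero even when
    # the table printed fine, so the return code is not checked here.
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True
    )
    return parse_smart_attributes(result.stdout)
```

Calling `read_smart()` on each drive and looking for any non-zero value gives the same "0 reallocated, 0 pending" summary as clicking through the drives one by one. It only covers those two attributes, so it is not a substitute for a full SMART report or an extended self-test.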

The parity check had been going well for nine and a half hours before the disks errored. I'm wondering if a cable or connection came loose and caused the issues. I'm sure nothing was written to the server since yesterday, when everything was working fine, before the 9.5 hours of parity check. At this point, should I rebuild Disk 8 onto a new disk, or bring it back online and run another non-correcting parity check?


According to the GUI, the parity check ended with "error code:user abort" after finding 718 errors. But I didn't stop it; it stopped itself 9.5 hours in. I guess this happened when the cable to disk 8 and several other cables got unseated? So, since the monthly parity check runs with correct errors set to yes, did incorrect parity get written?

My question at this point is: is there a way to restart the array with disk 8 in a green state and recheck the parity? If it comes back all OK, I'd chalk it up to loose cables; if there are errors this time, I'd assume disk 8 is bad and rebuild onto a new disk.

I'm not sure I'm explaining my question correctly, but it makes sense to me in my delirious state.

Thanks.

