Multi disk read errors during parity check 6.5.1 <SOLVED>



Upgraded to 6.5.1 a few days back. The update went fine and the server was working correctly. Expanded the array 2 days ago with an additional 4 TB drive, no issues detected. The monthly parity check started last night, and this morning I see email notifications of one disk disabled and 8 with read errors.

The parity check is still running but sync errors just keep increasing: 391748176 and counting. Something is very strange and wrong here.

Should I stop the parity check at this point? What steps can I take to troubleshoot?

 

tower-diagnostics-20180501-0552.zip

Edited by wirenut
problem solved
Link to comment

Thank you for the help.

 

OK. I'll look into replacing the controller. Been lucky to date I guess, as this hasn't happened before.

 

After checking, I booted the server and it came back up with a notification that the array had turned good. The array has 0 disks with read errors.

However, disk 1 is disabled. Do I replace it, rebuild it, or something else?

Link to comment

If SMART looks fine (since the disk was offline you'll need to post new diags if you want us to check it) and no data was changed since the errors started, I would do a new config instead, then a parity check, preferably without those controllers.

 

If you want to rebuild then use a spare and keep the old disk intact in case something goes wrong.

 

If SMART is bad it's a different story.
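
The advice above can be read as a small decision tree. The sketch below only illustrates that reasoning; the function name and its three inputs are hypothetical and not part of Unraid, and the last branch (what to do when data did change) is an assumption, since the post does not cover that case.

```python
def suggest_recovery(smart_ok: bool, data_changed: bool, prefer_rebuild: bool = False) -> str:
    """Suggest a next step for a disabled disk, following the advice in the post above."""
    if not smart_ok:
        # "If SMART is bad it's a different story": the post gives no procedure here.
        return "different story: check the disk first"
    if prefer_rebuild:
        # Rebuild onto a spare so the old disk stays untouched as a fallback.
        return "rebuild onto spare, keep old disk intact"
    if not data_changed:
        # Healthy disk and nothing written since the errors started.
        return "new config, then parity check"
    # Assumption: if data changed, fall back to the only other option the post names.
    return "rebuild onto spare, keep old disk intact"


print(suggest_recovery(smart_ok=True, data_changed=False))
```

With a healthy disk and no data changed, this prints the new config route; passing `prefer_rebuild=True` selects the spare-disk rebuild instead.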

Link to comment

Short SMART test completed without errors. Running the long test now. When done I will post with new diagnostics. I have one spare, so replace and rebuild is an option if needed. No data changed after the errors started. Thanks.
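
For anyone following along from the command line instead of the web GUI, the short and long self-tests correspond to `smartctl` invocations. This is a minimal sketch that only builds and prints the commands without running them; `/dev/sdX` is a placeholder device name, and the helper `smart_commands` is made up for this example.

```python
import shlex


def smart_commands(device: str) -> dict:
    """Build smartctl command lines for one disk (nothing is executed)."""
    return {
        "short_test": ["smartctl", "-t", "short", device],  # start the short self-test
        "long_test": ["smartctl", "-t", "long", device],    # start the extended self-test
        "report": ["smartctl", "-a", device],               # full SMART report, incl. self-test log
    }


# /dev/sdX is a placeholder; substitute the device of the disabled disk.
for name, cmd in smart_commands("/dev/sdX").items():
    print(f"{name}: {shlex.join(cmd)}")
```

The self-tests run on the drive itself in the background, so the report has to be pulled again after the long test finishes to see its result.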

Edited by wirenut
Link to comment

That took a while longer than I remember. SMART test passed without error; attached the report along with new diagnostics.

New config the way to go then? I've never tried this. From what I've read, I just reassign all disks to their original assignments, confirm parity is valid, and start the array?

Then do a parity check.

As changing out the controllers is not immediately an option, I suppose the same thing could happen, and I am aware of that.

tower-smart-20180501-1520.zip

tower-diagnostics-20180501-1524.zip

Link to comment

Now home from work with better time to research, I think the rebuild with the spare disk is the best option for my current situation, as it appears the controller is the point of mistrust and my backups are not where they should be. If you have any links to steer me in the direction of the LSI controllers that would work with my current board's PCI 2.0 slots, I'd be grateful. Thanks for the help and advice.

Link to comment
10 hours ago, wirenut said:

That took a while longer than I remember. SMART test passed without error; attached the report along with new diagnostics.

Disk looks fine.

 

10 hours ago, wirenut said:

New config the way to go then? I've never tried this. From what I've read, I just reassign all disks to their original assignments, confirm parity is valid, and start the array?

Correct.

 

As for the controllers, any LSI with a SAS2008/2308/3008 chipset in IT mode will work, e.g., 9201-8i, 9211-8i, 9207-8i, 9300-8i, etc., as well as clones like the Dell H200/H310 and IBM M1015; these latter ones need to be crossflashed.
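
As a quick reference, the models named above can be put into a small lookup that says whether crossflashing comes into play. This is only a sketch built from the names in this post, not a compatibility list; the set names and the `crossflash_needed` helper are made up for illustration.

```python
# Models named in the post above; this is not an exhaustive list.
LSI_BRANDED = {"9201-8i", "9211-8i", "9207-8i", "9300-8i"}
CLONES_NEED_CROSSFLASH = {"Dell H200", "Dell H310", "IBM M1015"}


def crossflash_needed(model: str):
    """Return True/False for models named in the post, None for anything else."""
    if model in CLONES_NEED_CROSSFLASH:
        return True   # OEM clones: crossflash to LSI IT-mode firmware
    if model in LSI_BRANDED:
        return False  # LSI-branded cards: still need to be in IT mode, but no crossflash
    return None       # not mentioned in the post, so no claim is made


for card in ("9211-8i", "IBM M1015", "some other card"):
    print(card, "->", crossflash_needed(card))
```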

Link to comment