sonofdbn

Disk in error state - continue parity check?



This morning I found my server unresponsive: it seemed to be on, but I couldn't SSH in and it didn't respond to pings. So I switched it off and restarted, and it automatically went into a parity check. A few minutes later I got a message for Disk 3 (udma crc error count), then a message saying it was in an error state, and a warning that the array had errors: 1 disk with read errors.

 

I'm assuming I should replace the disk. Is that correct? (Diagnostics attached.) Also, should I stop the parity check? Finally, if I replace this 4TB disk, can I replace it with an 8TB disk (which is the size of my parity disks)? For the replacement, do I just put the new one in place of the old one and rebuild?

tower-diagnostics-20200325-1526.zip


Disk3 dropped offline, most likely due to a power/connection problem, but since it dropped there's no SMART report. You should cancel the parity check, check/replace the cables, and post new diags so we can check SMART.


Thanks for the quick response. OK, I cancelled the parity check and checked the cables, which seemed to be OK. I restarted but the disk still has an "x" next to it.

 

(Weirdly, I got a notification saying the array has turned good: array has 0 disks with read errors. I get that the array still works while the disk is disabled, but it doesn't seem like that state should be termed "good".)

 

If I need to restart the parity check, should I be writing corrections to parity?

 

Diagnostics are attached.

tower-diagnostics-20200325-1911.zip

8 minutes ago, sonofdbn said:

I restarted but the disk still has an "x" next to it.

That's expected, once a disk is disabled it needs to be rebuilt.

 

Disk looks fine. The high number of CRC errors suggests a SATA cable problem; if you don't want to replace the cable, at least swap with another disk to rule it out in case the disk gets disabled again. Then, since the emulated disk is mounting correctly, you can rebuild on top.
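
For anyone who wants to check the CRC counter outside the GUI, here is a minimal sketch that pulls it out of `smartctl -A` output. It assumes the drive reports the counter as SMART attribute ID 199 with a plain integer in the raw-value column, which is common for SATA drives but not guaranteed. The function names and the `SAMPLE` text are made up for illustration and are not taken from the attached diagnostics.

```python
import subprocess
from typing import Optional


def parse_crc_count(smartctl_output: str) -> Optional[int]:
    """Return the raw value of SMART attribute 199 from `smartctl -A` text.

    Returns None if the attribute row is missing or its raw value is not
    a plain integer.
    """
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have 10 columns; the first is the attribute ID
        # and the tenth is the raw value.
        if len(fields) >= 10 and fields[0] == "199" and fields[9].isdigit():
            return int(fields[9])
    return None


def read_crc_count(device: str) -> Optional[int]:
    """Run `smartctl -A` on a device path such as /dev/sdX (usually needs root)."""
    result = subprocess.run(
        ["smartctl", "-A", device],
        capture_output=True,
        text=True,
        check=False,  # smartctl uses non-zero exit codes for warnings too
    )
    return parse_crc_count(result.stdout)


# Illustrative sample only, not real output from the server in this thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       12
"""

if __name__ == "__main__":
    print(parse_crc_count(SAMPLE))
```

The parsing is kept separate from the `smartctl` call so it can be tried on saved output, such as a SMART report from a diagnostics zip, without touching the disk.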

Just now, johnnie.black said:

at least swap with another disk

Just a warning that this can be a little risky for the rebuild, if there are errors on another disk and you let it finish.


Thanks for all the help. All done and seems to be OK. The disk was connected via a forward breakout cable to a SAS HBA and I was a little worried, having no spare breakout cable. Fortunately I found the last remaining SATA port on the MB and managed to use that with a new SATA cable. I'll watch out for more CRC errors, though.
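
Since the CRC error count is cumulative and does not reset after the cable is fixed, watching for more errors means watching for the number to go up. A rough sketch of that comparison follows. The disk labels and counts are hypothetical, and how the readings are collected is left out.

```python
from typing import Dict


def new_crc_errors(baseline: Dict[str, int], current: Dict[str, int]) -> Dict[str, int]:
    """Return, per disk, how many CRC errors were added since the baseline.

    Only disks present in both readings whose count has grown are included.
    """
    return {
        disk: current[disk] - count_before
        for disk, count_before in baseline.items()
        if disk in current and current[disk] > count_before
    }


if __name__ == "__main__":
    # Hypothetical readings taken before and after the cable change.
    before = {"disk1": 0, "disk3": 12}
    after = {"disk1": 0, "disk3": 15}
    print(new_crc_errors(before, after))
```

An empty result means no counter has moved since the baseline. A disk that shows up in the result is still logging CRC errors, which would point back at the cable or port.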

