Disk error during parity check



Hello. I was halfway into a parity check on my HOME server (specs in sig below) when Disk 1 went into an error state and was disabled. Since I can't see any SMART info with the drive disabled, I don't know if there's a problem with the disk itself or if maybe the SATA or power cable just came loose. In any case, I have a replacement disk of the same size handy if I need it, and I'm wondering what the best course of action is now. I assume it's:

 

1) cancel the parity check (now paused)

2) power down and check Disk 1 connections

3) power back up and see if Disk 1 is still disabled/missing or showing SMART errors

4) if no, start the array and a new parity check

5) if yes, power down and install replacement Disk 1, start the array and a new parity check / data rebuild

 

Yes? Anything I'm missing? Diagnostics attached. Thanks.

home-diagnostics-20191213-1923.zip

6 hours ago, johnnie.black said:

BTW, parity checks should be non-correcting unless sync errors are expected; otherwise there's a risk of corrupting parity if a disk fails, though that didn't happen in this case.

Great tip. Thanks Johnnie. Alright, I canceled the paused parity check and rebooted. There's still a red X by Disk 1 indicating the drive is disabled and its contents emulated, but now I can see SMART info for the disk with no errors highlighted, and Main says the configuration is valid. I didn't power down and check the physical connection to the drive because I'm doing this remotely, and I'm not sure what to expect if I start the array from here. Is the red X going to go away, or is it going to show the array as unprotected and prompt me to replace the missing disk as soon as possible? With the red X still there, I wasn't expecting to see a valid configuration and green balls everywhere else. Anyway, new diagnostics attached, and thanks again for your help.

home-diagnostics-20191214-1335.zip

Edited by ElJimador
6 hours ago, johnnie.black said:

BTW, parity checks should be non-correcting unless sync errors are expected; otherwise there's a risk of corrupting parity if a disk fails, though that didn't happen in this case.

hijack: so you're telling me i should have my monthly scheduled parity check be a "NO" for Write Corrections?

Is that suggestion listed in a Best Practices someplace?  I can understand the logic though, based on your response.

17 hours ago, ElJimador said:

There's still a red X by Disk 1 indicating the drive is disabled and contents emulated

That's expected. Once a disk gets disabled it needs to be rebuilt, either onto the same disk or onto a new one.

 

17 hours ago, ElJimador said:

however now I can see SMART info for the disk with no errors highlighted

No highlighted issues, but it was a disk problem; you can see the UNC @ LBA error (read error) that happened:

Error 2 [1] occurred at disk power-on lifetime: 37789 hours (1574 days + 13 hours)

That was a little over a day ago. You should run an extended SMART test. These types of errors can be intermittent, or even happen once or twice and then the disk is fine for a long time, though they are never a good sign.
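
To work out how long ago a logged error happened, you compare the power-on hours in the error log line with the disk's current power-on hours. Here is a minimal Python sketch of that arithmetic. The error line is the one quoted above; the current value of 37820 hours and the function names are made up for illustration and are not taken from the diagnostics.

```python
import re

# Error-log line as quoted in this thread.
ERROR_LINE = (
    "Error 2 [1] occurred at disk power-on lifetime: "
    "37789 hours (1574 days + 13 hours)"
)


def error_power_on_hours(line):
    """Return the power-on hour count from an error-log line, or None."""
    match = re.search(r"power-on lifetime:\s*(\d+)\s*hours", line)
    return int(match.group(1)) if match else None


def hours_since_error(line, current_power_on_hours):
    """Hours of powered-on time elapsed since the logged error, or None."""
    logged = error_power_on_hours(line)
    if logged is None:
        return None
    return current_power_on_hours - logged


# ASSUMPTION: 37820 is a hypothetical current power-on hours reading.
elapsed = hours_since_error(ERROR_LINE, 37820)
print(f"Error logged {elapsed} power-on hours ago")

# The "(1574 days + 13 hours)" part is just the same figure split up.
print(divmod(error_power_on_hours(ERROR_LINE), 24))
```

Note that this counts powered-on time only, so it matches wall-clock time only if the server has been running continuously since the error.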

 

 

You also want to keep an eye on this SMART attribute:

ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   197   051    -    54

If it keeps increasing there will likely be more errors soon.
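
If you want to track that attribute over time, one way is to save the attribute row from each diagnostics and compare the raw values. A rough Python sketch follows, assuming the whitespace-separated column layout shown above. The second row (raw value 61) is invented for the example, and the function names are hypothetical.

```python
COLUMNS = ("id", "name", "flags", "value", "worst", "thresh", "fail", "raw")


def parse_attribute_row(row):
    """Split one SMART attribute row into a dict keyed by column name.

    Assumes the raw value is a plain integer, which holds for the row
    quoted in this thread but not for every SMART attribute.
    """
    fields = dict(zip(COLUMNS, row.split()))
    for key in ("id", "value", "worst", "thresh", "raw"):
        fields[key] = int(fields[key])
    return fields


def raw_increase(earlier_row, later_row):
    """Return how much the raw value grew between two readings."""
    earlier = parse_attribute_row(earlier_row)
    later = parse_attribute_row(later_row)
    if earlier["id"] != later["id"]:
        raise ValueError("rows describe different attributes")
    return later["raw"] - earlier["raw"]


# Row as posted above.
posted = "  1 Raw_Read_Error_Rate     POSR-K   200   197   051    -    54"
# ASSUMPTION: hypothetical later reading, not from any real diagnostics.
later = "  1 Raw_Read_Error_Rate     POSR-K   200   197   051    -    61"

growth = raw_increase(posted, later)
if growth > 0:
    print(f"Raw_Read_Error_Rate grew by {growth}; keep a close eye on the disk")
```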

17 hours ago, sota said:

so you're telling me i should have my monthly scheduled parity check be a "NO" for Write Corrections?

Yes, a parity check should always be non-correcting unless sync errors are expected, like after an unclean shutdown. Though it's not common, parity can get wrongly updated (and corrupted) if a disk fails during a correcting check.
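
To illustrate why, here is a toy Python model of single XOR parity. It is a simplified sketch, not how Unraid is actually implemented, and the block values and function names are made up. It shows that if a failing disk returns bad data during a correcting check, parity is rewritten to match the bad data and a later rebuild of that disk comes out wrong, while a non-correcting check leaves parity intact.

```python
from functools import reduce
from operator import xor


def parity_of(blocks):
    """XOR all blocks together to get the parity block."""
    return reduce(xor, blocks, 0)


def rebuild(other_blocks, parity):
    """Reconstruct a missing block from parity plus the surviving blocks."""
    return reduce(xor, other_blocks, parity)


# ASSUMPTION: toy 4-bit "blocks" standing in for three data disks.
data = [0b1010, 0b0110, 0b0011]
parity = parity_of(data)

# Disk 0 starts failing mid-check and returns garbage for this block.
read_back = [0b0000, data[1], data[2]]

# Non-correcting check: the mismatch is only reported, parity is untouched,
# so rebuilding disk 0 from parity still gives the right data.
mismatch_found = parity_of(read_back) != parity
rebuilt_ok = rebuild(data[1:], parity)

# Correcting check: parity is overwritten to match what was read,
# so a later rebuild of disk 0 reproduces the garbage instead.
corrupted_parity = parity_of(read_back)
rebuilt_bad = rebuild(data[1:], corrupted_parity)

print(mismatch_found, rebuilt_ok == data[0], rebuilt_bad == data[0])
```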

5 hours ago, johnnie.black said:

That's expected. Once a disk gets disabled it needs to be rebuilt, either onto the same disk or onto a new one.

 

No highlighted issues, but it was a disk problem; you can see the UNC @ LBA error (read error) that happened:


Error 2 [1] occurred at disk power-on lifetime: 37789 hours (1574 days + 13 hours)

That was a little over a day ago. You should run an extended SMART test. These types of errors can be intermittent, or even happen once or twice and then the disk is fine for a long time, though they are never a good sign.

 

 

You also want to keep an eye on this SMART attribute:


ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   197   051    -    54

If it keeps increasing there will likely be more errors soon.

Thanks Johnnie. I think since I have a replacement disk handy I'll just go ahead and swap it out now. 

On 12/15/2019 at 2:51 AM, johnnie.black said:

Yes, a parity check should always be non-correcting unless sync errors are expected, like after an unclean shutdown. Though it's not common, parity can get wrongly updated (and corrupted) if a disk fails during a correcting check.

Did not know that. Already changed. Thanks!
