Parity errors - how to proceed


Hi,

I have been using my Unraid server for a couple of years now. Last night, during the monthly parity check, read errors occurred on one of the data drives (the Main page shows 917 errors for that drive). The system log shows those read errors as well:

Oct 1 04:50:26 Spire kernel: md: disk6 read error, sector=3689965144
Oct 1 04:50:26 Spire kernel: md: disk6 read error, sector=3689965152
Oct 1 04:50:26 Spire kernel: md: disk6 read error, sector=3689965160
Oct 1 04:50:26 Spire kernel: md: disk6 read error, sector=3689965168
Oct 1 04:50:26 Spire kernel: md: disk6 read error, sector=3689965176
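
The 917 errors on the Main page and the 2 corrected parity errors are easier to compare once the logged sectors are grouped into contiguous ranges. Here is a minimal Python sketch. It assumes the syslog lines look exactly like the excerpt above, and that consecutive entries inside one bad area are 8 sectors apart, as they are in that excerpt (8 x 512 bytes = 4 KiB, assuming 512-byte sectors). The regex and the function name `error_runs` are hypothetical and not part of any Unraid tool.

```python
import re

# Matches kernel lines such as:
#   "Oct 1 04:50:26 Spire kernel: md: disk6 read error, sector=3689965144"
LINE_RE = re.compile(r"md: (disk\d+) read error, sector=(\d+)")


def error_runs(lines, step=8):
    """Group logged error sectors per disk into contiguous runs.

    Assumption: entries belonging to one bad area are at most `step`
    sectors apart. Returns {disk: [(first_sector, last_sector), ...]}.
    """
    sectors = {}
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            sectors.setdefault(m.group(1), set()).add(int(m.group(2)))

    runs = {}
    for disk, secs in sectors.items():
        disk_runs = []
        for s in sorted(secs):
            if disk_runs and s - disk_runs[-1][1] <= step:
                # Extend the current run.
                disk_runs[-1] = (disk_runs[-1][0], s)
            else:
                # Gap found: start a new run.
                disk_runs.append((s, s))
        runs[disk] = disk_runs
    return runs


log = """\
Oct 1 04:50:26 Spire kernel: md: disk6 read error, sector=3689965144
Oct 1 04:50:26 Spire kernel: md: disk6 read error, sector=3689965152
Oct 1 04:50:26 Spire kernel: md: disk6 read error, sector=3689965160
Oct 1 04:50:26 Spire kernel: md: disk6 read error, sector=3689965168
Oct 1 04:50:26 Spire kernel: md: disk6 read error, sector=3689965176
"""
print(error_runs(log.splitlines()))  # {'disk6': [(3689965144, 3689965176)]}
```

Running it on the full system log would show whether the 917 errors form one bad area or many scattered ones.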

The monthly parity check run by the scheduler is a correcting parity check. When it finished, it reported that 2 parity errors were corrected. I assume parity at these 2 positions is now wrong, since the data on the data drive could not be read, but I am not sure.

Is there any way to determine which files are actually affected by this read operation?

What is the best way to proceed now? Normally I would replace the defective drive with a new one and let it rebuild from parity, but since parity was corrected (possibly with wrong values), I am no longer sure. Should I disable the mover schedule for now to prevent further writes to the faulty drive? Any other recommendations?

First thing: get a Diagnostics file (Tools >>> Diagnostics) and upload it in a NEW post. This gives the gurus some real meat to work on. Second thing: I, personally, would not write any new data to the array until this is resolved. (Your data is 'safe' on the cache drive at this point.)

Question:  Did the parity check actually complete or was it aborted? 

The parity check completed (reporting that it corrected 2 parity errors), hence my concern that it may have read bad data from the failing drive and written wrong parity in those 2 places.

It is a correcting parity check that runs on the first of every month.

Since I won't be home for the next few days, I am thinking about shutting down the server until I have a plan for how to proceed.

On 10/1/2019 at 6:26 PM, taalas said:

The parity check that is run monthly by the scheduler is a correcting parity check.

Change that; it should be non-correcting.
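
To see why a non-correcting check is safer, consider a simplified single-parity model. Here parity is assumed to be the bytewise XOR of the data disks; this is an illustrative assumption, not Unraid's md driver code. A correcting check makes parity agree with whatever the data disks returned. If a disk silently returns wrong data, the "correction" therefore writes that error into parity, whereas a non-correcting check only reports the mismatch. The names `xor_bytes` and `parity_check` are hypothetical.

```python
from functools import reduce


def xor_bytes(blocks):
    """Bytewise XOR of equal-length byte strings (assumed single-parity model)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_check(data_blocks, parity, correct):
    """Check one stripe; data_blocks holds one block per data disk.

    Returns (mismatched_bytes, parity_after). A correcting check rewrites
    parity to match whatever the data disks returned; a non-correcting
    check only counts mismatches and leaves parity untouched.
    """
    expected = xor_bytes(data_blocks)
    mismatches = sum(p != e for p, e in zip(parity, expected))
    if correct and mismatches:
        return mismatches, expected
    return mismatches, parity


good = [b"\x01\x02", b"\x04\x08"]
parity = xor_bytes(good)                   # parity written while data was good
silently_bad = [b"\x01\x02", b"\x04\x00"]  # disk 2 returns wrong data, no error

n, kept = parity_check(silently_bad, parity, correct=False)
print(n, kept == parity)       # 1 True  -> mismatch reported, parity intact
n, rewritten = parity_check(silently_bad, parity, correct=True)
print(n, rewritten == parity)  # 1 False -> parity now matches the bad data
```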

On 10/1/2019 at 6:26 PM, taalas said:

Is there any way to determine which files are actually affected by this read operation?

Not easily, unless you have pre-existing checksums or are using btrfs. You can rebuild the disk and then run ddrescue on the old one, which can help identify the affected files; you can then replace them from a backup, if available.
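
The "pre-existing checksums" route could look like the following minimal Python sketch of a per-file SHA-256 manifest. You build the manifest while the disk is healthy, store it somewhere off the array, and compare against it after an incident to list files that changed or can no longer be read. The function names and running it per disk (for example against a mount point such as `/mnt/disk6`) are assumptions for illustration, not an Unraid feature.

```python
import hashlib
import os


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large files don't need to fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = sha256_file(full)
    return manifest


def changed_files(old_manifest, root):
    """List files whose hash no longer matches, or that can't be read at all."""
    bad = []
    for rel, digest in sorted(old_manifest.items()):
        try:
            if sha256_file(os.path.join(root, rel)) != digest:
                bad.append(rel)
        except OSError:
            # Missing or unreadable (e.g. I/O error) counts as affected.
            bad.append(rel)
    return bad
```

Save the dictionary returned by `build_manifest` (for example with `json.dump`) and later call `changed_files(saved, root)`. Files you edited on purpose will also be listed, and files added after the manifest was built are not checked, so this works best for data that rarely changes, such as media archives.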

Thanks!

So if I replace the drive and let it rebuild from parity, what damage should I expect, given that the last (correcting) parity check detected 2 parity errors (and possibly made 2 wrong corrections on the parity disk)? 2 files? 2 sectors? The log showed many more read errors (917), but the parity check reported only 2 errors.
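
One rough way to reason about the damage uses the same simplified single-parity XOR model (an assumption, not Unraid's implementation). The replaced disk is rebuilt position by position as parity XOR the other data disks, so a wrong parity value corrupts only the same position on the rebuilt disk. In the sketch below, two wrongly "corrected" parity positions produce exactly two wrong positions in the rebuilt data. Whether those positions belong to two files, one file, or unused space depends on the filesystem layout. `rebuild` and `differing_offsets` are hypothetical helpers.

```python
from functools import reduce


def xor_bytes(blocks):
    """Bytewise XOR of equal-length byte strings (assumed single-parity model)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(parity, surviving):
    """Reconstruct a missing disk's block from parity and the other data disks."""
    return xor_bytes([parity, *surviving])


def differing_offsets(a, b):
    """Offsets where two equal-length blocks differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


disk_a = bytes([1, 2, 3, 4, 5, 6])
disk_b = bytes([9, 8, 7, 6, 5, 4])  # the disk being replaced
parity = bytearray(xor_bytes([disk_a, disk_b]))
parity[1] ^= 0xFF                   # two parity positions "corrected" wrongly
parity[4] ^= 0xFF

rebuilt_b = rebuild(bytes(parity), [disk_a])
print(differing_offsets(rebuilt_b, disk_b))  # [1, 4]
```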
