Data Rebuild - Parity Check




I recently had a drive fail during a parity check (was not writing corrections).  I then replaced the drive and did a data-rebuild.  Afterwards I ran Check Filesystem Status and corrected the newly rebuilt drive.  I started another Parity Check (non-correcting), but I've run into 3168 errors in the first 5 minutes.  Do I have an issue with my parity or my newly rebuilt drive?  Is there a way I can check and validate that my data isn't corrupted?

EDIT: I also just noticed that my rebuilt drive isn't showing the correct capacity in Unraid. I replaced a 5TB drive with a 14TB drive, but the drive is still showing as only 5TB in Unraid.

 

image.thumb.png.40a45b89aef1e27c78629c4f8787512c.png

I still have access to my old drive and a backup of my usb drive.  Should I go back and try and fix my old drive and then replace it?  What is best practice here?  Would that fix my parity problem as well?
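Since the old drive is still available, one way to approach the "is my data corrupted" question is to compare checksums file by file between the old drive and the rebuilt disk. The following is only a minimal sketch under stated assumptions: both disks are mounted and readable, the old drive still holds good copies, and no files changed on disk 3 since the failure. The function names and the mount points in the usage comment are hypothetical, not anything Unraid provides.

```python
import hashlib
from pathlib import Path


def hash_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(old_root, new_root):
    """Compare each file under old_root with the same relative path under new_root.

    Returns (missing, different): sorted lists of relative paths that are
    absent from new_root or whose contents hash differently.
    """
    old_root, new_root = Path(old_root), Path(new_root)
    missing, different = [], []
    for old_file in sorted(p for p in old_root.rglob("*") if p.is_file()):
        rel = old_file.relative_to(old_root)
        new_file = new_root / rel
        if not new_file.is_file():
            missing.append(rel.as_posix())
        elif hash_file(old_file) != hash_file(new_file):
            different.append(rel.as_posix())
    return missing, different


# Hypothetical usage; adjust both paths to wherever the disks are mounted:
# missing, different = compare_trees("/mnt/disks/old_disk3", "/mnt/disk3")
```

A mismatch only says the two copies differ, not which one is correct, so anything reported would still need to be checked by hand or against another backup.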

Edited by clowncracker
Link to comment
9 minutes ago, trurl said:

There are 2 screenshots in your post that show both of these. I assume one was taken before the rebuild?

 

Can you see the missing drive in BIOS?

 

Sorry, both of those screenshots are from after the rebuild.  WDC_WD140EDGZ was already rebuilt and is selectable from the dropdown as a replacement for disk 3.  I was just showing that it recognizes TOSHIBA_HDWE150 as disk 3, even though it was already rebuilt with WDC_WD140EDGZ.

Link to comment
5 minutes ago, clowncracker said:

it recognizes TOSHIBA_HDWE150 as disk 3, even though it was already rebuilt with WDC_WD140EDGZ.

That suggests it thinks the toshiba is still disk3. Did you make a flash backup after the replacement? Your array disk assignments are in config/super.dat (not a plain-text file).

Link to comment
3 minutes ago, trurl said:

That suggests it thinks the toshiba is still disk3. Did you make a flash backup after the replacement? Your array disk assignments are in config/super.dat (not a plain-text file).


I did not make a backup after the replacement. I didn't think I would need to, since in theory once it was replaced the active config should recognize the WD drive as disk 3.

Link to comment
6 minutes ago, trurl said:

You might check your flash drive in your PC to make sure it isn't corrupt.

Flash drive looks fine.  Just plugged it in and it looks like there are no issues.

I guess I could rebuild disk 3 again, but I am concerned with the number of parity errors it was throwing out yesterday when I ran it after the rebuild.  It also might have been throwing the parity errors because for some reason disk 3 wasn't recognized correctly after I did the Check Filesystem.  Maybe I should have done the Check Filesystem before I rebuilt disk 3?  It did show disk 3 only having 5TB after the rebuild, which is weird.  The screenshot for that is in the original post.

Edited by clowncracker
Link to comment
On 11/1/2023 at 2:39 PM, trurl said:

Emulated disk3 is mounted and 82% full. Should be OK to rebuild to the new disk.

 

No idea what happened before. Post new diagnostics without rebooting if there are problems or when rebuild completes.

Rebuild just finished. Attached is the new diagnostics file without rebooting (pulled it immediately after the rebuild finished).

clowncracker-diagnostics-20231102-1555.zip

Link to comment
On 11/2/2023 at 4:05 PM, trurl said:

Looks OK.

 

Make sure you make a backup of flash 

Sorry for the delay, but I've been out of town for a few days. I just started another parity check. I've already found 7685 errors in the past 10 minutes, so I'm concerned there is still a problem. I decided to cancel the parity check and restart my server. The config still looks correct, and I've attached another diagnostics log.

clowncracker-diagnostics-20231107-1044.zip

Edited by clowncracker
Link to comment

Could also be the controller or a disk. If it's a disk it can be a pain to find out which one: you basically need to test without one disk at a time. Try the controller first if possible, and note that memtest not finding errors does not completely rule out RAM. If you have more than one RAM stick, try with one at a time. Also remember that after any change you need to run two checks, since the first one can still find errors.

Link to comment
7 hours ago, JorgeB said:

Could also be the controller or a disk. If it's a disk it can be a pain to find out which one: you basically need to test without one disk at a time. Try the controller first if possible, and note that memtest not finding errors does not completely rule out RAM. If you have more than one RAM stick, try with one at a time. Also remember that after any change you need to run two checks, since the first one can still find errors.

I honestly don't think it's a hardware issue (memory, controller, cables, etc).  It might be disk related, but how would I go about testing that?  Stopping the array, disabling a disk, starting the server and just running a parity check?  If that's the case how would I actually go about fixing the issue?

Edited by clowncracker
Link to comment
19 minutes ago, JorgeB said:

 

Like this:

 

I'm going to assume the parity disks aren't the issue, since they've both been replaced in the past 4 months with brand new drives. I'm going to test disk 3, which was just replaced (and which caused all of these issues to begin with), disk 4, which has sector count issues, and disk 10, which is the newest drive in the array.

So to confirm:
1) Save a backup of super.dat (I'll just use the file from the diagnostics).

2) Stop the array.

3) Tools > New config, selecting all in the Preserve current assignments section.

4) DESELECT DISK 3 - making sure Parity is Valid is UNCHECKED.

5) Start the array.

6a) Run a parity check. If there are errors, restart the process with another drive deselected.

6b) If there are no errors, run a second parity check to confirm.

7) Once I've identified the drive with problems, stop the array and rebuild it with a new disk.

8) Run two parity checks to make sure there are no issues.
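The reasoning behind the steps above can be sketched with a toy model. This is only an illustration under loud assumptions: it uses plain XOR parity (so it does not model the second parity disk on a dual-parity array), it treats the fault as a disk that returns slightly different data on each read, and the `Disk` class and function names are hypothetical. Leaving one disk out, rebuilding parity from the rest, and running two checks comes back clean only when the disk left out is the one returning inconsistent data.

```python
from functools import reduce


class Disk:
    """Toy disk: a flaky disk corrupts a different byte on every read."""

    def __init__(self, name, data, flaky=False):
        self.name = name
        self.data = bytes(data)
        self.flaky = flaky
        self.reads = 0

    def read(self):
        self.reads += 1
        if not self.flaky:
            return self.data
        out = bytearray(self.data)
        out[self.reads % len(out)] ^= 0xFF  # position shifts with each read
        return bytes(out)


def xor_parity(blocks):
    """XOR equal-length byte strings together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_check(disks, parity):
    """Count positions where stored parity disagrees with freshly read data."""
    current = xor_parity([d.read() for d in disks])
    return sum(1 for p, c in zip(parity, current) if p != c)


def find_suspect(disks):
    """Leave out one disk at a time, rebuild parity, then run two checks."""
    for excluded in disks:
        remaining = [d for d in disks if d is not excluded]
        parity = xor_parity([d.read() for d in remaining])  # parity sync
        checks = [parity_check(remaining, parity) for _ in range(2)]
        if checks == [0, 0]:
            return excluded.name
    return None


# Hypothetical array; which disk is flaky here is made up for the example.
array = [
    Disk("disk3", b"abcdefgh"),
    Disk("disk4", b"12345678", flaky=True),
    Disk("disk10", b"ABCDEFGH"),
]
suspect = find_suspect(array)
```

In this model every round that still includes the flaky disk reports errors, because parity was computed from one bad read and checked against another. A real fault may be intermittent rather than constant, which is part of why two clean checks are asked for rather than one.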

image.png

Link to comment
