problem installing 6TB parity drive, not sure how to proceed


Hi.

 

I replaced a 3TB parity drive with a new 6TB WD red. After everything was done, I ran a parity check. Sometime between starting it in the morning and coming home from work in the evening, a few drives went offline and the parity check logged about 2 million sync errors per drive.

 

The process I followed was:

- ran a parity check with old 3TB parity drive

- pre-cleared the new 6TB drive

- replaced 3TB parity drive with 6TB drive

- rebuilt parity on 6TB drive

- ran a parity check with 6TB drive. This is where the drives went offline.

 

I checked all connections and restarted the system. All drives and controllers seem okay, and the array comes online with all green balls.

 

Then I started a read-only parity check. About 20 minutes in, sync errors started to show up.

 

I see two options after I check all the hardware.

 

Option 1. Assume the parity is bad on the 6TB drive. Run a parity check to update the 6TB drive again.

Option 2. Put the original 3TB parity drive back in. Run a read-only parity check and, if all is well, pre-clear the 6TB drive and start over. If the read-only check with the 3TB drive shows problems, some of the data drives have been corrupted and I'll need to find a new course of action.

 

Option 1 will be much faster, since it took 60 hours to pre-clear the 6TB drive at speeds of 100-140 MB/s. But it doesn't tell me the state of the other drives in the array.
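For a rough sense of where the time goes, here is a back-of-the-envelope sketch in Python. It assumes decimal units, a constant transfer rate, and that a pre-clear makes three full passes over the drive (pre-read, zeroing write, post-read). Those are assumptions for illustration, not measurements from this system.

```python
# Rough estimate of how long full passes over a drive take at a given speed.
# Assumptions (not measured): 1 TB = 10**12 bytes, 1 MB = 10**6 bytes,
# a constant transfer rate, and three full passes per pre-clear.

def hours_per_pass(capacity_tb, speed_mb_s):
    """Hours needed to read or write the whole drive once."""
    seconds = (capacity_tb * 10**12) / (speed_mb_s * 10**6)
    return seconds / 3600

for speed in (100, 140):
    one_pass = hours_per_pass(6, speed)
    print(f"{speed} MB/s: {one_pass:.1f} h per pass, "
          f"{3 * one_pass:.1f} h for three passes")
# 100 MB/s: 16.7 h per pass, 50.0 h for three passes
# 140 MB/s: 11.9 h per pass, 35.7 h for three passes
```

A single parity rebuild or check is one pass, so under these assumptions it should take roughly a third of the pre-clear time.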

 

Thanks for any advice,

- Eric

Link to comment

What controller card are you using?  I had a similar issue with my Supermicro AOC-SAS2LP-MV8.  There are scattered reports on the web that these cards can throw errors when they are heavily taxed.  In my case, I found that if I set my tunables to the maximum values suggested by unraid-tunables-tester.sh, I would get errors during parity checks.  When I changed them to the "best bang for the buck" values, the errors all disappeared.

Link to comment

It doesn't seem like there are any real options.

 

If I put the old 3TB parity drive back in, the array will think it's a new drive and will want to initialize it, since the array was last running with the 6TB drive for parity.

 

So if that's true, and the 3TB drive with valid parity is useless now to check the array, I'll have to rebuild parity on the 6TB drive and hope there's no corruption on any other drive.

 

Link to comment


Reset the config using the New Config utility. Select the desired drives, including the 3TB parity drive, and check the box that indicates parity is already valid.

Link to comment

That's fantastic. Thanks!

 

I didn't know you could keep parity with new config/initconfig. I'll get the array back up this way and then do a parity check to see if anything has gone wrong on the data drives.

 

Is there a way to identify which files may be affected if there are sync errors found? Since the data and parity are striped across all drives, I wouldn't think the disk with a bad file could be identified.

Link to comment

I do not believe that you can.  You only know that there is a problem with a particular sector on one of the drives.  Even if you know the drive there is no easy (i.e. realistic) way to convert a sector to the file that contains the sector.

Link to comment

Since the data and parity are striped across all drives, I wouldn't think the disk with a bad file could be identified.

 

The data is NOT striped across all the drives -- it's only on the drive where you wrote the particular file.    Parity is simply computed across all drives, so when you encounter a sync error that simply means that the current parity bit doesn't match what it should be.  UnRAID always assumes the error is in the parity bit itself, since that's by far the most likely.  It would, with the right utility, be possible to identify the SET of files that might be involved ... by identifying the file on every data disk that includes the bit where the error was - but to my knowledge there are no utilities available to do that.  It would indeed be handy, however, as you could then just check those specific files (either by verifying checksums or by comparing them to your backups) instead of having to do that check for the entire array to confirm a sync error didn't result from file corruption.
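To make the idea concrete, here is a minimal sketch in Python of single-parity XOR across disks. The three tiny "disks" and the helper names are made up for illustration, and this is not UnRAID's actual code. It shows why a sync error gives you a position but not the disk responsible.

```python
from functools import reduce
from operator import xor

def compute_parity(blocks):
    """XOR the byte at each position across every data disk."""
    return bytes(reduce(xor, column) for column in zip(*blocks))

def sync_error_offsets(blocks, parity):
    """Positions where the stored parity differs from the recomputed parity."""
    expected = compute_parity(blocks)
    return [i for i, (e, p) in enumerate(zip(expected, parity)) if e != p]

# Three hypothetical data disks, three bytes each.
disk1 = bytes([0b1010, 0b1111, 0b0000])
disk2 = bytes([0b0110, 0b0001, 0b1001])
disk3 = bytes([0b0011, 0b0101, 0b1100])
parity = compute_parity([disk1, disk2, disk3])

# Silently corrupt one byte on disk2, then re-check against the old parity.
bad_disk2 = bytes([disk2[0], disk2[1] ^ 0b0100, disk2[2]])
print(sync_error_offsets([disk1, bad_disk2, disk3], parity))  # [1]
```

The check reports offset 1, but the same mismatch would appear if disk1, disk3, or the parity byte itself had changed, which is why only a set of candidate files (one per data disk) could ever be identified.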

 

Since all the drives show good status, I'd simply recompute parity (or just run a correcting check, which will effectively do the same thing) ... then run another parity check, which should be error-free.
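If you want the checksum option for the future, something along these lines would work. This is a rough Python sketch, not an existing UnRAID utility, and the function names are hypothetical. It only helps if you build a manifest while the data is known to be good.

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large media files do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {str(p.relative_to(root)): sha256_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}

def changed_files(old_manifest, new_manifest):
    """Files from the old manifest that are now missing or hash differently."""
    return sorted(name for name, digest in old_manifest.items()
                  if new_manifest.get(name) != digest)
```

You would run build_manifest on each data disk (for example /mnt/disk1), save the result, and after a suspicious parity check run it again and pass both manifests to changed_files.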

 

 

Link to comment

The new config operation and read-only parity check with the old 3TB parity drive are complete.

 

The parity check immediately showed 1867 sync errors (Main | Array Operations page), then completed a half day later with no more sync errors. There were zero disk errors (Main | Array Devices page). syslog showed about 40 parity errors, one in sector 128 and all the rest in consecutive sectors (counting by 8) starting at sector 12584. All the top level directories look like they should and the drives look as full as they should.
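To put those sector numbers in perspective, here is a quick Python calculation. It assumes the syslog reports 512-byte sectors, which I have not confirmed.

```python
SECTOR_BYTES = 512  # assumption: syslog sector numbers are 512-byte units

def byte_offset(sector):
    """Byte offset from the start of the disk for a given sector number."""
    return sector * SECTOR_BYTES

stride_bytes = 8 * SECTOR_BYTES  # errors were logged every 8 sectors

for sector in (128, 12584):
    offset = byte_offset(sector)
    print(f"sector {sector}: byte {offset} ({offset / 2**20:.2f} MiB)")
print(f"stride: {stride_bytes} bytes")
# sector 128: byte 65536 (0.06 MiB)
# sector 12584: byte 6443008 (6.14 MiB)
# stride: 4096 bytes
```

Under that assumption each logged error covers one 4 KiB block, and everything reported sits within the first few MiB of the drives.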

 

So it looks like my data is intact, with possibly a bit of loss or a bit of bad parity.

 

Next is to run SMART tests and maybe reiserfsck. Then I'll start over: pre-clear and install the new 6TB parity drive.

 

Thanks for all the hand holding.

 

Link to comment
