Added new drive and now HDD Parity disk has read errors


Odds are you slightly disturbed the cabling to the parity drive when you added the new drive.  (Not an uncommon occurrence if you don't have hot-swap bays.)

 

You need to reseat the cabling to the drives and/or motherboard and then rebuild the parity drive.  (Stop array, unassign parity drive, start the array, stop the array, reassign the parity drive, start the array)

Link to comment

I shut down my PC, reseated all the cables and rebooted. However, when booting up there was an x next to the parity drive and it said the parity device is disabled. In the main dashboard there is an x under parity as well. I'm now performing a read-check, which will take approx 13hrs. Please let me know if this is the right approach. Thanks

 

Nvm. I'm an idiot. (Stop array, unassign parity drive, start the array, stop the array, reassign the parity drive, start the array) fixed the x issue. Rebuilding the parity drive again.

Edited by bobokun
Link to comment

You're having issues with multiple disks, check cables, PSU, etc, e.g.:

 

Jun 21 19:53:13 unNAS kernel: ata3: link is slow to respond, please be patient (ready=0)
Jun 21 19:53:13 unNAS kernel: ata2: link is slow to respond, please be patient (ready=0)
Jun 21 19:53:13 unNAS kernel: ata5: link is slow to respond, please be patient (ready=0)
Jun 21 19:53:13 unNAS kernel: ata6: link is slow to respond, please be patient (ready=0)
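As a rough way to see how many links are affected, a short Python sketch like the one below could tally these messages per ATA port. The regular expression is written only against the four lines quoted above, and the name `slow_links` is made up for illustration; other kernel versions may word the message differently.

```python
import re
from collections import Counter

# Matches the kernel message quoted above and captures the port name (e.g. "ata3").
SLOW_LINK = re.compile(r"kernel: (ata\d+): link is slow to respond")

def slow_links(log_text):
    """Count 'link is slow to respond' messages per ATA port (hypothetical helper)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = SLOW_LINK.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

sample = """\
Jun 21 19:53:13 unNAS kernel: ata3: link is slow to respond, please be patient (ready=0)
Jun 21 19:53:13 unNAS kernel: ata2: link is slow to respond, please be patient (ready=0)
Jun 21 19:53:13 unNAS kernel: ata5: link is slow to respond, please be patient (ready=0)
Jun 21 19:53:13 unNAS kernel: ata6: link is slow to respond, please be patient (ready=0)
"""

ports = slow_links(sample)
print(sorted(ports))  # ['ata2', 'ata3', 'ata5', 'ata6']
```

Several ports complaining within the same second points more toward something shared (power, controller, cabling loom) than toward one bad disk.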

 

Link to comment
22 minutes ago, bobokun said:

Do you think I might not be getting enough power?

It's enough for 5 disks; I've used similar PSUs for up to 8 disks, though it could be failing. Those errors are hardware related. It may also be the controller/board, but I would say that isn't very likely.

Link to comment

In addition to the new hard drive I also added a new quad-port gigabit PCIe NIC. Maybe the addition of both the PCIe card and the extra HDD is drawing too much power. I'll disconnect the PCIe card and see if there are any issues.

 

I bought a new PSU (Corsair TX550M 80 Plus Gold) and I'll report back once installed. Hopefully no more errors.

Edited by bobokun
Link to comment

Unfortunately, installing the new PSU didn't help. I also removed all the SATA cables and rewired everything, and now a different drive had the read errors instead of the parity drive. But since the parity was still rebuilding, does that mean I lost data? Unlike before, when the parity drive got disabled, this drive doesn't seem to have been disabled automatically by unraid. Attached are new logs.

unnas-diagnostics-20180623-0342.zip

Edited by bobokun
Link to comment

Parity rebuild completed with 17 errors (from the disk2 drive). While I'm waiting for my new replacement disk to finish preclearing, should I in the meantime do a parity-check with "Write corrections to parity" checked, in the hope that it successfully reads disk2 this time and fixes those 17 read errors?

Link to comment
34 minutes ago, bobokun said:

While I'm waiting for my new replacement disk to finish preclearing, should I in the meantime do a parity-check with "Write corrections to parity" checked, in the hope that it successfully reads disk2 this time and fixes those 17 read errors?

It's a risk: it could hit the same or fewer errors, but it could also hit more and corrupt parity even further.

Link to comment

Hasn't anyone written a little tool that just makes multiple attempts to read out the broken sectors from the data disk - and in case one of them happens to be read out ok, rewrites that sector so that the parity can be updated with corrected data for that position?

 

It isn't uncommon that multiple retries - especially at different temperatures - can result in a problematic sector being correctly read. And on write, the drive would normally remap the bad sector if it has the sector flagged as offline-uncorrectable.
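To make the idea concrete, here is a minimal Python sketch of the retry-then-rewrite loop, not an existing tool. The helper names and the 512-byte sector size are assumptions for illustration, and a real tool would also need to bypass the page cache and decide which device node to write through so that parity actually gets updated; none of that is covered here.

```python
import os

SECTOR_SIZE = 512  # assumed logical sector size

def read_sector_with_retries(fd, sector, attempts=5, sector_size=SECTOR_SIZE):
    """Try to read one sector up to `attempts` times; return its bytes or None."""
    for _ in range(attempts):
        try:
            data = os.pread(fd, sector_size, sector * sector_size)
        except OSError:
            continue  # an unreadable sector normally surfaces as an I/O error
        if len(data) == sector_size:
            return data
    return None

def rewrite_if_readable(fd, sector, attempts=5, sector_size=SECTOR_SIZE):
    """Write a sector back in place if any read attempt succeeded."""
    data = read_sector_with_retries(fd, sector, attempts, sector_size)
    if data is None:
        return False  # never readable: leave the sector alone
    os.pwrite(fd, data, sector * sector_size)
    return True
```

The sketch only rewrites data it actually managed to read, so a sector that never reads back is left untouched rather than being filled with guesses.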

Link to comment
13 hours ago, pwm said:

Hasn't anyone written a little tool that just makes multiple attempts to read out the broken sectors from the data disk - and in case one of them happens to be read out ok, rewrites that sector so that the parity can be updated with corrected data for that position?

That could be useful if it worked well, i.e., without a chance of corrupting parity further. I have no idea how easy or difficult it would be to do.

 

OP, after replacing disk2 you can use ddrescue to clone it and recover as much data as possible, and in case there are still read errors you can also then know which files are affected.
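For the "which parts are still unreadable" step, ddrescue records its progress in a mapfile, and a small Python sketch can pull out the regions it marked as bad. This assumes the usual GNU ddrescue text layout (comment lines starting with `#`, one current-position line, then `pos size status` rows where `-` means bad sector); the sample values are made up, and `bad_regions` is a hypothetical helper. Mapping byte offsets to file names depends on the filesystem and is not shown.

```python
def bad_regions(mapfile_text):
    """Return (start_byte, size_bytes) for every block marked '-' (bad sector)."""
    rows = [line.split() for line in mapfile_text.splitlines()
            if line.strip() and not line.lstrip().startswith("#")]
    regions = []
    for fields in rows[1:]:  # rows[0] is the current-position status line
        if len(fields) >= 3 and fields[2] == "-":
            # int(..., 0) accepts the 0x-prefixed hex values used in the mapfile
            regions.append((int(fields[0], 0), int(fields[1], 0)))
    return regions

# Made-up sample in the assumed mapfile layout, for illustration only.
sample_mapfile = """\
# Mapfile. Created by GNU ddrescue
# current_pos  current_status  current_pass
0x00300200     +               1
#      pos        size  status
0x00000000  0x00100000  +
0x00100000  0x00000200  -
0x00100200  0x00200000  +
"""

for start, size in bad_regions(sample_mapfile):
    print(f"unreadable: offset {start} bytes, length {size} bytes")
```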

Link to comment

So the parity sync just finished with 0 read errors. However, the check came back with 0 errors and made no corrections to parity.

 

Quote

ddrescue -f /dev/sdX /dev/sdY /boot/ddrescue.log


Neither the source nor the destination disk can be mounted. Replace X with the source disk and Y with the destination. Always triple-check these: if the wrong disk is used as the destination it will be overwritten, deleting all data.
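One way to back up the triple-check is to look at `/proc/mounts` before running the command. The Python sketch below is only an illustration with a hypothetical helper name; it matches a device such as `/dev/sdX` and its numbered partitions, and it would not catch array disks that are mounted through a different device node (as far as I know Unraid mounts array disks via md devices), so it does not replace checking the assignments in the GUI.

```python
def mounted_entries(device, mounts_text):
    """Return mount points for `device` or its numbered partitions (e.g. /dev/sdb1)."""
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2 or not fields[0].startswith(device):
            continue
        suffix = fields[0][len(device):]
        if suffix == "" or suffix.isdigit():  # the whole disk or a numbered partition
            hits.append(fields[1])
    return hits

def check_unmounted(device, mounts_path="/proc/mounts"):
    """Raise if the device appears in the mount table (Linux only)."""
    with open(mounts_path) as handle:
        hits = mounted_entries(device, handle.read())
    if hits:
        raise RuntimeError(f"{device} is mounted at {', '.join(hits)}")
```

Calling `check_unmounted` on both the source and the destination before starting the clone would stop early if either shows up in the mount table.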

Just so I'm clear on this: to make sure neither source nor destination is mounted, do I first stop the array, then unassign both disk2 and the replacement disk (so that when I start the array the missing disk is not replaced yet) before running the ddrescue command? Then, once the command finishes, do I assign the replacement disk to the array (filling the missing disk2 slot) and start the array again?

Link to comment
4 hours ago, johnnie.black said:

That could be useful if it worked well, i.e., without a chance of corrupting parity further. I have no idea how easy or difficult it would be to do.

 

I have done it on two disks that got uncorrectable sectors - I managed to recover a few sectors that could then repair the unrecoverable sector state.

 

I have also done it the other way - identifying which files used the specific sectors, extracting the correct data from backup files and rewriting the uncorrectable sectors.

 

In both cases it was with disks that got a number of uncorrectable sectors quite early in life - and after rewrite the disks continued to work for several years without producing more errors.

 

Snapraid also has a repair feature for when you have additional errors beyond what parity can repair. Since Snapraid uses hashes for all files, it's possible to improve recoverability when too many disks have failed by pointing Snapraid at one or more alternative data sources that happen to contain backup files with the correct hash. But this is hard to carry over to a normal RAID - Snapraid doesn't do disk-level block parity but instead file-level block parity. So Snapraid already has a table of which sections of files on the different disks form one RAID slice when computing the parity.

 

Anyway - additional checksums are always good to have.

Link to comment

Thanks everyone for all the help. I've completed upgrading the parity to a larger drive and replacing the disk2 (2TB faulty disk) with my old parity 6TB. 

I want the disk assignments to match the physical location of the disks, which would mean changing the order they appear in the array. Is there an easy way to change the order of the disks without losing any data? Do I need to create a new config for this?

Link to comment
10 hours ago, bobokun said:

Is there an easy way to change the order of the disks without losing any data? Do I need to create a new config for this?

With single parity you can do a new config, rearrange the data disks as you like and check "parity is already valid" before starting the array.
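The reason the order doesn't matter with single parity can be shown with a few bytes, assuming the first parity is a plain byte-wise XOR across the data disks (which is how it is generally described). The disk contents and the `xor_parity` helper in this Python sketch are made up for illustration.

```python
from functools import reduce
from operator import xor

def xor_parity(disks):
    """Byte-wise XOR across equally sized disk images (hypothetical helper)."""
    return bytes(reduce(xor, column) for column in zip(*disks))

# Tiny made-up "disks", three bytes each.
disk1 = bytes([0x12, 0x34, 0x56])
disk2 = bytes([0xAB, 0xCD, 0xEF])
disk3 = bytes([0x00, 0xFF, 0x0F])

original = xor_parity([disk1, disk2, disk3])
reordered = xor_parity([disk3, disk1, disk2])
print(original == reordered)  # True

# A missing disk is rebuilt from the others plus parity, whatever slot it was in.
print(xor_parity([disk1, disk3, original]) == disk2)  # True
```

XOR gives the same result in any order, so the existing parity stays valid after the data disks are shuffled. A second parity that depends on slot position would not have this property, which is presumably why the advice is limited to single parity.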

Link to comment
On 6/26/2018 at 2:44 AM, johnnie.black said:

With single parity you can do a new config, rearrange the data disks as you like and check "parity is already valid" before starting the array.

Thanks, will I have to remake my shares when doing this or will everything stay the same? 

 

Also, I've finished 2 cycles of preclear on the disk2 (2TB) drive that had read errors, and it seems that during preclearing the SMART attributes returned to normal, with no errors or warnings in the SMART status of the drive. Is it safe now to add it back to the array as another disk or would it be best to just leave it out.

Link to comment
6 minutes ago, bobokun said:

Thanks, will I have to remake my shares when doing this or will everything stay the same?

Shares will remain, but if you're using include/exclude disks you'll need to reset those.

 

7 minutes ago, bobokun said:

Is it safe now to add it back to the array as another disk or would it be best to just leave it out.

Once a disk fails it's much more likely to fail again in the near future, but it's hard to predict whether that one will. It's up to you and the level of risk you're comfortable with.

Link to comment
