Added new drive and now HDD Parity disk has read errors

bobokun · June 21, 2018

I added a new hard drive (planning to run preclear on it before it replaces my parity drive). However, when I started the array I got a notification that the array has 1 disk with read errors, and that disk was the parity drive. So Unraid disabled the parity drive. I'm not sure what I should do next.

unnas-diagnostics-20180621-1903.zip

Squid · June 21, 2018

What most likely happened is that you slightly disturbed the cabling to the parity drive when you added the new drive. (Not an uncommon occurrence if you don't have hot-swap bays.)

You need to reseat the cabling to the drives and/or motherboard and then rebuild the parity drive. (Stop array, unassign parity drive, start the array, stop the array, reassign the parity drive, start the array)

bobokun · June 21, 2018

I shut down my PC, reseated all the cables and rebooted. However, when it booted up there was an x next to the parity drive and it said the parity device is disabled. There is an x under parity in the main dashboard as well. I'm now running a read-check, which will take approx. 13 hrs. Please let me know if this is the right approach. Thanks

Nvm, I'm an idiot. Following the steps (stop array, unassign parity drive, start the array, stop the array, reassign the parity drive, start the array) fixed the x issue. Rebuilding the parity drive again.

Edited June 22, 2018 by bobokun

bobokun · June 22, 2018

Unfortunately, while rebuilding parity last night it hit the same read error :( I've attached a new diagnostics file.

unnas-diagnostics-20180622-0639.zip

JorgeB · June 22, 2018

You're having issues with multiple disks; check cables, PSU, etc. For example:

Jun 21 19:53:13 unNAS kernel: ata3: link is slow to respond, please be patient (ready=0)
Jun 21 19:53:13 unNAS kernel: ata2: link is slow to respond, please be patient (ready=0)
Jun 21 19:53:13 unNAS kernel: ata5: link is slow to respond, please be patient (ready=0)
Jun 21 19:53:13 unNAS kernel: ata6: link is slow to respond, please be patient (ready=0)
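To gauge how many ports are involved, one could count these messages per ATA port in the syslog from a diagnostics zip. A minimal sketch, assuming the syslog has been extracted as a text file; `count_slow_links` is a hypothetical helper, not part of Unraid, and the regex only targets the message format quoted above.

```python
import re
import sys
from collections import Counter

# Matches kernel lines of the form quoted above, e.g.
# "... unNAS kernel: ata3: link is slow to respond, please be patient (ready=0)"
ATA_SLOW = re.compile(r"kernel: (ata\d+): link is slow to respond")


def count_slow_links(lines):
    """Count 'link is slow to respond' messages per ATA port (e.g. 'ata3')."""
    counts = Counter()
    for line in lines:
        match = ATA_SLOW.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_ata.py /path/to/syslog.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        counts = count_slow_links(fh)
    # Sort numerically by port number so ata10 comes after ata9.
    for port, n in sorted(counts.items(), key=lambda kv: int(kv[0][3:])):
        print(f"{port}: {n}")
```

Several different ports reporting at the same timestamp suggests a shared cause (power, controller) rather than a single bad disk, which is the reasoning behind checking cables and the PSU.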

bobokun · June 22, 2018

I've checked all the SATA cables and power cables, and they seem firmly connected. I have a Corsair CX430 PSU. Do you think I might not be getting enough power?

JorgeB · June 22, 2018

22 minutes ago, bobokun said:

Do you think I might not be getting enough power?

It's enough for 5 disks; I've used similar PSUs with up to 8 disks. It could be failing, though. Those errors are hardware related; it may also be the controller/board, but I'd say that isn't very likely.

bobokun · June 22, 2018

In addition to the new hard drive, I also added a new PCIe quad-port gigabit NIC. Maybe the PCIe card plus the extra HDD is drawing too much power. I'll disconnect the PCIe card and see if there are still any issues.

I bought a new PSU (Corsair TX550M 80 Plus Gold) and I'll report back once it's installed. Hopefully no more errors.

Edited June 22, 2018 by bobokun

bobokun · June 23, 2018

Unfortunately, installing the new PSU didn't help. I also removed all the SATA cables and rewired everything, and now a different drive has the read errors instead of the parity drive. Since parity was still rebuilding, does that mean I lost data? Unraid doesn't seem to have disabled this drive automatically, unlike before, when the parity drive got disabled. Attached are new logs.

unnas-diagnostics-20180623-0342.zip

Edited June 23, 2018 by bobokun

JorgeB · June 23, 2018

No more ATA errors, so that appears to be solved, or at least OK for now. The read errors on disk2 are because it's failing: let the parity sync finish, then replace disk2, though there will likely be some data corruption due to the disk2 read errors.

bobokun · June 23, 2018

The parity rebuild completed with 17 errors (from disk2). While I wait for my replacement disk to finish preclearing, should I run a parity check with "Write corrections to parity" checked, in the hope that it reads disk2 successfully this time and fixes those 17 read errors?

JorgeB · June 23, 2018

34 minutes ago, bobokun said:

While I wait for my replacement disk to finish preclearing, should I run a parity check with "Write corrections to parity" checked, in the hope that it reads disk2 successfully this time and fixes those 17 read errors?

It's a risk: it may get the same or fewer errors, but it may also get more and corrupt parity even further.

pwm · June 23, 2018

Hasn't anyone written a little tool that just makes multiple attempts to read the broken sectors from the data disk and, if one of the attempts happens to succeed, rewrites that sector so that parity can be updated with correct data for that position?

It isn't uncommon for multiple retries, especially at different temperatures, to result in a problematic sector being read correctly. And on write, the drive would normally remap the bad sector if it has the sector flagged as offline-uncorrectable.
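The retry-then-rewrite idea can be sketched roughly as follows. This is a hypothetical illustration, not an existing tool: `retry_read` and `rewrite_sector` are made-up names, 512-byte logical sectors are assumed, and reads go through the normal cached path rather than O_DIRECT, so on a real disk a retry could be answered from the page cache. It also does nothing to keep parity consistent, which is the hard part.

```python
import os
import time

SECTOR = 512  # assumption: 512-byte logical sectors (many drives use 4096)


def retry_read(path, lba, attempts=10, delay=1.0, sector=SECTOR):
    """Try to read one sector up to `attempts` times; return its bytes or None."""
    fd = os.open(path, os.O_RDONLY)
    try:
        for _ in range(attempts):
            try:
                data = os.pread(fd, sector, lba * sector)
                if len(data) == sector:
                    return data
            except OSError:
                pass  # read error (e.g. EIO): wait and try again
            time.sleep(delay)
        return None
    finally:
        os.close(fd)


def rewrite_sector(path, lba, data, sector=SECTOR):
    """Write recovered data back to the same sector so the drive can remap it."""
    fd = os.open(path, os.O_WRONLY)
    try:
        os.pwrite(fd, data, lba * sector)
        os.fsync(fd)
    finally:
        os.close(fd)
```

Only a successful, full-length read is ever written back; if every attempt fails the function returns None and nothing is touched.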

JorgeB · June 24, 2018

13 hours ago, pwm said:

Hasn't anyone written a little tool that just makes multiple attempts to read the broken sectors from the data disk and, if one of the attempts happens to succeed, rewrites that sector so that parity can be updated with correct data for that position?

That could be useful if it worked well, i.e., without any chance of corrupting parity further; no idea how easy or difficult it would be to do.

OP, after replacing disk2 you can use ddrescue to clone it and recover as much data as possible; if there are still read errors, you can then also find out which files are affected.
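ddrescue records its progress in a mapfile (the last argument on its command line), and areas it could not read are marked with status `-`. A minimal sketch that lists those areas, assuming the mapfile layout described in the GNU ddrescue manual (comment lines start with `#`, the first data line is the current status, then `pos size status` lines in hex). `bad_areas` and `to_sectors` are hypothetical helpers; mapping the sectors to file names is filesystem-specific and not shown.

```python
def bad_areas(mapfile_text):
    """Return (pos, size) byte ranges marked '-' (bad sector) in a ddrescue mapfile."""
    areas = []
    seen_status_line = False
    for raw in mapfile_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        if not seen_status_line:
            # First data line is: current_pos current_status [current_pass]
            seen_status_line = True
            continue
        fields = line.split()
        pos, size, status = int(fields[0], 0), int(fields[1], 0), fields[2]
        if status == "-":
            areas.append((pos, size))
    return areas


def to_sectors(areas, sector=512):
    """Convert byte ranges to (first_sector, sector_count), assuming 512-byte sectors."""
    return [(pos // sector, size // sector) for pos, size in areas]
```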

bobokun · June 24, 2018

So the parity check just finished with 0 read errors; however, it reported 0 errors and made no corrections to parity.

Quote

ddrescue -f /dev/sdX /dev/sdY /boot/ddrescue.log

Neither the source nor the destination disk can be mounted. Replace X with the source disk and Y with the destination, and always triple-check these: if the wrong disk is used as the destination, it will be overwritten and all its data deleted.
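Part of that triple-check can be scripted. A minimal sketch of a pre-flight check that only builds the argument list and never runs anything; `ddrescue_argv` and its helpers are hypothetical. It assumes `/proc/mounts`-style input and plain `sdX`/`sdX1` naming. On Unraid, array disks are mounted through `/dev/mdN`, so this check will not catch a disk that is still assigned to a started array; stopping the array is still required.

```python
import os

DEFAULT_MAPFILE = "/boot/ddrescue.log"  # path from the command quoted above


def mounted_devices(mounts_text):
    """Set of device paths that appear in /proc/mounts-style text."""
    return {line.split()[0] for line in mounts_text.splitlines() if line.strip()}


def is_mounted(dev, mounted):
    """True if the whole disk or one of its numbered partitions (sdb1, sdb2...) is mounted."""
    return any(m == dev or (m.startswith(dev) and m[len(dev):].isdigit()) for m in mounted)


def ddrescue_argv(src, dst, mounts_text, mapfile=DEFAULT_MAPFILE):
    """Return the ddrescue argv after basic sanity checks; raise ValueError otherwise."""
    if os.path.realpath(src) == os.path.realpath(dst):
        raise ValueError("source and destination are the same device")
    mounted = mounted_devices(mounts_text)
    for dev in (src, dst):
        if is_mounted(dev, mounted):
            raise ValueError(f"{dev} is mounted")
    # -f (--force) is needed because ddrescue will not overwrite a device without it.
    return ["ddrescue", "-f", src, dst, mapfile]
```

In practice you would pass the contents of `/proc/mounts`, compare the serial numbers of the two disks by eye, and only then hand the returned list to `subprocess.run`.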

Just so I'm clear: to make sure neither source nor destination is mounted, do I first stop the array and unassign both disk2 and the replacement disk (so that when I start the array, the missing disk is not replaced yet) before running ddrescue? Then, once the command finishes, I assign the replacement disk to the array (filling the missing disk2 slot) and start the array again?

JorgeB · June 24, 2018

ddrescue can be used after replacing disk2, though if the parity check completed without errors there's no need; just do a standard replacement.

pwm · June 24, 2018

4 hours ago, johnnie.black said:

That could be useful if it worked well, i.e., without any chance of corrupting parity further; no idea how easy or difficult it would be to do.

I have done it on two disks that developed uncorrectable sectors: I managed to read a few of those sectors successfully, and rewriting them cleared the unrecoverable-sector state.

I have also done it the other way: identifying which files used the specific sectors, extracting the correct data from backup copies and rewriting the uncorrectable sectors.

In both cases it was with disks that got a number of uncorrectable sectors quite early in life, and after the rewrite the disks continued to work for several years without producing more errors.

SnapRAID also has a repair feature for when you have more errors than parity can repair. Since SnapRAID keeps hashes for all files, if too many disks have failed you can improve recoverability by pointing SnapRAID at one or more alternative data sources that happen to contain backup files with the correct hash. But this is hard to carry over to a normal RAID: SnapRAID doesn't do disk-level block parity but file-level block parity, so it already has a table of which sections of which files on the different disks form one RAID slice when computing parity.

Anyway, additional checksums are always good to have.
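The "find a backup copy with the correct hash" step can be illustrated in a few lines. A minimal sketch only: SHA-256 stands in for whatever hash SnapRAID actually stores, and `find_good_copy` is a hypothetical helper, not SnapRAID's interface.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk=1 << 20):
    """SHA-256 of a file, read in 1 MiB chunks (stand-in for SnapRAID's own hash)."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def find_good_copy(expected_hex, candidates):
    """Return the first candidate path whose content hash matches, else None."""
    for path in candidates:
        p = Path(path)
        if p.is_file() and file_digest(p) == expected_hex:
            return p
    return None
```

Because the hash identifies the correct content, any matching copy is as good as the original, regardless of where it is stored.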

bobokun · June 25, 2018

Thanks everyone for all the help. I've finished upgrading parity to a larger drive and replacing disk2 (the faulty 2TB disk) with my old 6TB parity drive.

I want the disk assignments to match the physical locations of the disks, which would mean changing the order they appear in the array. Is there an easy way to change the order of the disks without losing any data? Do I need to create a new config for this?

JorgeB · June 26, 2018

10 hours ago, bobokun said:

Is there an easy way to change the order of the disks without losing any data? Do I need to create a new config for this?

With single parity you can do a new config, rearrange the data disks as you like and check "parity is already valid" before starting the array.
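The reason this works only with single parity: the P parity is a byte-wise XOR across the data disks, and XOR is commutative and associative, so the result does not depend on which slot each disk occupies. A second (Q) parity in the usual RAID-6 style weights each disk by a slot-dependent coefficient, so reordering would invalidate it. A minimal sketch, assuming (as with Unraid-style parity) that a smaller disk counts as zeros past its end:

```python
from functools import reduce
from itertools import permutations


def xor_parity(disks):
    """Single (P) parity: byte-wise XOR of all data disks.

    Shorter disks are padded with zeros, so they contribute nothing
    to parity beyond their own end.
    """
    width = max(len(d) for d in disks)
    padded = [d.ljust(width, b"\x00") for d in disks]
    return bytes(reduce(lambda a, b: a ^ b, column, 0) for column in zip(*padded))


if __name__ == "__main__":
    disks = [b"\x01\x02", b"\x10\x20\x30", b"\x0f"]
    parity = xor_parity(disks)
    # Every possible slot order yields the same parity bytes.
    print(all(xor_parity(list(order)) == parity for order in permutations(disks)))  # prints True
```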

bobokun · June 27, 2018

On 6/26/2018 at 2:44 AM, johnnie.black said:

With single parity you can do a new config, rearrange the data disks as you like and check "parity is already valid" before starting the array.

Thanks, will I have to remake my shares when doing this or will everything stay the same?

Also, I've finished 2 preclear cycles on the old disk2 (2TB) that had read errors, and it seems that during preclearing the SMART attributes returned to normal; there are no errors or warnings in the drive's SMART status. Is it safe now to add it back to the array as another disk, or would it be best to just leave it out?

JorgeB · June 27, 2018

6 minutes ago, bobokun said:

Thanks, will I have to remake my shares when doing this or will everything stay the same?

Shares will remain, but if you're using include/exclude disks you'll need to reset those.

7 minutes ago, bobokun said:

Is it safe now to add it back to the array as another disk, or would it be best to just leave it out?

Once a disk fails, it's much more likely to fail again in the near future, but it's hard to predict whether that one will or not. It's up to you and the level of risk you're comfortable with.
