Need help, parity disk shows error during rebuild



Diagnostics attached (see below).

 

- Two disks (disk19, disk20) failed at once in a Dual Parity system. Both disks show lots of reallocated sectors.

- Started data rebuild of disk19 on a new disk.

- First parity disk shows 64 read errors during data rebuild. Rebuild went to completion.

- First parity disk reported one pending sector during the rebuild (it's still that value after a reboot).

197	Current pending sector --> 1

 

- First parity disk shows the following read errors during the rebuild (several times):

Jan 10 11:21:27 Tower2 kernel: md: disk0 read error, sector=37066384
Jan 10 11:21:27 Tower2 kernel: md: recovery thread: multiple disk errors, sector=37066384

 

- Parity check history shows this entry after completion:

2019-01-11, 02:52:33	15 hr, 33 min, 33 sec	107,1 MB/s	OK	64 (this stands for the 64 read errors)

 

My question: Was the rebuild successful, or was garbage data from those 64 read errors copied to the rebuilt disk19? I ask because two disks were offline (disk19 and disk20) and the first parity disk showed read errors during the rebuild. In other words, there was no additional disk left to get "good" data from. Did everything complete successfully?

 

I'm currently rebuilding disk20 and will replace the first parity disk afterwards.

 

Any help is highly appreciated.

 

Many thanks in advance.

 

 

tower2-diagnostics-20190111-0650.zip


Ahh, this makes more sense than the scenario you described yesterday. The rebuilt disk19 will be mostly OK, but there will be some corruption due to the read errors on parity (unless there's no data on those sectors). "md: recovery thread: multiple disk errors" is Unraid speak for "there are errors on more disks than the current redundancy can correct; the rebuild/sync will continue, but there will be some (or a lot of) corruption." Note also that the rebuild of disk20 will have some corruption because of the previous errors, again unless there's no data on those sectors.
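
To put a number on that limit, here's a minimal conceptual sketch in Python (not Unraid's actual code; the function name is made up). With dual parity, any stripe with more than two unreadable devices cannot be fully reconstructed:

# Conceptual sketch only -- not Unraid's actual implementation.
def stripe_recoverable(unreadable_devices: int, parity_devices: int = 2) -> bool:
    """A stripe can be fully reconstructed only while the number of
    unreadable devices does not exceed the number of parity devices."""
    return unreadable_devices <= parity_devices

# disk19 and disk20 missing -> 2 unknowns per stripe: still recoverable.
print(stripe_recoverable(2))   # True
# A read error on parity1 at the same sector adds a third unknown:
print(stripe_recoverable(3))   # False -> "multiple disk errors", some corruption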


Thanks for your answer.

 

 

To the background:

This week I bought two new barebone systems and moved all the old hardware into them. It seems I handled 1-3 of the disks a little roughly.

 

 

To the rebuild result:

Ok" means everything's ok. "Ok" does not mean there might be some corruption. The wording "ok" for the rebuild simply puzzles me. I think there might be some better explanation for the average user like me.

 

For disk20 I have a full backup and will copy it over once the rebuild is done. This will "repair" the state of the first parity disk as well. After this rebuild I will replace the first parity disk.

 

For disk19, according to your answer, I have to live with possibly up to 64 wrong sectors. This disk was full, so there might be some problems then.

 

Arrggghhhh.

 

1 hour ago, hawihoney said:

For disk19, according to your answer, I have to live with possibly up to 64 wrong sectors. This disk was full, so there might be some problems then.

It's still a very small number of errors considering how many sectors there are. If the disk contains, for example, mostly video files, there will likely be only one corrupt file, and that will likely mean just a small (or large) glitch during playback. Still, these are the situations where it's good to have checksums of all your files (for those not using btrfs) so you can easily find out which file(s) are corrupt.
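
As a rough example of what I mean by checksums, here is a minimal Python sketch (the mount point /mnt/disk19 and the manifest file name are just placeholders, adjust for your setup):

# Minimal sketch, assuming the disk is mounted at /mnt/disk19 and that the
# manifest file name is arbitrary -- adjust both for your setup.
import hashlib
import sys
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large media files don't need to fit in RAM."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def create_manifest(root: Path, manifest: Path) -> None:
    """Write 'hash  path' lines for every file under root."""
    with manifest.open("w") as out:
        for p in sorted(root.rglob("*")):
            if p.is_file():
                out.write(f"{sha256_of(p)}  {p}\n")

def verify_manifest(manifest: Path) -> None:
    """Re-hash every listed file and report the ones that no longer match."""
    for line in manifest.open():
        expected, name = line.rstrip("\n").split("  ", 1)
        if sha256_of(Path(name)) != expected:
            print(f"CORRUPT: {name}")

if __name__ == "__main__":
    # python checksums.py create /mnt/disk19 disk19.sha256
    # python checksums.py verify disk19.sha256
    if sys.argv[1] == "create":
        create_manifest(Path(sys.argv[2]), Path(sys.argv[3]))
    else:
        verify_manifest(Path(sys.argv[2]))

Run "create" before a disk swap or rebuild and "verify" afterwards; any file listed as CORRUPT is one that hit the bad sectors.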


Sorry, but I'm having a hell of a time interpreting the wording around parity checks and disk rebuilds on a dual parity system. Please have a look at these three images, which I took during and after a correcting parity check:

 

- Img1: During the correcting parity check, parity-disk1 and data-disk3 throw read errors. This kind of error has already been mentioned in this thread.

- Img2: Status during parity check says "0 sync errors corrected".

- Img3: Result after parity check says "0 errors found".

 

I don't get that. On the same "Main" page I see "248 Errors" that are not corrected, yet they lead to a result of "no errors found". Really?

 

If I look closely, I see 17 writes to disk17. That was an image I copied over during the parity sync. If I ignore those, I see one write access to disk3 (the data disk with the read errors). This write is not reflected on parity-disk2 but is written to parity-disk1.

 

So what is the result?

 

- Trust the end result: Everything is ok.

- Trust the status: 248 read errors were detected and these are left uncorrected.

- 248 read errors were detected but corrected without Unraid's intervention.

- And what scenario would lead to a write to parity-disk2 but not to parity-disk1?

 

Am I looking too closely and interpreting too much?

 

I'm pretty sure the WD EFRX disks are the problem. I started with just a few of them and not that many drives in one case. They are said to tolerate only 1-8 drive systems. I bet it's a vibration problem that leads to the read errors. What do you think?

 

 

img1.jpg

 

img2.jpg

img3.jpg


It will retry and sometimes succeed, so the errors might have eventually resulted in correct data.

41 minutes ago, hawihoney said:

I see one write access to disk3 (the data-disk with the read errors)

After the retries, if it still can't read a data disk, it will get the data from the parity calculation and then write it back to that data disk.
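
Conceptually the write-back works like this (a simplified single-parity XOR sketch, not Unraid's actual dual-parity code; the names are made up):

# Simplified single-parity sketch -- not Unraid's actual dual-parity code.
def reconstruct_block(parity: bytes, readable_blocks: list) -> bytes:
    """XOR the parity block with every readable data block to recover
    the block that could not be read."""
    result = bytearray(parity)
    for block in readable_blocks:
        for i, b in enumerate(block):
            result[i] ^= b
    return bytes(result)

# disk3 fails to read a sector: recompute it from parity plus the same
# sector on the other data disks, then write it back to disk3. That is
# why a read error can show up as a write on the very disk that failed.
parity = bytes([0b1010])
disk1  = bytes([0b0110])
disk2  = bytes([0b1001])
disk3  = reconstruct_block(parity, [disk1, disk2])
print(bin(disk3[0]))   # 0b101 -- recovered data for the unreadable sector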

 

Maybe diagnostics would shed more light on the errors.

