CyberMew Posted September 15, 2019

Hi, some backstory. I added a new disk8 and was clearing it when the connection on my disk1 died, so I didn't format disk8 and wanted to shut down the server to fix the disk1 connection. After I managed to fix it (apparently the cable died for some reason), it prompted for a data rebuild (even though I don't think it was required).

While it went through the parity check/data rebuild process, several health issues popped up on various drives. Disk 2 had a current pending sector count of 1 but 0 reallocated event count (it didn't grow further, though). Disk 7 also had its UDMA CRC error count grow from 4 to 10 (I am going to try replacing this SATA cable with a new one).

The data rebuild completed, but with 54 errors! Does it mean some parts of the data were not restored? Or does it mean the parity drive was updated? I'm a bit worried here... usually it's 0 errors. Attached logs.

tower-diagnostics-20190915-1235.zip tower-syslog-20190915-1251.zip
JorgeB Posted September 15, 2019

Unless you had dual parity, errors on another disk during a rebuild mean there will be some corruption, unless you were lucky and there wasn't any data on those sectors.
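JorgeB's point can be seen with a toy model. This is a minimal sketch of single-parity XOR reconstruction, not Unraid's actual implementation: parity is the XOR of all data disks, and a missing disk is rebuilt by XORing parity with every surviving disk, so a bad read from any surviving disk silently corrupts the rebuilt data at exactly those positions. The disk contents and offsets below are made up for illustration.

```python
# Toy model of single-parity rebuild (illustrative only, not Unraid code).
# Parity = XOR of all data disks; rebuilding disk1 = XOR of parity with the
# remaining disks. A read error on disk2 during the rebuild corrupts the
# rebuilt disk1 at the same offsets.

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length byte strings."""
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

# Three data disks plus parity, 8 bytes each (hypothetical contents).
disk1 = b"\x11" * 8
disk2 = b"\x22" * 8
disk3 = b"\x33" * 8
parity = xor_blocks([disk1, disk2, disk3])

# A clean rebuild recovers disk1 exactly.
rebuilt_ok = xor_blocks([parity, disk2, disk3])

# If disk2 returns garbage at byte 3 during the rebuild, the rebuilt
# disk1 is silently wrong at that offset only.
bad_disk2 = bytearray(disk2)
bad_disk2[3] ^= 0xFF
rebuilt_bad = xor_blocks([parity, bytes(bad_disk2), disk3])
```

With dual parity there is a second, independent equation, so one bad read during a rebuild can still be corrected, which is why it would have saved this situation.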
CyberMew Posted September 15, 2019 Author

I see... yeah, I didn't have dual parity because I remember reading about some limitations or issues with it early on. Is it highly suggested and problem free to add a second parity drive nowadays? Will it prevent this kind of rebuild error in the future?

I'm now doing a parity check (without writing corrections) to see what happens. Currently 14% in with no sync errors detected.

Is it possible to check disk1 again to make sure the parity drive rebuilt it correctly? I'm worried because I added a new drive in the same session, and it sort of screwed things up. To confirm: a correcting parity check does nothing now if the data on disk1 is already not 100% error free?

What are my moves here? Can you recommend what to do next? Is it also possible to find out which files were affected by this?
JorgeB Posted September 15, 2019

13 minutes ago, CyberMew said: I remember reading some limitations or some issues with it early on.

Don't remember that.

13 minutes ago, CyberMew said: Is it highly suggested and problem free to add a second parity drive nowadays? Will it prevent these kind of rebuild errors in the future?

Problem free, yes, always was AFAIK. Whether it's suggested depends mostly on array size, but I'd say it's a very small price to pay for the added redundancy even for smaller arrays, and yes, it would save you from a situation like this in the future.

16 minutes ago, CyberMew said: Is it possible to check Disk1 again to make sure the the parity drive writes to it correctly?

You can try rebuilding again, but disk2 appears to be failing, so it will likely have the same or even more errors.

17 minutes ago, CyberMew said: To confirm, a correcting parity check does nothing now if the data on disk1 is already not 100% error free?

You don't want to run a correcting check with a known bad disk; it can corrupt parity.

18 minutes ago, CyberMew said: What are my moves here? Possible to recommend me what to do next? Is it also possible to find out what files were affected by this?

Replace disk2. The only way to find out which files are affected on disk1 would be if you had created checksums before this, or were using btrfs.
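The checksum approach JorgeB mentions can be sketched in a few lines: hash every file ahead of time, then re-hash after an incident and compare. This is a generic illustration (paths, function names, and manifest format are made up here), not any specific Unraid plugin; tools like the File Integrity plugin do the same job with more polish.

```python
# Minimal checksum-manifest sketch: build a {relative path: sha256} map
# before trouble strikes, rebuild it afterwards, and diff the two to see
# which files changed. Illustrative only; names here are not Unraid APIs.
import hashlib
import os

def sha256_of(path, chunk=1 << 20):
    """Stream a file through SHA-256 in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

def build_manifest(root):
    """Map relative file path -> sha256 for every file under root."""
    manifest = {}
    for dirpath, _, names in os.walk(root):
        for name in names:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = sha256_of(full)
    return manifest

def changed_files(before, after):
    """Paths whose hash differs (or that vanished) between two manifests."""
    return sorted(p for p in before if after.get(p) != before[p])
```

Without a "before" manifest (or a checksumming filesystem like btrfs), there is nothing to compare against, which is why the affected files can't be identified after the fact.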
CyberMew Posted September 15, 2019 Author

Unfortunately I am using XFS and not BTRFS... well, I guess my disk1 data is now set in stone. I assume 54 errors means 54 sectors not filled in/corrupted, and assuming 4k alignment, that's 216 KB of non-continuous data lost?

As I am still running the parity check (non-correcting), I should not be seeing any errors since disk1 is already set in stone, so I can cancel it now?

I will: 1. Replace the SATA cable/controller for disk7. 2. Order 2x 10 TB drives: one to add as second parity, another to replace disk2. Are there any instructions on how best to do this together?

Actually, since my brand new disk8 is still unformatted and unused, can I convert it to the second parity drive for now? Or would you recommend replacing disk2 first? In the meantime, do you think my array is safe to use normally (if no more critical errors appear)? Thank you very much for your help.
JorgeB Posted September 15, 2019

They were on continuous sectors, and each sector in this case is 512 bytes, so just a little data; if you have mostly media files it will most likely translate to a little glitch during playback of a single file.

I would first replace disk2, then add dual parity later; you can't do both at the same time.
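The back-of-the-envelope arithmetic behind "just a little data": each of the 54 logged errors covers one 512-byte sector, not a 4 KiB block, so the amount at risk is well under the 216 KB estimated above.

```python
# 54 errors x 512-byte sectors: the total data at risk.
SECTOR_BYTES = 512
errors = 54
at_risk = errors * SECTOR_BYTES
print(at_risk, "bytes =", at_risk / 1024, "KiB")  # 27648 bytes = 27.0 KiB
```

At 27 KiB, a handful of corrupted media-file sectors typically shows up as a momentary playback glitch rather than an unreadable file.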
CyberMew Posted September 15, 2019 Author

Got it. I hope it's really part of some media files; 27 KB or so will be insignificant if so.

Is there a set of instructions I can refer to for replacing disk2? It doesn't seem straightforward. I need to remove disk8 completely (update the parity drive?), then remove disk2 and put the 10 TB disk (from disk8) in its place (the disk2 slot). Is this correct? Do I need to create a new config? 😱
JorgeB Posted September 16, 2019

Removing disk8 while there's a known bad disk in the array is not a good option; you should use a new disk to replace disk2.
CyberMew Posted September 16, 2019 Author

Ok. I'll proceed to format my disk8 then and make it usable. In the meantime I have ordered a new 10 TB drive to replace disk2. Hopefully the next rebuild on disk2 will be error free!

By the way, my parity check completed without issues, so I guess that's good(?). Will update back here if things go wrong. Thanks a lot for your help!
JorgeB Posted September 16, 2019

8 minutes ago, CyberMew said: by the way my parity check completed without issues.

Without either read errors on disk2 or sync errors? One or the other would be expected.
CyberMew Posted September 16, 2019 Author

Yes, without any read errors on disk2 or sync errors. I'll generate a diagnostics for reference in a moment. Reaching home soon.
JorgeB Posted September 16, 2019

I guess that could happen if there was no data on the rebuilt disk's sectors where there were read errors before; that's the only way it makes sense to me.
CyberMew Posted September 16, 2019 Author

🤷‍♂️ Or maybe it could be a bug somewhere... Anyway, I have attached fresh diagnostics since then; could be useful. tower-diagnostics-20190917-0046.zip
JorgeB Posted September 16, 2019

20 minutes ago, CyberMew said: or maybe it could be a bug somewhere..

Don't think so. The only explanation that makes sense is that there wasn't any data on the sectors Unraid couldn't read. I would expect it to write zeros to those on the rebuilt disk, and if there were no sync errors now, it means those sectors were already all zeros before, i.e., no data.
CyberMew Posted September 18, 2019 Author

In other words, it's possible I got lucky? 😀
CyberMew Posted September 29, 2019 Author

I got a new drive recently and was trying to replace disk2, but near the middle it seems my USB drive disconnected/died. Is the restoration process still going, or should I just shut it down and check/replace the USB drive? syslog
CyberMew Posted September 29, 2019 Author

First time the USB drive has disconnected. Ended up rebooting the server, and all seems OK again. Disk2 is rebuilding from 0%. Attached rebooted diagnostics. tower-diagnostics-20190930-0141.zip
JorgeB Posted October 5, 2019

You should avoid USB drives on the array.
CyberMew Posted October 5, 2019 Author

The USB drive is my flash/boot drive; I hope it's not dying. Any way to check?

Ever since I replaced the data cables and replaced disk2 with a bigger drive, the array has been working fine. I will order a second drive as parity once drives are on sale. But first I need to clear my disk2 before I dump it for recycling. Is it possible to clear the disk using Unraid via a USB 3 adapter or a direct cable, without adding it to the array?
JorgeB Posted October 5, 2019

17 minutes ago, CyberMew said: Is it possible to clear the disk using unraid via a usb3 cable or just direct cable, without adding it onto the array?

Yes, use the preclear script, plugin, or docker.
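For anyone unfamiliar with what "clearing" means here: the drive is overwritten end to end with zeros and then read back to verify. The sketch below demonstrates the idea against a small file-backed image, since running it against a real block device would destroy its contents; the actual preclear script/plugin additionally runs pre- and post-clear SMART checks, which this toy version does not.

```python
# Illustration of disk clearing: write zeros over the whole device, then
# read it back to confirm. Run here on a file-backed image for safety;
# the function names are made up for this sketch, not preclear's API.
import os

def zero_device(path, block=1 << 20):
    """Overwrite the file/device at path with zeros, 1 MiB at a time."""
    size = os.path.getsize(path)
    with open(path, "r+b") as dev:
        remaining = size
        while remaining:
            n = min(block, remaining)
            dev.write(b"\x00" * n)
            remaining -= n

def is_all_zero(path, block=1 << 20):
    """Verification pass: True if every byte reads back as zero."""
    with open(path, "rb") as dev:
        for chunk in iter(lambda: dev.read(block), b""):
            if chunk.strip(b"\x00"):
                return False
    return True
```

Zeroing also serves a second purpose in Unraid: a disk known to be all zeros can be added to a parity-protected array without a parity recalculation, since XORing zeros changes nothing.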
mgutt Posted October 22, 2020

On 9/15/2019 at 12:54 PM, JorgeB said: They were on continuous sectors,

In which of the diagnostic files can I find this information?
JorgeB Posted October 22, 2020

42 minutes ago, mgutt said: In which of the diagnostic files can I find this information?

In the syslog, though I should have said continuous blocks. Parity is checked on a standard 4k Linux block; each block has 8 sectors (for standard 512E drives), so when the errors are logged for every 8th sector, they are on continuous blocks.
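The sector-to-block mapping JorgeB describes can be checked with simple division: on a 512E drive a 4 KiB block spans 8 sectors, so error entries spaced exactly 8 sectors apart collapse onto consecutive block numbers. The sector numbers below are hypothetical, not taken from the actual syslog.

```python
# Why errors logged every 8th sector mean continuous 4 KiB blocks:
# 4096-byte block / 512-byte sector = 8 sectors per block, and the log
# records the first sector of each bad block.
SECTOR_BYTES = 512
BLOCK_BYTES = 4096
SECTORS_PER_BLOCK = BLOCK_BYTES // SECTOR_BYTES  # 8

logged_sectors = [1000, 1008, 1016, 1024]  # hypothetical syslog values
blocks = [s // SECTORS_PER_BLOCK for s in logged_sectors]
print(blocks)  # [125, 126, 127, 128] -> consecutive, so contiguous data
```

This also explains the earlier size estimate: whether you count 54 sectors (27 KiB) or the 4 KiB blocks containing them, the affected region is tiny and contiguous.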