Data rebuild with errors?


CyberMew


Hi, some backstory. I added a new disk8 and was clearing it when the connection on my disk1 died, so I didn't format disk8 and wanted to shut down the server to fix the disk1 connection.

 

After I managed to fix it (apparently the cable died for some reason), it prompted for a data rebuild (even though I don't think it was required).

 

Now, while it went through the parity check/data rebuild process, it seems like several health issues popped up on various drives. Disk 2 had a current pending sector count of 1 but a reallocated event count of 0 (it didn't grow further though). Disk 7 also had its UDMA CRC error count grow from 4 to 10 (I am going to try replacing its SATA cable with a new one).

 

 

The data rebuild completed, but with 54 errors! Does it mean some parts of the data were not restored? Or does it mean the parity drive was updated?

I'm a bit worried here.. usually it's 0 errors.

 

Attached logs.

tower-diagnostics-20190915-1235.zip

tower-syslog-20190915-1251.zip

Link to comment

I see... yeah, I didn’t have dual parity because I remember reading some limitations or some issues with it early on. Is it highly suggested and problem free to add a second parity drive nowadays? Will it prevent this kind of rebuild error in the future?

 

I’m now doing a parity check (without writing corrections) to see what happens. Currently 14% in with no sync errors detected. Is it possible to check Disk1 again to make sure the parity drive writes to it correctly? I'm afraid because in the same session I added in a new drive, and it sort of screwed up.

 

To confirm, a correcting parity check does nothing now if the data on disk1 is already not 100% error free?

 

What are my moves here? Possible to recommend me what to do next? Is it also possible to find out what files were affected by this?

Link to comment
13 minutes ago, CyberMew said:

I remember reading some limitations or some issues with it early on.

Don't remember that.

 

13 minutes ago, CyberMew said:

Is it highly suggested and problem free to add a second parity drive nowadays? Will it prevent this kind of rebuild error in the future?

Problem free, yes, and it always was AFAIK. Whether it's suggested depends mostly on array size, but I'd say it's a very small price to pay for the added redundancy even for smaller arrays, and yes, it would save you from a situation like this in the future.

 

16 minutes ago, CyberMew said:

Is it possible to check Disk1 again to make sure the parity drive writes to it correctly?

You can try rebuilding again, but disk2 appears to be failing, so it will likely have the same or even more errors.

 

17 minutes ago, CyberMew said:

To confirm, a correcting parity check does nothing now if the data on disk1 is already not 100% error free?

You don't want to run a correcting check with a known bad disk; it can corrupt parity.

 

18 minutes ago, CyberMew said:

What are my moves here? Possible to recommend me what to do next? Is it also possible to find out what files were affected by this?

Replace disk2. The only way to find out which files are affected on disk1 would require that you had created checksums before this, or were using btrfs.
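For future reference, here is a minimal sketch of what "creating checksums beforehand" could look like, using only the Python standard library. The function names (`sha256_of`, `build_manifest`, `find_changed`) are made up for illustration and are not part of Unraid or any plugin. It only helps if the manifest is built while the disk is still known to be good; it cannot identify damage retroactively.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        str(path.relative_to(root)): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def find_changed(root, manifest):
    """List files from the manifest that are now missing or hash differently."""
    root = Path(root)
    changed = []
    for relative_path, expected in manifest.items():
        path = root / relative_path
        if not path.is_file() or sha256_of(path) != expected:
            changed.append(relative_path)
    return changed
```

Assuming the disk is mounted at a path such as `/mnt/disk1`, you would call `build_manifest("/mnt/disk1")` while the data is healthy, store the result somewhere off that disk (for example as JSON), and run `find_changed("/mnt/disk1", manifest)` after a rebuild. Files you edited on purpose in the meantime will also show up as changed.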

 

Link to comment

Unfortunately I am using XFS and not BTRFS.. well I guess my disk1 data is now set in stone... :( I assume 54 errors means 54 sectors not filled in/corrupted, and assuming 4k aligned, so that's 216 KB of (non-)continuous data lost?

 

As I am still running the parity check (non-correcting), I should not be seeing any errors since disk1 is already set in stone, so I can cancel it now?

 

I will:

1. Replace the SATA cable/controller for disk7.

2. Order 2x 10TB drives: one to add as second parity, another to replace disk2. Are there any instructions on how best to do this together?

 

Actually, since my brand new disk8 is still unformatted and unused, can I convert it to the second parity drive for now? Or would you recommend that I replace disk2 first?

 

In the meantime, do you think my array is safe to use normally (if no more critical errors appear)?

 

Thank you very much for your help.

Link to comment

Got it. I hope it's really part of some media files, 29 KB will be insignificant if so.

 

Is there a set of instructions that I can refer to for replacing disk2? It doesn't seem straightforward. I need to remove Disk8 completely (update parity drive?), then remove disk2, and put the 10tb disk (from disk8) in its place (disk2 slot). Is this correct, and do I need to create a new config? 😱

Link to comment

Ok. I’ll proceed to format my disk8 then and make it usable. In the meantime I have ordered a new 10tb to replace disk2. Hopefully the next rebuild on disk2 will be error free!

 

By the way, my parity check completed without issues. So I guess that’s good(?). Will update back here if things go wrong. Thanks a lot for your help!

Link to comment
20 minutes ago, CyberMew said:

or maybe it could be a bug somewhere..

Don't think so. The only explanation that makes sense is that there wasn't any data on the sectors Unraid couldn't read. I would expect it to write zeros to those on the rebuilt disk, and if there are no sync errors now, it means those sectors were already all zeros before, i.e., no data.

Link to comment
  • 2 weeks later...

The USB drive is my flash/boot drive, I hope it’s not dying. Any way to check?

 

Ever since I replaced the data cables and replaced disk2 with a bigger drive, the array has been working fine. I will order a second drive as parity once the drives are on sale. But before that I need to clear my old disk2 before I send it off for recycling. Is it possible to clear the disk using Unraid via a USB3 cable or just a direct cable, without adding it to the array?

Link to comment
  • 1 year later...
42 minutes ago, mgutt said:

In which of the diagnostic files can I find this information?

In the syslog, though I should have said contiguous blocks. Parity is checked on a standard 4k Linux block, and each block has 8 sectors (for standard 512e drives), so when the errors are logged for every 8th sector, they are on contiguous blocks.
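The arithmetic can be sketched in a few lines of Python. This is only an illustration of the 512-byte-sector to 4k-block mapping described above; the function names are invented here, and any sector numbers you feed it would have to come from your own syslog.

```python
SECTOR_BYTES = 512    # logical sector size on 512e drives
BLOCK_BYTES = 4096    # standard 4k Linux block
SECTORS_PER_BLOCK = BLOCK_BYTES // SECTOR_BYTES  # 8 sectors per block


def sector_to_block(sector):
    """Return the index of the 4k block that contains the given 512-byte sector."""
    return sector // SECTORS_PER_BLOCK


def contiguous_block_runs(sectors):
    """Group logged sector numbers into runs of adjacent 4k blocks.

    Returns a list of (first_block, last_block) tuples, both inclusive.
    """
    blocks = sorted({sector_to_block(sector) for sector in sectors})
    runs = []
    for block in blocks:
        if runs and block == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], block)  # extend the current run
        else:
            runs.append((block, block))      # start a new run
    return runs


def bytes_covered(sectors):
    """Total bytes in the distinct 4k blocks touched by the logged sectors."""
    return len({sector_to_block(sector) for sector in sectors}) * BLOCK_BYTES
```

With made-up sector numbers 1000, 1008 and 1016 (each 8 apart), `contiguous_block_runs` returns a single run covering blocks 125 to 127, which is what "every 8th sector" means in practice. By the same arithmetic, 54 errors at one 4k block each would cover 54 × 4096 = 221,184 bytes, i.e. 216 KiB.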

Link to comment
