Disk with errors (but green) during parity rebuild



I was just thinking that I could also do it the other way round: I can start the new config first, then mount the bad-sector disk as UD and move the data back. Is it better to do the “move” via the console, or through a simple copy & paste within a Win10 VM (from mounted disk to mounted disk), so I can “see” more easily what’s going on?



Link to comment
22 minutes ago, steve1977 said:

was just thinking that i could also do it the other way round. i can start to reconfig and then put the bad-sector-disk as UD and move it back

 

Yes you can.

 

22 minutes ago, steve1977 said:

better to do the “move” via console or through simple copy & paste within a win10 VM (from mounted disk to mounted disk), so i can “see” easier what’s going on?

 

Console would be faster. You can use rsync or Midnight Commander to get more feedback than mv.

Link to comment

Note that when you decided to rebuild, you destroyed the parity redundancy that could have been used to recompute the contents of the one (or potentially more) invalid sectors of the problematic disk.

 

The goal with one or more parity disks is that unRAID can use the parity to compute the content of a lost disk or of an unreadable sector on a disk. That's why it's basically only after a crash (or after replacing a broken parity disk) that the parity should be rebuilt. After a crash, there may be a number of discrepancies between the data disks and the parity, since pending writes may not have had time to flush to all disks. That's also why professional RAID controller cards have a battery backup, so they can replay the last writes when the machine is powered up again.
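To make the idea concrete, here is a minimal Python sketch of single parity as a plain XOR across disks. The three "disks" and their 4-byte "sectors" are made-up values for illustration only, and this covers the single-parity case, not a second parity disk.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data disks, each reduced to one 4-byte "sector".
disk1 = bytes([0x10, 0x20, 0x30, 0x40])
disk2 = bytes([0x01, 0x02, 0x03, 0x04])
disk3 = bytes([0xAA, 0xBB, 0xCC, 0xDD])

# Single parity is the XOR of all data disks at the same position.
parity = xor_blocks([disk1, disk2, disk3])

# Pretend disk2 is lost: XOR the parity with the surviving disks.
rebuilt_disk2 = xor_blocks([parity, disk1, disk3])
print(rebuilt_disk2 == disk2)
```

The same XOR works for one unreadable sector instead of a whole disk, as long as the parity still reflects the data from before the damage.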

 

When doing rebuilds of the parity, unRAID cannot know if a difference in the parity is caused by incorrect data on the parity disk or on one or more of the data disks - so unRAID assumes that the content of the data disks is correct and then adjusts the parity accordingly.

 

That's why it's good to use file systems that make use of checksumming of the stored data, while letting unRAID just scan the parity. With checksummed file systems, it's possible to know if a parity mismatch is caused by incorrect data on a specific disk, since the file system on that disk will complain about an incorrect checksum. So the parity will not accept any permanent silent errors on a disk (i.e. an error where you might have had a transfer error so the disk read says "ok" but the content is wrong).
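A rough sketch of that reasoning in Python: a parity mismatch alone only says "something is wrong", while a per-disk checksum says which disk. The block contents are invented, and `zlib.crc32` is only a stand-in for whatever checksum a real file system uses.

```python
import zlib
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def find_bad_disks(blocks, stored_checksums):
    """Return the indexes of blocks whose checksum no longer matches."""
    return [
        index
        for index, (block, checksum) in enumerate(zip(blocks, stored_checksums))
        if zlib.crc32(block) != checksum
    ]


# Hypothetical known-good state: three data blocks, checksums and parity.
good = [b"movie-a!", b"movie-b!", b"movie-c!"]
checksums = [zlib.crc32(block) for block in good]
parity = xor_blocks(good)

# A silent error on disk index 1: the read "succeeds" but the content is wrong.
current = [good[0], b"movie-X!", good[2]]

# The parity check only reports that something is off, not where.
parity_matches = xor_blocks(current) == parity

# The checksums point at the disk, so the parity can be used to repair it
# instead of being "corrected" to match the bad data.
bad = find_bad_disks(current, checksums)
if len(bad) == 1:
    others = [block for index, block in enumerate(current) if index != bad[0]]
    repaired = xor_blocks(others + [parity])
    print(bad, repaired == good[bad[0]])
```

If the parity had been rewritten to match `current` first, the XOR at the end would just return the corrupted block again.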

 

In this case, it wasn't a silent error since the drive itself complained about an integrity error that couldn't be fixed by ECC. In that situation, the contents of the parity drive are extremely important, as the purpose of the parity data is to be able to recreate the data on the damaged sector. Rebuilding the parity instantly kills this capability, since rebuilding parity tries to perform corrective actions on the already correct parity instead of doing corrective actions on the drive that has issues.

 

I sometimes wish unRAID wasn't so very happy to present a "Check" button with the checkbox "Write corrections to parity" checked by default. Writing "corrections" to parity is among the last things to consider.

Link to comment

The reason why I had to rebuild the parity in the very first case was that I am upgrading my server to 10TB disks. So, I had to swap out the 6TB parity with a 10TB parity. I don't think there was another way. I was just unfortunate that the bad sectors occurred at the same time that I was rebuilding the parity. From what I understand in this thread, this caused my parity disk to be unusable.

 

I have now removed the disk with the bad sectors and am rebuilding my array. Once this is completed, I will try to copy over the content of the disk and let's see. From what I read here, rsync will be the best way to go and it will only copy the files that are in non-corrupt sectors. For others, it will give me an error message.

 

I assume it is better to wait until my parity is rebuilt to move the old files over. From previous experience, building parity and copying large amount of data makes things very slow. So, let me do one step at a time.

 

Does this all make sense now? Wish me luck!

Link to comment
I assume it is better to wait until my parity is rebuilt to move the old files over.

  Yes.

 

 

The reason why I had to rebuild the parity in the very first case was that I am upgrading my server to 10TB disks. So, I had to swap out the 6TB parity with a 10TB parity. I don't think there was another way. I was just unfortunate that the bad sectors occurred at the same time that I was rebuilding the parity. From what I understand in this thread, this caused my parity disk to be unusable.

  What you can do in the future, if still using single parity, is to keep the old parity disk untouched and leave the array data unchanged during the sync (this includes stopping dockers/VMs if they use the array). If something goes wrong during the sync, you can do a new config and use the old parity to rebuild a disk.

 

Good luck!

 

 

Link to comment

Before swapping disks, it's good to do a parity check (without "Write corrections to parity" checked) just to verify the state of all drives and any parity mismatch.

 

Using dd and reading out the specific sector (as specified in the SMART log) from all data drives and from the old parity drive allows a manual computation of the expected content of the damaged sector.
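As a rough illustration of that manual computation, here is a Python sketch that reads the same sector from every healthy device and XORs the results. It is demonstrated on small temporary image files rather than real disks. The 512-byte sector size, the file names and the assumption that the sector sits at the same byte offset on every device are all mine, and would need checking against the actual partition layout before trying this on a real array.

```python
import os
import tempfile
from functools import reduce

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def read_sector(path, lba, sector_size=SECTOR_SIZE):
    """Read one sector at the given sector number from a device or image."""
    with open(path, "rb") as device:
        device.seek(lba * sector_size)
        data = device.read(sector_size)
    if len(data) != sector_size:
        raise IOError(f"short read from {path} at sector {lba}")
    return data


def expected_sector(healthy_paths, lba):
    """XOR the sector from all healthy data disks and the old parity disk."""
    sectors = [read_sector(path, lba) for path in healthy_paths]
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*sectors))


def demo():
    """Build three tiny disk images plus parity, then recompute one sector."""
    with tempfile.TemporaryDirectory() as tmp:
        images = {name: os.urandom(4 * SECTOR_SIZE)
                  for name in ("disk1", "disk2", "disk3")}
        parity = bytes(a ^ b ^ c for a, b, c in zip(*images.values()))
        paths = {}
        for name, content in {**images, "parity": parity}.items():
            paths[name] = os.path.join(tmp, name + ".img")
            with open(paths[name], "wb") as handle:
                handle.write(content)
        lba = 2  # pretend this sector is unreadable on disk2
        rebuilt = expected_sector(
            [paths["disk1"], paths["disk3"], paths["parity"]], lba)
        original = images["disk2"][lba * SECTOR_SIZE:(lba + 1) * SECTOR_SIZE]
        return rebuilt == original


print(demo())
```

This only gives the right answer if the old parity was valid for that sector and nothing has been written to the array since.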

Link to comment
1 hour ago, johnnie.black said:

  Yes.

 

 

 

  What you can do in the future, if still using single parity, is to keep the old parity disk untouched and leave the array data unchanged during the sync (this includes stopping dockers/VMs if they use the array). If something goes wrong during the sync, you can do a new config and use the old parity to rebuild a disk.

 

Good luck!

 

 

 

Got it, makes sense. Building parity now and will try to get back as many files as possible tomorrow.

 

21 minutes ago, pwm said:

Before swapping disks, it's good to do a parity check (without "Write corrections to parity" checked) just to verify the state of all drives and any parity mismatch.

 

Using dd and reading out the specific sector (as specified in the SMART log) from all data drives and from the old parity drive allows a manual computation of the expected content of the damaged sector.

 

Are you referring to what I have done now (i.e., new config)? Or the general practice of replacing a small/broken with a larger/new disk?

 

Data on this disk is just a bunch of movies. If something got lost, it's good to know what it is, but it's not a big deal. The key is to replace the disk before everything breaks. Also, I may even still be under warranty.

Link to comment

I was referring to replacement of disks, change of file systems etc. I.e. any situation where large amounts of files will be moved or where the system may be temporarily left in a state with missing redundancy.

 

It's always good to know which file is broken - no fun to start viewing a file and find that some important data in the file is broken. But you will see which file is broken when you perform a copy too.

Link to comment

Just finished rsync, but unfortunately only 400GB out of 6TB were copied to the new array. The error messages in PuTTY are incomplete, as I can only see the most recent ones. Any idea whether some part of the remaining 5.6TB can still be recovered? As you had mentioned before, the SMART scan completed 60% (and there were only a very few errors), so my hope is that not 90%+ will be gone.
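One way to keep a complete record of what failed, rather than relying on the terminal scrollback, is to copy file by file and write every error to a log file. The Python sketch below is only an illustration of that idea, not what rsync does internally. The function name `salvage_tree` and all paths are hypothetical.

```python
import os
import shutil


def salvage_tree(src_root, dst_root, log_path):
    """Copy every readable file and log each failure instead of stopping.

    Returns a (copied, failed) tuple. A partially written destination
    file may remain when a copy fails halfway through.
    """
    copied = 0
    failed = 0
    with open(log_path, "w", encoding="utf-8") as log:

        def log_dir_error(exc):
            # os.walk calls this when a directory itself cannot be listed.
            log.write(f"{exc.filename}\t{exc}\n")
            log.flush()

        for dirpath, _dirnames, filenames in os.walk(src_root,
                                                     onerror=log_dir_error):
            relative = os.path.relpath(dirpath, src_root)
            target_dir = os.path.normpath(os.path.join(dst_root, relative))
            os.makedirs(target_dir, exist_ok=True)
            for name in filenames:
                src = os.path.join(dirpath, name)
                dst = os.path.join(target_dir, name)
                try:
                    shutil.copy2(src, dst)
                    copied += 1
                except OSError as exc:
                    failed += 1
                    log.write(f"{src}\t{exc}\n")
                    log.flush()  # keep the log usable if the box hangs
    return copied, failed
```

Called as, say, `salvage_tree("/mnt/disks/old6tb", "/mnt/user/movies", "/boot/salvage.log")` (all three paths are placeholders), it leaves a tab-separated list of every file that could not be read, which can be reviewed afterwards.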

Link to comment

MC gave me the choices "skip - skip all - retry". First I hit "skip", but it felt like there was no end. When I hit "skip all", it just stopped everything and took me back to the starting screen. It feels like MC is too difficult for me to use ;-) Will give it a shot with Win10 now. That has a simple "cut and paste", so I know what I am doing.

Link to comment

Here are some MC tutorials:

https://opensource.com/business/15/5/midnight-commander

https://www.linode.com/docs/tools-reference/tools/how-to-install-midnight-commander

Just ignore the installation instructions on these links, since it is already installed for you on unRAID.

 

http://www.trembath.co.za/mctutorial.html

This one is just a tutorial but is all text no graphics - that I saw anyway.

 

http://linuxcommand.org/lc3_adv_mc.php

tutorial with graphics on this one.

 

Another text only but describes advanced subjects:

http://klimer.eu/2015/05/01/use-midnight-commander-like-a-pro/

 

A YouTube video along with links to more

https://www.youtube.com/watch?v=LEb7p6Ihu40

 

And lastly a link that has the above and other info that may be useful

http://www.softpanorama.org/OFM/MC/mc_tips.shtml

Link to comment

Thanks. This is turning into a total nightmare. I think I now understand the complication.

 

This disk has many (!) very small files. It seems Unraid doesn't really like these being moved into or out of the array. I have had these issues before.

 

The situation now is that I have copied the initial 400GB from the 6TB disk to the array. I was then trying to delete the files again from the array disk. When deleting them, Unraid crashes (becomes non-responsive, the webUI no longer shows up, the VM no longer shows the disk size and cannot be accessed anymore). The only option is to disconnect Unraid from power and restart.

 

I remember reading somewhere that Unraid has issues with transferring large numbers of very small files. Any thoughts on whether there has been a fix or workaround?

 

I tried MC, rm, and also just deleting from the VM. All with the same outcome.


Please note that this is now on a brand-new disk that is part of the array.

 

I still need to fix my original problem of recovering the "old" disk, but it feels like I need to fix the array issue first.

Link to comment

Different file systems behave differently - some are better for a few huge files and some excel at adding/removing a huge number of small files. There are multiple locations on the disk to update for each removed file, since the directory, inode and free lists need to be updated.

Link to comment

thanks. so, the answer is that XFS is just not designed for small files and that’s not unexpected behavior?

i now firstly want to delete the files from the “new” disk. not even move them, but just delete. “rm -r” freezes things. i’m sure small batches will work, but this will take forever until i have deleted everything.

any other idea how to wipe this folder from the new disk? i could remove it from the array? without parity, it may be easier to delete things?
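If small batches do work, they don't have to be done by hand. The Python sketch below deletes a folder bottom-up and pauses after every batch of files so pending disk updates can drain. Whether pausing actually avoids the freeze is an untested assumption, since the cause hasn't been established in this thread. The function name, batch size and pause length are all hypothetical.

```python
import os
import time


def delete_tree_in_batches(root, batch_size=1000, pause_seconds=1.0):
    """Delete a directory tree, sleeping after each batch of removed files.

    Returns the number of files removed.
    """
    removed = 0
    # topdown=False visits the deepest directories first, so each directory
    # is already empty of subdirectories by the time it is removed.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            os.remove(os.path.join(dirpath, name))
            removed += 1
            if removed % batch_size == 0:
                time.sleep(pause_seconds)
        for name in dirnames:
            path = os.path.join(dirpath, name)
            # Symlinks to directories are not walked; unlink them here.
            if os.path.islink(path):
                os.remove(path)
        os.rmdir(dirpath)
    return removed
```

A call such as `delete_tree_in_batches("/mnt/disk5/copied", batch_size=500, pause_seconds=2.0)` (placeholder path) would run unattended. Like `rm -r`, it is irreversible, so the path is worth double-checking first.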



Link to comment
