Chasing down disabled drive / CRC errors


Go to solution Solved by Arcaeus,


7 minutes ago, JorgeB said:

Yes.

 

 

7 minutes ago, trurl said:

yes and make sure you leave disk5 unassigned

 

Array is started with disk 5 still being emulated. The lost and found folders are as follows:

 

disk 4 - 5,824 objects: 66 directories, 5758 files (183 GB total)

disk 5 -  811 objects: 602 directories, 209 files (2.49 GB total)

 

That seems like a lot to me lol.

 

What's next? Does this indicate that the disks are bad or that it could be caused by the CRC errors? Could I wipe/preclear the disks (one at a time) and then have them rebuilt, or what would you guys recommend?

 

Link to comment

Not much you can do about disk4. The corruption could have been caused by parity not being 100% in sync or by errors on other disk(s) during the rebuild. CRC errors shouldn't be an issue, except for a small performance impact.

 

As for disk5, I would recommend either rebuilding to a new spare disk and then comparing the data with the old one, or doing a new config and re-syncing parity instead.

Link to comment
1 minute ago, trurl said:

Post new diagnostics.

 

Do you have any spare disks to rebuild to?

 

I'm wondering if it might give a better result to use the original disk5 to rebuild disk4 to another disk. That is a little more complicated.

 

New diags attached. 

 

I don't have any other 4TBs. I have the 2 16TBs in there that have been precleared, but that would be more than the 4TB parity so not sure how that would work. I still have the old disk 7 sitting on my desk that I haven't done anything with since I pulled it out for the rebuild. Does that help at all? As frowned upon as it may be, I did parity checks every week to make sure it was correct but not sure if that helps us right now.

 

1 minute ago, JorgeB said:

Not much you can do about disk4. The corruption could have been caused by parity not being 100% in sync or by errors on other disk(s) during the rebuild. CRC errors shouldn't be an issue, except for a small performance impact.

 

As for disk5, I would recommend either rebuilding to a new spare disk and then comparing the data with the old one, or doing a new config and re-syncing parity instead.

 

"comparing the data to the old one" meaning mount it and go folder by folder? And a new config being like a whole new Unraid setup or what?

 

1 minute ago, trurl said:

Do you have backups of anything important and irreplaceable?

I have some of the files backed up but probably not all of them. My previous backup got corrupted and I was in the process of creating a new one.

mediavault-diagnostics-20220504-1319.zip

Link to comment
Just now, Arcaeus said:

"comparing the data to the old one" meaning mount it and go folder by folder?

Yes, mount the old disk with UD (you'd need to change the XFS UUID, but that can be done in UD), then use, for example, rsync to compare/sync the contents.
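
As a rough sketch of the rsync comparison, the helper below does a dry run, so nothing is copied or changed. The function name and the mount points in the usage note are assumptions, not paths from this thread; substitute whatever UD and the array show on your system. Note that -c checksums every file on both sides, which can take many hours on a 4TB disk; dropping it falls back to comparing size and modification time.

```shell
# Hypothetical helper: list what differs between the old disk (mounted via
# Unassigned Devices) and the emulated or rebuilt disk, without copying.
compare_disks() {
    local old="$1" new="$2"
    # -r  recurse into directories
    # -n  dry run: report only, write nothing
    # -c  compare file contents by checksum instead of size/mtime
    # -i  itemize: print one line per item that differs
    # The trailing slashes make rsync compare the directory contents.
    rsync -rnci "${old%/}/" "${new%/}/"
}
```

For example: compare_disks /mnt/disks/old_disk5 /mnt/disk5 (both paths assumed). Each line of output is a file or directory that exists on the old disk but is missing or different on the other side; items that exist only on the second disk are not listed.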

 

1 minute ago, Arcaeus said:

And a new config being like a whole new Unraid setup or what?

No, you just go to Tools -> New config -> Preserve all assignments, then re-assign disk5 and start the array to re-sync parity based on the actual disk5 contents. Those should not have the filesystem corruption found on the emulated disk, but you can also check that with UD before doing it.

Link to comment
17 minutes ago, JorgeB said:

Yes, mount the old disk with UD (you'd need to change the XFS UUID, but that can be done in UD), then use, for example, rsync to compare/sync the contents.

 

No, you just go to Tools -> New config -> Preserve all assignments, then re-assign disk5 and start the array to re-sync parity based on the actual disk5 contents. Those should not have the filesystem corruption found on the emulated disk, but you can also check that with UD before doing it.

 

It seems like the second option may be the better choice, as I'm not fully convinced that the actual disk 5 is corruption free and I don't have another new drive to copy to. Is there anything in particular that I need to consider before doing this? And does this mess anything up if disk 4 also has corruption on it?

Edited by Arcaeus
Link to comment
10 minutes ago, Arcaeus said:

Is there anything in particular that I need to consider before doing this?

Although I expect the old disk5 contents to be OK, it's best to check before doing the new config, so mount the disk with UD and check. If you want to do it with the array started, you need to change the XFS UUID first; the option for that is in the UD plugin settings.

 

12 minutes ago, Arcaeus said:

And does this mess anything up if disk 4 also has corruption on it?

It won't mess anything up but it won't fix it, i.e., current lost+found folder will remain as is on that disk.

Link to comment
15 minutes ago, JorgeB said:

Although I expect the old disk5 contents to be OK, it's best to check before doing the new config, so mount the disk with UD and check. If you want to do it with the array started, you need to change the XFS UUID first; the option for that is in the UD plugin settings.

 

It won't mess anything up but it won't fix it, i.e., current lost+found folder will remain as is on that disk.

 

Sorry, still a bit confused here. I changed the UUID in UD, then mounted the old disk 5. Do I run the system filecheck again on that disk? Then do I need to do the Preserve current assignments part? And is that for both array and pool slots?

 

18 minutes ago, JorgeB said:

Although I expect the old disk5 contents to be OK, it's best to check before doing the new config, so mount the disk with UD and check. If you want to do it with the array started, you need to change the XFS UUID first; the option for that is in the UD plugin settings.

 

It won't mess anything up but it won't fix it, i.e., current lost+found folder will remain as is on that disk.

I guess that's fine? does that data need to stay on there, or can/should it be deleted? Especially for disk 4 having 183GB, it seems like either it's useful and gets put back into the proper spot, or is not useful, the data is rebuilt on the disk, and then the lost + found data gets deleted to not take up unnecessary space.

Link to comment
Just now, Arcaeus said:

Do I run the system filecheck again on that disk?

Shouldn't be needed but you can do it.

 

1 minute ago, Arcaeus said:

Then do I need to do the Preserve current assignments part? And is that for both array and pool slots?

All.

 

1 minute ago, Arcaeus said:

I guess that's fine? does that data need to stay on there, or can/should it be deleted? Especially for disk 4 having 183GB, it seems like either it's useful and gets put back into the proper spot, or is not useful, the data is rebuilt on the disk, and then the lost + found data gets deleted to not take up unnecessary space.

If you can't identify the files or don't think you need them you can delete the folder.
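
Before deleting, it can help to see how much is actually in there. The following is a minimal sketch, assuming the folder is at a path like /mnt/disk4/lost+found (check the actual location on your system); the function name is made up for illustration. It only reads and counts, it does not delete anything.

```shell
# Hypothetical helper: summarize a lost+found folder (read-only).
summarize_lost_found() {
    local dir="$1"
    local dirs files
    # Count subdirectories (excluding the folder itself) and regular files.
    dirs=$(find "$dir" -mindepth 1 -type d | wc -l)
    files=$(find "$dir" -type f | wc -l)
    # $(( )) strips the padding some wc implementations add.
    echo "directories: $((dirs)), files: $((files))"
    # Total size in human-readable form.
    du -sh "$dir"
}
```

For example: summarize_lost_found "/mnt/disk4/lost+found". Files recovered by a filesystem repair are often named by inode number rather than by their original name, so the counts and total size are usually the quickest first indication of whether the folder is worth sorting through.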

Link to comment
1 minute ago, trurl said:

Were these correcting checks? If you have other problems, correcting parity could instead corrupt it.

 

Not sure of the difference. It should just be the regular parity check that Unraid does that you can set the schedule to run for however long.

Link to comment
7 minutes ago, JorgeB said:

Shouldn't be needed but you can do it.

 

All.

 

If you can't identify the files or don't think you need them you can delete the folder.

 

Ok preserve current assignments is done. I didn't see an option to do the file check on the drive without it being assigned in the array but it sounds like it should be fine without it.

 

Now do I reassign the disk back into the array and let the rebuild begin? Then once it's complete look through the lost and found folder to see if there was anything that was missing? Or what is the right process here?

Link to comment
1 minute ago, Arcaeus said:

Not sure of the difference. It should just be the regular parity check that Unraid does that you can set the schedule to run for however long.

When you manually run a parity check, there is a checkbox that tells it to correct parity.

When you schedule parity checks, there is also a checkbox that tells it to correct parity when it runs the scheduled check.

 

Go to Settings - Scheduler and see whether you have been correcting (corrupting) parity.

 

You should only run a correcting check after you have determined there are parity sync errors that need to be corrected, and they are not caused by other problems, such as reading the disks or bad RAM.

Link to comment
2 minutes ago, trurl said:

When you manually run a parity check, there is a checkbox that tells it to correct parity.

When you schedule parity checks, there is also a checkbox that tells it to correct parity when it runs the scheduled check.

 

Go to Settings - Scheduler and see whether you have been correcting (corrupting) parity.

 

You should only run a correcting check after you have determined there are parity sync errors that need to be corrected, and they are not caused by other problems, such as reading the disks or bad RAM.

 

Yep, "Write corrections to parity disk" is set to yes. So I'm assuming that should be set to no then?

Link to comment
6 minutes ago, Arcaeus said:

Ok preserve current assignments is done. I didn't see an option to do the file check on the drive without it being assigned in the array but it sounds like it should be fine without it.

 

Now do I reassign the disk back into the array and let the rebuild begin? Then once it's complete look through the lost and found folder to see if there was anything that was missing? Or what is the right process here?

Have you mounted the original disk5 as an Unassigned Device and looked at its contents?

 

If you rebuild onto that same disk, the result will be exactly the same as the emulated contents, assuming everything works correctly. How could it be otherwise since that is all the array has to work with to rebuild the disk?

Link to comment

If instead you rebuild onto another disk and keep the original as it is, then you can compare them to see what you want to keep from either.

 

3 minutes ago, trurl said:

Have you mounted the original disk5 as an Unassigned Device and looked at its contents?

That could also be a good opportunity to copy any files that need to be backed up.

Link to comment
4 minutes ago, trurl said:

Have you mounted the original disk5 as an Unassigned Device and looked at its contents?

 

If you rebuild onto that same disk, the result will be exactly the same as the emulated contents, assuming everything works correctly. How could it be otherwise since that is all the array has to work with to rebuild the disk?

 

So I have it mounted and am looking through the contents. Shouldn't I be looking at the emulated one to see what it will be applying to the disk?

 

2 minutes ago, trurl said:

If instead you rebuild onto another disk and keep the original as it is, then you can compare them to see what you want to keep from either.

 

That could also be a good opportunity to copy any files that need to be backed up.

 

I agree, I just don't have another 4TB free. The only extra I have is the old disk 7.

I have the 2x 16TBs in the server ready to go, they're just not formatted yet. These 2 disks are just sitting in UD and have been precleared. The plan was to just copy the files from the array directly to these disks as a local backup. Should I format these and copy the files from the original disk 5 to them?

 

Just now, trurl said:

You can do the check from the command line.

 

Is that the "xfs_repair -v /dev/md5" command? How do I tell it that I want it on the UD drive? Change md5 to sdk or something?

 

Link to comment
3 minutes ago, trurl said:

Does this mean you have already begun New Config? New Config will not let you rebuild anything except parity.

Yes? I thought this was the plan as JorgeB said the disk was probably fine. 

 

1 minute ago, trurl said:

md5 is disk5 in the array. For an unassigned disk you would specify the partition on the disk, such as /dev/sdk1

So then I should run "xfs_repair -v /dev/sdk1" right now to fix any file system errors on the disk? or run it with the -nv flag first?
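
For what it's worth, a cautious way to approach this is to run the read-only check first and only consider a real repair after reading its report. The sketch below is a hypothetical wrapper, not something that ships with Unraid or UD. By default it only prints the command it would run. It refuses /dev/md* devices because those are array disks, and it refuses a partition that is currently mounted, since xfs_repair generally expects an unmounted filesystem. /dev/sdk1 is just the example device name from the posts above.

```shell
# Hypothetical wrapper: read-only XFS check of an unassigned partition.
xfs_check_unassigned() {
    local part="$1"
    case "$part" in
        /dev/md*)
            echo "refusing: $part is an array device, not an unassigned disk" >&2
            return 1
            ;;
    esac
    # Skip partitions that are currently mounted.
    if grep -q "^$part " /proc/mounts 2>/dev/null; then
        echo "refusing: $part is mounted; unmount it first" >&2
        return 1
    fi
    # -n = no-modify mode (report problems only), -v = verbose output.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "xfs_repair -nv $part"      # default: just show the command
    else
        xfs_repair -nv "$part"           # DRY_RUN=0: actually run the check
    fi
}
```

Calling xfs_check_unassigned /dev/sdk1 prints the command, and DRY_RUN=0 xfs_check_unassigned /dev/sdk1 runs the check itself. Because of -n, neither form writes anything to the disk.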

Link to comment
5 minutes ago, Arcaeus said:

I thought this was the plan as JorgeB said the disk was probably fine. 

Probably is the plan.

 

It is possible to get it to not rebuild parity, then fool it into thinking it still needs to rebuild disk5. But maybe that won't be necessary if the contents of the original disk look good enough.

 

I am still a bit concerned about your hardware and its ability to reliably rebuild anything. Do we think the controller firmware update should have fixed this?

 

Link to comment
