First time adding parity, now see errors with drive. (Solved)



Hi guys, long-time lurker here. I recently swapped my server from an N54L MicroServer to a Dell R720. The transition went smoothly, or so I thought. I decided to add a parity drive and a cache drive (I confess I was previously running without either). I have read a lot about CRC errors, read errors, write errors, etc. Can someone verify whether this is something I should be worried about? I.e., do I need to correct the parity, ignore it, or replace the disk immediately?

 

Thx in advance.

 

5acdd93a3f5e8_unraiddriveerrorclean.thumb.jpg.60173a4a70dee2c8f05fe61ea71d8504.jpg

tower-diagnostics-20180410-2217.zip

Edited by baldfox
Solved
Link to comment

Thank you for the response.

Is there any way for me to isolate which files are potentially corrupted? Is there anything else I can do besides replacing the disk to help alleviate the issue?

 

Assuming there isn't: I have another disk there (a 2TB Red) which isn't currently being used. In your opinion, what would be the correct way to proceed to fix the issue, i.e. the correct way to replace and rebuild? Or should I just manually isolate the files on the disk, drag them off, and somehow remove the drive?

 

thx

Link to comment
16 minutes ago, baldfox said:

Is there any way for me to isolate which files are potentially corrupted? Is there anything else I can do besides replacing the disk to help alleviate the issue?

To check them on the rebuilt disk you'd need to have previously built checksums (or be using btrfs). Failing that, you can rebuild the disk and then copy everything you can from the old disk: every file that copies successfully can be assumed OK, and any file you can't copy due to a read error is likely corrupt on the rebuilt disk.
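The "copy everything you can" test boils down to reading every file on the old disk to the end and noting which ones fail. A minimal sketch of that idea in Python follows, assuming the old disk is mounted read-only somewhere; the function name `find_unreadable_files` and any mount path you pass to it are illustrative, not part of unRAID.

```python
import os


def find_unreadable_files(root, chunk_size=1024 * 1024):
    """Return (path, error) pairs for files under root that cannot be read in full."""
    failures = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    # Read the whole file in chunks; a bad sector surfaces as OSError.
                    while handle.read(chunk_size):
                        pass
            except OSError as exc:
                failures.append((path, str(exc)))
    return failures
```

Calling `find_unreadable_files("/mnt/old_disk1")` (a hypothetical mount point) returns the list of suspect files. A file that reads cleanly here is not proven intact; it only means the drive returned data without reporting an error.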

 

 

Link to comment
5 minutes ago, trurl said:

Be sure you have Notifications set up.

 

Also, maybe obvious to you, but not to everyone apparently.

 

You must have backups. Parity is not a substitute for backups. Plenty of ways to lose files besides a disk failure, including user error.

 

You get to decide what is important enough to back up.

Link to comment

OK. How urgent is this? With this number of errors, in your experience am I looking at an impending, potentially catastrophic loss, or is this just the start of the end? I know that's like asking how long is a piece of string, but I need to understand whether I should drop everything else I'm doing to focus solely on this, or whether it's something I can attend to in the coming days.

 

Also, forgive the questions, but hopefully they'll be useful for others. If I rebuild the disk as mentioned above, how would you suggest I then get the stuff off the old disk? I think I just misunderstood the context there. In my mind I was thinking: copy the files onto the other disks manually, then nuke disk 1 / replace it, then copy the files back. Sorry for the newbie questions. I want to make sure I give myself every chance of saving the data intact. Also, do I need to pre-clear or do anything else to prepare the spare disk that's currently unassigned?

Link to comment
3 minutes ago, trurl said:

 

Also, maybe obvious to you, but not to everyone apparently.

 

You must have backups. Parity is not a substitute for backups. Plenty of ways to lose files besides a disk failure, including user error.

 

You get to decide what is important enough to back up.

I do have backups of the important stuff (cloud, off-site and on-site). I'm just trying to improve my setup and use unRAID properly. I appreciate all your help. As I mentioned above, hopefully this thread will be useful for others in similar situations.

Link to comment

When you rebuild the disk the new disk will have all the contents of the old disk. johnnie.black mentioned some possibility of corruption due to parity being built from a disk that was giving read errors. That is possible but maybe not a significant problem given the small number of read errors. The only way to find out is to try.

 

The rebuild itself is minimal effort assuming your system is working well (good connections, power, etc.). And rebuilding 2TB won't even take very long.

 

Checking for corruption would be more effort, and if you have good backups of anything important then maybe not a real concern. Just compare the important stuff to your backups and if you someday find a movie or something unimportant that won't play just re-rip or whatever.

Link to comment

So I rebuilt disk 1 and now have zero errors, so everything seems to be OK. From what you guys analysed earlier on the previous disk, is there anything that can be done to reuse it, or is this one for the scrap heap? Below are some of the notifications I got but ignored; the rebuild finished by itself. Thanks for everyone's help. I'm glad we got it solved.

 

 

image.png.9e3f4b7583f88a0296d37876b20aa51f.png

Link to comment
On 4/11/2018 at 10:15 PM, baldfox said:

So I rebuilt disk 1, and now have zero errors, so everything seems to be ok.

 

Remember that you had 75 read errors from disk 1 when you created the parity drive.

That means that the parity drive has 75 blocks with incorrect content.

Which means that when you replaced disk1 and had unRAID rebuild the content, there will be 75 blocks on the new disk1 that have incorrect content.
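To see why a bad read during the parity build carries over into the rebuild, here is a toy one-byte-per-disk example. It assumes single parity is a plain XOR across the data disks; the byte values and the "misread" are made up for illustration, and the sketch makes no claim about what value unRAID actually uses for a block it could not read.

```python
from functools import reduce


def xor_parity(blocks):
    """XOR all blocks together (single-parity model)."""
    return reduce(lambda a, b: a ^ b, blocks, 0)


def rebuild_missing(parity, surviving_blocks):
    """Reconstruct the one missing block from parity plus the surviving blocks."""
    return xor_parity([parity, *surviving_blocks])


# Hypothetical one-byte "blocks" at the same offset on three data disks.
true_disk1, disk2, disk3 = 0b10110010, 0b01101100, 0b11110000
misread_disk1 = 0b10110011  # assumption: one bit came back wrong during the parity build

good_parity = xor_parity([true_disk1, disk2, disk3])
bad_parity = xor_parity([misread_disk1, disk2, disk3])

# With correct parity the rebuild returns the real data ...
assert rebuild_missing(good_parity, [disk2, disk3]) == true_disk1
# ... with parity built from the bad read, the rebuild faithfully returns the bad data.
assert rebuild_missing(bad_parity, [disk2, disk3]) == misread_disk1
```

The rebuild itself works perfectly in both cases, which is why the new disk reports no errors even though the affected blocks hold the wrong content.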

 

So having zero write errors now doesn't mean you have no file corruption on the replaced disk1. It just means that the new disk1 has no known hardware error so unRAID is able to read/write all content.

 

By playing around with the original disk1, you should be able to figure out which files were incorrectly read when you built the parity drive, and therefore which files have incorrect content on the new disk1.

Link to comment

Yeah. I've noticed a couple of legally obtained MP3s don't seem to be working now, but I'm afraid spotting them this way is a bit hit and miss. Is there a more scientific way to see which files have been damaged or corrupted? I've left the original offending disk in the system, albeit unmounted. I'm not really sure what else to do next.

 

Is there anything you would do differently, or something I should have done to avoid this situation (for next time)?

Link to comment

If you had managed to capture the log of the 75 disk read errors, it would have been possible to use tools like xfs_bmap or similar to figure out which files had data stored on those disk blocks.

 

You could compute checksums for all files on new and old disk1 and compare.

Most likely, the checksum program will fail to read the data for some of the files on the old disk, so for those you directly get an indication that the file is most probably broken on the new drive too.
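A rough sketch of that old-versus-new comparison, assuming both disks are mounted and hold the same directory layout. SHA-256 is just one reasonable choice of checksum, and `compare_trees` and the report keys are names invented for this example.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large media files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(old_root, new_root):
    """Compare every file under old_root with the same relative path under new_root."""
    report = {"unreadable_on_old": [], "missing_on_new": [],
              "unreadable_on_new": [], "different": []}
    for dirpath, _dirnames, filenames in os.walk(old_root):
        for name in filenames:
            old_path = os.path.join(dirpath, name)
            rel = os.path.relpath(old_path, old_root)
            new_path = os.path.join(new_root, rel)
            try:
                old_sum = sha256_of(old_path)
            except OSError:
                # Read error on the old disk: the rebuilt copy is the prime suspect.
                report["unreadable_on_old"].append(rel)
                continue
            if not os.path.isfile(new_path):
                report["missing_on_new"].append(rel)
                continue
            try:
                new_sum = sha256_of(new_path)
            except OSError:
                report["unreadable_on_new"].append(rel)
                continue
            if old_sum != new_sum:
                report["different"].append(rel)
    return report
```

Files listed under `unreadable_on_old` or `different` are the ones worth restoring from backup. Files written to the new disk after the rebuild are not visited, since the walk starts from the old disk.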

 

Or you could perform a copy of all files from the old drive - any file that fails to copy is probably broken on the new disk too.

 

When storing static data, it's very good to make use of a checksum tool to compute some reasonably strong checksum for all file data and store somewhere - it helps you later to figure out if you suffer disk corruption and can also be used to detect if malware is trying to modify your files.
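As one way to do that, the sketch below records a SHA-256 checksum per file in a JSON manifest and later re-checks the files against it. The function names and the JSON layout are assumptions made for this example, not an existing unRAID feature; dedicated checksum tools and plugins cover the same ground with more polish.

```python
import hashlib
import json
import os


def file_sha256(path, chunk_size=1024 * 1024):
    """Hash one file in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            manifest[os.path.relpath(path, root)] = file_sha256(path)
    return manifest


def save_manifest(manifest, manifest_path):
    """Write the manifest as JSON; keep it on a different disk than the data."""
    with open(manifest_path, "w", encoding="utf-8") as handle:
        json.dump(manifest, handle, indent=2, sort_keys=True)


def verify_manifest(root, manifest_path):
    """Return (relative_path, reason) pairs for files that no longer match."""
    with open(manifest_path, encoding="utf-8") as handle:
        manifest = json.load(handle)
    problems = []
    for rel, expected in sorted(manifest.items()):
        path = os.path.join(root, rel)
        if not os.path.isfile(path):
            problems.append((rel, "missing"))
            continue
        try:
            actual = file_sha256(path)
        except OSError:
            problems.append((rel, "unreadable"))
            continue
        if actual != expected:
            problems.append((rel, "changed"))
    return problems
```

Build and save the manifest while the data is known to be good, then run `verify_manifest` after a rebuild or on a schedule. An empty result means every recorded file still matches; files added since the manifest was built are not checked.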

Link to comment
