Finding errors after replacing some drives



I am finding some errors after replacing three older drives and running the rebuilds. I ran a parity check (with correction) between each replacement to make sure parity remained valid. However, after those checks it has given me this:

 

Quote
  Last check completed on Tue 07 Aug 2018 01:55:59 AM EDT (today), finding 186 errors.
Duration: 11 hours, 23 minutes, 55 seconds. Average speed: 97.5 MB/sec

 

It has not done this in a very long time, if ever. Please find the log attached below since I am at a loss. 

Thank you so much for the help,

Ice

 

07-08-2018_server_log.rar

Edited by icedragonslair
Link to comment

Yes I set up notifications but saw no problems...could this just be a cable issue?

 

I saw that the notifications tab was off....

 

So I pulled the drive and tested it with the WD software (this drive is newer, from 12/2017), and it checked out fine.

I am now running a file system check. Should I order a new drive just for backup?

Edited by icedragonslair
Link to comment
18 minutes ago, icedragonslair said:

Yes I set up notifications but saw no problems...could this just be a cable issue?

 

 

No. With cable issues you would get UDMA CRC errors, and the machine would either manage to read correct data by retrying the read or, if it cannot communicate with the drive at all, try to reset the link to the drive.

 

You only posted the server log but you should take a close look at the SMART data for your disks.

Link to comment

The SMART data from both tests shows no errors ("completed without error")

 

and the log shows 'no errors logged'

running another set now

 

Edit: from my post above:

Quote

 

I saw that the notifications tab was off....

 

So I pulled the drive and tested it with the WD software (this drive is newer, from 12/2017), and it checked out fine.

I am now running a file system check. Should I order a new drive just for backup?

 

 

Edited by icedragonslair
Link to comment
14 minutes ago, icedragonslair said:

The SMART data from both tests shows no errors ("completed without error")


That basically means "the manufacturer doesn't think they need to do a warranty replacement of the drive."

 

The SMART data contains much more information than that, for example lots of useful counters. unRAID checks some of them by default, and others might be meaningful to add yourself.

 

But since you haven't posted the full diagnostics, I have no idea if there are problematic values in the SMART data.
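As a rough illustration of the kind of counters meant here, the sketch below pulls raw values out of an attribute table in the layout that `smartctl -A` prints. The sample text is made up for illustration (it is not taken from the posted logs), and the choice of attributes in `WATCHED` is an assumption rather than unRAID's actual default list.

```python
# Attributes whose non-zero raw value is usually worth a closer look.
# This set is an illustrative assumption, not unRAID's configured defaults.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
}

# Hypothetical sample in the column layout of `smartctl -A` output.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       8
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
"""


def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} for each attribute row in the table."""
    values = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            values[fields[1]] = int(fields[9])
    return values


def nonzero_watched(values):
    """Keep only the watched attributes whose raw value is above zero."""
    return {name: raw for name, raw in values.items() if name in WATCHED and raw > 0}


print(nonzero_watched(parse_smart_attributes(SAMPLE)))
```

A short self-test reporting "completed without error" says nothing about these counters, which is why the full attribute table is worth posting.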

Link to comment

Under the extended test I saw the sector count and submitted for an RMA. A replacement is on the way since the drive is only 6 months old :). When it comes in, should I be okay to preclear, replace and rebuild without any other expected problems? I do see some CRC errors on my cache drive, but I will take care of that when this is done.

 

Thanks again for all the help,

Ice

Link to comment
5 minutes ago, icedragonslair said:

Also, this corruption...how do I go about fixing it once I have the new drive installed?

You'd need to have checksums for all files (or be using btrfs) and replace the corrupt files. Alternatively, if the old disks are OK, you can copy from them, but you'll need to copy all the files, since without checksums you won't know which files are affected.
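For anyone wondering what "having checksums" could look like in practice, here is a minimal sketch that records a SHA-256 digest per file and later reports files whose content no longer matches. The function names are made up for this example, and it only helps if the manifest was created while the data was still known to be good.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): file_sha256(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def find_changed(manifest, root):
    """List files that are missing or whose digest differs from the manifest."""
    root = Path(root)
    changed = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or file_sha256(target) != expected:
            changed.append(rel_path)
    return changed
```

You would call `build_manifest` on a disk's mount point (for example `/mnt/disk5`, used here only as an example path) while the data is good, store the result somewhere safe, and run `find_changed` after a rebuild to see which files need replacing.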

 

 

Link to comment

Okay, a little beyond me...

I do have two of the old drives untouched. Can I just re-add them in their previous places, copy the data off the affected drives (the ones I don't have), then copy it back onto the server and rebuild the parity that way...then replace the drives again?

 

I always use TeraCopy so everything goes onto the server correctly. Can I use that to do the copies?

 

Is there an easier way, considering there is so much data?

 

What is weird, though, is that this is a media server and I haven't noticed a single corrupt file, and I have played everything I added since the replacement. Or doesn't that matter?

 

Edited by icedragonslair
Link to comment

You could mount the old drives with the UD (Unassigned Devices) plugin and copy the data over. You'll need to change the UUID first or they won't mount.

 

3 minutes ago, icedragonslair said:

Is there an easier way, considering there is so much data? 

Not without knowing which files are affected, and note that rebuilding disk18 next will also result in some corruption on that disk.

 

Link to comment

So I am looking at those 4 disks and pulling the data off them completely and then replacing it if I can?

 

Now here's the biggest question. What if I do this gradually after replacing that drive? Since it is a JBOD system, wouldn't this corruption go to all drives containing any of the affected data?

 

I am thinking of an 8 TB external USB drive. Would I be able to use Krusader to do the copies, or am I stuck doing it via Windows from the shares?

Link to comment

Last few questions (I hope and so do you...lol),

 

Can I just play the media that's been added to see if it is okay, then copy the whole drive if it is, then reformat the disk, return the files to the disks, and rebuild the parity?

 

I am assuming this is one of the few downsides to the XFS file system?

 

Thanks loads,

 

Ice

Link to comment

The only real downside to XFS is that it doesn't have data checksums.

 

For the old drives, you could compute the checksum for each and every file and compare it with the checksum of the same file on the disks in the array.

 

Or just copy the data from the old disks on top of the data in the array, which will overwrite both correct and broken files, but will reduce the number of broken files you have.
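For the first option, a sketch of comparing an old disk against its rebuilt replacement could look like the following. It assumes both are mounted and use the same folder layout; the function names and the report format are invented for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, reading 1 MiB at a time."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(1 << 20)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(old_root, array_root):
    """Compare each file under old_root with the same relative path under array_root."""
    old_root, array_root = Path(old_root), Path(array_root)
    report = {"mismatch": [], "missing": []}
    for old_file in sorted(old_root.rglob("*")):
        if not old_file.is_file():
            continue
        rel = old_file.relative_to(old_root)
        new_file = array_root / rel
        if not new_file.is_file():
            report["missing"].append(rel.as_posix())
        elif sha256_of(old_file) != sha256_of(new_file):
            report["mismatch"].append(rel.as_posix())
    return report
```

Only the files listed under `mismatch` would then need to be copied back from the old disk. Files that were deliberately changed on the array after the replacement will also show up as mismatches, so the list needs a sanity check before overwriting anything.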

 

The good news is that for media files, it often doesn't matter so much if you have a bit of corruption - many video file formats are designed to be quite robust. So you get one or more broken video frames and then the video playback can return to play good data again.

Link to comment

I suppose my biggest concern/question is this: when I used TeraCopy to move the data to the server and it confirmed the checksum, wouldn't that have told me if the files were corrupt? If not, how do I change the UUID and mount the two (intact) old drives that I have, one at a time, so I can copy them over? Or am I better off trying to re-rip all the files on those disks (yes, I have them) and just go from there?

 

Thank you,

 

Ice

Link to comment
20 minutes ago, icedragonslair said:

I suppose my biggest concern/question is this: when I used TeraCopy to move the data to the server and it confirmed the checksum, wouldn't that have told me if the files were corrupt?

 

On 8/7/2018 at 6:53 PM, johnnie.black said:

Didn't you notice the read errors on what appears to be a failing disk18 during the rebuild of disks 5, 7 and 13?

 

If you do a rebuild, unRAID fills the disk with recomputed data. But this requires that all the other drives produce valid data. If one of the disks reads out incorrect data for a sector, then the corresponding recomputed sector will contain wrong data, and that gets written to the rebuilt disk.

 

If you never did any rebuild, but instead copied the data from the old disks to some scratch storage, replaced the disk and copied the data back, then you only end up with broken content if that scratch storage produced errors, or if the machine goofed because of memory issues or similar.

 

So it's only if disk 18 was involved in recreating contents on disk 5, 7 and 13 that read errors on disk 18 would introduce errors to content on disk 5, 7 and 13.
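To make the rebuild point concrete, here is a toy single-parity model where the parity block is the XOR of the data blocks. The disk names and byte values are invented, and a real array works on full sectors across many disks, but the arithmetic is the same idea.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical array with three data disks, one 4-byte "sector" each.
disk_a = b"\x10\x20\x30\x40"
disk_b = b"\x01\x02\x03\x04"
disk_c = b"\xaa\xbb\xcc\xdd"
parity = xor_blocks([disk_a, disk_b, disk_c])

# Rebuild disk_a from parity plus the other disks, with every read correct.
good_rebuild = xor_blocks([parity, disk_b, disk_c])

# Same rebuild, but disk_c returns one wrong byte for this sector.
disk_c_bad_read = b"\xaa\xbb\xcc\xdf"
bad_rebuild = xor_blocks([parity, disk_b, disk_c_bad_read])

print(good_rebuild == disk_a)  # True: the rebuilt sector matches the original
print(bad_rebuild == disk_a)   # False: the bad read was baked into the rebuilt disk
```

The rebuilt disk has no way of knowing that one of its inputs was wrong, which is why the damage shows up on the replaced disks rather than on the disk that misread.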

Link to comment

So if I replace the data on 5 & 7 (I no longer have 13) and the data on 18 (it has only been filled since February), then do a check on the data on 13, then rebuild parity, I should be good? I have checked about 30% of the files on 18 and none so far appear to be damaged, but as was stated, these MKVs are pretty resilient. Of course I will continue checking them all.
 

 

 

Link to comment

Actually, they are all from my Blu-ray movie & TV collection that I have converted to HEVC (x265), so I have the originals on disc. I can always convert them again if I absolutely have to, but what a time-consuming job that is. At least that's a break in my favor.

 

I'll have to ask how to change the ID on the drives so I can mount an Unassigned Disk, share it and then transfer the files. And since parity rebuilds did not show any errors until the last two rebuilds, I am assuming that means the first two drives rebuilt (5 & 7) should be free from errors and I can use those, then just replace the data on the last two (13 & 18), correct?

 

Thanks loads,

Ice

Edited by icedragonslair
Link to comment
2 hours ago, icedragonslair said:

I'll have to ask how to change the ID on the drives so I can mount an Unassigned Disk

 

xfs_admin -U generate /dev/sdX1

 

Replace X with the correct unassigned disk identifier.

 

2 hours ago, icedragonslair said:

And since parity rebuilds did not show any errors until the last two rebuilds, I am assuming that means the first two drives rebuilt (5 & 7) should be free from errors and I can use those, then just replace the data on the last two (13 & 18), correct?

 

There were read errors on disk18 during the rebuild of all 3 disks, which means there will be corruption on all 3. Even if it had happened during only the first rebuild, all 3 would still be corrupt, since any subsequent rebuild would use a corrupt disk.

Link to comment
6 hours ago, johnnie.black said:

There were read errors on disk18 during the rebuild of all 3 disks, which means there will be corruption on all 3. Even if it had happened during only the first rebuild, all 3 would still be corrupt, since any subsequent rebuild would use a corrupt disk.

I'd like to know a little more about the logs and where in the log to look for this info. Can you point me in the right direction, or give me the line numbers and which log it is in? That way I can look through it myself when anything pops up.

 

I'll save the question about how I create a checksum for the server (disks). I have been digging but have found nothing recent on the forums.

 

Thanks again

 

Link to comment
