
Where are my errors???


zzgus


After a power failure and a parity check I got this: 732 errors.

 

Quote

Event: unRAID Parity check
Subject: Notice [UNRAID-MEDIA] - Parity check finished (732 errors)
Description: Duration: 10 hours, 47 minutes. Average speed: 103.1 MB/s
Importance: warning

 

But I'm unable to see where those errors are.

 

I have also attached the diagnostics file.

 

Thank you

Gus

 

 

 

unraid-media-diagnostics-20180207-1005.zip

UNRAID_ERRORS.jpg

Link to comment
5 minutes ago, johnnie.black said:

The circled numbers are read errors. What you got are sync errors, which are normal after an unclean shutdown. There's no way of knowing which disk they came from (most likely parity); in any case you just need to make sure it was a correcting check.

 

Thank you @johnnie.black, every day we learn something.

 

Quote

in any case you just need to make sure it was a correcting check.

 

Are you referring to this option for the parity check that started automatically?

 

Check will start Parity-Check.
Write corrections to parity

 

Thank you
Gus

Link to comment

If you get errors reported during a parity check then it is not possible to determine which disk caused them.   All you can do is correct parity to correspond to the (assumed correct) data disks.    The purpose of parity is to protect against disk failures.
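
To illustrate why a mismatch cannot be traced to a disk, here is a minimal sketch that assumes plain single XOR parity. It is a toy model, not unRAID's actual code; the disk contents and the helper names (`xor_parity`, `mismatch`) are made up for illustration. It builds two different faults, one on a data disk and one on the parity disk, and shows that a check observes exactly the same thing in both cases.

```python
from functools import reduce


def xor_parity(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def mismatch(disks, stored_parity):
    """XOR of the recomputed parity and the stored parity (all zero when in sync)."""
    return bytes(e ^ p for e, p in zip(xor_parity(disks), stored_parity))


def sync_errors(disks, stored_parity):
    """Count the byte positions where stored parity disagrees with the data."""
    return sum(1 for byte in mismatch(disks, stored_parity) if byte != 0)


# Three hypothetical data disks, one 4-byte "sector" each.
data_disks = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_parity(data_disks)

# Case A: a write reached data disk 1, but the matching parity update was lost.
case_a_disks = [data_disks[0], b"\x11\x20\x30\x40", data_disks[2]]
case_a_parity = parity

# Case B: the data disks are untouched, but the parity disk holds a wrong byte.
case_b_disks = data_disks
case_b_parity = bytes([parity[0] ^ 0x01]) + parity[1:]

# Both cases report one error with an identical mismatch pattern,
# so the check alone cannot say which disk is the odd one out.
print(sync_errors(case_a_disks, case_a_parity), mismatch(case_a_disks, case_a_parity))
print(sync_errors(case_b_disks, case_b_parity), mismatch(case_b_disks, case_b_parity))
```

In this model the only available repair is to rewrite parity so that it matches the data disks, which is what a correcting check does.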

 

If you want to check for file-level corruption then you need a different mechanism, as parity is not aware of files as such. Parity is about protecting against disk hardware failures, not about detecting file corruption (which could occur for a variety of reasons). That is why you will often see statements along the lines of “Parity is not a backup”. The only way to protect against file-level corruption is to have off-line backups which can be restored if needed. It is possible, however, to detect that such corruption may have occurred. If your array disks are XFS format (which is the default) then the recommended way is to install and use the File Integrity plugin to create and check file checksums. If they are BTRFS then this has built-in file checksum checking.

 

In both cases there is no automatic recovery, as there is not sufficient redundant information stored to allow for this. The expectation is that you will restore such files from your backups (or perhaps redownload them).
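
The general idea behind file checksums can be sketched in a few lines of Python. This is only an illustration of the concept and is not how the File Integrity plugin or BTRFS is implemented; the function names and the choice of SHA-256 are assumptions made for the example. A manifest of hashes is recorded while the files are known to be good, and a later pass reports any file that is missing or whose content no longer matches.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large media files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        str(path.relative_to(root)): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_manifest(root, manifest):
    """Return (relative_path, problem) pairs for missing or changed files."""
    root = Path(root)
    problems = []
    for relative_path, expected in manifest.items():
        target = root / relative_path
        if not target.is_file():
            problems.append((relative_path, "missing"))
        elif sha256_of(target) != expected:
            problems.append((relative_path, "changed"))
    return problems
```

As in the post above, this only detects a problem. It also cannot tell corruption from a legitimate edit, and fixing a flagged file still means restoring it from a backup.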

Link to comment
5 minutes ago, johnnie.black said:

 

 

@johnnie.black yes, I read this in your earlier message, but English is not my native language and sometimes I feel I am missing something.

 

If "write corrections to parity" was enabled, I understand that the parity is now correct, but that there are 732 errors (read errors?) in some files on my disks. Is that right?

 

Quote

If your array disks are XFS format (which is the default) then the recommended way is to install and use the File Integrity plugin to create and check file checksums.

 

And I will have to try this to find the problematic files. Correct?

 

Thank you

Gus

 

Link to comment

XFS and other filesystems are good at avoiding corruption with their default configurations. Therefore, it is very unlikely a power cut would result in data corruption.

 

Parity is much more likely to be impacted. This is common.

 

unRAID will automatically start a parity check after a power cut. The default is now non-correcting (in earlier versions, correcting was the default). If there are sync errors (like you had), you need to manually run a correcting check.

Link to comment

Uffff finally I got it.

 

Even if "write corrections to parity" is checked in the settings, the parity check that starts automatically after a power cut does not write those corrections.

 

I will re-run a parity check with corrections.

 

Thank you
Gus

 

Link to comment
6 minutes ago, johnnie.black said:

Agree, unless you were writing to the server when the power cut happened, in that case some of those files will be corrupt.

Any files that were in the process of being written would need to be recopied. When a disk is mounted after an unclean shutdown, the filesystem does special processing to apply writes from the journal. I've never had any corruption per se, but incomplete files are to be expected for anything that was in progress. If a larger multi-file copy was in progress, it would only have partially completed, and you'd have to figure out where it stopped and resume it. Often the timestamp is a good guide to whether a file copy completed. While data is being copied, the date/time on the file will be the current date/time. When the copy finishes (and assuming the copy is supposed to preserve the timestamp), the timestamp from the original file is applied. So if you find a file with a timestamp right before the power cut, there is a good chance that file is incomplete.

 

With background activity very possible, a bit of research might be required to figure out if anything didn't complete.
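
The timestamp heuristic above could be automated roughly as follows. This is a sketch only: the function name, the 30-minute window, and the path and time in the usage note are hypothetical, and you would substitute your own share path and the actual time of the power cut.

```python
from datetime import datetime, timedelta
from pathlib import Path


def files_modified_near(root, event_time, window=timedelta(minutes=30)):
    """List files last modified within `window` before `event_time`.

    `event_time` is a naive datetime interpreted in the server's local time.
    Returns (path, modification_time) pairs, oldest first.
    """
    start = (event_time - window).timestamp()
    end = event_time.timestamp()
    hits = []
    for path in Path(root).rglob("*"):
        if path.is_file():
            mtime = path.stat().st_mtime
            if start <= mtime <= end:
                hits.append((path, datetime.fromtimestamp(mtime)))
    return sorted(hits, key=lambda item: item[1])
```

For example, `files_modified_near("/mnt/user/Media", datetime(2018, 2, 6, 22, 0))` would list the candidates under that (hypothetical) share. A hit only means the file is worth inspecting: a copy that completed just before the outage, or one that does not preserve timestamps, will show up as well.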

Link to comment
1 hour ago, zzgus said:

And I will have to try this to find the problematic files. Correct?

It isn't likely that you have broken files.

At most, you had an ongoing file transfer when the server crashed, in which case you could potentially have files with partial size.

 

But most often, the file system will roll back to the previous state, and the differences will be a number of disk writes that were later ignored. Remember that the data disks don't do random writes just because they are bored. They do writes to internal structures when allocating free space for new file data. And they do writes to these just-allocated blocks with the received file data. And they do writes to the directory information. And these writes are normally double-written, including the use of a journal to help the disk know about ongoing work in case of a power failure or hang. If the parity disk didn't get the corresponding writes, then you get a difference. But nothing is more broken than if the data disk had been a single-disk storage partition.

 

So the probability of broken files with unRAID isn't larger than the probability of broken files if your laptop hangs.

 

But the parity error count indicates how many parity sectors need to be adjusted for unRAID to get back to full parity protection of the array.

Link to comment

If the previous parity check was noncorrecting, and you run another parity check, this time a correcting parity check, it will still find those same errors, but this time it will correct them. Some people (I am one) run another noncorrecting check after that just because they like to see that their last parity check had zero errors. That will assure you that there aren't any other problems causing the parity errors. And zero parity errors is the only acceptable answer if you want to be sure parity can accurately rebuild a missing or disabled data disk.
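
The sequence described above (non-correcting check, correcting check, then a confirming check) can be modelled with a toy example. It assumes simple XOR parity over small in-memory lists and is not unRAID's implementation; the `parity_check` function and its `correct` flag are hypothetical stand-ins for the "Write corrections to parity" option.

```python
from functools import reduce


def expected_parity(data_disks):
    """Byte-wise XOR across all data disks."""
    return [reduce(lambda a, b: a ^ b, column) for column in zip(*data_disks)]


def parity_check(data_disks, parity, correct=False):
    """Count mismatching positions; rewrite them in `parity` only if correct=True."""
    errors = 0
    for index, value in enumerate(expected_parity(data_disks)):
        if parity[index] != value:
            errors += 1
            if correct:
                parity[index] = value
    return errors


data_disks = [[1, 2, 3, 4], [5, 6, 7, 8]]
parity = expected_parity(data_disks)
parity[1] ^= 0xFF  # simulate two stale parity positions after an unclean shutdown
parity[3] ^= 0xFF

first = parity_check(data_disks, parity, correct=False)  # reports, changes nothing
second = parity_check(data_disks, parity, correct=True)  # same count, now repaired
third = parity_check(data_disks, parity, correct=False)  # nothing left to report
print(first, second, third)  # prints: 2 2 0
```

The correcting pass reports the same count as the first pass because it counts what it finds while fixing it. Only the following check drops to zero.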

 

Surprised nobody mentioned getting a UPS. Servers really should have Uninterruptible Power Supplies. If you have frequent outages, you are going to have frequent parity checks.

Link to comment
On 7/2/2018 at 2:09 PM, trurl said:

If the previous parity check was noncorrecting, and you run another parity check, this time a correcting parity check, it will still find those same errors, but this time it will correct them. Some people (I am one) run another noncorrecting check after that just because they like to see that their last parity check had zero errors.

 

I have now run a correcting parity check, which finished with a result of 732 errors.

 

As you suggested, I will run another parity check to see if it shows 0 errors.

 

Thank you
Gus

Link to comment
17 hours ago, zzgus said:

 

I have now run a correcting parity check, which finished with a result of 732 errors.

 

As you suggested, I will run another parity check to see if it shows 0 errors.

 

Thank you
Gus

 

The parity check finished with 0 errors this time.

 

Thank you
Gus

Link to comment
On 7/2/2018 at 10:45 AM, itimpi said:

  If your array disks are XFS format (which is the default) then the recommended way is to install and use the File Integrity plugin to create and check file checksums.   If they are BTRFS then this has built-in file checksum checking.

 

Yes, my array is XFS.

 

Is this the plugin you were referring to?

Dynamix File Integrity

Thank you
Gus

 

 

Link to comment
