Parity check (non-correcting) found errors, what now?



I manually ran a non-correcting parity check and got this:

Unraid Parity check: 25-08-2020 21:49
Notice [HD] - Parity check finished (20 errors)
Duration: 3 hours, 26 minutes, 33 seconds. Average speed: 161.4 MB/s

 

  • All my drives are green and there hasn't been any unclean shutdown.
  • The previous parity check was about 35 days ago.
  • Uptime was about 13 days and 15 hours when the parity check finished.

 

Is there a way to know which files are affected?

 

 

I checked the SMART logs on all my drives. They are all empty except for the parity drive's, which shows the errors below. As I understand it (I'm not 100% sure), the errors are from 25 days of uptime, but my uptime is only 13 days, so I'm not sure when this happened:

ATA Error Count: 3
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 3 occurred at disk power-on lifetime: 611 hours (25 days + 11 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 51 01 00 00 00 a0  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  b0 d6 01 e0 4f c2 a0 00      00:08:32.180  SMART WRITE LOG
  b0 d6 01 e0 4f c2 a0 00      00:08:32.067  SMART WRITE LOG
  b0 d6 01 e0 4f c2 a0 00      00:08:31.895  SMART WRITE LOG
  ec 00 00 00 00 00 a0 00      00:02:48.494  IDENTIFY DEVICE
  b0 d8 01 01 4f c2 a0 00      00:02:48.409  SMART ENABLE OPERATIONS

Error 2 occurred at disk power-on lifetime: 611 hours (25 days + 11 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 51 01 00 00 00 a0  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  b0 d6 01 e0 4f c2 a0 00      00:08:32.067  SMART WRITE LOG
  b0 d6 01 e0 4f c2 a0 00      00:08:31.895  SMART WRITE LOG
  ec 00 00 00 00 00 a0 00      00:02:48.494  IDENTIFY DEVICE
  b0 d8 01 01 4f c2 a0 00      00:02:48.409  SMART ENABLE OPERATIONS
  60 10 98 00 00 00 40 00      00:02:48.217  READ FPDMA QUEUED

Error 1 occurred at disk power-on lifetime: 611 hours (25 days + 11 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 51 01 00 00 00 a0  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  b0 d6 01 e0 4f c2 a0 00      00:08:31.895  SMART WRITE LOG
  ec 00 00 00 00 00 a0 00      00:02:48.494  IDENTIFY DEVICE
  b0 d8 01 01 4f c2 a0 00      00:02:48.409  SMART ENABLE OPERATIONS
  60 10 98 00 00 00 40 00      00:02:48.217  READ FPDMA QUEUED
  ea 00 00 00 00 00 a0 00      00:01:06.438  FLUSH CACHE EXT
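
A note on reading the timestamps in this log: "disk power-on lifetime" is the drive's cumulative power-on hours counter, not the server's current uptime, while Powered_Up_Time is measured from the drive's last power-on (as the log header says). So the 611 hours can predate the current 13-day uptime. A minimal sketch of the arithmetic follows; the current power-on hours value is a made-up placeholder, and the real one would come from SMART attribute 9 (Power_On_Hours) of the parity drive.

```python
def hours_to_days_hours(hours: int) -> tuple[int, int]:
    """Split a power-on-hours count into whole days and leftover hours."""
    return divmod(hours, 24)


def hours_since_error(current_power_on_hours: int, error_power_on_hours: int) -> int:
    """Powered-on hours that have passed since the logged error."""
    if current_power_on_hours < error_power_on_hours:
        raise ValueError("current power-on hours cannot be lower than the error timestamp")
    return current_power_on_hours - error_power_on_hours


ERROR_HOURS = 611      # from the log: "disk power-on lifetime: 611 hours"
CURRENT_HOURS = 2000   # ASSUMPTION: hypothetical value, read the real one from attribute 9

days, hours = hours_to_days_hours(ERROR_HOURS)
print(f"Error logged at {days} days + {hours} hours of lifetime power-on time")
print(f"Powered-on hours since then: {hours_since_error(CURRENT_HOURS, ERROR_HOURS)}")
```

If the difference is much larger than the time since the last clean parity check, the logged ABRT errors are probably unrelated to the 20 parity errors.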

 

I was going to post my diagnostics zip, but it seems that even though I chose to anonymize it, some information is still there, so I'll have to check it first. Please let me know if you need a specific file from it.

 

 

Edited by danielocdh
Link to comment
Unraid Parity check: 26-08-2020 17:50
Notice [HD] - Parity check finished (0 errors)
Duration: 4 hours, 4 minutes, 58 seconds. Average speed: 136.1 MB/s

I didn't reboot; it just took me a while to start another check. I started one check and cancelled it quickly because I wasn't sure whether I had unchecked the corrections checkbox, then started a check again and let it run fully. I don't think I added or removed any files between the check with errors and the clean one.

 

How do I know what caused those original 20 errors, and which files (if it wasn't a parity mistake) were affected?

 

Thanks

 

Edited by danielocdh
Link to comment
7 hours ago, danielocdh said:

How do I know what caused those original 20 errors

You're absolutely sure there were no unclean shutdowns? If it wasn't that, it's likely some hardware issue, but since the second check didn't find any errors there's not much to go on for now. The disks look fine, and given the way it happened it's unlikely to be RAM. See if you get more errors on future checks.

 

7 hours ago, danielocdh said:

and which files (if it wasn't a parity mistake) were affected?

Unfortunately there's no way of knowing; you'd need pre-existing checksums to be able to check all your data.
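
To illustrate why parity alone can't point at a file, here is a minimal sketch that assumes plain XOR single parity over equally sized blocks. The function names and the tiny 4-byte "disks" are made up for the example and are not how Unraid is implemented internally.

```python
from functools import reduce


def xor_parity(blocks: list[bytes]) -> bytes:
    """Bytewise XOR of equally sized blocks (simplified single-parity model)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def mismatched_offsets(data_blocks: list[bytes], parity_block: bytes) -> list[int]:
    """Offsets where the stored parity differs from the parity recomputed from data."""
    expected = xor_parity(data_blocks)
    return [i for i, (e, p) in enumerate(zip(expected, parity_block)) if e != p]


# Hypothetical 4-byte "disks"
disk1 = b"\x10\x20\x30\x40"
disk2 = b"\x01\x02\x03\x04"
parity = xor_parity([disk1, disk2])

# Case A: one byte on a data disk goes bad
bad_disk1 = b"\x10\x20\x31\x40"
print(mismatched_offsets([bad_disk1, disk2], parity))   # [2]

# Case B: the same byte position goes bad on the parity disk instead
bad_parity = bytes([parity[0], parity[1], parity[2] ^ 1, parity[3]])
print(mismatched_offsets([disk1, disk2], bad_parity))   # [2]
```

Both cases report the same offset, so the check can say where the stripe is inconsistent but not which disk (and therefore which file, if any) holds the wrong byte.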

Link to comment
4 hours ago, johnnie.black said:

You're absolutely sure there were no unclean shutdowns? If it wasn't that, it's likely some hardware issue, but since the second check didn't find any errors there's not much to go on for now. The disks look fine, and given the way it happened it's unlikely to be RAM. See if you get more errors on future checks.

 

Unfortunately there's no way of knowing; you'd need pre-existing checksums to be able to check all your data.

Sorry for hijacking the thread, but I am interested in this. Is there any system or script that does this work for you, so that you can later use it to find out which files may be affected if parity errors occur? It would be a great addition to the Unraid universe to be able to know exactly which data is affected when parity errors are found and corrected.

Link to comment
3 minutes ago, je82 said:

Is there any system or script that does this work for you, so that you can later use it to find out which files may be affected if parity errors occur?

There are various options for creating checksums, including corz for Windows, the Dynamix File Integrity plugin, etc. Another option is to use btrfs, since it automatically creates checksums for all blocks.
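
As a rough idea of what such tools do, here is a minimal Python sketch that builds a SHA-256 manifest for a directory tree and later reports files whose hash no longer matches. It is not how corz or the Dynamix plugin work internally; the function names are made up for the example, and any path you pass in (for instance a disk share) is your own choice.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large media files don't need to fit in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    return {
        str(path.relative_to(root)): file_sha256(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(root: Path, manifest: dict[str, str]) -> list[str]:
    """Files from the saved manifest that are now missing or hash differently."""
    current = build_manifest(root)
    return sorted(name for name, digest in manifest.items() if current.get(name) != digest)
```

The manifest would be built while the data is known to be good (and saved somewhere off the array, e.g. as JSON); after a parity check reports errors, `changed_files` lists the candidates. Files that were intentionally edited in between will also show up, so the manifest has to be refreshed after legitimate changes.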

Link to comment
On 8/27/2020 at 1:58 AM, johnnie.black said:

You're absolutely sure there were no unclean shutdowns? If it wasn't that, it's likely some hardware issue, but since the second check didn't find any errors there's not much to go on for now. The disks look fine, and given the way it happened it's unlikely to be RAM. See if you get more errors on future checks.

 

Unfortunately there's no way of knowing; you'd need pre-existing checksums to be able to check all your data.

The machine is on a configured UPS. The only time the power went out (while I was sleeping), there was no automatic parity check after I turned it back on, and I'm sure there have been at least 2 manual parity checks without errors since then (before the one with errors). I also don't remember ever having an automatic parity check.

 

It's really weird (in my mind) not to be able to pinpoint the error (the exact file and the exact bytes in the file). Assuming the parity drive has the same chance of holding wrong bytes for whatever reason, it seems senseless not to be able to know which file(s) or bytes might be damaged.

 

Thanks for the answers. What I will do for now (when I have enough time) is test openmediavault + SnapRAID in a VM. I'll try to find out whether it solves the issue (telling where the possible damage is) and whether I can use it without having to rewrite my drives (other than parity).

 

Edit: it seems that SnapRAID (in openmediavault) keeps hashes and parity data, but they aren't updated automatically when you edit, add, or remove a file, as happens with parity on Unraid; you have to sync manually. On error it shows the path of the file and the drive symlink/path. It is most likely easier and better to just install something like Dynamix File Integrity on Unraid.

Edited by danielocdh
Link to comment
