danielocdh Posted August 26, 2020 (edited)

I manually ran a non-correcting parity check and got this:

Unraid Parity check: 25-08-2020 21:49
Notice [HD] - Parity check finished (20 errors)
Duration: 3 hours, 26 minutes, 33 seconds. Average speed: 161.4 MB/s

All my drives are green and there wasn't any unclean shutdown.
The previous parity check was ~35 days ago.
Uptime was 13 days and ~15 hours after the parity check.

Is there a way to know which files are affected?

I checked the SMART logs on all my drives and they are all empty except for the parity drive's, which has the entries below. As far as I understand (not 100% sure), it shows errors from 25 days of uptime, but my uptime is only 13 days, so I'm not sure when this happened:

ATA Error Count: 3
    CR = Command Register [HEX]
    FR = Features Register [HEX]
    SC = Sector Count Register [HEX]
    SN = Sector Number Register [HEX]
    CL = Cylinder Low Register [HEX]
    CH = Cylinder High Register [HEX]
    DH = Device/Head Register [HEX]
    DC = Device Command Register [HEX]
    ER = Error register [HEX]
    ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 3 occurred at disk power-on lifetime: 611 hours (25 days + 11 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 51 01 00 00 00 a0  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ---------------  --------------------
  b0 d6 01 e0 4f c2 a0 00  00:08:32.180     SMART WRITE LOG
  b0 d6 01 e0 4f c2 a0 00  00:08:32.067     SMART WRITE LOG
  b0 d6 01 e0 4f c2 a0 00  00:08:31.895     SMART WRITE LOG
  ec 00 00 00 00 00 a0 00  00:02:48.494     IDENTIFY DEVICE
  b0 d8 01 01 4f c2 a0 00  00:02:48.409     SMART ENABLE OPERATIONS

Error 2 occurred at disk power-on lifetime: 611 hours (25 days + 11 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 51 01 00 00 00 a0  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ---------------  --------------------
  b0 d6 01 e0 4f c2 a0 00  00:08:32.067     SMART WRITE LOG
  b0 d6 01 e0 4f c2 a0 00  00:08:31.895     SMART WRITE LOG
  ec 00 00 00 00 00 a0 00  00:02:48.494     IDENTIFY DEVICE
  b0 d8 01 01 4f c2 a0 00  00:02:48.409     SMART ENABLE OPERATIONS
  60 10 98 00 00 00 40 00  00:02:48.217     READ FPDMA QUEUED

Error 1 occurred at disk power-on lifetime: 611 hours (25 days + 11 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 51 01 00 00 00 a0  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ---------------  --------------------
  b0 d6 01 e0 4f c2 a0 00  00:08:31.895     SMART WRITE LOG
  ec 00 00 00 00 00 a0 00  00:02:48.494     IDENTIFY DEVICE
  b0 d8 01 01 4f c2 a0 00  00:02:48.409     SMART ENABLE OPERATIONS
  60 10 98 00 00 00 40 00  00:02:48.217     READ FPDMA QUEUED
  ea 00 00 00 00 00 a0 00  00:01:06.438     FLUSH CACHE EXT

I was going to post my diagnostics zip, but it seems that even though I chose to anonymize it, some information is still there, so I'll have to check it first. Please let me know if you need a specific file from it.

Edited August 26, 2020 by danielocdh
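A note on the two time values in that log, since they are easy to mix up: the "disk power-on lifetime" figure is generally the drive's cumulative powered-on hours over its whole life, not the server's current uptime, while Powered_Up_Time counts from the drive's most recent power-on. The sketch below only reproduces the arithmetic that smartctl prints; the function names are made up for illustration and are not part of any tool.

```python
from typing import Tuple


def split_power_on_hours(hours: int) -> Tuple[int, int]:
    """Split a cumulative power-on hour count into (days, remaining hours)."""
    return divmod(hours, 24)


def powered_up_seconds(stamp: str) -> float:
    """Parse a Powered_Up_Time value ('[DDd+]hh:mm:SS.sss') into seconds."""
    days = 0
    if "d+" in stamp:
        day_part, stamp = stamp.split("d+", 1)
        days = int(day_part)
    hh, mm, ss = stamp.split(":")
    return days * 86400 + int(hh) * 3600 + int(mm) * 60 + float(ss)


days, hours = split_power_on_hours(611)
print(f"{days} days + {hours} hours")  # 25 days + 11 hours

# Seconds since the drive was last powered on when the last command was issued.
print(powered_up_seconds("00:08:32.180"))
```

Read this way, the three ABRT entries would date from when the drive had accumulated 611 powered-on hours, roughly eight and a half minutes after one particular power-on, which need not fall inside the current 13-day uptime.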
JorgeB Posted August 26, 2020

Run another check (without rebooting) and then post diags.
danielocdh (Author) Posted August 26, 2020 (edited)

Unraid Parity check: 26-08-2020 17:50
Notice [HD] - Parity check finished (0 errors)
Duration: 4 hours, 4 minutes, 58 seconds. Average speed: 136.1 MB/s

I didn't reboot; it just took a while to start another check. I started one check and cancelled it quickly because I wasn't sure whether I had unchecked the corrections checkbox, then started a check again and let it run fully. I don't think I removed or added any files between the check with errors and the clean one.

How do I know what caused those original 20 errors, and which files (if it wasn't a parity mistake) were affected?

Thanks

Edited September 4, 2020 by danielocdh
JorgeB Posted August 27, 2020

7 hours ago, danielocdh said:
How do I know what caused those original 20 errors

Are you absolutely sure there were no unclean shutdowns? If it wasn't that, it's likely some hardware issue, but since the second check didn't find any errors there's not much to go on for now. The disks look fine, and it's unlikely to be RAM given the way it happened. See if you get more errors on future checks.

7 hours ago, danielocdh said:
and what files(if it wasn't parity mistake) were affected?

Unfortunately there's no way of knowing; you'd need pre-existing checksums to be able to check all your data.
je82 Posted August 27, 2020

4 hours ago, johnnie.black said:
Are you absolutely sure there were no unclean shutdowns? If it wasn't that, it's likely some hardware issue, but since the second check didn't find any errors there's not much to go on for now. The disks look fine, and it's unlikely to be RAM given the way it happened. See if you get more errors on future checks.
Unfortunately there's no way of knowing; you'd need pre-existing checksums to be able to check all your data.

Sorry for hijacking the thread, but I'm interested in this. Is there any system or script that does this work for you, so that you can later use it to find out which files may be affected if parity errors occur? It would be a great addition to the Unraid universe to be able to know exactly which data is affected when parity errors are found and corrected.
JorgeB Posted August 27, 2020

3 minutes ago, je82 said:
has there been any system or script that does this work for you and you can then later use it to find out which files may be affected if parity errors occur?

There are various options for creating checksums, including corz for Windows, the Dynamix File Integrity plugin, etc. Another option is to use btrfs, since it automatically creates checksums for all blocks.
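To illustrate what "pre-existing checksums" buys you, here is a minimal Python sketch of the general idea: hash every file while the data is known to be good, keep the result, and compare later. It is not how corz, the Dynamix File Integrity plugin, or btrfs work internally; the function names and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash one file in chunks so large files are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(root: Path, manifest: dict) -> list:
    """Return manifest entries that are now missing or hash differently."""
    bad = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file() or sha256_of(target) != expected:
            bad.append(rel)
    return bad
```

A manifest built before a problem (for example saved with `json.dump`) can be passed to `changed_files` afterwards. A file that was edited on purpose will also show up as changed, so the manifest has to be refreshed whenever files are modified deliberately.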
danielocdh (Author) Posted August 27, 2020 (edited)

On 8/27/2020 at 1:58 AM, johnnie.black said:
Are you absolutely sure there were no unclean shutdowns? If it wasn't that, it's likely some hardware issue, but since the second check didn't find any errors there's not much to go on for now. The disks look fine, and it's unlikely to be RAM given the way it happened. See if you get more errors on future checks.
Unfortunately there's no way of knowing; you'd need pre-existing checksums to be able to check all your data.

The machine is on a configured UPS. The only time the power went out (while I was sleeping), there was no automatic parity check after I turned it back on, and I'm sure there have been at least 2 manual parity checks without errors since then (before the one with errors). I also don't remember ever having an automatic parity check.

It's really weird (in my mind) not to be able to pinpoint the error (the exact file and the exact bytes in the file). Assuming the parity drive has the same chance of holding wrong bytes for whatever reason, it seems senseless not to be able to know which file(s) or bytes might be damaged.

Thanks for the answers. What I will do for now (when I have enough time) is test openmediavault + snapraid on a VM. I'll try to find out whether it solves the issue (of telling where the possible damage is) and whether I can use it without having to rewrite my drives (besides parity).

Edit: it seems that snapraid (in openmediavault) keeps hashes and parity data, but they aren't updated automatically when you edit/add/remove a file the way parity is on Unraid (you have to sync manually). On an error it shows the path of the file and the drive symlink/path. Most likely it is easier/better to just install something like Dynamix File Integrity on Unraid.

Edited September 6, 2020 by danielocdh
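On why a parity check alone cannot name the damaged file: a small Python sketch, assuming a single parity disk computed as a byte-wise XOR across the data disks. The disk contents and names below are toy values invented for the example. The check can report where the stored parity disagrees with the recomputed parity, but a flipped byte on a data disk and a flipped byte on the parity disk produce exactly the same report.

```python
from functools import reduce


def xor_parity(blocks):
    """Byte-wise XOR across equal-length data blocks (single-parity scheme)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def mismatch_offsets(blocks, parity):
    """Offsets where stored parity differs from parity recomputed from data."""
    recomputed = xor_parity(blocks)
    return [i for i, (p, q) in enumerate(zip(parity, recomputed)) if p != q]


# Toy "disks": two data disks and the parity computed while all was well.
disk1 = bytearray(b"hello world!")
disk2 = bytearray(b"unraid array")
parity = bytearray(xor_parity([disk1, disk2]))

# Case A: one byte goes bad on a data disk.
disk1_bad = bytearray(disk1)
disk1_bad[4] ^= 0xFF

# Case B: the byte at the same offset goes bad on the parity disk instead.
parity_bad = bytearray(parity)
parity_bad[4] ^= 0xFF

print(mismatch_offsets([disk1_bad, disk2], parity))  # [4]
print(mismatch_offsets([disk1, disk2], parity_bad))  # [4]
```

Both cases report offset 4 and nothing else, so the check by itself cannot say which disk holds the wrong byte, let alone which file. Settling that takes extra information such as per-file or per-block checksums.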