Jerry1111 Posted March 16, 2022

Unraid was working without problems for a number of years. Suddenly I got "Parity check finished (100 errors)". Below is the history of parity checks, and diagnostics are attached. I don't want to do anything silly, so I thought it better to consult the experts on the forum.

tower.home-diagnostics-20220316-2206.zip
trurl Posted March 16, 2022

Feb 19 13:23:12 tower emhttpd: unclean shutdown detected

An unclean shutdown will often result in a few sync errors, and the timestamp agrees with your history. The unclean shutdown triggered a non-correcting parity check:

Feb 19 13:24:23 tower kernel: mdcmd (36): check nocorrect
Feb 19 13:24:23 tower kernel: md: recovery thread: check P Q ...

You later ran a correcting parity check, but disk2 was having problems while that was happening.

Mar 1 04:00:01 tower kernel: mdcmd (37): check
Mar 1 04:00:01 tower kernel: md: recovery thread: check P Q ...
Mar 1 04:00:09 tower kernel: ata5.00: exception Emask 0x50 SAct 0x3fc00 SErr 0x280901 action 0x6 frozen
Mar 1 04:00:09 tower kernel: ata5.00: irq_stat 0x0c000000, interface fatal error
Mar 1 04:00:09 tower kernel: ata5: SError: { RecovData UnrecovData HostInt 10B8B BadCRC }
Mar 1 04:00:09 tower kernel: ata5.00: failed command: READ FPDMA QUEUED
Mar 1 04:00:09 tower kernel: ata5.00: cmd 60/00:50:40:00:00/04:00:00:00:00/40 tag 10 ncq dma 524288 in
Mar 1 04:00:09 tower kernel: res 40/00:88:40:1c:00/00:00:00:00:00/40 Emask 0x50 (ATA bus error)
Mar 1 04:00:09 tower kernel: ata5.00: status: { DRDY }
....and more
Mar 1 04:00:28 tower kernel: md: disk2 read error, sector=7168
Mar 1 04:00:28 tower kernel: md: disk2 read error, sector=7176
Mar 1 04:00:28 tower kernel: md: disk2 read error, sector=7184
...and more

Check the connections on disk2, then disable spindown on disk2 and run an extended SMART test. If that passes, you need to run another correcting parity check, then a non-correcting parity check to verify. The only acceptable result is exactly zero sync errors, so you have been in an unacceptable state for several weeks now.

You should have seen the I/O errors for disk2 in the Errors column on Main - Array Devices. Check to make sure you aren't still getting them when you correct parity again.
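For anyone who wants to pull the same `md: diskN read error` lines out of a long syslog instead of scrolling for them, here is a minimal Python sketch. It assumes the kernel messages have exactly the format quoted above; the function name `collect_read_errors` and the `SAMPLE` list are made up for illustration and are not part of Unraid.

```python
import re
from collections import defaultdict

# Matches kernel lines of the form quoted in the post, e.g.
#   "Mar 1 04:00:28 tower kernel: md: disk2 read error, sector=7168"
# Assumption: the message format is exactly as shown in that log excerpt.
READ_ERROR = re.compile(r"md: (disk\d+) read error, sector=(\d+)")


def collect_read_errors(lines):
    """Return {disk name: [sector, ...]} for every md read error line found."""
    errors = defaultdict(list)
    for line in lines:
        match = READ_ERROR.search(line)
        if match:
            errors[match.group(1)].append(int(match.group(2)))
    return dict(errors)


# Sample input copied from the log excerpt above (hypothetical variable name).
SAMPLE = [
    "Mar 1 04:00:09 tower kernel: ata5.00: failed command: READ FPDMA QUEUED",
    "Mar 1 04:00:28 tower kernel: md: disk2 read error, sector=7168",
    "Mar 1 04:00:28 tower kernel: md: disk2 read error, sector=7176",
    "Mar 1 04:00:28 tower kernel: md: disk2 read error, sector=7184",
]

if __name__ == "__main__":
    for disk, sectors in collect_read_errors(SAMPLE).items():
        print(f"{disk}: {len(sectors)} read errors, first at sector {sectors[0]}")
```

To run it against a real log, pass an open file object in place of `SAMPLE`, since the function accepts any iterable of lines. An empty result after the next correcting check would be consistent with the connection problem being fixed, though it does not replace the SMART test.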
Jerry1111 Posted March 17, 2022 (Author)

I can't see any more errors on disk2; the count in the Errors column is 1024 (a suspiciously round number). I have tried some random reads of files from that disk and no errors were generated, but that doesn't tell me anything about the disk itself. It's only a "weak" indication that the cables are OK (I have reconnected all connectors). I started the extended self-test, we'll see.

Are these 100 errors fully correctable, or do I have to worry that 100 sectors might be gone forever?

The setup is old (well, it was new in 2012), so it's probably time to refresh. Biggest problem - I don't want to change all disks at the same time, to avoid the whole storage being at the same point on the bathtub curve. I only ever had one disk fail in this setup - 1ST500LM021-1KJ152_W621GQWL - 500 GB (sdj), which annoyingly was the only cache at that time. Thankfully it wasn't too difficult to rebuild the VMs and dockers.
trurl Posted March 17, 2022

35 minutes ago, Jerry1111 said: Are these 100 errors fully correctable, or do I have to worry that 100 sectors might be gone forever?

The read errors on disk2 were due to a bad connection. If that is fixed, then those sectors should be read successfully the next time parity is corrected, assuming nothing is actually wrong with disk2, of course.
Jerry1111 Posted April 10, 2022 (Author)

Sorry for reporting back late - got distracted with house stuff. I took the server out, then cleaned and re-seated all of the connectors. I ran an extended SMART test, followed by a read-only parity check - everything is back to normal. Many thanks for the help.

Given my random collection of old disks (and this scare!) it's probably time to start swapping the disks for new ones. I should probably do it slowly, to avoid all of the new disks landing at the same point on the bathtub curve. On the other hand - if it ain't broke...