May 17, 2024
The last few parity check results have left me a bit concerned. My disks all show green, and the Error Count column shows 0 for all disks, but when the parity check runs, it always comes back with a large number of errors. In the most recent run, it reported 28541795 of them. This has been the case for the last several parity runs, but I don't know exactly when the errors started, as the check error count is below the fold and not something I generally paid attention to. I'm seeing the following message flooding the logs, but I don't know if it has any bearing on this:
kernel: program smart smartctl is using a deprecated ISCSI ioctl, please convert it to SG_IO.
Honestly, I don't really know where to start with this, and I'm hoping someone can point me in the right direction. In my experience, a disk throwing that many errors on a parity check should show other signs of failure, and I'm worried that smartctl is failing and lying about the health of my parity disk.
May 17, 2024 Community Expert
Have you recently replaced the parity drives with larger ones? I ask because there has been an intermittent bug (for which I believe the cause has not been tracked down) where the part of the parity disk beyond the size of the largest data drive (at the time the parity drive was installed) does not get zeroed correctly. The parity check then reports errors when it reaches that part of the parity drive. Providing your diagnostics, so we can see at what point the errors start, might give a clue. Since you get no read/write errors on the drives, there is a good chance that you need to either run a correcting parity check or simply rebuild parity.
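One rough way to test the "errors start beyond the largest data drive" theory is to convert the first reported error sector into a byte offset and compare it with the size of the largest data disk. The sketch below is only an illustration: the function names are hypothetical, and it assumes the sector numbers in the syslog are counted in 512-byte units, which should be confirmed before relying on the result.

```python
# Assumption: sector numbers logged by the parity check are 512-byte units.
SECTOR_SIZE = 512


def error_offset_bytes(sector, sector_size=SECTOR_SIZE):
    """Convert a logged sector number into a byte offset on the disk."""
    return sector * sector_size


def is_beyond_largest_data_disk(sector, largest_data_disk_bytes,
                                sector_size=SECTOR_SIZE):
    """Return True if the error lies past the end of the largest data disk.

    Past that point every data disk contributes zeros, so single parity
    should also be all zeros there.
    """
    return error_offset_bytes(sector, sector_size) >= largest_data_disk_bytes


if __name__ == "__main__":
    # Hypothetical example: a 4 TB largest data disk and an early error sector.
    largest = 4_000_000_000_000
    print(is_beyond_largest_data_disk(874360, largest))
```

If the first error sits well inside the range covered by the data disks, the un-zeroed-parity explanation becomes less likely and other causes are worth looking at.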
May 17, 2024 Author
Attaching diags. I haven't changed the physical disks, though I did move the SATA connector to another port because the one I was using was giving me trouble with CRC errors.
nas02-diagnostics-20240517-0835.zip
May 17, 2024 Community Expert
The log is being spammed with syslog server errors. Fix that, reboot to clear the logs, then run two consecutive non-correcting checks without rebooting, and post new diags.
May 18, 2024 Author
Corrected the syslog configuration and reran the non-correcting parity check twice. Posting new diagnostics.
nas02-diagnostics-20240518-1602.zip
May 19, 2024 Community Expert
Looks like it found the same errors. Confirm by looking at the parity history on Main; if it matches, run a correcting check and then a non-correcting one to confirm it's OK.
May 19, 2024 Author
The main log shows those same messages:
kernel: program smart smartctl is using a deprecated ISCSI ioctl, please convert it to SG_IO.
I've run correcting parity checks already, and they throw the same messages about ioctl and SG_IO.
May 20, 2024 Community Expert
12 hours ago, h0rnman said: kernel: program smart smartctl is using a deprecated ISCSI ioctl, please convert it to SG_IO.
I cannot see where those started, since the syslog has rotated. Reboot before running the parity check, and if those errors start again, post new diags.
May 21, 2024 Author
So I did a little more digging on my end, and I managed to figure out two things. First, multiple items are writing these log entries. I was able to determine that one of them was the Disk Locator plugin. I don't *actually* need that one, so I uninstalled it, and now those messages are gone. I did reboot and run another non-correcting parity check, which resulted in much the same behavior (several million errors detected, but it came back as Parity is Valid). I also ran a short and a long SMART test on the disk, and both came back clean. Attaching another diag; I'm also re-running the non-correcting check in case something with the plugin was acting up, and will post when it's done. So far, syslog is just showing several of these:
kernel: md: recovery thread: P incorrect, sector=874360
followed by
kernel: md: recovery thread: stopped logging
so not a lot of help.
nas02-diagnostics-20240521-1518-pre-ParityCheck.zip
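For anyone comparing two non-correcting runs by hand, a small script can pull the reported sectors out of each syslog and check whether both runs flagged the same ones. This is a minimal sketch, not an Unraid tool: the function names are hypothetical, the pattern is based only on the log line quoted above, and because the kernel stops logging after a number of entries, it can only compare the first errors of each run.

```python
import re

# Pattern taken from the log line quoted in this thread.
P_INCORRECT = re.compile(r"md: recovery thread: P incorrect, sector=(\d+)")


def parity_error_sectors(lines):
    """Return the sector numbers from 'P incorrect' lines, in log order."""
    sectors = []
    for line in lines:
        match = P_INCORRECT.search(line)
        if match:
            sectors.append(int(match.group(1)))
    return sectors


def same_errors(first_run, second_run):
    """True if both runs logged exactly the same set of sectors."""
    return sorted(first_run) == sorted(second_run)


if __name__ == "__main__":
    sample = [
        "kernel: md: recovery thread: P incorrect, sector=874360",
        "kernel: md: recovery thread: stopped logging",
    ]
    print(parity_error_sectors(sample))
```

Identical sectors across consecutive non-correcting runs suggest parity that is consistently wrong on disk, which a correcting check should repair. Sectors that differ from run to run point more toward a hardware or connection problem.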
May 22, 2024 Community Expert
Did you run a correcting check already? In any case, run one correcting check, then a non-correcting one, without rebooting, and post new diags.
May 25, 2024 Author
OK, so I ran another non-correcting check (which still had all the errors), then I restarted, ran a correcting check (again with errors), then a non-correcting check (which came back all clear). So it looks like whatever gremlin was in the parity check is gone, and the last correcting check appears to have executed successfully. I'm still somewhat concerned that there's a scenario where parity can report as valid even though the checks report errors, and with all of the checking and rebooting, the root cause is still a mystery.
May 26, 2024 Community Expert
See if any more errors are detected on future checks. If there are, there's likely still a problem; if not, you should be OK.