omartian Posted May 26, 2021

Hi everyone,

I ran my monthly parity check (for the first time with "write corrections to parity" unchecked) and came up with 9 errors. In each of the last 2 months I had 1 error, and the month before that I had 9. I don't recall any unclean shutdowns in the last 30 days. About once a week I get a message from Unraid stating "the connection to your UPS has been restored" for my APC UPS, but the power hasn't gone out in months. Under the Main tab, all of my disks show 0 errors.

Is this something to be worried about? Any idea how I can identify which files/disks are involved? My diagnostics are attached. I'm currently on 6.8.3 and haven't updated to 6.9.2.

The other issue is that my Krusader and Plex Docker containers won't let me update them. They show "version not available". I've restarted both containers and clicked the check for updates box. Any idea? Are the 2 problems somehow related?

Thanks in advance.

nasgard-diagnostics-20210526-1506.zip
JorgeB Posted May 27, 2021

You're overclocking the RAM. Ryzen with overclocked RAM is known to corrupt data, resulting in sync errors; see here.
ChatNoir Posted May 27, 2021

Under 6.8, the "not available" part is normal; Docker changed things on their side. The best solution is to upgrade to 6.9. If that is not possible, you can find a workaround there. (Also look at the go file if you want the workaround to survive a reboot.)
omartian Posted May 27, 2021 (edited)

7 hours ago, JorgeB said:
> You're overclocking the RAM, Ryzen with overclocked RAM is known to corrupt data resulting in sync errors, see here.

That's strange. I updated the BIOS a few months ago but don't recall manually overclocking anything. I'll take a look and see. Hoping that's the issue.

How would I identify what the 9 errors are? Should I check "write corrections to parity" for my next check?
omartian Posted May 27, 2021

5 hours ago, ChatNoir said:
> Under 6.8, the "not available" part is normal, Docker changed stuff on their side. The best solution is to upgrade to 6.9. If not possible, you can find a workaround there. (also look for the go-file if you want it to survive a reboot)

Awesome. I'll just upgrade. Thank you.
omartian Posted May 27, 2021 (edited)

10 hours ago, JorgeB said:
> You're overclocking the RAM, Ryzen with overclocked RAM is known to corrupt data resulting in sync errors, see here.

I checked in the BIOS and XMP was enabled for my RAM. I disabled the XMP profile, which took the speed from 3200 down to 2100 MHz. Hopefully that won't affect my Plex server performance. I'll try a correcting check now, hoping for only 9 errors.
JorgeB Posted May 27, 2021

The first check after fixing the problem can still find errors, but after that it should always be 0 errors, which is the only acceptable number of sync errors.
omartian Posted June 1, 2021 (edited)

On 5/27/2021 at 1:43 PM, JorgeB said:
> First check after fixing the problem can still find errors, but after that it should always be 0 errors, which is the only acceptable number of sync errors.

So I just ran two parity checks. After switching off XMP on my RAM, I ran a non-correcting check and only got 2 errors, which I thought was weird since I was expecting the 9 from before. I attached that diagnostic.

I then went to my server and re-seated all my SATA cables and my SATA/SAS adapter (LSI SAS 9207-8i SATA/SAS 6Gb/s PCI-E 3.0 Host Bus Adapter IT Mode SAS9207-8i US). I decided to run another non-correcting check and received 6 errors. Diagnostics attached below.

I noticed that while the check is running, I get about 65% of the way through with 0 errors. It seems like the parity errors occur when it gets to disks 6 and 7 (or maybe just 7). Do you think it might be that connector or that disk, based on the diagnostics?

2 errors noncorrecting check.zip
6 errors noncorrecting check.zip
itimpi Posted June 1, 2021

The first set of diagnostics shows the parity check starting, but the log only continues for a few minutes after that, so there is no indication of which sectors had the problem. Are you sure they were taken AFTER the 2 errors occurred?

The syslog in the second diagnostics shows which sectors had the problem, as it has these entries:

May 31 13:51:10 Nasgard kernel: md: recovery thread: Q incorrect, sector=18795850688
May 31 14:32:14 Nasgard dhcpcd[1903]: br0: failed to renew DHCP, rebinding
May 31 15:52:33 Nasgard emhttpd: spinning down /dev/sdj
May 31 15:52:33 Nasgard emhttpd: spinning down /dev/sdh
May 31 15:52:33 Nasgard emhttpd: spinning down /dev/sdi
May 31 15:53:47 Nasgard kernel: md: recovery thread: Q incorrect, sector=20466240808
May 31 18:14:49 Nasgard kernel: md: recovery thread: Q incorrect, sector=22417914176
May 31 19:46:33 Nasgard kernel: md: recovery thread: Q incorrect, sector=23565254352
May 31 20:12:20 Nasgard kernel: md: recovery thread: Q incorrect, sector=23959071080
May 31 20:38:19 Nasgard emhttpd: spinning down /dev/sdb
May 31 20:38:19 Nasgard emhttpd: spinning down /dev/sdc
May 31 21:30:28 Nasgard kernel: md: recovery thread: Q incorrect, sector=25097677136

The "Q incorrect" lines indicate the error sectors. However, since there were no corresponding entries in the syslog in the first diagnostics, it is not possible to see whether the same sectors reported errors in both checks.
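For anyone who wants to pull the affected sectors out of a syslog without reading it by eye, here is a minimal sketch. It assumes the message format matches the "recovery thread: Q incorrect, sector=..." lines quoted in this post; accepting "P" or "PQ" in the same position is an assumption, since only "Q" appears in this thread. The function name `find_error_sectors` is made up for illustration and is not part of Unraid or any plugin.

```python
import re

# Pattern modelled on the syslog lines quoted in this thread.
# Matching "P" or "PQ" as well as "Q" is an assumption; only "Q" is shown here.
SECTOR_RE = re.compile(r"md: recovery thread: ([PQ]+) incorrect, sector=(\d+)")


def find_error_sectors(lines):
    """Return (parity_label, sector) tuples for each sync-error line, in log order."""
    hits = []
    for line in lines:
        match = SECTOR_RE.search(line)
        if match:
            hits.append((match.group(1), int(match.group(2))))
    return hits


# A few lines copied from the diagnostics quoted above.
sample = [
    "May 31 13:51:10 Nasgard kernel: md: recovery thread: Q incorrect, sector=18795850688",
    "May 31 14:32:14 Nasgard dhcpcd[1903]: br0: failed to renew DHCP, rebinding",
    "May 31 15:53:47 Nasgard kernel: md: recovery thread: Q incorrect, sector=20466240808",
]

errors = find_error_sectors(sample)
for label, sector in errors:
    print(label, sector)
```

To run it against a real log, pass an open file object instead of `sample` (for example the syslog text file inside the diagnostics zip). Comparing the sector lists from two checks shows whether the errors repeat at the same locations or move around.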
itimpi Posted June 1, 2021

It has just occurred to me that if you have the Parity Check Tuning plugin installed, you might be able to investigate this far more rapidly using its Tools -> Parity Problem Assistant feature. That feature was developed for exactly your scenario, but I have never had any feedback on how useful it turns out to be in practice, so I would be interested to get some (plus any suggestions for making it more useful).
omartian Posted June 1, 2021

18 minutes ago, itimpi said:
> It has just occurred to me that if you have the Parity Check Tuning plugin installed then you might be able to investigate this far more rapidly using it's Tools -> Parity Problem Assistant feature? That feature was developed for exactly your scenario but I have never had any feedback on how useful it turns out to be in practice so would be interested to get some (plus any suggestions for making it more useful).

Will check out that plugin. Thank you.

Weird. Could have sworn I downloaded the 2-error diagnostics after a full scan. Anything else you can make out of those bad sectors?
omartian Posted June 1, 2021

23 minutes ago, JorgeB said:
> Run memtest.

Will do. Are you seeing anything in the newer diagnostics that indicates a memory issue, or is this based on the original diagnostics in the OP?
JorgeB Posted June 1, 2021

The #1 reason for unexpected sync errors is RAM related. If just downclocking didn't fix it, it could be a bad DIMM.
omartian Posted June 2, 2021

On 6/1/2021 at 4:18 AM, JorgeB said:
> #1 reason for unexpected sync errors is RAM related, if just downclocking didn't fix it it could be a bad DIMM.

Memtest is currently on pass 9 and has been running for about 20 hours with 0 errors. At this point, should I run a correcting check, or is there anything else I should be doing?
JorgeB Posted June 2, 2021

You can, but the problem is still likely there.
omartian Posted June 2, 2021

1 hour ago, JorgeB said:
> You can, but problem is still likely there.

I'm going to try that Parity Check Tuning plugin next, but any other suggestions on next steps?
JorgeB Posted June 2, 2021

If there are still errors on consecutive checks, you basically need to rule out the hardware involved. RAM is still a good candidate even without memtest finding errors, but it could also be the board/CPU or a disk. I would start by using just one DIMM at a time, since it's the easiest thing to rule out.
omartian Posted June 2, 2021 (edited)

38 minutes ago, JorgeB said:
> If there are still errors on consecutive checks you basically need to rule out the hardware involved, RAM is still a good candidate even without memtest finding errors, but could also be board/CPU or a disk, I would start by using just one DIMM at at time since it's the easiest thing to rule out.

OK. One DIMM at a time on a non-correcting check.

Do these sync errors mean that the media files on the data disks are no longer valid, or just that there is a discrepancy with the parity?

I'm wondering if I fixed this issue already, but since I never ran a correcting check, the same random sync issues keep popping up. If I run a correcting check now, I'm hoping the next non-correcting check would be clean.

I wish Unraid made it easier to isolate the issue. Too many variables...
JorgeB Posted June 2, 2021

34 minutes ago, omartian said:
> Do these sync error mean that the media files on the data disk are no longer valid or just that there is a discrepancy w the parity.

It means parity doesn't match what is calculated from the array's data devices. But with data corruption the problem can be anywhere: the data could have been written already corrupt, the parity could be wrong, or just the calculation at that time could be wrong.
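The point that a mismatch alone cannot say where the fault lies can be illustrated with a toy example. This is only a sketch using simple XOR parity over three made-up values; it is not how Unraid computes its second ("Q") parity, which uses a different calculation, and the names below are illustrative only.

```python
from functools import reduce


def xor_parity(blocks):
    """Single XOR parity over a list of integer 'blocks' (toy model only)."""
    return reduce(lambda a, b: a ^ b, blocks, 0)


# Three hypothetical data "disks", one small block each, plus their parity.
data = [0b1011, 0b0110, 0b1100]
parity = xor_parity(data)

# Case A: a bit flips on data disk 0, parity is untouched.
corrupt_data = [data[0] ^ 0b0100, data[1], data[2]]
syndrome_a = xor_parity(corrupt_data) ^ parity

# Case B: the data is fine, but the same bit flips in the stored parity.
corrupt_parity = parity ^ 0b0100
syndrome_b = xor_parity(data) ^ corrupt_parity

# Both cases are reported as a sync error, and the difference between
# calculated and stored parity is identical, so the check cannot tell
# which device holds the bad bit.
print(syndrome_a != 0, syndrome_b != 0, syndrome_a == syndrome_b)
```

A third possibility, a transient error in RAM or the CPU during the check itself, would look the same in the log even though nothing on disk is wrong, which is why repeating checks and comparing sectors is useful.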
omartian Posted June 2, 2021

14 minutes ago, JorgeB said:
> It means parity doesn't match de calculate from the arrays data devices, but with data corruption the problem can be anywhere, could be already written corrupt, could be parity that is wrong, or just the calculation at that time is wrong.

Thanks for all of your help, Jorge. I'll keep tinkering and run a correcting check.
omartian Posted June 3, 2021

On 6/1/2021 at 2:50 AM, itimpi said:
> It has just occurred to me that if you have the Parity Check Tuning plugin installed then you might be able to investigate this far more rapidly using it's Tools -> Parity Problem Assistant feature? That feature was developed for exactly your scenario but I have never had any feedback on how useful it turns out to be in practice so would be interested to get some (plus any suggestions for making it more useful).

Which sectors should i point it to. I have a hard time

So I tried using this plugin, but I'm getting an error message. Based on the syslog that Jorge highlighted above, it looks like the errors happen between sectors 18795850000 and 25097678000.

When I try to set it to go, I get the error message: "end point too large: The end has been set to more than the size of the disk."

I punched in the above numbers as the start and end points because it's asking for sector numbers. Do I need to adjust them somehow?
itimpi Posted June 3, 2021

3 hours ago, omartian said:
> Which sectors should i point it to. I have a hard time

At the moment you have to manually search the syslog for the affected sectors (they will typically be near the end if you have just run a check). I was intending to add a button on the input page that would scan the syslog and pop up a dialog with any sectors found, so that it is much easier both to know which sectors are involved and to select them. I had been waiting for some feedback from people trying to use this assistant and finding it useful before putting the work in to implement that option. Sounds as if it is definitely going to be wanted.

3 hours ago, omartian said:
> So tried using this plugin, but getting an error message. Based on the syslog that jorge highlighted above, it looks like the error happens between sectors 18795850000 and 25097678000

Looks like you added an extra 0 on the end. I will improve the error message to include the acceptable range, which might help with picking this up.
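As a rough illustration of the two ideas in this post (deriving a start/end range from the logged sectors, and an error message that states the acceptable range), here is a small sketch. It is not the plugin's code: `range_around`, `validate_range`, the margin, and the disk size are all hypothetical, and the real disk size in sectors would have to come from the system itself.

```python
def range_around(sectors, margin=1000):
    """Return a (start, end) range covering all logged sectors plus a margin."""
    start = max(0, min(sectors) - margin)
    end = max(sectors) + margin
    return start, end


def validate_range(start, end, disk_sectors):
    """Reject ranges outside the disk, naming the acceptable range in the error."""
    last = disk_sectors - 1
    if start < 0 or start > end:
        raise ValueError(f"invalid range {start}..{end}; acceptable range is 0 to {last}")
    if end > last:
        raise ValueError(
            f"end point too large: {end} is beyond the disk; acceptable range is 0 to {last}"
        )
    return start, end


# Sectors copied from the "Q incorrect" syslog lines quoted earlier in the thread.
error_sectors = [
    18795850688, 20466240808, 22417914176,
    23565254352, 23959071080, 25097677136,
]

# Placeholder only: NOT the real size of any disk in this array.
HYPOTHETICAL_DISK_SECTORS = 27_343_750_000

start, end = range_around(error_sectors)
print(validate_range(start, end, HYPOTHETICAL_DISK_SECTORS))
```

With a message like this, a mistyped end point is easier to spot because the largest valid sector number is shown next to the value that was entered.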
Tigerherz Posted June 3, 2021

I had similar problems. It turned out to be a problem with spinning down disks. I set spin down to "never" in Disk Settings and cleared the stats on the Main page. I think my controller has a problem with spin down.
omartian Posted June 3, 2021 (edited)

4 hours ago, Tigerherz said:
> I had similar problems. It was a problem with spin down disks. I set spin down in disksettings to never and clear stats on the mainpage. I think my controller has a problem with spin down.

Do you disable spin down only for parity checks, or at all times? If your disks are spinning all the time, isn't that bad for longevity? Also, if you got an error this way, did you run a correcting check afterwards?