Lexicon Posted August 5, 2022
Hello, I've been seeing some instability with my server since upgrading to 6.10. Sometimes the web UI becomes unresponsive, sometimes the server crashes and I have to reboot it, and quite frequently the parity check doesn't finish and I can't cancel it, which forces a reboot. I'm hoping someone can look at the diagnostics and let me know whether this is 6.10 related or some sort of coincidental hardware issue. These diagnostics were downloaded after a parity check slowed to a crawl and wouldn't let me cancel. When I tried to reboot the server from the web UI it didn't work, and I had to do a hard reset. I didn't have any issues like this on the latest 6.9 version. Thanks!
tower-diagnostics-20220805-1730.zip
trurl Posted August 6, 2022
GPFs (general protection faults) and btrfs csum errors make me suspect RAM. Have you done a memtest lately?
Lexicon Posted August 6, 2022
Thanks for your reply. I ran memtest on this build for many hours in the past and never found an issue. I started a quick memtest just now and, again, nothing so far; it's still running. The BIOS is on fail-safe defaults with no overclocking.
Lexicon Posted August 6, 2022
I'll add that I can't quite remember whether these issues began with the upgrade to 6.10.x or with a USB key failure, but both happened around the same time. I replaced the USB key using a backup that was saved to my array nightly. Is it possible that the drive had been failing for some time, that some of its data was already corrupted when it was backed up, and that the corrupted data on the new USB key is now causing wonky behaviour?
Lexicon Posted August 11, 2022
I downgraded Unraid to 6.9.2 and all stability issues seem to have been resolved. However, I still get a high number of errors during a parity check. What could be causing the stability problems under 6.10.3, and what could be causing the parity errors if my memory checks out fine? Diagnostics from 6.9.2 attached.
tower-diagnostics-20220811-1400.zip
trurl Posted August 11, 2022
Successive parity checks are correcting different sectors, along with
On 8/5/2022 at 8:08 PM, trurl said: "GPFs and btrfs csum errors, make me suspect RAM"
Try the newer memtest at memtest86.com
Lexicon Posted August 11, 2022
The newer version of Memtest is going to be a no-go: it won't boot in legacy BIOS, and my motherboard doesn't support UEFI. Any other suggestions would be appreciated.
trurl Posted August 11, 2022
Run the built-in memtest overnight.
Lexicon Posted August 12, 2022
Found two memory errors after running memtest for 20+ hours. Is there an easy way to figure out which of the 4 sticks is bad? Could this actually be a motherboard fault or some other component? I was planning a server rebuild at some point in the near future, so I'm not sure whether it's worth replacing the memory in this one or just scrapping the whole thing and building new. Thanks for your tips so far!
trurl Posted August 12, 2022
5 minutes ago, Lexicon said: "I was planning a server rebuild at some point in the near future, so not sure if it's worth replacing memory in this one or just scrap the whole thing and build new?"
Unless you don't plan to use this one at all, you need to do something about the memory. You don't even want to try to run any computer unless memory is perfect. Everything goes through RAM: the OS and other executable code, your data, everything. The CPU can't do anything with anything until it is loaded into RAM.
Could be a stick, could be a slot. I guess other parts of the motherboard could be to blame, but you probably can't do anything about that. You will have to memtest different combinations.
You could try to run on 2 sticks, but with no VMs and not many dockers.
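One way to keep track of "memtest different combinations" is sketched below in Python. It is only a bookkeeping aid under two loud assumptions that the thread does not confirm: exactly one stick is bad, and the slots themselves are fine. The stick labels and the function names `pair_runs` and `suspects` are hypothetical. Note also that with errors as rare as two in 20+ hours, a run only counts as a "pass" if it was long enough to be trusted.

```python
from itertools import combinations

# Hypothetical labels for the four sticks; mark the physical sticks to match.
STICKS = ["stick1", "stick2", "stick3", "stick4"]


def pair_runs(sticks):
    """List every two-stick combination that could be installed for one memtest run."""
    return list(combinations(sticks, 2))


def suspects(sticks, failing_runs, passing_runs):
    """Narrow down the bad stick from recorded memtest results.

    Assumes exactly one bad stick and healthy slots (an assumption, not a fact
    established in the thread). A stick is cleared if it took part in any
    passing run; the bad stick must appear in every failing run.
    """
    cleared = {stick for run in passing_runs for stick in run}
    candidates = set(sticks) - cleared
    for run in failing_runs:
        candidates &= set(run)
    return candidates


if __name__ == "__main__":
    print(pair_runs(STICKS))
    # Example log: the first pair showed errors, the second pair ran clean.
    print(suspects(STICKS,
                   failing_runs=[("stick1", "stick2")],
                   passing_runs=[("stick3", "stick4")]))
```

If the suspect follows a slot rather than a stick when the same sticks are moved around, the single-bad-stick assumption no longer holds and the slot or board becomes the candidate instead.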
Lexicon Posted August 12, 2022
Ugh. I'll start the fun exercise of trying various combinations of sticks and slots. 🙂
Lexicon Posted August 14, 2022 (edited)
I noticed some strange behaviour. My G.Skill DDR3 RAM ships with an XMP profile of 1600 MT/s CL9-9-9-24 at 1.5 V. Its SPD rating is 1333 MT/s at 1.5 V. Even though I ran the previous memtests with XMP disabled in the BIOS, it still detected and ran the RAM at 1600 rather than stepping down to 1333, and it used odd timings of 9-9-9-28, so I guess it was still technically overclocked?
Since then I have made several changes:
- I removed some resistors from my Noctua fans to make them run at full speed.
- I swapped one stick of RAM from Channel A with one from Channel B in case my memory kits were mismatched (they came as 2x2GB kits and I have 8 GB total).
- I re-seated each stick.
- I manually set the memory to 1333 with timings of 9-9-9-24.
I ran another memtest for 33 hours and got not a single error. Does that mean that the memory is fine and it just doesn't like to run at 1600 in my system? If the memory is now fine and I continue to see parity check errors, where do I look next?
Also, when I manually set it to 1333 in the BIOS and leave the timings on auto, the BIOS actually wants to use something like 8-8-8-24. Should I let it do that or leave them at 9-9-9-24? Thanks!
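For comparing the timing options mentioned above, it can help to convert CAS latency from clock cycles to nanoseconds. The Python sketch below uses the standard DDR relation (two transfers per clock, so the clock period in ns is 2000 divided by the MT/s rate). The function name `cas_latency_ns` is made up for illustration, and the numbers only compare how tight each setting is; they say nothing about whether a setting is stable on this board, which only testing can show.

```python
def cas_latency_ns(transfer_rate_mts, cas_cycles):
    """Convert a CAS latency in clock cycles to nanoseconds.

    DDR memory moves data twice per clock, so a rate of N MT/s means a clock
    of N/2 MHz and a clock period of 2000/N nanoseconds.
    """
    return cas_cycles * 2000 / transfer_rate_mts


if __name__ == "__main__":
    # The three settings discussed in the post.
    for rate, cl in [(1600, 9), (1333, 9), (1333, 8)]:
        print(f"{rate} MT/s CL{cl}: {cas_latency_ns(rate, cl):.2f} ns")
```

By this measure, CL8 at 1333 MT/s works out to roughly 12 ns, which is tighter than CL9 at 1333 (about 13.5 ns) but still looser than the 11.25 ns of the 1600 CL9 XMP profile.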