November 24, 2024

New Unraid user. Unable to get my server stable. I've been trying many things over a few weeks, but just cannot figure it out.

I consistently get smartctl segfaults no matter what I try. They can be every few minutes, or an hour or more apart. Eventually the server becomes unresponsive via the webui, and any docker that might be running stops, typically within 12-24hrs. It was a struggle to get the parity check done for the 8TB drive without the server going down.

- I have tried 5 different sticks of RAM, individually and in their matched pairs, and memtested them OK
- I have replaced the CPU (i5-7600k to i7-7700). Latest BIOS on board, unchanged for many years.
- Ran in safe mode, with array offline
- Replaced cache drive from a very old (with SMART warning) 120GB SSD to a new 64GB NVMe (from a Steam Deck)
- Removed PCIe SATA board, just using motherboard ports instead
- Removed GPU
- Ensured CPU is not overclocked or running XMP
- Removed all non-essential plugins
- Has a brand new PSU

The only things left undiagnosed are my drives, the motherboard and the USB flash drive.

3x 4TB drives are carried over from my previous Windows server, and I added a new 8TB parity drive. I did have a SMART error on two of those 4TB drives several weeks back in Windows. It kept popping up until I told Windows to ignore the error and rebuild the array. I haven't had a problem since. I believe it errored due to a power failure, or shock (my 2 year old son slammed the door of the cupboard the server was in!). I think the drives are reporting OK now. I've just started another full SMART test on all three, in case it shows anything.

This is an example of the error I see in the logs:

It's taken from the server right now, whilst running in safe mode. Diagnostics from last night attached.

Please let me know if I can provide any further information. Thank you!

unraidserver-diagnostics-20241123-2320.zip

Edited November 24, 2024 by Toothy
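For anyone wanting to track how often the segfaults described above occur, here is a minimal sketch that tallies them per process. It assumes the usual Linux kernel message shape, `name[pid]: segfault at ...`; the function name `count_segfaults` and the regular expression are illustrative, not part of Unraid or smartmontools, and the log lines in this thread may differ in detail.

```python
import re
from collections import Counter

# Assumed shape of a kernel segfault message: "<process>[<pid>]: segfault at ..."
# This pattern is an assumption for illustration; adjust it if your log differs.
SEGFAULT_RE = re.compile(r"(?P<proc>[^\s\[\]]+)\[(?P<pid>\d+)\]: segfault at")


def count_segfaults(lines):
    """Return a Counter mapping process name -> number of segfault lines seen."""
    counts = Counter()
    for line in lines:
        match = SEGFAULT_RE.search(line)
        if match:
            counts[match.group("proc")] += 1
    return counts
```

To run it against a real log, pass any iterable of lines, for example an open file handle for the syslog (on many Linux systems this is `/var/log/syslog`, but treat that path as an assumption for your setup). Comparing the counts before and after removing a component gives a rough check on whether a change made a difference.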
November 24, 2024 Author

Quick update: All SMART tests completed without error. I did notice a zpool segfault error every time I switched between drives on the self-test tab. I've never been able to induce an error before now, but it's different to the one in my OP.
November 25, 2024 Community Expert

I seem to remember someone recently having only smartctl segfaults. In the end they found what was causing the issue, but I don't remember what the problem was, only that it wasn't obvious. I tried a quick forum search but didn't find it; you may have better luck.
November 29, 2024 Author

I've not been able to find anything by searching.

I've ended up buying a different motherboard now, and I STILL have the same faults. I've basically bought a whole new system at this point, when the idea was to reuse my old hardware. I've also bought an Unraid licence because my trial was coming to an end, not knowing if I can resolve the issue. I've spent so many hours at this point trying different things, and spent a fair amount of money to do it.

The only things I've not replaced are the SATA cables and the drives themselves (however I still get the faults when the array isn't running), and the flash drive. Oh, and the power supply, which was brand new for the build. Everything else I've either temporarily removed and eliminated as the fault, or totally replaced. I've tried lots of settings in the BIOS too, but maybe I've missed something there.

Very frustrating.

Edited November 29, 2024 by Toothy
November 29, 2024 Community Expert

Try booting with a different flash drive using a stock install to see if it's config related.
November 29, 2024 Author

Tried doing as you suggested and it threw 3 errors within 10 mins of the server being up. I'll pull the power to all the drives next, even though they're not in an array, and see if that helps. I'm not sure I've done that yet.
November 30, 2024 Community Expert

13 hours ago, Toothy said: "I'll pull the power to all the drives next,"

Worth a try. Errors on a stock install appear to confirm it's something related to the hardware.
November 30, 2024 Author

Some success, finally. Removing all the HDDs didn't make a difference, but removing the NVMe cache drive did!! The server has been up for 4+ hours with zero errors.

The drive I was using is a "Foresee E2M2 064G", which is a spare from a Steam Deck. I am waiting for a new NVMe drive to arrive, so I can run without cache for now.

Previous to this I was using an old 120GB SSD, but that did have a SMART error, so I swapped it for the Steam Deck one. I am wondering if I introduced some error from the SSD that carried over to the NVMe. I just hope I don't get the errors again when I install the new NVMe drive.
December 1, 2024 Community Expert Solution

15 hours ago, Toothy said: "but removing the NVME cache drive did!!"

Yes, now that you mention it, I think this was also the case for the other user I mentioned. Thanks for testing, I'll take note of this for any future cases.
December 7, 2024 Author

Swapped in a 1TB SN770 NVMe drive, and I have yet to see any errors. Potentially there was some incompatibility with the Steam Deck NVMe, or an error carried over when I swapped the cache drive from the failing one.

Thank you for the support.