dhomas Posted April 10
Hello there! I am still relatively new to unraid, having built my machine and installed it around December 2023. In my time using unraid, it has been quite unstable and I don't know why. I have tried many things to stabilize it, but it has only gotten worse. Here is a timeline of events:

- Installed on 6.2.18 around December 2023
- Imported about 60TB from my old NAS
- Started installing some containers in parallel
- Some containers may have been set up incorrectly, as the system became unstable after installing them; uninstalled all containers
- System was stable-ish
- Upgraded to 6.2.19
- Every so often (maybe once a week), the system would lock up: WebUI unavailable, no response to keyboard presses, no activity on the attached monitor, and a single press of the power button did not trigger a shutdown
- Tried to update the BIOS, which resulted in my RAM no longer being correctly recognized by my motherboard (issue raised with Asus)
- Downgraded the BIOS; tested the RAM extensively (8 passes on MemTest86+, see attached)
- Installed Immich
- Upgraded to 6.2.20
- Now extremely unstable; it cannot even stay up long enough to complete a parity check

The WebUI remains accessible, but all shares disappear. I can shut down via the WebUI. This is also the case if I start in safe mode.

Diagnostics are attached. Thanks for any help! This is driving me bonkers!

undom-diagnostics-20240410-0833.zip
JorgeB Posted April 10
Enable the syslog server and post that after a crash.
dhomas Posted April 10
Syslog is already enabled, but I could not retrieve it because the filesystem was no longer accessible and SCP was not working. I will turn it on again, retrieve the syslog, and post it here.
dhomas Posted April 10
Here is the syslog leading up to the crash (syslog.log) as well as the syslog created upon startup to recover the files (syslog-new.log). Thanks for any help!

syslog.log syslog-new.log
JorgeB Posted April 10
Apr 10 07:42:39 unDOM kernel: shfs[9683]: segfault at 151875514770 ip 0000151875514770 sp 000015187421feb0 error 14 likely on CPU 10 (core 20, socket 0)

SHFS crashed almost right after array start, so you will need to reboot to get the shares back. If this keeps happening, there could be an underlying issue.
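For readers who want to interpret a line like the one quoted above, here is a minimal Python sketch that splits an x86 kernel "segfault at" message into its fields and decodes the page-fault error code. It assumes the code is printed in hexadecimal and uses the standard x86 bit meanings; the function name `decode_segfault` and the flag wording are illustrative, not part of Unraid or any tool mentioned in the thread.

```python
import re

# Matches the x86 kernel segfault message format seen in the quoted log line.
SEGFAULT_RE = re.compile(
    r"kernel: (?P<proc>[^\[\s]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)

# x86 page-fault error code bits: (mask, meaning if set, meaning if clear).
ERROR_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "non-write access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in page table", None),
    (0x10, "instruction fetch", None),
]


def decode_segfault(line):
    """Return the parsed fields of a segfault log line, or None if no match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    code = int(m.group("err"), 16)  # assumption: the kernel prints this in hex
    flags = []
    for mask, if_set, if_clear in ERROR_BITS:
        if code & mask:
            flags.append(if_set)
        elif if_clear is not None:
            flags.append(if_clear)
    return {
        "process": m.group("proc"),
        "pid": int(m.group("pid")),
        "address": int(m.group("addr"), 16),
        "ip": int(m.group("ip"), 16),
        "flags": flags,
    }


line = ("Apr 10 07:42:39 unDOM kernel: shfs[9683]: segfault at 151875514770 "
        "ip 0000151875514770 sp 000015187421feb0 error 14 likely on CPU 10 "
        "(core 20, socket 0)")
result = decode_segfault(line)
print(result["process"], result["flags"])
```

For the shfs line, the faulting address equals the instruction pointer and the instruction-fetch bit is set, which is consistent with the process jumping to an unmapped address. That observation alone does not distinguish a software bug from a hardware fault.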
dhomas Posted April 10
I started up again, and so far there have been no errors after about 2 hours. SHFS crashed within about 1h20m last time. I hope it will complete the parity check this time around (only 22 hours to go!).

I noticed another segfault at startup for a process that I'm unfamiliar with (and for which searching didn't turn up much of use):

Apr 10 06:23:29 unDOM kernel: update-mime-dat[1575]: segfault at 150000 ip 00001517c54af159 sp 00007ffc1f731778 error 4 in libc-2.37.so[1517c536c000+169000] likely on CPU 12 (core 24, socket 0)
Apr 10 06:23:29 unDOM kernel: Code: 77 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 89 f8 48 89 fa c5 f9 ef c0 25 ff 0f 00 00 3d e0 0f 00 00 0f 87 37 01 00 00 <c5> fd 74 0f c5 fd d7 c1 85 c0 74 5b f3 0f bc c0 c5 f8 77 c3 0f 1f

In any case, I'll keep monitoring. When it works, unraid is so very powerful. Coming from a Drobo that only just barely served files (their "DroboApps" were laughable), it's been a blast to use. I just really need it to be stable. Thanks for your help!
JorgeB Posted April 11
Keep monitoring, but different apps segfaulting usually points to a hardware issue, like bad RAM.
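One way to keep monitoring for this pattern is to count segfaults per process in a saved syslog. The following is a minimal Python sketch under the assumption that the log uses the same kernel message format as the lines quoted in this thread; the helper name `segfaults_by_process` is hypothetical.

```python
import re
from collections import Counter

# Captures the process name from kernel lines such as
# "kernel: shfs[9683]: segfault at ...".
SEGFAULT_PROC_RE = re.compile(r"kernel: ([^\[\s]+)\[\d+\]: segfault at ")


def segfaults_by_process(lines):
    """Count segfault messages per process name in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        m = SEGFAULT_PROC_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


# The two segfault lines quoted in this thread, shortened after the error code.
sample = [
    "Apr 10 06:23:29 unDOM kernel: update-mime-dat[1575]: segfault at 150000 "
    "ip 00001517c54af159 sp 00007ffc1f731778 error 4 in libc-2.37.so",
    "Apr 10 07:42:39 unDOM kernel: shfs[9683]: segfault at 151875514770 "
    "ip 0000151875514770 sp 000015187421feb0 error 14",
]
counts = segfaults_by_process(sample)
print(len(counts), "distinct processes segfaulted:", dict(counts))
```

To run it against a real log, pass an open file instead of the sample list, for example `segfaults_by_process(open("syslog.log"))`. Several unrelated process names in the result would match the situation described above, though the count by itself does not prove a hardware fault.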
dhomas Posted April 11
I really thought it was bad RAM, too. But I tested it for over 13 hours and 8 passes. Could a bad USB flash drive cause something like this? It shouldn't, I think, since unraid is loaded to RAM on boot from my understanding. But I think I've had a crash before when I bumped the USB boot drive, so I'm not sure.
JorgeB Posted April 12 (Solution)
A bad flash drive should not cause this. Memtest is only definitive if it finds errors. If you have multiple sticks, try with just one; if the problem persists, try with a different one. That will basically rule out bad RAM.
dhomas Posted April 12
I've got 2 sticks of 48GB. I'll plan some downtime (this is my "production" Plex server) to test the sticks individually. Right now, the server has been up for 1 day 16 hours and the parity check has completed (after the unclean shutdowns). Thanks again for your support!