February 16, 2023

Unraid 6.11.5

The server had been running (mostly) flawlessly since 2016. About a month ago, I decided to update the firmware/BIOS for my motherboard (it had lots of Spectre/Meltdown fixes and such). Things were going okay for about a day, then I started getting full system freezes (dead UI, no SSH access, no Docker/VM access, SMB shares inaccessible). Recovering requires going into IPMI to force a shutdown/reboot.

I initially thought it was tied to a VM (Nvidia passthrough), as the system would die the moment I spun a particular VM up, but I've since had the issue with all VMs and the Docker service disabled (the freeze can be triggered by an SMB transfer).

Troubleshooting:
- Physically removed the GPU and a cache pool drive that had some SMART errors
- Memtest on RAM
- SMART drive tests
- btrfs scrubs, xfs_repair runs, parity checks
- Disabled everything except SMB shares
- Toggled P and C states in BIOS, looked for other relevant settings that might have changed
- Syslog server enabled, no entries during crash
- CPU/disk temps are fine
- Changed network cables
- Changed switch ports
- Rolled back BIOS/firmware (rolled it forward again after no change)

Currently, the system can be stable for a couple of days of light usage (Home Assistant, Plex, casual VMs), but eventually, if I try to transfer files over SMB, it might freeze. Once I reboot the system, I can transfer the same file(s) (Linux ISOs) to the same share(s) just fine (shares are set to NO cache). The freeze has also been triggered by adding a few files to torrent managers (I've used Deluge and Transmission; both have caused a freeze).

I still have a few things I am going to try (removing RAM sticks, substituting hardware), but it is really frustrating to have so little visibility into what is causing the freezes. Is there anything I'm missing? I've attached diags and a small snippet of syslog that contained a freeze.

quiet-server-diagnostics-20230215-2338.zip
syslog_snippet.txt
February 16, 2023
Community Expert

Nothing relevant is logged; this and the symptoms suggest a hardware problem.
January 31, 2024
Author

For anybody who happens on this thread: I just kept changing hardware, eventually switching out the NIC, and it seems to have been resolved since then (I put the GPU and other hardware back in without issue). I've since used that "bad" NIC in other machines without issue, so I have no clue whether something else was causing the problem, but Unraid has been working flawlessly since I changed it out.
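
For anyone chasing a similar freeze that shows up under network load (SMB transfers, torrents) and ends up suspecting the NIC, one cheap check is to watch the kernel's per-interface error counters before and after a transfer. The sketch below is only an illustration, not something used in this thread: it assumes a Linux host that exposes counters under /sys/class/net/<iface>/statistics, and the interface name `eth0` and the helper names `read_nic_counters` and `diff_counters` are made up for the example. A hard freeze like the one described may well leave these counters untouched, so a clean result does not rule the NIC out.

```python
from pathlib import Path

# Standard Linux sysfs counters; a missing one is reported as None.
COUNTERS = ("rx_errors", "tx_errors", "rx_dropped", "tx_dropped", "rx_crc_errors")


def read_nic_counters(iface, sysfs_root="/sys/class/net"):
    """Return a dict of error/drop counters for one interface.

    sysfs_root is a parameter only so the function can be pointed at a
    test directory; on a real system the default should be left alone.
    """
    stats_dir = Path(sysfs_root) / iface / "statistics"
    counters = {}
    for name in COUNTERS:
        try:
            counters[name] = int((stats_dir / name).read_text().strip())
        except (OSError, ValueError):
            counters[name] = None  # counter missing or unreadable
    return counters


def diff_counters(before, after):
    """Return only the counters that changed between two snapshots."""
    changed = {}
    for name, old in before.items():
        new = after.get(name)
        if old is not None and new is not None and new != old:
            changed[name] = new - old
    return changed


if __name__ == "__main__":
    import sys
    import time

    # "eth0" is an assumed default; pass the real interface name as an argument.
    iface = sys.argv[1] if len(sys.argv) > 1 else "eth0"
    before = read_nic_counters(iface)
    time.sleep(10)  # run the suspect transfer during this window
    after = read_nic_counters(iface)
    print(diff_counters(before, after) or "no counter changes")
```

Running it once while idle and once during a large SMB copy gives two results to compare; steadily climbing error or drop counts during the copy would be a hint to look at the NIC, cable, or switch port first.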