[6.12.x] Completely inexplicable, random crashes for 2+ months

wug · October 27, 2023

I rolled back to 6.12.2, and it's already crashed. If I recall correctly, I actually upgraded to 6.12.3 from a lower version (6.12.0 or 6.12.1), so just to be thorough, I've now rolled back all the way to 6.11.5. I have a parity check in progress now, so let's see if I can successfully complete one of these for the first time in almost three months!

bastl · October 28, 2023

I'am kinda in the same boat as you. Random crashes on all 6.12 releases I tested. I also tried all sorts of combination with Dockers started or stopped, same with VMs. 14h Memtest with 0 errors. Smart values from the disks show no errors. There is no clear indication what causes the crash for me. Sometimes during the night when idle, sometimes during the day on low load or even when transcoding a video with tdar. 30min after a fresh reboot it crashes/freezes on the next the run the server is stable for 3-4 days and as you experienced nothing in the logs. It's kinda frustrating. I'am back on 6.11.5 and it was stable for 11 days. I had a power outage 3 days ago and since than also no crash.

EofChris · October 28, 2023

I just want to leave a possible solution here, that worked for me after days of struggling. I don't know if this is an OS or hardware problem on my side, but turning off the cpu graphic unit in the BIOS of my Supermicro X11SSH-LN4F completly solved all problems.

This however means that you can no longer pass the gpu to your virtualization and in my case I also encountered crashes on 6.11.5, so this might not be the same problem.

schale01 · October 28, 2023

6 hours ago, EofChris said:

I just want to leave a possible solution here, that worked for me after days of struggling. I don't know if this is an OS or hardware problem on my side, but turning off the cpu graphic unit in the BIOS of my Supermicro X11SSH-LN4F completly solved all problems.

This however means that you can no longer pass the gpu to your virtualization and in my case I also encountered crashes on 6.11.5, so this might not be the same problem.

Interesting I also have a similar super micro board. The last stable system version that worked for me was 6.10.4. That version was rock solid and I never had any hangs. I've reported this hanging issue myself and seen it reported multiple times elsewhere. At this point I've just moved off of unRaid as it seems that I will be unable to upgrade to a stable version for my system configuration anytime soon.

Edited October 28, 2023 by schale01

wug · October 28, 2023

It's crashed three more times since yesterday. Each time, it crashed not too long after starting a parity check.

As part of my next debugging step, I reset the motherboard to default settings, and I now have an interesting clue. The system booted up just fine and the parity disk was missing. It remained missing after another reboot. All other disks are present and available:

It's not super likely to be a bad cable, because it's a SAS to SATA cable and the other three connected drives are available. I pulled the parity disk out, stuck it in the external USB drive caddy, and it shows up just fine. But now, another disk is missing:

Baffling. When I switched the Disk 7 to the SATA connector that parity was connected to, it was still missing.

I wonder if some BIOS setting was causing the system to miss that certain disks are having issues... somehow? But that barely makes sense. I'm gonna do some more fiddling with it to see if I can get it behaving.

wug · October 28, 2023

OKAY. I got all disks to be available again.

I'm now with a mobo that was reset to default settings (and it hasn't been updated away from the version that was stable for a year) and a nearly-fresh install of Unraid 6.11.5.

I'll start a new parity check and report back if things have been resolved or if further debugging is needed.

wug · October 29, 2023

It crashed again with the mobo reset to default settings.

I think one interesting clue is that after I rolled back to 6.11.5, when the server crashes, it actually seems to do a full system reset. I keep opening it up to find that it's crashed but is now waiting ready to go once again. This is in contrast to how it would require a long press on the power button to forcibly shut it down because it was TOTALLY unresponsive.

I'm going to try to graphics settings someone else described above, and if it crashes again, I'll try upgrading the BIOS.

wug · October 30, 2023

It crashes again after upgrading the mobo firmware. I'm officially out of ideas.

Interestingly, when it has crashed during these parity checks, according to the hourly array health notifications, it seems to crash in 4-5 hours, but then I'll log in the next day to find that the system has restarted and has had uptime for 12+ hours (with the array offline).

I'll note that the last few parity checks were in maintenance mode, so it's unlikely to be a filesystem-related issue.

I think this evening, I'll run a live Debian environment on this system instead of Unraid, so that I can make sure all of my disks are backed up to tape. I'll report back on system stability when it's just running Debian, because if it's still crashing, then presumably, it's hardware. If it's not crashing, then we can assume that it's Unraid being incompatible with this particular hardware for some reason.

[6.12.x] Completely inexplicable, random crashes for 2+ months

Debugging Process

Logging Process

Fresh Install

Unraid 6.12.4 is Fundamentally Broken on Some Systems?

User Feedback

Recommended Comments

wug 0

Link to comment

bastl 208

Link to comment

EofChris 0

Link to comment

schale01 2

Link to comment

wug 0

Link to comment

wug 0

Link to comment

wug 0

Link to comment

wug 0

Link to comment

Join the conversation