Unraid crashes randomly, takes a few restarts to come back up

March 14, 2024

Hey,

I have been getting random crashes, and then usually it keeps crashing until I pull out the USB and manually edit the config to disable Docker on boot. Sometimes I have to do this 2 or 3 times before the system lets me turn the Docker service back on.
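For anyone wanting to script that manual edit rather than do it by hand on another machine, here is a minimal Python sketch. It assumes the setting lives in a shell-style `KEY="value"` file on the flash drive. The key name `DOCKER_ENABLED` and the idea that it sits in a file like `config/docker.cfg` are assumptions to verify against your own USB stick, and `set_cfg_value` is a hypothetical helper name.

```python
import re


def set_cfg_value(text: str, key: str, value: str) -> str:
    """Return cfg text with KEY="value" set, appending the line if the key is missing."""
    # Match the whole assignment line but stop before any line ending,
    # so CRLF files edited on Windows keep their line endings.
    pattern = re.compile(rf"^{re.escape(key)}=[^\r\n]*", re.MULTILINE)
    new_line = f'{key}="{value}"'
    if pattern.search(text):
        # Replace every existing assignment of the key.
        return pattern.sub(lambda _match: new_line, text)
    # Key not present: append it and end the file with a newline.
    sep = "" if text.endswith("\n") or not text else "\n"
    return f"{text}{sep}{new_line}\n"


if __name__ == "__main__":
    # Example content only; OTHER_SETTING is a placeholder, not a real Unraid key.
    sample = 'DOCKER_ENABLED="yes"\nOTHER_SETTING="1"\n'
    print(set_cfg_value(sample, "DOCKER_ENABLED", "no"))
```

The function only transforms text, so reading and writing the actual file (and keeping a backup copy of it) is left to the caller.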

I also had one USB become corrupted (according to the Unraid UI), I assume due to the hard shutdowns I have to perform when the entire system locks up.

So I have been googling these issues, and the most common suggestions I see are:
* make sure you're not using macvlan (I'm not)

* a dodgy container. If this is the case, is there really no other way to test besides just not running containers for a while?

The other thing to note is that this started occurring a lot more when I replaced my mobo and CPU (i7-14700K, Z790).

Attached is a snippet of the panic when this lockup occurs. One thing I noticed was `kernel tried to execute NX-protected page - exploit attempt? (uid: 99)`. Should I be worried?

Also, I am being spammed by port reallocations and IPv6 reallocations for one port. Reading up on this, most people say it's normal and it'll just be one container causing pain, but is it normal to log this much non-stop? We're talking 3 eth interface renames every second (also attached).
This ended up being a Docker container stuck in a reboot loop, oops...
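To put a number on that kind of log spam, a rough Python sketch like the one below counts rename events per second in a saved syslog. The line shape it matches (`... kernel: eth0: renamed from veth...`) is an assumption about how these messages look, so adjust the pattern to the attached log. `renames_per_second` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed syslog shape:
#   "Mar 15 01:49:12 Tower kernel: eth0: renamed from veth1a2b3c4"
RENAME_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s.*\brenamed from (?P<old>veth\w+)"
)


def renames_per_second(lines):
    """Count veth rename events per syslog timestamp (one-second resolution)."""
    counts = Counter()
    for line in lines:
        match = RENAME_RE.search(line)
        if match:
            counts[match.group("ts")] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample lines, for illustration only.
    sample = [
        "Mar 15 01:49:12 Tower kernel: eth0: renamed from veth1a2b3c4",
        "Mar 15 01:49:12 Tower kernel: eth0: renamed from veth5d6e7f8",
        "Mar 15 01:49:13 Tower kernel: eth0: renamed from veth9a8b7c6",
    ]
    for timestamp, count in renames_per_second(sample).most_common(3):
        print(timestamp, count)
```

A steady stream of several renames every second, for minutes on end, points at a container restarting in a loop rather than normal container start/stop noise.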

I was getting an actual CPU panic a few weeks ago, and moving my appdata to an exclusive share seemed to fix that issue.

Thanks!

crash1.txt

Explore-logs-2024-03-15 01 49 12.txt

Edited March 14, 2024 by phyzical

March 14, 2024

Community Expert

The single call trace doesn't give a solid clue, at least not to me. If you haven't yet, run memtest. Another thing you can try is to boot the server in safe mode with all Docker containers/VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning the other services back on one by one.

March 14, 2024

Author

Thanks for the suggestion. It is new RAM and I have never memtested it, so I'll give that a go next time it crashes to rule it out.

It's just like all the other threads like this, of course: it's almost impossible to replicate. For example, the server had been running for around a week, and me spinning down the array is what locked it up. The even crazier part was that I just kept getting served the boot screen, so I thought I had lost another USB within a week, but once I manually edited the config on a Windows machine it started booting Unraid again, though with crashes until, I think, the 3rd boot.

March 14, 2024

Please post in the proper section of the forum.

Not there:

[image]

I'll move it.

March 15, 2024

Author

Another crash, but there are about 4 in this one, and they don't even seem to be the same issue, so I am suspecting hardware more now...

I updated the BIOS this morning just in case.

Explore-logs-2024-03-15 09 06 18.txt

March 15, 2024

I'm having a similar problem, and I'm running an i9-14900K with an Asus TUF Gaming Z790-PRO. I've tried narrowing down the scope of the problem, and for me it seems to happen when I have Docker running. Mine seems to have problems whenever I start my Immich container, but I thought the same about a MusicBrainz container and it's persisted anyway, so I'm entirely unsure if that's the real culprit.

March 15, 2024

Author

I'm not using either of those containers, but it is "isolated" to something Docker-adjacent.

I'm more suspicious of heavy network + hard drive activity, i.e. I find it seems to occur when I'm transcoding via tdarr, but not always. Like there's something fishy in a media file that causes a CPU panic?

The weird part is that the main server where tdarr and the shares live doesn't actually do any of the transcoding, but it does host a shared cache for the slave PCs to use, plus the media they will transcode. Though as files get changed, it would incur load for the RAID system and also for any containers that are watching for file changes, i.e. Jellyfin, Sonarr, Radarr.

So this would incur lots of back and forth between 3 PCs, and it would also cause load on multiple hard drives at once, plus the cache drive.

I also have two PCI devices to support additional hard drives (maybe that plays into things).

I wouldn't be surprised if there's just a lot going on at once and it gives up, though I had no issue on my old mobo until it was fried one morning (maybe my old mobo got fried by something that is now causing crashes instead? 😆)

Edited March 15, 2024 by phyzical

March 18, 2024

Author

Another small update:

I changed my cache disk from btrfs to ZFS, and now, instead of the CPU panics crashing the entire Unraid system, they seem to cause certain containers to fail. For example, the last crash seemed to kill Jellyfin, spitting ffmpeg errors, but I don't know if it's container-specific, as the entire Docker process refuses to kill the container, even with a docker rm.

Then if I try to shut down the system, it gets stuck after unmounting everything, saying "clean shutdown" "mounting /boot readonly".

March 18, 2024

Community Expert

If you have issues with two different filesystems, it suggests to me a hardware problem. Start by running memtest; if no errors are found, try using just one stick of RAM, and if it's the same, try the other one. That will basically rule out a RAM issue.

March 22, 2024

Author

Small update:

Ran memtest with both sticks: all passed after 4 hours. Ran it with 1 stick: all passed in about 2.5 hours.

Then Unraid took 6 reboots with constant crashes as soon as Docker started. I manually disabled the Docker service and it came alive again. One thing I did notice is that my Docker image was actually still a btrfs image, so I figured screw it, I'll try the directory option instead, so it's now using the ZFS cache disk instead of a btrfs Docker image file.

As soon as I started installing my images I noticed similar segfaults occurring, but it all came up fine. It has been running for 2 days now. I have noticed more segfaults, but instead of eventually killing or locking up the entire Docker process, it's letting the crashing containers gracefully release the CPU threads?

I'll post back next week if I don't need to intervene this time. So far, at least, this avoids the problem. Why I'm getting segfaults, I don't know. Hardware issues? But if it keeps chugging along, what do I care 🤞
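One rough way to tell "one bad container" apart from "hardware" is to tally which processes the segfaults hit. The Python sketch below does that for a saved syslog. The line shape (`kernel: name[pid]: segfault at ...`) is an assumption about the kernel's message format, and `segfaults_by_process` is a hypothetical helper name. Treat the result as a hint, not a diagnosis.

```python
import re
from collections import Counter

# Assumed kernel log shape:
#   "Mar 22 10:00:01 Tower kernel: ffmpeg[12345]: segfault at 0 ip ... sp ... error 4"
SEGFAULT_RE = re.compile(
    r"kernel:\s+(?P<proc>[^\s\[]+)\[(?P<pid>\d+)\]: segfault at "
)


def segfaults_by_process(lines):
    """Count segfault log entries per process name."""
    counts = Counter()
    for line in lines:
        match = SEGFAULT_RE.search(line)
        if match:
            counts[match.group("proc")] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample lines, for illustration only.
    sample = [
        "Mar 22 10:00:01 Tower kernel: ffmpeg[12345]: segfault at 0 ip 0 sp 0 error 4",
        "Mar 22 10:05:09 Tower kernel: node[222]: segfault at 8 ip 0 sp 0 error 6",
        "Mar 22 10:07:42 Tower kernel: ffmpeg[12399]: segfault at 0 ip 0 sp 0 error 4",
    ]
    for proc, count in segfaults_by_process(sample).most_common():
        print(proc, count)
```

If the crashes are spread across many unrelated programs, that leans towards RAM, CPU, or power settings; if they all come from one program, that container is the first thing to look at.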

Edited March 22, 2024 by phyzical

March 22, 2024

Community Expert

9 minutes ago, phyzical said:

Why I'm getting segfaults, I don't know. Hardware issues?

Possibly.

March 26, 2024

Author

Okay, it has now been stable for almost a week, which I have not had since getting the new hardware. Again, I haven't found the true cause of the panics.

But if you feel like you're experiencing something similar, the following should hopefully keep it chugging along:

* make sure your cache drive is not btrfs

* make sure your Docker image is not btrfs

* make sure you're not using macvlan

* run memtest just in case

In my case I'm 99% sure I'm avoiding the lockups just by moving off btrfs.

March 26, 2024

Author

Ah, actually, spoke too soon... I stopped the array juuust in case, and again the system locked up and took 3 restarts to come back on, so I am kinda leaning towards it being bad data in the array causing issues?

I have since noticed this: [image]

and if I use tab completion: [image]

It comes up as pokmon but doesn't exist, apparently? I'm wondering if I just have some bad blocks in the array? But I'm not sure how to remove these, as they technically don't exist?

Has anyone experienced this before?

March 26, 2024

Community Expert

Post new diags.

March 27, 2024

Author

Sure, nothing I can see that's different to the others, but attached.

Explore-logs-2024-03-27 10 57 31.txt

The other reason I am suspicious of this directory is that it could have to do with the startup loop issue, as that panic is right after Jellyfin starts up and Jellyfin has this in its logs

Edited March 27, 2024 by phyzical

March 27, 2024

Community Expert

There are multiple call traces logged, but I can't tell if they are hardware or software. You can try running for a while without any containers and, if it's stable, start them one by one and re-test.

March 27, 2024

Author

Yeah thanks, I was about to do exactly this. I also noticed the logs I posted were missing some entries, I think due to the timing around the syslog server starting up and what ends up in the saved file. syslog.txt, attached, is the same log but with the full startup, in case it leads to anything else.

April 3, 2024

Author

Another small update:

It started crashing once an hour while performing heavy CPU-intensive workloads. We're talking 95-100°C for 3 minutes straight, which made me think I might have a cooling issue.

Turns out the i7-14700K just runs realllly hot.

Anyway, that then led me to this Intel thread: https://community.intel.com/t5/Processors/Unstable-i7-14700k/m-p/1569028

After applying method 2, which is
```
Method 2
Access BIOS
select "Tweaker"
select "Advanced Voltage Settings"
select "CPU/VRAM Settings"
adjust "CPU Vcore Loadline Calibration"
recommend starting from "Low" to "Medium" until system is stable.
```

the same intense task ran for 12 hours straight without a crash... so my issues may have just been the CPU freaking out about not getting enough oomph due to a shitty mobo default.

Will post back and close if it stays up for a week 🤞

April 8, 2024

Author
Solution

Hasn't crashed.

If you think you're running into this, try all the generic stuff first.

Changing the FS from btrfs to XFS/ZFS seemed to reduce the issues, as the CPU panics would no longer cause system lockups, but it was not the root cause.

It just looks like my new CPU kept panicking due to default power restrictions, even without any CPU boosting on... it's just a greedy mofo.

July 23, 2024

Author

Quick update for future readers: the latest BIOS update for my mobo almost removed these issues. It looks like they apply BIOS changes similar to the above to keep it "less greedy".

July 28, 2024

On 7/23/2024 at 4:30 AM, phyzical said:

Quick update for future readers: the latest BIOS update for my mobo almost removed these issues. It looks like they apply BIOS changes similar to the above to keep it "less greedy".

I think you and I are having the same problem. If you haven't already seen the statement from Intel on 13th/14th gen CPUs, I would recommend checking it out. They are going to be releasing a microcode update in mid-August. I have also been having the same problem, and I just recently updated my BIOS. This has seemingly stabilized things for me, too. However, I would recommend keeping an eye out for another BIOS update in August. https://community.intel.com/t5/Processors/July-2024-Update-on-Instability-Reports-on-Intel-Core-13th-and/m-p/1617113

Additionally, you may want to consider an RMA for your CPU, just to eliminate any concerns that it could have permanent damage. I'm weighing whether to attempt an RMA for mine. I think I'm going to wait for the BIOS update in August for now. This article touches more on the potential permanent damage to affected CPUs. https://www.techradar.com/computing/cpu/intel-admits-damage-to-unstable-14th-gen-and-13th-gen-cpus-is-permanent-incoming-patch-is-a-preventative-not-a-cure

July 28, 2024

Author

@Zacaronii thanks for the links. Yeah, I don't know if I'll bother with an RMA either. Just know that if they start, everyone will, and they will do everything they can to delay and avoid.

Fingers crossed the next patch does the job.

September 11, 2024

Author

Can confirm: since the latest wave of BIOS updates it hasn't died once.


Solved by phyzical
