Strange Unraid crashes possibly linked to a docker



Around late September, Unraid (6.9 latest beta, currently beta30) started locking up, and it requires a hard reboot to recover. Initially, Docker shuts down and most CPU cores go to 100%. Within 1-5 minutes, the Unraid UI stops responding and the server no longer responds to pings or SSH. Attempting to reboot or shut down from the UI while it's still responsive does not work and just puts the server into the unresponsive state. A hard reset is the only way to fix this.

 

I've determined it is extremely likely that it only happens while the Organizr docker is running. It possibly only happens while a browser has Organizr open, but I'm not 100% sure about that. I was having near-daily Unraid crashes, so I spent the last week with Organizr stopped and had no crashes. Two nights ago I turned it back on (although I wasn't using it), and yesterday, almost immediately after I started using it, I had another crash. In the syslog, crashes always start with the following message or something very similar:

 

Oct 29 10:25:30 Mercury kernel: BUG: kernel NULL pointer dereference, address: 0000000000000402 
Oct 29 10:25:30 Mercury kernel: #PF: supervisor read access in kernel mode 
Oct 29 10:25:30 Mercury kernel: #PF: error_code(0x0000) - not-present page 
Oct 29 10:25:30 Mercury kernel: PGD 0 P4D 0 
Oct 29 10:25:30 Mercury kernel: Oops: 0000 [#1] SMP NOPTI 
Oct 29 10:25:30 Mercury kernel: CPU: 6 PID: 118105 Comm: php-fpm7 Tainted: P O 5.8.13-Unraid #1 
Oct 29 10:25:30 Mercury kernel: Hardware name: Gigabyte Technology Co., Ltd. X399 AORUS Gaming 7/X399 AORUS Gaming 7, BIOS F12 12/11/2019 
Oct 29 10:25:30 Mercury kernel: RIP: 0010:fuse_readahead+0x124/0x352
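For anyone comparing their own logs against the excerpt above, here is a minimal Python sketch that pulls the same three details (the BUG line, the process name, and the faulting kernel function) out of a syslog. The regular expressions are only modelled on the lines quoted in this post, and the function name `summarize_oopses` is made up for illustration; other kernel versions may format these lines differently.

```python
import re

# Patterns modelled on the excerpt quoted in this post (an assumption, not a spec).
BUG_RE = re.compile(r"kernel: BUG: (?P<bug>.+?)\s*$")
COMM_RE = re.compile(r"kernel: CPU: \d+ PID: (?P<pid>\d+) Comm: (?P<comm>\S+)")
RIP_RE = re.compile(r"kernel: RIP: \S+?:(?P<func>[\w.]+)\+")


def summarize_oopses(lines):
    """Return one dict per 'BUG:' line, with the process and function that follow it."""
    reports = []
    current = None
    for line in lines:
        m = BUG_RE.search(line)
        if m:
            # A new BUG line starts a new report.
            current = {"bug": m.group("bug"), "pid": None, "comm": None, "func": None}
            reports.append(current)
            continue
        if current is None:
            continue  # ignore everything before the first BUG line
        m = COMM_RE.search(line)
        if m and current["comm"] is None:
            current["pid"] = int(m.group("pid"))
            current["comm"] = m.group("comm")
            continue
        m = RIP_RE.search(line)
        if m and current["func"] is None:
            current["func"] = m.group("func")
    return reports


# Usage sketch (the file name is a placeholder):
# with open("syslog.txt", errors="replace") as fh:
#     for report in summarize_oopses(fh):
#         print(report)
```

If every report shows a FUSE function such as `fuse_readahead`, that points toward the user share layer rather than one particular container, though that is only an inference from this single trace.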

 

Does anyone have any ideas what could be causing this and any suggestions for how I could fix this so I can keep using organizr? It's possible the issue is something to do with one of my other dockers being in an iframe but I don't know why that would be an issue. Yesterday the crash happened while I was looking at nzbget, nzbhydra, and radarr v3.

 

I posted this on the Organizr discord but they seem to think it's an unraid issue since there are no other reports of similar behaviour. I have a number of unraid plugins and other dockers running although I've managed to trigger a crash with most dockers and some plugins disabled. I've confirmed it's not the unraid Nvidia build (crashes happen on stock). I've also disabled the cachedir plugin which may have been causing some other issues but crashes still happen. 

 

If it is Organizr causing the crashes, how can I prevent a docker from taking down my whole system? Is there perhaps some obscure conflict I'm not aware of?

 

I appreciate any suggestions and can provide any additional info I missed. Thanks so much for any help!

 

I've attached diagnostics and the full syslog for yesterday. I've also run several memtests without error. Also attached a list of hardware and plugins.

Tagging per request from organizr discord: @Roxedus @tronyx 

mercury-diagnostics-20201030-1740.zip syslog2020-10-29 copy.txt hardware.txt plugins.txt

Link to comment
  • 3 weeks later...

I'm not crazy! I've been testing more and it's not related to my RAM. I've narrowed it down to only happening when NZBGet is open. Any chance you use that as well? It happens specifically when flipping back and forth between Sonarr/Radarr and NZBGet. I can't seem to trigger it without NZBGet open.

Link to comment
16 hours ago, Roxedus said:

I will try a few hours on cache

I have been running it the whole day with cache as /config. Wasn't that FUSE stuff sorted out? As I said, this is the first time I have seen using the user mount cause any issues for me (I know other people's track records are way worse in this area).

Link to comment
41 minutes ago, emnclarke said:

I set my /config to /mnt/cache/... and I haven’t had a crash since. Try that if yours isn’t.

Are you talking about changing the host path mapped to /config from /mnt/user to /mnt/cache?

For example:

Default for Organizr:

/mnt/user/appdata/organizrv2

change it to:

/mnt/cache/appdata/organizrv2

 

Did you set it for a specific container or all of them?

 

EDIT: changed for all of them. Let's hope it helps
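To make the path change concrete, here is a small Python sketch of the mapping being described. It only rewrites the string; it does not touch any container settings. The helper name `to_cache_path` is hypothetical, and the sketch assumes the appdata folder really lives on the cache pool, which is worth checking before pointing a container at it.

```python
from pathlib import PurePosixPath

# Roots taken from the paths discussed in this thread.
USER_SHARE_ROOT = PurePosixPath("/mnt/user")
CACHE_ROOT = PurePosixPath("/mnt/cache")


def to_cache_path(host_path):
    """Map a /mnt/user/... host path to the matching /mnt/cache/... path.

    Paths outside /mnt/user are returned unchanged.
    """
    path = PurePosixPath(host_path)
    try:
        relative = path.relative_to(USER_SHARE_ROOT)
    except ValueError:
        return str(path)  # not under /mnt/user, leave it alone
    return str(CACHE_ROOT / relative)


print(to_cache_path("/mnt/user/appdata/organizrv2"))
```

The idea, as far as this thread suggests, is that the /mnt/cache path skips the FUSE-backed user share that shows up in the crash trace.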

Edited by ddozen
Link to comment
3 hours ago, ddozen said:

Are you talking about changing the host path mapped to /config from /mnt/user to /mnt/cache?

For example:

Default for Organizr:

/mnt/user/appdata/organizrv2

change it to:

/mnt/cache/appdata/organizrv2

 

Did you set it for a specific container or all of them?

 

EDIT: changed for all of them. Let's hope it helps

You should only need to change it for Organizr if you're just having issues with Organizr.

Link to comment
  • 2 weeks later...

After searching deeper, I believe I am affected by this too! Here is a link to the associated syslog showing the CPU taint related to php. I have Organizr active all the time. 6.9 beta 35.

I've been getting crashes almost daily!

I already tried all the Ryzen BIOS settings for idle power and global C-states, and they didn't help.

 

I will try turning Organizr off first. Then, if it's stable for a while, I'll flip the appdata from /user to /cache.

 

https://forums.unraid.net/topic/99741-unraid-crashing-frequently/

Link to comment
8 minutes ago, Stupifier said:

After searching deeper, I believe I am affected by this too! Here is a link to the associated syslog showing the CPU taint related to php. I have Organizr active all the time. 6.9 beta 35.

I've been getting crashes almost daily!

I already tried all the Ryzen BIOS settings for idle power and global C-states, and they didn't help.

 

I will try turning Organizr off first. Then, if it's stable for a while, I'll flip the appdata from /user to /cache.

 

https://forums.unraid.net/topic/99741-unraid-crashing-frequently/

I’m doing the exact same. Something to note, too: I leave my unRAID web UI open all day and I’ve been getting non-stop nginx worker process messages filling up my logs. Supposedly closing out of the UI will stop this, so I’ve also done that.

Link to comment
6 minutes ago, jungle said:

I’m doing the exact same. Something to note, too: I leave my unRAID web UI open all day and I’ve been getting non-stop nginx worker process messages filling up my logs. Supposedly closing out of the UI will stop this, so I’ve also done that.

I never have nginx worker process messages spamming my syslog. But on 6.9 beta 35 the syslog server is broken, so the only way to capture logs is to tail the syslog in a terminal or keep the syslog UI window open indefinitely. Kind of a pain, but OK.

Link to comment
2 minutes ago, jungle said:

Do you leave the GUI open and logged in for days on end?

Typically yes. And for the sake of this troubleshooting, I also keep the GUI syslog window open so I can capture a syslog when the crash occurs. I have to do this because the syslog server is broken in 6.9 beta right now.
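As an alternative to keeping a window open, here is a rough Python sketch that mirrors new syslog lines into a second file and forces each write to disk, so most of the log survives a hard lock-up. The function names and both paths in the usage comment are assumptions for illustration, not something confirmed in this thread; the destination needs to be on storage that persists across a reboot.

```python
import os
import time


def copy_new_bytes(src_path, dst_path, offset):
    """Append everything in src_path past `offset` to dst_path; return the new offset."""
    size = os.path.getsize(src_path)
    if size < offset:
        offset = 0  # source was truncated or rotated, so start again from the top
    if size == offset:
        return offset  # nothing new yet
    with open(src_path, "rb") as src, open(dst_path, "ab") as dst:
        src.seek(offset)
        data = src.read()
        dst.write(data)
        dst.flush()
        os.fsync(dst.fileno())  # push to disk so a hard reset loses as little as possible
    return offset + len(data)


def mirror_forever(src_path, dst_path, interval=1.0):
    """Poll src_path and keep appending new content to dst_path until interrupted."""
    offset = 0
    while True:
        offset = copy_new_bytes(src_path, dst_path, offset)
        time.sleep(interval)


# Usage sketch (both paths are assumptions; adjust for your system):
# mirror_forever("/var/log/syslog", "/boot/logs/syslog-mirror.txt")
```

Polling once a second is a simple trade-off: the last second or so before a crash may still be lost, and frequent writes will add wear if the destination is a flash drive.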

Link to comment
