emnclarke Posted October 30, 2020

Around late September, Unraid (6.9 latest beta, currently beta30) started locking up, and only a hard reboot fixes it. Initially, Docker shuts down and most CPU cores go to 100%. Within 1-5 minutes, the Unraid UI stops responding and the server no longer responds to pings or SSH. Attempting to reboot or shut down from the UI while it's still responsive does not work; the server just enters the unresponsive state. A hard reset is the only way to recover.

I've determined it is extremely likely that this only happens while the Organizr docker is running. It possibly only happens while a browser has Organizr open, but I'm not 100% sure about that. I was having near-daily Unraid crashes, so I spent the last week with Organizr stopped and was crash free. Two nights ago I turned it back on (although I wasn't using it), and yesterday, almost immediately after I started using it, I had another crash.

In the syslog, crashes always start with the following message or something very similar:

Oct 29 10:25:30 Mercury kernel: BUG: kernel NULL pointer dereference, address: 0000000000000402
Oct 29 10:25:30 Mercury kernel: #PF: supervisor read access in kernel mode
Oct 29 10:25:30 Mercury kernel: #PF: error_code(0x0000) - not-present page
Oct 29 10:25:30 Mercury kernel: PGD 0 P4D 0
Oct 29 10:25:30 Mercury kernel: Oops: 0000 [#1] SMP NOPTI
Oct 29 10:25:30 Mercury kernel: CPU: 6 PID: 118105 Comm: php-fpm7 Tainted: P O 5.8.13-Unraid #1
Oct 29 10:25:30 Mercury kernel: Hardware name: Gigabyte Technology Co., Ltd. X399 AORUS Gaming 7/X399 AORUS Gaming 7, BIOS F12 12/11/2019
Oct 29 10:25:30 Mercury kernel: RIP: 0010:fuse_readahead+0x124/0x352

Does anyone have any idea what could be causing this, and any suggestions for how I could fix it so I can keep using Organizr? It's possible the issue has something to do with one of my other dockers being in an iframe, but I don't know why that would be a problem. Yesterday the crash happened while I was looking at NZBGet, NZBHydra, and Radarr v3.

I posted this on the Organizr Discord, but they seem to think it's an Unraid issue since there are no other reports of similar behaviour. I have a number of Unraid plugins and other dockers running, although I've managed to trigger a crash with most dockers and some plugins disabled. I've confirmed it's not the Unraid Nvidia build (crashes happen on stock). I've also disabled the cachedir plugin, which may have been causing some other issues, but crashes still happen.

If it is Organizr causing the crashes, how can I prevent a docker from taking down my whole system? Is there perhaps some obscure conflict I'm not aware of? I appreciate any suggestions and can provide any additional info I missed. Thanks so much for any help!

I've attached diagnostics and the full syslog for yesterday. I've also run several memtests without error. Also attached is a list of hardware and plugins.

Tagging per request from the Organizr Discord: @Roxedus @tronyx

Attachments: mercury-diagnostics-20201030-1740.zip, syslog2020-10-29 copy.txt, hardware.txt, plugins.txt
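For anyone comparing their own logs against the oops quoted in the first post, the following is a minimal sketch (not an Unraid tool) that scans syslog lines for the `BUG: kernel NULL pointer dereference` marker and pulls out the `Comm:` process name and the `RIP:` symbol. The function name `summarize_oopses` and the regular expressions are assumptions made up for illustration, modelled only on the lines shown above; other kernel versions may format these lines differently.

```python
import re

# Patterns modelled on the oops lines quoted in the first post (assumption:
# other reports use the same layout).
OOPS_START = re.compile(r"BUG: kernel NULL pointer dereference")
COMM = re.compile(r"Comm: (\S+)")
RIP = re.compile(r"RIP: \d+:(\w+)\+")


def summarize_oopses(lines):
    """Return one dict per kernel oops found in an iterable of syslog lines."""
    reports = []
    current = None
    for line in lines:
        if OOPS_START.search(line):
            # A new oops begins; later lines fill in its details.
            current = {"first_line": line.strip(), "comm": None, "rip": None}
            reports.append(current)
            continue
        if current is None:
            continue  # ordinary log line before any oops
        if current["comm"] is None:
            match = COMM.search(line)
            if match:
                current["comm"] = match.group(1)
        if current["rip"] is None:
            match = RIP.search(line)
            if match:
                current["rip"] = match.group(1)
    return reports
```

Calling `summarize_oopses(open(path, errors="replace"))` on a saved syslog would give one entry per oops, which makes it easier to check whether every crash involves the same process and the same kernel function.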
emnclarke Posted November 1, 2020 Author
Thanks for the link! The symptoms seem similar enough that it may be related. I'll update after I've tried disabling XMP. It seems strange that it would only be an issue with one docker, though.
Roxedus Posted November 17, 2020
Just visiting to say I see this too, when said container is running, after jumping on the beta (embarrassing, since I made the container). I did manage to get a dump of the syslog as it happened. It makes me believe it's something with the PHP we ship in the container. https://pastebin.com/ji6Ph0cE
emnclarke Posted November 18, 2020 Author
I'm not crazy! I've been testing more and it's not related to my RAM. I've narrowed it down to only happening when NZBGet is open; any chance you use that as well? Specifically, it happens when flipping back and forth between Sonarr/Radarr and NZBGet. I can't seem to get it to trigger without NZBGet open.
Roxedus Posted November 18, 2020
Just leaving it open on the homepage does it for me.
Squid Posted November 18, 2020
5 hours ago, Roxedus said: Just visiting to say I see this too, when said container is running, after jumping on the beta.
If only to rule it out, what is the /config mapping set to? /mnt/cache/... or /mnt/user/...
Nimrad Posted November 18, 2020
5 hours ago, emnclarke said: I'm not crazy! I've been testing more and it's not related to my ram. I've narrowed it down to only happening when NZBGet is open, any chance you use that as well?
It seems I have the same issue as you guys. Just to elaborate, I don't use NZBGet.
Roxedus Posted November 18, 2020
Squid said: If only to rule it out, what is the /config mapping set to? /mnt/cache/... or /mnt/user/...
My config is on user, as it has been for the last 3 years. I will try a few hours on cache (it took 15 minutes to crash after starting it the last time).
Roxedus Posted November 19, 2020
16 hours ago, Roxedus said: I will try a few hours on cache
I have been running it the whole day with cache as /config. Wasn't that FUSE stuff sorted out? As I said, this is the first time I have seen the user mount cause any issues for me (I know other people's track records are way worse in this area).
Squid Posted November 19, 2020
In theory it has. However, it still sometimes doesn't work for some users, for some unknown reason. Thus far no author has ever been able to state why.
Squid Posted November 19, 2020
Are you running 6.9 or 6.8?
Roxedus Posted November 19, 2020
Squid said: Are you running 6.9 or 6.8?
Both OP and I are running 6.9; I am on beta35.
ddozen Posted November 25, 2020
Seems I'm also affected. Getting crashes every 1-4 days. I'm trying to narrow down whether it's related to any specific container. I am on beta35. No NZBGet. Syslog: https://pastebin.com/vGkTwG2U
emnclarke Posted November 25, 2020 Author
3 minutes ago, ddozen said: Seems im also affected. Getting crashes every few 1-4 days. I'm trying to narrow down if its related to any specific container. I am on beta35. No NZBGet syslog: https://pastebin.com/vGkTwG2U
I set my /config to /mnt/cache/... and I haven't had a crash since. Try that if yours isn't.
ddozen Posted November 25, 2020 (edited)
41 minutes ago, emnclarke said: I set my /config to /mnt/cache/... and I haven't had a crash since. Try that if yours isn't.
Are you talking about changing the host path mapped to /config from /mnt/user to /mnt/cache? For example, the default for Organizr is /mnt/user/appdata/organizrv2, and you change it to /mnt/cache/appdata/organizrv2? Did you set it for a specific container or for all of them?
EDIT: Changed it for all of them. Let's hope it helps.
emnclarke Posted November 25, 2020 Author
3 hours ago, ddozen said: Are you talking about setting /config Container Path to /mnt/cache instead of /mnt/user? f.e. default for organzir /mnt/user/appdata/organizrv2 change it to: /mnt/cache/appdata/organizrv2 did you set it for specific container or all of them? EDIT: changed for all of them. Let's hope it helps
You should only need to change it for Organizr if you're just having issues with Organizr.
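The path change discussed in the posts above amounts to swapping the /mnt/user prefix of the host path for /mnt/cache. A minimal sketch of that rewrite follows; `to_cache_path` is a hypothetical helper invented for illustration, not part of Unraid or Docker. It also assumes the appdata files actually live on the cache drive, which the thread does not confirm for every setup; the mapping itself is still changed by hand in the container's settings.

```python
from pathlib import PurePosixPath

USER_SHARE_ROOT = PurePosixPath("/mnt/user")
CACHE_ROOT = PurePosixPath("/mnt/cache")


def to_cache_path(host_path):
    """Map a /mnt/user/... host path to the same location under /mnt/cache.

    Paths that are not under /mnt/user are returned unchanged.
    """
    path = PurePosixPath(host_path)
    try:
        relative = path.relative_to(USER_SHARE_ROOT)
    except ValueError:
        # Not inside the user share (for example already on /mnt/cache).
        return str(path)
    return str(CACHE_ROOT / relative)
```

With the default Organizr path quoted above, `to_cache_path("/mnt/user/appdata/organizrv2")` gives `/mnt/cache/appdata/organizrv2`, which is the value ddozen describes entering.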
jungle Posted December 6, 2020
Same issue from my post on. You guys all stable after moving to /mnt/cache?
Roxedus Posted December 6, 2020
jungle said: Same issue from my post on. You guys all stable after moving to mnt/cache?
I am, yes.
emnclarke Posted December 6, 2020 Author
2 hours ago, jungle said: Same issue from my post on. You guys all stable after moving to mnt/cache?
No crashes since! I am confident that /mnt/user was causing the issue.
Stupifier Posted December 7, 2020
After searching deeper, I believe I am affected by this too! Here is a link to the associated syslog showing the CPU taint related to PHP: https://forums.unraid.net/topic/99741-unraid-crashing-frequently/
I have Organizr active all the time, on 6.9 beta 35. I've been getting crashes almost daily! I already tried all the idle power and global C-states Ryzen BIOS settings, and they didn't help.
I will try turning Organizr off first. Then, if it's stable for a while, I'll flip the appdata from /user to /cache.
jungle Posted December 7, 2020
8 minutes ago, Stupifier said: After searching deeper....I believe I am affected by this too! Here is link to associated syslog showing the CPU Taint related to php. I have Organizr active all the time. 6.9 beta 35. I've been getting crashes almost daily! I already tried all the idle power and global C-States Ryzen BIOS settings and didn't help I will try turning Organizr off first....then if stable a while...I'll flip the appdata from /user to /cache https://forums.unraid.net/topic/99741-unraid-crashing-frequently/
I'm doing the exact same. Something to note too: I leave my Unraid web UI open all day and I've been getting non-stop nginx worker process messages filling up my logs. Supposedly closing out of the UI will stop this, so I've also done that.
Stupifier Posted December 7, 2020
6 minutes ago, jungle said: I'm doing the exact same. Something to note too. I leave my unRAID web UI on all day and I've been getting non stop nginx worker process messages filling up my logs. Supposedly closing out of the UI will stop this so I've also done that.
I never have nginx worker process messages spamming my syslog. But on 6.9 beta 35 the syslog server is broken, so the only way to capture logs is to tail the syslog in a terminal or keep the syslog UI window open indefinitely. Kind of a pain, but OK.
jungle Posted December 7, 2020
Do you leave the GUI open and logged in for days on end?
Stupifier Posted December 7, 2020
2 minutes ago, jungle said: Do you leave the GUI open and logged in for days on end?
Typically yes, and for this troubleshooting's sake I also have the GUI syslog window open so I can capture a syslog when the crash occurs. I have to do this because the syslog server is broken in the 6.9 beta right now.
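As an alternative to keeping a terminal or the syslog window open, the "tail the syslog" approach mentioned above could be scripted. The following is a rough sketch under several assumptions: that Python is available on the server, that the live log is at /var/log/syslog, and that the destination is on storage that survives a reboot. The names `copy_new_lines` and `capture` and the example paths are hypothetical. Each batch of lines is flushed and fsynced so that the last messages before a hard reset have a chance of reaching disk.

```python
import os
import time


def copy_new_lines(source, destination):
    """Copy every line currently available from source to destination.

    Returns the number of lines copied. The destination is flushed and
    fsynced after each batch so the lines can survive a hard reset.
    """
    copied = 0
    for line in iter(source.readline, ""):
        destination.write(line)
        copied += 1
    if copied:
        destination.flush()
        os.fsync(destination.fileno())
    return copied


def capture(source_path, destination_path, poll_seconds=1.0):
    """Follow source_path like `tail -f`, mirroring new lines to destination_path."""
    with open(source_path, "r", errors="replace") as source, \
            open(destination_path, "a") as destination:
        source.seek(0, os.SEEK_END)  # skip what is already in the log
        while True:
            if copy_new_lines(source, destination) == 0:
                time.sleep(poll_seconds)  # nothing new yet, wait and retry
```

It would be started with something like `capture("/var/log/syslog", "/mnt/cache/syslog-capture.txt")` and left running until the next crash; both paths are placeholders to adapt.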