Docker unresponsive, Unraid at 100% cpu, eventual system crash.


Dreytac
Solved by Dreytac

I shouldn't have said anything... Docker has crashed again... The only hardware from the original build still there is the H200 and the ServeRAID, so it must be one of those causing an issue somehow... Two cores just stay pinned at 100%.

docker-crash-2023-06-30.png

Edited by Dreytac
12 hours ago, ljm42 said:

Unraid 6.12.2 has a different version of Docker, recommend upgrading

 

Unfortunately the crashing was occurring before the upgrade to 6.12 on 6.11 so the Docker version isn't the issue (I was hoping the upgrade to 6.12 would fix the issue). The only issue I've had with 6.12 was the macvlan issue but that's fixed. I upgraded to 6.12.2 just after this crash but that's unrelated to this problem.

  • 3 weeks later...
  • Solution

I finally seem to have fixed this issue! It had NOTHING to do with hardware after all that. I found an obscure Reddit comment from a year ago that mentioned Docker consistently crashing and shares becoming unresponsive after a couple of days. The user worked out that there was a bug in either FuseFS, Docker or Unraid (or a combination) where there is a chance for FuseFS to crash when a Docker container uses a FuseFS share (usually /mnt/user/appdata). Switching all my containers to use /mnt/cache/appdata fixed the problem and they no longer crashed.
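
For anyone making the same switch, the path change can be sketched as below. This is only an illustration, assuming the pool is named "cache" as in this thread; `direct_path` and `remap_volumes` are hypothetical helper names, not part of Unraid or Docker. It only computes the new host paths and does not touch any container templates.

```python
from pathlib import PurePosixPath

# Paths taken from this thread; the pool name "cache" is an assumption
# and may be different on other servers.
FUSE_ROOT = PurePosixPath("/mnt/user/appdata")
DIRECT_ROOT = PurePosixPath("/mnt/cache/appdata")


def direct_path(host_path: str) -> str:
    """Map a host path under /mnt/user/appdata to /mnt/cache/appdata.

    Paths outside /mnt/user/appdata are returned unchanged.
    """
    path = PurePosixPath(host_path)
    try:
        relative = path.relative_to(FUSE_ROOT)
    except ValueError:
        # Not under the appdata user share, so leave it alone.
        return host_path
    return str(DIRECT_ROOT / relative)


def remap_volumes(volumes):
    """Apply direct_path to the host side of (host, container) pairs."""
    return [(direct_path(host), container) for host, container in volumes]


if __name__ == "__main__":
    # Hypothetical volume mappings, for illustration only.
    example = [
        ("/mnt/user/appdata/plex", "/config"),
        ("/mnt/user/media", "/media"),
    ]
    for host, container in remap_volumes(example):
        print(f"{host} -> {container}")
```

Only the appdata mapping is rewritten; other /mnt/user shares are left as they are.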

 

I found an option in Unraid under Global Share Settings that enables "exclusive shares" so as a test I set my containers back to using /mnt/user/appdata and sure enough, after 2 days, it crashed. I restarted, enabled the exclusive shares option (and made sure to change nothing else) and it's now been 8 days (the longest uptime I've had in over a year) since my last crash, seemingly confirming it's an issue with FuseFS, Docker and Unraid.

 

Unfortunately, neither I nor the Reddit poster has been able to pin down or properly report the issue, as it's not consistently reproducible and doesn't generate any logs.

Edited by Dreytac
  • 3 months later...
On 7/16/2023 at 4:52 AM, Dreytac said:

I finally seem to have fixed this issue! It had NOTHING to do with hardware after all that. I found an obscure Reddit comment from a year ago that mentioned Docker consistently crashing and shares becoming unresponsive after a couple of days. The user worked out that there was a bug in either FuseFS, Docker or Unraid (or a combination) where there is a chance for FuseFS to crash when a Docker container uses a FuseFS share (usually /mnt/user/appdata). Switching all my containers to use /mnt/cache/appdata fixed the problem and they no longer crashed.

 

I found an option in Unraid under Global Share Settings that enables "exclusive shares" so as a test I set my containers back to using /mnt/user/appdata and sure enough, after 2 days, it crashed. I restarted, enabled the exclusive shares option (and made sure to change nothing else) and it's now been 8 days (the longest uptime I've had in over a year) since my last crash, seemingly confirming it's an issue with FuseFS, Docker and Unraid.

 

Unfortunately, neither I nor the Reddit poster has been able to pin down or properly report the issue, as it's not consistently reproducible and doesn't generate any logs.

 

I'm having these same symptoms. After being stable for ~6 months, for the past ~2 months I've been crashing daily (or multiple times a day!) and requiring hard reboots. I also notice the CPU pegging randomly in the window just before the crash, and the NetData container shows 100% CPU very consistently, even when I don't happen to catch it as it happens.

 

It happened on 6.12.* and continued after I downgraded back to 6.11.5.

 

I also downgraded plex and a bunch of other dockers back to older versions to see if they were related. 

 

I'm switching all of my appdata paths over to /mnt/cache/appdata to see if that helps.

 

20 hours ago, Terebi said:

 

I'm having these same symptoms. After being stable for ~6 months, for the past ~2 months I've been crashing daily (or multiple times a day!) and requiring hard reboots. I also notice the CPU pegging randomly in the window just before the crash, and the NetData container shows 100% CPU very consistently, even when I don't happen to catch it as it happens.

 

It happened on 6.12.* and continued after I downgraded back to 6.11.5.

 

I also downgraded plex and a bunch of other dockers back to older versions to see if they were related. 

 

I'm switching all of my appdata paths over to /mnt/cache/appdata to see if that helps.

 

I haven't had a single related crash since I made the change of enabling exclusive shares. I feel the issue is with the underlying FuseFS system combined with Docker. Enabling exclusive shares, or changing /mnt/user/appdata to /mnt/cache/appdata, bypasses that system. I still have no idea what is causing the issue in the first place, or if the issue is fixed with current updates (I'm not willing to break something that isn't currently broke to test). I can confirm it started happening during the 6.11 update releases but unsure which one.

 

I didn't end up finding any related Dockers as mine crashed with ANY Docker container accessing /mnt/user/appdata. Interestingly it was only the /mnt/user/appdata share that caused the problem. Other /mnt/user shares operated fine.

5 hours ago, Dreytac said:

I haven't had a single related crash since I made the change of enabling exclusive shares. I feel the issue is with the underlying FuseFS system combined with Docker. Enabling exclusive shares, or changing /mnt/user/appdata to /mnt/cache/appdata, bypasses that system. I still have no idea what is causing the issue in the first place, or if the issue is fixed with current updates (I'm not willing to break something that isn't currently broke to test). I can confirm it started happening during the 6.11 update releases but unsure which one.

 

I didn't end up finding any related Dockers as mine crashed with ANY Docker container accessing /mnt/user/appdata. Interestingly it was only the /mnt/user/appdata share that caused the problem. Other /mnt/user shares operated fine.

 

I already had exclusive shares on, but I also went through and changed my containers to /mnt/cache/appdata. It did not appear to resolve my issue. I re-upgraded to 6.12.4, since 6.11.5 still had the issue. I also reset my BIOS to default settings (and, in particular, disabled C-states) to see if that makes any difference.

 

If I'm still crashing, my next step is going to be installing Unraid on a new USB drive and attaching it to my shares/containers, to make sure there is nothing I screwed up in the Unraid config itself somewhere along the line.

  • 3 months later...

I have been pulling my hair out for nearly a year with the exact same symptoms as OP. It started after completely upgrading all hardware except for hard drives and RAID cards. When it started crashing, I assumed this was either a memory issue or a CPU issue, as my cooling solution could only handle my 11700K 99% of the time. My crashes only seem to happen overnight. But after upgrading the cooler and running a long memtest with no errors, I continue to have seemingly random crashes every 24-120 hours.

 

I will make the change to Exclusive Shares today and see if this resolves the issue. 

On 10/18/2023 at 9:12 AM, Dreytac said:

I haven't had a single related crash since I made the change of enabling exclusive shares. I feel the issue is with the underlying FuseFS system combined with Docker. Enabling exclusive shares, or changing /mnt/user/appdata to /mnt/cache/appdata, bypasses that system. I still have no idea what is causing the issue in the first place, or if the issue is fixed with current updates (I'm not willing to break something that isn't currently broke to test). I can confirm it started happening during the 6.11 update releases but unsure which one.

 

I didn't end up finding any related Dockers as mine crashed with ANY Docker container accessing /mnt/user/appdata. Interestingly it was only the /mnt/user/appdata share that caused the problem. Other /mnt/user shares operated fine.

By "change /mnt/user/appdata to /mnt/cache/appdata" do you mean manually editing all my docker containers to point to that share, or is there a way to do it systemwide?

15 minutes ago, YiddySchlomo said:

By "change /mnt/user/appdata to /mnt/cache/appdata" do you mean manually editing all my docker containers to point to that share, or is there a way to do it systemwide?

If appdata is exclusive to cache, /mnt/user/appdata is the same thing as /mnt/cache/appdata.
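
One way to check this on a given server is to look at where the path actually resolves. A minimal sketch, assuming an exclusive share appears under /mnt/user as a link straight to the pool (an assumption about how 6.12 implements it, not something confirmed in this thread); `bypasses_user_share` is a hypothetical helper name:

```python
import os


def bypasses_user_share(path: str, fuse_root: str = "/mnt/user") -> bool:
    """Return True if `path` resolves to a location outside `fuse_root`.

    Assumption: an exclusive share is exposed as a link to the pool, so
    resolving it lands outside /mnt/user. A regular user share resolves
    to itself and stays inside /mnt/user.
    """
    real = os.path.realpath(path)
    root = os.path.realpath(fuse_root)
    # commonpath equals root only when `real` is root itself or below it.
    return os.path.commonpath([real, root]) != root


if __name__ == "__main__":
    target = "/mnt/user/appdata"
    print(target, "resolves to", os.path.realpath(target))
    print("bypasses the user share layer:", bypasses_user_share(target))
```

If this reports False for /mnt/user/appdata, the share is probably not being treated as exclusive, and containers using that path would still go through the user share layer.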

 

If you want us to take a look to see if you have some other problem you aren't aware of, attach Diagnostics to your NEXT post in this thread.

46 minutes ago, YiddySchlomo said:

I have been pulling my hair out for nearly a year with the exact same symptoms as OP. It started after completely upgrading all hardware except for hard drives and RAID cards. When it started crashing, I assumed this was either a memory issue or a CPU issue, as my cooling solution could only handle my 11700K 99% of the time. My crashes only seem to happen overnight. But after upgrading the cooler and running a long memtest with no errors, I continue to have seemingly random crashes every 24-120 hours.

 

I will make the change to Exclusive Shares today and see if this resolves the issue. 

By "change /mnt/user/appdata to /mnt/cache/appdata" do you mean manually editing all my docker containers to point to that share, or is there a way to do it systemwide?

Changing the shares wasn't the fix for the problem. I have containers running on "/mnt/user/appdata" now and they're running fine.

 

I'm 95% positive that setting "Permit exclusive shares" to "Yes" is what fixed the problem for me. I haven't had a single crash related to this Docker issue since changing that setting.


I have also been having the exact same symptoms. Since I built my server and installed Unraid, it has been crashing every few days; 24 to 120 hours sounds about right on average, with high CPU usage every time. I'm not sure what's causing the CPU spikes. Maybe that's normal, but I'm new to Unraid so I can't tell. I have just made the change for exclusive shares and will report back in a few days. Thank you very much. I've been pulling my hair out as well.


I wanted to come back and update this thread. Again, I was having the EXACT same issue as OP, and like OP I made the change to Exclusive Shares. I made the change on 1/23 and have been up since without crashing. At the frequency I'd been seeing for almost a year, I definitely would have had a crash by now.

 

Clearly there is something wrong with Unraid's FuseFS implementation as previously suspected. Luckily there is a workaround, and hopefully this helps anyone having this same issue in the future.

  • 4 weeks later...

Just wanted to add that I was having intermittent problems too. Then one morning the Docker service started locking up the entire server, with CPU-stall-related log entries, no matter how many times I rebooted. I had set the syslog server to write logs to flash probably 2 years ago, because the server kept becoming unresponsive, and then forgot about it. This eventually corrupted the USB drive (it was 6 years old), or at least Unraid kept saying it was buggered.

I transferred over to a new USB drive, blatted my Docker img file and bam: the same issue started occurring as soon as I began loading up all my existing container configs.

 

Then I stumbled across this thread. Thanks @Dreytac, moving my appdata to exclusive has stopped the crashing.


 

Edited by phyzical