-
Docker failed to start error, server occasionally crashes
Update on this: adding extra RAM was the solution, and my server has been up for 42 days without issues. Thanks, JorgeB!
-
Docker failed to start error, server occasionally crashes
Hi JorgeB, thanks for the suggestion. I will install additional RAM this weekend to see if that helps.
-
Docker failed to start error, server occasionally crashes
After rebooting, my CPU is hovering at 45%, but I am unable to see any container using the resources in Netdata. This is what I see in the terminal with the "htop" command. Should I be concerned about the repeated "docker run --log-opt max-size=50m --log-opt max-file=1 --log-level=fatal --storage-driver=btrfs my-container"? Thanks.
-
Docker failed to start error, server occasionally crashes
Feb 16: it crashed. I rebooted the server and backed up appdata.

Feb 17: I upgraded to unRAID 7.0.0 without any issues! I updated all plugins and containers. I also applied the suggested method linked above and rebooted; it uses around 500 MB of RAM to ensure the OS files always stay in memory.

Feb 18: I was unable to connect to the server while at work, and I confirmed that the webUI was unresponsive. I was able to Putty into the server, although it was very slow. Logging in took about 5 mins, and I was able to generate diagnostics before rebooting; please see the attached.

Reading the syslog, I am getting a bunch of nginx alerts, followed by php-fpm warnings. Any help is much appreciated.

Feb 18 00:30:14 unRAID nginx: 2025/02/18 00:30:14 [alert] 11587#11587: worker process 17638 exited on signal 6
Feb 18 00:30:14 unRAID nginx: 2025/02/18 00:30:14 [alert] 11587#11587: worker process 2093576 exited on signal 6
Feb 18 00:30:18 unRAID nginx: 2025/02/18 00:30:18 [alert] 11587#11587: worker process 2093577 exited on signal 6
Feb 18 00:30:21 unRAID nginx: 2025/02/18 00:30:21 [alert] 11587#11587: worker process 2093754 exited on signal 6
Feb 18 00:30:21 unRAID nginx: 2025/02/18 00:30:21 [alert] 11587#11587: worker process 2093797 exited on signal 6
Feb 18 00:30:21 unRAID nginx: 2025/02/18 00:30:21 [alert] 11587#11587: worker process 2093798 exited on signal 6
Feb 18 00:30:22 unRAID nginx: 2025/02/18 00:30:22 [alert] 11587#11587: worker process 2093799 exited on signal 6

unraid-diagnostics-20250218-1807.zip
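For anyone wanting to summarize alerts like these instead of reading them line by line, here is a minimal sketch that tallies nginx worker exits by signal number. It assumes Python 3 is available wherever the saved syslog is analyzed (it is not part of stock Unraid as far as I know), and the function name `count_worker_exits` is made up for illustration. On Linux, signal 6 is SIGABRT.

```python
import re
from collections import Counter

# Matches the nginx alert lines quoted in the post above.
WORKER_EXIT = re.compile(r"worker process (\d+) exited on signal (\d+)")


def count_worker_exits(lines):
    """Return a Counter mapping signal number -> number of worker exits."""
    counts = Counter()
    for line in lines:
        match = WORKER_EXIT.search(line)
        if match:
            counts[int(match.group(2))] += 1
    return counts


# Three of the lines from the post, copied verbatim.
sample = [
    "Feb 18 00:30:14 unRAID nginx: 2025/02/18 00:30:14 [alert] 11587#11587: worker process 17638 exited on signal 6",
    "Feb 18 00:30:14 unRAID nginx: 2025/02/18 00:30:14 [alert] 11587#11587: worker process 2093576 exited on signal 6",
    "Feb 18 00:30:18 unRAID nginx: 2025/02/18 00:30:18 [alert] 11587#11587: worker process 2093577 exited on signal 6",
]

for signal_number, total in count_worker_exits(sample).items():
    print(f"signal {signal_number}: {total} worker exits")
```

To run it against a whole file, pass an open file object, e.g. `count_worker_exits(open("syslog.txt"))`, where the file name is a placeholder.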
-
Docker failed to start error, server occasionally crashes
Hi JorgeB, I was hesitant to update to v7.0 as I thought it could bring up additional issues. I will update it to 7.0 and apply the suggested method this weekend. Thanks.
-
Docker failed to start error, server occasionally crashes
Update: After rebuilding the docker image with the previous containers restored, the server seems much more stable! I had 6 days of uptime.

Yesterday I mounted an external hard drive with Unassigned Disk Devices, and it dismounted on its own shortly after. I was unable to remount it after multiple tries, and today I found my server very slow, almost unresponsive, at 6pm. I am getting a similar out of memory error, and the php-fpm error. There is high read activity on the cache drive and some reads on the flash drive. I cannot confirm which process is consuming 100% of the CPU.

Feb 11 18:13:53 unRAID php-fpm[9253]: [WARNING] [pool www] child 32653 exited on signal 9 (SIGKILL) after 427.466938 seconds from start
Feb 11 18:13:55 unRAID php-fpm[9253]: [WARNING] [pool www] child 2861 exited on signal 9 (SIGKILL) after 287.056577 seconds from start
Feb 11 18:14:07 unRAID php-fpm[9253]: [WARNING] [pool www] child 10615 exited on signal 9 (SIGKILL) after 13.046237 seconds from start
Feb 11 18:14:08 unRAID php-fpm[9253]: [WARNING] [pool www] child 10694 exited on signal 9 (SIGKILL) after 12.878893 seconds from start
Feb 11 18:14:21 unRAID php-fpm[9253]: [WARNING] [pool www] child 11005 exited on signal 9 (SIGKILL) after 13.019033 seconds from start
Feb 11 18:14:22 unRAID php-fpm[9253]: [WARNING] [pool www] child 11073 exited on signal 9 (SIGKILL) after 13.837164 seconds from start

Please see the attached diagnostics and screenshot. I will provide an update if it becomes unresponsive again.

unraid-diagnostics-20250211-1828.zip
-
Docker failed to start error, server occasionally crashes
Thanks for your suggestion. I have deleted my docker.img and reinstalled plex, prowlarr, sonarr and qBittorrent. I think what caused the crash last night was that sonarr finished downloading a couple of shows and plex started processing them, which led to 100% CPU usage. I will see if I can recreate a crash with only these containers.
-
Docker failed to start error, server occasionally crashes
Thanks for reviewing my diagnostics, JorgeB. The server became unresponsive starting at 10pm on Feb 4, and it eventually ran out of memory by 5am. Hence, when I checked the server in the morning, it had become responsive again but docker had failed to start. This reddit thread describes my scenario and the same php-fpm error that I got: the server suddenly maxes out the CPU and becomes unresponsive until a forced restart. The solution suggested in the thread is to disable the DLNA server in the plex settings. However, my DLNA server is already disabled in plex. I also disabled DLNA server timeline reporting, which was previously enabled.
-
Docker failed to start error, server occasionally crashes
The server is now responsive, but I get "Docker Service failed to start" in the docker tab. It crashed around 10pm last night and I see a lot of php-fpm errors. I will boot the server fresh and see if the plex settings have DLNA enabled. If so, I will disable it and see how it goes.

Feb 4 21:48:31 unRAID php-fpm[8792]: [WARNING] [pool www] child 10646 exited on signal 9 (SIGKILL) after 62.877191 seconds from start
Feb 4 21:49:01 unRAID php-fpm[8792]: [WARNING] [pool www] child 15641 exited on signal 9 (SIGKILL) after 29.963122 seconds from start
Feb 4 21:49:38 unRAID php-fpm[8792]: [WARNING] [pool www] child 7604 exited on signal 9 (SIGKILL) after 270.698684 seconds from start
Feb 4 21:49:50 unRAID php-fpm[8792]: [WARNING] [pool www] child 18168 exited on signal 9 (SIGKILL) after 12.032912 seconds from start
Feb 4 21:50:02 unRAID php-fpm[8792]: [WARNING] [pool www] child 18512 exited on signal 9 (SIGKILL) after 11.723276 seconds from start
Feb 4 21:50:34 unRAID webGUI: Successful login user root from 100.100.129.75
Feb 4 21:56:25 unRAID php-fpm[8792]: [WARNING] [pool www] child 21744 exited on signal 9 (SIGKILL) after 349.084727 seconds from start
Feb 4 21:59:51 unRAID php-fpm[8792]: [WARNING] [pool www] child 2495 exited on signal 9 (SIGKILL) after 300.569327 seconds from start
Feb 4 21:59:53 unRAID php-fpm[8792]: [WARNING] [pool www] child 9295 exited on signal 9 (SIGKILL) after 168.833708 seconds from start
Feb 4 21:59:54 unRAID php-fpm[8792]: [WARNING] [pool www] child 12955 exited on signal 9 (SIGKILL) after 77.512407 seconds from start
Feb 4 21:59:58 unRAID php-fpm[8792]: [WARNING] [pool www] child 13046 exited on signal 9 (SIGKILL) after 79.697170 seconds from start
Feb 4 22:00:09 unRAID php-fpm[8792]: [WARNING] [pool www] child 17618 exited on signal 9 (SIGKILL) after 16.257545 seconds from start
Feb 4 22:00:12 unRAID php-fpm[8792]: [WARNING] [pool www] child 17622 exited on signal 9 (SIGKILL) after 17.983878 seconds from start
Feb 4 22:00:16 unRAID php-fpm[8792]: [WARNING] [pool www] child 17628 exited on signal 9 (SIGKILL) after 20.068387 seconds from start
Feb 4 22:00:24 unRAID php-fpm[8792]: [WARNING] [pool www] child 17642 exited on signal 9 (SIGKILL) after 24.160942 seconds from start
Feb 4 22:00:31 unRAID php-fpm[8792]: [WARNING] [pool www] child 17676 exited on signal 9 (SIGKILL) after 21.346093 seconds from start
Feb 4 22:00:35 unRAID php-fpm[8792]: [WARNING] [pool www] child 17683 exited on signal 9 (SIGKILL) after 21.563003 seconds from start
Feb 4 22:00:38 unRAID php-fpm[8792]: [WARNING] [pool www] child 17687 exited on signal 9 (SIGKILL) after 20.247445 seconds from start
Feb 4 22:00:49 unRAID php-fpm[8792]: [WARNING] [pool www] child 17699 exited on signal 9 (SIGKILL) after 23.025699 seconds from start

unraid-diagnostics-20250205-0806.zip
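To see how long each php-fpm child lived before being killed, a small parser can pull the child PID, signal name, and lifetime out of each warning. This is only a sketch, assuming Python 3 is available where the syslog is analyzed; `parse_child_exits` is an illustrative name, not part of any Unraid tooling.

```python
import re

# Matches the php-fpm warnings quoted in the post above.
CHILD_EXIT = re.compile(
    r"child (\d+) exited on signal (\d+) \((\w+)\) after ([\d.]+) seconds from start"
)


def parse_child_exits(lines):
    """Return a list of (child_pid, signal_name, lifetime_seconds) tuples."""
    exits = []
    for line in lines:
        match = CHILD_EXIT.search(line)
        if match:
            exits.append((int(match.group(1)), match.group(3), float(match.group(4))))
    return exits


# Three of the lines from the post, copied verbatim.
sample = [
    "Feb 4 21:48:31 unRAID php-fpm[8792]: [WARNING] [pool www] child 10646 exited on signal 9 (SIGKILL) after 62.877191 seconds from start",
    "Feb 4 21:49:01 unRAID php-fpm[8792]: [WARNING] [pool www] child 15641 exited on signal 9 (SIGKILL) after 29.963122 seconds from start",
    "Feb 4 21:49:38 unRAID php-fpm[8792]: [WARNING] [pool www] child 7604 exited on signal 9 (SIGKILL) after 270.698684 seconds from start",
]

for pid, signal_name, lifetime in parse_child_exits(sample):
    print(f"child {pid} killed by {signal_name} after {lifetime:.1f}s")
```

Lines that do not match the pattern, such as the webGUI login entry, are simply skipped.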
-
Docker failed to start error, server occasionally crashes
The server has become unresponsive again. I will let it run overnight to see if it becomes responsive tomorrow morning. I limit my containers so that memory usage does not go above 60%. However, I saw iowait spike in Netdata and the CPU is at 100% on the dashboard. I am starting to wonder whether I should upgrade to a new CPU, mobo, RAM and SAS controller all together.
-
Docker failed to start error, server occasionally crashes
Hi Jorge, thanks for the quick response! The only way I can enable docker without it crashing is to disable autostart for all containers, then manually start them individually. So far the server has not crashed yet. If I boot the server with docker disabled, then manually enable docker and immediately stop all services, it still results in a crash, and I get the "Docker service failed to start" error along with the out of memory error. I will provide an update if it crashes again while I limit my containers. I will also buy more DDR3 RAM; I am currently running 16 GB.
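As a rough illustration of what limiting containers on a 16 GB machine could look like, the sketch below splits a memory budget evenly and prints a `--memory` value per container (`--memory` is a standard `docker run` option). The 60% cap, the even split, and the container names are assumptions for the example only, not the actual configuration used here.

```python
def per_container_limit_mib(total_mib, cap_percent, container_count):
    """Split cap_percent of total memory evenly; result is whole MiB, rounded down."""
    budget_mib = total_mib * cap_percent // 100
    return budget_mib // container_count


# Assumptions for illustration: 16 GiB of RAM, a 60% cap, four hypothetical containers.
TOTAL_MIB = 16 * 1024
CONTAINERS = ["plex", "prowlarr", "sonarr", "qbittorrent"]

limit = per_container_limit_mib(TOTAL_MIB, 60, len(CONTAINERS))
for name in CONTAINERS:
    # On Unraid this kind of flag would typically go in a container's extra parameters.
    print(f"{name}: --memory={limit}m")
```

An even split is the simplest choice; in practice a transcoding container would likely need a larger share than an indexer.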
-
Docker failed to start error, server occasionally crashes
Hi, my server has occasionally been unresponsive over the past 6 months or so. A reboot fixed it temporarily, but it would recur every month or so. Starting last month, I began having a hard time getting a functional array and dockers (it required multiple boots with hours of waiting in between). This week, my server became unresponsive again and I had to reboot, but I am unable to start docker.

Previously, my workaround was:
1.) Start the server and pause the parity check caused by the forced restart.
2.) Immediately stop all containers in docker. Usually about 20 mins after stopping all containers, the WebUI becomes unresponsive and I cannot Putty into unRAID. Wait 30-120 minutes until the system stabilizes; it may become responsive again.
3.) Once it becomes responsive, start all containers.

In previous months, I was able to get to step 3, but as of this week, the server does not become responsive even after waiting a couple of hours and rebooting multiple times. I also tried booting with docker enabled and letting it start all containers. It starts all containers fine but freezes ~10 mins later and does not recover.

In step 2, about 15 mins after stopping all containers, the cache read spikes and CPU and memory go to 100% before the server becomes unresponsive. It does not seem like any particular container is causing the freezing issue, as it also freezes with docker disabled on a fresh boot.

In the past two days I tried waiting it out overnight after enabling docker and got the "docker service failed to start" error the next day.

Please see the diagnostics attached; Feb 3 is the latest. Also attached is the Jan 9 diagnostic (I enabled the syslog server and tried to capture the crash). On that day the server became unresponsive around 12 noon and recovered at around 14:30.

Note that my cache SSD is a Crucial MX500, which has known problems, but I have been running the latest firmware as recommended here for over a year, so I don't think it's causing issues.

Thank you in advance.

unraid-diagnostics-20250203-2114.zip
unraid-diagnostics-20250109-2029.zip
-
Server Unresponsive after a while - multiple reboots
Apologies for the late response. I was unable to determine which container was causing the problem by looking at the uptime. On a fresh boot, the server freezes about 15-30 mins after it boots when docker is enabled. If I let it run for an hour or so, it sometimes becomes responsive again. Just moments before it becomes unresponsive, I can see the CPU at 100%, reads on the cache drive over 100 MB/s, and reads on the USB flash drive over 20 MB/s. The UPS also shows the server is under load, as the wattage increases from ~100W idle to ~160W. Please see the syslog attached.

Dec 28, 2024
13:30 Unraid became unresponsive and load increased to 100%.
Approx 14:30 It became responsive.
14:40 I restarted the server with docker disabled and paused the parity check. After Unraid started, I manually enabled docker and stopped all containers.
14:53 Unraid became unresponsive and load increased to 100%.
Approx 15:30 I saw in htop that CPU usage had died down to 10%, and I had to restart Nginx manually with “/etc/rc.d/rc.nginx restart”. 5 minutes later, I was able to access Unraid’s webUI.

So far, the only way I am able to use my server is to boot with docker disabled, then enable docker and stop all containers immediately, then wait 20-30 mins while it hangs. If it becomes responsive, I can start all my containers. Unraid had been stable for 12 days until today.

Jan 9, 2025
I found out Unraid had become unresponsive again at 12 noon. I can sometimes access the webUI if I wait long enough (30-60 min) with the web browser refreshing, but the dashboard, main, etc. remain unresponsive. After waiting for ~2.5 hours, it became responsive again and there was a cache disk SMART message: “Current pending ecc cnt returned to normal value”. I was able to download diagnostics after it became responsive; please see the attached.

It does not seem like any particular container is causing the freezing issue, as it also freezes with docker disabled on a fresh boot. Also, Unraid was stable between Dec 29 and Jan 8 with the usual containers running. It became unresponsive on Jan 9, and I did not make any changes to it the day before. Your help is much appreciated!

unraid-diagnostics-20250109-2029.zip
syslog
-
Server Unresponsive after a while - multiple reboots
Please see the syslog files attached.

syslog

The syslog stopped logging after the server froze. I kept the server running until 20:17:00, but the last line in the log was recorded at 19:20:44. I was able to boot and stay online when I disabled docker from settings. It seemed to be stable after I manually enabled docker. Initially it stalled for 30 mins and was okay after that; I was able to back up appdata, update dockers, etc. However, upon restarting the server as a test (with docker enabled), I ended up with the same issue: Unraid would freeze after 15-30 minutes.

syslog-previous
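A quick way to spot where a syslog goes quiet is to scan for large gaps between consecutive timestamps. The sketch below assumes Python 3 on the machine where the saved log is analyzed; `find_gaps` and the 5-minute threshold are illustrative choices. Syslog timestamps carry no year, so the year has to be supplied by the caller. The sample lines are real entries from a later Feb 4 excerpt in this post history, used here only as test input.

```python
import re
from datetime import datetime, timedelta

# Leading syslog timestamp, e.g. "Feb 4 21:50:34" (day may be one or two digits).
STAMP = re.compile(r"^(\w{3})\s+(\d{1,2})\s+(\d{2}:\d{2}:\d{2})")


def parse_stamp(line, year):
    """Return the line's timestamp as a datetime, or None if it has none."""
    match = STAMP.match(line)
    if not match:
        return None
    text = f"{year} {match.group(1)} {match.group(2)} {match.group(3)}"
    return datetime.strptime(text, "%Y %b %d %H:%M:%S")


def find_gaps(lines, year, threshold=timedelta(minutes=5)):
    """Return (before, after) timestamp pairs separated by more than threshold."""
    gaps = []
    previous = None
    for line in lines:
        current = parse_stamp(line, year)
        if current is None:
            continue
        if previous is not None and current - previous > threshold:
            gaps.append((previous, current))
        previous = current
    return gaps


sample = [
    "Feb 4 21:50:02 unRAID php-fpm[8792]: [WARNING] [pool www] child 18512 exited on signal 9 (SIGKILL) after 11.723276 seconds from start",
    "Feb 4 21:50:34 unRAID webGUI: Successful login user root from 100.100.129.75",
    "Feb 4 21:56:25 unRAID php-fpm[8792]: [WARNING] [pool www] child 21744 exited on signal 9 (SIGKILL) after 349.084727 seconds from start",
]

for before, after in find_gaps(sample, 2025):
    print(f"no log lines between {before} and {after}")
```

A gap only shows that nothing was written; it cannot tell a hung system apart from a quiet one, so it is a starting point rather than proof of a freeze.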
-
Server Unresponsive after a while - multiple reboots
Update: I paused all dockers and the server became unresponsive a couple of minutes later. Just seconds before it became unresponsive, I saw reads on the cache drive and flash drive spike and CPU usage increase. I was able to see the processes through Putty for a minute or so, but was unable to get diagnostics. See attached. There are multiple processes of: /usr/bin/dockerd -p /var/run/dockerd.pid --log-opt max-size=50m --log-opt max-file=1 --log-level=fatal --storage-driver=btrfs
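One thing worth checking: htop lists userland threads as separate rows by default, so repeated identical dockerd command lines may be threads of a single process rather than several processes. Below is a minimal sketch for counting threads per dockerd process from /proc, assuming a Linux host with Python 3 installed; the function names are made up for illustration.

```python
import os


def thread_count(status_text):
    """Extract the 'Threads:' value from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split()[1])
    return None


def dockerd_thread_counts(proc_root="/proc"):
    """Map each dockerd PID to its thread count by walking /proc."""
    counts = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as handle:
                if handle.read().strip() != "dockerd":
                    continue
            with open(os.path.join(proc_root, entry, "status")) as handle:
                counts[int(entry)] = thread_count(handle.read())
        except OSError:
            # The process exited while we were reading it, or access was denied.
            continue
    return counts


if os.path.isdir("/proc"):
    for pid, threads in dockerd_thread_counts().items():
        print(f"dockerd pid {pid}: {threads} threads")
```

If this reports one PID with many threads, the repeated rows in htop are just that process's threads; pressing H in htop toggles their display.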
belupig