
Docker Not Working and Other Dire Issues


Solved by ecnal.magnus


Two days ago I logged into my server and realized that multiple Docker containers were stopped. When I tried to start them, they wouldn't start. A couple of the containers were still running, and would stop, start, and run just fine, which was odd to me. I tried to roll back to multiple ZFS snapshots, but nothing worked. Eventually I found a thread that talked about deleting the Docker image, so I did that, and now I can't get the Docker service to start at all. Interestingly enough, the VM service also won't start; I don't use any VMs, I just tried that to see what would happen. I have rebooted the server multiple times over the last couple of days, but right as everything was going really wrong I downloaded the diagnostics files prior to a reboot. I also have syslog running to an external server, so I have all my syslog files as well.

I am wondering if there is any saving my install at this point, or if I am just better off starting with a fresh install of the OS itself? I believe that is possible without losing any of the data, is it not? Currently I have an Unraid server that will start, and as long as I don't have VMs or Docker enabled the array will start, but I cannot get it to stop, and the only way I can reboot it is to kill the power. I would like to get started on rebuilding the OS as soon as possible if that turns out to be my best approach. I still have all my ZFS snapshots and replication, but I don't know that they aren't completely corrupt at this point.

I just upgraded to 6.12.6 a day or so prior to these issues showing up. As far as I can tell, all my data is still intact. I have 2 cache pools with ZFS replication going on between them. I have a 15-drive array with dual parity. I have a good portion of my data (and ALL of my media files) backed up to an external server, but there is still quite a bit of odds-and-ends data that resides ONLY on my Unraid array. I have done a lot of modification of the OS over the last 3 years since I moved to Unraid. I am genuinely wondering if just starting fresh (while retaining my data, of course) is my best option?

Any and all input would be greatly appreciated.

ecnal-diagnostics-20231208-1721.zip

Edited by ecnal.magnus

I just went ahead and reinstalled Unraid on my flash drive and moved over my key. It booted up just fine, and the Docker and VM services started just fine. But even with a fresh install, I am still having issues stopping the array. This is an issue I have experienced almost since I started using Unraid a couple of years ago, and I think it may be contributing to at least some of my issues since, ultimately, I am having to do a hard shutdown to get the array stopped. It is currently sitting at "Array stopping - stopping services..." and when I tail the syslog it shows the paste below. I don't know if there is some hardware that is causing this, since this is a completely fresh install, but it has been in this state for more than 30 minutes now.

root@ecnal:~# tail -f /var/log/syslog
Dec 10 10:47:16 ecnal emhttpd: Stopping services...
Dec 10 10:47:16 ecnal emhttpd: shcmd (1520): /etc/rc.d/rc.libvirt stop
Dec 10 10:47:16 ecnal root: Stopping libvirtd...
Dec 10 10:47:16 ecnal dnsmasq[12132]: exiting on receipt of SIGTERM
Dec 10 10:47:16 ecnal root: Network 2214f59d-018d-4270-9f9c-550be516a722 destroyed
Dec 10 10:47:16 ecnal root: 
Dec 10 10:47:17 ecnal root: Stopping virtlogd...
Dec 10 10:47:18 ecnal root: Stopping virtlockd...
Dec 10 10:47:19 ecnal emhttpd: shcmd (1521): umount /etc/libvirt
Dec 10 10:50:15 ecnal nginx: 2023/12/10 10:50:15 [error] 6821#6821: *8357 upstream timed out (110: Connection timed out) while reading upstream, client: 192.168.1.99, server: , request: "POST /update.htm HTTP/1.1", upstream: "http://unix:/var/run/emhttpd.socket:/update.htm", host: "192.168.1.222", referrer: "http://192.168.1.222/Main"
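
For a hang like the one in this paste, it can help to pull out the last command emhttpd logged before the syslog went quiet. Below is a minimal sketch, assuming the `emhttpd: shcmd (N): command` line layout seen above. The function name `last_shcmd` and the regular expression are illustrative and not part of Unraid.

```python
import re

# Matches lines such as:
#   "Dec 10 10:47:19 ecnal emhttpd: shcmd (1521): umount /etc/libvirt"
# The layout is assumed from the paste above, not from Unraid documentation.
SHCMD = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ emhttpd: "
    r"shcmd \((?P<id>\d+)\): (?P<cmd>.+)$"
)


def last_shcmd(lines):
    """Return (timestamp, id, command) for the last emhttpd shcmd line, or None."""
    found = None
    for line in lines:
        match = SHCMD.match(line.strip())
        if match:
            # Keep overwriting so the final value is the most recent command.
            found = (match.group("ts"), int(match.group("id")), match.group("cmd"))
    return found


if __name__ == "__main__":
    sample = [
        "Dec 10 10:47:16 ecnal emhttpd: shcmd (1520): /etc/rc.d/rc.libvirt stop",
        "Dec 10 10:47:18 ecnal root: Stopping virtlockd...",
        "Dec 10 10:47:19 ecnal emhttpd: shcmd (1521): umount /etc/libvirt",
    ]
    print(last_shcmd(sample))
```

Run against the lines above, this reports command 1521, `umount /etc/libvirt`, as the last one issued. That matches where the log stalls before the nginx timeout about three minutes later.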

  • Solution

I could never get it to stop, even in safe mode. I finally completely blew away and reformatted both of my cache pools, and now it stops. I suspected corruption in the cache pools, as there were some files that, when I tried to delete them, gave me the error "Invalid or incomplete multibyte or wide character", and the file names had backslashes in them for some reason, but I had no idea that would keep the array from stopping. Both pools are now freshly formatted ZFS and everything appears to be running correctly. I have both of those pools backed up, but I think I might just rebuild all my Docker containers from scratch, because I don't trust that data anymore. It will be a pain, but I didn't lose any data that was integral, only the stuff in appdata. What do you think?
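
For anyone checking a pool for similar file names before wiping it, here is a rough sketch that walks a directory tree and lists names that contain a backslash or are not valid UTF-8. The function name `suspicious_names` is made up for illustration, and `/mnt/cache` is only an assumed mount point for a cache pool. Whether names like these were actually what kept the array from stopping is not established in this thread.

```python
import os


def suspicious_names(root):
    """Yield paths (as bytes) whose name has a backslash or is not valid UTF-8."""
    # Walking with a bytes path makes os.walk return raw bytes names,
    # so undecodable names can be inspected instead of being mangled.
    for dirpath, dirnames, filenames in os.walk(os.fsencode(root)):
        for name in dirnames + filenames:
            try:
                name.decode("utf-8")
                bad_encoding = False
            except UnicodeDecodeError:
                bad_encoding = True
            if bad_encoding or b"\\" in name:
                yield os.path.join(dirpath, name)


if __name__ == "__main__":
    # "/mnt/cache" is an assumption; substitute the pool's real mount point.
    for path in suspicious_names("/mnt/cache"):
        print(repr(path))
```

The script only reports names and never deletes or renames anything, so it is safe to run on a pool that is still mounted.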

Edited by ecnal.magnus
