ecnal.magnus Posted December 9, 2023 (edited)

Two days ago I logged into my server and realized that multiple Docker containers were stopped. When I tried to start them, they wouldn't start. A couple of the containers were still running, and would stop, start, and run just fine, which was odd to me. I tried to roll back to multiple ZFS snapshots, but nothing worked. Eventually I found a thread that talked about deleting the Docker image, so I did that, and now I can't get the Docker service to start at all. Interestingly enough, the VM service also won't start, but I don't use any VMs; I just tried that to see what would happen.

I have rebooted the server multiple times over the last couple of days, but right as everything was going really wrong I downloaded the diagnostics files prior to a reboot. I also have syslog running to an external server, so I have all my syslog files as well.

I am wondering if there is any saving my install at this point, or if I am just better off starting with a fresh install of the OS itself. I believe that is possible without losing any of the data, is it not? Currently I have an Unraid server that will start, and as long as I don't have VMs or Docker enabled the array will start, but I cannot get it to stop, and the only way I can reboot it is to kill the power. I would like to get started on rebuilding the OS as soon as possible if that turns out to be my best approach.

I still have all my ZFS snapshots and replication, but I don't know that they aren't completely corrupt at this point. I just upgraded to 6.12.6 a day or so prior to these issues showing up. As far as I can tell, all my data is still intact. I have 2 cache pools with ZFS replication going on between them. I have a 15-drive array with dual parity. I have a good portion of my data (and ALL of my media files) backed up to an external server, but there is still quite a bit of odds-and-ends data that resides ONLY on my Unraid array.
I have done a lot of modification of the OS over the last 3 years since I moved to Unraid. I am genuinely wondering if just starting fresh (while retaining my data, of course) is my best option. Any and all input would be greatly appreciated.

ecnal-diagnostics-20231208-1721.zip

Edited December 9, 2023 by ecnal.magnus
JorgeB Posted December 10, 2023

Diags show a corrupt Docker image. Reboot and post new diags after array start.
ecnal.magnus Posted December 10, 2023 Author

I just went ahead and reinstalled Unraid on my flash drive and moved over my key. It booted up just fine, and the Docker and VM services started just fine. But even with a fresh install, I am still having issues stopping the array. This is an issue I have experienced almost since I started using Unraid a couple of years ago, and I think it may be leading to at least some of my issues, since ultimately I have to do a hard shutdown to get the array stopped.

It is currently sitting at "Array stopping - stopping services...", and when I tail the syslog it shows the paste below. I don't know if some hardware is causing this, since this is a completely fresh install, but it has been in this state for more than 30 minutes now.

root@ecnal:~# tail -f /var/log/syslog
Dec 10 10:47:16 ecnal emhttpd: Stopping services...
Dec 10 10:47:16 ecnal emhttpd: shcmd (1520): /etc/rc.d/rc.libvirt stop
Dec 10 10:47:16 ecnal root: Stopping libvirtd...
Dec 10 10:47:16 ecnal dnsmasq[12132]: exiting on receipt of SIGTERM
Dec 10 10:47:16 ecnal root: Network 2214f59d-018d-4270-9f9c-550be516a722 destroyed
Dec 10 10:47:16 ecnal root:
Dec 10 10:47:17 ecnal root: Stopping virtlogd...
Dec 10 10:47:18 ecnal root: Stopping virtlockd...
Dec 10 10:47:19 ecnal emhttpd: shcmd (1521): umount /etc/libvirt
Dec 10 10:50:15 ecnal nginx: 2023/12/10 10:50:15 [error] 6821#6821: *8357 upstream timed out (110: Connection timed out) while reading upstream, client: 192.168.1.99, server: , request: "POST /update.htm HTTP/1.1", upstream: "http://unix:/var/run/emhttpd.socket:/update.htm", host: "192.168.1.222", referrer: "http://192.168.1.222/Main"
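A minimal sketch, not something posted in the thread: when the array hangs at "stopping services...", one way to see where the stop sequence went quiet is to pull the last `shcmd` that emhttpd logged. The Python below assumes the syslog line format shown in the paste above; the names `last_shcmd` and `SAMPLE` are illustrative and not part of Unraid.

```python
import re

# Matches emhttpd command lines such as:
#   Dec 10 10:47:19 ecnal emhttpd: shcmd (1521): umount /etc/libvirt
# (format assumed from the syslog paste in this thread)
SHCMD = re.compile(r"emhttpd: shcmd \((\d+)\): (.+)$")


def last_shcmd(lines):
    """Return (command_id, command) for the last shcmd line, or None."""
    found = None
    for line in lines:
        match = SHCMD.search(line.rstrip("\n"))
        if match:
            found = (int(match.group(1)), match.group(2))
    return found


# A few lines copied from the paste, used here as sample input.
SAMPLE = [
    "Dec 10 10:47:16 ecnal emhttpd: Stopping services...",
    "Dec 10 10:47:16 ecnal emhttpd: shcmd (1520): /etc/rc.d/rc.libvirt stop",
    "Dec 10 10:47:18 ecnal root: Stopping virtlockd...",
    "Dec 10 10:47:19 ecnal emhttpd: shcmd (1521): umount /etc/libvirt",
]

print(last_shcmd(SAMPLE))
```

On the pasted lines this points at `umount /etc/libvirt` (shcmd 1521). That only shows where the log went quiet, not why the stop stalled.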
JorgeB Posted December 11, 2023

There's nothing logged that I can see pointing to the problem. Try booting in safe mode, then start and stop the array.
Solution ecnal.magnus Posted December 11, 2023 Author (edited)

I could never get it to stop, even in safe mode. I finally completely blew away and reformatted both of my cache pools, and now it stops. I suspected corruption in the cache pools, as there were some files that, when I tried to delete them, gave me the error "Invalid or incomplete multibyte or wide character", and the file names had backslashes in them for some reason, but I had no idea that would keep the array from stopping. The pools are now freshly formatted ZFS and everything appears to be running correctly.

I have both of those pools backed up, but I think I might just rebuild all my Docker containers from scratch, because I don't trust that data anymore. It will be a pain, but I didn't lose any data that was integral, only the stuff in appdata. What do you think?

Edited December 11, 2023 by ecnal.magnus
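A rough sketch for anyone who wants to look for file names like the ones described above before wiping a pool. It assumes the "Invalid or incomplete multibyte or wide character" error came from names that are not valid UTF-8, which the thread does not confirm; the function name `suspicious_names` is made up for illustration. It walks a directory tree using byte paths and reports names that fail to decode as UTF-8 or that contain a backslash.

```python
import os


def suspicious_names(root):
    """Yield byte paths under root whose name is not valid UTF-8
    or contains a backslash."""
    # Walking with a bytes path makes os.walk return raw byte names,
    # so undecodable names are seen exactly as stored on disk.
    for dirpath, dirnames, filenames in os.walk(os.fsencode(root)):
        for name in dirnames + filenames:
            try:
                name.decode("utf-8")
                bad_encoding = False
            except UnicodeDecodeError:
                bad_encoding = True
            if bad_encoding or b"\\" in name:
                yield os.path.join(dirpath, name)
```

It could be called as `for path in suspicious_names("/mnt/your_pool"): print(path)`, where `/mnt/your_pool` is a placeholder for the pool's real mount point. The script only lists names; it does not delete or rename anything.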
JorgeB Posted December 12, 2023

If all is apparently working, I would leave it for now.
ecnal.magnus Posted December 12, 2023 Author

Thank you for the insight. I really appreciate all the help.