
Docker Service won't Start -- working fine and then I botched it



Ugh. So basically I had a bunch of dockers set up. I moved appdata and docker.img to my cache drive. The cache drive was small, so I upgraded to a bigger drive. I severely botched the process and lost my docker. Docker told me I needed to recreate docker.img. I did so and then started setting up my dockers again (mostly binhex radarr, sonarr, sabnzbdvpn and plexinc's plex docker). I never managed to get them working. For some reason the cache drive kept flipping to read-only. I was monitoring this by running "cat /proc/mounts | grep cache".

I finally gave up on the cache drive and moved appdata and docker.img back to the array. I deleted docker.img many, many times. I've tried rebuilding docker containers from user templates and from scratch. The ones from scratch were working for a bit (hours). Now it seems that I'm getting problems similar to the ones I had when the "read only filesystem" issue came up.

Now when I toggle the docker service off and on, it sometimes won't start. The menu option shows up but then it says "Docker service failed to start." Rebooting fixes it, but the system has an almost impossible time unmounting a share/drive... not sure which share. It was the cache drive that was mounted read-only before, but now that I've moved back to the array I have no idea what is causing it. The syslog shows "retrying unmount" over and over. Now when I try to start a docker I get "Error 403." Any help at all is appreciated.
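For anyone who would rather script the read-only check than eyeball the grep output, here is a minimal Python sketch. It assumes Python 3 is available on the machine where you run it (that may not be the case on a stock Unraid install), and the function name `read_only_mounts` and the sample paths are made up for illustration. Unlike the plain grep, it only matches "cache" in the device or mount point and then looks for the `ro` flag in the options field.

```python
def read_only_mounts(mounts_text, needle="cache"):
    """Return (mount_point, fs_type) pairs for matching mounts that are read-only.

    `mounts_text` is the content of /proc/mounts; `needle` is a substring
    looked for in the device and mount point columns (hypothetical default).
    """
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts columns: device, mount point, fs type, options, dump, pass
        if len(fields) < 4:
            continue
        device, mount_point, fs_type, options = fields[:4]
        if needle not in device and needle not in mount_point:
            continue
        # Options are comma-separated; "ro" marks a read-only mount.
        if "ro" in options.split(","):
            hits.append((mount_point, fs_type))
    return hits


# Usage on a live system (not run here):
#   with open("/proc/mounts") as f:
#       print(read_only_mounts(f.read()))
```

An empty list means nothing matching "cache" is currently mounted read-only; running it every few minutes would show roughly when the flip happens.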

tower-diagnostics-20201127-1043.zip

23 minutes ago, JorgeB said:

Cache device dropped offline. It's connected to a Marvell controller and a SATA port multiplier; neither is recommended, especially together.

Interesting... okay, sooo, forgive me for being slow. I have a PCI-E sata expansion card. Is the Marvell controller on the card? Is my card considered a "port multiplier" but there would be a different version of a similar expansion card that would work? If the answer is yes, I will search for compatible unraid expansion cards. Thanks!

10 minutes ago, mysteriouscrag said:

I have a PCI-E sata expansion card. Is the Marvell controller on the card?

Yes.

 

10 minutes ago, mysteriouscrag said:

Is my card considered a "port multiplier" but there would be a different version of a similar expansion card that would work?

If the controller has more than 4 ports, it has a built-in port multiplier. It can also be connected to an external multiplier; there's no way to know from the diags.

For good reliability we recommend LSI HBAs in IT mode, like the 9211-8i and similar.


Okay, so I ended up moving everything back to the array and completely removing the cache drive, then I went through with a previously planned upgrade: new motherboard, processor and memory. Everything went flawlessly. After that I removed the port multiplier and put in an expansion card I had been using previously (just 2 ports vs. the 8 that the other card had). I have no idea if the current card is a port multiplier or not, but I now have my cache drive plugged directly into the (new) motherboard. I moved docker.img and appdata to the cache drive. Immediately all dockers were missing. I did a docker restore and it went mostly okay. BUT, now I'm getting the same errors. The cache drive is flipping to read-only and causing havoc (I'm checking using "cat /proc/mounts | grep cache"). CPU starts maxing out and the Web GUI becomes impossible to navigate. I'm attaching a copy of my diagnostics report again. Thanks in advance!

 

EDIT: According to "top", the shfs process is floating around 80%-90% and the Web UI is pretty much impossible to use (the list of docker containers won't load).

tower-diagnostics-20201130-0936.zip

Edited by mysteriouscrag
New diagnostics report that shows the read-only status

Cache device dropped offline again:

 

Nov 30 09:31:45 Tower kernel: ata6: COMRESET failed (errno=-16)
Nov 30 09:31:45 Tower kernel: ata6: reset failed, giving up
Nov 30 09:31:45 Tower kernel: ata6.00: disabled
Nov 30 09:31:45 Tower kernel: ata6: EH complete

 

If you haven't yet, replace both cables. If it continues to drop, it could just be a bad SSD.
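To see whether the drops always hit the same port after a cable or drive swap, the kind of kernel lines quoted above can be filtered with a short script. This is only a sketch: the function name `ata_trouble` and the list of "trouble" phrases are assumptions taken from the messages in this thread, not a complete list of libata errors, and the log path in the usage comment is a guess.

```python
import re
from collections import defaultdict

# Matches kernel lines such as:
#   Nov 30 09:31:45 Tower kernel: ata6: COMRESET failed (errno=-16)
#   Nov 30 09:31:45 Tower kernel: ata6.00: disabled
ATA_LINE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: (.*)$")

# Phrases taken from the log excerpt in this thread (assumed, not exhaustive).
TROUBLE = ("COMRESET failed", "reset failed", "disabled")


def ata_trouble(log_text):
    """Group suspicious ATA messages by port name, e.g. 'ata6'."""
    by_port = defaultdict(list)
    for line in log_text.splitlines():
        match = ATA_LINE.search(line)
        if match and any(phrase in match.group(2) for phrase in TROUBLE):
            by_port[match.group(1)].append(match.group(2))
    return dict(by_port)


# Usage on a live system (path is an assumption, not run here):
#   with open("/var/log/syslog") as f:
#       for port, messages in ata_trouble(f.read()).items():
#           print(port, len(messages), "suspicious messages")
```

If the same ataN keeps showing up after cables and ports have been changed, that points more toward the drive itself.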

13 minutes ago, JorgeB said:


If you haven't yet, replace both cables. If it continues to drop, it could just be a bad SSD.

I just put in a different power supply at the same time that I did the other upgrades. The power supply is old to me, new to this computer: 1000 watts of overkill (the previous one was 430W, so I thought maybe I was having power issues). So the power cable has been changed out, and the SATA cable has also been swapped. Ugh, I guess I'll try a different cache drive? I'll try it because it makes sense as the next troubleshooting step, but I do think there's something else I'm doing wrong that's causing the problem. Thanks!

Edited by mysteriouscrag