zAdok Posted April 21, 2017
System became unresponsive; I was able to reboot cleanly. The system booted, but the web interface was extremely slow and the cache SSDs were missing. Stopped the array and did another clean reboot. The cache drives returned, but Docker will not start. zadok-nas-diagnostics-20170421-2050.zip
JorgeB Posted April 21, 2017
Docker image is corrupt, delete and recreate.
zAdok Posted April 21, 2017
Yeah, that's what I was thinking. I've done this a few times in the past, and replaced the SSDs as I thought they might have been causing it. Any ideas as to what else could be causing this to happen?
JorgeB Posted April 21, 2017
These are usually the result of device errors; on SSDs they are mostly caused by cable issues. The logs are very new, so there's nothing else there, but you can use btrfs device stats to find out if there were previous problems: btrfs device stats -z /mnt/cache. All values should be 0; if not, there were previous issues. The -z flag shows the current values and then resets them all to 0, so keep monitoring for a few weeks (without -z), and if errors appear again, grab the diagnostics before rebooting.
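The counter check above can be scripted so any non-zero counter stands out. A minimal sketch, assuming a pool mounted at /mnt/cache; the device name and counter values below are hypothetical sample output, not from this system:

```shell
# In practice you would pipe the real command's output into awk:
#   btrfs device stats /mnt/cache | awk '$NF != 0 {print "non-zero:", $0}'
# Hypothetical sample output from `btrfs device stats` for illustration:
sample_output='[/dev/sdb1].write_io_errs    12
[/dev/sdb1].read_io_errs     3
[/dev/sdb1].flush_io_errs    5
[/dev/sdb1].corruption_errs  0
[/dev/sdb1].generation_errs  0'

# Print only the counters that are not zero (the last field is the count).
echo "$sample_output" | awk '$NF != 0 {print "non-zero:", $0}'
```

Run without -z this is safe to repeat daily; the counters are cumulative until explicitly reset.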
zAdok Posted April 21, 2017
Thanks, heaps of write, read, and flush IO errors on one of the SSDs. I'll replace that cable and keep an eye on it.
zAdok Posted April 24, 2017
OK, so this just happened again; this time I was able to run diagnostics. The GUI reported that one cache disk was missing and that the remaining cache disk was running hot. The one that was missing was NOT the one with the reported errors last time. btrfs device stats /mnt/cache reports no errors at all since I last cleared them a few days back. Docker didn't corrupt this time. zadok-nas-diagnostics-20170424-2019.zip
JorgeB Posted April 24, 2017 (edited)
The SSD dropped offline; on SSDs this is usually caused by a cable issue. Since the SSD is on an LSI controller that doesn't support trim, I would recommend swapping cables with one of the disks connected to the onboard controller. Edited April 24, 2017 by johnnie.black
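Whether discard (TRIM) actually passes through the current controller can be checked with `lsblk --discard`: a nonzero DISC-GRAN/DISC-MAX means the device accepts discards on that path. A minimal sketch; the device name and output below are hypothetical, not from this system:

```shell
# In practice you would run:
#   lsblk --discard /dev/sdX    # replace sdX with the cache SSD
# Hypothetical sample output for illustration:
sample='NAME DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
sdb         0      512B       2G        0'

# A device behind a controller that strips TRIM shows 0B in DISC-GRAN.
echo "$sample" | awk 'NR > 1 && $3 != "0B" && $3 != 0 {print $1 ": discard supported"}'
```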
zAdok Posted April 24, 2017
OK, cheers. Thanks for the helpful feedback.
zAdok Posted April 25, 2017
So this morning I moved the SSDs onto the onboard controller as advised. After a few hours of running I started getting temperature alarms from the SSDs again. I stopped the array and the temps immediately dropped 20 degrees Celsius. Any ideas why this is happening? zadok-nas-diagnostics-20170425-1055.zip
JorgeB Posted April 25, 2017
Just a quick look, but I don't see any issues in the logs. What temperature thresholds are the SSDs set to? It's normal for them to get over the 45C default for all devices when they are under heavy writes; it varies from model to model, but I usually set mine to 55C warn / 60C critical.
zAdok Posted April 25, 2017
Cache2 hit 71, Cache1 was at 63 I think.
JorgeB Posted April 25, 2017
That's too high, especially cache2; it's above the max temp for those models: Operating temperature: 32°F to 158°F (0°C to 70°C). That has nothing to do with the controller they're on, though, and may not be the reason for the earlier drop-outs, but you want to improve cooling to keep them below 60C.
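Drive temperature can also be watched from the command line via SMART attribute 194 and compared against the thresholds discussed above. A minimal sketch; the attribute line and the 61C reading below are hypothetical sample data, not from this system:

```shell
# In practice you would read the raw value with smartctl:
#   smartctl -A /dev/sdX | awk '/Temperature_Celsius/ {print $10}'
# Hypothetical SMART attribute line for illustration:
sample='194 Temperature_Celsius 0x0022   039   028   000    Old_age   Always       -       61'

# The raw temperature is the last field; warn above a 60C critical threshold.
temp=$(echo "$sample" | awk '{print $NF}')
if [ "$temp" -gt 60 ]; then
    echo "WARN: SSD at ${temp}C (above 60C critical)"
fi
```

The same check dropped into a cron job gives an alert channel independent of the web GUI, which is useful here since the GUI itself became slow during these incidents.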