April 21, 2017 System became unresponsive, but I was able to reboot cleanly. The system booted, but the web interface was extremely slow and the cache SSDs were missing. I stopped the array and did another clean reboot. The cache drives returned, but Docker will not start. zadok-nas-diagnostics-20170421-2050.zip
April 21, 2017 Author Yeah, that's what I was thinking. I've done this a few times in the past, and replaced the SSDs as I thought they might have been causing it. Any ideas as to what else could be causing this to happen?
April 21, 2017 Community Expert These are usually the result of device errors, which on SSDs are mostly caused by cable issues. The logs are very new, so there's nothing else there, but you can use btrfs device stats to find out if there were previous problems. Use: btrfs device stats -z /mnt/cache All values should be 0; if not, there were previous issues. The -z option shows the current values and then resets them all to 0, so keep monitoring for a few weeks (without -z), and if errors appear again, grab the diags before rebooting.
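For the weeks of monitoring suggested above, a small script can flag any non-zero counter instead of reading the output by eye. This is only a sketch: it assumes the usual `[device].counter value` line layout printed by btrfs device stats, and the device names and error counts in SAMPLE are made up for illustration, not taken from the diagnostics in this thread.

```python
import re

# Matches lines such as "[/dev/sdb1].write_io_errs    0"
STAT_LINE = re.compile(
    r"^\[(?P<device>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$"
)


def parse_device_stats(text):
    """Turn 'btrfs device stats' output into {device: {counter: value}}."""
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match:  # silently skip blank or unrecognized lines
            counters = stats.setdefault(match["device"], {})
            counters[match["counter"]] = int(match["value"])
    return stats


def nonzero_counters(stats):
    """Keep only the devices and counters whose value is not 0."""
    result = {}
    for device, counters in stats.items():
        bad = {name: value for name, value in counters.items() if value != 0}
        if bad:
            result[device] = bad
    return result


# Hypothetical output: device names and counts are illustrative only.
SAMPLE = """\
[/dev/sdb1].write_io_errs    12
[/dev/sdb1].read_io_errs     3
[/dev/sdb1].flush_io_errs    1
[/dev/sdb1].corruption_errs  0
[/dev/sdb1].generation_errs  0
[/dev/sdc1].write_io_errs    0
[/dev/sdc1].read_io_errs     0
[/dev/sdc1].flush_io_errs    0
[/dev/sdc1].corruption_errs  0
[/dev/sdc1].generation_errs  0
"""

print(nonzero_counters(parse_device_stats(SAMPLE)))
```

In practice the text would come from running btrfs device stats /mnt/cache (without -z, so the counters are not reset) and passing its output to parse_device_stats. An empty result means every counter is still 0.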
April 21, 2017 Author Thanks, there are heaps of write, read and flush IO errors on one of the SSDs. I'll replace that cable and keep an eye on it.
April 24, 2017 Author OK, so this just happened again, and this time I was able to run diagnostics. The GUI reported that one cache disk was missing and that the remaining cache disk was running hot. The one that was missing was NOT the one with the reported errors last time. btrfs device stats /mnt/cache reports no errors at all since I last cleared them a few days back. Docker didn't corrupt this time. zadok-nas-diagnostics-20170424-2019.zip
April 24, 2017 Community Expert The SSD dropped offline; on SSDs this is usually caused by a cable issue. Since the SSD is on an LSI controller that doesn't support trim, I would recommend swapping cables with one of the disks connected to the onboard controller. Edited April 24, 2017 by johnnie.black
April 25, 2017 Author So this morning I moved the SSDs onto the onboard controller as advised. After a few hours of running I started getting temp alarms from the SSDs again. I stopped the array and the temps immediately dropped 20 degrees Celsius. Any ideas why this is happening? zadok-nas-diagnostics-20170425-1055.zip
April 25, 2017 Community Expert Just a quick look, but I don't see any issues in the logs. What temps are the SSDs set to? It's normal for them to get over the 45C default for all devices when they are under heavy writes. It varies from model to model, but I usually set mine to 55 warn / 60 critical.
April 25, 2017 Community Expert That's too high, especially cache2, it's above max temp for those models: Operating temperature: 32ºF to 158ºF (0ºC to 70ºC) That has nothing to do with the controller they're on though, and may not be the reason for the earlier dropouts, but you want to improve cooling to keep them below 60C.
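The thresholds mentioned in the last two posts (55C warn, 60C critical, 70C rated maximum) can be written down as a small check. This is a rough sketch, not how Unraid evaluates temperatures; the function names are hypothetical, and the readings in the test values are not from this thread.

```python
WARN_C = 55           # warning threshold suggested in the thread
CRITICAL_C = 60       # critical threshold suggested in the thread
MAX_OPERATING_C = 70  # rated maximum quoted from the SSD spec (158F)


def fahrenheit_to_celsius(deg_f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32) * 5 / 9


def classify_temp(temp_c):
    """Label a temperature in Celsius against the thresholds above."""
    if temp_c > MAX_OPERATING_C:
        return "above rated maximum"
    if temp_c >= CRITICAL_C:
        return "critical"
    if temp_c >= WARN_C:
        return "warning"
    return "ok"


# The quoted spec range, 32F to 158F, converts to 0C to 70C.
print(fahrenheit_to_celsius(32), fahrenheit_to_celsius(158))
```

The conversion confirms that the Fahrenheit and Celsius ranges in the quoted spec agree, so anything classified as "above rated maximum" is outside what the manufacturer lists for these models.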