July 1, 2025
Hello,

I have an intermittent issue. It went away when I updated to v7, but now on 7.1.4 it is back. My server doesn't crash, but it does become completely unreachable (including Docker containers and VMs) for a period of time. Last night this happened from 8:50:33 pm to 9:05:29 pm (logs from the Home Assistant VM pinging the Unraid server are attached as a screenshot).

These are the logs from that period (including one entry before and one entry after):

Jun 30 16:05:17 MegaHub sSMTP[3406899]: Sent mail for [email protected] (221 2.0.0 Bye) uid=0 username=root outbytes=788
Jun 30 20:59:01 MegaHub vnstatd[6004]: Warning: Writing cached data to database took 540.2 seconds.
Jun 30 20:59:36 MegaHub vnstatd[6004]: Warning: Writing cached data to database took 31.7 seconds.
Jun 30 21:03:27 MegaHub vnstatd[6004]: Warning: Writing cached data to database took 206.8 seconds.
Jun 30 21:10:31 MegaHub vnstatd[6004]: Warning: Writing cached data to database took 330.5 seconds.
Jun 30 21:14:26 MegaHub vnstatd[6004]: Warning: Writing cached data to database took 231.8 seconds.
Jun 30 21:17:17 MegaHub vnstatd[6004]: Warning: Writing cached data to database took 136.9 seconds.
Jun 30 21:45:33 MegaHub emhttpd: spinning down /dev/sde

When the server became unreachable, it was reporting RAM usage at 86.2%, which I can't imagine is enough to crash the server. I have attached a photo of the RAM graph at this time. Also, none of my drives seemed to be under heavy load. I do have 2 SMR drives, which are in the process of being phased out, so I am wondering if they could be the cause. However, they have no issues with SMART tests, and they run perfectly fine during parity checks and mover operations.

For now, I have moved vnstat.db to a dedicated share that I keep on the cache, because idk... it seems like a smart thing to do if the SMR drives are causing it. I'm not sure if there is anything else I can do or look into to see what is causing the issue.
Any help confirming the issue/confirming the fix would be much appreciated.
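To line the vnstatd warnings up against the outage window, it can help to work out when each slow write began. The following is only a rough sketch: it assumes vnstatd logs the warning at the moment the write finishes (so start = timestamp minus duration), and it assumes the year is 2025, since syslog lines omit it. The function name `write_windows` is made up for illustration.

```python
import re
from datetime import datetime, timedelta

# vnstatd warnings copied from the post above.
LOG = """\
Jun 30 20:59:01 MegaHub vnstatd[6004]: Warning: Writing cached data to database took 540.2 seconds.
Jun 30 20:59:36 MegaHub vnstatd[6004]: Warning: Writing cached data to database took 31.7 seconds.
Jun 30 21:03:27 MegaHub vnstatd[6004]: Warning: Writing cached data to database took 206.8 seconds.
Jun 30 21:10:31 MegaHub vnstatd[6004]: Warning: Writing cached data to database took 330.5 seconds.
Jun 30 21:14:26 MegaHub vnstatd[6004]: Warning: Writing cached data to database took 231.8 seconds.
Jun 30 21:17:17 MegaHub vnstatd[6004]: Warning: Writing cached data to database took 136.9 seconds.
"""

# Captures the syslog timestamp and the reported duration in seconds.
PATTERN = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ vnstatd\[\d+\]: "
    r"Warning: Writing cached data to database took ([\d.]+) seconds\."
)


def write_windows(log_text, year=2025):
    """Return (start, end, seconds) for each slow vnstatd write.

    Assumption: the log timestamp marks the END of the write.
    """
    windows = []
    for line in log_text.splitlines():
        match = PATTERN.match(line)
        if not match:
            continue  # ignore unrelated log lines
        end = datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")
        seconds = float(match.group(2))
        windows.append((end - timedelta(seconds=seconds), end, seconds))
    return windows


for start, end, seconds in write_windows(LOG):
    print(f"{start:%H:%M:%S} -> {end:%H:%M:%S} ({seconds} s)")
```

Under that assumption, the first slow write (540.2 seconds, logged at 20:59:01) would have started at about 20:50:00, roughly half a minute before the first missed ping at 8:50:33 pm. That lines the two up in time, but it does not by itself say whether vnstatd is the cause or just another victim of the same stall.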
July 1, 2025 Community Expert
This looks like the cache of the disk drives is filling up. I would recommend you run the Unraid swap plugin. This requires a disk formatted as btrfs.
July 1, 2025 Author
1 minute ago, bmartino1 said: "this requires a disk formatted as btrfs"

My array is formatted as xfs. Cache is btrfs though.

Also, the plugin says it's for 6.9, but I'm now on 7.1.4.

Also, just a gut check: I download a lot of stuff through qBittorrent and never have an issue, so, although I really don't know, I'd be surprised if vnstat was killing it when everything else isn't.

Edited July 1, 2025 by PartyingChair (Clarity)
July 1, 2025 Community Expert
While it is labeled 6.9, it is meant for any Unraid version from 6.9 up; 6.9 is its minimum OS version. Confusing, I know... But if swap isn't touched during the weird times, then it's a Docker/server resource task with a potential leak, in which case vnstat looks to be the culprit, as it ran out of cache, i.e. a place to write. Swap should assist and fix that.
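For anyone wanting to check whether swap is actually touched during one of these episodes, a minimal sketch is to read `SwapTotal` and `SwapFree` from `/proc/meminfo` (a standard Linux interface) before and during the problem. The helper names and the `SAMPLE` values below are made up for illustration and are not taken from this server.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value in kiB}."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def swap_used_kib(info):
    """Swap currently in use; 0 if no swap is configured."""
    return info.get("SwapTotal", 0) - info.get("SwapFree", 0)


def memory_used_percent(info):
    """Share of RAM not available to new workloads (uses MemAvailable)."""
    return 100.0 * (1 - info["MemAvailable"] / info["MemTotal"])


# Illustrative numbers only (hypothetical, not from the poster's diagnostics).
SAMPLE = """\
MemTotal:       32000000 kB
MemAvailable:    4000000 kB
SwapTotal:       8000000 kB
SwapFree:        7000000 kB
"""

if __name__ == "__main__":
    # On the server itself, read the live values instead of SAMPLE.
    with open("/proc/meminfo") as handle:
        live = parse_meminfo(handle.read())
    print(f"RAM used:  {memory_used_percent(live):.1f}%")
    print(f"Swap used: {swap_used_kib(live)} kiB")
```

If swap use stays at zero while the box is unreachable, memory pressure alone becomes a less convincing explanation.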
July 1, 2025 Author
Unfortunately, swap isn't an option if it requires btrfs. To my knowledge, I have no way to switch the array from xfs to btrfs without significant downtime. I'll look into options, but it seems it's gonna be a pain, if possible at all.
July 1, 2025 Community Expert
Enable remote syslog server and post diagnostics if it occurs again.
July 1, 2025 Author
10 minutes ago, MowMdown said: "Enable remote syslog server and post diagnostics if it occurs again."

It's now been over 12 hours, but 1) remote syslog server is already enabled and 2) I can pull the diagnostics. (The server never actually went down. I just couldn't access the WebUI, Docker containers, or VMs. They became unreachable but, as far as I can tell, never actually turned off.) I have attached the diagnostics here. I just don't know if it's too late, since it was 12 hours ago. If it is, no worries; next time it happens I will absolutely do it quicker.

megahub-diagnostics-20250701-1212.zip
July 1, 2025 Community Expert
I don't personally see anything in the logs that points to an issue. You can try using the Netdata Docker container to see what is using up specific resources, but the unresponsiveness is likely due to running low on memory, high IOWAIT, etc.

This is purely speculation.
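As a lightweight complement to Netdata, the IOWAIT share can be sampled directly from `/proc/stat` and logged to a file, so there is a record even when the WebUI is unreachable. This is only a sketch under stated assumptions: the function names and the 5-second interval are arbitrary choices, and it relies on the standard Linux field order of the aggregate `cpu` line (user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice).

```python
import time


def cpu_times(stat_text):
    """Return the counters of the aggregate 'cpu' line in /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two samples."""
    deltas = [b - a for a, b in zip(before, after)]
    # Only the first 8 counters are summed: guest time is already
    # included in user/nice, so adding it would double count.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is iowait


def read_cpu_times(path="/proc/stat"):
    with open(path) as handle:
        return cpu_times(handle.read())


if __name__ == "__main__":
    INTERVAL_SECONDS = 5  # arbitrary sampling interval
    previous = read_cpu_times()
    while True:
        time.sleep(INTERVAL_SECONDS)
        current = read_cpu_times()
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(f"{stamp} iowait={iowait_percent(previous, current):.1f}%", flush=True)
        previous = current
```

Redirecting the output to a file on a fast device and checking it after the next episode would show whether IOWAIT spikes at the same time as the slow vnstatd writes.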