MrSliff Posted October 21, 2023 (edited)

Hi all,

Over the last few weeks I have had problems with the connectivity of my Unraid server. I noticed, mostly in the mornings, that the CPU goes crazy. I tried to watch htop, where the most demanding process was sshfs, but it was hard to get a stable connection. When this happens I also have problems connecting via SSH, the GUI and SMB. Yesterday the server was unresponsive the whole day. However, I could still connect to my VMs and all Docker services; only Unraid-related services were unresponsive. Today everything is fine again. I did not restart the server, so it fixed itself, which leads me to assume it is something like a backup or another scheduled job.

What I changed recently:
- I converted my cache pools to ZFS and added one ZFS disk to the array to get ZFS functionality such as snapshots and replication for my unprotected cache. I also set up replication to run at night.
- Accordingly, I set up Duplicati to back up the replicated data daily (I assume it only backs up the changes and not the whole data).
- I changed the Servarr stack, including qBittorrent, to use hard links, with qBittorrent set to seed for 30 days (I know the disks are spinning 24/7 now because of this; currently seeding around 280-300 torrents).

Sadly, I don't know where to start searching, because I could not find the one Docker container or service that causes this. Maybe there is some service I can use to record htop-like logs to see what is going on, and maybe some Unraid log recording as well. Recording for a week or so is fine; disk space is not a problem.

Thanks for helping out.

Attachment: unraid-diagnostics-20231021-0955.zip
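One possible way to get the htop-like recording asked for above is a small script that samples `/proc` at a fixed interval and appends the busiest processes to a log file. This is only a rough sketch, not an Unraid feature: it assumes a Python 3 interpreter is available on the server (it may need to be installed separately), and the names `record`, `snapshot`, `busiest` and the log file path are made up for illustration.

```python
import os
import time
from datetime import datetime


def parse_stat_line(line):
    """Return (pid, command, cpu_ticks) from the text of /proc/<pid>/stat."""
    # The command name sits in parentheses and may itself contain spaces
    # or parentheses, so split on the first "(" and the last ")".
    open_idx = line.index("(")
    close_idx = line.rindex(")")
    pid = int(line[:open_idx])
    comm = line[open_idx + 1:close_idx]
    rest = line[close_idx + 1:].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return pid, comm, int(rest[11]) + int(rest[12])


def snapshot():
    """Map pid -> (command, cpu_ticks) for every readable process."""
    procs = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as fh:
                pid, comm, ticks = parse_stat_line(fh.read())
        except (OSError, ValueError):
            continue  # process exited while we were reading it
        procs[pid] = (comm, ticks)
    return procs


def busiest(prev, curr, limit=5):
    """Return up to `limit` (tick_delta, pid, command) tuples, largest first."""
    deltas = []
    for pid, (comm, ticks) in curr.items():
        # A pid missing from prev started during the interval.
        before = prev.get(pid, (comm, 0))[1]
        if ticks > before:
            deltas.append((ticks - before, pid, comm))
    return sorted(deltas, reverse=True)[:limit]


def record(log_path, interval=60, samples=5):
    """Append the busiest processes to log_path once per interval."""
    prev = snapshot()
    with open(log_path, "a") as log:
        for _ in range(samples):
            time.sleep(interval)
            curr = snapshot()
            stamp = datetime.now().isoformat(timespec="seconds")
            for ticks, pid, comm in busiest(prev, curr):
                log.write(f"{stamp} pid={pid} comm={comm} cpu_ticks={ticks}\n")
            log.flush()  # keep the log usable even after a hard reset
            prev = curr
```

Calling `record("cpu_watch.log", interval=60, samples=10080)` would take one sample a minute for about a week. The log file should live somewhere that survives a reboot, and the values are raw CPU ticks per interval, not percentages.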
JorgeB Posted October 21, 2023

Enable the syslog server and post that after a crash.
MrSliff Posted October 21, 2023 (edited)

10 minutes ago, JorgeB said:
Enable the syslog server and post that after a crash.

Apparently the server does not crash; it is just unresponsive. I did not reset the server overnight, but it is available again without a hang, crash or similar. So there may be something very demanding that makes everything unresponsive. Anyway, I have enabled the syslog server now.
MrSliff Posted October 24, 2023 (edited)

So, here is the syslog you asked for. What is quite interesting is that the file integrity plugin started overnight (October 24 at 04:31 am), and after that multiple running PIDs became unresponsive. Maybe that plugin is the reason.

This morning I had quite a few problems. I could not reboot Unraid and I could not stop the array manually. After stopping the Docker process and doing a hard reset of the machine, my Docker image was corrupted. I had to delete it and reinstall all Docker containers.

Another weird thing: one of the network interfaces was high on CPU with a ping script. I also saw quite a lot of tasks responding to WAN IP addresses (mostly in a Turkish IP range) on port 445 (SMB). That is weird; I don't know what could cause it.

Attachment: syslog-192.168.20.3.log
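To check which outside addresses are talking to the SMB port, one option is to read the kernel's TCP table directly. The following is a minimal sketch under a few assumptions: Python 3 is available, the host is little-endian (as x86 is), and only IPv4 entries from `/proc/net/tcp` matter. The function names are hypothetical.

```python
import ipaddress
import socket
import struct

SMB_PORT = 445


def decode_endpoint(hex_endpoint):
    """Turn '0314A8C0:01BD' into ('192.168.20.3', 445)."""
    hex_ip, hex_port = hex_endpoint.split(":")
    # On little-endian hosts the kernel prints the address bytes reversed.
    ip = socket.inet_ntoa(struct.pack("<I", int(hex_ip, 16)))
    return ip, int(hex_port, 16)


def public_smb_peers(proc_net_tcp_text):
    """List (remote_ip, remote_port) for public peers on local port 445."""
    peers = []
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        _, local_port = decode_endpoint(fields[1])
        remote_ip, remote_port = decode_endpoint(fields[2])
        if local_port != SMB_PORT:
            continue
        # is_global is False for LAN, loopback and unspecified addresses.
        if ipaddress.ip_address(remote_ip).is_global:
            peers.append((remote_ip, remote_port))
    return peers
```

On the server this could be run as `public_smb_peers(open("/proc/net/tcp").read())`. Any result at all would suggest that port 445 is reachable from outside the LAN, which is worth checking against the router's port forwarding.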
JorgeB Posted October 24, 2023

Macvlan-related call traces will usually end up crashing the server. Try switching to ipvlan (Settings -> Docker Settings -> Docker custom network type -> ipvlan; advanced view must be enabled, top right).
MrSliff Posted October 24, 2023

1 hour ago, JorgeB said:
Macvlan related call traces will usually end up crashing the server, try switching to ipvlan (Settings -> Docker Settings -> Docker custom network type -> ipvlan (advanced view must be enabled, top right)).

OK, did that. Let's see how it goes over the next few days.
MrSliff Posted October 26, 2023 (edited)

So, last night the system crashed. There is no response now, but I have a syslog from the whole night. Maybe a memory issue?

One thing worth mentioning: I switched to ZFS on my cache disks some weeks ago and also put one ZFS disk into my array for replicating and snapshotting my cache disks, since they are not protected. Maybe that is the reason for the recent crashes. I set up SpaceInvaderOne's scripts for snapshots and replication on every dataset, so there are about 6 scripts running one after another overnight. (I would love to see some kind of GUI plugin to handle this.)

Attachment: syslog-192.168.20.3.log
JorgeB Posted October 26, 2023

There are a lot of different call traces, which looks more like a hardware problem. A lot of them mention ZFS. Start by running memtest.
MrSliff Posted October 26, 2023 (edited)

OK, I will try something else first, since I have the feeling it may be caused by different tasks accessing the same data at the same time. I have multiple daily ZFS snapshot and replication tasks that run overnight, but at the same time I also have a Duplicati backup task running, which does an offsite backup of my replication target.

I will first stop the Duplicati task and see if things improve. If that turns out to be the reason, I will switch to another solution such as Borg Backup. If not, I will run the memtest when I have a 24-hour time slot.
IISanitariumII Posted November 27, 2023

Any update?
Solution: MrSliff Posted November 28, 2023 (edited)

10 hours ago, IISanitariumII said:
Any update?

Hi, in the end it seems the problem was running Duplicati with an external backup in parallel with other copy/backup jobs inside the array. It caused long CPU and RAM spikes, which made my system unresponsive. Docker apps and VMs were still reachable and responsive, but Unraid itself was not.

I have now switched to Borg Backup for the external backups and have not had any errors or hiccups since then. The CPU and RAM spikes went away.

I still have to run the memtest; I have not done that yet.
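Since the trouble came from backup jobs overlapping, another way to avoid it is to make every job take the same lock before it starts, so that only one runs at a time. This is not what was done in this thread (the fix here was replacing Duplicati with Borg Backup); it is just a sketch of the idea, assuming Python 3 on a Linux host. The `exclusive_job` helper and the lock file path are hypothetical.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def exclusive_job(lock_path, wait=True):
    """Hold an exclusive lock on lock_path for the duration of the block.

    With wait=True the caller blocks until the lock is free. With
    wait=False a BlockingIOError is raised if another job holds it.
    """
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        flags = fcntl.LOCK_EX if wait else fcntl.LOCK_EX | fcntl.LOCK_NB
        fcntl.flock(fd, flags)
        yield
    finally:
        os.close(fd)  # closing the descriptor also releases the lock
```

Each nightly script would then wrap its work in `with exclusive_job("/tmp/nightly-backup.lock"):` so that replication and the offsite backup queue up instead of competing for the same disks, CPU and RAM.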
IISanitariumII Posted November 28, 2023

1 hour ago, MrSliff said:
Hi, so at the end it seems the problem was running Duplicati with external Backup in parallel to any other Copying/Backup inside of the array. It was causing long CPU and RAM spikes which made my system unresponsive. Docker Apps and VMs were still reachable and responsive, but Unraid itself was not. I changed to Borg Backup now for the external Backups, did not have any error or hiccup anymore since then. CPU and RAM spikes went away. Still have to do the MEMTEST, did not do that yet.

Hmm, interesting. Yeah, mine has been running for 1 hour so far after the parity check. I replaced the USB, so I'm going to monitor it for the day and see if it crashes. Connect plugin disconnected, new USB, new config. I will keep you updated.