Michael Ganrer Posted August 7 Recently I migrated my Unraid server from a desktop Intel 10700 to an HP DL360 G10 server. Everything seems to be working, but every couple of minutes I get a hiccup where all my Docker containers, and basically the entire server, become unresponsive. If I already have the dashboard open, the CPU usage still updates, but I can't switch to the Docker tab. It is a 48-core, 96-thread system, and I don't have any CPU pinning except for the VM, which is limited to 8 cores / 16 threads. While it is unresponsive, anywhere from 8 to 11 of the CPU threads are pegged at 100%. They drop back to normal after about 30 seconds and the system recovers. Any ideas on what would cause this? jinks-diagnostics-20240807-0712.zip
Michael Ganrer Posted August 8 (edited August 8) Update: I have tried to figure out what is using the cores, but htop freezes as well.
JorgeB Posted August 9 Does it still happen if you disable the Docker service?
Michael Ganrer Posted August 23 Sorry for the late response. I have been doing a bit more diagnosis on this. It seems to happen whenever I write to either cache drive. The more Docker containers I stop, the less frequently it happens. The same goes for the VM I am running: it is Windows 11 running Blue Iris camera software, which dumps to the cache drive about once an hour, to be written to the array overnight. The cache drives were originally installed in the front of the 1U server, while the HDDs are installed in two SC200 SAS expansion racks. I moved the cache drives to the SAS bays because I don't have any problem writing to the array, but the cache drives still do the same thing there. Both cache drives are new, with no SMART errors. One cache drive is btrfs and the other is xfs. Nothing is set to write between the two cache drives (anymore). Please let me know if any more info is needed or if I can try something else. Thanks everyone!
JorgeB Posted August 23 Try using a disk path/share whenever that is possible, or use exclusive shares. Bypassing FUSE (/mnt/user) sometimes helps a lot with that type of symptom.
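To make the suggestion above concrete: on Unraid, /mnt/user/<share> is the FUSE-backed user share view, while /mnt/<pool>/<share> addresses the same files directly on a pool or disk. Below is a minimal Python sketch of that path rewrite. The helper name `to_direct_path` and the pool name `cache` are assumptions for illustration only; they are not taken from the diagnostics in this thread.

```python
from pathlib import PurePosixPath

USER_ROOT = PurePosixPath("/mnt/user")  # FUSE view of user shares


def to_direct_path(path: str, pool: str = "cache") -> str:
    """Rewrite /mnt/user/<share>/... to /mnt/<pool>/<share>/...

    `pool` is a hypothetical pool or disk name (e.g. "cache", "disk1").
    Paths outside /mnt/user are returned unchanged.
    """
    p = PurePosixPath(path)
    try:
        rel = p.relative_to(USER_ROOT)
    except ValueError:
        return path  # not a user-share path, leave it alone
    return str(PurePosixPath("/mnt") / pool / rel)
```

The rewritten path is only meaningful if the share's files actually live on the named pool or disk, so treat the result as a candidate for a container or VM path mapping and verify it before relying on it.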
Michael Ganrer Posted August 23 I made that change about a week ago, and it made no difference. I will attach a current diagnostic. jinks-diagnostics-20240823-1050.zip
Michael Ganrer Posted August 23 Also, here is a screenshot of netdata.
JorgeB Posted August 23 Unfortunately there's nothing relevant logged that I can see.
Michael Ganrer Posted August 23 The netdata screenshot was meant to illustrate that everything becomes unresponsive when iowait is high, hence the gap of unrecorded data. I was actively copying files (Deluge was, anyway) when it was taken.
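For anyone wanting to record iowait without a full monitoring stack, here is a minimal Python sketch that derives it from two samples of the aggregate `cpu` line in /proc/stat, where iowait is the fifth counter. The function names are made up for this example, and it is only a rough illustration of the calculation, not a description of how netdata computes its graphs.

```python
import time


def parse_cpu_line(line: str) -> list[int]:
    """Parse the aggregate 'cpu' line of /proc/stat into integer tick counts."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(x) for x in fields[1:]]


def iowait_percent(before: str, after: str) -> float:
    """Share of CPU time spent in iowait between two samples, in percent."""
    a, b = parse_cpu_line(before), parse_cpu_line(after)
    deltas = [y - x for x, y in zip(a, b)]
    # Fields: user nice system idle iowait irq softirq steal guest guest_nice.
    # guest and guest_nice are already counted in user and nice,
    # so only the first eight fields are summed for the total.
    total = sum(deltas[:8])
    return 100.0 * deltas[4] / total if total > 0 else 0.0


def read_cpu_line() -> str:
    """Read the first line of /proc/stat (Linux only)."""
    with open("/proc/stat") as f:
        return f.readline()


def log_iowait(interval: float = 5.0, samples: int = 12) -> None:
    """Print a timestamped iowait percentage every `interval` seconds."""
    prev = read_cpu_line()
    for _ in range(samples):
        time.sleep(interval)
        cur = read_cpu_line()
        stamp = time.strftime("%H:%M:%S")
        print(f"{stamp} iowait {iowait_percent(prev, cur):.1f}%", flush=True)
        prev = cur
```

Calling `log_iowait()` from a console session and redirecting the output to a file would give timestamps that can be lined up against the stalls, assuming the script itself keeps getting scheduled during them.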
JorgeB Posted August 23 Yes, but there's nothing logged in the syslog to explain that. If you are already using disk/exclusive shares, I don't see another reason for it. It could just be slow devices, which won't leave anything logged.
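One rough way to probe the "slow devices" theory is to time small synchronous writes directly on each pool and look for outliers. The Python sketch below does that. The function name, the probe file name, and the default sizes are assumptions chosen for illustration, and it is not a substitute for a proper benchmark.

```python
import os
import time


def time_sync_writes(directory: str, count: int = 20,
                     size: int = 1024 * 1024) -> list[float]:
    """Write `count` blocks of `size` bytes, fsync each, return seconds per write."""
    path = os.path.join(directory, "latency_probe.tmp")  # hypothetical file name
    block = os.urandom(size)
    timings = []
    try:
        with open(path, "wb") as f:
            for _ in range(count):
                start = time.perf_counter()
                f.write(block)
                f.flush()
                os.fsync(f.fileno())  # force the data out to the device
                timings.append(time.perf_counter() - start)
    finally:
        # Always clean up the probe file, even if a write fails.
        if os.path.exists(path):
            os.remove(path)
    return timings
```

Running it once against a direct pool path and once against the array, then comparing `max(timings)` for each, would show whether individual writes to the cache devices occasionally take seconds rather than milliseconds.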