
Unraid becomes unresponsive from time to time (No GUI/SSH/SMB, but still pingable)


MrSliff
Solved by MrSliff


Hi all,

 

I have been having problems with the connectivity of my Unraid server over the last few weeks.

I noticed, mostly in the mornings, that the CPU goes crazy (I tried to watch htop and the most demanding process was sshfs, but it was hard to get a stable connection). When this happens, I also have problems connecting via SSH, the GUI and SMB.

Yesterday the server was unresponsive the whole day. However, I could still connect to my VMs and all Docker services. Only the Unraid-related services were unresponsive. Today everything is fine again. I did not restart the server, so it fixed itself, which leads me to assume it's something like a backup or something similar.

 

What I changed recently:

- I recently converted my cache pools to ZFS and added one ZFS disk to the array to get some ZFS functionality like snapshots and replication for my unprotected cache. I also set up replication to run at night.

- Accordingly, I set up Duplicati to back up the replicated data daily (I assume it's only backing up the changes and not the whole data).

- I changed the Servarr stack, including qBittorrent, to use hard links, with qBittorrent set to seed for 30 days (I know the disks are spinning 24/7 now because of this; currently seeding around 280-300 torrents).

 

Sadly, I don't know where to start looking, because I could not find the one Docker container or service that causes this.

Maybe there is some service I can use to record htop-like logs to see what's going on, and maybe some Unraid log recording as well. Recording for a week or so is not a problem, and disk space is not an issue.
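
The kind of htop-like recording asked about here could look roughly like the following. This is a minimal Python sketch, not a finished tool: it assumes Python 3 is available on the server (stock Unraid does not ship it, so that is an assumption) and it reads the standard Linux `/proc/<pid>/stat` files. The function names are made up for this example.

```python
import os
import time


def parse_stat(text):
    """Return (pid, name, cpu_ticks) from the contents of /proc/<pid>/stat."""
    # The process name sits in parentheses and may itself contain spaces
    # or parentheses, so split on the first "(" and the last ")".
    lparen = text.index("(")
    rparen = text.rindex(")")
    pid = int(text[:lparen])
    name = text[lparen + 1:rparen]
    fields = text[rparen + 2:].split()
    # fields[0] is the state (field 3 of the file); utime and stime are
    # fields 14 and 15, i.e. indexes 11 and 12 here.
    ticks = int(fields[11]) + int(fields[12])
    return pid, name, ticks


def snapshot(proc_dir="/proc"):
    """Map pid -> (name, cpu_ticks) for every process visible right now."""
    procs = {}
    for entry in os.listdir(proc_dir):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_dir, entry, "stat")) as handle:
                pid, name, ticks = parse_stat(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # the process exited while we were reading it
        procs[pid] = (name, ticks)
    return procs


def busiest(before, after, count=5):
    """Return the processes that used the most CPU ticks between two snapshots."""
    deltas = []
    for pid, (name, ticks) in after.items():
        if pid in before:
            deltas.append((ticks - before[pid][1], pid, name))
    deltas.sort(reverse=True)
    return deltas[:count]


def record(log_path, interval=60, rounds=1):
    """Append the busiest processes to log_path once per interval (seconds)."""
    before = snapshot()
    for _ in range(rounds):
        time.sleep(interval)
        after = snapshot()
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        with open(log_path, "a") as log:
            for delta, pid, name in busiest(before, after):
                log.write(f"{stamp} pid={pid} name={name} cpu_ticks={delta}\n")
        before = after
```

Calling `record("/mnt/user/logs/cpu-top.log", interval=60, rounds=10080)` would log the five busiest processes once a minute for about a week. The path is only a placeholder.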

 

Thanks for helping out.

unraid-diagnostics-20231021-0955.zip

10 minutes ago, JorgeB said:

Enable the syslog server and post that after a crash.

 

 

Apparently the server does not crash, it's just unresponsive. I did not reset the server overnight, but it's available again without a hang, crash or similar. So there may be something very demanding that makes everything unresponsive.

Anyway, I enabled the syslog server now.


So, here is the syslog you asked for. What is quite interesting is that the File Integrity plugin started overnight (24th of October at 04:31 am) and after that multiple running PIDs became unresponsive. Maybe that plugin is the reason.
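
To look at what a syslog shows around a time like that, a small time-window filter can help. This is a minimal Python sketch, assuming each log line starts with the classic syslog timestamp prefix (for example `Oct 24 04:31:00`) and that Python 3 is at hand. The function names are made up for this example, and the year is a placeholder because the prefix does not carry one.

```python
from datetime import datetime

# The classic syslog prefix has no year, so a fixed placeholder year is
# prepended purely to make the timestamps comparable with each other.
PLACEHOLDER_YEAR = "2000"


def parse_stamp(text):
    """Parse the 15-character 'Mon DD HH:MM:SS' prefix; return None if absent."""
    try:
        return datetime.strptime(
            f"{PLACEHOLDER_YEAR} {text[:15]}", "%Y %b %d %H:%M:%S"
        )
    except ValueError:
        return None


def lines_in_window(lines, start, end):
    """Return the lines whose timestamp lies between start and end (inclusive)."""
    first = parse_stamp(start)
    last = parse_stamp(end)
    if first is None or last is None:
        raise ValueError("start and end must look like 'Oct 24 04:31:00'")
    selected = []
    for line in lines:
        stamp = parse_stamp(line)
        # Lines without a readable timestamp are skipped.
        if stamp is not None and first <= stamp <= last:
            selected.append(line.rstrip("\n"))
    return selected
```

For example, `lines_in_window(open("syslog-192.168.20.3.log"), "Oct 24 04:31:00", "Oct 24 05:00:00")` would return everything logged in the half hour after the plugin started. The end time is arbitrary.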

 

This morning I had quite a few problems. I could not reboot Unraid and could not stop the array manually. After stopping the Docker process and a hard reset of the machine, my Docker image was corrupted. I had to delete it and reinstall all Docker containers.

Another weird thing: one of the network interfaces was high on CPU with a ping script. I also saw quite a lot of tasks responding to WAN IP addresses (mostly a Turkish IP range) on port 445 (SMB). Weird, I don't know what could cause this.

 

 

syslog-192.168.20.3.log

1 hour ago, JorgeB said:

Macvlan related call traces will usually end up crashing the server, try switching to ipvlan (Settings -> Docker Settings -> Docker custom network type -> ipvlan (advanced view must be enabled, top right)).

OK, did that. Let's see how it goes over the next few days.

 


So, last night the system crashed. No response now. But I have a syslog from the whole night. Maybe a memory issue?

One thing worth mentioning: I switched to ZFS on my cache disks some weeks ago and also put one ZFS disk into my array for replication and snapshots of my cache disks, since they are not protected. Maybe that is the reason for the recent crashes. I set up SpaceInvaderOne's scripts for snapshots and replication on every dataset, so there are about 6 scripts running one after another overnight. (Would love to see some kind of GUI plugin to handle this :) )

syslog-192.168.20.3.log


OK, I will try something else first, since I have the feeling it may be caused by different tasks accessing the same data at the same time:

I have multiple daily ZFS snapshot and replication tasks that run overnight, but at the same time I also have a Duplicati backup task running, which does an offsite backup of my replication target. I will first stop the Duplicati task and see if it's better then. If that turns out to be the reason, I will switch to another solution like Borg Backup or something else. If not, I will do that memtest when I have a 24h time slot.

  • 1 month later...
  • Solution
10 hours ago, IISanitariumII said:

Any update?

Hi, so in the end it seems the problem was running Duplicati with an external backup in parallel with other copy/backup jobs inside the array.

It was causing long CPU and RAM spikes which made my system unresponsive. Docker apps and VMs were still reachable and responsive, but Unraid itself was not.

I have now switched to Borg Backup for the external backups and have not had any errors or hiccups since then. The CPU and RAM spikes went away.

I still have to run the memtest, I have not done that yet.
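
Switching tools is what fixed it here. For anyone who would rather keep their current tools and simply stop backup jobs from overlapping, a shared lock file is one option. This is a minimal Python sketch of that idea, not what was done in this thread. It assumes Python 3 on a Linux host, and the lock path and function name are made up for the example.

```python
import fcntl
import subprocess

# Hypothetical lock file shared by every job that should not overlap.
LOCK_PATH = "/tmp/backup-jobs.lock"


def run_exclusively(command, lock_path=LOCK_PATH):
    """Run command while holding an exclusive lock; wait if another job holds it."""
    with open(lock_path, "w") as lock_file:
        # Blocks until no other process holds the lock on this file.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            return subprocess.run(command).returncode
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

Each scheduled job would then be started through `run_exclusively([...])` with its own command line, so a second job waits until the first one has released the lock instead of running alongside it.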

1 hour ago, MrSliff said:

Hi, so in the end it seems the problem was running Duplicati with an external backup in parallel with other copy/backup jobs inside the array.

It was causing long CPU and RAM spikes which made my system unresponsive. Docker apps and VMs were still reachable and responsive, but Unraid itself was not.

I have now switched to Borg Backup for the external backups and have not had any errors or hiccups since then. The CPU and RAM spikes went away.

I still have to run the memtest, I have not done that yet.


Hmm, interesting. Yeah, mine has been running for 1 hour so far after the parity check. I replaced the USB, so I'm going to monitor it for the day and see if it crashes. Connect plugin disconnected, new USB, new config. I will keep you updated.
