Unraid Server Unresponsive - Can't Survive 2 days!

For months now my server has gone unresponsive about every 2 days, and I've had to hard power down to get it back. Once it's back, it runs like a dream.

Unresponsive means:

  1. I can't access GUI dashboard

  2. I can ssh. 'Reboot' with force or otherwise gives me a beep but not a shutdown. Docker Stop commands fail. HTOP suggests 3 CPU threads are stuck on 100%

  3. Docker stats suggest all my docker containers are running, but I can't access them either locally or through reverse proxy

  4. There is nothing of any importance (errors/warnings) that I can see in the syslogs or via dmesg.

  5. Failure times are random (middle of the night, middle of the day, idle times, busy times) - I can't pinpoint it to any one container or script

In troubleshooting

  1. I've changed/upgraded all of PCI cards, memory and processor

  2. Uninstalled many plugins

  3. Tried switching off the dockers (but not all yet)

  4. Rebuilt docker vdisk (twice)

  5. Switched off all VMs completely

  6. Prevented drives from spinning down

  7. Ensured BIOS Idle settings are as everyone suggests (C-states disabled, idle control typical, cool'n'quiet disabled)

  8. Removed all USB devices

  9. Changed UPS

  10. Use Docker ipvlan networks

  11. Temperatures are all ok

  12. Hammered the box with 90% cpu processing for 7 hours - no problem

  13. All disks have plenty of capacity / SMARTs are good

The server cannot survive for more than 48 hours....

Can anyone spot anything in my diags?

Do I need to reinstall the UNRAID OS perhaps?

It all suggests to me that something is 'building up' or hammering the system - but what is it?

ANY suggestion is welcome. I'm almost ready to give up.

ding-a-ling-diagnostics-20250805-1130.zip

Edited by late4473
Add brief HTOP findings.

Solved by late4473

this sounds like the exact issue i have! havent looked at your diag but are you on 7.1.4?

  • Author
16 minutes ago, mister_thew said:

this sounds like the exact issue i have! havent looked at your diag but are you on 7.1.4?

Yes 7.1.4

I guess you haven't solved it? Did you try a rollback?

  • Author
21 minutes ago, JorgeB said:

Enable the syslog server and post that after a crash.

Enabled, will post. I should add that I've already inspected the syslog via ssh during a 'hangup'. The last entries always seem to be just the normal 'spin down' of the hard drives.

Edited by late4473

  • Author

@JorgeB Here is the syslog and some other screenshots. It died on 6-Aug @ 22:10ish, i.e. all dockers stopped responding. Hard reboot and all is well again... the cycle continues

  1. I could access unraid GUI - note a CPU thread locked at 100%

  2. HTOP doesn't report the same locked CPU but is reporting the load average endlessly climbing

  3. There is a recurring error about a missing css file. I have tried both 'light' and 'dark' settings but this file is always missing
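
A load average that keeps climbing while htop shows no busy CPU is often a sign of tasks stuck in uninterruptible sleep (state D), because Linux counts those tasks in the load average. A minimal Python sketch for listing them over ssh during a hang follows. The function names are made up for illustration, and nothing here comes from the attached diagnostics.

```python
import os


def parse_stat(text):
    """Split one /proc/<pid>/stat record into (pid, command, state)."""
    # The command sits in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    open_idx = text.index("(")
    close_idx = text.rindex(")")
    pid = int(text[:open_idx])
    comm = text[open_idx + 1:close_idx]
    state = text[close_idx + 1:].split()[0]
    return pid, comm, state


def uninterruptible(records):
    """Return (pid, command) for every record whose state is 'D'."""
    parsed = (parse_stat(record) for record in records)
    return [(pid, comm) for pid, comm, state in parsed if state == "D"]


def read_proc_stats(proc_root="/proc"):
    """Yield the stat record of every process that can still be read."""
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, name, "stat")) as handle:
                yield handle.read()
        except OSError:
            continue  # the process exited between listdir() and open()


if __name__ == "__main__" and os.path.exists("/proc/loadavg"):
    with open("/proc/loadavg") as handle:
        print("load average:", " ".join(handle.read().split()[:3]))
    for pid, comm in uninterruptible(read_proc_stats()):
        print(f"D state: pid={pid} command={comm}")
```

If the same commands show up in state D on every hang, that narrows down which subsystem they are waiting on.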

I hope you can spot something

Thanks

dashboard.png

dockerstats.png

htop.png

syslog-192.168.50.12.log

  • Community Expert

I'm afraid that there's nothing relevant logged. One thing you can try is to boot the server in safe mode with all docker containers/VMs disabled and let it run as a basic NAS for a few days. If it doesn't crash, start turning on the other services one by one, including the individual docker containers.

I've had the same problem with 7.1.4 - started with 7.0.1 but was less frequent back then.

The only useful output I've seen appears in the log window if you have it open (it is not written to the log file):

Server kernel: CIFS: VFS: \\Server has not responded in 180 seconds. Reconnecting...

Most times it dies around 1:15am, hours after the last drive has spun down (noted in the log)

I need to rollback to 6.12 - same hardware was stable back then.

  • Author

Thank you for the feedback. I'm not convinced by the version being the root cause because I was actually reporting the same issue on this forum about a year ago! My server dies at RANDOM times also.

I've actually today just gone over 2 days uptime (!) - unprecedented for months. I'd noticed that a 'flash backup' seemed to be running at random times during the day, so my last change was to completely remove the CONNECT plug-in. Let me monitor over the next few days. It's so good not having to hard boot every other day! #fingerscrossed

@JDGJr Do you have Connect installed and do you have auto backups switched on?

  • Author

My server survived for 4 days or so - and the unresponsive behaviour at the death was slightly different. I could access GUI (all looked normal) - no reported stuck CPUs but all docker containers unresponsive. Any attempt to stop them resulted in 'server error'.

I did notice my Docker PID limit was on the default (2048) and a quick docker stats showed I was in the high 1000s, so I have now bumped it up substantially and restarted.

Watch this space.

I am having this same problem. Only started this last week or so though.

On 8/8/2025 at 11:55 PM, late4473 said:

@JDGJr Do you have Connect installed and do you have auto backups switched on?

I do not have Connect installed. Auto backups run once a week, but the system had been stopping every other day.

I'm still prepping for rolling back to 6.12 and system has been more stable since I posted here. But I've been doing a lot of backups to other systems overnight when the stoppage had been common.

My system's UI looked ok - i could change pages, but I got the 'system error' message when interacting with Dockers and the spin up/down actions on the main drive list did nothing with the drives.

Edited by JDGJr
more detail

  • Author

I'm going to mark this as solved - my uptime is now 5 days (!) and server seems rock solid (40 docker containers, 4 VMs all happily running). Last key actions:

  1. Uninstalled the Connect Plug-in (had some impact)

  2. Increased Settings/Docker/PID limit from the default setting to 8000 (even though my Docker Stats PID column only sums to c. 1300). Could there have been spikes perhaps?

pids.sh
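
The attached pids.sh is not reproduced in the thread. As a rough stand-in (not the author's script), here is a Python sketch that takes one sample of `docker stats` and totals the PIDs column. The helper names are hypothetical; the `--no-stream` and `--format` options and the `.Name` / `.PIDs` placeholders are standard Docker CLI features.

```python
import shutil
import subprocess

# Format string for `docker stats`: container name, a tab, then its PID count.
STATS_FORMAT = "{{.Name}}\t{{.PIDs}}"


def parse_pid_counts(text):
    """Turn 'name<TAB>pids' lines into a {name: pids} dict."""
    counts = {}
    for line in text.splitlines():
        name, _, pids = line.partition("\t")
        if pids.strip().isdigit():  # skip blank lines and non-numeric values
            counts[name.strip()] = int(pids)
    return counts


def top_consumers(counts, limit=5):
    """Return the containers with the most PIDs, largest first."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)[:limit]


if __name__ == "__main__":
    if shutil.which("docker") is None:
        print("docker CLI not found; nothing to sample")
    else:
        result = subprocess.run(
            ["docker", "stats", "--no-stream", "--format", STATS_FORMAT],
            capture_output=True,
            text=True,
        )
        counts = parse_pid_counts(result.stdout)
        print("total PIDs:", sum(counts.values()))
        for name, pids in top_consumers(counts):
            print(f"{pids:6d}  {name}")
```

A single sample cannot show spikes, so running it on a schedule and keeping the output would be needed to answer the question above. Docker's own `--pids-limit` applies per container; if the Unraid setting maps to that option (an assumption here), the largest single container matters more than the sum.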

  • 4 weeks later...
  • Author
  • Solution

One more follow up - the previous solution was not permanent - the server continued to die. I'm putting this down to the unRaid FUSE bug (shfs). The resolution, which has given 7 days of up-time so far:

1) Upgrade to 7.2.0-beta.2 (Can't be any worse than what I've got!)

2) Change all drive references (docker container persistent drives, docker.img, libvirt.img and all VM storage paths) from /mnt/user/... to /mnt/cache/... These are all cache only drives anyway and this takes FUSE out of the equation.
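
Before changing anything, it can help to list which settings still point at /mnt/user. The sketch below is only a text filter, assuming the shares involved really are cache-only and the pool is named `cache` as in the post. The function names and the sample lines are illustrative, not taken from any real template.

```python
import re

# Match /mnt/user/ only where a path starts. The trailing slash leaves
# /mnt/user0/... alone, and the lookbehind skips paths that merely contain
# /mnt/user/ further in (for example /backup/mnt/user/...).
USER_SHARE = re.compile(r"(?<![\w/])/mnt/user/")


def find_user_paths(text):
    """Return the lines that still reference the /mnt/user mount."""
    return [line.strip() for line in text.splitlines() if USER_SHARE.search(line)]


def to_cache_paths(text, pool="cache"):
    """Rewrite /mnt/user/... references to /mnt/<pool>/... in the given text."""
    return USER_SHARE.sub(f"/mnt/{pool}/", text)


if __name__ == "__main__":
    # Hypothetical volume mappings, for demonstration only.
    sample = (
        "-v /mnt/user/appdata/example:/config\n"
        "-v /mnt/user0/media:/media\n"
    )
    print("still on /mnt/user:", find_user_paths(sample))
    print(to_cache_paths(sample), end="")
```

This only rewrites text. The files themselves must already live on the cache pool, otherwise the new path points at nothing, so review each hit by hand rather than applying the substitution blindly.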

I hope this helps someone!

Will I make 2 weeks of up-time??

Edited by late4473

Thanks for posting this. I'm still trying to prove my 7.1.4 system isn't crash prone. (I am wary of moving to the next beta)

Interestingly, I made the /mnt/user -> /mnt/cache changes late this week - mostly to become consistent. I hope that gives me the positive result you're looking for.

Question: do you see a line similar to 'CIFS: VFS: \\Server has not responded in 180 seconds. Reconnecting...' at the end of the syslog when your machine crashes?

(need to have a console window displaying syslog when it dies, as it would not be able to write to the disk log file)

Edited by JDGJr
() addition

  • Author

No - never seen the CIFS line. I only see drives spinning up and down in my syslog.

Good luck!

I've recently been dealing with the same issues. It doesn't seem related to upgrading the OS version, since I did my update a few weeks ago and the problem started 4 days ago.

Basically, the server freezes about every 24h to 36h. Did a complete server check and no hardware faults found. I did turn on syslog and will edit this post when I get the file. Help would certainly be greatly appreciated for now.

tower-diagnostics-20250907-0950.zip

Syslog server is enabled and the following files were recorded after multiple crashes. Today, I moved my appdata and docker.img files to my cache drive. Server still crashed.

syslog syslog-previous

Edited by leprechaun17

  • Community Expert

You have a serious LAN loop configured; this will kill the box sooner or later:

Sep 8 19:02:47 Tower kernel: br0: received packet on eth0 with own address as source address (addr:a4:ba:db:19:b2:c2, vlan:0)

Sep 8 19:02:47 Tower kernel: br0: received packet on eth2 with own address as source address (addr:a4:ba:db:19:b2:c2, vlan:0)

Sep 8 19:02:47 Tower kernel: br0: received packet on eth0 with own address as source address (addr:a4:ba:db:19:b2:be, vlan:0)

Sep 8 19:02:47 Tower kernel: br0: received packet on eth0 with own address as source address (addr:a4:ba:db:19:b2:c2, vlan:0)

Sep 8 19:02:47 Tower kernel: br0: received packet on eth2 with own address as source address (addr:a4:ba:db:19:b2:be, vlan:0)

Sep 8 19:02:47 Tower kernel: br0: received packet on eth3 with own address as source address (addr:a4:ba:db:19:b2:be, vlan:0)

Sep 8 19:02:47 Tower kernel: br0: received packet on eth2 with own address as source address (addr:a4:ba:db:19:b2:c2, vlan:0)

Sep 8 19:02:47 Tower kernel: br0: received packet on eth0 with own address as source address (addr:a4:ba:db:19:b2:c2, vlan:0)

Sep 8 19:02:47 Tower kernel: br0: received packet on eth0 with own address as source address (addr:a4:ba:db:19:b2:be, vlan:0)

Sep 8 19:02:47 Tower kernel: br0: received packet on eth3 with own address as source address (addr:a4:ba:db:19:b2:c2, vlan:0)

Looks like you have attached all ports to the same switch (or to different switches but these are interconnected too).

I assume you want to do some kind of bonding, but either you have not told your switch about it, or it does not support this kind of setup.

The only valid options are to pull out 3 of those cables or to set the bond mode to "active backup".

Either way, only one card will be in use.
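
To see how many ports and MAC addresses are involved, warnings like the ones quoted above can be tallied. A minimal Python sketch, assuming the message format stays exactly as quoted; the function name is hypothetical.

```python
import re
from collections import Counter

# Matches the bridge warning quoted in this thread and captures the bridge,
# the port the packet arrived on, and the source MAC address.
LOOP_LINE = re.compile(
    r"kernel: (?P<bridge>\S+): received packet on (?P<port>\S+) "
    r"with own address as source address \(addr:(?P<mac>[0-9a-f:]{17})"
)


def count_loop_warnings(lines):
    """Count warnings per (bridge, port, mac) combination."""
    hits = Counter()
    for line in lines:
        match = LOOP_LINE.search(line)
        if match:
            hits[(match["bridge"], match["port"], match["mac"])] += 1
    return hits


if __name__ == "__main__":
    # Two of the lines posted above, used as sample input.
    sample = [
        "Sep 8 19:02:47 Tower kernel: br0: received packet on eth0 with own "
        "address as source address (addr:a4:ba:db:19:b2:c2, vlan:0)",
        "Sep 8 19:02:47 Tower kernel: br0: received packet on eth2 with own "
        "address as source address (addr:a4:ba:db:19:b2:c2, vlan:0)",
    ]
    for (bridge, port, mac), seen in count_loop_warnings(sample).most_common():
        print(f"{bridge} {port} {mac}: {seen}")
```

On a live box the lines would come from the syslog file (assumed here to be /var/log/syslog). Seeing the same MAC arrive on several ports of one bridge is the pattern described above.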

ok thanks. I removed 3 out of 4 ethernet cables, let's see what happens.

So the server has been up for the past 6 days without interruption. Thank you for your help.

On 9/6/2025 at 2:44 PM, late4473 said:

Change all drive references (docker container persistent drives, docker.img, libvirt.img and all VM storage paths) from /mnt/user/... to /mnt/cache/... These are all cache only drives anyway and this takes FUSE out of the equation.

Will I make 2 weeks of up-time??

Did you make it to 2 weeks?

I'm happy to report that making those changes to my 7.1.4 system has given me 16+ days of uptime! Haven't had this in months.

  • Author

Yep - solid as a rock - like you I haven't seen that for months! I'm pretty sure it was due to taking FUSE out of the equation rather than the upgrade (my conclusion being it couldn't handle the throughput)....glad you're in the same boat. I was ready to chuck unRAID into the bin!

image.png
