Unraid Server Unresponsive - Can't Survive 2 days! - General Support

August 5, 2025

For months now my server has gone unresponsive about every 2 days, and I've had to hard power down to get it back. When it's back it runs like a dream.

Unresponsive means:

I can't access the GUI dashboard

I can ssh in. 'Reboot', with force or otherwise, gives me a beep but not a shutdown. Docker stop commands fail. HTOP suggests 3 CPU threads are stuck at 100%
Docker stats suggests all my docker containers are running, but I can't access them either locally or through the reverse proxy
There is nothing of any importance (errors/warnings) that I can see in the syslog or via dmesg.
Failure times are random (middle of the night, middle of the day, idle times, busy times) - I can't pin it down to any one container or script

In troubleshooting

I've changed/upgraded all of the PCI cards, the memory and the processor
Uninstalled many plugins
Tried switching off the dockers (but not all yet)
Rebuilt docker vdisk (twice)
Switched off all VMs completely
Prevented drives from spinning down
Ensured BIOS Idle settings are as everyone suggests (C-states disabled, idle control typical, cool'n'quiet disabled)
Removed all USB devices
Changed UPS
Use Docker ipvlan networks
Temperatures are all ok
Hammered the box with 90% cpu processing for 7 hours - no problem
All disks have plenty of capacity / SMARTs are good

The server cannot survive for more than 48 hours....

Can anyone spot anything in my diags?

Do I need to reinstall the UNRAID OS perhaps?

It all suggests to me that something is 'building up' or hammering the system - but what is it?

ANY suggestion is welcome. I'm almost ready to give up.

ding-a-ling-diagnostics-20250805-1130.zip

Edited August 5, 2025 by late4473
Add brief HTOP findings.

August 5, 2025

Community Expert

Enable the syslog server and post that after a crash.

August 5, 2025

This sounds like the exact issue I have! I haven't looked at your diags, but are you on 7.1.4?

August 5, 2025

Author

16 minutes ago, mister_thew said:
This sounds like the exact issue I have! I haven't looked at your diags, but are you on 7.1.4?

Yes 7.1.4

I guess you haven't solved it? Did you try a rollback?

August 5, 2025

Author

21 minutes ago, JorgeB said:
Enable the syslog server and post that after a crash.

Enabled, will post. I should add that I've inspected the syslog before, during the 'hangup', via ssh. The last entries always seem to be just the normal 'spin down' of the hard drives.

Edited August 5, 2025 by late4473

August 7, 2025

Author

@JorgeB Here is the syslog and some other screenshots. It died on 6-Aug at around 22:10, i.e. all dockers stopped responding. Hard reboot and all is well again... the cycle continues.

I could access the Unraid GUI - note a CPU thread locked at 100%
HTOP doesn't report the same locked CPU, but it does report the load average endlessly climbing
There is a recurring error about a missing css file. I have tried both 'light' and 'dark' settings but this file is always missing

I hope you can spot something

Thanks

syslog-192.168.50.12.log
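
A climbing load average without a matching busy CPU in HTOP can be consistent with tasks that are stuck waiting rather than computing: on Linux, the load average counts tasks in uninterruptible sleep (state D) as well as runnable ones. As a rough sketch, not something posted in this thread, the following Python lists processes currently in state D by reading /proc. The function names are made up for illustration, and it looks only at each process's main thread.

```python
from pathlib import Path


def parse_stat(stat_line):
    """Return (pid, command, state) from one /proc/<pid>/stat line."""
    # The command name sits in parentheses and may itself contain spaces
    # or ')', so split around the first '(' and the last ')'.
    pid_part, _, rest = stat_line.partition("(")
    comm, _, tail = rest.rpartition(")")
    return int(pid_part), comm, tail.split()[0]


def blocked_tasks(proc_root="/proc"):
    """List (pid, command) for processes in uninterruptible sleep (state D)."""
    blocked = []
    for stat_file in Path(proc_root).glob("[0-9]*/stat"):
        try:
            pid, comm, state = parse_stat(stat_file.read_text())
        except (OSError, ValueError, IndexError):
            continue  # the process exited while it was being read
        if state == "D":
            blocked.append((pid, comm))
    return sorted(blocked)
```

Calling `blocked_tasks()` a few times during a hang and comparing the results would show whether the same processes stay blocked; it says nothing about why they are blocked.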

August 7, 2025

Community Expert

I'm afraid that there's nothing relevant logged. One thing you can try is to boot the server in safe mode with all docker containers/VMs disabled and let it run as a basic NAS for a few days. If it doesn't crash, start turning on the other services one by one, including the individual docker containers.

August 9, 2025

I've had the same problem with 7.1.4 - it started with 7.0.1 but was less frequent back then.

The only useful output I've seen appears in the log window if you have it open (it is not written to the log file):

Server kernel: CIFS: VFS: \\Server has not responded in 180 seconds. Reconnecting...

Most times it dies around 1:15am, hours after the last drive has spun down (noted in the log)

I need to roll back to 6.12 - the same hardware was stable back then.

August 9, 2025

Author

Thank you for the feedback. I'm not convinced by the version being the root cause because I was actually reporting the same issue on this forum about a year ago! My server dies at RANDOM times also.

I've actually today just gone over 2 days uptime (!) - unprecedented for months. I'd noticed that a 'flash backup' seemed to be running at random times during the day, so my last change was to completely remove the CONNECT plug-in. Let me monitor over the next few days. It's so good not having to hard boot every other day! #fingerscrossed

@JDGJr Do you have Connect installed and do you have auto backups switched on?

August 10, 2025

Author

My server survived for 4 days or so - and the unresponsive behaviour at the time of death was slightly different. I could access the GUI (all looked normal) and there were no reported stuck CPUs, but all docker containers were unresponsive. Any attempt to stop them resulted in 'server error'.

I did notice my Docker PID limit was on the default (2048), and a quick docker stats showed I was in the high 1000s, so I have now bumped it up substantially and restarted.

Watch this space.

August 10, 2025

I am having this same problem. Only started this last week or so though.

August 10, 2025

On 8/8/2025 at 11:55 PM, late4473 said:
@JDGJr Do you have Connect installed and do you have auto backups switched on?

I do not have Connect installed. Auto backups run once a week, but the system had been stopping every other day.

I'm still prepping for rolling back to 6.12, and the system has been more stable since I posted here. But I've been doing a lot of backups to other systems overnight, which is when the stoppages had been common.

My system's UI looked ok - I could change pages, but I got the 'system error' message when interacting with Dockers, and the spin up/down actions on the main drive list did nothing with the drives.

Edited August 10, 2025 by JDGJr
more detail

August 15, 2025

Author

I'm going to mark this as solved - my uptime is now 5 days (!) and the server seems rock solid (40 docker containers, 4 VMs all happily running). Last key actions:

Uninstalled the Connect plug-in (had some impact)
Increased Settings/Docker/PID limit from the default setting to 8000 (even though my Docker Stats PID column only sums to c. 1300). Could there have been spikes, perhaps?

pids.sh
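
The attached pids.sh is not reproduced here. As a rough sketch of the same kind of check, the following Python takes one `docker stats` snapshot and reports the PID count per container. The function names and the 80% warning fraction are assumptions for illustration. The thread does not say whether the Settings/Docker PID limit applies per container or to the sum, so the sketch leaves both numbers available, and because it is a single snapshot it would not catch short spikes.

```python
import subprocess


def parse_pid_counts(stats_text):
    """Parse 'name<TAB>pids' lines into {container name: PID count}."""
    counts = {}
    for line in stats_text.splitlines():
        name, sep, pids = line.rpartition("\t")
        if not sep or not pids.strip().isdigit():
            continue  # skip blank lines and rows without a numeric count
        counts[name.strip()] = int(pids)
    return counts


def read_docker_stats():
    """Take one snapshot of per-container PID counts from the Docker CLI."""
    result = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", "{{.Name}}\t{{.PIDs}}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def near_limit(counts, limit, fraction=0.8):
    """Names of containers whose PID count is at least fraction * limit."""
    return sorted(name for name, n in counts.items() if n >= limit * fraction)
```

For example, `counts = parse_pid_counts(read_docker_stats())` followed by `print(sum(counts.values()), near_limit(counts, 2048))` prints the total and any container close to the old default. Running it repeatedly would give a better chance of seeing spikes.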

September 6, 2025

Author
Solution

One more follow-up - the previous solution was not permanent - the server continued to die. I'm putting this down to the unRaid FUSE bug (shfs). The resolution, which has given 7 days of uptime so far:

1) Upgrade to 7.2.0-beta.2 (Can't be any worse than what I've got!)

2) Change all drive references (docker container persistent drives, docker.img, libvirt.img and all VM storage paths) from /mnt/user/... to /mnt/cache/... These are all cache only drives anyway and this takes FUSE out of the equation.
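
Step 2 can be previewed before anything is edited. The sketch below is a dry run only. It assumes the container and VM settings are XML or other plain-text files in a directory passed in by the caller (`config_dir` is a placeholder, not a path from this thread) and that the pool really is mounted at /mnt/cache. It lists each /mnt/user/... reference together with the /mnt/cache/... path it would become, and changes nothing.

```python
import re
from pathlib import Path

USER_PREFIX = "/mnt/user/"
CACHE_PREFIX = "/mnt/cache/"  # assumption: the pool is named "cache"

# A /mnt/user/ path running up to whitespace, a quote, or an XML bracket.
USER_PATH = re.compile(r"/mnt/user/[^\s\"'<>]*")


def to_cache_path(path):
    """Map a /mnt/user/... path to /mnt/cache/...; leave other paths alone."""
    if path.startswith(USER_PREFIX):
        return CACHE_PREFIX + path[len(USER_PREFIX):]
    return path


def find_user_paths(text):
    """Return the distinct /mnt/user/... paths mentioned in a piece of text."""
    return sorted(set(USER_PATH.findall(text)))


def plan_changes(config_dir):
    """Yield (file, old_path, new_path) for every reference; edits nothing."""
    for config_file in sorted(Path(config_dir).glob("*.xml")):
        text = config_file.read_text(errors="replace")
        for old in find_user_paths(text):
            yield config_file, old, to_cache_path(old)
```

The mapping is only equivalent when the share lives entirely on the cache pool, as is the case here. For a share that also has files on array disks, the /mnt/cache/... path would show only part of it.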

I hope this helps someone!

Will I make 2 weeks of up-time??

Edited September 6, 2025 by late4473

September 6, 2025

Thanks for posting this. I'm still trying to prove my 7.1.4 system isn't crash prone. (I am wary of moving to the next beta)

Interestingly, I made the /mnt/user -> /mnt/cache changes late this week - mostly to become consistent. I hope that gives me the positive result you're looking for.

Question: do you see a line similar to 'CIFS: VFS: \\Server has not responded in 180 seconds. Reconnecting...' at the end of the syslog when your machine crashes?

(need to have a console window displaying syslog when it dies, as it would not be able to write to the disk log file)

Edited September 6, 2025 by JDGJr
() addition

September 7, 2025

Author

No - never seen the CIFS line. I only see drives spinning up and down in my syslog.

Good luck!

September 7, 2025

I've recently been dealing with the same issues. It doesn't seem related to upgrading the OS version, since I did my update a few weeks ago and the problem only started 4 days ago.

Basically, the server freezes about every 24h to 36h. Did a complete server check and no hardware faults found. I did turn on syslog and will edit this post when I get the file. Help would certainly be greatly appreciated for now.

tower-diagnostics-20250907-0950.zip

September 8, 2025

Community Expert

Enable the syslog server and post that after a crash.

September 9, 2025

Syslog server is enabled and the following files were recorded after multiple crashes. Today, I moved my appdata and docker.img files to my cache drive. Server still crashed.

syslog syslog-previous

Edited September 9, 2025 by leprechaun17

September 9, 2025

Community Expert

You have a serious LAN loop configured; this will kill the box sooner or later:

Sep 8 19:02:47 Tower kernel: br0: received packet on eth0 with own address as source address (addr:a4:ba:db:19:b2:c2, vlan:0)

Sep 8 19:02:47 Tower kernel: br0: received packet on eth2 with own address as source address (addr:a4:ba:db:19:b2:c2, vlan:0)

Sep 8 19:02:47 Tower kernel: br0: received packet on eth0 with own address as source address (addr:a4:ba:db:19:b2:be, vlan:0)

Sep 8 19:02:47 Tower kernel: br0: received packet on eth0 with own address as source address (addr:a4:ba:db:19:b2:c2, vlan:0)

Sep 8 19:02:47 Tower kernel: br0: received packet on eth2 with own address as source address (addr:a4:ba:db:19:b2:be, vlan:0)

Sep 8 19:02:47 Tower kernel: br0: received packet on eth3 with own address as source address (addr:a4:ba:db:19:b2:be, vlan:0)

Sep 8 19:02:47 Tower kernel: br0: received packet on eth2 with own address as source address (addr:a4:ba:db:19:b2:c2, vlan:0)

Sep 8 19:02:47 Tower kernel: br0: received packet on eth0 with own address as source address (addr:a4:ba:db:19:b2:c2, vlan:0)

Sep 8 19:02:47 Tower kernel: br0: received packet on eth0 with own address as source address (addr:a4:ba:db:19:b2:be, vlan:0)

Sep 8 19:02:47 Tower kernel: br0: received packet on eth3 with own address as source address (addr:a4:ba:db:19:b2:c2, vlan:0)

It looks like you have attached all ports to the same switch (or to different switches that are themselves interconnected).

I assume you want to do some kind of bonding, but either you have not told your switch about it, or it does not support this kind of setup.

The only valid options would be to pull out 3 of those cables or to set the bond mode to "active backup".

Either way, only one card will be in use at a time.
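
These bridge warnings are easy to miss in a long syslog. As a minimal sketch (the function names are illustrative, and the pattern is written only against the lines quoted above), the following Python counts them per bridge, port and MAC address, so that the same address turning up on several ports stands out.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   "... kernel: br0: received packet on eth0 with own address as
#    source address (addr:a4:ba:db:19:b2:c2, vlan:0)"
LOOP_LINE = re.compile(
    r"kernel: (?P<bridge>\S+): received packet on (?P<port>\S+) "
    r"with own address as source address \(addr:(?P<mac>[0-9a-f:]{17})"
)


def count_loop_warnings(log_text):
    """Count loop warnings, keyed by (bridge, port, MAC address)."""
    hits = Counter()
    for line in log_text.splitlines():
        match = LOOP_LINE.search(line)
        if match:
            hits[(match["bridge"], match["port"], match["mac"])] += 1
    return hits


def ports_seen(hits):
    """Return the sorted set of ports that produced at least one warning."""
    return sorted({port for _bridge, port, _mac in hits})
```

Feeding it the contents of a saved syslog and printing `ports_seen(...)` gives a quick indication of how many ports are involved; an empty result only means this particular warning was not logged.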

September 13, 2025

ok thanks. I removed 3 out of 4 ethernet cables, let's see what happens.

September 18, 2025

So the server has been up for the past 6 days without interruption. Thank you for your help.

September 23, 2025

On 9/6/2025 at 2:44 PM, late4473 said:
Change all drive references (docker container persistent drives, docker.img, libvirt.img and all VM storage paths) from /mnt/user/... to /mnt/cache/... These are all cache only drives anyway and this takes FUSE out of the equation.
Will I make 2 weeks of up-time??

Did you make it to 2 weeks?

I'm happy to report that making those changes to my 7.1.4 system has given me 16+ days of uptime! Haven't had this in months.

September 23, 2025

Author

Yep - solid as a rock - like you I haven't seen that for months! I'm pretty sure it was due to taking FUSE out of the equation rather than the upgrade (my conclusion being it couldn't handle the throughput)....glad you're in the same boat. I was ready to chuck unRAID into the bin!
