[6.12.4] Server hangs once a day since updating to 6.12.4

David Grenon · February 14

Hi, I'm having the same issue here with my new install on 6.12.6, which I at some point downgraded (manually) to 6.12.4.

This is a new server from maybe 1 month ago and it WAS NOT doing this at first, for like 1-2 weeks.

Then issues began:

- The server will freeze every 1-2 days

- No log when it freezes

- I was there once when it happened (connected to the UI, watching the dashboard): the CPU goes up and then nothing works

- On the hardware monitor I still see the prompt (login: ). I can type the username and press enter, then nothing...

- To reboot, I had luck 1-2 times by pressing the power button once: the server would go into shutdown mode (I could see it on the CLI monitor), then force the shutdown after 90 seconds. I have since tested giving it plenty of time and it doesn't work; I have to hold the power button for the server to power off. Now most of the time I see nothing on the screen when I press the power button.

I added syslog on the boot drive and the only thing I see is a black hole during the lockups (see attached)
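For anyone who wants to locate such gaps without scrolling through the whole file, here is a minimal Python sketch. It assumes classic syslog lines that start with a timestamp like "Feb 19 02:03:42"; the function name find_gaps, the 30-minute threshold and the year default are illustrative choices, not anything Unraid provides.

```python
from datetime import datetime, timedelta

def find_gaps(lines, min_gap=timedelta(minutes=30), year=2024):
    """Return (last_seen, next_seen) timestamp pairs separated by more than min_gap."""
    gaps = []
    previous = None
    for line in lines:
        try:
            # Classic syslog lines start with e.g. "Feb 19 02:03:42" (15 characters).
            # The year is not logged, so it is supplied by the caller (assumption).
            stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # skip lines without a leading timestamp
        if previous is not None and stamp - previous > min_gap:
            gaps.append((previous, stamp))
        previous = stamp
    return gaps
```

Feeding it the lines of the saved syslog file and printing each returned pair gives the last entry before a silence and the first one after it. A long gap is only a hint: a quiet server can also log nothing for a while, so the threshold needs adjusting to how chatty the log normally is.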

I did multiple diags but here's the latest one (see attached)

I also installed the Netdata console to see if I could get logs or graphs of what is actually happening right before the crash, and I got some data out of it!
This morning the issue seems to have happened around 5:40 AM, and at that time all kinds of things seem to happen in terms of RAM utilization and CPU usage. See attached!

Please help us. Having to force a shutdown every 1-2 days is not good for the system, devices and hard drives.

I might try 6.12.7rc2, but if it doesn't work I might go back to 6.11.5, where my last server didn't have any issues.

tower-diagnostics-20240214-0727.zip syslog (1) Netdata Graphs of crash Feb14 5.40AM.7z

trurl · February 14

Have you done memtest?

David Grenon · February 14

29 minutes ago, trurl said:

Have you done memtest?

No I didn't.

I assume it is good because it's my old gaming PC and that thing ran flawlessly.

Oh, the other thing I forgot to mention is my hardware listing:

[8086:191f]   00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Host Bridge/DRAM Registers (rev 07)
[8086:1901]   00:01.0 PCI bridge: Intel Corporation 6th-10th Gen Core Processor PCIe Controller (x16) (rev 07)
[8086:1912]   00:02.0 VGA compatible controller: Intel Corporation HD Graphics 530 (rev 06)
[8086:a12f]   00:14.0 USB controller: Intel Corporation 100 Series/C230 Series Chipset Family USB 3.0 xHCI Controller (rev 31)
[8086:a13a]   00:16.0 Communication controller: Intel Corporation 100 Series/C230 Series Chipset Family MEI Controller #1 (rev 31)
[8086:a102]   00:17.0 SATA controller: Intel Corporation Q170/Q150/B150/H170/H110/Z170/CM236 Chipset SATA Controller [AHCI Mode] (rev 31)
[8086:a167]   00:1b.0 PCI bridge: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #17 (rev f1)
[8086:a16a]   00:1b.3 PCI bridge: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #20 (rev f1)
[8086:a110]   00:1c.0 PCI bridge: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #1 (rev f1)
[8086:a118]   00:1d.0 PCI bridge: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #9 (rev f1)
[8086:a145]   00:1f.0 ISA bridge: Intel Corporation Z170 Chipset LPC/eSPI Controller (rev 31)
[8086:a121]   00:1f.2 Memory controller: Intel Corporation 100 Series/C230 Series Chipset Family Power Management Controller (rev 31)
[8086:a170]   00:1f.3 Audio device: Intel Corporation 100 Series/C230 Series Chipset Family HD Audio Controller (rev 31)
[8086:a123]   00:1f.4 SMBus: Intel Corporation 100 Series/C230 Series Chipset Family SMBus (rev 31)
[8086:15b8]   00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (2) I219-V (rev 31)
[1000:0072]   01:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)
[1b21:0612]   03:00.0 SATA controller: ASMedia Technology Inc. ASM1062 Serial ATA Controller (rev 02)
[1b21:1242]   04:00.0 USB controller: ASMedia Technology Inc. ASM1142 USB 3.1 Host Controller
[144d:a802]   05:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM951/PM951 (rev 01)

- I disabled C-State in BIOS

- I disabled XMP

- I reset the overclocking to Default

- All my dockers are in Docker custom network type: ipvlan

I could run a mem test, but I doubt it's this.
Do you think the network card bug applies to me ?

trurl · February 14

3 hours ago, David Grenon said:

other thing I forgot to mention is my hardware listing

We can see that in your diagnostics in system/lspci.txt

3 hours ago, David Grenon said:

could run a mem test, but I doubt

If only to eliminate that. Better safe than sorry.

3 hours ago, David Grenon said:

Do you think the network card bug applies to me ?

Which bug are you referring to?

David Grenon · February 14

10 minutes ago, trurl said:

Which bug are you referring to?

Realtek network card bug

trurl · February 14

48 minutes ago, David Grenon said:

Realtek network card bug

You don't have one according to your diagnostics and the list you posted above.

bastl · February 14

Small update from my side. As long as I close any active session to Unraid's web-ui from my main desktop, the server won't freeze. If I actively manage something on the server, no freezes. It only happens when I'm logged in on the web-ui from my Win10 PC and the PC isn't really in use. But even when idle, it happens randomly only every 2-3 days. I'm still not sure how to fix this. 😒

David Grenon · February 14

19 minutes ago, bastl said:

Small update from my side. As long as I close any active session to Unraid's web-ui from my main desktop, the server won't freeze. If I actively manage something on the server, no freezes. It only happens when I'm logged in on the web-ui from my Win10 PC and the PC isn't really in use. But even when idle, it happens randomly only every 2-3 days. I'm still not sure how to fix this. 😒

It's interesting that you mention this, because twice the issue happened while I was away (not at home). The server was obviously down all day because I couldn't hard reboot it manually, and both times, when I came back home from work, the server came back up at the exact time I got back!

The second time I even turned off my phone before coming home just to validate this, but there might be something else that triggered the server to come back up, like maybe my work laptop in my backpack, which is sometimes not asleep. It could have the webui open in cache, or a Google Remote Desktop tab on my VM (different host) with the Unraid webui open.

I basically open my Unraid UI on multiple devices and that might have something to do with it...

How to troubleshoot this though... No logs...

To be clear, both times I came back home and the server started working again like nothing happened. Plex and everything were up!

trurl · February 14

1 minute ago, David Grenon said:

How to troubleshoot this though... No logs...

setup syslog server

David Grenon · February 14

3 minutes ago, trurl said:

setup syslog server

It is already configured. You have all the logs since I set them up on my flash drive. But as you can see, there's no log entry when the freeze begins. The only thing I can check is the history from Netdata, which I installed on my server and which clearly shows CPU/RAM activity right before the issue happens.

Are you suggesting that putting logs elsewhere (not on boot drive) would give me more info?

thanks

trurl · February 14

4 minutes ago, David Grenon said:

putting logs elsewhere (not on boot drive) would give me more info?

no

David Grenon · February 16

On 2/14/2024 at 1:18 PM, trurl said:

If only to eliminate that. Better safe than sorry.

Voila!

What's next ?

I think I'm gonna try the new release... nothing to lose

David Grenon · February 16

Oh and btw I had the issue today too (didn't have it yesterday). I tried closing all browsers and reopening them on all my devices, and nothing came back up like in the earlier post. Maybe those 2 times where I came back home and everything came back up at the same time like magic were just lucky/unlucky, idk.

David Grenon · February 16

Not sure if it's related to the update, but I installed the new update yesterday (6.12.8? or 7RC2) and it already crashed. I might check tonight when it actually happened (in case it's related to something running inside my containers that triggers after X hours or at a specific time and crashes the whole thing...), but if I don't see anything like a pattern I'm straight up reverting to 6.11.5, which was not causing issues with my old server (I had the issue on my old server with 6.12.x+).

I'm a little worried about hard rebooting my NAS every freaking day, with no parity check because 22TB takes longer than the time until the next crash. I don't feel well these days...

At this point, if 6.11.5 on my new server does the same thing I might check with all containers stopped...

If this doesn't work... I think I might consider another OS until 6.13.x. This is ridiculous. A NAS is supposed to be stable. And from what I see (and for different reasons) multiple users have had issues with 6.12.x.

Anything to help me ?

JorgeB · February 16

6 minutes ago, David Grenon said:

Anything to help me ?

If there's nothing relevant logged, one thing you can try is to boot the server in safe mode with all docker containers/VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning on the other services one by one.

David Grenon · February 17

19 hours ago, JorgeB said:

If there's nothing relevant logged, one thing you can try is to boot the server in safe mode with all docker containers/VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning on the other services one by one.

Yeah, also my colleague suggested that I could export process usage every 15 minutes, to see which process is active during the 1-hour period when CPU and RAM ramp up, and maybe pinpoint the issue.
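A rough sketch of that idea in Python, assuming Python 3 is available on the server (it may need to be installed separately) and that the procps ps command is present. The function names and the log path argument are hypothetical; the script would be run on a schedule, for example every 15 minutes from cron.

```python
import subprocess
from datetime import datetime

def top_processes(ps_output, n=5):
    """Parse `ps -eo pid,pcpu,pmem,comm` text and return the n rows with the highest CPU."""
    rows = []
    for line in ps_output.strip().splitlines()[1:]:  # first line is the header
        parts = line.strip().split(None, 3)
        if len(parts) < 4:
            continue  # ignore malformed lines
        pid, cpu, mem, comm = parts
        rows.append((int(pid), float(cpu), float(mem), comm.strip()))
    return sorted(rows, key=lambda row: row[1], reverse=True)[:n]

def snapshot(log_path, n=5):
    """Append one timestamped line per top process to log_path (hypothetical path)."""
    output = subprocess.run(
        ["ps", "-eo", "pid,pcpu,pmem,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    stamp = datetime.now().isoformat(timespec="seconds")
    with open(log_path, "a") as handle:
        for pid, cpu, mem, comm in top_processes(output, n):
            handle.write(f"{stamp} pid={pid} cpu={cpu} mem={mem} {comm}\n")
```

Calling snapshot with a path on storage that survives a hard reset keeps the last entries readable after the next freeze. Note that ps reports CPU averaged over each process lifetime, so a sudden spike may show up more clearly in Netdata than here.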

I also updated to 6.12.8 and rebooted this morning because the server hung again.

If it crashes today I'll try to run without the Docker and VM services enabled and see...

Thanks for the reply.

David Grenon · February 19

Didn't get the bug until this morning...

One thing to mention is that I opened the WebGUI on a laptop yesterday afternoon to do things (opening the containers' web UIs).

That laptop went to sleep.

I also had the WebGUI open on my desktop, but for the whole weekend.

Netdata is able to show me these logs (see attachment).
(That's what happens before each crash.)

When it crashes I think I'll stop everything but the Plex docker to see, and if it happens again I'll disable everything. If it happens again... at that point I'll go to 6.11.5.

I guess it's a container issue, but even if it is... a Docker glitch/misconfig or w/e shouldn't lock the whole system to the point that we need to hard reset. No ?

JorgeB · February 19

5 minutes ago, David Grenon said:

I guess it's a container issue, but even if it is... a Docker glitch/misconfig or w/e shouldn't lock the whole system to the point that we need to hard reset. No ?

It shouldn't, but it's been known to happen.

David Grenon · February 19

I also sometimes have this in syslogs-Previous:

Feb 19 02:03:42 Tower kernel: PMS LoudnessCmd[26943]: segfault at 0 ip 000014f36c8db080 sp 000014f36738a0c8 error 4 in libswresample.so.4[14f36c8d3000+18000] likely on CPU 7 (core 3, socket 0)
Feb 19 02:03:42 Tower kernel: Code: 01 cf 4c 39 c7 72 e3 c3 cc cc 8d 04 49 48 98 4d 89 c1 49 29 c1 48 63 c2 48 63 c9 49 39 f9 76 75 f2 0f 10 05 02 05 ff ff 66 90 <0f> bf 16 0f 57 c9 f2 0f 2a ca f2 0f 59 c8 f2 0f 11 0f 0f bf 14 06
Feb 19 02:04:07 Tower kernel: PMS LoudnessCmd[27486]: segfault at 0 ip 0000148a53886fc3 sp 0000148a4e1380c8 error 4 in libswresample.so.4[148a53885000+18000] likely on CPU 4 (core 0, socket 0)
Feb 19 02:04:07 Tower kernel: Code: 0f 00 00 00 0f 85 73 ff ff ff 48 f7 c6 0f 00 00 00 0f 85 66 ff ff ff 48 8d 34 56 48 8d 3c 97 48 f7 da 66 0f 6f 2d 7d 64 ff ff <66> 0f 6f 04 56 66 0f 6f 4c 56 10 66 0f ef d2 66 0f ef db 66 0f 61

I've read somewhere else that I shouldn't worry about it, but should I?

These 2 entries are the only ones showing before the crash...
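To check how often these entries occur and which process and library they involve, a small Python sketch along these lines could tally them from a saved syslog. The regular expression is written against the format of the kernel lines quoted above and is an assumption; other kernels may word the message differently.

```python
import re
from collections import Counter

# Matches kernel lines such as:
# "... kernel: PMS LoudnessCmd[26943]: segfault at 0 ip ... error 4 in libswresample.so.4[...]"
SEGFAULT_RE = re.compile(
    r"kernel: (?P<proc>.+?)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r".*? error (?P<err>\d+) in (?P<lib>[^\[\s]+)\["
)

def summarize_segfaults(lines):
    """Count segfault entries per (process name, library) pair."""
    counts = Counter()
    for line in lines:
        match = SEGFAULT_RE.search(line)
        if match:
            counts[(match.group("proc"), match.group("lib"))] += 1
    return counts
```

If every hit points at the same process and library, as in the two entries above, that at least narrows the question down to one component rather than the whole system.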

JorgeB · February 19

Those I see in a lot of diags, so I assume they are harmless.

David Grenon · February 19

1 hour ago, JorgeB said:

It shouldn't, but it's been known to happen.

How can I prevent this? Is there a way to specifically tell Unraid something like:

Hey, keep at least 2-4 GB of RAM / 2 CPU cores UNUSED by anything but the system itself, so that Unraid doesn't freeze and I can troubleshoot when everything hangs?

Or what can I do on my side to help you understand which docker (I don't run any VMs) or plugin would be misconfigured? Screenshots of every docker config?

would that help you identify ?
Thank you,

JorgeB · February 19

3 minutes ago, David Grenon said:

would that help you identify ?

You can start the containers one at a time and retest to see if you find the culprit.

Terebi · February 19

I had a very similar issue, and even though my memtest for 8 hours passed, it was still bad/misconfigured ram.

I took out one of the ram dimms, and it completely resolved my problem

David Grenon · February 19

41 minutes ago, Terebi said:

I had a very similar issue, and even though my memtest for 8 hours passed, it was still bad/misconfigured ram.

I took out one of the ram dimms, and it completely resolved my problem

I understand this and it might be worth a try.

The thing is, I had this specific issue with my old server using other RAM sticks, and now that (old) server is running flawlessly with 6.11.5.

As I also said earlier, this was my old gaming PC and I never had any issues with it.

David Grenon · February 19

1 hour ago, JorgeB said:

You can start the containers one at a time and retest to see if you find the culprit.

If it crashes again, I'll start only Plex.

Are plugins something to be worried about?

What should I do if I only want to start the Plex docker, and nothing else?
How can I make sure that nothing else is running, even in plugins?

I actually have these running; forget about Netdata because it was installed after the issue.
