  • [6.12.3] Unknown server hangs (must be physically shut down or restarted manually)


    a632079
    • Urgent

    Hi, I woke up this morning and found that my Unraid NAS was not responding to any requests, including SSH, SMB, and the WebUI.

     

    Below is the system log I managed to rescue (the log is mirrored to the flash drive). The NAS has a monitor and keyboard plugged in; at this point the monitor shows no signal, the keyboard lights are on, but pressing CTRL+ALT+DELETE does not trigger a restart.

     

    Aug  3 01:57:28 OuO avahi-daemon[15911]: Withdrawing address record for 2409:8a20:5214:4c90::4b8 on br0.
    Aug  3 01:57:31 OuO ntpd[6698]: Deleting interface #3 br0, 2409:8a20:5214:4c90::4b8#123, interface stats: received=0, sent=0, dropped=0, active_time=28179 secs
    Aug  3 02:01:02 OuO rpc.statd[1522]: Version 2.6.2 starting
    Aug  3 02:01:02 OuO sm-notify[1523]: Version 2.6.2 starting
    Aug  3 02:01:02 OuO sm-notify[1523]: Already notifying clients; Exiting!
    Aug  3 02:01:02 OuO rpc.mountd[6575]: Caught signal 15, un-registering and exiting.
    Aug  3 02:01:03 OuO kernel: nfsd: last server has exited, flushing export cache
    Aug  3 02:01:05 OuO kernel: NFSD: Using UMH upcall client tracking operations.
    Aug  3 02:01:05 OuO kernel: NFSD: starting 90-second grace period (net f0000000)
    Aug  3 02:01:05 OuO rpc.mountd[1679]: Version 2.6.2 starting
    Aug  3 02:01:05 OuO ntpd[6698]: ntpd exiting on signal 1 (Hangup)
    Aug  3 02:01:05 OuO ntpd[6698]: 127.127.1.0 local addr 127.0.0.1 -> <null>
    Aug  3 02:01:05 OuO ntpd[6698]: 106.55.184.199 local addr 192.168.63.190 -> <null>
    Aug  3 02:01:05 OuO ntpd[6698]: 203.107.6.88 local addr 192.168.63.190 -> <null>
    Aug  3 02:01:05 OuO ntpd[6698]: 40.119.6.228 local addr 192.168.63.190 -> <null>
    Aug  3 02:01:05 OuO ntpd[6698]: 216.239.35.12 local addr 192.168.63.190 -> <null>
    Aug  3 02:01:05 OuO ntpd[1798]: ntpd [email protected] Tue Jun  6 17:07:37 UTC 2023 (1): Starting
    Aug  3 02:01:05 OuO ntpd[1798]: Command line: /usr/sbin/ntpd -g -u ntp:ntp
    Aug  3 02:01:05 OuO ntpd[1798]: ----------------------------------------------------
    Aug  3 02:01:05 OuO ntpd[1798]: ntp-4 is maintained by Network Time Foundation,
    Aug  3 02:01:05 OuO ntpd[1798]: Inc. (NTF), a non-profit 501(c)(3) public-benefit
    Aug  3 02:01:05 OuO ntpd[1798]: corporation.  Support and training for ntp-4 are
    Aug  3 02:01:05 OuO ntpd[1798]: available at https://www.nwtime.org/support
    Aug  3 02:01:05 OuO ntpd[1798]: ----------------------------------------------------
    Aug  3 02:01:05 OuO ntpd[1798]: DEBUG behavior is enabled - a violation of any
    Aug  3 02:01:05 OuO ntpd[1798]: diagnostic assertion will cause ntpd to abort
    Aug  3 02:01:05 OuO ntpd[1800]: proto: precision = 0.065 usec (-24)
    Aug  3 02:01:05 OuO ntpd[1800]: basedate set to 2023-05-25
    Aug  3 02:01:05 OuO ntpd[1800]: gps base set to 2023-05-28 (week 2264)
    Aug  3 02:01:05 OuO ntpd[1800]: initial drift restored to 21.963000
    Aug  3 02:01:05 OuO ntpd[1800]: Listen normally on 0 lo 127.0.0.1:123
    Aug  3 02:01:05 OuO ntpd[1800]: Listen normally on 1 br0 192.168.63.190:123
    Aug  3 02:01:05 OuO ntpd[1800]: Listen normally on 2 lo [::1]:123
    Aug  3 02:01:05 OuO ntpd[1800]: Listen normally on 3 br0 [2409:8a20:5214:4c90:2e0:4cff:fee2:3a]:123
    Aug  3 02:01:05 OuO ntpd[1800]: Listening on routing socket on fd #20 for interface updates
    Aug  3 02:01:05 OuO ntpd[1800]: kernel reports TIME_ERROR: 0x2041: Clock Unsynchronized
    Aug  3 02:01:05 OuO ntpd[1800]: kernel reports TIME_ERROR: 0x2041: Clock Unsynchronized
    Aug  3 02:01:06 OuO rpc.mountd[1679]: v4.2 client attached: 0x6de53e6564ca99e0 from "192.168.63.201:720"
    Aug  3 02:01:09 OuO sshd[7103]: Received signal 15; terminating.
    Aug  3 02:01:09 OuO sshd[2125]: Server listening on 2409:8a20:5214:4c90:2e0:4cff:fee2:3a port 22.
    Aug  3 02:01:09 OuO sshd[2125]: Server listening on 192.168.63.190 port 22.
    Aug  3 02:01:09 OuO winbindd[6023]: [2023/08/03 02:01:09.840967,  0] ../../source3/winbindd/winbindd_dual.c:1957(winbindd_sig_term_handler)
    Aug  3 02:01:09 OuO winbindd[6023]:   Got sig[15] terminate (is_parent=0)
    Aug  3 02:01:09 OuO wsdd2[15828]: 'Terminated' signal received.
    Aug  3 02:01:09 OuO winbindd[15834]: [2023/08/03 02:01:09.842757,  0] ../../source3/winbindd/winbindd_dual.c:1957(winbindd_sig_term_handler)
    Aug  3 02:01:09 OuO winbindd[15834]:   Got sig[15] terminate (is_parent=0)
    Aug  3 02:01:09 OuO winbindd[15831]: [2023/08/03 02:01:09.842772,  0] ../../source3/winbindd/winbindd_dual.c:1957(winbindd_sig_term_handler)
    Aug  3 02:01:09 OuO winbindd[15831]:   Got sig[15] terminate (is_parent=1)
    Aug  3 02:01:09 OuO wsdd2[15828]: terminating.
    Aug  3 02:01:09 OuO smbd[2277]: [2023/08/03 02:01:09.938190,  0] ../../source3/smbd/server.c:1741(main)
    Aug  3 02:01:09 OuO smbd[2277]:   smbd version 4.17.7 started.
    Aug  3 02:01:09 OuO smbd[2277]:   Copyright Andrew Tridgell and the Samba Team 1992-2022
    Aug  3 02:01:10 OuO wsdd2[2291]: starting.
    Aug  3 02:01:10 OuO winbindd[2292]: [2023/08/03 02:01:10.068837,  0] ../../source3/winbindd/winbindd.c:1440(main)
    Aug  3 02:01:10 OuO winbindd[2292]:   winbindd version 4.17.7 started.
    Aug  3 02:01:10 OuO winbindd[2292]:   Copyright Andrew Tridgell and the Samba Team 1992-2022
    Aug  3 02:01:10 OuO winbindd[2294]: [2023/08/03 02:01:10.075578,  0] ../../source3/winbindd/winbindd_cache.c:3116(initialize_winbindd_cache)
    Aug  3 02:01:10 OuO winbindd[2294]:   initialize_winbindd_cache: clearing cache and re-creating with version number 2

     

    In addition, this machine also ran into the problem of nginx not responding yesterday (an old problem). Below is the log from when nginx hit that problem; maybe it will help with this one? (Only part of the log is excerpted.)

     

    Aug  2 13:26:57 OuO kernel: nginx[6095]: segfault at 7f8 ip 00000000004e6982 sp 00007ffdfeaee000 error 6 in nginx[424000+110000] likely on CPU 13 (core 3, socket 0)
    Aug  2 13:26:57 OuO kernel: Code: 31 c0 e8 31 12 f4 ff e9 6f ff ff ff 48 8b 7c 24 08 e8 d2 62 f4 ff 49 89 c6 48 85 c0 0f 88 a4 00 00 00 4c 89 ef e8 de 54 02 00 <4c> 89 70 08 e9 4d fc ff ff be f3 01 00 00 e8 cb 28 f8 ff 48 c7 c0
    Aug  2 13:26:57 OuO emhttpd: error: publish, 172: Connection reset by peer (104): read
    Aug  2 13:26:57 OuO nginx: 2023/08/02 13:26:57 [alert] 6811#6811: worker process 6095 exited on signal 11
    Aug  2 13:26:57 OuO kernel: nginx[6099]: segfault at 7f8 ip 00000000004e6982 sp 00007ffdfeaee000 error 6 in nginx[424000+110000] likely on CPU 5 (core 8, socket 0)
    Aug  2 13:26:57 OuO kernel: Code: 31 c0 e8 31 12 f4 ff e9 6f ff ff ff 48 8b 7c 24 08 e8 d2 62 f4 ff 49 89 c6 48 85 c0 0f 88 a4 00 00 00 4c 89 ef e8 de 54 02 00 <4c> 89 70 08 e9 4d fc ff ff be f3 01 00 00 e8 cb 28 f8 ff 48 c7 c0
    Aug  2 13:26:57 OuO nginx: 2023/08/02 13:26:57 [alert] 6811#6811: worker process 6099 exited on signal 11
    Aug  2 13:26:58 OuO kernel: nginx[6100]: segfault at 7f8 ip 00000000004e6982 sp 00007ffdfeaee000 error 6 in nginx[424000+110000] likely on CPU 1 (core 1, socket 0)
    Aug  2 13:26:58 OuO kernel: Code: 31 c0 e8 31 12 f4 ff e9 6f ff ff ff 48 8b 7c 24 08 e8 d2 62 f4 ff 49 89 c6 48 85 c0 0f 88 a4 00 00 00 4c 89 ef e8 de 54 02 00 <4c> 89 70 08 e9 4d fc ff ff be f3 01 00 00 e8 cb 28 f8 ff 48 c7 c0
    Aug  2 13:26:58 OuO nginx: 2023/08/02 13:26:58 [alert] 6811#6811: worker process 6100 exited on signal 11
    Aug  2 13:26:58 OuO kernel: nginx[6115]: segfault at 7f8 ip 00000000004e6982 sp 00007ffdfeaee000 error 6 in nginx[424000+110000] likely on CPU 13 (core 3, socket 0)
    Aug  2 13:26:58 OuO kernel: Code: 31 c0 e8 31 12 f4 ff e9 6f ff ff ff 48 8b 7c 24 08 e8 d2 62 f4 ff 49 89 c6 48 85 c0 0f 88 a4 00 00 00 4c 89 ef e8 de 54 02 00 <4c> 89 70 08 e9 4d fc ff ff be f3 01 00 00 e8 cb 28 f8 ff 48 c7 c0
    Aug  2 13:26:58 OuO nginx: 2023/08/02 13:26:58 [alert] 6811#6811: worker process 6115 exited on signal 11
    Aug  2 13:26:58 OuO kernel: nginx[6116]: segfault at 7f8 ip 00000000004e6982 sp 00007ffdfeaee000 error 6 in nginx[424000+110000] likely on CPU 2 (core 2, socket 0)
    Aug  2 13:26:58 OuO kernel: Code: 31 c0 e8 31 12 f4 ff e9 6f ff ff ff 48 8b 7c 24 08 e8 d2 62 f4 ff 49 89 c6 48 85 c0 0f 88 a4 00 00 00 4c 89 ef e8 de 54 02 00 <4c> 89 70 08 e9 4d fc ff ff be f3 01 00 00 e8 cb 28 f8 ff 48 c7 c0
    Aug  2 13:26:58 OuO nginx: 2023/08/02 13:26:58 [alert] 6811#6811: worker process 6116 exited on signal 11
    Aug  2 13:26:58 OuO kernel: nginx[6120]: segfault at 7f8 ip 00000000004e6982 sp 00007ffdfeaee000 error 6 in nginx[424000+110000] likely on CPU 16 (core 9, socket 0)
    Aug  2 13:26:58 OuO kernel: Code: 31 c0 e8 31 12 f4 ff e9 6f ff ff ff 48 8b 7c 24 08 e8 d2 62 f4 ff 49 89 c6 48 85 c0 0f 88 a4 00 00 00 4c 89 ef e8 de 54 02 00 <4c> 89 70 08 e9 4d fc ff ff be f3 01 00 00 e8 cb 28 f8 ff 48 c7 c0
    Aug  2 13:26:58 OuO emhttpd: error: publish, 172: Connection reset by peer (104): read
    Aug  2 13:26:58 OuO nginx: 2023/08/02 13:26:58 [alert] 6811#6811: worker process 6120 exited on signal 11
    Aug  2 13:26:58 OuO kernel: nginx[6124]: segfault at 7f8 ip 00000000004e6982 sp 00007ffdfeaee000 error 6 in nginx[424000+110000] likely on CPU 16 (core 9, socket 0)
    Aug  2 13:26:58 OuO kernel: Code: 31 c0 e8 31 12 f4 ff e9 6f ff ff ff 48 8b 7c 24 08 e8 d2 62 f4 ff 49 89 c6 48 85 c0 0f 88 a4 00 00 00 4c 89 ef e8 de 54 02 00 <4c> 89 70 08 e9 4d fc ff ff be f3 01 00 00 e8 cb 28 f8 ff 48 c7 c0
    Aug  2 13:26:58 OuO nginx: 2023/08/02 13:26:58 [alert] 6811#6811: worker process 6124 exited on signal 11
    Aug  2 13:26:58 OuO kernel: nginx[6171]: segfault at 7f8 ip 00000000004e6982 sp 00007ffdfeaee000 error 6 in nginx[424000+110000] likely on CPU 6 (core 9, socket 0)
    Aug  2 13:26:58 OuO kernel: Code: 31 c0 e8 31 12 f4 ff e9 6f ff ff ff 48 8b 7c 24 08 e8 d2 62 f4 ff 49 89 c6 48 85 c0 0f 88 a4 00 00 00 4c 89 ef e8 de 54 02 00 <4c> 89 70 08 e9 4d fc ff ff be f3 01 00 00 e8 cb 28 f8 ff 48 c7 c0
    Aug  2 13:26:58 OuO nginx: 2023/08/02 13:26:58 [alert] 6811#6811: worker process 6171 exited on signal 11
    Aug  2 13:26:59 OuO kernel: nginx[6383]: segfault at 7f8 ip 00000000004e6982 sp 00007ffdfeaee000 error 6 in nginx[424000+110000] likely on CPU 2 (core 2, socket 0)
    Aug  2 13:26:59 OuO kernel: Code: 31 c0 e8 31 12 f4 ff e9 6f ff ff ff 48 8b 7c 24 08 e8 d2 62 f4 ff 49 89 c6 48 85 c0 0f 88 a4 00 00 00 4c 89 ef e8 de 54 02 00 <4c> 89 70 08 e9 4d fc ff ff be f3 01 00 00 e8 cb 28 f8 ff 48 c7 c0
    Aug  2 13:26:59 OuO nginx: 2023/08/02 13:26:59 [alert] 6811#6811: worker process 6383 exited on signal 11
    Aug  2 13:26:59 OuO kernel: nginx[6429]: segfault at 7f8 ip 00000000004e6982 sp 00007ffdfeaee000 error 6 in nginx[424000+110000] likely on CPU 2 (core 2, socket 0)
    Aug  2 13:26:59 OuO kernel: Code: 31 c0 e8 31 12 f4 ff e9 6f ff ff ff 48 8b 7c 24 08 e8 d2 62 f4 ff 49 89 c6 48 85 c0 0f 88 a4 00 00 00 4c 89 ef e8 de 54 02 00 <4c> 89 70 08 e9 4d fc ff ff be f3 01 00 00 e8 cb 28 f8 ff 48 c7 c0
    Aug  2 13:26:59 OuO nginx: 2023/08/02 13:26:59 [alert] 6811#6811: worker process 6429 exited on signal 11
    Aug  2 13:26:59 OuO emhttpd: error: publish, 172: Connection reset by peer (104): read
    Aug  2 13:26:59 OuO nginx: 2023/08/02 13:26:59 [alert] 6811#6811: worker process 6433 exited on signal 11
    Aug  2 13:26:59 OuO nginx: 2023/08/02 13:26:59 [alert] 6811#6811: worker process 6437 exited on signal 11
    Aug  2 13:26:59 OuO emhttpd: error: publish, 172: Connection reset by peer (104): read
    Aug  2 13:26:59 OuO emhttpd: error: publish, 172: Connection reset by peer (104): read
    Aug  2 13:26:59 OuO nginx: 2023/08/02 13:26:59 [alert] 6811#6811: worker process 6438 exited on signal 11
    Aug  2 13:26:59 OuO nginx: 2023/08/02 13:26:59 [alert] 6811#6811: worker process 6439 exited on signal 11
    Aug  2 13:27:00 OuO nginx: 2023/08/02 13:27:00 [alert] 6811#6811: worker process 6441 exited on signal 11
    Aug  2 13:27:00 OuO nginx: 2023/08/02 13:27:00 [alert] 6811#6811: worker process 6443 exited on signal 11
    Aug  2 13:27:00 OuO emhttpd: error: publish, 172: Connection reset by peer (104): read
    Aug  2 13:27:00 OuO nginx: 2023/08/02 13:27:00 [alert] 6811#6811: worker process 6447 exited on signal 11
    Aug  2 13:27:00 OuO nginx: 2023/08/02 13:27:00 [alert] 6811#6811: worker process 6451 exited on signal 11
    Aug  2 13:27:00 OuO nginx: 2023/08/02 13:27:00 [alert] 6811#6811: worker process 6452 exited on signal 11
    Aug  2 13:27:00 OuO nginx: 2023/08/02 13:27:00 [alert] 6811#6811: worker process 6466 exited on signal 11
    Aug  2 13:27:01 OuO nginx: 2023/08/02 13:27:01 [alert] 6811#6811: worker process 6497 exited on signal 11
    Aug  2 13:27:01 OuO nginx: 2023/08/02 13:27:01 [alert] 6811#6811: worker process 6499 exited on signal 11
    Aug  2 13:27:01 OuO emhttpd: error: publish, 172: Connection reset by peer (104): read
    Aug  2 13:27:01 OuO nginx: 2023/08/02 13:27:01 [alert] 6811#6811: worker process 6503 exited on signal 11
    Aug  2 13:27:01 OuO nginx: 2023/08/02 13:27:01 [alert] 6811#6811: worker process 6535 exited on signal 11
    Aug  2 13:27:01 OuO nginx: 2023/08/02 13:27:01 [alert] 6811#6811: worker process 6536 exited on signal 11
    Aug  2 13:27:02 OuO kernel: show_signal_msg: 15 callbacks suppressed

     

    Before this, I ran into nchan's shared-memory exhaustion bug in nginx (an old problem; I had to restart nginx regularly). Frankly, if Unraid needs WebSockets, why not try Swoole (an event-driven, asynchronous, coroutine-based, high-performance concurrency library for PHP)?
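    For anyone chasing the same nchan errors shown below, here is a minimal read-only sketch for checking how the bundled nginx has nchan configured before deciding whether to raise nchan_max_reserved_memory; the /etc/nginx path is an assumption about the stock Unraid layout, so adjust it if yours differs:

    # Sketch: look for nchan directives in the bundled nginx config tree
    # (config path is an assumption; verify where your Unraid build keeps nginx.conf).
    grep -Rn "nchan" /etc/nginx/ 2>/dev/null
    # Look in particular for nchan_max_reserved_memory: the 507 errors in the log
    # below mean the /cpuload, /temperature, /disks and /shares channels exhausted
    # that shared-memory pool.
    nginx -V 2>&1 | tr ' ' '\n' | grep -i nchan    # confirm the nchan module is built in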

     

    Aug  2 01:21:10 OuO nginx: 2023/08/02 01:21:10 [crit] 32748#32748: ngx_slab_alloc() failed: no memory
    Aug  2 01:21:10 OuO nginx: 2023/08/02 01:21:10 [error] 32748#32748: shpool alloc failed
    Aug  2 01:21:10 OuO nginx: 2023/08/02 01:21:10 [error] 32748#32748: nchan: Out of shared memory while allocating channel /cpuload. Increase nchan_max_reserved_memory.
    Aug  2 01:21:10 OuO nginx: 2023/08/02 01:21:10 [error] 32748#32748: *254332 nchan: error publishing message (HTTP status code 507), client: unix:, server: , request: "POST /pub/cpuload?buffer_length=1 HTTP/1.1", host: "localhost"
    Aug  2 01:21:10 OuO nginx: 2023/08/02 01:21:10 [crit] 32748#32748: ngx_slab_alloc() failed: no memory
    Aug  2 01:21:10 OuO nginx: 2023/08/02 01:21:10 [error] 32748#32748: shpool alloc failed
    Aug  2 01:21:10 OuO nginx: 2023/08/02 01:21:10 [error] 32748#32748: nchan: Out of shared memory while allocating channel /temperature. Increase nchan_max_reserved_memory.
    Aug  2 01:21:10 OuO nginx: 2023/08/02 01:21:10 [alert] 32748#32748: *254333 header already sent while keepalive, client: 192.168.63.123, server: 192.168.63.190:80
    Aug  2 01:21:10 OuO kernel: nginx[32748]: segfault at 0 ip 0000000000000000 sp 00007ffdfeaee138 error 14 in nginx[400000+24000] likely on CPU 5 (core 8, socket 0)
    Aug  2 01:21:10 OuO kernel: Code: Unable to access opcode bytes at 0xffffffffffffffd6.
    Aug  2 01:21:10 OuO nginx: 2023/08/02 01:21:10 [alert] 6811#6811: worker process 32748 exited on signal 11
    Aug  2 01:21:11 OuO nginx: 2023/08/02 01:21:11 [crit] 348#348: ngx_slab_alloc() failed: no memory
    Aug  2 01:21:11 OuO nginx: 2023/08/02 01:21:11 [error] 348#348: shpool alloc failed
    Aug  2 01:21:11 OuO nginx: 2023/08/02 01:21:11 [error] 348#348: nchan: Out of shared memory while allocating channel m/
    Aug  2 01:21:11 OuO nginx: 2023/08/02 01:21:11 [alert] 348#348: *254335 header already sent while keepalive, client: 192.168.63.123, server: 192.168.63.190:80
    Aug  2 01:21:11 OuO kernel: nginx[348]: segfault at 0 ip 0000000000000000 sp 00007ffdfeaee138 error 14 in nginx[400000+24000] likely on CPU 8 (core 11, socket 0)
    Aug  2 01:21:11 OuO kernel: Code: Unable to access opcode bytes at 0xffffffffffffffd6.
    Aug  2 01:21:11 OuO nginx: 2023/08/02 01:21:11 [alert] 6811#6811: worker process 348 exited on signal 11
    Aug  2 01:21:11 OuO nginx: 2023/08/02 01:21:11 [crit] 350#350: ngx_slab_alloc() failed: no memory
    Aug  2 01:21:11 OuO nginx: 2023/08/02 01:21:11 [error] 350#350: shpool alloc failed
    Aug  2 01:21:11 OuO nginx: 2023/08/02 01:21:11 [error] 350#350: nchan: Out of shared memory while allocating channel /cpuload. Increase nchan_max_reserved_memory.
    Aug  2 01:21:11 OuO nginx: 2023/08/02 01:21:11 [error] 350#350: *254337 nchan: error publishing message (HTTP status code 507), client: unix:, server: , request: "POST /pub/cpuload?buffer_length=1 HTTP/1.1", host: "localhost"
    Aug  2 01:21:11 OuO nginx: 2023/08/02 01:21:11 [crit] 350#350: ngx_slab_alloc() failed: no memory
    Aug  2 01:21:11 OuO nginx: 2023/08/02 01:21:11 [error] 350#350: shpool alloc failed
    Aug  2 01:21:11 OuO nginx: 2023/08/02 01:21:11 [error] 350#350: nchan: Out of shared memory while allocating channel /temperature. Increase nchan_max_reserved_memory.
    Aug  2 01:21:11 OuO nginx: 2023/08/02 01:21:11 [alert] 350#350: *254338 header already sent while keepalive, client: 192.168.63.123, server: 192.168.63.190:80
    Aug  2 01:21:11 OuO kernel: nginx[350]: segfault at 0 ip 0000000000000000 sp 00007ffdfeaee138 error 14 in nginx[400000+24000] likely on CPU 18 (core 11, socket 0)
    Aug  2 01:21:11 OuO kernel: Code: Unable to access opcode bytes at 0xffffffffffffffd6.
    Aug  2 01:21:11 OuO nginx: 2023/08/02 01:21:11 [alert] 6811#6811: worker process 350 exited on signal 11
    Aug  2 01:21:12 OuO nginx: 2023/08/02 01:21:12 [crit] 354#354: ngx_slab_alloc() failed: no memory
    Aug  2 01:21:12 OuO nginx: 2023/08/02 01:21:12 [error] 354#354: shpool alloc failed
    Aug  2 01:21:12 OuO nginx: 2023/08/02 01:21:12 [error] 354#354: nchan: Out of shared memory while allocating channel m/
    Aug  2 01:21:12 OuO nginx: 2023/08/02 01:21:12 [alert] 354#354: *254340 header already sent while keepalive, client: 192.168.63.123, server: 192.168.63.190:80
    Aug  2 01:21:12 OuO kernel: nginx[354]: segfault at 0 ip 0000000000000000 sp 00007ffdfeaee138 error 14 in nginx[400000+24000] likely on CPU 7 (core 10, socket 0)
    Aug  2 01:21:12 OuO kernel: Code: Unable to access opcode bytes at 0xffffffffffffffd6.
    Aug  2 01:21:12 OuO nginx: 2023/08/02 01:21:12 [alert] 6811#6811: worker process 354 exited on signal 11
    Aug  2 01:21:12 OuO nginx: 2023/08/02 01:21:12 [crit] 355#355: ngx_slab_alloc() failed: no memory
    Aug  2 01:21:12 OuO nginx: 2023/08/02 01:21:12 [error] 355#355: shpool alloc failed
    Aug  2 01:21:12 OuO nginx: 2023/08/02 01:21:12 [error] 355#355: nchan: Out of shared memory while allocating channel /cpuload. Increase nchan_max_reserved_memory.
    Aug  2 01:21:12 OuO nginx: 2023/08/02 01:21:12 [error] 355#355: *254342 nchan: error publishing message (HTTP status code 507), client: unix:, server: , request: "POST /pub/cpuload?buffer_length=1 HTTP/1.1", host: "localhost"
    Aug  2 01:21:13 OuO nginx: 2023/08/02 01:21:13 [crit] 355#355: ngx_slab_alloc() failed: no memory
    Aug  2 01:21:13 OuO nginx: 2023/08/02 01:21:13 [error] 355#355: shpool alloc failed
    Aug  2 01:21:13 OuO nginx: 2023/08/02 01:21:13 [error] 355#355: nchan: Out of shared memory while allocating channel /cpuload. Increase nchan_max_reserved_memory.
    Aug  2 01:21:13 OuO nginx: 2023/08/02 01:21:13 [error] 355#355: *254343 nchan: error publishing message (HTTP status code 507), client: unix:, server: , request: "POST /pub/cpuload?buffer_length=1 HTTP/1.1", host: "localhost"
    Aug  2 01:21:13 OuO nginx: 2023/08/02 01:21:13 [crit] 355#355: ngx_slab_alloc() failed: no memory
    Aug  2 01:21:13 OuO nginx: 2023/08/02 01:21:13 [error] 355#355: shpool alloc failed
    Aug  2 01:21:13 OuO nginx: 2023/08/02 01:21:13 [error] 355#355: nchan: Out of shared memory while allocating channel /temperature. Increase nchan_max_reserved_memory.
    Aug  2 01:21:13 OuO nginx: 2023/08/02 01:21:13 [alert] 355#355: *254344 header already sent while keepalive, client: 192.168.63.123, server: 192.168.63.190:80
    Aug  2 01:21:13 OuO kernel: nginx[355]: segfault at 0 ip 0000000000000000 sp 00007ffdfeaee138 error 14 in nginx[400000+24000] likely on CPU 5 (core 8, socket 0)
    Aug  2 01:21:13 OuO kernel: Code: Unable to access opcode bytes at 0xffffffffffffffd6.
    Aug  2 01:21:13 OuO nginx: 2023/08/02 01:21:13 [alert] 6811#6811: worker process 355 exited on signal 11
    Aug  2 01:21:14 OuO nginx: 2023/08/02 01:21:14 [crit] 443#443: ngx_slab_alloc() failed: no memory
    Aug  2 01:21:14 OuO nginx: 2023/08/02 01:21:14 [error] 443#443: shpool alloc failed
    Aug  2 01:21:14 OuO nginx: 2023/08/02 01:21:14 [error] 443#443: nchan: Out of shared memory while allocating channel /disks. Increase nchan_max_reserved_memory.
    Aug  2 01:21:14 OuO nginx: 2023/08/02 01:21:14 [error] 443#443: *254346 nchan: error publishing message (HTTP status code 507), client: unix:, server: , request: "POST /pub/disks?buffer_length=1 HTTP/1.1", host: "localhost"
    Aug  2 01:21:14 OuO nginx: 2023/08/02 01:21:14 [crit] 443#443: ngx_slab_alloc() failed: no memory
    Aug  2 01:21:14 OuO nginx: 2023/08/02 01:21:14 [error] 443#443: shpool alloc failed
    Aug  2 01:21:14 OuO nginx: 2023/08/02 01:21:14 [error] 443#443: nchan: Out of shared memory while allocating channel /shares. Increase nchan_max_reserved_memory.
    Aug  2 01:21:14 OuO nginx: 2023/08/02 01:21:14 [error] 443#443: *254347 nchan: error publishing message (HTTP status code 507), client: unix:, server: , request: "POST /pub/shares?buffer_length=1 HTTP/1.1", host: "localhost"
    Aug  2 01:21:14 OuO nginx: 2023/08/02 01:21:14 [crit] 443#443: ngx_slab_alloc() failed: no memory
    Aug  2 01:21:14 OuO nginx: 2023/08/02 01:21:14 [error] 443#443: shpool alloc failed
    Aug  2 01:21:14 OuO nginx: 2023/08/02 01:21:14 [error] 443#443: nchan: Out of shared memory while allocating channel /cpuload. Increase nchan_max_reserved_memory.
    Aug  2 01:21:14 OuO nginx: 2023/08/02 01:21:14 [error] 443#443: *254348 nchan: error publishing message (HTTP status code 507), client: unix:, server: , request: "POST /pub/cpuload?buffer_length=1 HTTP/1.1", host: "localhost"
    Aug  2 01:21:15 OuO nginx: 2023/08/02 01:21:15 [crit] 443#443: ngx_slab_alloc() failed: no memory
    Aug  2 01:21:15 OuO nginx: 2023/08/02 01:21:15 [error] 443#443: shpool alloc failed
    Aug  2 01:21:15 OuO nginx: 2023/08/02 01:21:15 [error] 443#443: nchan: Out of shared memory while allocating channel /temperature. Increase nchan_max_reserved_memory.
    Aug  2 01:21:15 OuO nginx: 2023/08/02 01:21:15 [alert] 443#443: *254349 header already sent while keepalive, client: 192.168.63.123, server: 192.168.63.190:80
    Aug  2 01:21:15 OuO kernel: nginx[443]: segfault at 0 ip 0000000000000000 sp 00007ffdfeaee138 error 14 in nginx[400000+24000] likely on CPU 0 (core 0, socket 0)
    Aug  2 01:21:15 OuO kernel: Code: Unable to access opcode bytes at 0xffffffffffffffd6.
    Aug  2 01:21:15 OuO nginx: 2023/08/02 01:21:15 [alert] 6811#6811: worker process 443 exited on signal 11
    Aug  2 01:21:15 OuO nginx: 2023/08/02 01:21:15 [crit] 447#447: ngx_slab_alloc() failed: no memory
    Aug  2 01:21:15 OuO nginx: 2023/08/02 01:21:15 [error] 447#447: shpool alloc failed
    Aug  2 01:21:15 OuO nginx: 2023/08/02 01:21:15 [error] 447#447: nchan: Out of shared memory while allocating channel m/
    Aug  2 01:21:15 OuO nginx: 2023/08/02 01:21:15 [alert] 447#447: *254351 header already sent while keepalive, client: 192.168.63.123, server: 192.168.63.190:80
    Aug  2 01:21:15 OuO kernel: nginx[447]: segfault at 0 ip 0000000000000000 sp 00007ffdfeaee138 error 14 in nginx[400000+24000] likely on CPU 3 (core 3, socket 0)
    Aug  2 01:21:15 OuO kernel: Code: Unable to access opcode bytes at 0xffffffffffffffd6.
    Aug  2 01:21:15 OuO nginx: 2023/08/02 01:21:15 [alert] 6811#6811: worker process 447 exited on signal 11
    Aug  2 01:21:15 OuO nginx: 2023/08/02 01:21:15 [crit] 451#451: ngx_slab_alloc() failed: no memory
    Aug  2 01:21:15 OuO nginx: 2023/08/02 01:21:15 [error] 451#451: shpool alloc failed
    Aug  2 01:21:15 OuO nginx: 2023/08/02 01:21:15 [error] 451#451: nchan: Out of shared memory while allocating channel /cpuload. Increase nchan_max_reserved_memory.
    Aug  2 01:21:15 OuO nginx: 2023/08/02 01:21:15 [error] 451#451: *254353 nchan: error publishing message (HTTP status code 507), client: unix:, server: , request: "POST /pub/cpuload?buffer_length=1 HTTP/1.1", host: "localhost"
    Aug  2 01:21:16 OuO nginx: 2023/08/02 01:21:16 [crit] 451#451: ngx_slab_alloc() failed: no memory
    Aug  2 01:21:16 OuO nginx: 2023/08/02 01:21:16 [error] 451#451: shpool alloc failed
    Aug  2 01:21:16 OuO nginx: 2023/08/02 01:21:16 [error] 451#451: nchan: Out of shared memory while allocating channel m/
    Aug  2 01:21:16 OuO nginx: 2023/08/02 01:21:16 [alert] 451#451: *254354 header already sent while keepalive, client: 192.168.63.123, server: 192.168.63.190:80
    Aug  2 01:21:16 OuO kernel: nginx[451]: segfault at 0 ip 0000000000000000 sp 00007ffdfeaee138 error 14 in nginx[400000+24000] likely on CPU 6 (core 9, socket 0)
    Aug  2 01:21:16 OuO kernel: Code: Unable to access opcode bytes at 0xffffffffffffffd6.
    Aug  2 01:21:16 OuO nginx: 2023/08/02 01:21:16 [alert] 6811#6811: worker process 451 exited on signal 11
    Aug  2 01:21:16 OuO nginx: 2023/08/02 01:21:16 [crit] 452#452: ngx_slab_alloc() failed: no memory
    Aug  2 01:21:16 OuO nginx: 2023/08/02 01:21:16 [error] 452#452: shpool alloc failed
    Aug  2 01:21:16 OuO nginx: 2023/08/02 01:21:16 [error] 452#452: nchan: Out of shared memory while allocating channel /cpuload. Increase nchan_max_reserved_memory.
    Aug  2 01:21:16 OuO nginx: 2023/08/02 01:21:16 [error] 452#452: *254356 nchan: error publishing message (HTTP status code 507), client: unix:, server: , request: "POST /pub/cpuload?buffer_length=1 HTTP/1.1", host: "localhost"
    Aug  2 01:21:17 OuO nginx: 2023/08/02 01:21:17 [crit] 452#452: ngx_slab_alloc() failed: no memory
    Aug  2 01:21:17 OuO nginx: 2023/08/02 01:21:17 [error] 452#452: shpool alloc failed
    Aug  2 01:21:17 OuO nginx: 2023/08/02 01:21:17 [error] 452#452: nchan: Out of shared memory while allocating channel /cpuload. Increase nchan_max_reserved_memory.
    Aug  2 01:21:17 OuO nginx: 2023/08/02 01:21:17 [error] 452#452: *254357 nchan: error publishing message (HTTP status code 507), client: unix:, server: , request: "POST /pub/cpuload?buffer_length=1 HTTP/1.1", host: "localhost"
    Aug  2 01:21:17 OuO nginx: 2023/08/02 01:21:17 [crit] 452#452: ngx_slab_alloc() failed: no memory
    Aug  2 01:21:17 OuO nginx: 2023/08/02 01:21:17 [error] 452#452: shpool alloc failed
    Aug  2 01:21:17 OuO nginx: 2023/08/02 01:21:17 [error] 452#452: nchan: Out of shared memory while allocating channel /temperature. Increase nchan_max_reserved_memory.
    Aug  2 01:21:17 OuO nginx: 2023/08/02 01:21:17 [alert] 452#452: *254358 header already sent while keepalive, client: 192.168.63.123, server: 192.168.63.190:80
    Aug  2 01:21:17 OuO kernel: nginx[452]: segfault at 0 ip 0000000000000000 sp 00007ffdfeaee138 error 14 in nginx[400000+24000] likely on CPU 0 (core 0, socket 0)
    Aug  2 01:21:17 OuO kernel: Code: Unable to access opcode bytes at 0xffffffffffffffd6.
    Aug  2 01:21:17 OuO nginx: 2023/08/02 01:21:17 [alert] 6811#6811: worker process 452 exited on signal 11
    Aug  2 01:21:18 OuO nginx: 2023/08/02 01:21:18 [crit] 459#459: ngx_slab_alloc() failed: no memory
    Aug  2 01:21:18 OuO nginx: 2023/08/02 01:21:18 [error] 459#459: shpool alloc failed
    Aug  2 01:21:18 OuO nginx: 2023/08/02 01:21:18 [error] 459#459: nchan: Out of shared memory while allocating channel /disks. Increase nchan_max_reserved_memory.
    Aug  2 01:21:18 OuO nginx: 2023/08/02 01:21:18 [error] 459#459: *254360 nchan: error publishing message (HTTP status code 507), client: unix:, server: , request: "POST /pub/disks?buffer_length=1 HTTP/1.1", host: "localhost"
    Aug  2 01:21:18 OuO nginx: 2023/08/02 01:21:18 [crit] 459#459: ngx_slab_alloc() failed: no memory
    Aug  2 01:21:18 OuO nginx: 2023/08/02 01:21:18 [error] 459#459: shpool alloc failed
    Aug  2 01:21:18 OuO nginx: 2023/08/02 01:21:18 [error] 459#459: nchan: Out of shared memory while allocating channel m/
    Aug  2 01:21:18 OuO nginx: 2023/08/02 01:21:18 [alert] 459#459: *254361 header already sent while keepalive, client: 192.168.63.123, server: 192.168.63.190:80
    Aug  2 01:21:18 OuO kernel: nginx[459]: segfault at 0 ip 0000000000000000 sp 00007ffdfeaee138 error 14 in nginx[400000+24000] likely on CPU 8 (core 11, socket 0)
    Aug  2 01:21:18 OuO kernel: Code: Unable to access opcode bytes at 0xffffffffffffffd6.
    Aug  2 01:21:18 OuO nginx: 2023/08/02 01:21:18 [alert] 6811#6811: worker process 459 exited on signal 11
    Aug  2 01:21:18 OuO nginx: 2023/08/02 01:21:18 [crit] 463#463: ngx_slab_alloc() failed: no memory
    Aug  2 01:21:18 OuO nginx: 2023/08/02 01:21:18 [error] 463#463: shpool alloc failed
    Aug  2 01:21:18 OuO nginx: 2023/08/02 01:21:18 [error] 463#463: nchan: Out of shared memory while allocating channel /cpuload. Increase nchan_max_reserved_memory.

     




    User Feedback

    Recommended Comments



    I've tried 6.12, 6.12.1, 6.12.2 and had the same issues. Lockups, hangups, can't access the UI, etc. 

     

    There was one random release note that said to change the Docker settings from macvlan to ipvlan, but that had no effect; I still had the same issues. And if something like that needs to be changed for an update, it should be done automatically or have big flashing lights telling you to do it.

     

    Ultimately I kept going back to 6.11.5 and am staying there. The support I've got for 6.12 seems to be "Works fine for me" and that's about it. If you look around, a lot of people have issues with 6.12 and there doesn't seem to be any official acknowledgement that 6.12 just doesn't work for many people and not much help figuring out the cause. 

     

    With the issues I've been having I'm starting to regret moving from Synology to Unraid for my main NAS, and may go back.

     

    Link to comment

    I have the same problem; I'm going back to 6.11.5 as I cannot keep 6.12.3 running stable.

    In the beginning I was not able to keep the system running for more than 1 hour after array start. It did not matter whether uptime had been 5 hours or 20 hours: 1 hour after array start, a complete freeze. I removed plugins, redid Docker containers, and changed the Docker network from macvlan to ipvlan, and got it running, but not stable. The UI still had problems (freezing, not loading), but the system was running. I hope going back to 6.11.5 works for me.

    Link to comment

    Got the same issue here: the server, on the latest release, restarts or simply shuts down from time to time without me doing anything on it...

    Nothing much of relevance shows up in the Diagnostics / Syslog, but in case it helps I'm posting mine here too: fantuserver-diagnostics-20230812-1925.zip

     

    I had to roll back to 6.11.5, where I get no sudden restarts/crashes for no reason.

    Edited by Fantucie
    clarification
    Link to comment

    It happened to me a few days ago... (Unraid 6.12.3)
    There was a kind of reboot at around 19:30, as you can see in this extract of the log:

    Aug 10 19:10:01 hlf-data3 mergerfs[24560]: running basic garbage collection
    Aug 10 19:10:01 hlf-data3 mergerfs[24579]: running basic garbage collection
    Aug 10 19:10:02 hlf-data3 mergerfs[24559]: running basic garbage collection
    Aug 10 19:25:01 hlf-data3 mergerfs[24560]: running basic garbage collection
    Aug 10 19:25:01 hlf-data3 mergerfs[24579]: running basic garbage collection
    Aug 10 19:25:02 hlf-data3 mergerfs[24559]: running basic garbage collection
    Aug 10 19:31:56 hlf-data3 kernel: Linux version 6.1.38-Unraid (root@Develop-612) (gcc (GCC) 12.2.0, GNU ld version 2.40-slack151) #2 SMP PREEMPT_DYNAMIC Mon Jul 10 09:50:25 PDT 2023
    Aug 10 19:31:56 hlf-data3 kernel: Command line: BOOT_IMAGE=/bzimage initrd=/bzroot
    Aug 10 19:31:56 hlf-data3 kernel: x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
    Aug 10 19:31:56 hlf-data3 kernel: x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
    Aug 10 19:31:56 hlf-data3 kernel: x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
    Aug 10 19:31:56 hlf-data3 kernel: x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
    Aug 10 19:31:56 hlf-data3 kernel: x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'compacted' format.
    Aug 10 19:31:56 hlf-data3 kernel: signal: max sigframe size: 1776
    Aug 10 19:31:56 hlf-data3 kernel: BIOS-provided physical RAM map:


    Then there was another at 21:58:

    Aug 10 19:33:12 hlf-data3 nmbd[11175]: [2023/08/10 19:33:12.631334,  0] ../../source3/nmbd/nmbd_become_lmb.c:398(become_local_master_stage2)
    Aug 10 19:33:12 hlf-data3 nmbd[11175]:   *****
    Aug 10 19:33:12 hlf-data3 nmbd[11175]:   
    Aug 10 19:33:12 hlf-data3 nmbd[11175]:   Samba name server HLF-DATA3 is now a local master browser for workgroup WORKGROUP on subnet 10.10.0.5
    Aug 10 19:33:12 hlf-data3 nmbd[11175]:   
    Aug 10 19:33:12 hlf-data3 nmbd[11175]:   *****
    Aug 10 19:47:01 hlf-data3 kernel: nvidia_uvm: module uses symbols nvUvmInterfaceDisableAccessCntr from proprietary module nvidia, inheriting taint.
    Aug 10 19:47:02 hlf-data3 kernel: nvidia-uvm: Loaded the UVM driver, major device number 241.
    Aug 10 19:47:02 hlf-data3 kernel: NVRM: Persistence mode is deprecated and will be removed in a future release. Please use nvidia-persistenced instead.
    Aug 10 21:58:49 hlf-data3 kernel: Linux version 6.1.38-Unraid (root@Develop-612) (gcc (GCC) 12.2.0, GNU ld version 2.40-slack151) #2 SMP PREEMPT_DYNAMIC Mon Jul 10 09:50:25 PDT 2023
    Aug 10 21:58:49 hlf-data3 kernel: Command line: BOOT_IMAGE=/bzimage initrd=/bzroot
    Aug 10 21:58:49 hlf-data3 kernel: x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
    Aug 10 21:58:49 hlf-data3 kernel: x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
    Aug 10 21:58:49 hlf-data3 kernel: x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
    Aug 10 21:58:49 hlf-data3 kernel: x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
    Aug 10 21:58:49 hlf-data3 kernel: x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'compacted' format.
    Aug 10 21:58:49 hlf-data3 kernel: signal: max sigframe size: 1776

     

    Later on I realized that my server was unresponsive (neither web nor SSH), so I had to do a hard reset.
    Everything had seemed fine for me until then, so I consider myself lucky, but it makes me think that this release is not reliable.
    Here is also my diagnostics file in case it can help: hlf-data3-diagnostics-20230812-2205.zip

    Link to comment

    Same here on 6.12.3. Multiple hard restarts, where I had zero issues on 6.11.x and no config/device changes to the server. I even had to allocate more CPU/RAM to a Windows VM that previously ran fine under the exact same workload on 6.11.x.

     

    Not really sure where to start but happy to provide more info if requested.  Here's my diags: 15620-beast-diagnostics-20230813-0822.zip

    Link to comment

    A hang issue this time. I have found something that may be related:

    * Maybe a driver in Linux has an issue? (kernel NULL pointer dereference, address: 0000000000000470)

     

    syslog.txt

    Link to comment

    And again: posted over 10 days ago, no response. That's a stable release.

     

    They could have changed it to beta when they noticed people having problems months ago, but no, every day new users update and have to face these issues. I DON'T GET IT.

    Edited by nuhll
    Link to comment
    10 hours ago, a632079 said:

    A hang issue this time. I have found something that may be related:

    * Maybe a driver in Linux has an issue? (kernel NULL pointer dereference, address: 0000000000000470)

     

    syslog.txt

    I would maybe suspect a memory issue; have you done a memtest?

    Link to comment
    Just now, SimonF said:

    I would maybe suspect a memory issue; have you done a memtest?

    Maybe all of those users have corrupt RAM, which never gave any problems before 6.12, or what?

    Link to comment
    2 minutes ago, SimonF said:

    I would maybe suspect a memory issue; have you done a memtest?

    I did a memtest about two weeks ago. I ran it for around 3 hours, and a green PASS banner was shown at the bottom center of the screen.

     

    New RAM is also on the way. Once the new RAM is installed, I will keep testing and observe whether this issue is resolved.

    • Like 1
    Link to comment
    6 minutes ago, nuhll said:

    Maybe all of those users have corrupt RAM, which never gave any problems before 6.12, or what?

    This response was for the syslog I have just reviewed. Each case may not have the same fault. Please feel free to support and review the code in the GitHub repo.

    Link to comment
    2 minutes ago, SimonF said:

    This response was for the syslog I have just reviewed. Each case may not have the same fault. Please feel free to support and review the code in the GitHub repo.

    Oh sorry, I'm the wrong person.

     

    I posted my crashes months ago, googled the errors, and posted links to known Linux issues related to those errors - without a response (and before their team flames me again, let's just say not the response I was hoping for) - can't say how helpful it was, though.

    Edited by nuhll
    Link to comment
    2 minutes ago, a632079 said:

    I did a memtest about two weeks ago. I ran it for around 3 hours, and a green PASS banner was shown at the bottom center of the screen.

     

    New RAM is also on the way. Once the new RAM is installed, I will keep testing and observe whether this issue is resolved.

    Have you tried moving to ipvlan for Docker, if you haven't already?

    • Like 1
    Link to comment
    1 minute ago, nuhll said:

    Oh sorry, I'm the wrong person.

     

    I posted my crashes months ago, googled the errors, and posted links to known Linux issues related to those errors - without a response (and before their team flames me again, let's just say not the response I was hoping for) - can't say how helpful it was, though.

    FYI, I do not work for Limetech; this is on my own time.

    • Like 1
    Link to comment
    16 minutes ago, SimonF said:

    Have you tried moving to ipvlan for Docker, if you haven't already?

    Yes, I have tested both ipvlan and macvlan; I don't think that is the key factor in this issue. Based on my observations over these two months (I'm an insider/early-access user), I have found that although the WebUI and SSH both appear unresponsive, the occurrences differ in a few ways.

     

    In the worst case, the machine is completely dead: it cannot be pinged, and there is no response to pressing the physical power button (you must press and hold the power button to shut it down).

     

    In the second case, the machine can be pinged roughly 50% of the time. Pressing the power button makes the beeper sound, but the keyboard accepts no input, and the screen shows the usual shutdown messages (waiting for all processes to exit for xxx seconds, etc.) after the power button is pressed.

     

    The third case is the mildest and the least frequent: only nginx or SSH is dead. I have worked around it by adding daily scheduled tasks with the User Scripts plugin that restart nginx and php-fpm regularly, as sketched below. (Even if SSH hangs, a shell can still be opened through the WebUI Terminal.)
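    A minimal sketch of what such a daily User Scripts job can look like, assuming Unraid's usual Slackware-style rc.d scripts (treat the paths as assumptions and confirm they exist on your own system first):

    #!/bin/bash
    # Sketch of a daily User Scripts job: restart the WebUI stack so a wedged
    # nginx/php-fpm cannot keep the UI down. The rc.d paths follow Unraid's
    # Slackware-style layout - verify /etc/rc.d/rc.php-fpm and /etc/rc.d/rc.nginx exist.
    /etc/rc.d/rc.php-fpm restart    # restart the PHP FastCGI pool behind the WebUI
    sleep 2                         # give php-fpm a moment to come back up
    /etc/rc.d/rc.nginx restart      # then restart nginx itself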

     

    I don't know exactly what might be causing this hang; it seems to have appeared only since the stable release. I did not experience this hang when using 6.12.0-rc.xx.
    On the stable channel, I upgraded directly from 6.12.0 to 6.12.2 or 6.12.3 (I can't remember exactly, but if it was 6.12.2, then I also upgraded to 6.12.3 within 2 days). So I suspect it may be caused by the changes in those few releases?

     

    Maybe we can find the cause in those changes (and I lean towards internal bugs in the underlying components Unraid depends on, which is what confuses us most).

     

    Also, a side note: macvlan is indeed very problematic. On another machine that I run directly on Arch Linux, macvlan often causes Docker's default bridge network to lose connectivity entirely (packets to any IP simply go nowhere), which also gives me quite a headache.

    Edited by a632079
    Link to comment
    1 hour ago, nuhll said:

    And again: posted over 10 days ago, no response. That's a stable release.

     

    They could have changed it to beta when they noticed people having problems months ago, but no, every day new users update and have to face these issues. I DON'T GET IT.

    🤥Frankly speaking, I also don't know what the official team is doing. If they are working on a new test version, perhaps releasing it in the "next" channel for us to test whether the issues have been resolved might be a better idea? However, in reality, they haven't done that.

     

    Another issue related to Docker IPv6 in version 6.12 was also ignored by the official team until I audited the script myself and found the root cause of the problem. At that point, they finally responded to that issue.

     

    I'm not an expert in operating systems and don't have access to unraid's source code. Moreover, I'm just an ordinary consumer who has paid for the license, and I'm not obligated to help unraid solve these issues 😌.

    Edited by a632079
    • Upvote 1
    Link to comment
    1 minute ago, a632079 said:

    🤥Frankly speaking, I also don't know what the official team is doing. If they are working on a new test version, perhaps releasing it in the "next" channel for us to test whether the issues have been resolved might be a better idea? However, in reality, they haven't done that.

     

    Another issue related to Docker IPv6 in version 6.12 was also ignored by the official team until I audited the script myself and found the root cause of the problem. At that point, they finally responded to that issue.

     

    However, I'm not an expert in operating systems and don't have access to unraid's source code. Moreover, I'm just an ordinary consumer who has paid for the license, and I'm not obligated to help unraid solve these issues 😌.

     

    Yeah, communication has always been their problem. We don't know whether they know what the issue is, or have no time, or whatever.

     

    But that doesn't mean they need to keep it up as stable, letting more and more people update to "STABLE" (!!!!) while there are clearly HEAVY bugs like freezing and crashing. I mean, wtf?

    Link to comment
    5 hours ago, a632079 said:

    I'm not obligated to help unraid solve these issues

    Yes I know.

     

    The source is on GitHub. I know they have been looking into the macvlan issue; there was a mention of a possible RC release soon.

    • Like 1
    Link to comment

    Hello, everyone!

     

     

    It's been almost a month since I updated to 6.12.4-rc and subsequently to 6.12.4. After some configuration, I believe I've resolved the hang issue (although it's just a workaround).

     

    Let me give a brief overview of my machine's environment. It has two network interfaces, and Unraid enables bond mode on them by default.

     

    It seems that following the official instructions alone might not be sufficient. I opted to disable bond mode altogether and revert to the most basic setup with the NICs kept separate, taking a more direct approach to reach the configuration I wanted.

     

     

    With the above configuration, I am happy to say that my machine has now run stably for 8 days, twice in a row (the break in between was because of the update from the RC to the stable version).

     

    I hope my experience gives you (developers and users) some inspiration. If you still encounter this problem, you can try disabling bonding; a quick way to check the bond status from a shell is sketched below.
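    For reference, a small read-only check (standard Linux tools; the bond0 name is the usual default and an assumption here) to confirm whether bonding is actually active before and after the change:

    #!/bin/bash
    # /proc/net/bonding/bond0 only exists while the kernel bonding driver has bond0 up,
    # so its presence is a simple yes/no on whether the NICs are still bonded.
    if [ -e /proc/net/bonding/bond0 ]; then
        cat /proc/net/bonding/bond0    # shows the bond mode and member interfaces
    else
        echo "bond0 not present - bonding appears to be disabled"
    fi
    ip -br link show                   # brief state of every interface (eth0, eth1, br0, ...)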

    Edited by a632079
    Link to comment

    I have this issue right now on 6.12.4. I've been moving about 4.5 TB of data from a USB Unassigned Device to another NAS over LAN, and also running a parity check.
    I have 24 GB of RAM, which I will memtest ASAP. (The transfers are still in progress, despite most of the console being unresponsive.)

    The logs are filling up with memory-related errors.

    I have stopped Docker and VM services.

    Terminal window terminates a few seconds after opening it.

    My log file is set to 128 MB.
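    While the console still half-works, a few read-only commands worth running to capture the memory state (standard tools shipped with Unraid; the /var/log/syslog path is assumed to be the usual location):

    free -m                                                # current RAM and swap usage in MiB
    dmesg -T | grep -iE "oom|out of memory" | tail -n 20   # any OOM-killer activity
    df -h /var/log                                         # whether the log tmpfs itself is full
    tail -n 50 /var/log/syslog                             # the most recent syslog entries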

     

    I recently migrated my Unraid server to new hardware, then upgraded to 6.12.4 (from 6.10 or 6.11).
    *I will disable bonding on my NIC interface after my transfers complete.

    kbnas-syslog-20231013-2221.zip

    Edited by Kev600
    Link to comment





