UNRAID random crashes with same error loop in log


Solved by thibfighter

Hi everyone,

 

I'm asking for help because my UNRAID system is unstable and, after 3 months of searching through the forum, I haven't been able to identify the cause.

 

Here is some context and specs:

System:

M/B: ASRock B660M Steel Legend

CPU: 12th Gen Intel® Core™ i3-12100

RAM: 2x4GB Crucial DDR4 2133MHz

Array: 4x6TB Toshiba NAS HDD (1 parity+3 disks)

Cache: WD 1TB NVMe SSD

The current OS version is 6.10.3

 

Plugins:

Community applications, GPU statistics, Intel GPU Top

 

Dockers:

Plex

 

Issue:

The system crashes after a random amount of time, in the following sequence:

1. The WebGUI is no longer accessible (the login page is not reachable).

2. If I catch it early enough, I can still use the keyboard on the NAS to run some commands, but powerdown has no effect, as it seems to loop indefinitely.

3. If I don't catch it early, the keyboard and screen connected to the NAS are frozen as well, and no command can be run locally.

4. In either case, a hard power-off/reset is necessary.

 

I've been following these users who had similar issues:

In September, my NAS was taking my whole network down, just like this user. I was on 6.11. I downgraded to 6.10 and also connected the NAS through a 1 Gigabit port on my router instead of the 2.5G port. I don't know which change fixed what, but my network was fine after that. UNRAID kept crashing, though.

 

I found this suggesting the RAM was faulty: 

I tested mine thoroughly (10 passes with MemTest), and found nothing.

 

Others found that Docker was the culprit, because of a bad IP setting: 

I changed my Docker settings from macvlan to ipvlan. The system was stable for about 15 days, then crashed again.

 

 

During all this time, I had the syslog saved to the USB flash drive. Here is the full log (66 MB, sorry...): https://drive.google.com/file/d/1747Qm_1qJOaK1x9BwnpWFg7e-eiHGcSE/view?usp=share_link

 

I also thought this could be an issue with the mover, but it seems not, as the crashes occur at random times.

I also checked that all array disks were XFS. 

 

In the syslog, when the system starts crashing, this loop happens every 3 minutes (starting at line 41361): error bloc.txt

I tried to troubleshoot with what I could find on "rcu_sched self-detected stall on CPU", but without success.

At some point, the system also seems to run a memory test, which fails every time (e.g. line 693472): mem test fail.txt
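For anyone who wants to check the "every 3 minutes" pattern in their own log, here is a minimal Python sketch. It assumes classic syslog lines that start with a "Mon DD HH:MM:SS" timestamp, as in this thread's excerpts. The function names are made up for illustration, the sample lines are invented, and the year is a guess because syslog timestamps do not carry one.

```python
from datetime import datetime

# Substring quoted in the post; the full kernel line may carry extra prefixes.
STALL_MARKER = "self-detected stall on CPU"


def stall_times(lines, year=2022):
    """Return the timestamp of every line that contains the stall marker.

    `year` is an assumption: syslog timestamps omit it.
    """
    times = []
    for line in lines:
        if STALL_MARKER not in line:
            continue
        # The first 15 characters hold "Mon DD HH:MM:SS".
        stamp = f"{year} {line[:15]}"
        times.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S"))
    return times


def gaps_in_seconds(times):
    """Return the delay in seconds between consecutive stall reports."""
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    # Invented sample lines, only to show the expected input shape.
    sample = [
        "Nov 23 21:20:00 NAS kernel: rcu: INFO: rcu_sched self-detected stall on CPU",
        "Nov 23 21:21:10 NAS kernel: some unrelated message",
        "Nov 23 21:23:00 NAS kernel: rcu: INFO: rcu_sched self-detected stall on CPU",
    ]
    print(gaps_in_seconds(stall_times(sample)))
```

To run it on a real file, pass an open file object instead of the sample list, for example `stall_times(open("syslog.txt", errors="replace"))`.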

 

I really don't know what causes these crashes or this loop. Hopefully I've been clear enough and you can help out. If you need any more info or details, please ask.

 

Thanks everyone!


NEVERMIND 😛

It crashed again after 10 days. "Funnily" enough, I now have new error lines saying my flash drive is blacklisted: 

 

Nov 23 21:23:42 NAS emhttpd: Unregistered - flash device blacklisted (EBLACKLISTED2)
Nov 23 21:23:42 NAS kernel: traps: udevadm[11688] general protection fault ip:149a422552f4 sp:7fff93c82b70 error:0 in ld-2.33.so[149a42248000+25000]
Nov 23 21:23:43 NAS emhttpd: Basic key detected, GUID: 0781-5583-0001-200628116535 FILE: /boot/config/Basic.key

 

From what I found on the forum, this has happened to some people after updating Unraid, and sometimes a Windows repair of the drive fixes it. I did not do any kind of update, and Windows did not find any errors in my case...

 

In the end, I still have the "general protection fault" (line Nov 26 03:03:39 in the syslog), which locks up my system: I can still access the files on the NAS and use Plex, but shutting down with a local command is impossible. A hard shutdown was necessary.
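As a rough aid for reading a large syslog, the sketch below pulls the fields out of "general protection fault" lines shaped like the one quoted above and counts them per process and module. The regex is modelled on that single line only, so other kernel versions may format it differently. `parse_gpf` and `faults_by_process` are hypothetical helper names. Seeing faults in many unrelated processes would be a hint worth noting, not proof of any particular cause.

```python
import re
from collections import Counter

# Pattern modelled on the single "traps:" line quoted in this thread.
TRAP_RE = re.compile(
    r"traps: (?P<proc>[^\[\s]+)\[(?P<pid>\d+)\] general protection fault "
    r"ip:(?P<ip>[0-9a-f]+) sp:(?P<sp>[0-9a-f]+) error:(?P<error>[0-9a-f]+)"
    r"(?: in (?P<module>[^\[\s]+)\[)?"
)


def parse_gpf(line):
    """Return the fault fields as a dict, or None if the line does not match."""
    match = TRAP_RE.search(line)
    return match.groupdict() if match else None


def faults_by_process(lines):
    """Count faults per (process, module) pair across an iterable of lines."""
    counts = Counter()
    for line in lines:
        fault = parse_gpf(line)
        if fault:
            counts[(fault["proc"], fault["module"])] += 1
    return counts


if __name__ == "__main__":
    line = (
        "Nov 23 21:23:42 NAS kernel: traps: udevadm[11688] general protection "
        "fault ip:149a422552f4 sp:7fff93c82b70 error:0 in "
        "ld-2.33.so[149a42248000+25000]"
    )
    print(parse_gpf(line))
```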

 

Thank you for helping again!

syslog.txt

  • 1 month later...
  • Solution

Hi all !

Problem is solved: in the end, the RAM I had originally installed was causing all the errors.

 

The original RAM was an "old" 2x4GB 2133MHz kit. I guess its timings were incompatible with Intel 12th Gen CPUs, because every memory test I ran came back without errors.

 

I ordered a new 16GB stick, 3200MHz, and the server has been running flawlessly for a full month without interruption. I never had this stability before!

 

Thanks again to everyone who guided my research!
