Hi everyone,
I'm asking for help because my UNRAID system is unstable, and after 3 months of searching through the forum I haven't been able to identify the cause of the problem.
Here is some context and specs:
System:
M/B: ASRock B660M Steel Legend
CPU: 12th Gen Intel® Core™ i3-12100
RAM: 2x4GB Crucial DDR4 2133MHz
Array: 4x6TB Toshiba NAS HDD (1 parity+3 disks)
Cache: WD 1TB NVMe SSD
The current OS version is 6.10.3
Plugins:
Community applications, GPU statistics, Intel GPU Top
Dockers:
Plex
Issue:
The system crashes after a random amount of time, following this sequence:
1. The WebGUI is no longer accessible (login page not reachable).
2. If I notice it early enough, I can still use the keyboard on the NAS to run some commands, but powerdown has no effect, as it seems to loop indefinitely.
3. If I don't notice it early, the keyboard and screen connected to the NAS are frozen as well, and no command can be run locally.
4. In either case, a hard power-off/reset is necessary.
I've been following these users who had similar issues:
In September, my NAS was taking my whole network down, just like this user. I was on 6.11 at the time. I downgraded to 6.10 AND also connected the NAS through a single 1 Gb port on my router instead of the 2.5 Gb port. I don't know which change fixed what, but my network was fine after that. UNRAID itself, however, kept crashing.
I found this suggesting the RAM was faulty:
I tested mine thoroughly (10 passes with MemTest), and found nothing.
Others found that Docker was the culprit, because of a bad IP/network setting:
I changed my Docker settings from macvlan to ipvlan. The system was stable for about 15 days, but then it crashed again.
During all this time, I had the syslog saved to the USB flash drive. Here is the full one (66 MB, sorry...): https://drive.google.com/file/d/1747Qm_1qJOaK1x9BwnpWFg7e-eiHGcSE/view?usp=share_link
I also thought this could be an issue with the mover, but it seems not, as the crashes occur at random times.
I also checked that all array disks were XFS.
In the syslog, when the system starts crashing, this loop happens every 3 minutes (starting at line 41361): error bloc.txt
I tried to troubleshoot with what I could find on "rcu_sched self-detected stall on CPU", but without success.
At some point, it seems that the system also runs a memory test, which fails every time (e.g. line 693 472): mem test fail.txt
I really don't know what causes these crashes or this loop. Hopefully I've been clear enough and you can help out. If you need any more info or details, please ask.
Thanks everyone!