(Solved) UnRAID 6.8.0 becomes unreachable



First of all, a disclaimer: I do not have much experience, and I built this UnRAID 6.8.0 system from scratch. The system regularly goes down (the Web UI, the Docker containers and the SMB shares all stop responding), and only a hard reset helps. All the components are brand new. The configuration is as follows:

Fractal Design Node 804, Corsair RM850x, AMD Ryzen 3600, Noctua NH-D15S, 64 GB DDR4 3200 MHz Corsair Vengeance LPX RAM, ASRock X570m Pro, Zotac RTX 2070 Mini, 2x 1 TB Samsung 970 Evo NVMe as cache, 2x 2 TB SSDs in RAID 1 as unassigned devices, 10 HDDs with 2 parity drives.

 

Initially I thought I had messed up the plugin installation and some configs. Yesterday I reinstalled everything from scratch, keeping only the original network and array configuration files. Today the server is unreachable again.

I attach the latest diagnostics files I have. I will also try to get a newer file from today once I can reach the server, which is currently stuck and cannot be reached remotely. Any advice and help is appreciated.

 

kk-server-diagnostics-20200107-2015.zip kk-server-diagnostics-20200107-1730.zip

Screenshot 2020-01-08 at 18.33.08.png

Edited by andrey_kk
Link to comment

Though 3rd gen Ryzen should have this fixed, there have been some reports that it can still be a problem, so it is worth trying:

Ryzen on Linux can lock up due to issues with C-states. Make sure the BIOS is up to date, then look for "Power Supply Idle Control" (or similar) and set it to "Typical Current Idle" (or similar), or disable C-states completely.

 

More info here:
https://forums.unraid.net/bug-reports/prereleases/670-rc1-system-hard-lock-r354/
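
To see which idle states the running kernel exposes, and whether any of them are disabled, the standard Linux cpuidle sysfs files can be read. The following is only a minimal sketch and not something from the linked report: it assumes a Python 3 interpreter is available on the server, and the function name `read_cstates` is made up for illustration. It shows the kernel's view only; the BIOS "Power Supply Idle Control" setting itself is not visible this way.

```python
from pathlib import Path

# Standard Linux cpuidle location for the first CPU core.
# It may be missing if no cpuidle driver is loaded (assumption: cpu0 is representative).
CPUIDLE_DIR = Path("/sys/devices/system/cpu/cpu0/cpuidle")


def read_cstates(cpuidle_dir=CPUIDLE_DIR):
    """Return a list of (state_dir, name, disabled) tuples, ordered by state index."""
    cpuidle_dir = Path(cpuidle_dir)
    if not cpuidle_dir.is_dir():
        return []
    states = []
    # Directories are named state0, state1, ...; sort numerically on the suffix.
    for state_dir in sorted(cpuidle_dir.glob("state[0-9]*"),
                            key=lambda p: int(p.name[len("state"):])):
        name = (state_dir / "name").read_text().strip()
        disable_file = state_dir / "disable"
        # "1" in the disable file means the kernel will not enter this state.
        disabled = disable_file.exists() and disable_file.read_text().strip() == "1"
        states.append((state_dir.name, name, disabled))
    return states
```

Calling `read_cstates()` with no arguments reads the real sysfs directory; an empty list suggests that no idle states are exposed to the kernel at all.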

 

 

Link to comment

Thanks a lot for the tip! I found the setting and changed it. I will monitor and report back. I am using the latest ASRock BIOS, version 2.3.

Link to comment

Update: after 5 hours of uptime it is down again, but differently. SMB works and the Web UI is reachable, but the diagnostics report cannot be generated: it starts but never finishes. None of the Docker containers can be opened, and neither can the Docker tab itself. The Docker page in Settings cannot be opened either. Reboot does not work, either from the Web UI or from the console. Only a hard reset helps.

What else can I look for?

Link to comment
13 minutes ago, johnnie.black said:

There are multiple general protection errors. Start by running memtest, and also make sure the RAM isn't overclocked, i.e. that you're respecting the maximum supported speed for your system configuration:

 

867800372_3rdgen.jpg.27a8f666f3c5d384228102dad7313be3.jpg
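
For anyone wanting to check their own syslog for the same symptom, a small script can list the lines that mention a general protection error. This is a rough sketch under assumptions: it assumes a Python 3 interpreter is at hand, the helper name `find_general_protection` is hypothetical, and it simply matches the phrase "general protection" because the exact kernel wording varies between versions.

```python
import re

# Matches both "general protection fault" and the shorter "general protection" trap lines.
GP_PATTERN = re.compile(r"general protection", re.IGNORECASE)


def find_general_protection(lines):
    """Return (line_number, text) for every line mentioning a general protection error."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if GP_PATTERN.search(line)
    ]

# Usage sketch (the file path is an assumption about where the syslog was extracted):
# with open("logs/syslog.txt", errors="replace") as handle:
#     for number, text in find_general_protection(handle):
#         print(number, text)
```

A handful of hits does not prove bad RAM on its own, but repeated hits across different processes fit the memory advice above.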

 

 

Thanks! Will do tonight. The RAM is 4x16 GB DDR4-3200 running the stock XMP profile. I will run memtest and try to find out the rank of the modules. I assume I definitely need to go down to 2933 or 2667 MHz. Will report accordingly.
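
The "maximum supported speed depending on system config" check can be written down as a small lookup. This is a sketch only: the table values are an assumption based on the commonly cited AMD figures for 3rd gen Ryzen (keyed by number of DIMMs and ranks per DIMM) and should be verified against the table in the post above or AMD's own specification. The names `MAX_SUPPORTED_MTS`, `max_supported_speed` and `is_within_spec` are made up for illustration.

```python
# ASSUMPTION: commonly cited maximum DDR4 speeds (MT/s) for 3rd gen Ryzen,
# keyed by (number of DIMMs, ranks per DIMM). Verify before relying on these.
MAX_SUPPORTED_MTS = {
    (2, 1): 3200,
    (2, 2): 3200,
    (4, 1): 2933,
    (4, 2): 2667,
}


def max_supported_speed(dimm_count, ranks_per_dimm):
    """Look up the assumed maximum supported speed for a DIMM configuration."""
    try:
        return MAX_SUPPORTED_MTS[(dimm_count, ranks_per_dimm)]
    except KeyError:
        raise ValueError(
            f"no entry for {dimm_count} DIMMs with {ranks_per_dimm} rank(s) each"
        ) from None


def is_within_spec(configured_mts, dimm_count, ranks_per_dimm):
    """True if the configured memory speed does not exceed the assumed maximum."""
    return configured_mts <= max_supported_speed(dimm_count, ranks_per_dimm)
```

With four DIMMs, `is_within_spec(3200, 4, 1)` and `is_within_spec(3200, 4, 2)` both return `False` under these assumed values, which is why the XMP profile is suspect here regardless of the modules' rank.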

Link to comment

Update, as promised: I switched the memory to the Auto setting, and it dropped to 2133 MHz. Since then the server has had 14 hours of uptime without any issues. I will keep monitoring and report again later. I may try to raise the memory frequency a bit later, within the allowed limits.

For some reason I cannot run memtest86. When I choose it at boot, nothing happens: the system reboots and ends up at the same normal boot screen again.

Thanks a lot for instructions and advice! Great support!

Link to comment
13 hours ago, juan11perez said:

For info, I have a Ryzen 3900X with the same memory sticks and had to bring them down to 3000 for stability.

Oh, good to know, thanks. Is it at 1.35 V or 1.2 V?

My main PC has a 3900X and Kingston HyperX memory (quite expensive, though) with a 3600 MHz stock XMP profile, but I run it at 3800 MHz with Infinity Fabric overclocked to 1900 MHz, and all is stable.

Link to comment
7 minutes ago, juan11perez said:

Sorry, oversimplified response. I meant 3200 MHz. This is what I have:

https://www.memorybenchmark.net/ram.php?ram=G+Skill+Intl+F4-3200C16-16GTZR+16GB&id=12243

Each stick is 16 GB; I have 64 GB total.

Seems different from mine. I have Corsair LPX. They run at 1.2 V with the stock profile but go to 1.35 V with XMP at 3200 MHz. Yours seem to be G.Skill, so they may be different.

Edited by andrey_kk
Link to comment
