andrey_kk, posted January 9, 2020 (edited November 1, 2020): First, a disclaimer: I do not have much experience, and I built this Unraid 6.8.0 system from scratch. The system keeps going down (the web UI, the Docker containers and the SMB shares all stop responding) and only a hard reset helps. All the components are brand new. The configuration is as follows: Fractal Design Node 804, Corsair RM850x, AMD Ryzen 3600, Noctua NH-D15S, 64GB DDR4-3200 Corsair Vengeance LPX RAM, ASRock X570M Pro, Zotac RTX 2070 Mini, 2x 1TB Samsung 970 EVO NVMe cache, 2x 2TB SSD in RAID 1 as unassigned devices, 10 HDDs with 2 parity drives. Initially I thought I had messed up the plugin installation and some configs. Yesterday I reinstalled everything from scratch, keeping only the original network and array configuration files. Today the server is unreachable again. I attach the latest diagnostics files I have. I will also try to get a fresh diagnostics file from today once I can reach the server, which is currently stuck and cannot be reached remotely. Any advice and help is appreciated. Attachments: kk-server-diagnostics-20200107-2015.zip, kk-server-diagnostics-20200107-1730.zip
JorgeB, posted January 9, 2020: Though 3rd-gen Ryzen should have this fixed, there have been some reports that it can still be a problem, so it's worth trying: Ryzen on Linux can lock up due to issues with C-states. Make sure the BIOS is up to date, then look for "Power Supply Idle Control" (or similar) and set it to "typical current idle" (or similar), or completely disable C-states. More info here: https://forums.unraid.net/bug-reports/prereleases/670-rc1-system-hard-lock-r354/
andrey_kk, posted January 9, 2020: johnnie.black said: "...look for 'Power Supply Idle Control' (or similar) and set it to 'typical current idle' (or similar), or completely disable C-states." Thanks a lot for the tip! I found the setting and changed it. Will monitor and report. I am on the latest ASRock BIOS, version 2.3.
andrey_kk, posted January 9, 2020: Update: after 5 hours of uptime it is down again, but differently. SMB works and the web UI is reachable, but the diagnostics report cannot be generated: it starts but never finishes. None of the Docker containers can be opened, nor can the Docker page itself, and the Docker settings page does not open either. Reboot does not work, neither from the web UI nor from the console. Only a hard reset helps. What else can I look for?
JorgeB, posted January 9, 2020: Try enabling the syslog server/mirror to see if it catches anything.
andrey_kk, posted January 9, 2020: johnnie.black said: "Try enabling syslog server/mirror to see if catches anything." Thanks! Will try. Here is another screenshot after the hard reset: it did not boot properly, and a second hard reset helped. Besides Power Supply Idle Control, which is already set to "typical current idle", I also set the "Global C-state..." option to enabled. Will monitor.
JorgeB, posted January 9, 2020: andrey_kk said: "I also put 'Global C-state...' option to enabled" It should be disabled first, to rule that out.
andrey_kk, posted January 9, 2020: johnnie.black said: "Should be disabled first to rule that out." Got it, will fix. Thanks!
andrey_kk, posted January 10, 2020: Update after running overnight: the same problem. Power Supply Idle Control is set to "typical current idle" and Global C-state Control is disabled. Web UI login works, but everything else is stuck as before. Here is the syslog file, and also a diagnostics report taken right after the last hard reset, while the server was still working properly. Attachments: syslog, kk-server-diagnostics-20200109-2038.zip
JorgeB, posted January 10, 2020: There are multiple general protection errors. Start by running memtest, and also make sure the RAM isn't overclocked: respect the maximum supported speed for your system configuration, which on Ryzen depends on how many DIMMs are installed and whether they are single or dual rank.
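To check a syslog for the kind of errors mentioned above, a small filter is enough. This is a minimal sketch, assuming the log is a plain-text file like the attached syslog; the function name is hypothetical, and matching on the substring "general protection" is an assumption chosen because the exact kernel wording differs between kernel versions and between kernel-mode and user-space faults.

```python
import re

# Matches both kernel oopses ("general protection fault: 0000 [#1] ...") and
# user-space trap lines ("traps: proc[pid] general protection ip:..."); the exact
# wording varies by kernel version, so only the common substring is used.
GPF_PATTERN = re.compile(r"general protection", re.IGNORECASE)


def find_gpf_lines(log_text: str) -> list[str]:
    """Return every log line that reports a general protection error."""
    return [line for line in log_text.splitlines() if GPF_PATTERN.search(line)]


def scan_syslog(path: str) -> list[str]:
    """Read a syslog file (tolerating odd bytes) and return the matching lines."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return find_gpf_lines(f.read())
```

For example, `scan_syslog("syslog")` on the downloaded file returns the offending lines; a handful of these scattered across unrelated processes is a typical hint to suspect RAM rather than one specific container.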
andrey_kk, posted January 10, 2020: johnnie.black said: "...make sure RAM isn't overclocked, make sure you're respecting max support speed depending on system config" Thanks! Will do tonight. The RAM is 4x16GB DDR4-3200 running its stock XMP profile. I will run memtest and try to find out the rank. I assume I definitely need to go down to 2933 or 2667 MHz. Will report back.
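The speed limit JorgeB refers to can be expressed as a small lookup by DIMM count and rank. This is a sketch, assuming the figures commonly cited from AMD's specifications for 3rd-gen Ryzen (Matisse); treat the table values as an assumption and confirm them for your exact CPU. The function and table names are hypothetical.

```python
# Maximum officially supported DDR4 speed (MT/s) for 3rd-gen Ryzen, keyed by
# (number of DIMMs installed, dual rank?). Values are an assumption based on
# commonly cited AMD specifications; verify against AMD's page for your CPU.
MAX_SUPPORTED_MTS = {
    (2, False): 3200,  # one single-rank DIMM per channel
    (2, True): 3200,   # one dual-rank DIMM per channel
    (4, False): 2933,  # two single-rank DIMMs per channel
    (4, True): 2667,   # two dual-rank DIMMs per channel
}


def max_supported_speed(dimm_count: int, dual_rank: bool) -> int:
    """Return the supported speed for a 2- or 4-DIMM configuration."""
    try:
        return MAX_SUPPORTED_MTS[(dimm_count, dual_rank)]
    except KeyError:
        raise ValueError(f"no entry for {dimm_count} DIMMs") from None
```

With four DIMMs, `max_supported_speed(4, False)` gives 2933 and `max_supported_speed(4, True)` gives 2667, the two candidates named in the post; which one applies depends on the kit's rank, which the thread does not confirm.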
andrey_kk, posted January 11, 2020: Update as promised: I switched the memory setting to Auto and it dropped to 2133 MHz. Since then, 14 hours of uptime without any issues. Will monitor and report again later. I may try to raise the memory frequency a bit later, within the supported limits. For some reason I cannot run memtest86: I choose it at boot, nothing happens, and the system reboots into the normal boot again. Thanks a lot for the instructions and advice! Great support!
JorgeB, posted January 11, 2020: andrey_kk said: "For some reason cannot run memtest86. At boot I choose it, nothing happens, system reboots and again to the same normal boot status." Memtest won't work with UEFI boot, only with CSM (legacy) boot.
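To tell which boot mode the server is currently using, a Linux system can be checked for the `/sys/firmware/efi` directory, which the kernel exposes only when it was started by UEFI firmware. This is a minimal sketch run on the server itself; the function name is hypothetical and the path parameter exists only so it can be tested.

```python
import os


def booted_via_uefi(efi_dir: str = "/sys/firmware/efi") -> bool:
    """Return True if the running Linux kernel was booted through UEFI.

    The kernel creates /sys/firmware/efi only on UEFI boots; on a CSM
    (legacy BIOS) boot the directory is absent.
    """
    return os.path.isdir(efi_dir)
```

If this returns True, the memtest entry in the boot menu will not start, as described in the post above, and the server needs to be booted in CSM mode to run it.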
andrey_kk, posted January 12, 2020: Update: after 2 days and 4 hours of uptime, everything seems to be stable. Please consider this solved. Thanks a lot for the help and instructions!
juan11perez, posted January 12, 2020: For info, I have a Ryzen 3900X with the same memory sticks and had to bring them down to 3000 for stability.
andrey_kk, posted January 13, 2020: juan11perez said: "For info I have ryzen 3900x with same memory sticks and had to bring them down to 3000 for stability." Oh, good to know, thanks. Is that at 1.35 V or 1.2 V? My main PC has a 3900X with Kingston HyperX memory (quite expensive, though) with a 3600 MHz stock XMP profile, but I run it at 3800 MHz with the Infinity Fabric overclocked to 1900 MHz, and everything is stable.
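The 3800/1900 pairing in the post above follows from DDR arithmetic: DDR memory transfers data twice per clock, so DDR4-3800 runs a 1900 MHz memory clock, and setting the Infinity Fabric clock to the same value keeps the two in a 1:1 ratio. A small sketch of that calculation, with hypothetical function names:

```python
def memclk_mhz(ddr_rating_mts: int) -> float:
    """DDR transfers twice per clock, so the memory clock is half the MT/s rating."""
    return ddr_rating_mts / 2


def is_one_to_one(ddr_rating_mts: int, fclk_mhz: float) -> bool:
    """True when the Infinity Fabric clock equals the memory clock (1:1 ratio)."""
    return memclk_mhz(ddr_rating_mts) == fclk_mhz
```

For example, `is_one_to_one(3800, 1900)` holds, while the kit's stock 3600 profile would pair with an 1800 MHz fabric clock instead.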
juan11perez, posted January 13, 2020: Sorry, oversimplified response; I meant 3200 MHz. This is what I have: https://www.memorybenchmark.net/ram.php?ram=G+Skill+Intl+F4-3200C16-16GTZR+16GB&id=12243, 16GB each, 64GB total. They are 1.2 V.
andrey_kk, posted January 13, 2020 (edited January 13, 2020): juan11perez said: "Sorry oversimplified response. I meant 3200mhz. this is what i have: https://www.memorybenchmark.net/ram.php?ram=G+Skill+Intl+F4-3200C16-16GTZR+16GB&id=12243 each 16GB; I have 64Gb total." They seem different from mine. I have Corsair LPX, which run at 1.2 V on the stock profile but go to 1.35 V with XMP at 3200 MHz. Yours seem to be G.Skill, so they may behave differently.
andrey_kk, posted January 16, 2020: Update: with the memory set to 2933 MHz, the system is not stable. I decreased it to 2667 MHz, and it has now been stable for 3 days. Leaving it at this setup for now.