mikeybinns Posted October 18, 2019

Hi, I've been suffering for the past few days with my server either crashing (black screen, fans on full) or randomly rebooting. I have my diagnostics file, although it was taken after a reboot because I have no choice: I can't access Unraid at all once it crashes. I can't see a way to upload a file from here, so here is a link to the file on my G-Drive: https://drive.google.com/file/d/18LCiOu9Be1XY4g7S1PjSJGpTBS0a7u0W/view?usp=sharing

I have tried using both the Fix Common Problems plugin and Nerdpack mcelog to find the issue, but I'm lost. Here is as much detail as I can give on my setup:

Hardware -
AMD 1700X CPU @ 3.4GHz with Dark Rock Pro 4 cooler
ASUS PRIME B350-PLUS
16GB Corsair Vengeance RAM @ 2933MHz
500GB NVMe Cache drive
MSI GT710 Main GPU
EVGA GTX 1070 FTW GPU for VM (currently unused but installed)
2 x 4TB Hard drives, 1 x 1TB Hard drive, no parity set up (Data loss would be annoying but not critical)
Corsair RM650 PSU

Software -
Bios version 4801 x64
Unraid 6.7.2 - Basic licence
Plugins:
Community Applications - 2019.09.22
Dynamix SSD TRIM - 2017.04.23a
Fix Common Problems - 2019.10.13a
Nerd Tools - 2019.01.25
mcelog (mcelog-161-x86_64-1.txz)
Docker: Plex from https://github.com/plexinc/pms-docker
VMs: none

Any help you can give would be much appreciated.
trurl Posted October 18, 2019
Since you have been approved you can attach files now. Please do so. Have you set up Syslog Server? https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=781601 Have you done a memtest?
JorgeB Posted October 18, 2019
Make sure to disable C-states or try the other Ryzen workarounds mentioned here.
mikeybinns Posted October 22, 2019
Thanks for the options mentioned. I had previously set up storing the syslog on flash, so I have uploaded the file below, which contains multiple crashes and likely some forced reboots from holding the power button. Since I posted, I have removed the GTX 1070 GPU, reset the RAM to JEDEC speeds (2133MHz, or auto in the BIOS) and updated my BIOS to version 5220. I also tried replacing my Plex docker container, switching from the official Plex container to the Linuxsystems version. None of these seems to have stopped the issues. I will try the Ryzen workarounds and a memtest now and get back to you.
syslog
mikeybinns Posted October 22, 2019
Okay we may be getting somewhere. I have disabled the Global C-State Control, but when running a Memtest, as soon as I click the option, it seems to crash. It seems to be some kind of memory failure, I'll troubleshoot and report back.
mikeybinns Posted October 22, 2019
Now I'm less sure this is a memory issue. Both sets of sticks behaved exactly the same. As soon as I click the memtest option on the Unraid boot menu, the system instantly restarts, as if it had been off and I had just turned it on. Please see this video I took which shows this: I'm going to dig more into these Ryzen workarounds to see if any of them can fix the issue.
itimpi Posted October 22, 2019
Are you booting UEFI or Legacy mode? The Unraid memtest only works in legacy mode. If you want a version that can be used from UEFI mode then you need to download an appropriate version from the memtest86 web site and create a bootable USB.
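For anyone unsure which mode their server booted in, here is a minimal sketch of one way to check from a running Linux system. It relies on the kernel exposing a `/sys/firmware/efi` directory only when it was booted through UEFI. The function name `boot_mode` and its `sys_root` parameter are made up for illustration, and Python is not part of a stock Unraid install, so treat this as a sketch rather than an official Unraid tool.

```python
import os


def boot_mode(sys_root="/sys"):
    """Return 'UEFI' if the kernel exposes EFI firmware data, else 'Legacy'.

    sys_root is a hypothetical parameter so the check can be pointed at a
    test directory; on a real system the default /sys is what you want.
    """
    efi_dir = os.path.join(sys_root, "firmware", "efi")
    return "UEFI" if os.path.isdir(efi_dir) else "Legacy"


if __name__ == "__main__":
    print(f"Booted in {boot_mode()} mode")
```

If this reports UEFI, the memtest entry on the Unraid boot menu is not expected to work, which matches the advice above.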
John_M Posted October 22, 2019
That looks like a UEFI boot. You can't run MemTest from the menu unless you legacy boot. Sort your memory problem out first. Corsair Vengeance DRAM has a lifetime guarantee so RMA the set if it fails. The best option for fixing the C-state problem (which affects only 1000-series Ryzen processors) is to look for the Power Supply Idle Control in the BIOS (I found it at Advanced -> AMD CBS -> Power Supply Idle Control) and set it to typical current idle instead of low current idle.
mikeybinns Posted October 22, 2019
Okay, yes I am booting in UEFI mode, so I will boot in legacy and try again. I have also just set the Power Supply Idle Control as it was referenced in the other thread.
mikeybinns Posted October 22, 2019
Okay, I just ran the memtest on all 4 sticks, and it passed with no errors. Now that I have tried the Ryzen workarounds, I'll run the system again and report back if I have any issues.
mikeybinns Posted October 23, 2019
It seems like one of the Ryzen workarounds, either the Power Supply Idle Control or the Global C-State Control, worked. Uptime is 18 hours and counting. Thanks everyone for the help!
John_M Posted October 23, 2019
mikeybinns said: "It seems like the Ryzen workaround, either the Power Supply Idle Control or the Global C-State Control worked. Uptime is 18 hours and counting."
The Power Supply Idle Control is the real fix. You should be able to re-enable the Global C-state option.
jovaee Posted October 24, 2019
I just upgraded to a Ryzen 1600 and I'm also having random reboots. According to the syslog file the server is running, logs nothing for a while (probably because it's 3am and I'm not using the server) and then just restarts for no reason. I'm looking into trying these Ryzen fixes but I'm not sure what the Power Supply Idle Control actually does. I have figured out what the C-states do though. Could someone maybe give me a rough idea of what Power Supply Idle Control does, since @John_M states that it is the real fix?
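The pattern described above (a quiet log, then a restart with nothing logged beforehand) can be picked out of a saved syslog with a short script. The following is a rough sketch, not an Unraid tool. It assumes classic syslog lines that start with a `Mon DD HH:MM:SS` stamp, and it assumes that each boot begins with the kernel's `Linux version` banner. The names `parse_time` and `find_reboots` are made up for illustration, and because this timestamp format carries no year, the year has to be supplied by hand.

```python
from datetime import datetime

# Assumption: the kernel's "Linux version ..." banner marks the start of a boot.
BOOT_MARKER = "Linux version"


def parse_time(line, year=2019):
    """Parse the leading 'Mon DD HH:MM:SS' stamp; return None if absent."""
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def find_reboots(lines, year=2019):
    """Yield (last_time_before_boot, boot_time, last_line) per boot marker."""
    prev_time, prev_line = None, None
    for line in lines:
        stamp = parse_time(line, year)
        if stamp is None:
            continue  # skip lines without a recognisable timestamp
        if BOOT_MARKER in line and prev_time is not None:
            yield prev_time, stamp, prev_line.rstrip()
        prev_time, prev_line = stamp, line


# Usage sketch (the path is whatever syslog copy you saved):
#   with open("syslog", errors="replace") as fh:
#       for last, boot, line in find_reboots(fh):
#           print(f"silent for {boot - last} before boot at {boot}: {line}")
```

A long silent gap ending in a boot banner, with no shutdown messages in the last line before it, is consistent with the idle-time hard resets discussed in this thread, though it does not prove the cause.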
John_M Posted October 24, 2019
It prevents the processor cores from sinking so deeply into sleep that they can't wake up again. Why don't you give it a try? Having figured out what C-states do, you'll understand that globally disabling them is somewhat draconian now that there's a real fix that allows them to remain enabled.
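To see which idle states the kernel is actually using after changing these BIOS settings, the Linux cpuidle sysfs interface can be read directly. This is a minimal sketch under the assumption that a cpuidle driver is loaded, so that `/sys/devices/system/cpu/cpu0/cpuidle/state*/` exists with `name`, `usage` and `time` files. The function name `read_idle_states` is made up for illustration, and the script only reports what the kernel requested; it does not show what the Power Supply Idle Control does inside the hardware.

```python
import glob
import os


def read_idle_states(cpu_dir="/sys/devices/system/cpu/cpu0"):
    """Return one dict per cpuidle state: name, entry count, time in microseconds."""
    pattern = os.path.join(cpu_dir, "cpuidle", "state*")
    # Sort numerically so state10 would come after state2.
    state_dirs = sorted(glob.glob(pattern),
                        key=lambda p: int(p.rsplit("state", 1)[1]))
    states = []
    for state_dir in state_dirs:
        def read(name):
            with open(os.path.join(state_dir, name)) as fh:
                return fh.read().strip()
        states.append({
            "name": read("name"),         # e.g. POLL, C1, C2
            "usage": int(read("usage")),  # how many times the state was entered
            "time_us": int(read("time")), # total time spent in it, microseconds
        })
    return states


# Usage sketch:
#   for state in read_idle_states():
#       print(state)
```

If the list is empty, no cpuidle states are exposed for that core, which can happen when C-states are disabled globally.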