Daniel Finch Posted August 1, 2018

UnRAID version: 6.5.3

Plugins:
* Community Applications: 2018.07.22
* Fix Common Problems: 2018.07.28
* Nerd Tools: 2018.02.17 (screen installed)
* Unassigned Devices: 2018.06.01a

Docker apps:
* binhex-krusader (not started)
* deluge
* nginx
* nzbget
* PlexMediaServer
* radarr
* sonarr

Hardware:
* Motherboard: ASRock H110M-ITX
* CPU: i3 6100T (stock cooler, not overclocked)
* RAM: 2x 4GB 2133MHz
* Hard drives:
  * 3x 4TB Seagate Barracuda 3.5" (1 being used for parity)
  * 1x 4TB Seagate IronWolf
* No GPU installed

---

Since I first set my server up I've been seeing random, complete server hangs. None of the Docker instances are available, nor is the GUI, and I'm unable to log in via the console: I enter the username and never get a password prompt. I have to perform a hard shutdown and turn it back on.

It seems to happen every 3-4 days or so. The last time it happened I had put it into Diagnostics mode, so I've got the .zip and the syslog attached.

Usually I'm not using the machine (it sits untouched on a desk somewhere in my house), so I often don't realise it's happened until I go to do something with one of the Docker apps. My network sits behind a pfSense device, so the only way to access the server is via VPN or by being physically on the local network.

As far as I can see, no errors are ever shown in the console. The IP address and username prompt are always the last things displayed.

I have not yet tried safe mode, and I haven't found any reproduction steps yet (sorry!).

Attachments: FCPsyslog_tail.txt, htpc-diagnostics-20180730-1836.zip
TechMed Posted August 1, 2018

Hi @Daniel Samuels,

Welcome to the forum! There is LOTS of great help here and folks are friendly.

I realize this is not a direct answer, but in my experience, when things go "randomly" awry, it is almost always hardware related. I recently had a similar situation on a new build and discovered that the USB port holding my flash drive was defective. The system "appeared" to boot, but showed all kinds of weirdness like what you are describing.

Until one of the Pros gets a chance to review your Diags file, you may want to have a look at some of the hardware. Again, this is not a point-and-shoot answer, just general experience saying: look to the hardware first when the errors are random.

I'm sure it will all work itself out once the Pros get a chance to chime in. They really are great.
Altheran Posted August 3, 2018

In my experience since Skylake: hangups = C-States. Since I disabled C-States in the BIOS, no more worries.
Daniel Finch (Author) Posted August 3, 2018

8 hours ago, Altheran said:
"In my experience since Skylake : Hangups = C States. Since I disabled C States in the BIOS, no more worries"

How interesting, I'll try disabling that and see how it goes. Thanks!
perPLEXed Posted August 3, 2018

I had random crashes and reboots on my Ryzen 1800 desktop system running Windows 10. It could be idling with nothing running, or in the middle of a graphics-intensive game, and it would just reboot or crash. I was unable to reproduce it consistently. I stress tested it for hours; sometimes it would crash and sometimes not.

I started monitoring and logging CPU temperatures. I had recorded temperatures on this system a year earlier, and compared with those, the current CPU temperatures were slightly higher at idle and significantly higher under load. So I pulled my CPU heat sink and applied new thermal paste. Afterwards the temperatures were slightly lower at idle and much lower under load. I no longer have any random reboots or crashes.

Thermal paste degrades over time? Misaligned CPU heat sink? Not sure, but it fixed it.
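For anyone wanting to do the same kind of temperature logging on a Linux-based server such as the one in the original post, here is a minimal sketch. It is not what perPLEXed used (that system ran Windows 10). It assumes a Python 3 interpreter is available and that the kernel exposes sensors under `/sys/class/thermal`, which depends on the hardware and drivers. The function names `read_zone_temps` and `format_log_line` are made up for illustration.

```python
from datetime import datetime
from pathlib import Path


def read_zone_temps(base="/sys/class/thermal"):
    """Return {zone_name: degrees_celsius} for every readable thermal zone.

    Assumption: each zone directory holds a 'temp' file in millidegrees
    Celsius, as the Linux thermal sysfs interface normally provides.
    """
    temps = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            raw = (zone / "temp").read_text().strip()
            temps[zone.name] = int(raw) / 1000.0
        except (OSError, ValueError):
            # Skip zones that cannot be read or do not contain a number.
            continue
    return temps


def format_log_line(temps, timestamp):
    """Build one log line: the timestamp followed by name=value pairs."""
    readings = " ".join(
        f"{name}={value:.1f}C" for name, value in sorted(temps.items())
    )
    return f"{timestamp} {readings}".rstrip()


if __name__ == "__main__":
    # Take a single sample; run it on a schedule (for example from cron)
    # and append the output to a file to build up a history.
    now = datetime.now().isoformat(timespec="seconds")
    print(format_log_line(read_zone_temps(), now))
```

Comparing idle and load readings over weeks is what makes a slow drift visible, which is how the higher temperatures were spotted in the post above.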
pwm Posted August 3, 2018

43 minutes ago, perPLEXed said:
"Thermal paste degrades over time? Misaligned CPU Heat-sink?"

Some thermal paste degrades a lot over time, but I think most thermal paste works quite well for the expected lifetime of the system. Many industrial systems are expected to work well for 10-20 years without any need to replace the thermal paste.

It's more likely that there was an alignment issue or that not all of the chip was covered with thermal paste. Either of these could result in a big variation in temperature between different parts of the chip, potentially making one part hot enough to become unstable while the temperature sensor still sees a temperature that does not require throttling.
perPLEXed Posted August 3, 2018

45 minutes ago, pwm said:
"It's more likely that there was an alignment issue or that not all of the chip had thermal paste."

I agree that was probably it.
Daniel Finch (Author) Posted September 8, 2018

Just wanted to come back to this thread and give an update. Since disabling C-States I have had no further hangs and the server is now sitting at 32 days uptime.

Thanks everyone for your help!
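After changing the BIOS setting, one way to check which idle states the kernel still exposes is to read the Linux cpuidle sysfs directory. The sketch below is only an illustration and was not part of the fix described above. It assumes a Python 3 interpreter is available and that `/sys/devices/system/cpu/cpu0/cpuidle` exists on the machine. The function name `list_cstates` is made up. Depending on the idle driver in use, the listing may not mirror the BIOS setting exactly.

```python
from pathlib import Path


def list_cstates(cpuidle_dir="/sys/devices/system/cpu/cpu0/cpuidle"):
    """Return (directory, state_name, usage_count) for each idle state.

    Assumption: every stateN directory contains a 'name' file and a
    'usage' file (number of times the state was entered), as the Linux
    cpuidle sysfs interface normally provides.
    """
    states = []
    for state in sorted(Path(cpuidle_dir).glob("state*")):
        try:
            name = (state / "name").read_text().strip()
            usage = int((state / "usage").read_text().strip())
        except (OSError, ValueError):
            # Skip entries that are missing files or hold unexpected data.
            continue
        states.append((state.name, name, usage))
    return states


if __name__ == "__main__":
    for directory, name, usage in list_cstates():
        print(f"{directory}: {name} (entered {usage} times)")
```

If deeper states still show growing usage counts after the BIOS change, the kernel is probably still using them, and the setting may need to be revisited.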