BlueSialia, posted July 31, 2022: So basically, the title. For three mornings now I've woken up to find my server down. The computer is powered on, but the Web GUI can't be reached and my Plex is down too. I can't even ping the machine. I've restarted the machine and downloaded the diagnostics, but the logs only contain entries since the last boot, so basically only a few minutes. As far as I can see, I won't find any error log of the crash there. What can I do? unblue-diagnostics-20220731-2137.zip
BlueSialia (Author), posted July 31, 2022: After noticing that the word "diagnostics" turned into a hyperlink, I discovered Unraid's persistent logs feature. Neat trick of the forums! Hopefully that will tell me more next time it happens.
BlueSialia (Author), posted August 1, 2022: Things got weird. For some reason my VM settings changed. The VM Manager tells me my default VM storage path does not exist. It's set to `/mnt/user/domains/` and, in fact, that share does not exist; I used `/mnt/user/vdisks/` (yup, I got into this because of Linus Tech Tips). Setting it back to vdisks fixes it, but the fact that my configuration just changed on its own scares me. I feel like the system is slowly dying.
JorgeB, posted August 1, 2022: Enable the syslog server and post that after a crash.
BlueSialia (Author), posted August 17, 2022 (edited): So it happened again, on the 12th. This is every log entry from that day:
    Aug 12 01:01:34 UnBlue emhttpd: read SMART /dev/sdd
    Aug 12 01:02:23 UnBlue root: /etc/libvirt: 27.6 MiB (28966912 bytes) trimmed on /dev/loop3
    Aug 12 01:02:23 UnBlue root: /var/lib/docker: 2.9 GiB (3095732224 bytes) trimmed on /dev/loop2
    Aug 12 01:02:23 UnBlue root: /mnt/cache: 242.4 GiB (260238061568 bytes) trimmed on /dev/nvme0n1p1
    Aug 12 02:00:01 UnBlue crond[1341]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
    Aug 12 02:03:55 UnBlue emhttpd: spinning down /dev/sdd
    Aug 12 02:40:55 UnBlue emhttpd: read SMART /dev/sdd
    Aug 12 03:42:45 UnBlue emhttpd: spinning down /dev/sdd
    Aug 12 17:52:17 UnBlue emhttpd: read SMART /dev/sdd
    Aug 12 18:52:40 UnBlue emhttpd: spinning down /dev/sdd
As far as I can tell, there is nothing weird there. The next log entry is from today (I was away, so I couldn't restart the server earlier). One thing did catch my eye. On restart I got the notification: "unassigned.devices.plg: An update is available. Click here to install version 2022.08.12." I can't say for sure, but I believe every time my server has crashed I've found a pending update for that same plugin.
(Edited August 17, 2022 by BlueSialia: I used markdown syntax thinking the forums accepted it.)
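The reasoning in that post (the crash happened somewhere after the last entry logged and before the first entry after the reboot) can be automated over a full syslog-server file by looking for the longest silence between consecutive entries. This is a minimal sketch, assuming the classic `Mon DD HH:MM:SS host ...` timestamp format shown above and a year supplied by the caller, since these lines carry no year (a December-to-January rollover is not handled). The helper names `parse_stamp` and `largest_gap` are hypothetical, not part of Unraid.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple


def parse_stamp(line: str, year: int) -> Optional[datetime]:
    """Parse the leading 'Mon DD HH:MM:SS' syslog timestamp; None if absent."""
    parts = line.split(None, 3)  # month, day, time, rest of the line
    if len(parts) < 3:
        return None
    try:
        return datetime.strptime(
            f"{year} {parts[0]} {parts[1]} {parts[2]}", "%Y %b %d %H:%M:%S"
        )
    except ValueError:
        return None  # not a timestamped syslog line


def largest_gap(
    lines: Iterable[str], year: int
) -> Optional[Tuple[datetime, datetime, timedelta]]:
    """Return (last entry before the gap, first entry after it, gap length)."""
    best = None
    prev = None
    for line in lines:
        stamp = parse_stamp(line, year)
        if stamp is None:
            continue  # skip lines without a parseable timestamp
        if prev is not None:
            gap = stamp - prev
            if best is None or gap > best[2]:
                best = (prev, stamp, gap)
        prev = stamp
    return best
```

Run over just the excerpt above, the longest gap is 03:42:45 to 17:52:17 (14 h 9 min 32 s), which is only the disk sitting idle between spin-down and the next SMART read. That is a reminder that a gap by itself is not proof of a crash; the interesting one is the gap that ends at the first entry of the next boot.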
JorgeB, posted August 17, 2022: Check here to make sure you're using the correct "Power Supply Idle Control" setting.
trurl, posted August 17, 2022: On 8/1/2022 at 3:22 AM, BlueSialia said: "my configuration just changed" Everything about your configuration (any settings from the webUI) is in the config folder on the flash drive. Did you do anything that would have reset the config?
BlueSialia (Author), posted August 17, 2022: 4 minutes ago, JorgeB said: "Check here to make sure you're using the correct 'power supply idle control' setting." I'll check the BIOS settings. Although, would it even make sense for this to be the cause, when I've used the same BIOS settings for years and the crashing only started a month and a half ago? 3 minutes ago, trurl said: "Everything about your configuration (any settings from webUI) is in config folder on flash. Did you do anything that would have reset config?" Nope, nothing. At least not knowingly. I actually don't remember changing anything in Unraid for a long while. I have my Plex, my VMs that I start with Wake-on-LAN... I have the Dashboard open all the time, but just because I like looking at it, haha.
JorgeB, posted August 17, 2022: 10 minutes ago, BlueSialia said: "I'll check the BIOS settings. Although, would it even make sense that this is the cause when I've used the same BIOS settings for years and this crashing has started to happen a month and a half ago?" If the BIOS is correctly set and the crashes started out of the blue without anything relevant logged in the syslog, it points to a hardware issue.
BlueSialia (Author), posted August 17, 2022: 7 minutes ago, JorgeB said: "If the BIOS is correctly set and the crashes started out of the blue without anything relevant logged in the syslog it points to a hardware issue." I was suspecting that, especially because of my config change, which makes me think the flash drive may be dying. How would I go about determining that? It's not possible to run any of the tests that can be run on a cache or array drive on the flash drive, is it?
JorgeB, posted August 17, 2022: It's unlikely that a dying flash drive would make Unraid crash without anything being logged, but you can replace it if you want: https://wiki.unraid.net/Manual/Changing_The_Flash_Device
BlueSialia (Author), posted March 13, 2023: I still face this issue. I replaced the USB flash drive, reformatted the drives and used new settings; basically, I set up Unraid from zero again. I cannot get an uptime past 15 days before it crashes. I'm on vacation now and can't access my Plex or any other services hosted on it because it's down. I connect to my home network over a VPN and can't even ping the server. I have no idea how to debug this. There is nothing in the syslog server.
BlueSialia (Author), posted March 13, 2023: 4 hours ago, trurl said: "Have you done memtest?" Yes. It passed without any errors or warnings.
trurl, posted March 14, 2023: On 3/13/2023 at 8:09 AM, BlueSialia said: "There is nothing in the syslog server." On 8/1/2022 at 4:23 AM, JorgeB said: "Enable the syslog server and post that after a crash." Doesn't look like you ever posted it.
BlueSialia (Author), posted March 14, 2023: 10 hours ago, trurl said: "Doesn't look like you ever posted it." Yes, I did. Just after the post you quoted.
JonathanM, posted March 15, 2023: I only see one file posted in this thread, the diagnostics zip file in the first post. The syslog server, if configured correctly, will save a copy of the syslog up until the crash, and I don't see it posted.
BlueSialia (Author), posted March 15, 2023: I didn't upload the entire file. I just posted the log entries from one of the days the server crashed.
BlueSialia (Author), posted July 6, 2023: It's been one year since this started happening to me and the issue is still the same. At some point the server just dies. The hardware is still on, but the system isn't even on the network anymore, and I have no idea how to debug this problem. I think the only thing I can do is turn off every Docker container and VM that is not critical, wait, and see if it exhibits the same behavior. If by any chance it manages to reach an uptime of a month, I can then slowly re-enable what I turned off, say one every 2 weeks. Even if this works, it'll mean months of having most of my things disabled. I'm just hopeless.
Solution: BlueSialia (Author), posted October 16, 2023: I finally found a solution, but a very undesirable one. The issue was this: my Ryzen CPU doesn't support Unraid's (Linux's) C-states, so I disabled them in the BIOS. Now, as expected, my server has very high power consumption, even at idle.
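To see which C-states the kernel actually uses after changing the BIOS setting, one option is the Linux cpuidle sysfs interface, where each idle state of a CPU appears as a `stateN` directory with `name` and `disable` files. This is a minimal sketch, assuming a Linux host (such as Unraid) that exposes `/sys/devices/system/cpu/cpu0/cpuidle/`; the function name `list_idle_states` is hypothetical, and the script only reads the current state list, it does not change any setting.

```python
from pathlib import Path
from typing import List, Tuple


def list_idle_states(cpu_dir: str = "/sys/devices/system/cpu/cpu0") -> List[Tuple[str, bool]]:
    """Return (state name, disabled?) for each cpuidle state of one CPU."""
    idle_dir = Path(cpu_dir) / "cpuidle"
    states: List[Tuple[str, bool]] = []
    if not idle_dir.is_dir():
        return states  # no cpuidle directory exposed for this CPU
    # Sort numerically so that state10 comes after state9, not after state1.
    for state in sorted(idle_dir.glob("state[0-9]*"), key=lambda p: int(p.name[5:])):
        name = (state / "name").read_text().strip()
        disable_file = state / "disable"
        # A value of "1" in the per-state 'disable' file means the kernel won't enter it.
        disabled = disable_file.exists() and disable_file.read_text().strip() == "1"
        states.append((name, disabled))
    return states


if __name__ == "__main__":
    for name, disabled in list_idle_states():
        print(f"{name}: {'disabled' if disabled else 'enabled'}")
```

Comparing this output before and after toggling the BIOS option shows whether the change actually reached the kernel, which is cheaper than waiting two weeks for the next crash to find out.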
JorgeB, posted October 17, 2023: See here: if there's a "Power Supply Idle Control" setting in the BIOS, you should be able to leave Global C-states enabled.