
My server has crashed 3 times in the last month and I don't know how to find the cause


Go to solution Solved by BlueSialia,


So basically the title. On three mornings now I've woken up to find my server down. The computer is powered on, but the Web GUI is unreachable and my Plex is down too. I can't even ping the machine.

 

I've restarted the machine and downloaded the diagnostics, but the logs only contain entries since the last boot, so they cover just a few minutes. As far as I can see, I won't find any record of the crash there. What can I do?
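
Since the logs do not survive a reboot, one option is to record a heartbeat on storage that does. The following is only a minimal sketch: the function names are hypothetical, and the caller has to pick a file path on a persistent device. After a crash, the last recorded timestamp gives a lower bound for when the system stopped responding.

```python
import os
from datetime import datetime, timezone
from typing import Optional


def write_heartbeat(path: str) -> str:
    """Append the current UTC time to `path` and force it onto the device."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(stamp + "\n")
        fh.flush()
        os.fsync(fh.fileno())  # ask the OS to flush its cache to the device
    return stamp


def last_heartbeat(path: str) -> Optional[str]:
    """Return the most recent timestamp, or None if nothing was recorded."""
    try:
        with open(path, encoding="utf-8") as fh:
            lines = [line.strip() for line in fh if line.strip()]
    except FileNotFoundError:
        return None
    return lines[-1] if lines else None
```

Calling `write_heartbeat` from a scheduled job once a minute would narrow the crash time to roughly that interval. Frequent writes do add wear to a USB flash drive, so the interval is a trade-off.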

unblue-diagnostics-20220731-2137.zip

Link to comment

Things got weird. I don't know why but my VM settings changed.

 

The VM manager tells me my default VM storage path does not exist. It's set to `/mnt/user/domains/` and, in fact, that share does not exist. I used `/mnt/user/vdisks/` (Yup, I got into this because of Linus Tech Tips).

 

Setting it back to vdisks fixes it but the fact that my configuration just changed scares me. I feel like the system is slowly dying.

Link to comment
  • 3 weeks later...

So it happened again. It happened on the 12th and this is every log entry from that day:

Aug 12 01:01:34 UnBlue emhttpd: read SMART /dev/sdd
Aug 12 01:02:23 UnBlue root: /etc/libvirt: 27.6 MiB (28966912 bytes) trimmed on /dev/loop3
Aug 12 01:02:23 UnBlue root: /var/lib/docker: 2.9 GiB (3095732224 bytes) trimmed on /dev/loop2
Aug 12 01:02:23 UnBlue root: /mnt/cache: 242.4 GiB (260238061568 bytes) trimmed on /dev/nvme0n1p1
Aug 12 02:00:01 UnBlue crond[1341]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
Aug 12 02:03:55 UnBlue emhttpd: spinning down /dev/sdd
Aug 12 02:40:55 UnBlue emhttpd: read SMART /dev/sdd
Aug 12 03:42:45 UnBlue emhttpd: spinning down /dev/sdd
Aug 12 17:52:17 UnBlue emhttpd: read SMART /dev/sdd
Aug 12 18:52:40 UnBlue emhttpd: spinning down /dev/sdd

As far as I can tell, there is nothing weird there. The next log entry is from today (I was away so I couldn't restart the server earlier).
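
With a longer log, scanning for silent periods by eye gets tedious. The sketch below parses the timestamp at the start of each line and reports every gap above a threshold. It assumes the classic `Mon DD HH:MM:SS` prefix seen in the entries above, and because that format carries no year, the year has to be supplied (2022 here). The function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# A few of the entries quoted above, used as sample input.
SAMPLE = [
    "Aug 12 02:03:55 UnBlue emhttpd: spinning down /dev/sdd",
    "Aug 12 02:40:55 UnBlue emhttpd: read SMART /dev/sdd",
    "Aug 12 03:42:45 UnBlue emhttpd: spinning down /dev/sdd",
    "Aug 12 17:52:17 UnBlue emhttpd: read SMART /dev/sdd",
    "Aug 12 18:52:40 UnBlue emhttpd: spinning down /dev/sdd",
]


def parse_timestamp(line: str, year: int) -> datetime:
    """Parse the 15-character syslog prefix; the year is an assumption."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def find_gaps(
    lines: List[str], year: int, threshold: timedelta
) -> List[Tuple[datetime, datetime, timedelta]]:
    """Return (before, after, length) for each silent period >= threshold."""
    stamps = [parse_timestamp(line, year) for line in lines if line.strip()]
    gaps = []
    for before, after in zip(stamps, stamps[1:]):
        if after - before >= threshold:
            gaps.append((before, after, after - before))
    return gaps


for before, after, length in find_gaps(SAMPLE, 2022, timedelta(hours=6)):
    print(f"silent for {length} between {before} and {after}")
```

On this sample the only gap over six hours is the idle stretch between 03:42:45 and 17:52:17. The crash itself happened some time after the final entry at 18:52:40, so a gap like this only bounds the window; it does not explain the cause.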

 

There is one thing that did catch my eye. On restart I have the notification: unassigned.devices.plg: An update is available. Click here to install version 2022.08.12. I can't say for sure, but I believe that every time my server has crashed I've found a pending update for that same plugin.

Edited by BlueSialia
I used markdown syntax thinking the forums accepted it
Link to comment
4 minutes ago, JorgeB said:

Check here to make sure you're using the correct "power supply idle control" setting.

I'll check the BIOS settings. Although, would it even make sense for this to be the cause when I've used the same BIOS settings for years and the crashes only started a month and a half ago?

 

3 minutes ago, trurl said:

Everything about your configuration (any settings from webUI) is in config folder on flash. Did you do anything that would have reset config?

Nope, nothing. At least not knowingly. I actually don't remember changing anything in Unraid for a long while. I have my Plex, my VMs that I start with Wake-on-LAN... I have the Dashboard open all the time, but just because I like looking at it haha.

Link to comment
10 minutes ago, BlueSialia said:

I'll check the BIOS settings. Although, would it even make sense for this to be the cause when I've used the same BIOS settings for years and the crashes only started a month and a half ago?

If the BIOS is correctly set and the crashes started out of the blue without anything relevant logged in the syslog it points to a hardware issue.

Link to comment
7 minutes ago, JorgeB said:

If the BIOS is correctly set and the crashes started out of the blue without anything relevant logged in the syslog it points to a hardware issue.

I was suspecting that. Especially because of my config change, which makes me think the flash drive may be dying. How would I go about determining that? It's not possible to run any of the tests that can be executed on a cache or array drive on the flash drive, is it?
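
This would not test the flash drive itself, but unexpected changes to the configuration can at least be detected by hashing the config folder and comparing snapshots over time. A minimal sketch follows, with hypothetical function names; the directory to scan is supplied by the caller, and the location of the config folder on a given system is an assumption to verify.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def snapshot(root: str) -> Dict[str, str]:
    """Map each file under `root` (relative path) to its SHA-256 digest."""
    base = Path(root)
    digests = {}
    for path in sorted(base.rglob("*")):
        if path.is_file():
            rel = str(path.relative_to(base))
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, List[str]]:
    """List files that were added, removed, or changed between snapshots."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(
            name for name in old.keys() & new.keys() if old[name] != new[name]
        ),
    }
```

Storing a snapshot somewhere other than the flash drive and diffing it after each crash would show whether settings such as the VM storage path changed without any action from the user.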

Link to comment
  • 6 months later...

I still face this issue. I replaced the USB drive, reformatted the drives, and applied new settings. I basically set up Unraid from zero again.

 

I cannot get an uptime past 15 days. It ends up crashing. I am now on vacation and I cannot access my Plex or any other services hosted on it because it is down. I connect to my home network with a VPN and can't even ping the server.

 

I have no idea how to debug this. There is nothing in the syslog server.

Link to comment
  • 3 months later...

One year since this started happening to me and the issue is still the same. At some point, the server just dies somehow. The hardware is still on but the system is not even part of the network.

 

And I have no idea how to debug this problem. I think the only thing I can do is turn off every docker and VM that is not critical, then wait and see if it exhibits the same behavior. If by any chance it manages to reach an uptime of a month, then I can slowly re-enable what I turned off. Like one every 2 weeks or something. Even if this works, it'll mean months of having most of my things disabled. I'm just hopeless.
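
To put a number on "months", the plan above can be sketched as simple arithmetic. The service count of 10 is purely illustrative, since the actual number of dockers and VMs is not stated. The second function is a swapped-in alternative, not the plan described above: re-enable half of the remaining suspects per round instead of one at a time. That only works under the assumptions that a single service is the culprit and that a crash would reliably show up within one observation window.

```python
def linear_weeks(n_services: int, baseline_weeks: int, interval_weeks: int) -> int:
    """Baseline period, then one service re-enabled per interval."""
    return baseline_weeks + n_services * interval_weeks


def bisect_weeks(n_services: int, baseline_weeks: int, interval_weeks: int) -> int:
    """Baseline period, then half of the remaining suspects per interval.

    Assumes exactly one culprit and that it crashes within one interval.
    """
    rounds = (n_services - 1).bit_length()  # ceil(log2(n)) for n >= 1
    return baseline_weeks + rounds * interval_weeks


# Hypothetical example: 10 services, 4-week baseline, 2-week observation windows.
print(linear_weeks(10, 4, 2))  # 24
print(bisect_weeks(10, 4, 2))  # 12
```

Under these assumptions, halving cuts the wait roughly in half for 10 services, at the cost of a less certain answer if the crashes are rarer than one per window.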

Link to comment