Server Crashing Randomly



I'm having some problems with my server randomly locking up.  I've not been able to identify any specific actions or circumstances as a cause, and it appears to be random.  I can go 30 days with no crash, or only a few days.  Diagnostics attached.  I'd appreciate any insight folks might have.

 

The only thing that's been happening (other than regularly applying updates) is that I've been replacing older drives.  I have a couple with SMART errors.  I just replaced one drive this week and have 2 more to go...

 

Is there a knowledgebase article somewhere which describes how to interpret the diagnostics?  I'd love to learn how...

 

Thanks in advance!

hunternas-diagnostics-20230830-1507.zip

Link to comment

Unfortunately there's nothing relevant logged, which usually points to a hardware issue. One thing you can try is to boot the server in safe mode with all Docker containers/VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning the other services back on one by one.

Link to comment
12 hours ago, jeffreywhunter said:

I would like to learn how to decipher these logs.  Where's a good place to start?

Read them, compare them to logs from other Unraid systems, google anything that isn't clear, and mainly just spend time looking at them. After a while your brain will notice patterns, and anything out of the ordinary will stick out. It could be relevant or it may not be; google it to see if it appears elsewhere and what it indicates.

 

I don't know of a shortcut to learning how to read diagnostics, you just have to immerse yourself, and after a while you start to feel like you can see the girl in the red dress in the lines of code.
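If it helps as a starting point for that kind of pattern spotting, here is a minimal Python sketch that flags syslog lines containing a few phrases that often deserve a closer look. The phrase list and the `flag_lines` helper are my own assumptions for illustration, not an official Unraid list or tool, and a match only means "worth googling", not "this is the cause".

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical starting list of phrases; extend it as you learn what is
# normal for your own system. Not an official or complete list.
SUSPICIOUS = re.compile(
    r"(call trace|kernel panic|out of memory|machine check|i/o error|hung task)",
    re.IGNORECASE,
)


def flag_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, text) for every line matching a suspicious phrase."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if SUSPICIOUS.search(line)
    ]


# Example use on a syslog extracted from a diagnostics zip (path is up to you):
#   with open("syslog.txt", errors="replace") as handle:
#       for number, text in flag_lines(handle):
#           print(f"{number}: {text}")
```

Keeping the phrases in one regular expression makes it easy to add whatever you come across while reading, which is really the point of the exercise.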

Link to comment
6 hours ago, JorgeB said:

Unfortunately there's nothing relevant logged, which usually points to a hardware issue. One thing you can try is to boot the server in safe mode with all Docker containers/VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning the other services back on one by one.

Assuming it's a hardware problem, we'd want to try to isolate where it's coming from.  Safe mode is interesting.  If I run in safe mode, will I be able to run apps?  I've seen a post saying that lots of things don't work, but I've not been able to find a definitive answer as to what does and does not work in safe mode.  https://forums.unraid.net/topic/142975-safe-mode-no-local-gui-no-remote-gui-no-remote-ssh/

 

That said, are there any hardware diagnostics you'd recommend?  E.g. a memory check (although I think that would show up in the diags?) or an HBA controller check?  I do have a couple of drives that have displayed errors (I'm currently replacing them over time as they fully fail).  Could drives in a partially failed state cause the server to crash?  (See attached SMART reports.)

UNRAID Disk 3 - chrome_2023-09-06_11-01-00.png

Unraid Disk 5 SMART - chrome_2023-09-06_11-04-21.png

hunternas-smart-20230906-1100.zip
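For anyone who wants to skim SMART reports like these, here is a rough Python sketch that pulls out non-zero raw values for a handful of attributes people commonly watch. It assumes each report is plain text containing the usual `smartctl` attribute table (ID#, ATTRIBUTE_NAME, ..., RAW_VALUE); the `WATCHED` set and the `nonzero_watched` helper are illustrative choices of mine, not an Unraid feature. It says nothing about whether a drive is behind the crashes, only which counters are worth a second look.

```python
from typing import Dict

# Attributes often treated as early-warning signs; an illustrative
# selection, not an authoritative list.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Reported_Uncorrect",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
}


def nonzero_watched(report: str) -> Dict[str, int]:
    """Return watched attributes whose raw value is a plain integer above zero."""
    found: Dict[str, int] = {}
    for line in report.splitlines():
        fields = line.split()
        # Expected row layout:
        # ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # header, blank line, or unrelated text
        name, raw = fields[1], fields[9]
        if name in WATCHED and raw.isdigit() and int(raw) > 0:
            found[name] = int(raw)
    return found
```

Calling `nonzero_watched(open("disk3-smart.txt").read())` (file name hypothetical) returns a dict such as attribute name to raw count, and an empty dict when none of the watched counters are above zero.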

Link to comment
