
My Server Keeps Crashing at Night



I've been having this issue for a while now. I've downgraded, upgraded, disabled dockers, disabled VMs, disabled C-states, swapped out RAM, and deleted and re-created docker.img.

I'm using a PC from Minisforum, the MS-01. I'm not sure if it's just giving out or what. I've set up a logging server and I have zero logs, nothing indicating what's going on. I did finally get something on my screen. I've also attached my diagnostic logs. I could really use some help.

I'll just wake up and it's unresponsive. I have to hold the power button to get it to turn off.

 

tower-diagnostics-20240324-1153.zip

unraid_error.jpg

Edited by FreakyBigFoot
Link to comment

Here is all I have:

Mar 30 00:00:05 Tower emhttpd: read SMART /dev/sde
Mar 30 00:00:21 Tower apcupsd[3370]: Communications with UPS lost.
Mar 30 00:10:33 Tower apcupsd[3370]: Communications with UPS lost.
Mar 30 00:15:07 Tower emhttpd: spinning down /dev/sde
Mar 30 00:20:45 Tower apcupsd[3370]: Communications with UPS lost.
Mar 30 00:30:57 Tower apcupsd[3370]: Communications with UPS lost.
Mar 30 00:41:09 Tower apcupsd[3370]: Communications with UPS lost.
Mar 30 00:51:21 Tower apcupsd[3370]: Communications with UPS lost.
Mar 30 01:01:33 Tower apcupsd[3370]: Communications with UPS lost.
Mar 30 01:11:45 Tower apcupsd[3370]: Communications with UPS lost.
Mar 30 01:21:57 Tower apcupsd[3370]: Communications with UPS lost.
Mar 30 01:32:09 Tower apcupsd[3370]: Communications with UPS lost.
Mar 30 01:42:21 Tower apcupsd[3370]: Communications with UPS lost.
Mar 30 01:52:32 Tower apcupsd[3370]: Communications with UPS lost.
Mar 30 02:00:03 Tower emhttpd: read SMART /dev/sdd
Mar 30 02:00:11 Tower emhttpd: read SMART /dev/sdc
Mar 30 02:01:51 Tower root: /etc/libvirt: 71.2 MiB (74670080 bytes) trimmed on /dev/loop3
Mar 30 02:01:51 Tower root: /var/lib/docker: 1.7 GiB (1846673408 bytes) trimmed on /dev/loop2
Mar 30 02:01:51 Tower root: /mnt/cache: 28 GiB (30114258944 bytes) trimmed on /dev/nvme0n1p1
Mar 30 02:02:31 Tower emhttpd: read SMART /dev/sdh
Mar 30 02:02:44 Tower apcupsd[3370]: Communications with UPS lost.
Mar 30 02:02:51 Tower emhttpd: read SMART /dev/sde
Mar 30 02:12:55 Tower apcupsd[3370]: Communications with UPS lost.
Mar 30 02:22:51 Tower emhttpd: spinning down /dev/sdc
Mar 30 02:23:06 Tower apcupsd[3370]: Communications with UPS lost.
Mar 30 02:33:18 Tower apcupsd[3370]: Communications with UPS lost.
Mar 30 02:43:30 Tower apcupsd[3370]: Communications with UPS lost.
Mar 30 02:53:42 Tower apcupsd[3370]: Communications with UPS lost.
Mar 30 03:03:54 Tower apcupsd[3370]: Communications with UPS lost.
Mar 30 03:14:06 Tower apcupsd[3370]: Communications with UPS lost.
Mar 30 03:24:19 Tower apcupsd[3370]: Communications with UPS lost.
Mar 30 03:34:31 Tower apcupsd[3370]: Communications with UPS lost.
Mar 30 03:44:43 Tower apcupsd[3370]: Communications with UPS lost.
Mar 30 03:54:55 Tower apcupsd[3370]: Communications with UPS lost.
Mar 30 04:05:07 Tower apcupsd[3370]: Communications with UPS lost.
Mar 30 04:10:09 Tower emhttpd: spinning down /dev/sde
Mar 30 04:10:10 Tower emhttpd: spinning down /dev/sdg
Mar 30 04:15:20 Tower apcupsd[3370]: Communications with UPS lost.
Mar 30 04:25:32 Tower apcupsd[3370]: Communications with UPS lost.
Mar 30 04:32:25 Tower emhttpd: spinning down /dev/sdd
Mar 30 04:35:44 Tower apcupsd[3370]: Communications with UPS lost.
Mar 30 04:40:01 Tower root: Fix Common Problems Version 2024.03.09
Mar 30 04:40:02 Tower root: Fix Common Problems: Warning: Plugin fix.common.problems.plg is not up to date
Mar 30 04:40:02 Tower root: Fix Common Problems: Warning: Plugin user.scripts.plg is not up to date
Mar 30 04:40:02 Tower root: Fix Common Problems: Warning: Docker Application transmission has an update available for it
Mar 30 04:40:09 Tower root: Fix Common Problems: Warning: Syslog mirrored to flash
Mar 30 04:40:09 Tower root: Fix Common Problems: Warning: Wrong DNS entry for host ** Ignored
Mar 30 04:40:20 Tower emhttpd: read SMART /dev/sdc
Mar 30 04:40:21 Tower emhttpd: read SMART /dev/sdg
Mar 30 04:40:21 Tower emhttpd: read SMART /dev/sde
Mar 30 04:40:21 Tower emhttpd: read SMART /dev/sdf
Mar 30 04:40:31 Tower emhttpd: read SMART /dev/sdd
Mar 30 04:45:56 Tower apcupsd[3370]: Communications with UPS lost.
Mar 30 04:56:09 Tower apcupsd[3370]: Communications with UPS lost.
Mar 30 04:56:44 Tower emhttpd: spinning down /dev/sde
Mar 30 04:56:45 Tower emhttpd: spinning down /dev/sdg
Mar 30 04:56:45 Tower emhttpd: spinning down /dev/sdf
Mar 30 04:56:55 Tower emhttpd: spinning down /dev/sdd
Mar 30 04:56:55 Tower emhttpd: spinning down /dev/sdc
Mar 30 05:06:21 Tower apcupsd[3370]: Communications with UPS lost.
Mar 30 05:16:13 Tower emhttpd: spinning down /dev/sdi
Mar 30 05:16:34 Tower apcupsd[3370]: Communications with UPS lost.
Mar 30 05:24:50 Tower emhttpd: spinning down /dev/sdh
Mar 30 05:26:47 Tower apcupsd[3370]: Communications with UPS lost.
Mar 30 05:37:01 Tower apcupsd[3370]: Communications with UPS lost.
Mar 30 05:47:13 Tower apcupsd[3370]: Communications with UPS lost.
Mar 30 05:57:26 Tower apcupsd[3370]: Communications with UPS lost.
Mar 30 06:07:39 Tower apcupsd[3370]: Communications with UPS lost.
Mar 30 06:17:52 Tower apcupsd[3370]: Communications with UPS lost.
Mar 30 06:28:05 Tower apcupsd[3370]: Communications with UPS lost.
Mar 30 06:38:17 Tower apcupsd[3370]: Communications with UPS lost.
Mar 30 06:48:30 Tower apcupsd[3370]: Communications with UPS lost.
Mar 30 06:58:44 Tower apcupsd[3370]: Communications with UPS lost.
Mar 30 07:08:58 Tower apcupsd[3370]: Communications with UPS lost.
Mar 30 07:19:10 Tower apcupsd[3370]: Communications with UPS lost.
Mar 30 07:29:24 Tower apcupsd[3370]: Communications with UPS lost.
Mar 30 07:39:37 Tower apcupsd[3370]: Communications with UPS lost.
Mar 30 07:49:50 Tower apcupsd[3370]: Communications with UPS lost.
Mar 30 08:00:04 Tower apcupsd[3370]: Communications with UPS lost.
Mar 30 08:10:18 Tower apcupsd[3370]: Communications with UPS lost.
Mar 30 08:20:30 Tower apcupsd[3370]: Communications with UPS lost.
Mar 30 08:30:43 Tower apcupsd[3370]: Communications with UPS lost.

I unplugged the UPS on purpose because I have been trying to see if the server runs better with certain things unplugged. It was crashing both before and after I unplugged it, so I know the UPS is not the issue. Nothing here seems to indicate a problem. It just straight crashes :(
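
A log this noisy is easier to read with the repeated UPS messages filtered out. What follows is only a minimal Python sketch, not something taken from the diagnostics: `parse`, `interesting`, and `last_entry` are hypothetical helper names, and the year is an assumption because classic syslog timestamps do not include one.

```python
import re
from datetime import datetime

# Matches classic syslog lines such as:
# "Mar 30 00:00:21 Tower apcupsd[3370]: Communications with UPS lost."
LINE_RE = re.compile(
    r"^(\w{3} +\d+ \d\d:\d\d:\d\d) (\S+) ([^:\[]+)(?:\[\d+\])?: (.*)$"
)


def parse(line, year=2024):
    """Split one syslog line into (timestamp, program, message), or None."""
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    stamp, _host, prog, msg = m.groups()
    # The year is assumed; the log itself does not record it.
    when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    return when, prog, msg


def interesting(lines, ignore=("apcupsd",)):
    """Keep only parsed entries whose program name is not in `ignore`."""
    kept = []
    for line in lines:
        entry = parse(line)
        if entry and entry[1] not in ignore:
            kept.append(entry)
    return kept


def last_entry(lines):
    """Return the newest parsed entry, i.e. the last sign of life in the log."""
    entries = [e for e in map(parse, lines) if e]
    return max(entries, key=lambda e: e[0]) if entries else None
```

Feeding it the saved syslog (for example `interesting(open(path))`, where `path` is wherever the mirrored syslog lives) leaves only the emhttpd and Fix Common Problems lines, and `last_entry` gives the timestamp of the final line written, which narrows down when the machine stopped logging.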

Link to comment
  • 2 months later...

Hey @FreakyBigFoot hope you are doing good! I also run unraid on ms01 (with 2x48GB Crucial DDR5 5600MHz) and had 3 similar crashes within the last 2 days.

I wonder if you were able to fix the situation or what you did since your last post. Anything you can share? 

Edited by reneil1337
Link to comment

I'm also using an MS-01 with no major trouble. I have 2x32GB RAM. I used to have problems until I added external fans.

The MS-01 runs very hot if you use it without an additional fan while turbo boost is on or a PCIe add-in card is installed.

I suggest monitoring temperatures (CPU and SSD) and putting a 120mm or 140mm fan blowing underneath and an 80mm fan above.

You can find more details on ServeTheHome. There are also other threads about compatibility.
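
For the temperature monitoring part, here is a minimal Python sketch that reads the standard Linux hwmon sysfs files, which report millidegrees Celsius in `temp*_input`. The function names and the 85 C threshold are my own assumptions for illustration, and which sensors show up (CPU package, NVMe, and so on) depends on the drivers loaded on your system.

```python
from pathlib import Path


def read_temps(root="/sys/class/hwmon"):
    """Return {"chip/label": degrees_celsius} for every temp*_input under root."""
    temps = {}
    for chip in sorted(Path(root).glob("hwmon*")):
        name_file = chip / "name"
        chip_name = name_file.read_text().strip() if name_file.exists() else chip.name
        for sensor in sorted(chip.glob("temp*_input")):
            # A matching temp*_label file is optional; fall back to the file name.
            label_file = sensor.with_name(sensor.name.replace("_input", "_label"))
            label = label_file.read_text().strip() if label_file.exists() else sensor.name
            try:
                millideg = int(sensor.read_text().strip())
            except (OSError, ValueError):
                continue  # skip sensors that cannot be read right now
            temps[f"{chip_name}/{label}"] = millideg / 1000.0
    return temps


def too_hot(temps, limit_c=85.0):
    """Return only the readings at or above limit_c (an arbitrary example limit)."""
    return {name: value for name, value in temps.items() if value >= limit_c}
```

Calling `too_hot(read_temps())` from a scheduled script and logging the result would at least show whether temperatures were climbing before a crash.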

Link to comment
On 6/6/2024 at 5:04 PM, reneil1337 said:

Hey @FreakyBigFoot hope you are doing good! I also run unraid on ms01 (with 2x48GB Crucial DDR5 5600MHz) and had 3 similar crashes within the last 2 days.

I wonder if you were able to fix the situation or what you did since your last post. Anything you can share? 

Managed to fix the crashes on my end without having to disable efficiency cores or touching c-states. My homelab is located in a different room so my goal is not to silence the machine but to maximize performance while ensuring stability. I'm running Bios 1.22 btw and did a few things to improve overall thermals which resulted in way more stability:

1) Repasted CPU with Thermal Grizzly Kryonaut after reading  

https://forums.servethehome.com/index.php?threads/minisforum-ms-01-pcie-card-and-ram-compatibility-thread.42785/post-415479
 


2) Maxed out the PL2 TDP limit in the BIOS at 115000, on the assumption that the system might not be getting enough power and was crashing as a result
https://forums.servethehome.com/index.php?threads/minisforum-ms-01-pcie-card-and-ram-compatibility-thread.42785/post-415333


3) Adjusted the fan curve so the fans spin up earlier, and increased overall fan speed to lower temperatures
https://forums.servethehome.com/index.php?threads/minisforum-ms-01-pcie-card-and-ram-compatibility-thread.42785/post-415381 

 

My Unraid server is now online for over 17 hours which never happened before. The MS01 usually crashed after 3-4 hours and never made it through an entire night.
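
If you want to check from the OS side which power limits are actually in effect after a BIOS change like the PL2 one, the Linux powercap (intel-rapl) sysfs interface exposes them in microwatts. This is only a sketch under assumptions: `read_rapl_limits` is a hypothetical helper, the zone path assumes the `intel_rapl` driver is loaded and exposes package zone 0, and I am assuming the BIOS value of 115000 is in milliwatts (115 W), which I have not confirmed.

```python
from pathlib import Path


def read_rapl_limits(zone="/sys/class/powercap/intel-rapl:0"):
    """Return {constraint_name: watts} for one powercap zone, or {} if absent."""
    limits = {}
    for name_file in sorted(Path(zone).glob("constraint_*_name")):
        # constraint_N_name pairs with constraint_N_power_limit_uw (microwatts).
        limit_file = name_file.with_name(
            name_file.name.replace("_name", "_power_limit_uw")
        )
        try:
            name = name_file.read_text().strip()
            microwatts = int(limit_file.read_text().strip())
        except (OSError, ValueError):
            continue  # missing file, no permission, or unexpected content
        limits[name] = microwatts / 1_000_000
    return limits
```

On Intel systems the `long_term` constraint normally corresponds to PL1 and `short_term` to PL2, so `read_rapl_limits().get("short_term")` is the number to compare against what was set in the BIOS.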

Link to comment
On 6/11/2024 at 8:19 AM, reneil1337 said:

Managed to fix the crashes on my end without having to disable efficiency cores or touching c-states. My homelab is located in a different room so my goal is not to silence the machine but to maximize performance while ensuring stability. I'm running Bios 1.22 btw and did a few things to improve overall thermals which resulted in way more stability:

 

My Unraid server is now online for over 17 hours which never happened before. The MS01 usually crashed after 3-4 hours and never made it through an entire night.

 

This should be of interest to you if you haven't seen it already: the MS-01 has severe issues with external drive enclosures, to the point where I would not use it for drive storage duty at all.

 

https://forums.servethehome.com/index.php?threads/minisforum-ms-01-qnap-jbod-issues.44250/

 

 

 

Link to comment
