unrno.spam

Help - my Server is dying!


Hi there,

 

I have been running Unraid 6.8.3 since March on my self-built server with all-new hardware. It ran smoothly and without any glitches until recently.

Over the last couple of days, though, I have been hitting more and more problems that end in a total freeze of the system. I had to hard-reboot by switching the machine off with the power button; even the local console no longer responded, and SSH was unavailable.

Today I was lucky enough to capture a syslog and a diagnostics archive 5 minutes before the crash, and I installed a syslog server on my notebook to catch the log remotely.
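
For anyone who wants to try the same remote-logging trick without installing a full syslog server, here is a minimal sketch of a UDP syslog receiver in Python. It is not what the poster used. The port `5514` and the output file name `remote-syslog.txt` are assumptions chosen for illustration (the standard syslog UDP port is 514, which usually needs elevated privileges), and the server has to be pointed at whatever address and port you pick.

```python
import socketserver
from datetime import datetime

LOG_PATH = "remote-syslog.txt"    # hypothetical output file
LISTEN_ADDR = ("0.0.0.0", 5514)   # assumed unprivileged port, not the standard 514


def split_priority(message):
    """Split a leading '<PRI>' field into (facility, severity, rest)."""
    if message.startswith("<") and ">" in message[:6]:
        end = message.index(">")
        pri = message[1:end]
        if pri.isdigit():
            return int(pri) // 8, int(pri) % 8, message[end + 1:]
    # No recognizable priority field: pass the text through unchanged.
    return None, None, message


class SyslogHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # For a UDPServer, self.request is a (data, socket) pair.
        data = self.request[0]
        text = data.decode("utf-8", errors="replace").rstrip()
        _facility, severity, rest = split_priority(text)
        stamp = datetime.now().isoformat(timespec="seconds")
        # Open and close the file per message so each line is handed
        # to the OS before the next datagram arrives.
        with open(LOG_PATH, "a", encoding="utf-8") as fh:
            fh.write(f"{stamp} {self.client_address[0]} sev={severity} {rest}\n")


def serve():
    """Run the receiver until interrupted (Ctrl+C)."""
    with socketserver.UDPServer(LISTEN_ADDR, SyslogHandler) as server:
        server.serve_forever()
```

Calling `serve()` starts the listener. Because every message is appended on arrival, the last lines sent before a freeze are still on the notebook after the server has gone down.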

 

The last entry before the crash was on Sunday, 15 November 2020, at 16:56:21.

I also attached a screenshot of the console from an earlier crash.

Before this crash I saw all 4 cores of the CPU climb to 100% step by step.

I tried to run memtest86+ from the console at boot, but that always just forced a reboot with no further action.

Can anyone please check whether it's a hardware problem or whether some Docker container, plugin, or other software is causing all this trouble?

Thanks in advance,

 

unrno

20201115_114938.jpg

unr-server-diagnostics-20201115-1652.zip unr-server-syslog-20201115-1553.zip 2020-11-15.txt

Edited by unrno.spam
some more information


Tried to run memtest from the Unraid boot menu, but it didn't work (the system just rebooted).

Will try to get a live Linux CD to run a memtest...

 

Will check the BIOS settings for "Power Supply Idle Control"...

3 minutes ago, unrno.spam said:

Tried to run memtest from the Unraid boot menu, but it didn't work (the system just rebooted).

I think you have to boot in legacy mode instead of UEFI for the memtest entry on the Unraid boot menu to work.

 

Or

4 minutes ago, unrno.spam said:

Will try to get a live Linux CD to run a memtest...

or you can make a bootable memtest flash drive.


Just had another crash on Sunday, 15 November 2020, at 18:54:02.

 

Flashing a USB stick for memtest right now

 

What does the page fault in the log mean? I guess this is related to memory rather than the CPU, right?
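
A page fault reported in a kernel crash means the kernel touched an address it could not map. Faulty RAM is one possible cause, but software bugs or an unstable CPU can produce the same message, so the log line alone does not settle it. To pull the relevant lines out of a long captured syslog, something like the following sketch may help. The marker list is hand-picked and not exhaustive (an assumption, not an official list), and it assumes the captured log is a plain text file.

```python
import re

# Hand-picked, non-exhaustive crash markers (an assumption, not an official list).
CRASH_MARKERS = re.compile(
    r"page fault|paging request|general protection fault|"
    r"machine check|kernel panic|call trace|\boops\b",
    re.IGNORECASE,
)


def find_crash_lines(lines):
    """Return (line_number, text) pairs for lines that match a crash marker."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if CRASH_MARKERS.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_file(path):
    """Scan a saved syslog text file, tolerating stray non-UTF-8 bytes."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return find_crash_lines(fh)
```

For example, `scan_file("2020-11-15.txt")` would list the matching lines of the attached log with their line numbers, which makes it easier to see what was logged just before each freeze.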

20201115_184029.jpg

2020-11-15.txt


I haven't run a memtest so far, because since the last crash the server has been running 24/7 for almost a full week with no glitches at all. So I'll wait until the next time I have to reboot.

What else did I do? At the last reboot I entered the BIOS to check some entries. I didn't change anything, although I did save the settings when I left.

That can't really count as a fix.

Then I installed the "Tips and Tweaks" plugin and used it to change these settings:

 

vm.dirty_background_ratio = 1

vm.dirty_ratio = 2
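
For context, these two sysctls are percentages of available memory: `vm.dirty_background_ratio` is the amount of dirty (not yet written) page cache at which the kernel starts background writeback, and `vm.dirty_ratio` is the amount at which writing processes are throttled. The sketch below reads the current values from `/proc/sys/vm` and converts them into rough byte thresholds. The memory figure passed in is up to the caller; the 16 GiB used in the note afterwards is a made-up example, not the poster's hardware.

```python
from pathlib import Path

SYSCTL_DIR = Path("/proc/sys/vm")  # standard location on Linux


def read_vm_setting(name, base=SYSCTL_DIR):
    """Return the integer value of a vm sysctl, or None if it cannot be read."""
    try:
        return int((Path(base) / name).read_text().strip())
    except (OSError, ValueError):
        return None


def dirty_thresholds(available_bytes, background_ratio, ratio):
    """Approximate both thresholds in bytes (ratios are percentages)."""
    background = available_bytes * background_ratio // 100
    hard_limit = available_bytes * ratio // 100
    return background, hard_limit


def report(available_bytes):
    """Print the thresholds implied by the current settings, if readable."""
    bg = read_vm_setting("dirty_background_ratio")
    hard = read_vm_setting("dirty_ratio")
    if bg is None or hard is None:
        print("vm.dirty_* settings not readable on this system")
        return
    bg_bytes, hard_bytes = dirty_thresholds(available_bytes, bg, hard)
    print(f"background writeback starts near {bg_bytes / 2**20:.0f} MiB")
    print(f"writers are throttled near {hard_bytes / 2**20:.0f} MiB")
```

With the values above and a hypothetical 16 GiB of available memory, `dirty_thresholds(16 * 2**30, 1, 2)` works out to roughly 164 MiB and 328 MiB, so these settings keep the amount of unwritten data held in RAM small.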

 

I don't know whether this solved my problem, but at least I have had no crashes since.

Guess a memtest is still a good idea...

 


I wouldn’t wait for another hard crash. A clean reboot is much nicer than risking file system errors.
Bad memory looks like the most likely cause. Do the memtest now; if there is bad RAM it often shows up pretty quickly, and you can get on with a warranty RMA.
You should also do a file system check on your array and cache.



1 hour ago, unrno.spam said:

Guess a memtest is still a good idea...

memtest never hurts. Running with bad RAM can hurt quite a bit, including data loss.
