dgomel

My Unraid Got unstable


Dear Gurus,

I need your help with a problem. Over the last couple of months my Unraid server has become unstable. Before that it had been working perfectly for years.

Recently I added a hard drive and updated Unraid to a newer version.

I attempted some troubleshooting but couldn't find the root cause.

The problem is that once in a while everything gets stuck (the whole server hangs). There is no networking. The fans are still running, and there are no beeps from the BIOS.

I think it could be a memory problem somewhere in the upper addresses, or some other kind of hardware problem.

Please help me find the root cause.

tower-diagnostics-20200116-0928.zip


Memtest86+ shows no problems after a couple of passes.

I see the number of zombie processes in top growing fast: around 100 in 5 minutes. Could this be the problem?

I traced the source of the zombies to a specific container. However, the problem started before I set up that container, so it seems irrelevant.
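
For anyone who wants to repeat the zombie check without watching top, here is a minimal sketch that counts zombie processes per parent PID by reading /proc. It assumes a Linux system with procfs; the function names are made up for illustration and are not part of Unraid. The parent PID with the most zombies points at the process (for example a container's init process) that is not reaping its children.

```python
from collections import Counter
from pathlib import Path


def parse_stat(stat_text):
    """Return (pid, comm, state, ppid) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' instead of splitting on whitespace.
    head, _, tail = stat_text.rpartition(")")
    pid_text, _, comm = head.partition("(")
    fields = tail.split()
    return int(pid_text), comm, fields[0], int(fields[1])


def zombies_by_parent(proc_root="/proc"):
    """Count zombie (state 'Z') processes per parent PID."""
    counts = Counter()
    root = Path(proc_root)
    if not root.is_dir():
        return counts  # no procfs here (not Linux), nothing to count
    for entry in root.iterdir():
        if not entry.name.isdigit():
            continue  # only numeric directories are processes
        try:
            _, _, state, ppid = parse_stat((entry / "stat").read_text())
        except (OSError, ValueError, IndexError):
            continue  # the process exited while we were scanning
        if state == "Z":
            counts[ppid] += 1
    return counts


if __name__ == "__main__":
    for ppid, n in zombies_by_parent().most_common():
        print(f"parent PID {ppid}: {n} zombie(s)")
```

Running it a few minutes apart shows whether the count under one parent keeps growing.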

Edited by dgomel


I found the problem. It started when I enabled VT-d while using a SYBA SI-PEX40108 controller with the Marvell 88SE9215 chip. After replacing it with an LSI 9211-8i, everything went back to working.

Unfortunately, the problem persists. It looks like it depends on VM usage.

The latest diagnostics and syslog are attached.

tower-diagnostics-20200121-0930.zip syslog (6).zip

Edited by dgomel
Problem persists.


Hi there,

 

Looks like you need to update your forum signature as your hardware has changed from AMD to Intel ;-).  That said, I'm not seeing any events in the logs themselves that show a problem.  And I'm not sure what this is supposed to mean:

On 1/16/2020 at 9:11 AM, dgomel said:

The problem is that once in a while everything stack.

Also hoping that you removed this container while troubleshooting this:

On 1/16/2020 at 9:51 AM, dgomel said:

I localized a source of zombies to specific container. The problem started before I implemented the container. So, seems irrelevant.

 

At this point, I would suggest hooking up a monitor and keyboard to the system and tailing the log (command is tail /var/log/syslog -f).  This will begin printing the log out to the screen.  Then try to get the system to crash again and capture whatever was printed to the screen (use your phone to take a picture if necessary).  This should give us at least some information to point towards a cause.

 

I'd also check for a BIOS update on your motherboard.


I mirrored the syslog to the flash drive and included it with the attachments above. I can't see anything relevant in the syslog.

The outages are easy to find: each one shows up as a gap in the log output followed by a new startup sequence.

A console is connected. The BIOS is the latest available for the motherboard.
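
The gap search described above can be sketched roughly like this. It assumes classic syslog lines that begin with a 'Mon DD HH:MM:SS' stamp and no year, so the year is passed in by hand. The 10-minute threshold and the function names are arbitrary choices for illustration; an idle server can be quiet for longer than that, so the threshold may need raising.

```python
import sys
from datetime import datetime, timedelta


def parse_timestamp(line, year):
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a classic syslog line."""
    # The stamp carries no year, so the caller supplies one (assumption).
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def find_gaps(lines, year, threshold=timedelta(minutes=10)):
    """Return (last_seen, next_seen) pairs where the log was silent longer than threshold."""
    gaps = []
    previous = None
    for line in lines:
        try:
            current = parse_timestamp(line, year)
        except ValueError:
            continue  # skip lines without a parseable stamp
        if previous is not None and current - previous > threshold:
            gaps.append((previous, current))
        previous = current
    return gaps


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python find_gaps.py <path-to-syslog> <year>
    with open(sys.argv[1], errors="replace") as fh:
        for last_seen, next_seen in find_gaps(fh, int(sys.argv[2])):
            print(f"silent from {last_seen} to {next_seen}")
```

The last line before each reported gap is the place to look for a cause, although a hard hang can happen before anything is written. A log that crosses New Year would need extra handling, because the stamps carry no year.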


The syslog to flash method is not valid for capturing log events related to major crashes like you're experiencing.  The problem is that the hang can occur before the write to the flash can occur.  This is why I am suggesting you connect a monitor / keyboard to the system.


I think I can safely rule out overheating. I generated a high CPU load together with moderate I/O and kept it running for a couple of hours. No crashes.

7BEB0E27-9CD3-4843-805F-CABB408C42EC.jpeg


One theory was that hardware monitoring in the BIOS might be involved. Yesterday I went to check this and found nothing related to temperature thresholds. While I was there, I changed the CPU governor setting to Performance mode.

In addition, I found yesterday that dynamix.system.temp.plg hadn't been updated for a while. When I tried to update it, the update failed, so I uninstalled it and installed it again.

After these two changes the system has been running for a day with no crashes. I'll keep monitoring.
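
To confirm from the running system which governor is actually active, a small sketch like the one below reads the standard Linux cpufreq sysfs files. It assumes the governor changed above is the Linux cpufreq governor and that the cpufreq driver exposes scaling_governor for each core; the function name is made up for illustration.

```python
from pathlib import Path


def read_governors(cpu_root="/sys/devices/system/cpu"):
    """Map each CPU directory name (e.g. 'cpu0') to its active cpufreq governor."""
    governors = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for path in sorted(Path(cpu_root).glob(pattern)):
        # path is .../cpuN/cpufreq/scaling_governor, so two parents up is cpuN
        governors[path.parent.parent.name] = path.read_text().strip()
    return governors


if __name__ == "__main__":
    found = read_governors()
    if not found:
        print("no cpufreq governor files found")
    for cpu, governor in found.items():
        print(f"{cpu}: {governor}")
```

If every core reports the same value after a reboot, the setting has persisted.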
