Unraid 6.5.1 locking up once a week, requiring hard reset to recover



Hi Everyone,

I'm trying to configure an Unraid server on a new NAS build that I purchased a few weeks ago, and I'm running into a pretty bad lockup problem:

Problem:

Approximately once per week, the Unraid box becomes completely unresponsive (web UI goes down, SSH doesn't work, ping doesn't work, monitor and keyboard are unresponsive), and a hard reset is required to recover.

Steps to reproduce:

Run Unraid for about a week, then wake up to find the server has crashed.

Server config:

  • 2x E5 2660 v2
  • 32 GB ECC RAM
  • Supermicro dual LGA2011 motherboard
  • 4x HGST HDD (3 data, 1 parity)
  • 2x 500 GB SSD (1x XFS cache drive, 1x UD XFS scratch drive)

Things I've already tried:

  1. Memtest: ran for 24 hrs, no failures
  2. Changed from a Btrfs cache pool (2 drives) to a single XFS drive for the cache pool (and 1 drive for 'other')

I've attached the most recent diagnostics and the syslog tail from FCP (Fix Common Problems).

Any advice would be appreciated.

odin-diagnostics-20180602-0837.zip

FCPsyslog_tail.txt


I'm having similar problems, but infrequently. Unraid locks up, pings to the server fail, and the console is unresponsive. If it helps, I have posted diagnostics and syslogs before (I have troubleshooting mode enabled). You should be able to find them by looking through my posts, but I can post them again if needed. Still no closer to a resolution.


Hi. I've got similar symptoms. The latest lockup left an MCE (machine check exception) message on the screen.

It might be worthwhile installing mcelog in case you are getting hardware errors. To do so, first install the Nerd Tools plugin, then open it and select mcelog.
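Before relying on mcelog, it can help to confirm that the kernel actually exposes a machine-check log for it to read. Below is a minimal Python sketch, assuming (as the mcelog(8) man page describes) that the mcelog daemon reads records from the `/dev/mcelog` character device. `/proc/mcelog` is checked only because it is the path named in the error reported in this thread. The helper name `mce_paths_present` is hypothetical, not part of mcelog or Nerd Tools.

```python
import os
from typing import Dict

# Paths to check, relative to the filesystem root.
# /dev/mcelog: the character device the mcelog daemon reads (per mcelog(8)).
# /proc/mcelog: the path named in the error message reported in this thread.
CANDIDATE_PATHS = ("dev/mcelog", "proc/mcelog")


def mce_paths_present(root: str = "/") -> Dict[str, bool]:
    """Map each candidate path (resolved under `root`) to whether it exists."""
    return {
        "/" + rel: os.path.exists(os.path.join(root, rel))
        for rel in CANDIDATE_PATHS
    }


if __name__ == "__main__":
    for path, present in mce_paths_present().items():
        print(f"{path}: {'present' if present else 'missing'}")
```

If `/dev/mcelog` shows as missing, mcelog has nothing to read from. That doesn't explain why, but it narrows the question down to the kernel/plugin side rather than the hardware.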

I haven't had a recurrence yet, but I'm hoping I might capture something with it.

4 hours ago, ADDAndy said:

@Squid I tried running mcelog on my machine, but I keep getting an error message that /proc/mcelog doesn't exist.

Ideas?

Did you do this?

8 hours ago, PeteB said:

It might be worthwhile installing mcelog in case you are getting hardware errors. If you want to do it, you first install the nerd tools plugin and then open it and select mcelog.


Sorry, I don't get that message at all.

*Maybe* I'm not getting it yet because I haven't had an MCE to capture? It might be worth persisting with getting mcelog working correctly, in case it's trying to capture a problem.

Hopefully someone else here can help with this error.

Update:

I wanted to make sure I was on the newest firmware and BIOS, and it looks like I am, so no possible solution there.

I did have another crash last night:

  • Based on the syslog_tail and diagnostics info (attached), it occurred around 1 am local time.
  • This corresponds to a CATERR event in my BIOS (my BIOS clock is wrong: it's 8:30 am, and my BIOS clock is showing 14:30):

8 2018/06/08 07:04:16 OEM  CPLD  CATERR - Asserted
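To double-check that this CATERR entry lines up with the ~1 am crash, the event timestamp can be shifted by the observed clock offset (14:30 on the BIOS clock vs. 8:30 local, i.e. 6 hours ahead). Below is a minimal Python sketch. It assumes the offset is constant, that both readings were taken on the same date, and that the entry is whitespace-separated as ID, date, time, then the event text. The helper names are hypothetical.

```python
from datetime import datetime, timedelta

# Event log entry as quoted in the post: ID, date, time, then the event text.
SEL_LINE = "8 2018/06/08 07:04:16 OEM  CPLD  CATERR - Asserted"

# Clock readings from the post: local time 8:30 am, BIOS clock 14:30.
# Assumption: both readings were taken on 2018-06-08.
LOCAL_NOW = datetime(2018, 6, 8, 8, 30)
BIOS_NOW = datetime(2018, 6, 8, 14, 30)


def parse_sel_time(line: str) -> datetime:
    """Return the timestamp from an 'ID YYYY/MM/DD HH:MM:SS ...' entry."""
    fields = line.split()
    return datetime.strptime(f"{fields[1]} {fields[2]}", "%Y/%m/%d %H:%M:%S")


def to_local(bios_time: datetime, bios_now: datetime, local_now: datetime) -> datetime:
    """Shift a BIOS-clock timestamp by the observed BIOS-minus-local offset."""
    offset: timedelta = bios_now - local_now
    return bios_time - offset


event_local = to_local(parse_sel_time(SEL_LINE), BIOS_NOW, LOCAL_NOW)
print(event_local)  # 2018-06-08 01:04:16
```

With the figures from this post, 07:04:16 on the BIOS clock maps to 01:04:16 local, which is consistent with the crash time estimated from the syslog tail.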

So I'm now working under the assumption that this CATERR is the root cause of the instability.

What could cause a CATERR? What's the debug path?

odin-diagnostics-20180608-0055.zip

FCPsyslog_tail.txt

  • 1 month later...

I have had this issue ever since I got a Supermicro X11SAE-M.

It only happens when starting or shutting down a LibreELEC VM. The motherboard's CATERR_LED glows orange and the machine needs to be reset, which triggers a parity check once it reboots.

I have never been able to capture any logs, as it's a complete system halt. FWIW, the system had been up for 1 hr before the error, and I never leave the server on for more than a day.
