Unraid "server" crashes/hangs


Bill A


I have been a long-time Unraid user and love the product!

I have flip-flopped my server across a few pieces of hardware over the years and have never really had any issues...

My server ran on the current hardware for a while, until the VMs on it started crashing (this was a few years ago and I don't remember exactly what was happening, but I seem to remember the VMs fighting over space on the cache drive), so I moved Unraid to an older PC. I turned the current hardware into an ESXi host, but stopped using it about a year ago because the server would lose its connection to the network, and I stopped using the VMs for a while too. I recently moved Unraid back onto the hardware that had been the ESXi server (I had forgotten it was having issues).

 

The hardware is

Asus M5A97 R2.0 motherboard

AMD FX-8350 CPU

4 x 8 GB sticks of RAM

Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (onboard ethernet)

Marvell Technology Group Ltd. Device 9215 (4 port PCIe SATA controller)

Intel Corporation 82546GB Gigabit Ethernet Controller (2 port PCIe ethernet controller) 

LSI SAS2008 PCI-Express Fusion-MPT SAS-2 (8 port PCIe SATA controller, this is brand new)

brand new 850w power supply

 

USB Boot disk

[0:0:0:0]disk SanDisk U3 Cruzer Micro 8.02 /dev/sda 2.00GB

 

Parity

[2:0:5:0]disk ATA WDC WD80EMAZ-00W 0A83 /dev/sdg 8.00TB

 

Data drives

[1:0:0:0]disk ATA TOSHIBA MD04ACA5 FP2A /dev/sdj 5.00TB

[2:0:0:0]disk ATA TOSHIBA HDWE150 FP2A /dev/sdb 5.00TB

[2:0:1:0]disk ATA HGST HDN724030AL A5E0 /dev/sdc 3.00TB

[2:0:7:0]disk ATA WDC WD80EMAZ-00W 0A83 /dev/sdi 8.00TB

[3:0:0:0]disk ATA HGST HDN724040AL A5E0 /dev/sdk 4.00TB

[4:0:0:0]disk ATA TOSHIBA MD04ACA5 FP2A /dev/sdl 5.00TB

[5:0:0:0]disk ATA ST3000DM001-1CH1 CC26 /dev/sdm 3.00TB

[11:0:0:0]disk ATA WDC WD10EZEX-00B 1A01 /dev/sdq 1.00TB 

 

SSD Cache disks

[6:0:0:0]disk ATA SAMSUNG MZ7TD256 2L5Q /dev/sdn 256GB

[7:0:0:0]disk ATA INTEL SSDSC2KF25 L10P /dev/sdo 256GB

[10:0:0:0]disk ATA INTEL SSDSC2BF18 LSTi /dev/sdp 180GB

[2:0:2:0]disk ATA Samsung SSD 850 2B6Q /dev/sdd 500GB

[2:0:3:0]disk ATA TOSHIBA THNSFJ25 1102 /dev/sde 256GB

[2:0:4:0]disk ATA INTEL SSDSC2KF25 L10P /dev/sdf 256GB

[2:0:6:0]disk ATA MKNSSDCR240GB-7 BBF0 /dev/sdh 240GB

 

When this PC was my ESXi server it would lose its network connection. I didn't do a lot of troubleshooting at the time and, quite frankly, gave up and built a physical PC instead of using the virtual machine(s), so I forgot that the server was hanging/losing connection.

 

When I rebuilt the hardware as my Unraid server I added the new LSI SAS2008 and a used Intel Corporation 82546GB. I moved all existing drives from my older Unraid server (Core 2 Quad), and added a couple of used SSDs for cache and a 1 TB drive as a second data drive for the VM(s) I am going to build.

 

I moved my existing USB stick (config and all) to the "new" server, and very soon noticed that the server would hang/freeze/lose connection. The only way out was a hard reset. My first thought was that the PSU might be bad. Several PSU calculators put my power draw at under roughly 700 W, but to be safe I purchased a Corsair RX850x PSU. When I went to swap it in I found the PC already had a very new 850 W PSU, but I swapped it anyway.

 

I am still having crashes/hangs/freezes... If I boot into GUI mode, it hangs at the logon screen and the keyboard has no effect. If I miss selecting GUI mode at boot, the server crashes/hangs with a kernel panic (I am a 100% Linux noob, so I have no idea what to even look at). I took a picture of the screen about a week ago when it hung, and the only lines that looked meaningful to me were:

? Panic +0x1df/0x227 

do_exit+0x88/0x919

? cpu_startup_entry+0x6a/0x6c

rewind_stack_do_exit+0x17/0x20
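
For anyone reading a trace like this: each frame in a Linux kernel call trace has the form `symbol+offset/size`, where the offset is how far into the function the return address sits and the size is the function's total length, and a leading `?` marks a frame the unwinder is not sure about. Below is a minimal Python sketch that splits the four lines above into those parts. The names `Frame` and `parse_frame` are made up for illustration and are not part of any Unraid or kernel tool, and the lines are copied as typed from the photo, so they may not match the real trace exactly.

```python
import re
from typing import NamedTuple, Optional


class Frame(NamedTuple):
    symbol: str     # kernel function name
    offset: int     # bytes into the function
    size: int       # total length of the function in bytes
    reliable: bool  # False when the line was prefixed with "?"


# Matches e.g. "do_exit+0x88/0x919" or "? cpu_startup_entry+0x6a/0x6c".
FRAME_RE = re.compile(
    r"^\s*(\?\s*)?([A-Za-z_][\w.]*)\s*\+\s*0x([0-9a-fA-F]+)/0x([0-9a-fA-F]+)"
)


def parse_frame(line: str) -> Optional[Frame]:
    """Parse one call-trace line; return None if it is not a frame."""
    m = FRAME_RE.match(line)
    if m is None:
        return None
    return Frame(
        symbol=m.group(2),
        offset=int(m.group(3), 16),
        size=int(m.group(4), 16),
        reliable=m.group(1) is None,
    )


# The lines transcribed from the photo in this post.
TRACE = """\
? Panic +0x1df/0x227
do_exit+0x88/0x919
? cpu_startup_entry+0x6a/0x6c
rewind_stack_do_exit+0x17/0x20
"""

frames = [f for f in map(parse_frame, TRACE.splitlines()) if f is not None]
for f in frames:
    flag = "" if f.reliable else " (unreliable)"
    print(f"{f.symbol}: offset {f.offset:#x} of {f.size:#x}{flag}")
```

These frames only show the tail end of the panic (the kernel exiting a task and then panicking); the lines above them in the full trace are usually what point to the actual cause.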

 

This evening the server hung again, but my wife needed to access files and I didn't think to grab a pic or write anything down...

 

The server will run anywhere from a couple of hours to a few days between crashes.

 

I have "Fix Common Problems" installed and I enabled copying the system logs to the USB drive, but the last logs there are from 9/01/19, so it doesn't seem like logs are successfully being copied to the USB.
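
A quick way to confirm whether anything is still being written would be to look at the newest file in the log folder and see how old it is. This is a minimal Python sketch of that check, not an Unraid feature: the helper names are made up, the `/boot/logs` path is an assumption about where the mirrored logs land on the flash drive, and the 24-hour threshold is arbitrary.

```python
import time
from pathlib import Path
from typing import Optional, Tuple


def newest_file(directory: Path) -> Optional[Tuple[Path, float]]:
    """Return (path, age in hours) of the most recently modified file, or None."""
    files = [p for p in directory.rglob("*") if p.is_file()]
    if not files:
        return None
    latest = max(files, key=lambda p: p.stat().st_mtime)
    age_hours = (time.time() - latest.stat().st_mtime) / 3600
    return latest, age_hours


def report(directory: Path, max_age_hours: float = 24.0) -> str:
    """Describe whether the newest log in `directory` looks fresh or stale."""
    if not directory.is_dir():
        return f"{directory}: not found"
    found = newest_file(directory)
    if found is None:
        return f"{directory}: no log files at all"
    path, age = found
    state = "stale" if age > max_age_hours else "fresh"
    return f"{path.name}: last written {age:.1f} h ago ({state})"


if __name__ == "__main__":
    # ASSUMPTION: mirrored logs are stored under /boot/logs; adjust if not.
    print(report(Path("/boot/logs")))
```

If the newest file is days or weeks old while the server has been up and crashing since, the mirroring itself is not working, which is a separate problem from the crashes.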

 

 

The second issue is that I cannot update the OS or any plugins... I'm guessing something got corrupted on the USB due to the crashes. The server is currently sitting at 6.7.0 and, at least through the software, is not upgradable to the current version. I am OK with manually upgrading the USB to the current OS version (I'm assuming this will blow away my existing config, and that is fine as long as I can recover my existing data etc.), BUT I don't want to take the time to do this until I have the suspected hardware issues figured out. If you think reloading the OS might fix the issues, though, I would be fine updating the OS!

 

I am more than willing to grab anything I can off the server to help figure out what is happening. 

 

Here is the pic I grabbed of the screen when the server hung about a week or so ago.

h700k65Y.jpg

 

11 hours ago, Squid said:

Set GUI mode as the default boot mode.  Main, Boot Device, Click on Flash, Syslinux configuration, and set the radio button next to GUI mode

Thank you for that Squid!

 

Sorry, re-reading that line, I wasn't clear: the server does not only hang in non-GUI mode. In GUI mode the server typically hangs on the logon screen with a dead keyboard and no network connectivity. In non-GUI mode it typically shows the actual kernel panic on the screen (at least part of it).

 

I let Memtest run overnight; it completed 4 passes with no errors.


24 hours of memtest with no errors.

Booted into safe mode now; because of the failed shutdown it is running a parity check. I have the syslog server set up and configured to mirror to the flash drive, but the last log that was mirrored to the flash drive was from over 2 weeks ago 😞


I just uploaded the hardware profile.

I will see what the server does running in safe mode.

 
