
Server going offline on a nightly basis



Hi Guys,

 

I have a SuperMicro X9DRi-LN4+/X9DR3-LN4+ server running the latest Unraid. It runs fine during the day, yet when I wake up it has almost always crashed and the disks are in a weird state. This morning it showed all the disks as mounted at the top of Main, yet also listed them all under available devices.

 

The box has a data array of 7 spinning disks (6+1 parity) on XFS: 5 at 18TB each and 2 at 4TB. I also run a BTRFS cache pool with 4 devices. It has 2 Xeons and 196GB of DDR3 ECC at 1066. The BIOS has conservative settings, no overclocking or anything odd.

 

I am using an IT mode flashed HBA to drive the array.  

 

I've attached the hardware dump and the diags. I'm hopeful someone can tell me what's going on, as I can't use this system until this problem has been resolved. Luckily I haven't encountered any system-disabling corruption yet, but if it keeps dying I don't think that trend will continue. I'm keeping all of my valuable data offline at this point, waiting until the system is stable.

 

Most of the storage is rather new and the server is rather old, yet all the firmware I could find is up to date. I also have a RAID controller that I could put in. It has JBOD and battery backup, yet I'm thinking the less complex card is better.

 

Looking forward to your help, as I'd really like to complete this box so I can start on a 3-way cluster that's been waiting in the wings.

 

Thank you!

 

Keith

hardwarre.txt media-diagnostics-20210807-1402.zip


Ran an overnight diag that stressed the disks, CPU, memory, and graphics. Not one error, so I conclude there's either some sort of issue between the included drivers and my system or, since it happens overnight, every night, a bad plugin or conflict. Can the Unraid team take a look? In the meantime I'm going to try TrueNAS, I suppose. :(

 

Interesting side note: while I was backing everything up last night, I had the array online but disabled Docker and all recurring plugins, and it ran all night under heavy usage (rsync of several hundred TB with Unraid running) with no crashes at all.
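
Since the crashes only seem to happen overnight with Docker and plugins enabled, one way to narrow it down might be to look for long silent stretches in a saved syslog and see what was logged right before each one. Below is a rough sketch of that idea, not a tested diagnostic tool. It assumes the log uses the classic `Mon DD HH:MM:SS` timestamp prefix, and the function names, the 30-minute threshold, the hostname, and the sample log lines are all made up for illustration.

```python
from datetime import datetime, timedelta


def parse_syslog_time(line, year):
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a classic syslog line.

    Classic syslog stamps carry no year, so the caller supplies one.
    Returns None for lines that do not start with such a stamp.
    """
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def find_log_gaps(lines, min_gap=timedelta(minutes=30), year=2021):
    """Return (last_seen, resumed, last_line) for each silence >= min_gap."""
    gaps = []
    prev_time, prev_line = None, None
    for line in lines:
        stamp = parse_syslog_time(line, year)
        if stamp is None:
            continue  # skip continuation lines or unknown formats
        if prev_time is not None and stamp - prev_time >= min_gap:
            gaps.append((prev_time, stamp, prev_line.rstrip()))
        prev_time, prev_line = stamp, line
    return gaps


# Hypothetical sample lines, invented for this example (not from the diags).
sample = [
    "Aug  6 23:58:01 media crond[1234]: scheduled job started",
    "Aug  7 00:00:05 media kernel: example message before the silence",
    "Aug  7 07:45:10 media kernel: example message after reboot",
]

for last_seen, resumed, last_line in find_log_gaps(sample):
    print(f"silent from {last_seen} to {resumed}")
    print(f"  last line: {last_line}")
```

If the last line before each gap keeps pointing at the same scheduled job or container, that would support the plugin/conflict theory. If the log simply stops mid-stream with nothing unusual, hardware or a driver becomes more likely. Note that a log kept only in RAM is lost on a hard crash, so this only helps if the syslog is being saved somewhere that survives the reboot.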

Edited by sirebral
