Crash/Freeze of new Server-Setup



Hey all,

 

After a long time of unsuccessful troubleshooting on my own, I have to ask you for help.

My situation:

- My server crashes every time, somewhere between 2 and 24 hours after a reboot (it has done so since the very beginning)

- After a crash there is mostly no access via WebUI, SMB, or the local command line (occasionally the WebUI is still reachable)

- A crash can be triggered by copying or moving files, but it mostly happens while idle (mostly during a parity check, if one is running)

- A crash during copying, e.g. from Windows, results in Windows error messages such as "0x8007003B", "0x8007003A", or "path is not available"

- Once, a crash also made the switch the server is connected to hang, blocking the other clients on that switch

- Once I saw one CPU core stuck at 100% (during a crash where only the WebUI was still working)

- The parity check usually becomes extremely slow after 40% progress (recently it also started to find parity errors, but the crashes existed long before that)

- The command line shows log output (see CommandlineOutputScreenshot.rar)
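
For reference, the two Windows codes in the list above are HRESULT values, and the low 16 bits carry a plain Win32 error number. The following is a minimal Python sketch of that decoding. The function and table names are made up for illustration, and the lookup table only covers the two codes mentioned here. Both decode to generic network errors, which fits the share dropping away mid-copy but does not point to a root cause.

```python
from typing import NamedTuple

FACILITY_WIN32 = 7  # HRESULT facility used for wrapped Win32 error codes

# Only the two Win32 codes that appear in this post; not a complete table.
WIN32_NAMES = {
    58: "ERROR_BAD_NET_RESP",   # server cannot perform the requested operation
    59: "ERROR_UNEXP_NET_ERR",  # an unexpected network error occurred
}


class HResult(NamedTuple):
    failure: bool  # bit 31: set when the value signals a failure
    facility: int  # bits 16-26: which subsystem produced the code
    code: int      # bits 0-15: the subsystem-specific error number


def decode_hresult(value: int) -> HResult:
    """Split a 32-bit HRESULT into its severity, facility, and code fields."""
    return HResult(
        failure=bool((value >> 31) & 1),
        facility=(value >> 16) & 0x7FF,
        code=value & 0xFFFF,
    )


def describe(value: int) -> str:
    """Return the Win32 name for a wrapped error, or 'unknown' if not listed."""
    h = decode_hresult(value)
    if h.facility == FACILITY_WIN32:
        return WIN32_NAMES.get(h.code, "unknown")
    return "unknown"


for hr in (0x8007003B, 0x8007003A):
    print(f"0x{hr:08X}", decode_hresult(hr), describe(hr))
```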

 

My lineup:

- Supermicro X10SBA (SoC: J1900)

- G.Skill SO-DIMM 8 GB DDR3L-1333 Kit

- PSU: Be-Quiet Pure Power 11 BN290

- 2x WD Red 4 TB WD40EFAX (unfortunately SMR) (Array)

- 2x SanDisk SSD PLUS 480GB (Cache Pool Raid 1)

 

I have tried the following troubleshooting steps without any success:

- RAM MemTest (passed)

- CPU stress test (with Windows10): (passed)

- HDD SMART test: (passed)

- USB Boot stick replaced, USB port changed

- SATA cables replaced

- Ubuntu 20.04 on the same hardware: stopped manually after 5 days of uptime without any errors

- Windows 10 on the same hardware: stopped manually after several days of uptime; MemTest and CPU stress test also passed

- Updated to Unraid OS 6.9.0-beta25: still crashes

- applied this patch (C-States): https://howto.lintel.in/freezing-intels-bay-trail-socs-cushioned-patch/

- Removed the cache drives (because of the combination of BTRFS and non-ECC RAM)

- Booted in safe mode; booted in both legacy and UEFI mode

- Syslog and the enhanced syslog do not help, because the shutdowns are hard and unexpected
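
Because the syslog does not survive the hard shutdowns, a small heartbeat logger could at least narrow down when a freeze happens. The Python sketch below is hypothetical and has not been tried on this server. It assumes Python is available on the box, and all function names and the interval are illustrative. Each record is flushed and fsynced, so the last line in the file should be close to the moment the machine stopped responding.

```python
import os
import time
from datetime import datetime, timezone
from typing import Optional


def heartbeat_line(now: datetime) -> str:
    """Format one record: ISO timestamp plus the 1/5/15-minute load averages."""
    try:
        load = " ".join(f"{x:.2f}" for x in os.getloadavg())
    except (AttributeError, OSError):
        load = "n/a"  # load averages are not available on every platform
    return f"{now.isoformat()} load={load}\n"


def write_heartbeat(path: str, now: datetime) -> None:
    """Append one record and force it to disk so it survives a hard freeze."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(heartbeat_line(now))
        f.flush()
        os.fsync(f.fileno())


def run(path: str, interval_s: float = 30.0,
        iterations: Optional[int] = None) -> None:
    """Write a heartbeat every interval_s seconds; loop forever if iterations is None."""
    count = 0
    while iterations is None or count < iterations:
        write_heartbeat(path, datetime.now(timezone.utc))
        count += 1
        if iterations is None or count < iterations:
            time.sleep(interval_s)
```

A call such as `run("/boot/heartbeat.log")` would write to the USB boot stick (the path and file name are assumptions). Frequent writes add wear to the stick, so a longer interval may be the better trade-off.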

 

What else can I try to find the problem? Does anyone know of or see a problem? Could it be a hardware/software incompatibility?

 

My next step would be to remove the hard drives, or at least to try a fresh "greenfield" rebuild of the server, which I would rather avoid.

 

Thank you!

 

 

 

 

powernas-diagnostics-20200726-2053.zip powernas-smart-20200726-2053.zip powernas-smart-20200726-2052.zip
