Hey all,
After a long time of unsuccessful troubleshooting on my own, I have to ask you for help.
My situation:
- My server crashes every time, somewhere between 2 and 24 hours after a reboot (this has been the case from the very beginning)
- After a crash there is mostly no access via the WebUI, SMB or the local command line (occasionally, but rarely, the WebUI is still reachable)
- A crash can be triggered while copying or moving files, but it mostly happens while idle (most often during a parity check, if one is running)
- A crash during copying, e.g. from Windows, results in Windows error messages such as "0x8007003B", "0x8007003A" or "path is not available"
- Once, it also brought down the switch it is connected to, so the other clients on that switch were blocked as well
- Once I saw one CPU core stuck at 100% (during a crash where only the WebUI was still working)
- The parity check usually becomes extremely slow after 40% progress (recently it also started to find parity errors, but the crashes existed long before that)
- The command line shows a log (see CommandlineOutputScreenshot.rar)
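For what it's worth, both Windows codes are standard HRESULTs wrapping Win32 errors, and both decode to network errors (58 = ERROR_BAD_NET_RESP, 59 = ERROR_UNEXP_NET_ERR in winerror.h). That is consistent with the server becoming unreachable mid-transfer rather than with a problem on the Windows side. A small sketch of the decoding, using the same bit split as the `HRESULT_FACILITY` and `HRESULT_CODE` macros; the name table only covers these two codes:

```python
from typing import Tuple

FACILITY_WIN32 = 7

# Only the two codes seen in the copy errors are mapped here (names from winerror.h).
WIN32_NAMES = {
    58: "ERROR_BAD_NET_RESP (the specified server cannot perform the requested operation)",
    59: "ERROR_UNEXP_NET_ERR (an unexpected network error occurred)",
}


def decode_hresult(hresult: int) -> Tuple[bool, int, int]:
    """Return (is_failure, facility, code) for a 32-bit HRESULT."""
    is_failure = bool(hresult & 0x80000000)  # severity bit
    facility = (hresult >> 16) & 0x1FFF      # same mask as HRESULT_FACILITY
    code = hresult & 0xFFFF                  # same mask as HRESULT_CODE
    return is_failure, facility, code


if __name__ == "__main__":
    for value in (0x8007003B, 0x8007003A):
        failed, facility, code = decode_hresult(value)
        if facility == FACILITY_WIN32:
            name = WIN32_NAMES.get(code, "unknown Win32 error")
        else:
            name = "not a Win32 error"
        print(f"0x{value:08X}: failure={failed} facility={facility} code={code} -> {name}")
```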
My hardware:
- Supermicro X10SBA (SoC: J1900)
- G.Skill SO-DIMM 8 GB DDR3L-1333 Kit
- PSU: Be-Quiet Pure Power 11 BN290
- 2x WD Red 4 TB WD40EFAX (array; unfortunately SMR)
- 2x SanDisk SSD PLUS 480GB (cache pool, RAID 1)
I tried the following troubleshooting steps, without success:
- RAM: MemTest (passed)
- CPU stress test (under Windows 10): passed
- HDD SMART test: passed
- USB boot stick replaced, USB port changed
- SATA cables replaced
- Ubuntu 20.04 on the same hardware: stopped manually after 5 days of uptime without any errors
- Windows 10 on the same hardware: stopped manually after several days of uptime; MemTest and the CPU stress test also passed
- Updated to Unraid OS 6.9.0-Beta25: still crashes
- Applied this patch (C-states): https://howto.lintel.in/freezing-intels-bay-trail-socs-cushioned-patch/
- Removed the cache drives (because of the BTRFS plus non-ECC RAM combination)
- Booted in safe mode; booted in both legacy and UEFI mode
- Syslog and enhanced syslog don't help because of the unexpected hard shutdowns
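One thing worth double-checking is whether the C-state change is actually active in the running Unraid kernel. The sketch below assumes the fix is applied as a kernel boot parameter such as `intel_idle.max_cstate=1` (an assumption; I have not verified exactly what the linked article changes). It reads the standard Linux procfs/sysfs entries; any of them may be missing on a given kernel, in which case the value is reported as `None`:

```python
from pathlib import Path
from typing import Dict, Optional


def read_text(path: Path) -> Optional[str]:
    """Return the stripped file content, or None if the file cannot be read."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def cstate_status(root: Path = Path("/")) -> Dict[str, Optional[str]]:
    """Collect the C-state related settings visible to the running kernel."""
    cmdline = read_text(root / "proc/cmdline") or ""
    # Boot parameters mentioning C-states, e.g. intel_idle.max_cstate=1 or processor.max_cstate=1
    cstate_args = [arg for arg in cmdline.split() if "cstate" in arg]
    return {
        "cmdline_cstate_args": " ".join(cstate_args) or None,
        # Value the intel_idle driver actually uses (only present if intel_idle is loaded/built in)
        "intel_idle_max_cstate": read_text(root / "sys/module/intel_idle/parameters/max_cstate"),
        # Which cpuidle driver is active (e.g. intel_idle or acpi_idle)
        "cpuidle_driver": read_text(root / "sys/devices/system/cpu/cpuidle/current_driver"),
    }


if __name__ == "__main__":
    for key, value in cstate_status().items():
        print(f"{key}: {value}")
```

If the boot line contains the parameter but `intel_idle_max_cstate` still shows a higher value, the change is probably not being picked up.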
What else can I try to find the problem? Does anyone know of or see a problem? Could it be a hardware/software incompatibility?
My next step would be to remove the hard drives, or at least to try a fresh "greenfield" rebuild of the server, which I would rather avoid.
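Because the syslog is lost on every hard shutdown, one low-tech option is a heartbeat file that is flushed to the flash drive every minute. After a crash, the last line shows roughly when the server died and what the load average was at that moment. This is a minimal sketch; the path `/boot/logs/heartbeat.log` (Unraid mounts the flash drive at `/boot`) and the 60-second interval are hypothetical choices, and frequent writes add some wear to the USB stick:

```python
import os
import time
from datetime import datetime, timezone
from pathlib import Path


def append_heartbeat(log_path: Path, extra: str = "") -> str:
    """Append one timestamped line and fsync it so it is on the device before a hard crash."""
    line = f"{datetime.now(timezone.utc).isoformat()} alive {extra}".rstrip() + "\n"
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(line)
        f.flush()              # move Python's buffer into the kernel
        os.fsync(f.fileno())   # ask the kernel to write it through to the device
    return line


def load_average() -> str:
    """Return the 1/5/15 minute load averages from /proc/loadavg, or '' if unavailable."""
    try:
        with open("/proc/loadavg", encoding="utf-8") as f:
            return "load=" + " ".join(f.read().split()[:3])
    except OSError:
        return ""


if __name__ == "__main__":
    target = Path("/boot/logs/heartbeat.log")  # hypothetical location on the flash drive
    target.parent.mkdir(parents=True, exist_ok=True)
    while True:
        append_heartbeat(target, load_average())
        time.sleep(60)  # hypothetical interval
```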
Thank you!
powernas-diagnostics-20200726-2053.zip
powernas-smart-20200726-2053.zip
powernas-smart-20200726-2052.zip