May 27, 2019

Dear all,

I need help with my unstable server. For many months I have tried to find the root cause of my problem(s). I feel that I have tried everything, but since the problems persist, I obviously have not. Let me list the symptoms:

1) At boot, unRAID can see all PCIe cards inserted, but a few minutes after the array starts it suddenly drops 3 of the PCIe cards. (I have not experienced this before the array starts, even when I have disabled autostart.) I have tried with two NVMe drives, an LSI RAID card and a Chelsio 10 Gbit NIC, all tested in another system without problems. If I remove all other cards it seems to work (though I cannot be sure, because removing the other cards cripples the system).

2) If I remove the "excess" PCIe cards, the system "runs" for some hours. I have created a Grafana graph where I can see that I have had restarts on the 17/19/23/23 and system halts on the 21/25/27. With a restart, the server reboots unprovoked. With a system halt, the server is 100% unresponsive to anything: SSH, HTTP or local input (ACPI power button). Only the reset button or the power switch on the PSU works. I do not have a keyboard on the unRAID server, as I have routed 1 of my 2 USB controllers to a VM (the other has a single port, which is used for the unRAID USB stick).

3) I have tried shutting down my VM to see if that is the culprit, and it does seem to have an effect on stability, though I cannot make it conclusive. Also, a virtual PC affecting hardware to such a degree is in my mind not very likely (but not impossible).

When it works, all seems fine. I have tried different PCIe slots on the motherboard (Supermicro X9DRI-LN4+); everything works except as described above. I have tried reseating the CPUs, and I have tried checking the RAM.

I know you want my diagnostics; however, they will be lost with every reboot, right? (I am not 100% sure that is the case.) For good measure I have attached them anyway. At the time of the diagnostics download the system had been running a little over an hour.

Any ideas? Do you know of any tools to test the system (CPU/chipset/RAM/PCIe bus)? (I know there is a memtest included in the unRAID boot menu.)

Kind regards
Alphahelix

xeon-diagnostics-20190527-0535.zip
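For symptom 1 above, one way to pin down when the cards disappear is to poll `lspci` and write any change to storage that survives a reboot. The following is only a rough sketch under several assumptions: that Python 3 and `lspci` are available on the server, that the USB flash drive is mounted at `/boot`, and that a dropped card shows up either as a vanished `lspci` line or as a changed one (for example a different revision field). The file name `pcie-watch.log` and all function names are hypothetical.

```python
import os
import subprocess
import time
from datetime import datetime

# Hypothetical log location; assumes the flash drive is mounted at /boot.
LOG_PATH = "/boot/pcie-watch.log"


def parse_lspci(output):
    """Map each PCI address (first field of an lspci line) to the rest of the line."""
    devices = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        address, _, description = line.partition(" ")
        devices[address] = description
    return devices


def changed_devices(baseline, current):
    """Return {address: (before, after)} for baseline devices that vanished or changed.

    'after' is None when the address is no longer listed at all.
    """
    changes = {}
    for address, before in baseline.items():
        after = current.get(address)
        if after != before:
            changes[address] = (before, after)
    return changes


def snapshot():
    """Run lspci once and return the parsed device table."""
    result = subprocess.run(["lspci"], capture_output=True, text=True, check=True)
    return parse_lspci(result.stdout)


def watch(interval_seconds=30):
    """Poll lspci forever and append a timestamped line for every new change."""
    baseline = snapshot()
    while True:
        time.sleep(interval_seconds)
        current = snapshot()
        changes = changed_devices(baseline, current)
        if changes:
            with open(LOG_PATH, "a") as log:
                for address, (before, after) in sorted(changes.items()):
                    stamp = datetime.now().isoformat()
                    log.write(f"{stamp} {address} was: {before!r} now: {after!r}\n")
                # Push the entry to the device in case the server halts right after.
                log.flush()
                os.fsync(log.fileno())
            # Compare against the new state so each change is logged only once.
            baseline = current
```

Calling `watch()` before starting the array would take the baseline while all cards are still visible. The timestamps in the log could then be compared with the restarts and halts seen in the Grafana graph.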
May 27, 2019

10 hours ago, Alphahelix said:
For many months I have tried to find the root cause of my problem(s).

Are both 8-pin CPU power sockets fully wired, and is each one connected directly to the PSU?

Since long-term troubleshooting has not fixed it, I would suggest removing a CPU, running in 1S only (plug all cards into the corresponding slots), and trying again.
May 27, 2019 Author

2 hours ago, Benson said:
Are both 8-pin CPU power sockets fully wired, and is each one connected directly to the PSU?

Yes. I use a Corsair HX1200; I bought it specifically to get native 2-CPU support.

2 hours ago, Benson said:
Since long-term troubleshooting has not fixed it, I would suggest removing a CPU, running in 1S only (plug all cards into the corresponding slots), and trying again.

I will try that; however, I can't fit both 2-slot GPUs, the LSI (used for cache) and another card, as it will be blocked by one of the GPUs.