xokia, posted September 19 (edited September 22 by xokia):
My system randomly crashes with what look like segfaults, and I cannot figure out why. Hardware:
Intel i9-13900 CPU (the iGPU uses 512MB of system memory)
ASUS B760-I motherboard (Wi-Fi and Bluetooth disabled in BIOS; sound card disabled in BIOS)
G.SKILL Ripjaws S5 Series 64GB (2 x 32GB) 288-Pin PC RAM DDR5 5600 (PC5 44800) desktop memory, model F5-5600J3636D32GX2-RS5W
2TB Solidigm NVMe (not added to anything)
Team Group 2TB SSD (not added to anything)
20TB Exos HDD
10TB WD Red Pro HDD
2TB random drive
Nothing else is installed and nothing is overclocked. I have tested the RAM every way possible and it passes, so I cannot find the fault; it doesn't appear to be hardware. The crash never happens while the CPU is busy; it always happens when things go idle.
Attachments: home-server-diagnostics-20230919-1456.zip, syslog
JorgeB, posted September 20:
The crashes look hardware-related to me. Try with just one RAM stick; if it's the same, try the other one. That would basically rule out a RAM issue, since memtest doesn't always find every issue.
xokia, posted September 20 (edited September 20 by xokia):
In reply to JorgeB: Good suggestion, I will try that. It just seems weird to me that the issue wouldn't be hit under load if it's faulty RAM. This only occurs when things are idle, which makes me think it's a C-state or package S-state issue. I noticed ASUS released a new BIOS yesterday with a microcode update, so I will try the BIOS first and then a single stick of RAM. Can you update microcode manually on Slackware?
I'll add that the CPU has already been swapped and the issue remains, so the only other hardware left is the memory or the motherboard.
JorgeB, posted September 20:
Quoting xokia: "This only occurs when things are idle. Makes me think C-state or package S-state issue."
Could be. You can try disabling C-states in the BIOS, and a BIOS update may also help. Newer Unraid releases may include newer microcode, but I doubt you can easily update it manually.
xokia, posted September 20 (edited September 20 by xokia):
In reply to JorgeB: On Debian and Debian-based distributions you can run:
apt install intel-microcode
If your BIOS microcode is older than the packaged update, the newer revision gets applied at boot.
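A minimal sketch, assuming a Python 3 interpreter is available on the server: on x86 Linux, `/proc/cpuinfo` reports the running microcode revision in a `microcode` field, so it can be compared before and after a BIOS update. The function names are hypothetical, and `would_be_applied` only spells out the rule stated in the post (the Linux loader skips a patch that is not newer than the revision already running).

```python
import re
from pathlib import Path
from typing import Optional

# On x86 Linux, /proc/cpuinfo repeats a line like "microcode\t: 0x..." for every CPU.
MICROCODE_RE = re.compile(r"^microcode\s*:\s*(0x[0-9a-fA-F]+)\s*$", re.MULTILINE)


def running_microcode(cpuinfo_text: str) -> Optional[int]:
    """Return the microcode revision of the first CPU listed, or None if absent."""
    match = MICROCODE_RE.search(cpuinfo_text)
    return int(match.group(1), 16) if match else None


def would_be_applied(running: int, packaged: int) -> bool:
    """Hypothetical helper: an update is only loaded if it is newer than the running one."""
    return packaged > running


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        rev = running_microcode(cpuinfo.read_text())
        print("running microcode:", hex(rev) if rev is not None else "not reported")
```

Running it before and after flashing the new BIOS would show whether the update actually changed the revision; the packaged revision passed to `would_be_applied` has to come from the BIOS or package release notes.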
xokia, posted September 22 (edited September 22 by xokia):
The BIOS update did not resolve the issue, so I'm moving on to a single 32GB RAM stick to see if it stays up. If this fails I'll swap the RAM sticks.
Edit: With the first RAM stick I still get crashes. Trying the second RAM stick only. This is so frustrating because this system is fairly basic and uses no add-in cards. Typically segfaults are program errors rather than RAM errors, but anything is worth trying. 😟
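The rule of thumb in this post (segfaults usually point at programs rather than RAM) can be checked against the attached syslog. As a rough heuristic, not a diagnosis: segfaults spread across many unrelated processes lean toward hardware, while repeats from a single process lean toward a software bug. This sketch assumes Python 3 and a syslog in the `Mon DD HH:MM:SS host kernel: ...` format shown in this thread; the function name is hypothetical.

```python
import re
import sys
from collections import Counter
from typing import Iterable

# Kernel segfault reports have the shape "<comm>[<pid>]: segfault at <addr> ip ...".
# comm is the kernel's truncated process name and may contain spaces, so match
# lazily up to the first "[pid]: segfault at". An optional printk timestamp
# such as "[ 1234.567890] " is skipped first.
SEGFAULT_RE = re.compile(r"kernel: (?:\[\s*\d+\.\d+\] )?(.+?)\[\d+\]: segfault at ")


def segfaults_by_process(lines: Iterable[str]) -> Counter:
    """Count kernel-reported segfaults per process name."""
    counts: Counter = Counter()
    for line in lines:
        match = SEGFAULT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Usage: python3 segfaults.py <syslog file>
    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as log:
            for name, count in segfaults_by_process(log).most_common():
                print(f"{count:5d}  {name}")
```

Pointing it at a file such as syslog_9_22_2023_dimm2.txt would list which processes the kernel reported as segfaulting and how often.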
xokia, posted September 23 (edited September 23 by xokia):
Both RAM sticks were run individually and it still crashed, so it can't be the memory. Are there any additional debug switches that can be turned on to give us a clue about what is causing the constant crashes?
I broke it down to as bare-bones as possible: removed the NVMe, removed the SSD, and left only one 32GB stick of RAM. What remains is three HDDs, which all work, and an i9-13900 (the CPU has been swapped and the problem remains).
I always seem to get the following right before the crash:
Sep 22 20:22:27 Home-Server kernel: __vm_enough_memory: pid: 9429, comm: dockerd, no enough memory for the allocation
I currently have just Unraid and Plex Media Server running. I suppose I can try shutting down Plex and leave only Unraid running.
Attachment: syslog_9_22_2023_dimm2.txt
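The kernel line quoted in this post comes from `__vm_enough_memory`, the kernel's overcommit check, and when it fires depends on the `vm.overcommit_memory` setting. On recent kernels, mode 0 (the default heuristic) refuses only a single request larger than total RAM plus swap, mode 1 never refuses, and mode 2 refuses once committed memory (`Committed_AS` in `/proc/meminfo`) would roughly exceed `CommitLimit`. This sketch, assuming Python 3 on the server and with hypothetical function names, reads those values and reports which rule applies. If the server is in mode 0, the message would suggest dockerd asked for one unusually large allocation rather than memory slowly running out.

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: value}; sizes are in kB."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            values[key.strip()] = int(fields[0])
    return values


def commit_check_summary(meminfo: dict, mode: int) -> str:
    """Describe when the kernel's overcommit check would refuse an allocation."""
    if mode == 1:
        return "mode 1 (always overcommit): the check never refuses"
    if mode == 2:
        # Approximate: the kernel also subtracts small admin/user reserves.
        headroom = meminfo["CommitLimit"] - meminfo["Committed_AS"]
        return f"mode 2 (strict): about {headroom} kB left before requests are refused"
    # Mode 0 (heuristic, the default): only a single request larger than
    # total RAM plus swap is refused.
    ceiling = meminfo["MemTotal"] + meminfo.get("SwapTotal", 0)
    return f"mode 0 (heuristic): a single request above {ceiling} kB is refused"


if __name__ == "__main__":
    meminfo_path = Path("/proc/meminfo")
    mode_path = Path("/proc/sys/vm/overcommit_memory")
    if meminfo_path.exists() and mode_path.exists():
        info = parse_meminfo(meminfo_path.read_text())
        mode = int(mode_path.read_text().strip())
        print(commit_check_summary(info, mode))
```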
JorgeB, posted September 23:
One more thing you can try is to boot the server in safe mode with all Docker containers and VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning the other services on one by one.
xokia, posted September 23:
In reply to JorgeB: Are there additional debug switches we can enable to catch it? The hardware is:
CPU: swapped. It's highly unlikely two CPUs would exhibit the same issue.
RAM: swapped. It's also increasingly unlikely both RAM sticks would exhibit the same issue, and MemTest passes no matter how long I run the test.
ASUS motherboard and onboard peripherals: this is the only thing I can't swap. I have disabled as much as I can in the BIOS; the only things I can't disable are the GPU and the LAN. I can try disabling IOMMU.
Maybe ASUS has a bug in the BIOS?
xokia, posted September 29 (edited September 29 by xokia):
I sent both the motherboard and the RAM back for replacements. Everything except these two items has been swapped. I don't know what else to try.
xokia, posted October 15:
Motherboard and RAM replaced. Fingers crossed, but so far no more crashing.