Please help with crash log



My system randomly crashes with what look like segfaults. I cannot figure out why.

 

I have an i9-13900 CPU

iGPU is using 512MB of system memory

ASUS B760-I motherboard

    Wi-Fi & Bluetooth -> disabled in BIOS

    Sound card -> disabled in BIOS

G.SKILL Ripjaws S5 Series 64GB (2 x 32GB) 288-Pin PC RAM DDR5 5600 (PC5 44800) Desktop Memory Model F5-5600J3636D32GX2-RS5W

2TB Solidigm NVMe -> not added to anything

TeamGroup 2TB SSD -> not added to anything

20TB Exos HDD

10TB WD Red Pro HDD

2TB random drive

 

Nothing else is installed. Nothing is overclocked.

 

I have tested the RAM every way possible and it passes. (Screenshot attached: fail.jpg)

 

I cannot figure out where the fault is. It doesn't appear to be hardware. The issue never happens while the CPU is busy; it always happens when things go idle.

Attached: home-server-diagnostics-20230919-1456.zip, syslog

Edited by xokia
9 hours ago, JorgeB said:

Crashes look hardware-related to me. Try with just one RAM stick; if it's the same, try the other one. That would basically rule out a RAM issue, since memtest doesn't always find every issue.

Good suggestion, I will try that. It just seems weird to me that the issue wouldn't be hit under load if it's faulty RAM. This only occurs when things are idle. Makes me think it's a C-state or package S-state issue.

I noticed ASUS released a new BIOS yesterday with a microcode update. I will try the BIOS first, then a single stick of RAM. Can you update the microcode manually on Slackware?

 

I'll add that the CPU has been swapped and the issue remains, so the only other hardware would be the memory or the motherboard.

Edited by xokia
32 minutes ago, xokia said:

This only occurs when things are idle. Makes me think it's a C-state or package S-state issue.

Could be. You can try disabling C-states in the BIOS, and a BIOS update may also help. A newer Unraid release may include newer microcode, but I doubt you can easily update it manually.
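Before and after toggling C-states in the BIOS, it can help to see which idle states Linux actually exposes and how often each one is entered. A minimal sketch, assuming the standard cpuidle sysfs layout under `/sys/devices/system/cpu/cpu0/cpuidle`; `list_cstates` is a hypothetical helper name, and the path is a parameter only so it can be pointed at another CPU or a copy for testing.

```shell
# List the idle states (C-states) the kernel exposes for one CPU.
# Each stateN directory in the cpuidle sysfs tree has name, disable and usage
# files; usage counts how many times that state has been entered.
# Hypothetical helper: pass another cpuN/cpuidle directory to inspect that CPU.
list_cstates() {
    base="${1:-/sys/devices/system/cpu/cpu0/cpuidle}"
    for dir in "$base"/state*; do
        [ -d "$dir" ] || continue   # glob did not match: no cpuidle states
        printf '%s %s disabled=%s usage=%s\n' \
            "${dir##*/}" "$(cat "$dir/name")" \
            "$(cat "$dir/disable")" "$(cat "$dir/usage")"
    done
}
```

Run `list_cstates` with no arguments for CPU 0. As root, writing `1` to a state's `disable` file turns that state off until the next reboot, which is one way to test the C-state theory without a BIOS change. It applies per CPU, so it would need repeating for every `cpuN` directory, and state numbers differ between CPUs, so check the names first.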

2 hours ago, JorgeB said:

Could be. You can try disabling C-states in the BIOS, and a BIOS update may also help. A newer Unraid release may include newer microcode, but I doubt you can easily update it manually.

On Debian and other Debian-based distributions:

 

apt install intel-microcode

 

If your BIOS microcode is older than the one in the package, the newer revision gets applied at boot.
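To confirm whether a BIOS update or the `intel-microcode` package actually changed anything, you can compare the microcode revision the kernel reports before and after. A minimal sketch, assuming the usual `microcode : 0x...` lines in `/proc/cpuinfo`; `microcode_revisions` is a hypothetical helper name.

```shell
# Print each distinct microcode revision listed in a cpuinfo-format file
# (default /proc/cpuinfo). Every logical CPU has its own "microcode : 0x..."
# line, so sort -u collapses them to the distinct values.
microcode_revisions() {
    awk -F': *' '/^microcode/ { print $2 }' "${1:-/proc/cpuinfo}" | sort -u
}
```

On a healthy system this normally prints a single revision, since all cores should report the same one. `dmesg | grep -i microcode` shows the kernel's own microcode messages from boot.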

Edited by xokia

The BIOS update did not resolve the issue, so I'm moving on to a single 32GB RAM stick. Will see if it stays up. If this fails, I'll swap the RAM sticks.

 

*edit*

With the first RAM stick alone I still get crashes. Trying the second RAM stick only.

This is so frustrating because this system is fairly basic. I'm not using any add-in cards.

 

Typically segfaults are program errors rather than RAM errors, but anything is worth trying. 😟
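One rough rule of thumb for weighing a program bug against bad hardware: if the kernel's `segfault at` lines always name the same program, a software bug is more likely; if many unrelated programs crash, that points back at hardware. A minimal sketch, assuming the syslog contains the kernel's usual `name[pid]: segfault at ...` lines; `segfaults_by_process` is a hypothetical helper name.

```shell
# Count kernel "segfault at" reports per process name in a saved syslog.
# Assumes lines like: "kernel: dockerd[1234]: segfault at 0 ip ... sp ... error 4".
# For a name containing spaces, only the last word before "[pid]" is kept.
segfaults_by_process() {
    grep -o '[^ ]*\[[0-9]*\]: segfault at' "$1" \
        | sed 's/\[[0-9]*\]: segfault at$//' \
        | sort | uniq -c | sort -rn
}
```

For example, `segfaults_by_process syslog_9_22_2023_dimm2.txt` prints one count per program, most frequent first.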

Edited by xokia

Both RAM sticks were run individually and it still crashed, so it can't be the memory. Are there any additional debug switches that can be turned on to give us some clue as to what is causing the constant crashes?
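Two kernel settings that control how much gets logged when a process dies are `debug.exception-trace` (the `segfault at ...` lines on x86, normally on by default) and `kernel.print-fatal-signals` (off by default; when on, the kernel logs extra detail, including register contents, for fatal signals). A minimal sketch that reads their current values; `show_debug_knobs` is a hypothetical helper name, and the `/proc/sys` root is a parameter only so it can be tested against a copy.

```shell
# Print the current value of kernel settings that control how much is logged
# when a process is killed by a signal. Root directory defaults to /proc/sys.
show_debug_knobs() {
    root="${1:-/proc/sys}"
    for knob in debug/exception-trace kernel/print-fatal-signals; do
        if [ -r "$root/$knob" ]; then
            printf '%s = %s\n' "$knob" "$(cat "$root/$knob")"
        else
            printf '%s = (not available)\n' "$knob"
        fi
    done
}
```

As root, `sysctl -w kernel.print-fatal-signals=1` turns the second one on until the next reboot.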

 

I broke it down to the bare bones: removed the NVMe, removed the SSD, and left only one 32GB RAM stick. What's left is 3 HDDs, which all work, and an i9-13900 (the CPU has already been swapped and the problem remains).

 

I always seem to get the following right before the crash:

Sep 22 20:22:27 Home-Server kernel: __vm_enough_memory: pid: 9429, comm: dockerd, no enough memory for the allocation
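That message comes from the kernel's memory-commit check: an allocation request was refused up front. How strict that check is depends on `vm.overcommit_memory` (0 is the default heuristic that only refuses obvious overcommits, 2 is strict accounting against `CommitLimit`). Whether it is connected to the crashes isn't established here, but a minimal sketch for checking the current policy and commit totals might look like this; `commit_status` is a hypothetical helper name, and the file paths are parameters only so it can be tested against copies.

```shell
# Print the overcommit policy and the kernel's commit accounting totals.
# Policy values: 0 = heuristic (default), 1 = always allow, 2 = strict (CommitLimit).
commit_status() {
    meminfo="${1:-/proc/meminfo}"
    mode_file="${2:-/proc/sys/vm/overcommit_memory}"
    printf 'overcommit_memory=%s\n' "$(cat "$mode_file")"
    awk '/^(CommitLimit|Committed_AS):/ { print $1, $2, $3 }' "$meminfo"
}
```

If the mode is 0, a refusal in recent kernels suggests a single unusually large request rather than memory simply running low; in mode 2, compare `Committed_AS` with `CommitLimit`.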

 

I currently have just Unraid and Plex Media Server running. I suppose I can try shutting down Plex and just have Unraid running.
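Before shutting services down one at a time, it may be worth checking whether the refusals always come from the same process. A minimal sketch that counts the `__vm_enough_memory` lines per `comm` name in a saved syslog; `vm_refusals` is a hypothetical helper name, and the pattern assumes the line format shown above.

```shell
# Count "__vm_enough_memory" refusals per process (comm) name in a saved syslog.
# Assumes lines like: "__vm_enough_memory: pid: 9429, comm: dockerd, ...".
vm_refusals() {
    grep -o '__vm_enough_memory: pid: [0-9]*, comm: [^,]*' "$1" \
        | sed 's/.*comm: //' \
        | sort | uniq -c | sort -rn
}
```

If every line names `dockerd`, stopping Plex alone may not change much; if several programs appear, the pattern is less specific to Docker.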

 


 

Attached: syslog_9_22_2023_dimm2.txt

Edited by xokia
12 hours ago, JorgeB said:

One more thing you can try is to boot the server in safe mode with Docker and VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning on the other services one by one.

Are there additional debug switches we can enable to catch it?

 

The hardware is:

CPU (swapped): highly unlikely that two CPUs would exhibit the same issue.

RAM (swapped): also increasingly unlikely that both sticks would exhibit the same issue. MemTest passes no matter how long I run it.

 

ASUS motherboard and onboard peripherals: the only thing I can't swap.

I have disabled as much as I can in the BIOS; the only things I can't disable are the GPU and the LAN. I can try disabling IOMMU. Maybe ASUS has a bug in the BIOS?
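After disabling IOMMU (VT-d) in the BIOS, one quick way to confirm Linux no longer uses it is to look for IOMMU groups, which as far as I know only exist when the IOMMU is active. A minimal sketch; `iommu_status` is a hypothetical helper name and the directory is a parameter only so it can be tested against a copy.

```shell
# Report whether the kernel has created IOMMU groups (a sign the IOMMU is active).
# Each group is a numbered directory under /sys/kernel/iommu_groups.
iommu_status() {
    groups_dir="${1:-/sys/kernel/iommu_groups}"
    count=$(find "$groups_dir" -mindepth 1 -maxdepth 1 -type d 2>/dev/null | wc -l | tr -d ' ')
    if [ "$count" -gt 0 ]; then
        echo "IOMMU active: $count groups"
    else
        echo "IOMMU inactive (no groups found)"
    fi
}
```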
