September 3, 2021

Hi,

Over the last few weeks I've been having random crashes that have become steadily more frequent. What started as perhaps one crash every two weeks became one every week, and is now virtually one every day.

I've run a memtest, which sadly found no errors. I initially removed the NVMe cache drives temporarily, because during some of the crashes I could only write to the HDDs and not to the NVMe drives until I did a hard reset, but that made no difference. I've upgraded to 6.10rc1 because of the macvlan crash; I originally had host access to containers allowed (that is now disabled and I'm using ipvlan). The voltages shown in IPMI seem fine, so I don't believe a power supply fault is developing.

At this point I'm stumped. The hardware (other than the hard drives) is pretty new too, and it was reliable for quite some time. I've now enabled mirroring of the syslog to flash, and I've attached a snippet of what I was able to retrieve before I needed to hard reset again. Next time it crashes I will be able to get a full syslog.

I'm hoping someone has an idea of what might be causing this problem, beyond "it could be the motherboard, CPU, memory, hard drives or power supply", which sadly doesn't narrow things down much. Let me know if you have any questions. Thanks in advance.

Basic summary of specs:
AMD Threadripper Pro 3995WX 64-Core CPU
512GB DDR4 ECC RDIMM (64GB x 8 at 3200MHz), Kingston Server Premier
ASUS WRX80-E SAGE Wifi Motherboard
Corsair AXi 1200 PSU
ASUS ROG 1080Ti OC GPU
Samsung 970 Pro 512GB NVMe x 4
Western Digital Red NAS drives for general storage and parity, 2 x 10TB and 3 x 4TB

EDIT: It looks like lowering the memory clock speed (below the speed that Kingston's website officially lists as compatible with my motherboard), disabling Global C-states Control, or both, has solved the instability. I haven't had any issues since changing those settings. Thanks for the help! Fingers crossed it stays this way.
Attachments: unraid_syslog_snippet.txt, tower-diagnostics-20210901-1659.zip
Edited September 10, 2021 by Ixel
Possibly solved
September 3, 2021 (Author)

1 minute ago, ChatNoir said:
Hello, did you check this part of the FAQ?

Hi,

Thanks for replying. I did not, sorry. I have just read it now though and will see what I can find in the BIOS related to that and make the appropriate changes. I'll let you know how it goes, thanks! 👍
September 3, 2021

2 minutes ago, Ixel said:
Thanks for replying. I did not, sorry. I have just read it now though and will see what I can find in the BIOS related to that and make the appropriate changes. I'll let you know how it goes, thanks! 👍

Also consider memory speed. I glanced at it and it seems you have 8x dual rank DIMMs. Not sure you are running it at a speed supported by the memory controller.
September 3, 2021 (Author)

39 minutes ago, ChatNoir said:
Also consider memory speed. I glanced at it and it seems you have 8x dual rank DIMMs. Not sure you are running it at a speed supported by the memory controller.

According to Kingston it should be supported; however, I've now manually set them to 2666. Global C-states Control is now disabled too. Fingers crossed it solves the problem.
September 3, 2021

1 minute ago, Ixel said:
According to Kingston it should be supported,

Kingston is the memory module maker. The limiting factor is the motherboard and CPU, not the memory.