December 23, 2019

Hi all,

I've been troubleshooting some random freezes that I cannot replicate, and I would appreciate any advice on what to do next.

Background
- These random freezes started while I was on vacation (meaning I hadn't made any changes to the system when they began)
- The freezes started on 6.7.2
- I wasn't too concerned at first because I knew 6.8.0 was around the corner and I was planning some major holiday hardware upgrades

What I upgraded since the crashes started
- Upgraded from 6.7.2 → 6.8.0
- Upgraded and replaced all of my data and cache drives
- Upgraded my processor from AMD Ryzen 1600 to AMD Ryzen 2700X
- Replaced my Unraid flash drive (just because the previous one was getting old)

What has stayed the same
- My motherboard
- My memory (4 x 8GB DIMMs)

Tests I have run (still getting random freezes)
- Disabling VMs and Docker (through the options in the Web GUI)
- MemTest (passed; see attached)
- Upgraded the BIOS (to the latest version ASRock recommends for my CPU)
- Reset all BIOS settings
- Enabled/disabled C6 states (for AMD Ryzen processors)

What's strange
- I cannot replicate the issue
- This all started when I was not making any hardware or software changes
- Sometimes there are no errors at the time of the freeze; other times I get a lot of messages
- The RIP error values are not always the same (compare my last syslog error with the attached console errors)

The latest error in my syslog is:

kernel: RIP: 0010:get_page_from_freelist+0x252/0xd0b

Attached are my diagnostics and my syslog (I've been recording it to the flash drive ever since I upgraded my processor).

I would greatly appreciate any other perspectives on what to try next. Thanks in advance!

Attachments: syslog, zeppelin-diagnostics-20191223-0842.zip

Edited July 13, 2020 by vorel: Solved the issue
December 23, 2019 (Author)

Just wanted to share another update. I cleared all of my syslogs and it failed again within ~30 minutes. Attached are a fresh syslog and a picture of the console.

Attachment: syslog.zip

Edited December 23, 2019 by vorel
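For anyone comparing RIP values across several freezes, as described in the first post, here is a minimal Python sketch that tallies which kernel symbols show up in `RIP:` lines of a saved syslog. The function names and the default file name `syslog` are assumptions for illustration, and the regular expression is modeled only on the single line quoted in this thread (`RIP: 0010:get_page_from_freelist+0x252/0xd0b`), so other kernels may format the line differently.

```python
import re
from collections import Counter

# Modeled on the line quoted in this thread:
#   kernel: RIP: 0010:get_page_from_freelist+0x252/0xd0b
# Captures only the symbol name between the segment and the offset.
RIP_PATTERN = re.compile(
    r"RIP: [0-9a-fA-F]{4}:(?P<symbol>[A-Za-z0-9_.]+)\+0x[0-9a-fA-F]+/0x[0-9a-fA-F]+"
)


def count_rip_symbols(lines):
    """Count how often each kernel symbol appears in RIP lines."""
    counts = Counter()
    for line in lines:
        match = RIP_PATTERN.search(line)
        if match:
            counts[match.group("symbol")] += 1
    return counts


def summarize(path="syslog"):
    """Print the symbols found in the file at `path`, most frequent first.

    The default path is a placeholder; point it at your own syslog copy.
    """
    with open(path, errors="replace") as handle:
        for symbol, count in count_rip_symbols(handle).most_common():
            print(f"{count:5d}  {symbol}")
```

Calling `summarize("syslog")` on an extracted log prints one line per symbol. If the same symbol dominates, a specific driver or code path may be worth a look; if the symbols are scattered, that is often (though not always) a hint toward hardware such as memory.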
July 13, 2020 (Author)

It ended up being a memory issue with my server. I was running 4 DIMMs: 2 that I purchased in 2017 and 2 more that I purchased in 2018. It ran fine for a year until I had the issues. The strange thing was that testing each stick individually with Memtest would pass, but with all four DIMMs installed it would fail. I contacted my motherboard manufacturer (ASRock), and they said that even though I had the EXACT same part number for the memory in all four slots, the problem was that I hadn't bought all of the memory together. I ordered new memory and my server hasn't crashed at all since.

Update your BIOS and try different memory if you have it. Good luck! Hope this helps you.
July 13, 2020

3 minutes ago, vorel said: It ended up being a memory issue with my server. [...] Update your BIOS and try different memory if you have it.

My BIOS is updated to the latest version and my memory was all bought at the same time: 4 x 16GB sticks. The longest I've had it run was about 6 hours before it hung completely and needed a reboot. I've done all the tweaks to the BIOS and config files but still can't get away from it. I'll try removing the sticks and see where that gets me.
July 13, 2020 (Author)

You can try one stick at a time to see if that exposes any clues. What I learned was that when I tested one stick at a time, Memtest would pass. It wasn't until I ran all four sticks together that I had an issue. I was very skeptical when ASRock told me to get new memory, but it hasn't crashed since I replaced it.
July 13, 2020

1 minute ago, vorel said: You can try one stick at a time to see if that exposes any clues. [...]

Actually, I looked at my mobo's QVL and my RAM sticks aren't listed, so I went ahead and ordered specific sticks that are listed as working with it and AMD. Hoping that clears up the issue. Thanks for your input, I wouldn't have thought about that lol