vorel Posted December 23, 2019 (edited)

Hi all,

I've been troubleshooting some random freezes that I cannot replicate and I would appreciate any other advice on what to do next.

Background
- These random freezes started happening while I was on vacation (meaning that I didn't make any changes to the system when it started happening)
- These freezes started on 6.7.2
- I wasn't too concerned yet because I knew that 6.8.0 was around the corner and I was doing some major holiday hardware upgrades soon

What I upgraded since crashes started happening
- I upgraded from 6.7.2 → 6.8.0
- I upgraded & replaced all of my data & cache drives
- I upgraded my processor from AMD Ryzen 1600 to AMD Ryzen 2700X
- I replaced my Unraid flash drive (just because the previous one was getting old)

What has stayed the same
- My motherboard
- My memory (8GB x 4 DIMMs)

Tests I have run (but still have random freezes)
- Disabling VMs and Docker (through the options on Web GUI)
- MemTest (passed -- see attached)
- Upgraded BIOS (to latest recommended for my CPU version by ASRock)
- Resetting all BIOS settings
- Enabled/disabled C6 states (for AMD Ryzen processors)

What's strange
- I cannot replicate the issue
- This all started happening when I was not making any hardware or software changes
- Sometimes there are no errors at the time of the freeze, other times I get a lot of messages
- The RIP error values are not always the same (see my last syslog error and compare it to my actual attached console errors)

The latest error that I received in my syslog is:

kernel: RIP: 0010:get_page_from_freelist+0x252/0xd0b

Attached are my diagnostics and my syslog (been recording it to the flash drive ever since I upgraded my processor). I would greatly appreciate any other perspectives on what to try next.

Thanks in advance!

Attachments: syslog, zeppelin-diagnostics-20191223-0842.zip

Edited July 13, 2020 by vorel: Solved the issue
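For anyone comparing RIP values across freezes the way the post above describes, here is a minimal Python sketch that tallies the symbol named in each `kernel: RIP:` line of a saved syslog. The function names and the regular expression are illustrative assumptions, not part of Unraid or its diagnostics tooling, and the pattern only covers lines shaped like the one quoted above. Faults that land in many different functions are often taken as a hint toward hardware rather than a single software bug, but that is not proof on its own.

```python
import re
import sys
from collections import Counter

# Assumed line shape, based on the one line quoted in the post:
#   kernel: RIP: 0010:get_page_from_freelist+0x252/0xd0b
RIP_PATTERN = re.compile(
    r"kernel: RIP: [0-9a-fA-F]{4}:(?P<symbol>[\w.]+)\+0x[0-9a-fA-F]+/0x[0-9a-fA-F]+"
)


def count_rip_symbols(lines):
    """Count how often each faulting symbol appears in the given log lines."""
    counts = Counter()
    for line in lines:
        match = RIP_PATTERN.search(line)
        if match:
            counts[match.group("symbol")] += 1
    return counts


def count_rip_symbols_in_file(path):
    """Read a syslog file (hypothetical helper) and tally its RIP symbols."""
    with open(path, errors="replace") as handle:
        return count_rip_symbols(handle)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the path to a saved syslog as the first argument.
    for symbol, hits in count_rip_symbols_in_file(sys.argv[1]).most_common():
        print(f"{hits:5d}  {symbol}")
```

Run it with the path to a saved syslog as the only argument. Lines that do not match the assumed shape are skipped silently, so an empty result does not mean the log is clean.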
vorel (Author) Posted December 23, 2019 (edited)

Just wanted to share another update. I just cleared all of my syslogs and it failed again within ~30 minutes. Attached is fresh syslog and picture of console.

Attachment: syslog.zip

Edited December 23, 2019 by vorel
Furby8704 Posted July 13, 2020

Did you resolve it? I just did a 3950X build and am getting the crashes.
vorel (Author) Posted July 13, 2020

It ended up being a memory issue with my server. I was running 4 DIMMs: I purchased 2 of them in 2017 and 2 more in 2018. It ran fine for a year until I had the issues. The strange thing was that testing each stick individually with Memtest would pass. If I put all four DIMMs in there, it would fail. I contacted my motherboard manufacturer (ASRock), and they said that even though I had the EXACT same part number for memory in all four slots, I had the problem because I didn't buy the memory all together. I ordered new memory and my server hasn't crashed at all since.

Update your BIOS and try different memory if you have it. Good luck! Hope this helps you.
Furby8704 Posted July 13, 2020

The BIOS is updated to the latest and the memory was bought at the same time: 4x16GB sticks. The longest I've had it run was about 6 hrs before it completely hung and needed a reboot. I've done all the tweaks to BIOS and config files but still can't get away from it. I'll try removing the sticks and see where that gets me.
vorel (Author) Posted July 13, 2020

You can try one stick at a time to see if that exposes any clues. What I learned was that when I tested one stick at a time, Memtest would pass. It wasn't until I was running all four sticks that I had an issue. I was very skeptical when ASRock told me to get new memory, but it hasn't crashed since I replaced it.
Furby8704 Posted July 13, 2020

Actually, I looked at my mobo QVL and my RAM sticks aren't listed, so I went ahead and placed an order for specific sticks that work with it and AMD. Hoping that clears up the issue. Thanks for your input. I wouldn't have thought about that lol