March 16, 2020
Hi all, lately my server has started behaving strangely: it freezes without any warning. I checked the logs but did not see any problem. The server has an out-of-band NIC so that I can connect to it, control it remotely and reboot it, but when the server freezes the IPMI interface is unreachable as well. You can find the diagnostics file attached to this post. Thanks. Denis
Attachment: ket-diagnostics-20200316-1113.zip
March 16, 2020
Unraid_Noob said: "When the server freezes the ipmi interface is also unreachable."
With that symptom, the first things I would investigate are CPU cooling, PSU issues and memory issues. If it's hanging so hard that the IPMI is dead, it's almost certainly hardware.
March 17, 2020 (thread author)
Dear Jonathan, thank you for your comment. It helped me a lot with the investigation. By checking the IPMI events I managed to find the same error before every freeze. Here is the extract: "742 | 03/16/2020, 19:11:49 | CPU_CATERR | Processor | State Asserted - Asserted". By googling, I found that CATERR stands for "catastrophic error". I read this article, which explains the error-handling process pretty well. Now I have to find out what is causing my problem. Denis
(Edited March 17, 2020 by Unraid_Noob)
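If the event log can be copied out as text, a small script makes it easier to list every CPU_CATERR event and the time between them. This is a minimal sketch, assuming each event is one line with pipe-separated fields laid out like the extract quoted above (id | date, time | sensor | type | description). That layout is an assumption: BMC web interfaces and tools such as ipmitool format the event log differently, so the parsing would need adjusting for other exports. The function names are hypothetical.

```python
from datetime import datetime, timedelta


def caterr_times(log_text: str) -> list[datetime]:
    """Return the timestamps of CPU_CATERR events, oldest first.

    Assumes (hypothetically) one event per line with pipe-separated fields:
    id | MM/DD/YYYY, HH:MM:SS | sensor | type | description
    """
    times = []
    for line in log_text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        # Skip blank lines and events from any other sensor.
        if len(fields) < 3 or fields[2] != "CPU_CATERR":
            continue
        times.append(datetime.strptime(fields[1], "%m/%d/%Y, %H:%M:%S"))
    return sorted(times)


def gaps(times: list[datetime]) -> list[timedelta]:
    """Time elapsed between each pair of consecutive CATERR events."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    # The first line is the extract from this thread; the second line is a
    # hypothetical later event, included only to demonstrate a gap.
    sample = (
        "742 | 03/16/2020, 19:11:49 | CPU_CATERR | Processor | State Asserted - Asserted\n"
        "801 | 03/17/2020, 02:05:13 | CPU_CATERR | Processor | State Asserted - Asserted\n"
    )
    events = caterr_times(sample)
    for when, gap in zip(events[1:], gaps(events)):
        print(when, "after", gap)
```

A list of freeze times and intervals like this is also easier to hand to a vendor's support team than screenshots of individual events.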
July 9, 2020 (thread author)
Hi all, I contacted ASRock support to get an answer about my CPU_CATERR error. After several weeks of investigation and trial and error, they seem to point to a problem with the OS.
To summarize the problem: I experience random server freezes with a CPU_CATERR error in the IPMI logs, and the only way out is a hard reset of the server. Another interesting aspect is that when the problem occurs, the switch my server is connected to goes haywire and blocks all of its other ports as well, so every other host connected to that switch becomes unreachable. After the hard reset everything returns to normal.
I tried:
- Removing all the PCI cards
- Removing all the RAM modules except one
- Replacing the RAM modules with another brand
- Updating the BIOS with a special version provided by ASRock
- Replacing the PSU with another one
None of this helped. Sometimes the server runs for weeks without any issues, but on other occasions it freezes two or three times in the same day.
What can I do to prove to ASRock that the problem is the motherboard itself and nothing else? Thank you for your help. Denis
August 20, 2020
I have a similar issue. My dual-socket motherboard has onboard error LEDs, and when these errors happen the LEDs point at the second CPU. I still have some troubleshooting to do (reseat the CPU and the RAM, and if that fails, swap the CPUs between the sockets), but it is hard when the system keeps running without errors for months even though I did not change anything.
August 26, 2020 (thread author)
Hi NeoJoris, I'm afraid I don't have good news for you. After weeks of investigation and testing I ended up requesting a replacement board, which was granted, and I just received it. Because the investigation took so long, I bought a Supermicro board, which will be my permanent motherboard. I will resell the ASRock one. Cheers, Denis
January 20, 2022
Hi, I got the same freezes and CAT_ERR errors out of nowhere. Sometimes it runs stable, sometimes not. I also have a dual-socket server board, an ASUS Z10PA-D8. Did you solve the problem?