DoeBoye

False temp alerts and fan Issues - Xeon E5-25x0 v1

False CPU Sensor Events and Max Fan Speeds  



This is not just an ASRock issue.  With an Intel 2600CP in an Intel P4000 chassis I get the following log spamming.  Dynamix System Temperature on the unRaid dashboard is reporting CPU 100 C and motherboard 81 C.  I'll use this Dynamix CPU temp (the first of the two shown) to show how rogue Windows VMs are mostly to blame.

 

At this time the system should have been idling early in the morning.  This is with 3 Win10 VMs running, plus pfSense and a Transmission docker.  The unRaid dashboard was reporting that 2 of the 3 VMs were spiking over 50% utilization of their allocated CPU cores.  Investigating the VMs showed that Firefox and Chrome were pulling a lot of CPU.  One browser was sitting at a YouTube page, and the other was sitting at the unRaid dashboard.  Closing those browser tabs helped a bit, bringing CPU temp down to about 90 C.  Killing the worst offending VM brought CPU temp down to 74 C.  Killing the Transmission docker dropped the temp to 73 C, killing pfSense didn't change anything, and killing the second Win10 VM also had minimal impact, but now Dynamix is showing CPU 70 C and motherboard 58 C.  This seems to be about the baseline for this box with no dockers or VMs running.

Jul 11 08:48:39 Tower99 kernel: CPU30: Package temperature above threshold, cpu clock throttled (total events = 42758769)
Jul 11 08:48:39 Tower99 kernel: CPU30: Package temperature/speed normal
Jul 11 08:48:41 Tower99 kernel: CPU30: Core temperature above threshold, cpu clock throttled (total events = 40455975)
Jul 11 08:48:41 Tower99 kernel: CPU30: Core temperature/speed normal
Jul 11 08:53:35 Tower99 kernel: CPU14: Core temperature/speed normal
Jul 11 08:53:35 Tower99 kernel: CPU28: Package temperature/speed normal
Jul 11 08:53:35 Tower99 kernel: CPU12: Package temperature/speed normal
Jul 11 08:53:35 Tower99 kernel: CPU14: Package temperature/speed normal
Jul 11 08:53:35 Tower99 kernel: CPU24: Package temperature/speed normal
Jul 11 08:53:35 Tower99 kernel: CPU8: Package temperature/speed normal
Jul 11 08:53:35 Tower99 kernel: CPU25: Package temperature/speed normal
Jul 11 08:53:35 Tower99 kernel: CPU9: Package temperature/speed normal
Jul 11 08:53:35 Tower99 kernel: CPU10: Package temperature/speed normal
Jul 11 08:53:35 Tower99 kernel: CPU26: Package temperature/speed normal
Jul 11 08:53:35 Tower99 kernel: CPU11: Package temperature/speed normal
Jul 11 08:53:35 Tower99 kernel: CPU27: Package temperature/speed normal
Jul 11 08:53:35 Tower99 kernel: CPU29: Package temperature/speed normal
Jul 11 08:53:35 Tower99 kernel: CPU13: Package temperature/speed normal
Jul 11 08:53:35 Tower99 kernel: CPU15: Package temperature/speed normal
Jul 11 08:53:35 Tower99 kernel: CPU31: Package temperature/speed normal
Jul 11 08:53:39 Tower99 kernel: CPU30: Package temperature/speed normal
Jul 11 08:53:41 Tower99 kernel: CPU30: Core temperature above threshold, cpu clock throttled (total events = 40479777)
Jul 11 08:53:41 Tower99 kernel: CPU30: Core temperature/speed normal
Jul 11 08:58:35 Tower99 kernel: CPU14: Core temperature above threshold, cpu clock throttled (total events = 40504114)
Jul 11 08:58:35 Tower99 kernel: CPU12: Package temperature above threshold, cpu clock throttled (total events = 42808178)
Jul 11 08:58:35 Tower99 kernel: CPU28: Package temperature above threshold, cpu clock throttled (total events = 42808195)
Jul 11 08:58:35 Tower99 kernel: CPU24: Package temperature above threshold, cpu clock throttled (total events = 42808325)
Jul 11 08:58:35 Tower99 kernel: CPU8: Package temperature above threshold, cpu clock throttled (total events = 42808322)
Jul 11 08:58:35 Tower99 kernel: CPU9: Package temperature above threshold, cpu clock throttled (total events = 42808346)
Jul 11 08:58:35 Tower99 kernel: CPU25: Package temperature above threshold, cpu clock throttled (total events = 42808351)
Jul 11 08:58:35 Tower99 kernel: CPU10: Package temperature above threshold, cpu clock throttled (total events = 42808345)
Jul 11 08:58:35 Tower99 kernel: CPU26: Package temperature above threshold, cpu clock throttled (total events = 42808348)
Jul 11 08:58:35 Tower99 kernel: CPU11: Package temperature above threshold, cpu clock throttled (total events = 42808337)
Jul 11 08:58:35 Tower99 kernel: CPU27: Package temperature above threshold, cpu clock throttled (total events = 42808352)
Jul 11 08:58:35 Tower99 kernel: CPU29: Package temperature above threshold, cpu clock throttled (total events = 42808357)
Jul 11 08:58:35 Tower99 kernel: CPU13: Package temperature above threshold, cpu clock throttled (total events = 42808349)
Jul 11 08:58:35 Tower99 kernel: CPU31: Package temperature above threshold, cpu clock throttled (total events = 42808339)
Jul 11 08:58:35 Tower99 kernel: CPU15: Package temperature above threshold, cpu clock throttled (total events = 42808334)
Jul 11 08:58:35 Tower99 kernel: CPU28: Package temperature/speed normal
Jul 11 08:58:35 Tower99 kernel: CPU12: Package temperature/speed normal
Jul 11 08:58:35 Tower99 kernel: CPU8: Package temperature/speed normal
Jul 11 08:58:35 Tower99 kernel: CPU24: Package temperature/speed normal
Jul 11 08:58:35 Tower99 kernel: CPU25: Package temperature/speed normal
Jul 11 08:58:35 Tower99 kernel: CPU9: Package temperature/speed normal
Jul 11 08:58:35 Tower99 kernel: CPU26: Package temperature/speed normal
Jul 11 08:58:35 Tower99 kernel: CPU10: Package temperature/speed normal
Jul 11 08:58:35 Tower99 kernel: CPU27: Package temperature/speed normal
Jul 11 08:58:35 Tower99 kernel: CPU11: Package temperature/speed normal
Jul 11 08:58:35 Tower99 kernel: CPU13: Package temperature/speed normal
Jul 11 08:58:35 Tower99 kernel: CPU29: Package temperature/speed normal
Jul 11 08:58:35 Tower99 kernel: CPU15: Package temperature/speed normal
Jul 11 08:58:35 Tower99 kernel: CPU31: Package temperature/speed normal
Jul 11 08:58:35 Tower99 kernel: CPU14: Core temperature/speed normal
Jul 11 08:58:39 Tower99 kernel: CPU30: Package temperature above threshold, cpu clock throttled (total events = 42808190)
Jul 11 08:58:39 Tower99 kernel: CPU30: Package temperature/speed normal
Jul 11 08:58:41 Tower99 kernel: CPU30: Core temperature above threshold, cpu clock throttled (total events = 40504573)
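To see which cores account for most of the spam in a log like the one above, the throttle lines can be tallied per CPU. The following is only a rough sketch: the function name `summarize_throttle_events` and the regular expression are my own, written against the message format shown in this log, and other kernel versions may word the message differently.

```python
import re
from collections import Counter

# Pattern written against lines such as:
#   "... kernel: CPU30: Package temperature above threshold,
#    cpu clock throttled (total events = 42758769)"
# It is an assumption that other kernels use the same wording.
THROTTLE_RE = re.compile(
    r"kernel: CPU(?P<cpu>\d+): (?P<scope>Core|Package) temperature above threshold, "
    r"cpu clock throttled \(total events = (?P<total>\d+)\)"
)


def summarize_throttle_events(lines):
    """Count throttle messages per (cpu, scope) and keep the last reported total."""
    counts = Counter()
    last_total = {}
    for line in lines:
        match = THROTTLE_RE.search(line)
        if match is None:
            # "temperature/speed normal" and unrelated lines are skipped.
            continue
        key = (int(match.group("cpu")), match.group("scope"))
        counts[key] += 1
        last_total[key] = int(match.group("total"))
    return counts, last_total


if __name__ == "__main__":
    # Three lines copied from the log above, plus one "normal" line.
    sample = [
        "Jul 11 08:48:39 Tower99 kernel: CPU30: Package temperature above threshold, cpu clock throttled (total events = 42758769)",
        "Jul 11 08:48:39 Tower99 kernel: CPU30: Package temperature/speed normal",
        "Jul 11 08:48:41 Tower99 kernel: CPU30: Core temperature above threshold, cpu clock throttled (total events = 40455975)",
        "Jul 11 08:53:41 Tower99 kernel: CPU30: Core temperature above threshold, cpu clock throttled (total events = 40479777)",
    ]
    counts, totals = summarize_throttle_events(sample)
    for key in sorted(counts):
        print(key, "messages:", counts[key], "last total:", totals[key])
```

Feeding it the full syslog (for example, the lines of the file read with `open(...)`) would give a per-core message count, and the difference between the first and last "total events" value for a core shows how many throttle events the kernel counted in between.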

 


No. You're wrong. This is indeed an ASRock issue.


The baseline for the system is around 40 C.  The spikes up to 100 C are the extreme error. Look at the pictures attached earlier in this thread and in the other related threads to see.
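As a rough illustration of the difference between a real temperature rise and a bogus spike, a logged series of readings can be checked for jumps that are too large to be physical between two samples. This is a sketch only: the function name `flag_implausible_jumps` and the 30 C step limit are assumptions of mine, not values taken from the board, the BMC, or any tool.

```python
def flag_implausible_jumps(readings_c, max_step_c=30.0):
    """Return indices of readings that jump more than max_step_c (an assumed
    limit) away from the last reading that was accepted as plausible."""
    flagged = []
    last_good = None
    for index, temp in enumerate(readings_c):
        if last_good is not None and abs(temp - last_good) > max_step_c:
            # Too far from the last accepted value: treat it as a suspect sample
            # and keep comparing later readings against the old accepted value.
            flagged.append(index)
            continue
        last_good = temp
    return flagged


if __name__ == "__main__":
    # Made-up series: a baseline near 40 C with one sudden spike to 100 C.
    print(flag_implausible_jumps([40, 41, 100, 42, 43]))
```

One limitation to keep in mind: if the temperature genuinely climbs and stays high, every later reading is flagged as well, so this only helps to spot isolated spikes against a stable baseline.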


I'm having similar issues. The CPU fans go to full speed. The BMC firmware was 0.18.0 but I upgraded it to 0.19.0 (the latest). I also upgraded the BIOS to the latest version (1.9) and performed a clear CMOS as recommended by an ASRock tech. What I don't understand is that the machine worked fine for 4 months with no changes and then out of the blue this issue occurred. I'm guessing the sensors' tolerances change over time and cause this problem.

Have also tried restoring the BMC defaults then re-configuring. Works fine for a while and then at any given time the fans will kick into high gear and stay there until a reboot or two.

Have emailed ASRock again about an hour ago to see what they can recommend. The other issue is that the board is under warranty and the supplier said they would refund it or send a replacement, but if it's a generic issue with the board I would have thought it's bound to happen again.

 

I have two Intel Xeon E5-2670s installed.


I have had this motherboard since 9-1-2013 and was one of the first to post about this problem in the Newegg comments; my post was under the user name James G.  I spent a few months going around with ASRock about it, and as always their response was that if it was the board then everyone would have the problem.  What they fail to see is that not everyone monitors IPMI issues.  I use mine on vSphere, so I do monitor these events.  So far the board has not died, but I sure would have liked to be able to fully monitor the health of my server.

But yeah, mine works fine after a reboot, but after an hour or two the CPU temp on both goes completely nuts.

 

Capture.JPG

Edited by reefcrazed

So basically you guys are seeing a bug that is an insane 5 years old now.  The board has not changed price in all that time either, I paid $319.99 plus tax for it back in 2013.

Edited by reefcrazed
