False temp alerts and fan Issues - Xeon E5-25x0 v1


False CPU Sensor Events and Max Fan Speeds  


This is not just an ASRock issue. With an Intel 2600CP in an Intel P4000 chassis I get the following log spamming. The Dynamix System Temperature readout on the unRaid dashboard is reporting CPU 100 C and motherboard 81 C. I'll use this Dynamix CPU temp (the first of the two shown) to show how rogue Windows VMs are mostly to blame.

 

At this time the system should have been idling early in the morning. This is with 3 Win10 VMs running, plus pfSense and a Transmission docker. The unRaid dashboard was reporting that 2 of the 3 VMs were flashing over 50% utilization of their allocated CPU cores. Investigating the VMs showed that Firefox and Chrome were pulling a lot of CPU inside them. One was sitting at a YouTube page, and the other was sitting at the unRaid dashboard. Closing those browser tabs helped a bit, bringing the CPU temp down to about 90 C. Killing the worst offending VM brought the CPU temp down to 74 C. Killing Transmission dropped it to 73 C, killing pfSense didn't change anything, and killing the second Win10 VM also had minimal impact, but now Dynamix is showing CPU 70 C and motherboard 58 C. This seems to be about the baseline for this box with no dockers or VMs running.

Jul 11 08:48:39 Tower99 kernel: CPU30: Package temperature above threshold, cpu clock throttled (total events = 42758769)
Jul 11 08:48:39 Tower99 kernel: CPU30: Package temperature/speed normal
Jul 11 08:48:41 Tower99 kernel: CPU30: Core temperature above threshold, cpu clock throttled (total events = 40455975)
Jul 11 08:48:41 Tower99 kernel: CPU30: Core temperature/speed normal
Jul 11 08:53:35 Tower99 kernel: CPU14: Core temperature/speed normal
Jul 11 08:53:35 Tower99 kernel: CPU28: Package temperature/speed normal
Jul 11 08:53:35 Tower99 kernel: CPU12: Package temperature/speed normal
Jul 11 08:53:35 Tower99 kernel: CPU14: Package temperature/speed normal
Jul 11 08:53:35 Tower99 kernel: CPU24: Package temperature/speed normal
Jul 11 08:53:35 Tower99 kernel: CPU8: Package temperature/speed normal
Jul 11 08:53:35 Tower99 kernel: CPU25: Package temperature/speed normal
Jul 11 08:53:35 Tower99 kernel: CPU9: Package temperature/speed normal
Jul 11 08:53:35 Tower99 kernel: CPU10: Package temperature/speed normal
Jul 11 08:53:35 Tower99 kernel: CPU26: Package temperature/speed normal
Jul 11 08:53:35 Tower99 kernel: CPU11: Package temperature/speed normal
Jul 11 08:53:35 Tower99 kernel: CPU27: Package temperature/speed normal
Jul 11 08:53:35 Tower99 kernel: CPU29: Package temperature/speed normal
Jul 11 08:53:35 Tower99 kernel: CPU13: Package temperature/speed normal
Jul 11 08:53:35 Tower99 kernel: CPU15: Package temperature/speed normal
Jul 11 08:53:35 Tower99 kernel: CPU31: Package temperature/speed normal
Jul 11 08:53:39 Tower99 kernel: CPU30: Package temperature/speed normal
Jul 11 08:53:41 Tower99 kernel: CPU30: Core temperature above threshold, cpu clock throttled (total events = 40479777)
Jul 11 08:53:41 Tower99 kernel: CPU30: Core temperature/speed normal
Jul 11 08:58:35 Tower99 kernel: CPU14: Core temperature above threshold, cpu clock throttled (total events = 40504114)
Jul 11 08:58:35 Tower99 kernel: CPU12: Package temperature above threshold, cpu clock throttled (total events = 42808178)
Jul 11 08:58:35 Tower99 kernel: CPU28: Package temperature above threshold, cpu clock throttled (total events = 42808195)
Jul 11 08:58:35 Tower99 kernel: CPU24: Package temperature above threshold, cpu clock throttled (total events = 42808325)
Jul 11 08:58:35 Tower99 kernel: CPU8: Package temperature above threshold, cpu clock throttled (total events = 42808322)
Jul 11 08:58:35 Tower99 kernel: CPU9: Package temperature above threshold, cpu clock throttled (total events = 42808346)
Jul 11 08:58:35 Tower99 kernel: CPU25: Package temperature above threshold, cpu clock throttled (total events = 42808351)
Jul 11 08:58:35 Tower99 kernel: CPU10: Package temperature above threshold, cpu clock throttled (total events = 42808345)
Jul 11 08:58:35 Tower99 kernel: CPU26: Package temperature above threshold, cpu clock throttled (total events = 42808348)
Jul 11 08:58:35 Tower99 kernel: CPU11: Package temperature above threshold, cpu clock throttled (total events = 42808337)
Jul 11 08:58:35 Tower99 kernel: CPU27: Package temperature above threshold, cpu clock throttled (total events = 42808352)
Jul 11 08:58:35 Tower99 kernel: CPU29: Package temperature above threshold, cpu clock throttled (total events = 42808357)
Jul 11 08:58:35 Tower99 kernel: CPU13: Package temperature above threshold, cpu clock throttled (total events = 42808349)
Jul 11 08:58:35 Tower99 kernel: CPU31: Package temperature above threshold, cpu clock throttled (total events = 42808339)
Jul 11 08:58:35 Tower99 kernel: CPU15: Package temperature above threshold, cpu clock throttled (total events = 42808334)
Jul 11 08:58:35 Tower99 kernel: CPU28: Package temperature/speed normal
Jul 11 08:58:35 Tower99 kernel: CPU12: Package temperature/speed normal
Jul 11 08:58:35 Tower99 kernel: CPU8: Package temperature/speed normal
Jul 11 08:58:35 Tower99 kernel: CPU24: Package temperature/speed normal
Jul 11 08:58:35 Tower99 kernel: CPU25: Package temperature/speed normal
Jul 11 08:58:35 Tower99 kernel: CPU9: Package temperature/speed normal
Jul 11 08:58:35 Tower99 kernel: CPU26: Package temperature/speed normal
Jul 11 08:58:35 Tower99 kernel: CPU10: Package temperature/speed normal
Jul 11 08:58:35 Tower99 kernel: CPU27: Package temperature/speed normal
Jul 11 08:58:35 Tower99 kernel: CPU11: Package temperature/speed normal
Jul 11 08:58:35 Tower99 kernel: CPU13: Package temperature/speed normal
Jul 11 08:58:35 Tower99 kernel: CPU29: Package temperature/speed normal
Jul 11 08:58:35 Tower99 kernel: CPU15: Package temperature/speed normal
Jul 11 08:58:35 Tower99 kernel: CPU31: Package temperature/speed normal
Jul 11 08:58:35 Tower99 kernel: CPU14: Core temperature/speed normal
Jul 11 08:58:39 Tower99 kernel: CPU30: Package temperature above threshold, cpu clock throttled (total events = 42808190)
Jul 11 08:58:39 Tower99 kernel: CPU30: Package temperature/speed normal
Jul 11 08:58:41 Tower99 kernel: CPU30: Core temperature above threshold, cpu clock throttled (total events = 40504573)
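
To put a number on the log spam above, here is a minimal Python sketch that pulls the "total events" counter out of the throttle lines and estimates how fast it is climbing. The regular expression is written against the line format shown in this log only, and `parse_throttle_events`, `events_per_second` and `ASSUMED_YEAR` are hypothetical names chosen for illustration (syslog lines carry no year, so a placeholder is used). The three sample lines are the CPU30 core entries copied from the log.

```python
import re
from datetime import datetime

# Matches only the "above threshold" lines, in the format seen in the log above.
THROTTLE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"CPU(?P<cpu>\d+):\s+(?P<kind>Core|Package) temperature above threshold, "
    r"cpu clock throttled \(total events = (?P<total>\d+)\)"
)

# Assumption: syslog timestamps have no year, so a placeholder year is used.
# Only differences between timestamps matter here.
ASSUMED_YEAR = 2000


def parse_throttle_events(lines):
    """Return {(cpu, kind): [(timestamp, total_events), ...]}."""
    events = {}
    for line in lines:
        m = THROTTLE_RE.match(line.strip())
        if m is None:
            continue  # skips "temperature/speed normal" and unrelated lines
        ts = datetime.strptime(f"{ASSUMED_YEAR} {m['ts']}", "%Y %b %d %H:%M:%S")
        key = (int(m["cpu"]), m["kind"])
        events.setdefault(key, []).append((ts, int(m["total"])))
    return events


def events_per_second(samples):
    """Average counter growth between the first and last sample."""
    (t0, n0), (t1, n1) = samples[0], samples[-1]
    elapsed = (t1 - t0).total_seconds()
    return (n1 - n0) / elapsed if elapsed > 0 else 0.0


sample_lines = [
    "Jul 11 08:48:41 Tower99 kernel: CPU30: Core temperature above threshold, cpu clock throttled (total events = 40455975)",
    "Jul 11 08:48:41 Tower99 kernel: CPU30: Core temperature/speed normal",
    "Jul 11 08:53:41 Tower99 kernel: CPU30: Core temperature above threshold, cpu clock throttled (total events = 40479777)",
    "Jul 11 08:58:41 Tower99 kernel: CPU30: Core temperature above threshold, cpu clock throttled (total events = 40504573)",
]

events = parse_throttle_events(sample_lines)
rate = events_per_second(events[(30, "Core")])
print(f"CPU30 core throttle events per second: {rate:.1f}")
```

On these three samples the counter grows by 48598 over ten minutes, roughly 81 throttle events per second, which suggests the kernel is only logging a small fraction of the events it counts.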

 

  • 2 months later...

I'm having similar issues. The CPU fans go to full speed. The BMC firmware was 0.18.0 but I upgraded it to 0.19.0 (the latest). I also upgraded the BIOS to the latest version (1.9) and performed a clear CMOS as recommended by an ASRock tech. What I don't understand is that the machine worked fine for 4 months with no changes and then out of the blue this issue occurred. I'm guessing the sensors' tolerances change over time and cause this problem.

Have also tried restoring the BMC defaults then re-configuring. Works fine for a while and then at any given time the fans will kick into high gear and stay there until a reboot or two.

Have emailed ASRock again about an hour ago to see what they can recommend. The other issue is the board is under warranty and the supplier said they would refund or send a replacement but if it's a generic issue with the board it's bound to happen again I would have thought.

 

I have two Intel Xeon E5 2670's installed.


I have had this motherboard since 9-1-2013 and was one of the first to post about this problem in the Newegg comments; my post was from the user James G. I spent a few months going around with ASRock about it, and as always their response was that if it was the board then everyone would have the problem. What they fail to see is that not everyone monitors IPMI issues. I use mine with vSphere, so I do monitor these events. So far the board has not died, but I sure would have liked to be able to fully monitor the health of my server.

But yeah, mine works fine after a reboot, but after an hour or two the CPU temp on both goes completely nuts.

 


Edited by reefcrazed
  • 2 months later...
  • 1 month later...
  • 2 months later...

I have this motherboard in my desktop with 2x 2680 v2, and I have this problem too. Max CPU temp at 100% load is 65 C.

Maybe it is a power supply problem?

I also have problems with USB.

 

In the AIDA64 stress test, with only the cache test enabled, my desktop restarts after 1-2 minutes. Sorry for my English!

  • 2 months later...

Just putting this out there..... it may be too early to get excited however, I've just updated to 6.7.1 and upon checking my event logs to do my daily clearing of overtemp warnings,... THERE ARE NONE!!

 

Just getting "There are no event log entries present at this time."

 

I would normally have about 6 overtemp warnings by this stage, anyone else checked their event logs recently after 6.7.1??

 

M/B: ASRock EP2C602-4L/D16 Version

BIOS: American Megatrends Inc. Version P1.90. Dated: 04/11/2018

CPU: 2 x Intel® Xeon® CPU E5-2670 0 @ 2.60GHz


I would love to get excited about that, but so far nothing has fixed this.  I am pinning it down to the hardware.  Post back in a week please?

I would love to be able to keep this board even longer, although it is getting old.  I upgraded my processors to the fastest the board will take and I bumped up the ram to 192 gig recently, so it would be nice to finally get this ridiculous problem fixed.

Edited by reefcrazed

I think the problem is with the fan speed regulation, or just the fans. On my desktop I get freezes all the time, anywhere from 1 to 24 hours apart, 5-10 times a day. After every restart I change the fan speed in the BIOS (5-10 tries, level 5, level 6, back to level 5), and at some point everything is fine. Then it runs for a month or two, until the next restart. Maybe the problem is in the coolers?

Edited by fila61
  • 2 weeks later...
  • 3 weeks later...
2 hours ago, reefcrazed said:

The chances that it is fixed are probably zero.  I am finally ditching my board, 192 GB of ECC and dual Xeons, and going low power.  The monitoring not working does not bother me that much, but the amount of power used all day does.

Mine has been running about 14 hours with zero entries. Normally I would have at least 4-6 by now.
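
As a rough sanity check on how unusual a clean 14 hours would be, here is a small Python sketch. It assumes the false alerts arrive independently at a steady rate (a Poisson model), which nothing in this thread establishes; `prob_zero_events` is a hypothetical helper name. Under that assumption, the chance of seeing no alerts in a window where you would normally expect k of them is exp(-k).

```python
import math


def prob_zero_events(expected_events):
    """P(no events) in a window, assuming Poisson arrivals (an assumption)."""
    return math.exp(-expected_events)


# "Normally I would have at least 4-6 by now"
for expected in (4, 5, 6):
    p = prob_zero_events(expected)
    print(f"expected {expected} alerts -> P(zero alerts) = {p:.4f}")
```

For 4 to 6 expected alerts the probability of a clean log comes out under 2%, so zero entries would be unlikely to be luck if the model holds. It does not rule out the alerts being tied to something else, such as how the board came up after its last reboot.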

9 hours ago, majorpaynedof said:

I have been working with ASRock on this issue as I recently built a machine. They have updated me to the 0.19.2 BMC BIOS. I have been running it since last night. I hope to be able to report that they have finally fixed this.

 

Keep us updated. 

 

Do they have 0.19.2 BMC Bios available for the rest of us?

14 hours ago, BRiT said:

 

Keep us updated. 

 

Do they have 0.19.2 BMC Bios available for the rest of us?

I'm not sure. This was given to me via Dropbox. I can upload it somewhere and make it available with the understanding that I'm not responsible for any results. 

Edited by majorpaynedof
1 hour ago, majorpaynedof said:


 

So far I'm happy to report that with 3 days of uptime I have yet to get an event log entry telling me of a false spike in temps. 

Can you reboot and then test the uptime again? I have 22 days of uptime now, but after a reboot I get this problem.
