Array fails to start with Overclock



When I first built this server I was always under the impression the overclock was applied and working. However, a USB short kept the server from booting at all, so it was offline for a few months. I made do until recently, when I decided to get it back up and running. While troubleshooting I cleared the CMOS. I figured out that USB was the culprit when I removed the server from the rack and set it up on my desk: it booted on the first try. I used a spare SSD with a Windows install to test all the hardware. I reapplied the OC and updated the BIOS, then continued to stress test for about a week until I was confident it was stable.

 

I have now reinstalled the server in the rack. Unfortunately, when I try to start the array in Unraid, the server crashes and restarts. Any overclock, even the simple Multi Core Enhancement setting, results in the array failing to start.

 

It's late now, but tomorrow I plan to try starting the array with VM autostart disabled to see if it's a VM issue. I was also thinking about starting fresh, since early Unraid involved messing around with XML files for CPU pinning and GPU passthrough, plus various other tweaks I've made over time to get things to work. All thoughts and guidance welcome, with the exception of "overclocking = unstable = bad idea". If I can't get it to work in the end, I'll just have to suck it up, but I'm going to try! Let me know if there is anything I can provide to assist in troubleshooting. Thank you in advance!

 

Hardware:

CPU - 7980xe

MOBO - Fatal1ty X299 Professional Gaming i9

GPUs - 1070, 1080ti

 

Unraid ver 6.7.2

Edited by FumblingintheDark
Link to comment

Does everything run normally without the OC (including VMs, GPU passthrough, etc.)?

 

4 hours ago, FumblingintheDark said:

All thoughts and guidance welcome with the exception of "overclocking = unstable = bad idea".

I really like this and support it!

I have seldom overclocked in recent years, mainly because overclocking always costs extra power draw for not much gain. I still do some tweaking, though.

 

I believe you will identify the cause, because overclockers always work harder at trial and error. 🤣

 

 

 

Edited by Benson
Link to comment
5 hours ago, Benson said:

Does everything run normally without the OC (including VMs, GPU passthrough, etc.)?

Currently, with no OC, everything works. I just wonder whether in the past, when trying to get Turbo Boost or some other out-of-the-box feature to work, I added a line of code or something that is now causing the issue. Unraid has come so far that a lot of things that used to require modifying a file are now a simple mouse click.

Link to comment

I've added some files to hopefully help with troubleshooting. I've noticed that without the OC, "watch -n 1 grep MHz /proc/cpuinfo" shows the speed reaching 3.9GHz, with a very rare 4GHz. I really have no idea why those are the chosen speeds, seeing that the 7980xe has a base of 2.6GHz, a max turbo of 4.2GHz, and a Turbo 3.0 of 4.4GHz. I've seen sites say the all-core boost is 3.4GHz.
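For anyone who wants a summary rather than a scrolling list of per-core readings, here is a minimal Python sketch that parses the same "cpu MHz" lines from /proc/cpuinfo. The function names are made up for illustration, and it assumes an x86 Linux system where /proc/cpuinfo actually reports "cpu MHz".

```python
import re


def parse_cpu_mhz(cpuinfo_text):
    """Return every 'cpu MHz' reading found in /proc/cpuinfo-style text."""
    pattern = r"^cpu MHz\s*:\s*([\d.]+)"
    return [float(m) for m in re.findall(pattern, cpuinfo_text, re.MULTILINE)]


def summarize(mhz):
    """Return (lowest, highest, mean) of the readings, all in MHz."""
    if not mhz:
        raise ValueError("no 'cpu MHz' lines found")
    return min(mhz), max(mhz), sum(mhz) / len(mhz)


def read_current_clocks(path="/proc/cpuinfo"):
    """Read the live readings from the running system (Linux only)."""
    with open(path) as f:
        return parse_cpu_mhz(f.read())
```

Calling summarize(read_current_clocks()) in a loop gives roughly the same information as the watch command, condensed to three numbers per sample.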

syslog.txt lscpu.txt syslog

Edited by FumblingintheDark
Link to comment

Is your complaint that the CPU clock speed is always too high? That it doesn't reach the turbo frequency? That it doesn't stay low under light load? Or something else? What exactly is the problem?

 

There is a CPU scaling governor that you can control from the OS; you can install the "Tips and Tweaks" plugin to control it. But you also need to check whether the BIOS has any override or hard setting.
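To see which governor is currently active without a plugin, a minimal Python sketch like the following reads the standard Linux cpufreq sysfs files. The function name is hypothetical, and it assumes the kernel exposes cpufreq under /sys/devices/system/cpu, which may not be the case if the BIOS fully controls frequency.

```python
from collections import Counter
from pathlib import Path


def governor_counts(cpu_root="/sys/devices/system/cpu"):
    """Count how many logical CPUs use each scaling governor."""
    root = Path(cpu_root)
    if not root.is_dir():
        return {}
    counts = Counter()
    # Each logical CPU has its own cpuN/cpufreq/scaling_governor file.
    for gov_file in root.glob("cpu[0-9]*/cpufreq/scaling_governor"):
        counts[gov_file.read_text().strip()] += 1
    return dict(counts)
```

An empty result would suggest that cpufreq is not exposed at all, which is itself a hint to look at the BIOS settings.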

 

Modern CPUs are complicated. For example, some CPUs have turbo in different tiers: if you load 2 cores, those cores run at 4.9GHz; if you load 4, those 4 cores run at 4.6GHz; if you load all cores, they run at 4.3GHz, and so on. But some CPUs also allow setting all cores to full speed all the time in the BIOS.
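The tiered turbo behaviour can be sketched as a simple lookup. This is only an illustration using the example numbers from this post, not the real turbo table of the 7980xe or any other CPU. The names are made up, and treating every load above 4 cores as the all-core value is an assumption.

```python
# Example tiers from the post: (max active cores, clock in GHz).
# Illustrative numbers only, not real specifications.
EXAMPLE_BINS = [(2, 4.9), (4, 4.6)]
ALL_CORE_GHZ = 4.3


def turbo_ghz(active_cores, bins=EXAMPLE_BINS, all_core=ALL_CORE_GHZ):
    """Return the clock the loaded cores would run at for a given load."""
    if active_cores < 1:
        raise ValueError("active_cores must be at least 1")
    for max_cores, ghz in bins:
        if active_cores <= max_cores:
            return ghz
    # Assumption: anything above the last tier gets the all-core clock.
    return all_core
```

This is why a reading taken under light load can look very different from one taken while every core is busy.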

Link to comment

So I was able to run the server overclocked on a fresh install of Unraid. I took 3 random drives I had lying around to create an array and was able to start and stop it without it failing. I also verified that the CPU was reaching 4.6GHz (the OC) in Unraid. Now I just need to figure out which configuration is causing the crash. Any ideas on where to look?

Link to comment

So I don't think it's the OC setting. The reason is that my current Unraid config crashes, whereas a fresh install doesn't. I'm also confident the OC is stable after a lot of testing on bare-metal Win10. I think it's an issue with a config in my primary Unraid install, but I'm unsure what it might be. I might have narrowed it down: when I copied over the network configs, drive assignments, drive settings, and share settings, the server started crashing again.

Link to comment
2 hours ago, FumblingintheDark said:

The reason is that my current Unraid config crashes, whereas a fresh install doesn't.

No, the OC is still a factor, because the old config runs normally without it. If the OC played no part, the server would not crash no matter which config was loaded.

 

I'd like to know exactly what the OC changes, since that may give some clues.

 

Or you could stop applications one by one in the old config, or add them one by one to the new setup, to find the point where it breaks.
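The add-one-at-a-time approach can be written down as a small loop. This is only a sketch of the bookkeeping: is_stable stands in for the manual step of applying the settings, starting the array, and seeing whether the server survives, and both names are hypothetical.

```python
def first_failing_item(items, is_stable):
    """Enable items one at a time and return the first one that breaks things.

    items     -- settings or applications, in the order they are re-added
    is_stable -- callable taking the list enabled so far and returning True
                 if the server still starts the array (a manual check here)
    Returns None if everything stays stable.
    """
    enabled = []
    for item in items:
        enabled.append(item)
        if not is_stable(list(enabled)):
            return item
    return None
```

If the culprit is a combination of two settings rather than a single one, this returns whichever of the pair was added last, so it is worth repeating the run with the order changed.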

Edited by Benson
Link to comment

https://www.kitguru.net/components/leo-waldock/intel-core-i9-7980xe-extreme-edition-18-cores-of-overclocked-cpu-madness/2/

 

On page 2, the reviewer also states that most software ran fine except Adobe Premiere. Would you try overclocking but stopping at 4.2GHz? (Power draw also drops by 100W.)

 

At 4.6GHz all the cores appeared stable in the software we used with the notable exception of Adobe Premiere which caused the system to crash at 4.5GHz-4.6GHz and refused to open a Premiere Project at 4.3GHz-4.4GHz. Reducing the speed to 4.2GHz resulted in a system that ran Premiere perfectly.

 

36x100MHz, 1.000VID
37x100MHz, 1.030VID
38x100MHz, 1.050VID
39x100MHz, 1.077VID
40x100MHz, 1.100VID
41x100MHz, 1.124VID
42x100MHz, 1.148VID
43x100MHz, 1.175VID
44x100MHz, 1.203VID
45x100MHz, 1.203VID
46x100MHz, 1.203VID
4.7GHz system froze

 

Default clocks, 1.005VID, 275W total
36x100MHz, 1.000VID, 310W total
37x100MHz, 1.030VID, 330W total
38x100MHz, 1.050VID, 350W total
39x100MHz, 1.077VID, 365W total
40x100MHz, 1.100VID, 390W total
41x100MHz, 1.124VID, 415W total
42x100MHz, 1.148VID, 450W total
43x100MHz, 1.175VID, 485W total
44x100MHz, 1.203VID, 525W total
45x100MHz, 1.203VID, 540W total
46x100MHz, 1.203VID, 550W total
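As a quick check of the arithmetic, the following Python sketch uses the total power figures from the list above (multiplier mapped to watts) to compute the saving between two settings and the cost of each extra 100MHz step. The function names are made up for illustration.

```python
# Total system power in watts per CPU multiplier, copied from the list above.
POWER_W = {36: 310, 37: 330, 38: 350, 39: 365, 40: 390, 41: 415,
           42: 450, 43: 485, 44: 525, 45: 540, 46: 550}


def power_saving_w(from_mult, to_mult, table=POWER_W):
    """Watts saved by dropping from one multiplier to another."""
    return table[from_mult] - table[to_mult]


def step_costs(table=POWER_W):
    """Extra watts needed to reach each multiplier from the one below it."""
    mults = sorted(table)
    return {m: table[m] - table[p] for p, m in zip(mults, mults[1:])}


print(power_saving_w(46, 42))  # 4.6GHz down to 4.2GHz
print(step_costs())
```

The first call gives the 100W difference between 4.6GHz and 4.2GHz. The per-step costs peak at 40W for the step to 4.4GHz.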

Edited by Benson
Link to comment
