Gigabyte X399 Designare EX Threadripper



1 minute ago, JWMutant said:

Let me know how it goes please.

Just booted after applying the BIOS settings. I'll leave the array and the VM running all night and report back tomorrow.

By the way, the instability I had might be down to one specific docker, though I don't know why.
After upgrading my machine I tried running Handbrake. It encoded a couple of clips, but on the third one the docker just restarted.
I tried again and got the same result. I thought it was the file, so I tried another one and it was fine. Then I tried the "bad" file again and it encoded without problems.
Between those failed encodes I had server lockups, or some things stopped responding. After 3 days I stopped Handbrake from starting, and the server has been very stable since then on F12i.


I'll let you know tomorrow about F12.


F12 seems solid; I didn't have any problems. Only once, the Windows VM didn't want to wake up after a couple of hours asleep, but the host was fine. I force-stopped the VM and it started again without trouble.

With only ACS enabled in the BIOS, I didn't need the ACS override in the VM settings. Everything is split into its own IOMMU group.
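
For anyone wanting to check the grouping on their own board, here is a minimal sketch. It assumes the kernel exposes the groups under /sys/kernel/iommu_groups (IOMMU enabled in the BIOS); `list_iommu_groups` is a hypothetical helper name, and its optional argument only exists so a different root directory can be passed in.

```shell
#!/bin/sh
# list_iommu_groups is a hypothetical helper, not part of any tool.
# It prints one line per device: the group number and the PCI address.
list_iommu_groups() {
    root=${1:-/sys/kernel/iommu_groups}
    for group in "$root"/*; do
        # Skip anything that is not a group directory (or an unmatched glob).
        if [ ! -d "$group/devices" ]; then
            continue
        fi
        for dev in "$group"/devices/*; do
            if [ -e "$dev" ]; then
                echo "group ${group##*/}: ${dev##*/}"
            fi
        done
    done
}

list_iommu_groups
```

A device that is alone in its group shows up as the only line with that group number. Groups are listed in glob order, so group 10 sorts before group 2.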

The only problem I have is with the motherboard audio. I can pass it through to the VM and it works, and it's in its own IOMMU group.
But the VM log says: Cannot reset device, depends on group xx which is not owned.
I had this on the previous BIOS version too, though.

Edited by skois
  • 3 weeks later...
On 12/22/2019 at 12:20 AM, JWMutant said:

So I assume F12 was more stable than 12i?

Seems rock solid. The problems I had turned out to come from the XMP RAM profile after all (I had them before the BIOS update too). I disabled it and it's fine.
I should fine-tune the RAM manually sometime.

  • 2 months later...

Temperature question for anyone with this motherboard: my motherboard is reporting 95.8C, which seems unlikely to be accurate.  I know the CPU temp has a 27C offset, but I haven't seen anything about the motherboard.  I ran sensors-detect and added this to my go file:

 

# modprobe for each sensor
modprobe k10temp
modprobe it87

 

Is there a fix for this?  Is 95.8 accurate? Or is it inaccurate and I just have to live with it?  In Dynamix System Temp, "it87 k10temp" is listed under Available Drivers, and the MB temp is labeled "k10temp - MB Temp".  The k10temp CPU die and Tdie readings are in the high 60s, so I think it's clear that these are CPU sensors.  Tctl is in the mid 90s, which is just the annoying 27C offset.  So there aren't any obvious alternatives.

Edited by bobobeastie
spelling

Having similar results here:

k10temp - Tdie - 60.8C

k10temp - Tctl - 87.8C

 

The GitHub page for k10temp says:

"There is one temperature measurement value, available as temp1_input in sysfs. It is measured in degrees Celsius with a resolution of 1/8th degree. Please note that it is defined as a relative value; to quote the AMD manual: Tctl is the processor temperature control value, used by the platform to control cooling systems. Tctl is a non-physical temperature on an arbitrary scale measured in degrees. It does _not_ represent an actual physical temperature like die or case temperature. Instead, it specifies the processor temperature relative to the point at which the system must supply the maximum cooling for the processor's specified maximum case temperature and maximum thermal power dissipation. The maximum value for Tctl is available in the file temp1_max. If the BIOS has enabled hardware temperature control, the threshold at which the processor will throttle itself to avoid damage is available in temp1_crit and temp1_crit_hyst."

 

To compare temp1_input with temp1_max, I ran:

find /sys -name temp1_input

 

My sensors are here:

/sys/devices/pci0000:00/0000:00:18.3/hwmon/hwmon0/temp1_input
/sys/devices/pci0000:00/0000:00:19.3/hwmon/hwmon1/temp1_input
 

cat /sys/devices/pci0000:00/0000:00:18.3/hwmon/hwmon0/temp1_input
61625

 

cat /sys/devices/pci0000:00/0000:00:19.3/hwmon/hwmon1/temp1_input
62000

 

cat /sys/devices/pci0000:00/0000:00:18.3/hwmon/hwmon0/temp1_max
70000
 

cat /sys/devices/pci0000:00/0000:00:19.3/hwmon/hwmon1/temp1_max
70000
 

Because it's a relative reading, I see that I'm at about 88% of temp1_max (Norco 4224 case with air cooling).
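
The 88% figure can be reproduced with a small sketch like the one below. It assumes the sensors also show up under /sys/class/hwmon (the usual symlinked view of the device paths listed above) and that the values are in millidegrees Celsius; `percent_of_max` is a hypothetical helper name.

```shell
#!/bin/sh
# percent_of_max is a hypothetical helper: both arguments are millidegrees C,
# the result is a whole-number percentage (integer division rounds down).
percent_of_max() {
    echo $(( $1 * 100 / $2 ))
}

# Walk every hwmon device that exposes both files; paths differ per system.
for dir in /sys/class/hwmon/hwmon*; do
    if [ -r "$dir/temp1_input" ] && [ -r "$dir/temp1_max" ]; then
        input=$(cat "$dir/temp1_input" 2>/dev/null)
        max=$(cat "$dir/temp1_max" 2>/dev/null)
        # Skip sensors that return nothing or a zero maximum.
        if [ -n "$input" ] && [ "${max:-0}" -gt 0 ]; then
            echo "$dir: $(percent_of_max "$input" "$max")% of temp1_max"
        fi
    fi
done
```

With the readings above, `percent_of_max 61625 70000` gives 88.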

I must not have enabled hardware temp control, because I have no temp1_crit or temp1_crit_hyst.
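
A minimal sketch for checking which of those files exist, and which driver owns each hwmon device, might look like this. It assumes the standard `name` attribute under /sys/class/hwmon; `describe_hwmon` is a hypothetical helper name.

```shell
#!/bin/sh
# describe_hwmon is a hypothetical helper, not part of any tool.
# It prints the hwmon directory, the driver name from its "name" file,
# and which of the optional threshold files are present.
describe_hwmon() {
    hw=$1
    name=unknown
    if [ -r "$hw/name" ]; then
        name=$(cat "$hw/name")
    fi
    present=""
    for f in temp1_max temp1_crit temp1_crit_hyst; do
        if [ -e "$hw/$f" ]; then
            present="$present $f"
        fi
    done
    echo "$hw ($name):${present:- none}"
}

for d in /sys/class/hwmon/hwmon*; do
    if [ -d "$d" ]; then
        describe_hwmon "$d"
    fi
done
```

The driver name also helps tell a k10temp reading apart from an it87 one when labels in a GUI look swapped.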

Next time I bring the server down I will look for it.

 

 

Edited by guru69

@guru69 I have the same chassis and cooler and replaced the stock fans with some kind of Noctuas, which you did as well.  I don't see that you mentioned the motherboard temp; is it similar to mine?  If not, what do your settings look like?  I'm assuming a spot on the motherboard actually being at 95.8C is pretty unlikely, or if it was headed in that direction it would shut down before getting there. I suppose rebooting and checking in the BIOS would help, but I'm in the middle of a parity check, which I'm doing after a parity rebuild, because my 2 parity drives decided to become disabled.

 

I'm not so worried about the CPU because of the 27C Tctl offset, and it looks like Tdie or CPU temp are without the offset, and they seem fine.  My CPU temp is reading 68.7, and I am running a Handbrake docker that's running full tilt, not pinned to any core.  My best guess is that the motherboard temp of 95.8 is also getting an offset for some reason, because that brings it back down to the same range as the CPU.
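
A quick sanity check of that guess, as a minimal sketch: it assumes the 27C Tctl offset mentioned above and works in millidegrees Celsius, the unit the hwmon sysfs files report. The helper name `remove_tctl_offset` is made up for illustration.

```shell
#!/bin/sh
# Assumption: a 27 C Tctl offset, as discussed in this thread.
TCTL_OFFSET_MILLI=27000

# remove_tctl_offset is a hypothetical helper, not part of any tool.
# Argument and result are in millidegrees Celsius.
remove_tctl_offset() {
    echo $(( $1 - TCTL_OFFSET_MILLI ))
}

# The 95.8 C "motherboard" reading with the offset removed:
remove_tctl_offset 95800   # prints 68800, i.e. 68.8 C
```

That lands within 0.1C of the 68.7 CPU reading, which is consistent with the "MB Temp" label actually pointing at Tctl, though it does not prove it.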

  • 2 weeks later...
On 3/11/2020 at 11:55 AM, bobobeastie said:

@guru69 I have the same chassis and cooler and replaced the stock fans with some kind of Noctuas, which you did as well.  I don't see that you mentioned the motherboard temp; is it similar to mine?  If not, what do your settings look like?  I'm assuming a spot on the motherboard actually being at 95.8C is pretty unlikely, or if it was headed in that direction it would shut down before getting there. I suppose rebooting and checking in the BIOS would help, but I'm in the middle of a parity check, which I'm doing after a parity rebuild, because my 2 parity drives decided to become disabled.

 

I'm not so worried about the CPU because of the 27C Tctl offset, and it looks like Tdie or CPU temp are without the offset, and they seem fine.  My CPU temp is reading 68.7, and I am running a Handbrake docker that's running full tilt, not pinned to any core.  My best guess is that the motherboard temp of 95.8 is also getting an offset for some reason, because that brings it back down to the same range as the CPU.

Temperature on this board wasn't available at all when I started, so I have kind of been driving blind until now 🙂 I replaced the Norco center fan plate with the 120mm one, dumped their fans, put Noctuas everywhere, used liquid metal on the CPU and hoped for the best. Here is what mine look like. I just assumed the CPU would be hotter than the mobo when picking the sensors, so maybe I have them switched? I haven't had any thermal shutdowns, even when the room reached 90 one day (A/C died), and this server has been running around the clock for over a year, so I guess it's OK. I live in the deep south, where it's always hot, so I have a 5,000 BTU window AC running on cool/max 24/7 about 2 feet from my server. I smoke one of these window AC units about every 2 years, but I buy a few of them when they're on sale.

 

unraid-temp.png

 

unraid-temp2.png

 

  • 1 month later...
  • 3 weeks later...

Anyone else having reliability issues with this motherboard? I've just had a second one die on me. Both failed during a reboot; I removed everything in an attempt to get them to POST but got nothing.  I must have built more than 20 PCs but never had such problems before.

Annoying 'cause I had everything working as I wanted.

 

I'm now losing confidence in this motherboard and will be looking for a refund rather than a replacement.

1 minute ago, SimonG said:

Anyone else having reliability issues with this motherboard? I've just had a second one die on me. Both failed during a reboot; I removed everything in an attempt to get them to POST but got nothing.  I must have built more than 20 PCs but never had such problems before.

Annoying 'cause I had everything working as I wanted.

 

I'm now losing confidence in this motherboard and will be looking for a refund rather than a replacement.

I've had the same board for 15 months now and never had an issue; it's been solid as a rock.

 

Model: XCASE24

M/B: Gigabyte Technology Co., Ltd. X399 DESIGNARE EX-CF Version x.x - s/n: Default string

BIOS: American Megatrends Inc. Version F12. Dated: 12/11/2019

CPU: AMD Ryzen Threadripper 2950X 16-Core @ 3500 MHz

HVM: Enabled

IOMMU: Enabled

Cache: 1536 KiB, 8192 KiB, 32768 KiB

Memory: 48 GiB DDR4 (max. installable capacity 512 GiB)

Network: bond0: fault-tolerance (active-backup), mtu 9000
 eth0: 10000 Mbps, full duplex, mtu 9000

Kernel: Linux 4.19.107-Unraid x86_64

OpenSSL: 1.1.1d
