Anybody planning a Ryzen build?


Recommended Posts

16 minutes ago, Beancounter said:

Update:

 

unRAID was crashing randomly every 12-36 hours or so.

 

Since a recent Windows 10 patch, unRAID has been stable for 4 days and counting.

 

That would make it sound like a problem with your Win10 VM was affecting unRAID and causing it to crash.  That seems odd.  I wonder if the issue is truly resolved.

Link to comment
On 3/26/2017 at 5:12 PM, chadjj said:

Yes, there are a fair number of differences between our motherboards, but also a lot in common (chipset, LAN, etc.).  I have run Memtest86, and I did hear back from Lime-Tech support: they are having the same issues, thought it might be a memory problem, and were considering an RMA.  They have G.Skill and I have Corsair, so the assumption was that we had similar RAM with similar issues.  With the RAM being different and mine reporting 100% OK, it is unlikely we both have bad RAM.  My kit is brand new for this build as well.

 

Wow, so that is big news, Lime-Tech is having issues too.

 

While I had memtest issues with my original memory, those problems didn't manifest in Windows, even with heavy stability testing.  I've since replaced my memory with ASRock QVL'd memory, and have completed 16 passes (and counting, it's been running for several days now) of Memtest86 without a single issue.  Yet I still get frequent crashes in unRAID.  Only in unRAID.

 

I just reread through this entire topic and gathered the following info.  If anyone has any updates, please share:

  • TType85 has an ASUS B350 Prime, and has not reported any issues [presumed to be running Windows VM]
  • hblockx has a Gigabyte X370 Gaming 5 and says it is running great [presumed to be running Windows VM's on his 2 gamers 1 pc]
  • jonp (Lime-Tech) has an ASRock X370 Killer SLI/ac, and appears to be having issues but has not shared any info with us [unclear if they are running any Windows VM's]
  • Beancounter has an ASUS Prime B350 Plus, was having crashes but is now running fine [running a patched Win10 VM]
  • Naiqus has a Gigabyte X370 Gaming 5, and has not reported any issues [multiple VM's, did not list OS's]
  • puddleglum has an ASRock AB350 Fatal1ty, and has not reported any issues [running Win10 VM]
  • Bureaucromancer has an ASUS Crosshair VI Hero (X370 based), has had 2 crashes when idle and no VM's, no crashes with Win VM's [both Linux and Windows VM's]
  • johnnie.black borrowed for a day an ASUS Prime B350M-A, and did not report any issues [did not report running any VM's]
  • ufopinball has an ASUS Prime X370-Pro and has had 1 crash when only running a Linux VM [now running a Win VM to test]
  • chadjj has an ASUS Prime X370-Pro and is having issues similar to mine [ran a Windows VM before, but not now]
  • I have an ASRock X370 Fatal1ty Professional Gaming, and I am having lots of issues in unRAID only [No VM's]
  • JIM2003 has an ASUS Prime B350M-A, and had identical issues [No VM's]
  • JIM2003 also has an ASRock Fatal1ty AB350 Gaming K4, and had identical issues [No VM's]
  • Akio has ordered an ASUS Crosshair VI Hero, having assembly challenges with his waterblock cooler
  • urbanracer34 wants an ASRock X370 Taichi; not known if he has bought it yet

 

Reading through the list (and everyone's posts), a few things popped out at me.  

 

First, so far only X370 chipset boards have been reported as having any problems.  So we might be looking at a problem specific to the X370, and not present on the B350 or A320 (which no one here seems to have purchased).  The one exception is Beancounter, who was experiencing issues on his B350, but those may have been caused by his Win10 VM, and a recent Windows patch seems to have fixed it.

 

Second, seems like almost everyone immediately set up a Windows VM on their unRAID box.  I seem to be one of the few that did not (I dual booted into Win10, but don't have a Win10 VM set up inside unRAID).

 

This second observation becomes more interesting when you note that both chadjj and ufopinball are running the ASUS Prime X370-Pro, and chadjj is having issues while ufopinball is not.  ufopinball does have a Win10 VM running.  I don't know if chadjj ever got as far as a Windows VM, but his current testing has pared back the hardware and array to perform troubleshooting, so it seems he is not running one now.

 

Add in Beancounter's note above that he was experiencing crashes (on a B350) that went away after his Win10 VM was recently patched.

 

So now I am wondering, is running a Windows 10 VM somehow fixing the issue?  And if so, how?

 

-Paul

Edited by Pauven
  • Upvote 2
Link to comment

Having just found this topic, I figure I would throw in my frustrations as well.

I got the Asus Prime B350M-A two weeks ago, and ever since setting it up, it would continually lock up hard, requiring a hard reset every 16-24 hours. I tested the RAM, and it went 48 hours without issue. I tried copying all my data off and doing a clean unRAID setup, and it still locked up. I went so far as to swap the motherboard out for the ASRock Fatal1ty AB350 Gaming K4, and it is still locking up. I never set up any VM's. I did install Windows on an SSD in the system, just to see if Windows would crash, and it was fine. This is really starting to get irritating, going on almost 3 weeks of issues.

Link to comment
6 minutes ago, JIM2003 said:

Having just found this topic, I figure I would throw in my frustrations as well.

I got the Asus Prime B350M-A two weeks ago, and ever since setting it up, it would continually lock up hard, requiring a hard reset every 16-24 hours. I tested the RAM, and it went 48 hours without issue. I tried copying all my data off and doing a clean unRAID setup, and it still locked up. I went so far as to swap the motherboard out for the ASRock Fatal1ty AB350 Gaming K4, and it is still locking up. I never set up any VM's. I did install Windows on an SSD in the system, just to see if Windows would crash, and it was fine. This is really starting to get irritating, going on almost 3 weeks of issues.

 

Hi JIM2003, I understand your frustration exactly.  Thanks for posting your results here, this helps.

 

I'm really surprised you've had issues with two different motherboards, and both being B350 based.  I'm guessing this is not just an X370 issue.

 

I think it's very notable that you never set up any VM's, just like me.  I need to read back through and see which users set up Windows VM's, to see if the pattern holds true.

 

For those that are not having issues and are running Windows VM's, it would be helpful if you could do a fresh boot without starting the VM's, and see if any problems crop up.

 

-Paul

Link to comment

I just updated my post above to clarify what VM's, if any, members reported running on Ryzen.  Some of it is a little vague, so if anyone has updates, please share.

 

In general, it seems that those running VM's, probably of the Windows variety, are not having issues.

 

Those that have definitely not run a VM, myself and JIM2003, are having issues.

 

chadjj was working on getting a Windows VM running, don't know what flavor, and don't know if he had issues with the VM running, or with it stopped.  His current troubleshooting sounds like he is not running it now, and he is having issues.

 

jonp hasn't shared any details on the troubles they've faced.  And while he did a nice write up on their motherboard for use in a 2 gamers 1 pc type build, Jon didn't go into any detail about actually running a Windows VM.  Maybe Jon will fill in the blanks for us.

 

And Beancounter was having issues, but a properly patched Win10 VM seems to have his system running great.

 

-Paul

Link to comment
46 minutes ago, Pauven said:

For those that are not having issues and are running Windows VM's, it would be helpful if you could do a fresh boot without starting the VM's, and see if any problems crop up.

 

Well, for what it's worth ... I just upgraded my main server to the new Ryzen build.  Since signature lines change over time, here's what I have:

 

Cortex • unRAID Server Pro 6.3.2 (Dual Parity) • ASUS Prime X370-PRO MB • AMD Ryzen 7 1800X 8-Core 3.6GHz • Crucial CT16G4DFD8213 DDR4 2133 (64GB) • Seasonic SS-660XP2 660W 80 PLUS PLATINUM • Asus Radeon 6450 1GB (Desktop) Graphics Card • 12 x Seagate ST4000DM000 4TB 5400rpm (40TB) • 2 x SAMSUNG 850 EVO 1TB SSD (Cache) • Docker: PlexTV • VM: CentOS (LAMP server)

 

Basically, I took screen shots of everything I had, and changed all VMs/Dockers to not auto-start.  I pulled the old MB and put in the new one.  Since I now have 8 SATA ports on the MB, I removed one of the SATA controller cards.  I added the GPU, and also included a USB card that will be used later for Windows 10 VM support ... that is a phase 2 item, though.

 

The main issue I had was system instability because the Dynamix System Temp plugin was looking for sensors from the old MB.  I removed the plugin and things started working fine.  I commented in the plugin thread asking for support for the new MB/chipset because that's apparently missing at the moment.  I don't know if anyone else is running this plugin?

 

For now, my system reports 1 hour, 32 minutes of uptime.  The VM running is a LAMP server, not a Windows 10 VM.  At present, it's not convenient to turn off the LAMP VM.  The best I can do is continue to report on system stability as things move forward.

 

FWIW, I'm still using slower memory than everyone else.  Dunno how much difference that makes.

 

Will check in again later...

 

- Bill

Link to comment
4 minutes ago, ufopinball said:

The main issue I had was system instability because the Dynamix System Temp plugin was looking for sensors from the old MB.  I removed the plugin and things started working fine.  I commented in the plugin thread asking for support for the new MB/chipset because that's apparently missing at the moment.  I don't know if anyone else is running this plugin?

 

Yes, I was running this plugin just fine on my mb.  The driver for my mb is nct6779.  I installed Perl (from the Nerdpack), plus the Dynamix Fans and System Temp plugins.  Either the Fans or the System Temp plugin (I don't recall which) has the ability to search for the correct driver, but only if you have Perl installed.  Once it finds it, you have to both "Save" and "Load" it.

 

Some of the voltages appear wrong (like CPU and RAM), and my CPU and MB temps appeared to be reversed in the defaults, but otherwise it worked, and the fan speeds were accurate.
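
If it helps to poke at it from the console first, here's a minimal sketch of what I understand the plugin to be doing under the hood.  The nct6779 module is what my board needs; yours will almost certainly be different, so treat the module name as a placeholder:

modprobe nct6779     # load the Super I/O sensor driver for this board (placeholder module name)
sensors              # if the driver matches your chip, temps and fan speeds show up here

As I understand it, the plugin's "Save" just records the detected module so it gets reloaded on every boot, and "Load" does the modprobe for the current session.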

 

I understand regarding your LAMP server, just happy you're up and running.  Luckily there's lots of other users around, maybe one of them can try the test.

 

Myself, I plan to install a Win10 VM...

 

-Paul

  • Upvote 1
Link to comment
2 hours ago, ufopinball said:

 

Well, for what it's worth ... I just upgraded my main server to the new Ryzen build.  Since signature lines change over time, here's what I have:

 

Cortex • unRAID Server Pro 6.3.2 (Dual Parity) • ASUS Prime X370-PRO MB • AMD Ryzen 7 1800X 8-Core 3.6GHz • Crucial CT16G4DFD8213 DDR4 2133 (64GB) • Seasonic SS-660XP2 660W 80 PLUS PLATINUM • Asus Radeon 6450 1GB (Desktop) Graphics Card • 12 x Seagate ST4000DM000 4TB 5400rpm (40TB) • 2 x SAMSUNG 850 EVO 1TB SSD (Cache) • Docker: PlexTV • VM: CentOS (LAMP server)

 

Basically, I took screen shots of everything I had, and changed all VMs/Dockers to not auto-start.  I pulled the old MB and put in the new one.  Since I now have 8 SATA ports on the MB, I removed one of the SATA controller cards.  I added the GPU, and also included a USB card that will be used later for Windows 10 VM support ... that is a phase 2 item, though.

 

The main issue I had was system instability because the Dynamix System Temp plugin was looking for sensors from the old MB.  I removed the plugin and things started working fine.  I commented in the plugin thread asking for support for the new MB/chipset because that's apparently missing at the moment.  I don't know if anyone else is running this plugin?

 

For now, my system reports 1 hour, 32 minutes of uptime.  The VM running is a LAMP server, not a Windows 10 VM.  At present, it's not convenient to turn off the LAMP VM.  The best I can do is continue to report on system stability as things move forward.

 

FWIW, I'm still using slower memory than everyone else.  Dunno how much difference that makes.

 

Will check in again later...

 

- Bill

Hi Bill,

 

I had the same issue and removed Dynamix System Temp.  I was getting an error on boot and needed to load safe mode to remove the plugin.  That resolved the issue, but then I ran into the system locking up, requiring a hard reboot.  Hopefully they update the plugin for the newer chipsets.

 

Thanks,

Chad

Link to comment
2 hours ago, Pauven said:

I just updated my post above to clarify what VM's, if any, members reported running on Ryzen.  Some of it is a little vague, so if anyone has updates, please share.

 

In general, it seems that those running VM's, probably of the Windows variety, are not having issues.

 

Those that have definitely not run a VM, myself and JIM2003, are having issues.

 

chadjj was working on getting a Windows VM running, don't know what flavor, and don't know if he had issues with the VM running, or with it stopped.  His current troubleshooting sounds like he is not running it now, and he is having issues.

 

jonp hasn't shared any details on the troubles they've faced.  And while he did a nice write up on their motherboard for use in a 2 gamers 1 pc type build, Jon didn't go into any detail about actually running a Windows VM.  Maybe Jon will fill in the blanks for us.

 

And Beancounter was having issues, but a properly patched Win10 VM seems to have his system running great.

 

-Paul

Hi Paul,

 

It took several attempts to get a Win 10 VM working a few days ago.  I was actually compiling screenshots to share in the forums showing how it would hang at the UEFI and wouldn't load the .iso.  I had older Windows 10 .iso's from different dates, then pulled down a brand new one from Microsoft, and that one worked.  I don't know what was different about the other two; maybe they had become corrupted over time, I'm not sure.

 

So yes, I have had Windows 10 working, but in trying to diagnose the crashes I started paring back all the services running on unRAID, troubleshooting with the fewest services running in the background and adding them back one at a time to pinpoint the cause.  Like many of us, I'm coming up with few answers.

 

Uptime is 12 hrs, 55 mins.  I have not changed anything in the environment since the last lockup, but I have yet to hit 24 hrs.  The last lockup was at 5:55 AM, so if it happens again at that time I will dig into what happened within that window.

 

PS - To clarify my communication with the Lime-Tech team: they asked me what RAM I was using, as they were using G.Skill, had similar issues, and were considering RMA'ing the RAM thinking it could be defective.  My assumption is that it's unlikely we are having similar issues with different RAM that is coincidentally bad in both cases, so theirs is probably fine, and mine isn't showing any memtest errors.  I would find it very strange if everyone in the unRAID community reporting these issues turned out to have bad memory.

 

Thanks,

Chad

Link to comment
3 hours ago, Pauven said:

 

Yes, I was running this plugin just fine on my mb.  The driver for my mb is nct6779.  I installed Perl (from the Nerdpack), plus the Dynamix Fans and System Temp plugins.  Either the Fans or the System Temp plugin (I don't recall which) has the ability to search for the correct driver, but only if you have Perl installed.  Once it finds it, you have to both "Save" and "Load" it.

 

Some of the voltages appear wrong (like CPU and RAM), and my CPU and MB temps appeared to be reversed in the defaults, but otherwise it worked, and the fan speeds were accurate.

 

I understand regarding your LAMP server, just happy you're up and running.  Luckily there's lots of other users around, maybe one of them can try the test.

 

Myself, I plan to install a Win10 VM...

 

-Paul

 

Okay, so I have installed Dynamix System Temp, and just added Dynamix Auto Fan Control.  Both include "Detect" buttons, but neither button returns any information.

 

I have NerdPack installed and added Perl.  Here's what it says at the console:

root@Cortex:~# perl -version

This is perl 5, version 24, subversion 0 (v5.24.0) built for x86_64-linux-thread-multi

Copyright 1987-2016, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl".  If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.

Trying to follow this wiki page:

http://lime-technology.com/wiki/index.php/Setting_up_CPU_and_board_temperature_sensing

 

There is no mention of having to reboot, and Perl is clearly available, so I'm not sure what I'm doing wrong or differently such that "Detect" doesn't detect anything.

 

Running "sensors-detect" at the command line returns the following:

Some Super I/O chips contain embedded sensors. We have to write to
standard I/O ports to probe them. This is usually safe.
Do you want to scan for Super I/O sensors? (YES/no): y
Probing for Super-I/O at 0x2e/0x2f
Trying family `National Semiconductor/ITE'...               No
Trying family `SMSC'...                                     No
Trying family `VIA/Winbond/Nuvoton/Fintek'...               No
Trying family `ITE'...                                      Yes
Found unknown chip with ID 0x8665
    (logical device 4 has address 0x290, could be sensors)
Probing for Super-I/O at 0x4e/0x4f
Trying family `National Semiconductor/ITE'...               No
Trying family `SMSC'...                                     No
Trying family `VIA/Winbond/Nuvoton/Fintek'...               No
Trying family `ITE'...                                      No

If I type in IT8665E manually and click "Load drivers", it doesn't report an error, but none of the other pulldowns offer any new options.

 

Otherwise, uptime is listed now at 5 hours.  LAMP VM is running smoothly.  I'm happy enough for now, but would still like to get the sensors running properly.

 

- Bill

Link to comment
1 hour ago, ufopinball said:

 

Otherwise, uptime is listed now at 5 hours.  LAMP VM is running smoothly.  I'm happy enough for now, but would still like to get the sensors running properly.

 

- Bill

 

Well, for better or worse, my "Cortex" server crashed after approximately 5 hours, 55 minutes.  The server wasn't doing much ... LAMP VM, and Plex running in a Docker, serving up media for my kid.

 

Since I was messing around with Dynamix System Temp earlier, I went ahead and deleted "drivers.conf".  I'll also start a Windows 10 VM and see how things go.  This is an older test VM, and was going to be deleted so I can work on a fresh Win10 VM moving forward.  This one uses VNC because I don't have everything set up for the GPU pass-through yet.

 

I guess you can add me to the "0 days since the last crash" list.  Be really weird if the Windows 10 VM makes the difference...

 

- Bill

 

Link to comment

****** It's been a roller coaster for me. If you're not using the same board and facing the same code 8 error, please skip my post, as it's long. ******

 

Got mine on Friday and it's now Monday as I post this, but I ran into problems straight out of the box (it was also Mother's Day in the UK, so family time and a chance to de-frustrate away from this).

-Asus Crosshair VI

-Ryzen 1700X

-Corsair Vengeance 3000 16GB (2x8GB)

-EKWB Supremacy EVO (custom loop)

Had to drain the loop, rebuild the CPU block, and run it for a few hours as a leak test.

 

The EVO came with the AM4 backplate and CPU block plate.  I changed out the internal block parts to AM4 as described in the EKWB manual, took off the original backplate, and simply held down the block to test for POST.

I knew there was power getting to the board because of the RGBs.  Tried to start it and got nothing.

 

Reseated the CPU and RAM.  Still nothing.  I know there are memory compatibility issues right now, so I downloaded the 1001 BIOS and back-flashed to that.

Yay, I got a POST!

 

Used the full EKWB rubber grommet between the mobo and backplate and tried that.  Nothing happened.  You know that chill down your spine when "dead board" is lurking in the corner of the room.

OK, so reading around a bit I found out that removing the middle piece of the grommet (against official instructions) works.  So I tried that, and yes, it POSTed, only to find a dreaded code 8!

 

More digging around suggested that the new official 0902 BIOS would solve this... flashed it and cleared CMOS.  Still a code 8.

Loosening the CPU block and reseating the CPU fixed that.  Tightening down the block lightly gave me a UEFI to finally work with!

Ran the Q-Fan tuning to test that and set my loop to quiet without any issues.  Rebooted and it works.  Temps initially seem to be around 35°C idling.  I ran the auto overclock, it boosted up 12-14% and rebooted, but failed to POST.

Assuming it was still the RAM, I cleared CMOS and tried again; I managed to test a little more, but found it just wasn't having the RAM up at 2933.

I called it quits.  Cleared settings and set my fans manually.  Put the system together and it code 8'd again.  I was obviously over-tightening the CPU block by going 50% down the threads!!!  Loosening it back off, without reseating the CPU, works...

But after a series of quarter turns and power-on tests, I've got it staying at around 40% down the threads.

 

Again, sorry for the long post, but I hope it helps you guys.

 

***** Summary *****

Asus code 8 = CPU.  Try reseating the CPU and reducing the retention pressure via the heatsink screws, etc.

 

No post?

Check the 8-pin CPU power cable again.  If the LED is on but it still won't start, your cable or 8-pin socket could be faulty.

 

Still no post?

Flash the BIOS to the newest version.  Format a USB drive as FAT32 and rename the BIOS file to C6H.cap (see the sketch after this summary).

While the system is off, plug in your flash drive.  Press and hold the BIOS button on the rear panel for 2 seconds until it blinks, then release the button and wait.

Three blinks means it errored and doesn't like the BIOS file.  Continuous blinking that gets faster means it's working and doing its job.  Once that finishes, remove the USB drive, press Clear CMOS, and boot her up.

****** 
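
Since the file prep seems to trip people up, here's a rough sketch of getting the USB stick ready from a Linux box.  The device name /dev/sdX1 and the downloaded file name are placeholders, so double check them against your own system before running anything:

mkfs.vfat -F 32 /dev/sdX1                        # format the stick FAT32 (wipes it); check the device with lsblk first
mount /dev/sdX1 /mnt
cp CROSSHAIR-VI-HERO-ASUS-0902.CAP /mnt/C6H.cap  # downloaded file name is a placeholder; the target name C6H.cap is what matters
umount /mnt

On Windows it's just a FAT32 quick format in Explorer and a rename to C6H.cap.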

 

 

 

 

Edited by Akio
Typo
Link to comment
10 hours ago, chadjj said:

Uptime is 12 hrs, 55 mins.  I have not changed anything in the environment since the last lockup, but I have yet to hit 24 hrs.  The last lockup was at 5:55 AM, so if it happens again at that time I will dig into what happened within that window.

 

That's without a Win10 VM running, correct?

 

 

10 hours ago, ufopinball said:

If I type in IT8665E manually and click "Load drivers", it doesn't report an error, but none of the other pulldowns offer any new options.

 

 

Looks like you have the IT8665E sensor chip, and sensors-detect is failing to find the right driver for it.  Had it detected one, you would have been prompted to install it.  You can try to manually modprobe a driver from the command line, since you now know which chip you have, assuming a matching driver is included in unRAID 6.3.2.  In all likelihood, though, you're playing the waiting game for a newer unRAID version.
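
If you want to give it a go, here's a rough sketch of what I mean.  The it87 module is the usual driver family for ITE Super I/O chips, but I honestly don't know whether the version bundled with unRAID 6.3.2 recognizes the IT8665E, so this may simply fail:

modprobe it87                    # try the stock ITE driver first
dmesg | tail                     # look for a "Found ITxxxx" line or an unsupported-chip message
modprobe it87 force_id=0x8665    # last resort: force the chip ID it reported; only if you accept the risk of bogus readings
sensors                          # see whether any readings appear

If nothing shows up, that pretty much confirms we're stuck waiting on a newer kernel/driver.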

 

 

9 hours ago, ufopinball said:

Well, for better or worse, my "Cortex" server crashed after approximately 5 hours, 55 minutes.  The server wasn't doing much ... LAMP VM, and Plex running in a Docker, serving up media for my kid.

 

Oh no, another one bites the dust.  I felt lonely when it was just me, but now this is getting to be a crowded room.

 

All of my crashes have been when the server was pretty much doing absolutely nothing.  I've wondered if idle states are contributing to the problem.  It's also crossed my mind that a Win10 VM is helping simply because it is giving the unRAID OS some work to do...
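
If anyone wants to test that half of the theory without building a VM, one crude (and admittedly unscientific) sketch would be to park a throwaway busy loop on a core from the unRAID console, so the box never goes fully idle, and see whether the crashes stop:

nice -n 19 sh -c 'while :; do :; done' &    # spins one core at the lowest priority; kill it with 'kill %1' when you're done

It wouldn't prove anything on its own, but it would at least separate "needs a Windows VM" from "just needs something to do".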

 

 

5 hours ago, Akio said:

Asus code 8 = CPU.  Try reseating the CPU and reducing the retention pressure via the heatsink screws, etc.

 

 

I understand the desire for water cooling, but it sounds like something is wrong with your waterblock retention mechanism.  I've never seen a cooler mounting that allowed it to be over-tightened.  This is extra concerning since you indicated you have to remove the specified gasket in order to get it to work at all; that makes me think you have some parts wrong.  You should talk to the manufacturer to make sure you have all the right components and are assembling it correctly.

 

You also need to ask yourself, is water cooling worth it?  With my Noctua NH-C14S, running dual 140mm fans, at full speed they are fairly quiet and my 1800X never went over 38c during benchmarking/stress testing.  At idle it was closer to 30c or less.  With the fans running in silent mode, I still pretty much stay below 40-45c at peak.  It's amazing for what it does, and it isn't even the best cooler Noctua sells - this was just the biggest that would fit my server case.

 

I had a DIY water cooling rig years ago (maybe back in 2003) and it was great until it leaked.  Funny how water and electronics don't mix so well; it lets out the magic smoke that powers the processor.  I swore I'd never do it again.  Though I did use an all-in-one Corsair H100 on my office build 5 years ago.  It still works okay, but I never felt it lived up to the billing.

 

-Paul

Link to comment

I updated my list above with ufopinball's crash, and added some color coding to highlight problems:  https://forums.lime-technology.com/topic/55150-anybody-planning-a-ryzen-build/?do=findComment&comment=548546

 

So there are now at least 11 Ryzen unRAID boxes currently operational (not counting johnnie.black's system he borrowed for a day, or Akio's system since he's still working out his cooling setup).

 

Of those 11, 6 have reported issues with unRAID (well, technically 5 have reported, but I'm adding in jonp/Lime-Tech rumored issues).  

 

One of those 6 that reported issues is Beancounter, who was having random crashes every 12-36 hrs, and he indicated his problems went away after he patched his Win10 VM.

 

Of the 5 that have not reported issues, it seems they may be running Windows VM's, though in a few cases there isn't enough detail in their posts to make that determination.  It could also be that they are experiencing issues and just haven't reported back.

 

Everyone that has a Ryzen system, please update this thread on your current status with unRAID.  Have you had any crashes?  Are you running Windows VM's?  Have you had any crashes while running Windows VM's?  And any other detail you think will help identify the problem.

 

Since my system crashes so easily on unRAID, often within an hour or two, and I've never seen it go beyond 14 hours, it is an excellent test bed for determining if a Windows10 VM alleviates/resolves the issue.  I plan to get it installed in the next day or two, I'm a little busy with work at the moment.

 

-Paul

 

Link to comment
21 minutes ago, Pauven said:

I updated my list above with ufopinball's crash, and added some color coding to highlight problems:  https://forums.lime-technology.com/topic/55150-anybody-planning-a-ryzen-build/?do=findComment&comment=548546

 

So there are now at least 11 Ryzen unRAID boxes currently operational (not counting johnnie.black's system he borrowed for a day, or Akio's system since he's still working out his cooling setup).

 

Of those 11, 6 have reported issues with unRAID (well, technically 5 have reported, but I'm adding in jonp/Lime-Tech rumored issues).  

 

One of those 6 that reported issues is Beancounter, who was having random crashes every 12-36 hrs, and he indicated his problems went away after he patched his Win10 VM.

 

Of the 5 that have not reported issues, it seems they may be running Windows VM's, though in a few cases there isn't enough detail in their posts to make that determination.  It could also be that they are experiencing issues and just haven't reported back.

 

Everyone that has a Ryzen system, please update this thread on your current status with unRAID.  Have you had any crashes?  Are you running Windows VM's?  Have you had any crashes while running Windows VM's?  And any other detail you think will help identify the problem.

 

Since my system crashes so easily on unRAID, often within an hour or two, and I've never seen it go beyond 14 hours, it is an excellent test bed for determining if a Windows10 VM alleviates/resolves the issue.  I plan to get it installed in the next day or two, I'm a little busy with work at the moment.

 

-Paul

 

 

Seems like a contributing cause might be the power saving options configurable within the BIOS.  Has anyone tried disabling all power saving features and load testing unRAID with and without a VM running?  I doubt a VM running W10 can help with stability unless the issue is born out of resource allocation, or the lack of it.  I have an 1800X and an Asus Prime waiting to be built, but I'm probably going to wait until the 6.4 betas hit the masses.
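
For what it's worth, if anyone wants to test the power-saving angle from the OS side as well as the BIOS, one sketch (assuming the standard syslinux.cfg layout on the unRAID flash drive) is to limit how deep the CPU is allowed to idle via a kernel parameter and see whether the idle crashes change behavior:

label unRAID OS
  kernel /bzimage
  append processor.max_cstate=1 initrd=/bzroot

processor.max_cstate is a stock kernel parameter; leave the rest of your append line exactly as it already is, this just shows where the parameter would go.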

Link to comment

Thought I would share my latest crash, it was unique.

 

I was working on getting a Win10 VM installed, and was downloading the VirtIO drivers from the Settings>VM Manager page.  The server crashed right around the time the download completed, after only 37 minutes of uptime.

 

I went down to the server to restart it, and I saw the "Matrix" had come to life on my console screen!  Whatever was being displayed was scrolling so fast, it looked like the infamous green character waterfall.  I think it looked like hexadecimal memory values, but I also thought I saw some readable text in there too.  Purely a guess on what I was seeing, though; it was just a blur.

 

I've just gotten a Win10 VM installed, currently with nothing passed through, not even the CPU.  Scratch that: I just noticed that the Ryzen CPU was passed through; it must have been a default of the Windows 10 VM template.  I will test in this config for a while, and if it crashes I will do my next test with some more hardware passed through.  Hopefully my piecemeal approach will zero in on what part (if any) of the Win10 VM helps prevent unRAID from crashing.

 

Which leads me to the next question for our Ryzen gang:  For those running Win10 VM's and not having any unRAID issues, please clarify how you configured the VM with regards to passing hardware through.
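
To make the configs easier to compare, maybe grab the CPU section of your VM's XML from the console.  A quick sketch (the VM name "Windows 10" is a placeholder for whatever you called yours, and the grep may also catch the cputune block, which is fine):

virsh dumpxml "Windows 10" | grep -A4 "<cpu"

On mine that shows the CPU in host-passthrough mode with the Ryzen topology, which is all I mean above by the CPU being "passed through".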

 

-Paul

Edited by Pauven
Link to comment
2 hours ago, Pauven said:

 

Which leads me to the next question for our Ryzen gang:  For those running Win10 VM's and not having any unRAID issues, please clarify how you configured the VM with regards to passing hardware through.

 

 

Quick update ... so mine crashed again last night within an hour.  I looked and although I had no sensors configured, there still seems to be some issue with the Dynamix System Temp plugin.  When I went to the Dashboard tab, unRAID showed all 16 CPUs available, but only the first three showed % activity ... the rest were blank when they should be showing 0%.  I had seen this before prior to removing Dynamix System Temp the first time.  I was also seeing oddball behavior in the System Log:

Mar 26 22:46:01 Cortex kernel: INFO: rcu_sched detected stalls on CPUs/tasks:
Mar 26 22:46:01 Cortex kernel:  3-...: (1 GPs behind) idle=c09/140000000000000/0 softirq=63115/64077 fqs=100075
Mar 26 22:46:01 Cortex kernel:  (detected by 8, t=420012 jiffies, g=-6, c=-7, q=1)
Mar 26 22:46:01 Cortex kernel: Task dump for CPU 3:
Mar 26 22:46:01 Cortex kernel: CPU 0/KVM       R  running task        0  9908      1 0x00000008
Mar 26 22:46:01 Cortex kernel: ffff880f77028cc0 ffff88101ecd7a80 ffffc9000cc13cb0 ffffffffa0164cce
Mar 26 22:46:01 Cortex kernel: ffffc9000cc13cc0 ffffffffa0182ccd ffff880f773b0000 ffffc9000cc13cd8
Mar 26 22:46:01 Cortex kernel: 0000000000426123 ffff880f773b0000 00000222da289e1f ffffc9000cc13cf8
Mar 26 22:46:01 Cortex kernel: Call Trace:
Mar 26 22:46:01 Cortex kernel: [<ffffffffa0164cce>] ? kvm_get_rflags+0x15/0x26 [kvm]
Mar 26 22:46:01 Cortex kernel: [<ffffffffa0182ccd>] ? kvm_apic_has_interrupt+0x3e/0x8f [kvm]
Mar 26 22:46:01 Cortex kernel: [<ffffffffa0182d06>] ? kvm_apic_has_interrupt+0x77/0x8f [kvm]
Mar 26 22:46:01 Cortex kernel: [<ffffffffa0182ea8>] ? kvm_get_apic_interrupt+0xf3/0x1af [kvm]
Mar 26 22:46:01 Cortex kernel: [<ffffffff813ad654>] ? __delay+0xa/0xc
Mar 26 22:46:01 Cortex kernel: [<ffffffffa0182635>] ? wait_lapic_expire+0xdf/0xe4 [kvm]
Mar 26 22:46:01 Cortex kernel: [<ffffffffa016e602>] ? kvm_arch_vcpu_ioctl_run+0x1e2/0x1165 [kvm]
Mar 26 22:46:01 Cortex kernel: [<ffffffffa13acaa5>] ? svm_vcpu_load+0xe1/0xe8 [kvm_amd]
Mar 26 22:46:01 Cortex kernel: [<ffffffffa0168fc8>] ? kvm_arch_vcpu_load+0xea/0x1a0 [kvm]
Mar 26 22:46:01 Cortex kernel: [<ffffffffa015ed6a>] ? kvm_vcpu_ioctl+0x178/0x499 [kvm]
Mar 26 22:46:01 Cortex kernel: [<ffffffffa0161273>] ? kvm_vm_ioctl+0x3aa/0x6d2 [kvm]
Mar 26 22:46:01 Cortex kernel: [<ffffffff8112fe72>] ? vfs_ioctl+0x13/0x2f
Mar 26 22:46:01 Cortex kernel: [<ffffffff811303a2>] ? do_vfs_ioctl+0x49c/0x50a
Mar 26 22:46:01 Cortex kernel: [<ffffffff81138f7f>] ? __fget+0x72/0x7e
Mar 26 22:46:01 Cortex kernel: [<ffffffff8113044e>] ? SyS_ioctl+0x3e/0x5c
Mar 26 22:46:01 Cortex kernel: [<ffffffff8167d2b7>] ? entry_SYSCALL_64_fastpath+0x1a/0xa9

So I removed both System Temp and Autofan, and rebooted.  For now, I'm leaving the temp & fan plugins unloaded until we get either a plugin or OS update.  In my opinion, this is at least one known problem.

 

Next, I deleted my old test Win10 VM and started a new one.  Full install + updates, then shut it down for the night.  System ran for ~10 hours, but ran into problems in the morning when I tried to load a webpage from the LAMP VM.  Nothing unusual in the log, in fact nothing at all posted around the time of the crash.  I'd say this seems to point towards "system idle" being the culprit, so another known problem.  Reboot...

 

Now I have completed the Windows 10 VM configuration to pass through the GPU and USB cards.  I have this VM loaded with the Task Manager open to the "Performance" tab, configured to show Logical Processors (right-click on the graph, "Change graph to", then "Logical Processors").  As long as it's constantly updating the screen, it has at least something to do.  Been running for 2 hours and 30 minutes, will let it go and hope this keeps things stable.

 

- Bill

 

Link to comment
45 minutes ago, ufopinball said:

Quick update ... so mine crashed again last night within an hour.  I looked and although I had no sensors configured, there still seems to be some issue with the Dynamix System Temp plugin.  When I went to the Dashboard tab, unRAID showed all 16 CPUs available, but only the first three showed % activity ... the rest were blank when they should be showing 0%.  I had seen this before prior to removing Dynamix System Temp the first time.  I was also seeing oddball behavior in the System Log:

 

I would say the sensors issues you are describing are motherboard specific.  My motherboard has different sensors.  I've also never seen the bad CPU core info on the Dashboard, but I'll look more closely for this behavior, perhaps I just missed it.  I've also not seen log entries like those.

 

With my Win10 VM running, passing through only the Ryzen CPU, and connected through VNC, my server crashed within 2.5 hours.  :(  Looks like this may not be the homerun I was hoping for.

 

Like Bill, I had the Task Manager open showing the Performance tab.

 

I then enabled the PCIe ACS override, and while it moved some groups around, I wasn't able to pass through the video card yet (only 1 installed, need to pop in a 2nd).  The only thing I could pass through was an Intel USB device, which turned out to be the Bluetooth adapter; I see it now in Windows.  Doubtful that passing the Bluetooth adapter through will make a difference, but perhaps the ACS override will.
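
For anyone else poking at passthrough, this is the little console loop I've been using to see how the IOMMU groups shift when the ACS override is toggled.  It's just standard sysfs plus lspci, nothing unRAID-specific:

for d in /sys/kernel/iommu_groups/*/devices/*; do
  g=${d#*/iommu_groups/}; g=${g%%/*}
  echo -n "IOMMU group $g: "; lspci -nns "${d##*/}"
done

Roughly speaking, each device you want to pass through needs to be alone in its group (or grouped only with things you're also passing through), which is what the override is trying to arrange.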

 

Once it crashes again, I'll have to do some hardware reconfig before the next test.

 

-Paul

Link to comment

As far as the possibility of a Windows VM fixing things, I've actually been wondering that myself...  I've been getting rare crashes on an idle machine, but so far it SEEMS to only happen if I shut down my Windows VM; if I just let that idle, I've been up for over 48 hours.  Will post if I get a proper set of logs from a crash.

Link to comment
19 hours ago, Pauven said:

I understand the desire for water cooling, but it sounds like something is wrong with your waterblock retention mechanism.  I've never seen a cooler mounting that allowed it to be over-tightened.  This is extra concerning since you indicated you have to remove the specified gasket in order to get it to work at all; that makes me think you have some parts wrong.  You should talk to the manufacturer to make sure you have all the right components and are assembling it correctly.

 

You also need to ask yourself, is water cooling worth it?  With my Noctua NH-C14S, running dual 140mm fans, at full speed they are fairly quiet and my 1800X never went over 38c during benchmarking/stress testing.  At idle it was closer to 30c or less.  With the fans running in silent mode, I still pretty much stay below 40-45c at peak.  It's amazing for what it does, and it isn't even the best cooler Noctua sells - this was just the biggest that would fit my server case.

 

I had a DIY water cooling rig years ago (maybe back in 2003) and it was great until it leaked.  Funny how water and electronics don't mix so well; it lets out the magic smoke that powers the processor.  I swore I'd never do it again.  Though I did use an all-in-one Corsair H100 on my office build 5 years ago.  It still works okay, but I never felt it lived up to the billing.

 

-Paul

I can assure you I'm doing it as the manual says: https://www.ekwb.com/shop/EK-IM/EK-IM-3831109800065.pdf :)  While I'm waiting for a response from EKWB, I've been testing on bare-metal Win10 to see how it handles the Asus software.  I noticed two things.

Firstly, the 5-Way Optimization came out with a 4.044 GHz overclock, running fairly quietly for 18 hrs at an average 45-50°C (the fan software isn't working correctly).  The second attempt gave a 3.8 GHz overclocked CPU with a whisper-quiet system; look at the screenshot, a 04°C CPU temp?  Nah, this software is buggy as hell.  I believe it's really around 24°C idle.

 

AMD's official Ryzen Master software used (screenshot: ryzen2.png):

https://www.amd.com/en/technologies/ryzen-master

 

Once I find out from EKWB support what is going on with the water cooling hardware, I'm jumping into unRAID testing ASAP.

Wish we had full SSD support for storage drives.  I can post some more pics of the system I've built in a separate thread if need be.  The Thermaltake X5 is a bit different to build in.

[attachment: ryzen 1.png]

Edited by Akio
typo
Link to comment
23 hours ago, Pauven said:

 

That's without a Win10 VM running, correct?

 

 

 

Looks like you have the IT8665E sensor chip, and sensors-detect is failing to find the right driver for it.  Had it detected one, you would have been prompted to install it.  You can try to manually modprobe a driver from the command line, since you now know which chip you have, assuming a matching driver is included in unRAID 6.3.2.  In all likelihood, though, you're playing the waiting game for a newer unRAID version.

 

 

 

Oh no, another one bites the dust.  I felt lonely when it was just me, but now this is getting to be a crowded room.

 

All of my crashes have been when the server was pretty much doing absolutely nothing.  I've wondered if idle states are contributing to the problem.  It's also crossed my mind that a Win10 VM is helping simply because it is giving the unRAID OS some work to do...

 

 

 

I understand the desire for water cooling, but it sounds like something is wrong with your waterblock retention mechanism.  I've never seen a cooler mounting that allowed it to be over-tightened.  This is extra concerning since you indicated you have to remove the specified gasket in order to get it to work at all; that makes me think you have some parts wrong.  You should talk to the manufacturer to make sure you have all the right components and are assembling it correctly.

 

You also need to ask yourself, is water cooling worth it?  With my Noctua NH-C14S, running dual 140mm fans, at full speed they are fairly quiet and my 1800X never went over 38c during benchmarking/stress testing.  At idle it was closer to 30c or less.  With the fans running in silent mode, I still pretty much stay below 40-45c at peak.  It's amazing for what it does, and it isn't even the best cooler Noctua sells - this was just the biggest that would fit my server case.

 

I had a DIY water cooling rig years ago (maybe back in 2003) and it was great until it leaked.  Funny how water and electronics don't mix so well; it lets out the magic smoke that powers the processor.  I swore I'd never do it again.  Though I did use an all-in-one Corsair H100 on my office build 5 years ago.  It still works okay, but I never felt it lived up to the billing.

 

-Paul

Hi Paul,

 

Correct, I am no longer running any VM's in the background, and I have had the system lock up twice since my last post.  Both times it was frozen at the boot console, with no logs to report.

 

Thanks,

Chad

Link to comment
