Anybody planning a Ryzen build?



I updated my list above with Bureaucromancer's crashes:  https://forums.lime-technology.com/topic/55150-anybody-planning-a-ryzen-build/?do=findComment&comment=548546

 

Latest State of the unRYZEN:

  • Of 11 currently running Ryzen unRAID systems, 7 have experienced crashes with unRAID (well, technically 6 have reported, but I'm adding in jonp/Lime-Tech rumored issues, really wish they would confirm here).  
  • 2 of those 7, Beancounter and Bureaucromancer, seem not to crash while running a Win10 VM, for whatever reason.

Getting ugly...

 

-Paul

23 hours ago, ufopinball said:

Now I have completed the Windows 10 VM configuration to pass through the GPU and USB cards.  I have this VM loaded with the Task Manager open to the "Performance" tab, configured to show Logical Processors (right-click on the graph, "Change graph to", then "Logical Processors").  As long as it's constantly updating the screen, it has at least something to do.  Been running for 2 hours and 30 minutes, will let it go and hope this keeps things stable.

 

Just a quick update.  Using the above configuration, my server has been running stable for 1 day, 2 hours, 21 minutes.  If you think it's worth the effort, disable/remove the Dynamix System Temp and AutoFan.  I primarily had problems with the former, but for the moment I am running without either.  As I understand it, nobody has a fully supported driver yet?

 

My list of plugins is Community Applications, Dynamix webGui, Nerd Tools, Open Files, Preclear Disks, Statistics, unRAID Server OS and User Scripts.

 

Otherwise, I have a LAMP server running, and a Windows 10 VM that's passing through the GPU & integrated HDMI sound, keyboard/mouse and the USB card.  Need to add the USB sound device eventually, and route wires to the proper monitors ... but right now I'm mainly going for uptime.

 

- Bill


So last night around 9pm the server crashed/rebooted again.  This was with PCIe ACS Override enabled, and the Bluetooth adapter passed through to the running (and fully patched) Win10 VM.  So as of yet, running a Win10 VM is not providing any additional stability on my system.

 

After it rebooted the server came up in offline mode (i.e. array stopped, no VM's running). And it was still running when I finally looked at it this afternoon, over 16 hours of uptime - a new record for me....

 

This behavior is part of what makes troubleshooting this issue so frustrating - not just the fact we are not getting error messages in the log files, but also that the crashes are completely random.  
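Since the log files are not giving us anything to go on, one low-tech way to at least pin down when a crash happened is a heartbeat file.  What follows is a minimal sketch, assuming Python is installed on the server and that the log path you pass in survives a reboot.  The function names and the one-line format are made up for illustration, they are not anything unRAID provides:

```python
import time


def format_uptime(seconds):
    """Format seconds in the 'D days, H hours, M minutes' style used in this thread."""
    minutes = int(seconds) // 60
    days, rem = divmod(minutes, 24 * 60)
    hours, mins = divmod(rem, 60)

    def unit(n, word):
        # Singular for exactly 1, plural otherwise.
        return f"{n} {word}{'' if n == 1 else 's'}"

    return f"{unit(days, 'day')}, {unit(hours, 'hour')}, {unit(mins, 'minute')}"


def heartbeat_line(uptime_seconds, now=None):
    """Build one heartbeat line: local timestamp plus formatted uptime."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    return f"{stamp} up {format_uptime(uptime_seconds)}"


def append_heartbeat(log_path, proc_uptime="/proc/uptime"):
    """Read the kernel's uptime counter and append one heartbeat line to log_path."""
    with open(proc_uptime) as f:
        # First field of /proc/uptime is seconds since boot.
        uptime_seconds = float(f.read().split()[0])
    with open(log_path, "a") as f:
        f.write(heartbeat_line(uptime_seconds) + "\n")
```

Calling append_heartbeat() once a minute (from cron, for example) would leave a last line that brackets the crash to within a minute, even when nothing else is logged.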

 

For those of you that have had 48+ hours of uptime, how do you know your Ryzen server is really stable?  How do we know it won't crash at 96 hours or 2 weeks?  My little Celeron ran months without crashing, never crashed once in 4 years of use, no matter what I threw at it.  We've all only had our Ryzens for a few weeks at most, and we're all still tweaking and playing and restarting servers and trying new things.  I think the jury is still out on whether ANY of our Ryzen systems are truly stable.  My hunch is that some of us have a configuration or usage pattern that is simply keeping the server up longer, delaying the inevitable crash, and within the coming weeks the few who haven't yet experienced issues will join the majority who have.  I don't think any Ryzen system is production validated for unRAID at this time.

 

jonp of Lime-Tech cautioned us to allow them to test Ryzen out first, before we take the plunge.  That is really good advice, and based upon the results the majority of us early adopters are experiencing, everyone should follow Jon's advice.  I doubt anyone expected the stability issues that are being reported here.

 

I do think it is time for Lime-Tech to weigh in here, even with a short note.  I think they should share their experiences with Ryzen and their current appraisal of the stability situation.  I'm open to beta testing fixes, running troubleshooting steps, providing diagnostics and whatever I can do to assist.  It would be really nice if they stated their path forward, even if the path forward is that they will not be able to support Ryzen, or they believe it is defective memory, or that they're as completely befuddled as we are.

 

Sorry, end of rant.

 

I went ahead and stopped the server to proceed with my next test.  I installed my GTX 670 GPU back into the server (I've been running the past week on the GT 710, a low end GPU that only uses a PCIe x1 slot).  With two GPU's I hoped to successfully pass a GPU into the Win10 VM.

 

Luckily enough, with ACS Override helping, the GTX 670 was in a separate IOMMU group, and I passed both the nVIDIA video and audio through.
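For anyone who wants to check their groupings from the console, here is a rough sketch of how they can be read.  It assumes the kernel exposes the groups under /sys/kernel/iommu_groups, which is the usual place when IOMMU is enabled; list_iommu_groups and describe_groups are hypothetical helper names, not unRAID tools:

```python
import os


def list_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Return {group_number: [PCI addresses]} as found under the sysfs root."""
    groups = {}
    if not os.path.isdir(root):
        return groups  # IOMMU disabled, or the directory is not exposed
    for name in os.listdir(root):
        if not name.isdigit():
            continue
        dev_dir = os.path.join(root, name, "devices")
        if os.path.isdir(dev_dir):
            groups[int(name)] = sorted(os.listdir(dev_dir))
    return groups


def describe_groups(groups):
    """Return one printable line per group, ordered by group number."""
    return [
        f"IOMMU group {number}: {', '.join(devices)}"
        for number, devices in sorted(groups.items())
    ]
```

A GPU is only cleanly separated when its video and audio functions are the only devices listed in their group, so printing describe_groups(list_iommu_groups()) with and without ACS Override makes the difference easy to see.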

 

The only problem was that when the server booted, it wanted to use the GTX 670 as the primary GPU, not the GT 710.  When I passed the GTX 670 through and started the VM, my console display went dark, showing neither Windows nor the console.  Also, now that I'm passing the GPU through, the ability to remote into the desktop via VNC is gone.  So while the Win10 VM is running fine, I cannot see anything on the Windows desktop.  I had looked in my BIOS settings for a way to set one GPU as the primary, but could not find a way.  Not sure how other users accomplish this.  Oh, almost forgot to mention, I put the GT 710 in PCIe slot 1, and the GTX 670 in slot 3, hoping that the lower slot # would give it priority.

 

I also disabled a lot of stuff in the BIOS, turning off anything related to low power states that I could find.  And I also turned off multi-threading.  I'm basically just throwing everything at the wall to see if anything sticks, and if something does then I can methodically tweak settings to find out what made the difference.

 

Anyway, I'll let it run in the current config and see what happens.  4 hours up and counting...

 

-Paul

 

 

6 hours ago, Pauven said:

For those of you that have had 48+ hours of uptime, how do you know your Ryzen server is really stable?  How do we know it won't crash at 96 hours or 2 weeks?

 

Prior to upgrading the BIOS (from 0504 to 0511), I had an uptime of 5 days, 21 hours, 35 minutes.  While there is work to be done on the Windows 10 VM, I may be able to accomplish most (if not all) of it without rebooting unRAID itself.  To the degree that I can accomplish that, I'll keep the server up and see how far it gets.

 

While I understand this has been frustrating, I figure if you can push past 3-5 days, that's a pretty good sign.  Outside of that, time will tell ... but those tests will hopefully be cut short by a new BIOS, or an update from unRAID, even if it's only a Beta or RC.

 

For me personally, things are busy enough that I'm unlikely to do much with the Windows 10 VM this week.  Sunday looks to be the best option for such activities, so we'll see if the server runs stable at least through then.

 

Otherwise, I agree that it would be nice if the unRAID team could join us in this conversation.  At least there seem to be enough of us here to show it affects a good number of people.  Also, the Ryzen 5 CPU line comes out in 2 weeks, so we could see an additional influx of Ryzen builds.

 

- Bill


Okay, something I did on my last round of changes is making a difference.  I'm at 24 hours of uptime and counting, which is a significant new record for my server, especially considering it usually crashes within 2 hours when running unRAID.  It's still possible that this is a fluke, as on two occasions the server did not crash before I restarted it, once at about 13 hours, and once at about 16 hours.  Both of those previous long uptime events came after crashes, and with no changes since the previous crash, so my assumption is that they would have eventually crashed, just no way to know when.  It will probably take multiple rounds of testing to confirm if I've really made a difference.

 

Before I list what I changed on my last test, I should detail the previous test that failed:

 

Running a non-overclocked Ryzen 7 1800X with 64GB DDR4-2400 at 2400 (QVL'd by ASRock, and which has successfully passed 16 passes of Memtest86+ on my server).  unRAID was configured with a single drive (no parity) array.  The only PCIe devices installed were a single GPU (nVIDIA GT 710), and a Samsung 960 M.2 drive.  My UPS is plugged into the server via USB, and is configured in unRAID.  I was running a fully patched Win10 Pro VM (with no programs installed), with PCIe ACS Override enabled, 8 (of 16) Ryzen cores passed through (so not a virtual CPU), 16GB RAM, and a single USB device passed through (Intel Bluetooth adapter that is built into motherboard).  The GPU was NOT passed through (using VNC), and Logitech USB keyboard/trackpad was also not passed through.  I was connected to the Win10 VM via VNC, with Task Manager opened to the Performance tab, monitoring CPU utilization.  I also had the unRAID GUI on the Dashboard tab, monitoring CPU load, fan speeds and temps.

 

The above configuration crashed twice, both times within 4 hours. 

 

I then changed all of the following for the next test:

  • Moved the 1st GPU, the GT 710, from PCIe x1 slot #4 to PCIe x1 slot #1 (trying to make it the primary GPU, but GTX 670 came up primary)
  • Installed a 2nd GPU, a GTX 670, into PCIe x16 slot named PCIe #5 (which is the second of three x16 slots) (first time dual-GPU's installed).
  • BIOS:  Disabled "Cool 'n' Quiet" (first time disabling)
  • BIOS:  Disabled "AMD fTPM Switch" - Trusted Platform Module (first time disabling)
  • BIOS:  Disabled "C6 Mode" (first time disabling)
  • BIOS:  Disabled "Deep Sleep" (first time disabling)
  • BIOS:  Disabled "Suspend to RAM" (first time disabling)
  • BIOS:  Disabled "Security Device Support" (first time disabling)
  • BIOS:  Disabled "SMT" multi-threading (first time disabling)
  • BIOS:  Disabled "Core Performance Boost" (first time disabling)
  • BIOS:  Disabled "Global C-state Control" (first time disabling)
  • BIOS:  Disabled "HPET In SB" - High Precision Event Timer (but I had disabled this in earlier tests and it made no difference)
  • BIOS:  Changed a PCIe/NVMe parameter from splitting x2 lanes to each to dedicating x4 lanes to NVMe (first time changing)
  • BIOS:  Disabled WiFi (but I had disabled this in earlier tests, and it still showed up in Windows, seems the BIOS switch is broken)
  • Modified the Win10 VM to use 4 (of now just 8 available) Ryzen cores, to pass through the GTX 670 (both video and audio) and pass through the Logitech USB keyboard/trackpad (which was already installed in earlier tests).
  • Started the Win10 VM.  I could not connect to Windows desktop, as the GTX 670 monitor displayed no output, and VNC connection option was now gone.  Assumed Windows was booted and sitting on desktop (no password), as CPU/RAM utilization implied, but no way to verify.

 

The above has now been running for 24 hours, and is still going.

 

The last log entry was from just shortly after I started up the server, so no logging activity is occurring, but this is not a new development, as outside of initial log entries on startup, I typically only see entries when I actively change something in unRAID. Avg. CPU load primarily stays at 0%, with occasional spikes to 1%.  I see light CPU usage on the 4 CPU cores passed through to the Win10 VM, but also see even lighter CPU activity on the 4 CPU cores allocated to unRAID.  Even though the Win10 VM is running, since I can't get to the desktop it is sitting there idle.  Pretty much the whole server is idle, outside of me monitoring the Dashboard to make sure it is alive.

 

For those wondering, I have the following plugins installed/configured (please note, I've tested in Safe Mode with no plugins and still get crashes, so I'm currently testing with plugins).  None of these have changed or updated between tests, they've been set this way for over a week:

  • Nerd Tools / Nerd Pack (with Perl, Screen, and Utempter enabled)
  • CA Auto Update Applications
  • CA Backup / Restore Appdata
  • CA Cleanup Appdata
  • Community Applications
  • Dynamix Auto Fan Control (configured to control my array fan, with otherwise default params)
  • Dynamix System Information
  • Dynamix System Temperature (with nct6775 drivers loaded and providing good temps for CPU/MB and fan speeds)
  • Dynamix webGui
  • Fix Common Problems
  • Tips and Tweaks (CPU Scaling Governor set to Performance)

That's it.

 

So obviously I made changes like throwing a box of grenades, just hoping to make a difference, and it appears that one (or more) of those grenades may have actually found a target.  I find it hard to believe that adding a 2nd GPU and passing it through to the Win10 VM would improve anything, but I can't rule this out.  No idea on the GPU driver status in Windows since I can't get to the desktop, but I assume Windows automatically installed default drivers included in Win10. 

 

It seems more logical to me that one of the BIOS changes I made was the actual difference maker, especially one of the settings related to power management, since all of my crashes have come when the server is idling and the CPU is presumably going into lower power sleep states.  This would also align with the idea that running a Win10 VM helps because it keeps the CPU from idling, which sidesteps whatever problem unRAID has with those states.
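One way to check the idle-state theory from the Linux side is to look at which idle states the kernel reports and how often each one has been entered.  This is a rough sketch, assuming the kernel exposes the cpuidle sysfs interface under /sys/devices/system/cpu/cpuN/cpuidle (it may not be present on every kernel and BIOS combination); read_idle_states is a hypothetical helper name:

```python
import glob
import os


def read_idle_states(cpu_dir="/sys/devices/system/cpu/cpu0"):
    """Return [(state_name, entry_count)] for one CPU, or [] if cpuidle is absent."""
    pattern = os.path.join(cpu_dir, "cpuidle", "state[0-9]*")

    def state_index(path):
        # Directories are named state0, state1, ...; sort numerically.
        return int(os.path.basename(path)[len("state"):])

    states = []
    for state_dir in sorted(glob.glob(pattern), key=state_index):
        with open(os.path.join(state_dir, "name")) as f:
            name = f.read().strip()
        with open(os.path.join(state_dir, "usage")) as f:
            usage = int(f.read().strip())  # number of times this state was entered
        states.append((name, usage))
    return states
```

Sampling this twice a few minutes apart and comparing the counts would show which states a core is actually entering while the server idles, before and after changing the BIOS settings.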

 

My plan is to let it run as-is for another 24 hours.  If the server is still running after 48 hours, I will then stop the Win10 VM, and let it run another 24 hours.  If still no crashes, then I will start backing out some of my BIOS changes.

 

One last observation:  In the Win10 VM, I do NOT see the Processors in the Device Manager.  I found this odd, as I definitely passed through half of the available cores/threads, and in Task Manager on the Performance tab, it clearly stated the processor is the Ryzen 7 1800X.  My expectation was that, in passing through the processor cores, they would show up in Device Manager, and Windows would begin to handle CPU power saving features.  It might be that because, for whatever reason, the Processors did not show up in Windows Device Manager, Windows 10 isn't actually managing the CPU power saving features.  In that case, if unRAID is having a compatibility problem related to Ryzen's power saving features, on my server a Win10 VM might not address that problem, but disabling the features in BIOS may have fixed it.  

 

In "Fix Common Problems", I now see a new "Other Comment" indicating that the CPU is running at 100% because there is no CPU Scaling Driver installed.  I did not see this message before tweaking the BIOS.
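The same thing can be checked directly, without the plugin.  This is a minimal sketch, assuming the standard cpufreq sysfs files (scaling_driver and scaling_governor) are where the kernel normally puts them; read_cpufreq is a hypothetical helper name, and I have not confirmed this is how Fix Common Problems makes its determination:

```python
import os


def read_cpufreq(cpu_dir="/sys/devices/system/cpu/cpu0"):
    """Return (driver, governor); each is None when the sysfs file is absent."""

    def read(name):
        path = os.path.join(cpu_dir, "cpufreq", name)
        try:
            with open(path) as f:
                return f.read().strip()
        except OSError:
            return None  # no cpufreq directory means no scaling driver is loaded

    return read("scaling_driver"), read("scaling_governor")
```

If this returns (None, None), no scaling driver is bound to that CPU, which would be consistent with the plugin's comment.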

 

Which makes me wonder, for those of you where your Windows VM's are solving your stability problems, do you see the "Processors" section in the Windows Device Manager?

 

I think it would also be helpful if someone can go ahead and test with CPU power saving features disabled in BIOS, and with no Win VM running.  I think we may be closing in on the root issue.

 

-Paul

Edited by Pauven
Added a couple of BIOS parameters I forgot to include.
2 hours ago, Pauven said:

Which makes me wonder, for those of you where your Windows VM's are solving your stability problems, do you see the "Processors" section in the Windows Device Manager?

 

I think it would also be helpful if someone can go ahead and test with CPU power saving features disabled in BIOS, and with no Win VM running.  I think we may be closing in on the root issue.

 

Not on mine, there is no "Processors" section in the Device Manager for my Windows 10 VM.

 

As a consequence, "Ryzen Master" will also not run in my Windows 10 VM.  Kind of a bummer, I'd still like some sort of visibility into things like processor temp and fan speeds.  The system sounds pretty quiet, so it doesn't seem like the CPU is being heavily loaded or anything.  The only legitimate measurement I have is a reading of 54 watts as measured on the UPS, which is noticeably better when compared to my previous set of numbers.  This certainly bodes well moving forward, even if not related to stability.

 

At present my uptime is 2 days, 4 hours, 24 minutes.  When I have a reason to reboot, I'll look again at the BIOS settings.  I don't think the ASUS Prime X370-PRO has quite the granularity of settings that your ASRock Fatal1ty X370 board offers.  At a minimum, I don't remember seeing "Cool 'n' Quiet" as an option before.

 

- Bill

1 hour ago, ufopinball said:

 

Not on mine, there is no "Processors" section in the Device Manager for my Windows 10 VM.

 

As a consequence, "Ryzen Master" will also not run in my Windows 10 VM.  Kind of a bummer, I'd still like some sort of visibility into things like processor temp and fan speeds.  The system sounds pretty quiet, so it doesn't seem like the CPU is being heavily loaded or anything.  The only legitimate measurement I have is a reading of 54 watts as measured on the UPS, which is noticeably better when compared to my previous set of numbers.  This certainly bodes well moving forward, even if not related to stability.

 

At present my uptime is 2 days, 4 hours, 24 minutes.  When I have a reason to reboot, I'll look again at the BIOS settings.  I don't think the ASUS Prime X370-PRO has quite the granularity of settings that your ASRock Fatal1ty X370 board offers.  At a minimum, I don't remember seeing "Cool 'n' Quiet" as an option before.

 

- Bill

 

It should - certainly the Maximus Hero does, there's just very little documentation and surprising locations for a bunch of stuff.

3 hours ago, ufopinball said:

I don't think the ASUS Prime X370-PRO has quite the granularity of settings that your ASRock Fatal1ty X370 board offers.  At a minimum, I don't remember seeing "Cool 'n' Quiet" as an option before.

 

The following are standard features on Ryzen and should be exposed in all BIOS firmwares.  Especially Cool 'n' Quiet, which has been an AMD standard for over a decade.

  • Cool 'n' Quiet (standard AMD power saving functionality since the Athlon days, controls the P states)
  • AMD fTPM Switch (this is part of the PSP co-processor inside the CPU, ARM architecture, ARM TrustZone)
  • C6 Mode (very deep sleep for individual cores, has been a required feature for Windows 7 certification so it's not new.  Intel now goes up to C10)
  • Deep Sleep (I think this is equivalent to C4, a step beyond C3.  Don't see any reference to C5/Deeper Sleep in my BIOS)
  • Suspend to RAM (should be standard on all PC's, been around forever, ACPI S3 Sleep, sometimes shown as a toggle between S1 & S3)
  • Security Device Support (used to enable TPM for hard drives)
  • SMT (biggest new feature in Ryzen, a lot of motherboard makers are late to implement this switch in BIOS)

This is not an exhaustive list, just what I found in my motherboard's BIOS.  I feel there are many settings missing.

 

Of course, just because it should be there, doesn't mean it is.  Motherboard manufacturers can be so lazy sometimes...

 

Server is at 31 hours and counting!!!

 

-Paul


I'm sorry guys, but this all seems way too premature!  You're trying to get older tech to work with the newest tech object, without any compatibility updates specific to the new tech.  I would not expect JonP or anyone else to participate here until they had first added what they could, a Ryzen friendly kernel and kernel hardware support and Ryzen tweaked KVM and its related modules, and various system tweaks to optimize the Ryzen experience.  After that, then they can join you and participate.  It's like having an old version with bugs, and an update with fixes.  Why would a developer want to discuss problems with the old?  They are always going to want you to update first and test, then you can talk.

 

There's so much great work in this thread, especially from Paul, but it's based on the old stuff, not on what you will be using, so it seems to me that much of the effort is wasted.  Patience!!!

Link to comment

So I just bumped my board up to BIOS 1002.  Will see if it changes anything, but I'm kinda expecting this to take a kernel update to sort out properly.

 

That said, other than the VM/stability thing my only major issue at this stage is IOMMU.  It works perfectly for card two, but I can't for the life of me get card 1 (an RX 460) to initialize when attached to a VM.  Terminal turns off fine, but black screen.  Anybody got any thoughts?  Might this be as simple as being a side effect of basically everything being one IOMMU group without ACS?  Not a huge priority, but there's not a whole lot else to tinker with on this thing while waiting to see if it's basically stable ;)

 

Huh.  Gonna have to try flipping things around, seeing some stuff at Level 1 that suggests it might be worth trying with Nvidia in primary and AMD secondary, which isn't thrilling for power consumption while idle, but an interesting possibility.  Will try it in a few days when I have some real time available.

Edited by Bureaucromancer
13 hours ago, RobJ said:

I'm sorry guys, but this all seems way too premature!  You're trying to get older tech to work with the newest tech object, without any compatibility updates specific to the new tech.  I would not expect JonP or anyone else to participate here until they had first added what they could, a Ryzen friendly kernel and kernel hardware support and Ryzen tweaked KVM and its related modules, and various system tweaks to optimize the Ryzen experience.  After that, then they can join you and participate.  It's like having an old version with bugs, and an update with fixes.  Why would a developer want to discuss problems with the old?  They are always going to want you to update first and test, then you can talk.

 

There's so much great work in this thread, especially from Paul, but it's based on the old stuff, not on what you will be using, so it seems to me that much of the effort is wasted.  Patience!!!

 

While I agree with your sentiment for patience, especially for those who have yet to purchase a Ryzen based system, I don't think any of this is premature, especially considering Lime-Tech is already on record indicating that unRAID on Ryzen works:

 

On 3/8/2017 at 1:13 PM, jonp said:

Thought I'd chime in with some feedback here on our Ryzen test build that we put together last night and are continuing to test with today.  First and foremost:  YES!  IT WORKS!  Booting up unRAID OS, assigning some storage, starting up a VM, passing through a GPU...all working as intended!  For those of you who were biting your nails on this, you can give them a rest.

 

The emphasis on "YES!  IT WORKS!" is Jon's, not mine.

 

Beyond that, here are some additional considerations:

 

First, unRAID 6.3.2 is not "old tech".  It is current gen tech, released just a short while ago.  It is also the latest and greatest; there are no new betas posted to try or even discuss.

 

Speaking of old tech, Windows 7 is definitely old tech, yet still performs well on Ryzen (in some cases better than Windows 10).  AMD's (and Intel's) primary goal when creating new versions of the venerable x86/x64 CPU design is backwards compatibility.  Creating a new product that does not work on existing infrastructure is a multi-billion dollar mistake (case in point, Itanium).  AMD has done a stupendous job of maintaining backwards compatibility, though obviously older software cannot enjoy the latest architecture enhancements without updates.

 

Lime-Tech has done a fantastic job of late with keeping up on kernel and module updates, my hat goes off to them - salute, and they only keep improving in this regard.

 

I'm not sure what compatibility updates you are anticipating.  The Linux kernel became Ryzen architecture aware in 4.9.10 (the changes from 4.10 were back-ported).  That's pretty much it, it's a done deal in 4.9.10.  Sure, we may see a new CPU scaling governor in a future kernel, just like AMD has promised one for Windows 10 in the May time-frame, and boost support seems sorely lacking, but those are merely optimization tweaks, not core architecture support.  And since unRAID 6.3.2 is on Linux 4.9.10, in theory it is fully compatible with Ryzen.
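To check a given box against that 4.9.10 threshold, the running kernel release can be compared numerically.  This is a minimal sketch; the function names are made up for illustration, and the "-unRAID" suffix in the example string is only an assumed format for what the server reports:

```python
import platform
import re


def kernel_at_least(release, minimum=(4, 9, 10)):
    """True if a release string such as '4.9.10-unRAID' is >= minimum."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not match:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    # A missing patch level (e.g. '4.11') is treated as 0.
    version = tuple(int(part or 0) for part in match.groups())
    return version >= minimum


def running_kernel_ok():
    """Apply the check to the kernel this script is running on."""
    return kernel_at_least(platform.release())
```

Comparing tuples of integers matters here: a plain string comparison would rank "4.10.2" below "4.9.10".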

 

Yes, the forthcoming Linux 4.11 kernel is promising some nice KVM enhancements, many of which aren't even targeted at x86 but rather ARM and PowerPC.  While x86 KVM performance is expected to improve and add some new capabilities, none of this is required to use KVM today on Ryzen.  4.11 will also be bringing some notable new driver inclusions, but this is important only so far as enabling the latest motherboard features like Realtek's new ALC S1220A audio codec, which is of small consequence right now and doesn't even affect half the Ryzen motherboards out there.  For those on the fence thinking a newer Linux kernel is going to magically make a difference, that's simply not going to happen.

 

Which brings us back to the current state of affairs:  If unRAID is built on a Linux kernel that supports Ryzen, and Lime-Tech has indicated it works, then why are the majority of us, yes more than half, experiencing random crashes in unRAID?

 

The problem is not the Linux kernel itself.  I've tested my hardware on Linux kernels 4.8.x, 4.9.10, and 4.10.2 in other Linux distros (like openSUSE Tumbleweed and Ubuntu) and the hardware performed flawlessly.  My hardware has also proven itself in Windows 10, again performing flawlessly.

 

But for whatever reason, in unRAID my server experiences constant crashes.  And I'm not alone, this is a systemic issue affecting most if not all Ryzen owners attempting to run unRAID.  And because we are not getting error messages pointing us in the right direction, we are all playing a guessing game as to what the problem really is.

 

Lime-Tech has already participated here, in this thread, stating that Ryzen works, the only caveat being that it may not be the right choice for a 2 Gamers 1 PC type build due to non-ideal IOMMU groupings.  What Lime-Tech has not done is acknowledge that there is a problem, nor have they shared any details on issues that they themselves have reportedly experienced, regardless of the cause.  Lime-Tech is essentially allowing their "IT WORKS!" proclamation to stand, misleading potential buyers.

 

I do expect Lime-Tech to discuss problems with the "current" (not "old") tech, especially since the future (new Linux kernels and module updates) offers nearly zero hope of fixing these problems, because the problems appear to be only affecting the unRAID Linux port.  Lime-Tech has done "something" in their build that turns out to be incompatible with Ryzen.  What we are experiencing should be considered a newly discovered "bug" that only affects a specific CPU, not a need to wait for optimizations.

 

So what we are doing here is trying to substantiate that yes, there is a problem with unRAID on Ryzen (and not just defective hardware as I was initially believing), and going a step further we are trying to identify the root cause of the problem.  As a side benefit, we may even discover a simple workaround that allows us to use Ryzen for unRAID without crashes, giving Lime-Tech more time to investigate and determine the solution and provide a fix (if they so choose).

 

I think it is also inappropriate for us to just sit back and wait for Lime-Tech to make everything better.  Lime-Tech is a very, and I mean very, small company.  For years it was just Tom, and while they are much bigger now, Lime-Tech is still just a handful of employees.  It is truly incredible what Tom and his small company have produced.  And to be honest, I'm surprised that they even have a Ryzen system in-house.  They have zero obligation to support Ryzen (though obviously it is in their best interest to do so), and they could easily leave things as-is for years.  Or forever.  There is no reason to expect that the natural progression of new unRAID versions with new Linux kernels and all else will automatically resolve this bug.

 

On a side note, I am a technology consultant by trade, self-employed for the past 20 years.  I primarily do work for Fortune 500 and Fortune 100 companies.  While PC hardware and Linux are certainly not my specialty, if I were performing this troubleshooting for a client then for my services rendered over the past couple weeks, I would be submitting to them a bill for over $15k, and the tab is still growing.  And those companies would gladly pay it.  Lime-Tech is the recipient of these services for free, and not just mine, we've got half a dozen or so Ryzen adopters here that are all providing Quality Assurance testing services.  There is a substantial value to this effort, which is not a wasted effort.  I do not think it too much to expect the developer to pay attention and participate when they are the beneficiary of such a significant contribution.

 

-Paul


Paul, I take back much of what I said - I never saw that comment by JonP, or any similar comments that there was *ANY* Ryzen support available yet, at all!  And I also was completely unaware that any Ryzen support had been added to 4.9.10.  ALL of the comments related to that seemed to be that they were waiting for 4.10 or 4.11.  I do apologize for that.

 

But aren't there comments that LimeTech wasn't completely successful yet?  I take that to mean they aren't done making appropriate changes.  Also, have you seen any info on whether KVM/QEMU and related are updated for Ryzen yet?  That's fairly important I think.

 

There is certainly a lot of interest in this thread, probably a lot more than anyone here realizes.  And Paul, while we can't possibly pay you for the investigative work you have done, it is invaluable, has been and will be very helpful!

10 minutes ago, RobJ said:

But aren't there comments that LimeTech wasn't completely successful yet?  I take that to mean they aren't done making appropriate changes.

 

Not in this thread.  Perhaps elsewhere, if so please point me in the right direction, I'd like to read them.  Here are the last two posts from JonP in this thread:

 

jonp on March 8:  https://forums.lime-technology.com/topic/55150-anybody-planning-a-ryzen-build/?do=findComment&comment=543514

 

 

jonp on March 10:  https://forums.lime-technology.com/topic/55150-anybody-planning-a-ryzen-build/?do=findComment&comment=544132

 

 

I haven't seen anything else in this thread from Lime-Tech for the past 3 weeks.  They also didn't respond to a personal message I sent.

 

17 minutes ago, RobJ said:

Also, have you seen any info on whether KVM/QEMU and related are updated for Ryzen yet?  That's fairly important I think.

 

I'm not the best person to ask. 

 

That said, you can read the ChangeLog for the upcoming QEMU 2.9 release here:  http://wiki.qemu-project.org/ChangeLog/2.9

 

A quick scan of that page finds no mention of "AMD" or "ZEN".  The "x86" section is pretty short.  I'm also not aware of any ZEN related issues that are requiring an update to QEMU.

 

You can read the pull request for 4.11 KVM changes here:  http://lkml.iu.edu/hypermail/linux/kernel/1702.2/04145.html

 

Again, no mention of "AMD" or "ZEN", and the x86 section is pretty short.  I'm also not aware of any ZEN related issues that are requiring an update to KVM.

 

As most of the Ryzen adopters here can attest, unRAID is able to run Linux and Windows VM's on Ryzen, often better than the Intel systems being replaced.  Ryzen is a fantastic processor at a great price.  In fact, those running Windows 10 VM's are having much better stability than those of us that are not running one.  Just read through this thread, it is very enlightening.

 

24 minutes ago, RobJ said:

There is certainly a lot of interest in this thread, probably a lot more than anyone here realizes.  And Paul, while we can't possibly pay you for the investigative work you have done, it is invaluable, has been and will be very helpful!

 

I'll freely admit that I have an ulterior motive here.  I am a major AMD shareholder, and I want Ryzen, Naples and Vega to be smashing successes, to my own financial reward.  I'm loving the enthusiasm for unRAID Ryzen builds that I am witnessing here, and I am greatly concerned that there is an issue on unRAID.  If I wasn't invested in AMD so heavily, I could easily turn my back and throw an Intel server board into my server and call it a day.

 

-Paul

Link to comment

My system just reached 48 hours of uptime.  I went ahead and stopped the Win10 VM.  I'll let the system run for another 24 hours in the current state.

 

There were a handful of log entries from stopping the VM.  Nothing too interesting, but thought I would record them here in case they prove to be useful for troubleshooting later.

 

-Paul

 

Mar 30 11:01:27 Tower kernel: br0: port 2(vnet0) entered disabled state
Mar 30 11:01:27 Tower kernel: device vnet0 left promiscuous mode
Mar 30 11:01:27 Tower kernel: br0: port 2(vnet0) entered disabled state
Mar 30 11:01:27 Tower kernel: input: Logitech Logitech BT Mini-Receiver as /devices/pci0000:00/0000:00:07.1/0000:11:00.3/usb3/3-2/3-2.3/3-2.3:1.0/0003:046D:C714.0006/input/input10
Mar 30 11:01:27 Tower kernel: logitech 0003:046D:C714.0006: input,hiddev0,hidraw3: USB HID v1.11 Mouse [Logitech Logitech BT Mini-Receiver] on usb-0000:11:00.3-2.3/input0
Mar 30 11:01:27 Tower kernel: input: Logitech Logitech BT Mini-Receiver as /devices/pci0000:00/0000:00:07.1/0000:11:00.3/usb3/3-2/3-2.2/3-2.2:1.0/0003:046D:C713.0007/input/input11
Mar 30 11:01:27 Tower kernel: hid-generic 0003:046D:C713.0007: input,hidraw4: USB HID v1.11 Keyboard [Logitech Logitech BT Mini-Receiver] on usb-0000:11:00.3-2.2/input0
Mar 30 11:01:28 Tower kernel: vgaarb: device changed decodes: PCI:0000:0e:00.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem
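If it helps when comparing entries like these across crashes, below is a minimal Python sketch that splits each syslog line into timestamp, host, source, and message, then tallies the leading token of each kernel message. The names `LINE_RE`, `parse_line`, and `subsystem` are hypothetical, and the pattern only assumes the "Mon DD HH:MM:SS host source: message" layout seen above; other syslog formats would need a different pattern.

```python
import re
from collections import Counter

# Assumed layout: "Mar 30 11:01:27 Tower kernel: message text"
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<source>[^:]+): (?P<message>.*)$"
)

def parse_line(line):
    """Return a dict of fields for one syslog line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None

def subsystem(message):
    """Return the first token of a message, e.g. 'br0', 'vgaarb', 'device'."""
    return message.split(":", 1)[0].split()[0]

# Two lines copied from the excerpt above.
lines = [
    "Mar 30 11:01:27 Tower kernel: br0: port 2(vnet0) entered disabled state",
    "Mar 30 11:01:28 Tower kernel: vgaarb: device changed decodes: "
    "PCI:0000:0e:00.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem",
]

parsed = [entry for entry in map(parse_line, lines) if entry]
tally = Counter(subsystem(entry["message"]) for entry in parsed)
print(tally)
```

Lines that do not fit the assumed layout are skipped rather than guessed at, so a short tally is a hint to check the pattern against the raw log.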

 

Link to comment

Hey all, sorry I haven't chimed in sooner.  Didn't see this thread spiraling out of control like this, but having read the last several posts, phew!  You guys have had a lot of discussion.  Here's the update:

 

We have a Ryzen system in house that went through basic compatibility testing, not endurance testing.  Since then, we HAVE noticed some issues with stability, but we are still tracking them down.  Initially memtest gave us bad results, but after reseating the memory and trying again it has shown no errors, yet hard crashes on the host still occur.  We are continuing to investigate this but do not yet have an ETA on a resolution.

Link to comment
 
I'll freely admit that I have an ulterior motive here.  I am a major AMD shareholder, and I want Ryzen, Naples and Vega to be smashing successes, to my own financial reward.  I'm loving the enthusiasm for unRAID Ryzen builds that I am witnessing here, and I am greatly concerned that there is an issue on unRAID.  If I wasn't invested in AMD so heavily, I could easily turn my back and throw an Intel server board into my server and call it a day.
 
-Paul


I thought you were pretty dedicated to this thread.$$$

Just poking a little fun here---$15k for, in your words, throwing a box of grenades as a troubleshooting method?

I'm sorry, just having a little fun. I'm glad there are people like you that have the time to buy the bleeding edge stuff. I don't do it anymore. I just expect issues and I need stability just from a time perspective. I'll admit I have an ulterior motive in hoping you figure it out. This frees up Limetech for other enhancements while also potentially bringing more customers and growth and giving me a constantly improving product.
Carry on!

Link to comment
31 minutes ago, rxnelson said:

Just poking a little fun here---$15k for, in your words, throwing a box of grenades as a troubleshooting method?

 

Haha!  Absolutely!  Grenades are expensive.  As they say, "Close only counts in horse-shoes and hand grenades and government work".  My clients don't get to question my methods, only my results.  Though I suppose I would have chosen my words more carefully for a paying client, perhaps something like "an Accelerated First Pass Test Algorithm designed to minimize total troubleshooting time and client expense."

 

23 minutes ago, Beancounter said:

Update:

 

Unraid died after 8 days at exactly 4:30AM. Event logs in my Windows 10 VM indicate that the crash took place just as the OS was trying to update. 

 

I'll leave the Win10 VM running tonight without completing the update to see if it crashes at the same time for the same reason.

 

Wow!  That's huge!  Thanks for sharing.  I think this is the first confirmed crash while running a Windows 10 VM with GPU passthrough.  This may confirm my suspicion that the Win10 VM's aren't preventing the crashes, merely delaying them, but the update activity throws a wrench into the mix. 

 

When you say "just as the OS was trying to update", what step in the update process was it performing?  Downloading, Verifying, Installing, Shutdown Installing, Rebooting, Booting Installing, or something else?  Or can you not tell?

 

Hmmm, I wonder if the Windows VM was resuming from a deep sleep at 4:30 AM in order to begin the update.  That would tie in well with power saving features being part of the issue.  Maybe it isn't going into a low power state that is causing the crash, but rather coming out of it.  As the Windows VM woke up, it forced unRAID and the CPU to wake up too, so to speak.

 

Still doesn't make sense why this would only affect unRAID.  

 

Jon, care to comment on anything special Lime-Tech is doing related to power management in unRAID?

 

-Paul

Edited by Pauven
A touch of clarity...
Link to comment
4 minutes ago, Pauven said:

Haha!  Absolutely!  Grenades are expensive.  As they say, "Close only counts in horse-shoes and hand grenades and government work".  My clients don't get to question my methods, only my results.  Though I suppose I would have chosen my words more carefully for a paying client, perhaps something like "an Accelerated First Pass Test Algorithm designed to minimize total troubleshooting time and client expense."

Heh heh.  Good, glad you took it in good fun, as that was the intent.

Link to comment

Just chiming in to mention that once again my IOMMU issues turn out to have been, by all appearances, quirks with my GPUs rather than anything Ryzen related.  Switched the order, using the Nvidia BIOS passthrough thing, and all is good.  Probably something about UEFI reset and the 460, given my limited understanding of what's actually going on under the hood in terms of initializing a GPU under IOMMU.

 

All of which is to say that I'm now getting everything behaving exactly as expected in one of the odder configurations possible for this stuff, bar the stability issues everyone's having.

Link to comment
