UNRAID 6.6.6 - Soft Lockup - Ryzen 1800X - Asus X370 Pro

All, while testing Unraid (haven't paid yet) as a potential replacement for my QNAPs, I'm occasionally getting the lockup shown below. It happens often enough that I won't be able to use this as a NAS, a VM host, or anything else unless I fix it. I have disabled C-states and am running the latest BIOS, 4207, with AGESA 1006.

I think it's a soft lockup, since the keyboard lights were still on, but I haven't confirmed that 100%. I did get a hard lockup the other day, but that was before I disabled C-states in the BIOS.

Does anybody have any ideas or suggestions? If it's a hardware fault, I 'should' be able to return it; I assume AMD's warranties are reasonably long.

Thanks,

tower-diagnostics-20190117-2140.zip

[Attached photo of the lockup messages]

I did find the link below, which suggests that if I find and enable the deep sleep setting, I can then turn these other settings back on. I can't say I've noticed a deep sleep setting, but I'd be interested in your thoughts on this too. It sounds like you've figured out how to make yours nice and stable. I'll see how your suggestion works anyway. Thanks.

(2 weeks later...)

It happened again overnight, when I suspect the system was near idle. I added rcu_nocbs=0-15 because I read it was still required, rebooted, and now it crashes before it even finishes booting, so I'll have to remove that. But the question remains: what's crashing the system, and how do I fix it? Any other options you can think of?

Good day. I'm new to Unraid and enjoying the learning. I built an ASRock X470 ITX system with a Ryzen 2700X, my first build!

I picked the parts based on..., ordered them, and then started reading about the lockups ;-)

Anyway, it took me 2 days to work out how to plug everything in and get it up and running. Whilst updating the motherboard BIOS, I decided it made sense to set the RAM frequency, since it was stated on the package and I was already there...

I plugged in Unraid and got it running; whilst pre-clearing, the server would drop out about every 4 hours.

To summarise, I tried every CPU setting in the BIOS with no success, until I finally remembered fiddling with the RAM frequency...

I put it back to Auto (I should never have touched it) and, surprise, surprise... the server has been running without problems for 2 straight days.

Perhaps the outcome was obvious, but since there are no doubt many more newcomers like me, I thought it worthwhile to share this experience.

Thanks, I have mine on Auto too, and it had been running fine since my last post. However, I had lockups again yesterday and today.

 

I haven't started googling yet in case it's something new, though I don't see why it would be. Any ideas? These Ryzens seem to be quite a hassle. To recap: I've set the power supply idle setting, configured the C-states, and added rcu_nocbs=0-15 (15 in my case) to the kernel command line. I'm getting tired of having to run parity checks, which then find errors :(
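For anyone following along, the 0-15 range is simply every logical CPU: Linux numbers CPUs from 0, and the 1800X has 16 threads, so the last one is CPU 15. The Python sketch below is a hypothetical helper, not part of Unraid. It builds the rcu_nocbs parameter from the thread count and adds it to a syslinux `append` line. It assumes that line looks like `append initrd=/bzroot`, as in a typical Unraid `/boot/syslinux/syslinux.cfg`; check your own file before changing anything.

```python
import os


def rcu_nocbs_arg(logical_cpus: int) -> str:
    """Build an rcu_nocbs= kernel parameter covering CPUs 0..N-1.

    Hypothetical helper. CPU numbering starts at 0, so a 16-thread
    Ryzen 1800X gives the range 0-15.
    """
    if logical_cpus < 1:
        raise ValueError("need at least one logical CPU")
    return f"rcu_nocbs=0-{logical_cpus - 1}"


def add_kernel_param(append_line: str, param: str) -> str:
    """Add a key=value parameter to a syslinux 'append' line.

    Any existing token with the same key (e.g. an older rcu_nocbs=0-7)
    is dropped first, so the parameter never appears twice.
    Assumes a line such as 'append initrd=/bzroot'.
    """
    key = param.split("=", 1)[0] + "="
    tokens = [t for t in append_line.split() if not t.startswith(key)]
    tokens.append(param)
    return " ".join(tokens)


if __name__ == "__main__":
    # os.cpu_count() reports logical CPUs (threads), or None if unknown.
    n = os.cpu_count() or 1
    # Only prints the suggested line; it does not edit any file.
    print(add_kernel_param("append initrd=/bzroot", rcu_nocbs_arg(n)))
```

Run on the server itself, this prints the suggested `append` line rather than writing to the flash drive, which keeps a typo from leaving the box unbootable.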

[Attached photos of the latest lockups]

I updated to 6.7.0-rc2 and it has already crashed again. That will probably make it easier to track down; I'm now suspicious of disk I/O. This can't be specific to Ryzen, or it would always have been happening. I have added a Dell PERC H310 card, though, and I'm probably only now starting to write to those disks. Hmmm.

[Attached photo of the crash on 6.7.0-rc2]

Yeah, even as a single device reformatted as XFS, it changed it to Btrfs, even with the pool previously removed. I did see something before about Btrfs caches being a problem, and one of those errors took me to an XFS issue that sounded similar. After stopping some large writes I was doing, it is no longer crashing. So I have removed the cache and restarted the writes. I'll see what happens. If it goes away, I guess I'll try to recreate it. I'd rather it was the cache than something hardware-related.

Yes, I formatted it as XFS first (and confirmed that), but it was reformatted to Btrfs as soon as it was added. I recall that last time I could dig into the settings and change it, but I couldn't find the option this time; it's not as simple as I would have thought. And I AM on the RC. I'm not sure whether this is still present in the latest release candidate.

I also removed the second NIC a few days ago. However, I woke up this morning to find it had crashed. Maybe it's running on bitcoin lol.

I'm really struggling to get good logs on this one. It lasted a few days longer this time; however, as I'd seen before, this seems to happen more quickly during periods of high I/O, which I haven't had lately. Hmmm, I wonder if I have one of those Marvell controllers.