[SOLVED] New 1700X Build routinely crashing


Hey y'all! I'm currently putting Unraid through its paces to see if it can finally replace my Win10-based NAS, and I've been running into some odd crashing issues. The full setup is in the diags, but it's a Ryzen 1700X on an Asus B450 motherboard with 64GB of RAM, running a handful of containers (Sonarr/Plex/Unifi/Jackett/qB). Diagnostics and the most recent syslog are attached, but here are a couple of pieces of behavior:

The crashes I've seen have varied in their "appearance" - sometimes it's a hard reset, whereas other times it just locks up and I need to force a reset. The most recent one happened while I was asleep at ~7:19am (according to the logs), producing a couple of call traces and this (which is at line 13512 in the attached syslog if you want to see the full results):

May  4 07:19:58 BEEG-BOX kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
May  4 07:19:58 BEEG-BOX kernel: PGD 0 P4D 0 
May  4 07:19:58 BEEG-BOX kernel: Oops: 0002 [#1] SMP NOPTI
May  4 07:19:58 BEEG-BOX kernel: CPU: 11 PID: 3685 Comm: mdrecoveryd Tainted: G           O      4.19.107-Unraid #1
May  4 07:19:58 BEEG-BOX kernel: Hardware name: System manufacturer System Product Name/ROG STRIX B450-F GAMING, BIOS 3003 12/09/2019
May  4 07:19:58 BEEG-BOX kernel: RIP: 0010:get_free_stripe+0x73/0x8e [md_mod]
May  4 07:19:58 BEEG-BOX kernel: Code: 89 5b 08 7e 10 48 8b 73 20 48 c7 c7 29 41 10 a0 e8 f6 eb f8 e0 48 8b 53 f8 48 85 d2 74 20 48 8b 43 f0 48 85 c0 48 89 02 74 04 <48> 89 50 08 48 c7 43 f0 00 00 00 00 48 c7 43 f8 00 00 00 00 4c 89
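
For anyone who wants to find lines like these in their own syslog without scrolling, here is a minimal Python sketch. The two patterns are assumptions based only on the excerpts quoted in this thread, so other kernel versions may word these messages differently. `scan_syslog` and `SAMPLE` are illustrative names, not part of any Unraid tool.

```python
import re

# Patterns are assumptions taken from the log excerpts quoted in this thread.
PATTERNS = {
    "oops": re.compile(r"kernel: (BUG:|Oops:)"),
    "mce": re.compile(r"mce: \[Hardware Error\]"),
}


def scan_syslog(lines):
    """Return (line_number, kind, text) for every line matching a pattern."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        for kind, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, kind, line.rstrip()))
                break  # one label per line is enough
    return hits


# Sample input copied from the excerpts in this thread.
SAMPLE = [
    "May  4 07:19:58 BEEG-BOX kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000010",
    "May  4 07:19:58 BEEG-BOX kernel: PGD 0 P4D 0 ",
    "May  4 07:19:58 BEEG-BOX kernel: Oops: 0002 [#1] SMP NOPTI",
    "May  3 00:38:22 BEEG-BOX kernel: mce: [Hardware Error]: Machine check events logged",
]

for lineno, kind, text in scan_syslog(SAMPLE):
    print(f"{lineno}: [{kind}] {text}")
```

To run it against a real file, pass an open file handle instead of `SAMPLE`, for example `scan_syslog(open("syslog", errors="replace"))`.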

I had also been getting MCE errors, but those all seemed to line up with the "standard" Zen 1 Ryzen errors that I'd seen on some of the bugzilla threads. Interestingly, they usually happened during active use (i.e., me messing with docker containers), as shown here:

May  3 00:38:22 BEEG-BOX kernel: mce: [Hardware Error]: Machine check events logged
May  3 00:38:22 BEEG-BOX kernel: mce: [Hardware Error]: CPU 14: Machine Check: 0 Bank 5: bea0000000000108
May  3 00:38:22 BEEG-BOX kernel: mce: [Hardware Error]: TSC 0 ADDR 1ffff8168b28a MISC d012000100000000 SYND 4d000000 IPID 500b000000000 
May  3 00:38:22 BEEG-BOX kernel: mce: [Hardware Error]: PROCESSOR 2:800f11 TIME 1588491481 SOCKET 0 APIC d microcode 8001138
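
As a rough aid for reading the status value in the "Machine Check" line above, here is a small Python sketch that decodes only the architectural flag bits (63 down to 57) of an x86 machine-check status register, plus the low 16-bit error code. It deliberately ignores the vendor-specific bits in between, and `decode_mca_status` is a hypothetical helper name, not part of any existing tool.

```python
# Architectural flag bits of an x86 MCi_STATUS register (bit position -> name).
# Vendor-specific bits are intentionally left out of this sketch.
STATUS_FLAGS = {
    63: "VAL",    # the register holds a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # the error was not corrected by hardware
    60: "EN",     # reporting for this error was enabled
    59: "MISCV",  # the MISC register holds valid data
    58: "ADDRV",  # the ADDR register holds valid data
    57: "PCC",    # processor context may be corrupt
}


def decode_mca_status(status):
    """Split a 64-bit status value into its set flags and low 16-bit error code."""
    flags = [name for bit, name in STATUS_FLAGS.items() if (status >> bit) & 1]
    return {"flags": flags, "error_code": status & 0xFFFF}


# Status value copied from the log excerpt above.
result = decode_mca_status(0xBEA0000000000108)
print(result["flags"], hex(result["error_code"]))
```

For the value in the log this reports an uncorrected error (UC) with valid ADDR and MISC data and the PCC flag set, which is consistent with a crash rather than a silently corrected event.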

 

I've already added `zenstates --c6-disable` to my go config and disabled C-states in the BIOS (I do not have an entry for the power supply idle setting, however, no matter where I looked). That seemed to make the errors less frequent, but it could also just be a placebo effect.

 

I really want to run Unraid and NOT True/FreeNAS if at all possible - I'm going to pick up a 3600 tomorrow to see if that helps but I wanted to get input from others here to see if they had any insight as to why I'm getting these errors, and/or if getting a 3600 would even help. I've heard that the newer Ryzen chips have better general support/fewer errors but I really don't want to be running something that'll only crash every once in a while - I'd ideally like it to crash not at all :) 

 

Thanks in advance!

beeg-box-diagnostics-20200504-2343.zip syslog

Edited by Mandersoon
Fixed
Link to comment
Just now, Mandersoon said:

should I be actively underclocking them to 1866 in this case then?

I would start there. If it's stable you can try 2133 MHz, but even then make sure there are no sync errors during a parity check. Sync errors are one of the possible side effects of overclocking RAM with Ryzen, even if memtest doesn't detect any errors.

Link to comment

Gotcha, sounds good. I underclocked to 1866 this morning and am running another parity check and it looks like it's corrected ~140 over the last two hours so that's likely because of the memory overclock/unexpected shutdowns I'd guess. I'll leave it running as-is for now and monitor.

 

Side question on memory - if I get a 3600/other Zen2 CPU, should I be able to enable DOCP again to get me to the rated 2667 memory speeds and theoretically be OK, given that's the max that Zen2 should support with 4 DIMMs in dual channel? The main reason I ask is because I'm worried about potential Plex transcoding performance with Ryzen's horsepower being inherently tied to memory speed in some capacity.

 

Thanks for your help! :)

Link to comment
7 hours ago, Mandersoon said:

Gotcha, sounds good. I underclocked to 1866 this morning and am running another parity check and it looks like it's corrected ~140 over the last two hours so that's likely because of the memory overclock/unexpected shutdowns I'd guess. I'll leave it running as-is for now and monitor.

 

Side question on memory - if I get a 3600/other Zen2 CPU, should I be able to enable DOCP again to get me to the rated 2667 memory speeds and theoretically be OK, given that's the max that Zen2 should support with 4 DIMMs in dual channel? The main reason I ask is because I'm worried about potential Plex transcoding performance with Ryzen's horsepower being inherently tied to memory speed in some capacity.

 

Thanks for your help! :)

I'm in just about the same boat as you. I'm running a Zen+ APU and was having the same issue. I've got a 3600 on order that should be here next week. I can report back and let you know.

 

I may be wrong (hopefully not), but I'd expect the 1700X wouldn't have an issue with RAM run at spec, since I suspect the issue is with the increased voltage for overclocking.

Link to comment
22 minutes ago, coffeeroasted said:

I may be wrong (hopefully not), but I'd expect the 1700X wouldn't have an issue with RAM run at spec, since I suspect the issue is with the increased voltage for overclocking.

Yeah I'm hoping that it should be fine - I've got a 3600 ready for pickup tomorrow that I can try as well if I get any more crashes over the next couple days. I might end up just using the 3600 anyways just so I can have the RAM run at its rated speed (and performance-wise it's basically the same/slightly better than the 1700 anyways).

Link to comment
11 hours ago, Mandersoon said:

Side question on memory - if I get a 3600/other Zen2 CPU, should I be able to enable DOCP again to get me to the rated 2667 memory speeds and theoretically be OK, given that's the max that Zen2 should support with 4 DIMMs in dual channel?

Yep

Link to comment
10 hours ago, Mandersoon said:

Yeah I'm hoping that it should be fine - I've got a 3600 ready for pickup tomorrow that I can try as well if I get any more crashes over the next couple days. I might end up just using the 3600 anyways just so I can have the RAM run at its rated speed (and performance-wise it's basically the same/slightly better than the 1700 anyways).

That's why I went ahead and just ordered a 3600. The RAM I ordered was supposed to be 2933 at 1.2V, but what was sent was OC'd to reach 2933, which caused me issues. I'd like to be able to use my system to its max capacity, plus the 3600 gives me a couple more cores to use for VMs.

Link to comment

As an update, the 1700X was stable at 1866 but my Plex performance suffered a bit (as somewhat expected). I put in the 3600 two nights ago and applied the DOCP profile to get the RAM to its rated 2666, and it's still pretty stable! :) I'll just want to do one more parity check to see if there's anything odd, but things seem to be working! Thanks all for the help!
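
To put the memory speeds discussed in this thread in perspective, here is a back-of-the-envelope Python sketch of theoretical peak DDR4 bandwidth. It assumes a dual-channel setup with 64-bit (8-byte) channels and ignores latency, timings and real-world efficiency, so treat the numbers as upper bounds rather than measurements. `peak_bandwidth_gbs` is an illustrative helper name.

```python
BYTES_PER_TRANSFER = 8  # one 64-bit DDR4 channel moves 8 bytes per transfer


def peak_bandwidth_gbs(mega_transfers_per_s, channels=2):
    """Theoretical peak bandwidth in GB/s (1 GB = 1000 MB) for a DDR speed in MT/s."""
    return mega_transfers_per_s * BYTES_PER_TRANSFER * channels / 1000


# Speeds mentioned in this thread.
for speed in (1866, 2133, 2666):
    print(f"DDR4-{speed}: {peak_bandwidth_gbs(speed):.1f} GB/s")
```

On paper, going from 1866 to 2666 is roughly 43% more peak bandwidth, which fits with the performance dip reported at 1866, although how much of that shows up in Plex depends on the workload.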

Link to comment
  • JorgeB changed the title to [SOLVED] New 1700X Build routinely crashing
