Mandersoon Posted May 5, 2020 (edited)

Hey y'all! I'm currently putting Unraid through its paces to see if it can finally replace my Win10-based NAS, and I've been running into some odd crashing issues. The basic setup is listed in the diags, but it's a Ryzen 1700X on a B450 Asus motherboard with 64GB of RAM, running a handful of containers (Sonarr/Plex/Unifi/Jackett/qB).

Diagnostics and the most recent syslog are attached, but here are a couple of pieces of behavior:

Most of the crashes I've seen have varied in their "appearance": sometimes it results in a hard reset, whereas other times it just locks up and I need to force a reset. The most recent one happened while I was asleep at ~7:19am (according to the logs), producing a couple of call traces and this (which is at line 13512 in the attached syslog if you want to see the full results):

May 4 07:19:58 BEEG-BOX kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
May 4 07:19:58 BEEG-BOX kernel: PGD 0 P4D 0
May 4 07:19:58 BEEG-BOX kernel: Oops: 0002 [#1] SMP NOPTI
May 4 07:19:58 BEEG-BOX kernel: CPU: 11 PID: 3685 Comm: mdrecoveryd Tainted: G O 4.19.107-Unraid #1
May 4 07:19:58 BEEG-BOX kernel: Hardware name: System manufacturer System Product Name/ROG STRIX B450-F GAMING, BIOS 3003 12/09/2019
May 4 07:19:58 BEEG-BOX kernel: RIP: 0010:get_free_stripe+0x73/0x8e [md_mod]
May 4 07:19:58 BEEG-BOX kernel: Code: 89 5b 08 7e 10 48 8b 73 20 48 c7 c7 29 41 10 a0 e8 f6 eb f8 e0 48 8b 53 f8 48 85 d2 74 20 48 8b 43 f0 48 85 c0 48 89 02 74 04 <48> 89 50 08 48 c7 43 f0 00 00 00 00 48 c7 43 f8 00 00 00 00 4c 89

I had been getting other MCE errors, but those all seemed to line up with the "standard" Ryzen Zen 1 errors that I'd seen on some of the bugzilla threads. Interestingly, they usually showed up during active use (i.e., me messing with docker containers), as shown here:

May 3 00:38:22 BEEG-BOX kernel: mce: [Hardware Error]: Machine check events logged
May 3 00:38:22 BEEG-BOX kernel: mce: [Hardware Error]: CPU 14: Machine Check: 0 Bank 5: bea0000000000108
May 3 00:38:22 BEEG-BOX kernel: mce: [Hardware Error]: TSC 0 ADDR 1ffff8168b28a MISC d012000100000000 SYND 4d000000 IPID 500b000000000
May 3 00:38:22 BEEG-BOX kernel: mce: [Hardware Error]: PROCESSOR 2:800f11 TIME 1588491481 SOCKET 0 APIC d microcode 8001138

I've already added the zenstates -disable-c6 into my go config and disabled C-states within the BIOS (I do not have an entry for the power supply idle setting, however, no matter where I looked). That seemed to make the errors happen less frequently, but it could also just be a placebo effect.

I really want to run Unraid and NOT True/FreeNAS if at all possible. I'm going to pick up a 3600 tomorrow to see if that helps, but I wanted to get input from others here to see if they have any insight into why I'm getting these errors, and whether getting a 3600 would even help. I've heard that the newer Ryzen chips have better general support and fewer errors, but I really don't want to be running something that'll only crash every once in a while. I'd ideally like it to crash not at all.

Thanks in advance!

Attachments: beeg-box-diagnostics-20200504-2343.zip, syslog

Edited May 8, 2020 by Mandersoon: Fixed
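For anyone trying to read the MCE lines above, here is a minimal Python sketch that pulls the CPU, bank, and status value out of a "Machine Check" syslog line and reports which of the generic x86 machine-check status flags are set. The function name `decode_mce_line` and the returned dictionary layout are made up for illustration. Only the architecturally common top status bits are decoded; vendor-specific fields are left alone, so treat the output as a rough first look rather than a diagnosis.

```python
import re

# Bit positions in the machine-check status register that are common
# to x86 machine-check banks. Vendor-specific bits are not decoded here.
STATUS_FLAGS = {
    "VAL": 63,    # the register holds a valid error
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # the error was not corrected by hardware
    "EN": 60,     # reporting was enabled for this error
    "MISCV": 59,  # the MISC register holds valid data
    "ADDRV": 58,  # the ADDR register holds valid data
    "PCC": 57,    # processor context may be corrupt
}

# Matches lines such as:
#   ... mce: [Hardware Error]: CPU 14: Machine Check: 0 Bank 5: bea0000000000108
MCE_LINE = re.compile(
    r"CPU (?P<cpu>\d+): Machine Check: \d+ Bank (?P<bank>\d+): (?P<status>[0-9a-fA-F]+)"
)


def decode_mce_line(line):
    """Return CPU, bank, status, and set flags for one MCE syslog line, or None."""
    match = MCE_LINE.search(line)
    if match is None:
        return None
    status = int(match.group("status"), 16)
    flags = [name for name, bit in STATUS_FLAGS.items() if (status >> bit) & 1]
    return {
        "cpu": int(match.group("cpu")),
        "bank": int(match.group("bank")),
        "status": status,
        "flags": flags,
        "error_code": status & 0xFFFF,  # low 16 bits carry the error code
    }


if __name__ == "__main__":
    sample = (
        "May 3 00:38:22 BEEG-BOX kernel: mce: [Hardware Error]: "
        "CPU 14: Machine Check: 0 Bank 5: bea0000000000108"
    )
    print(decode_mce_line(sample))
```

Feeding each line of the attached syslog through `decode_mce_line` and skipping the `None` results would give a quick list of which CPUs and banks are reporting errors.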
JorgeB Posted May 5, 2020

You're overclocking the RAM, which is a known issue. See here: https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=819173
Mandersoon Posted May 5, 2020

Got it, I must have skipped over that. If I turn off DOCP, my motherboard will default to running all 4 DIMMs at 2133. Should I be actively underclocking them to 1866 in this case?
JorgeB Posted May 5, 2020

Just now, Mandersoon said:
should I be actively underclocking them to 1866 in this case then?

I would start there. If it's stable you can try 2133MHz, but even then make sure there are no sync errors during a parity check. Sync errors are one of the possible side effects of overclocking RAM with Ryzen, even if memtest doesn't detect any errors.
Mandersoon Posted May 5, 2020

Gotcha, sounds good. I underclocked to 1866 this morning and am running another parity check. It looks like it has corrected ~140 errors over the last two hours, so I'd guess that's because of the memory overclock and the unexpected shutdowns. I'll leave it running as-is for now and monitor.

Side question on memory: if I get a 3600 or another Zen2 CPU, should I be able to enable DOCP again to get to the rated 2667 memory speed and theoretically be OK, given that's the max Zen2 should support with 4 DIMMs in dual channel? The main reason I ask is that I'm worried about Plex transcoding performance, with Ryzen's horsepower being inherently tied to memory speed in some capacity.

Thanks for your help!
coffeeroasted Posted May 6, 2020

7 hours ago, Mandersoon said:
Gotcha, sounds good. I underclocked to 1866 this morning and am running another parity check and it looks like it's corrected ~140 over the last two hours so that's likely because of the memory overclock/unexpected shutdowns I'd guess. I'll leave it running as-is for now and monitor. Side question on memory - if I get a 3600/other Zen2 CPU, should I be able to enable DOCP again to get me to the rated 2667 memory speeds and theoretically be OK, given that's the max that Zen2 should support with 4 DIMMs in dual channel? The main reason I ask is because I'm worried about potential Plex transcoding performance with Ryzen's horsepower being inherently tied to memory speed in some capacity. Thanks for your help!

I'm in just about the same boat as you. I'm running a Zen+ APU and was having the same issue. I've got a 3600 on order that should be here next week, so I can report back and let you know. I may be wrong (hopefully not), but I'd expect the 1700X wouldn't have an issue with RAM running at spec, since I suspect the issue is with the increased voltage for overclocking.
Mandersoon Posted May 6, 2020

22 minutes ago, coffeeroasted said:
I may be wrong (hopefully not), but I'd expect the 1700x wouldn't have issue with RAM as spec'd, since I suspect the issue is with the increased voltage for overclocking.

Yeah, I'm hoping that it should be fine. I've got a 3600 ready for pickup tomorrow that I can try as well if I get any more crashes over the next couple of days. I might end up just using the 3600 anyways so I can have the RAM run at its rated speed (and performance-wise it's basically the same as, or slightly better than, the 1700 anyways).
JorgeB Posted May 6, 2020

11 hours ago, Mandersoon said:
Side question on memory - if I get a 3600/other Zen2 CPU, should I be able to enable DOCP again to get me to the rated 2667 memory speeds and theoretically be OK, given that's the max that Zen2 should support with 4 DIMMs in dual channel?

Yep
coffeeroasted Posted May 6, 2020

10 hours ago, Mandersoon said:
Yeah I'm hoping that it should be fine - I've got a 3600 ready for pickup tomorrow that I can try as well if I get any more crashes over the next couple days. I might end up just using the 3600 anyways just so I can have the RAM run at its rated speed (and performance-wise it's basically the same/slightly better than the 1700 anyways).

That's why I went ahead and just ordered a 3600. The RAM I ordered was supposed to be 2933 at 1.2V, but what was sent was OC'd to 2933, which caused me issues. I'd like to be able to use my system to its max capacity, plus the 3600 gives me a couple more cores to use for VMs.
Mandersoon Posted May 8, 2020

As an update, the 1700X was stable at 1866, but my Plex performance took a bit of a hit (as somewhat expected). I put in the 3600 two nights ago and applied the DOCP profile to run the RAM at its rated 2666, and it's still pretty stable! I'll just want to do one more parity check to see if there's anything odd, but things seem to be working!

Thanks all for the help!
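For a rough sense of the gap between the memory speeds mentioned in this thread, here is a small Python sketch of the usual theoretical peak bandwidth arithmetic: transfers per second times bytes per transfer times the number of channels. The function name `peak_bandwidth_gbs` is made up for illustration, and the 64-bit bus width and dual-channel layout are assumptions about a typical DDR4 desktop setup. Real Plex transcoding performance depends on much more than this number, so read it as an upper bound on memory throughput only.

```python
def peak_bandwidth_gbs(mega_transfers_per_s, channels=2, bus_width_bits=64):
    """Theoretical peak DRAM bandwidth in GB/s (10**9 bytes per second).

    Assumes a standard 64-bit channel and nominal transfer rates.
    """
    bytes_per_transfer = bus_width_bits // 8
    bytes_per_second = mega_transfers_per_s * 1e6 * bytes_per_transfer * channels
    return bytes_per_second / 1e9


if __name__ == "__main__":
    # Nominal speeds discussed in the thread.
    for speed in (1866, 2133, 2666):
        print(f"DDR4-{speed}: {peak_bandwidth_gbs(speed):.1f} GB/s peak, dual channel")
```

By this estimate, going from 1866 to 2666 raises the theoretical ceiling by roughly 43%, which is at least consistent with the performance difference reported above.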