Mandersoon

  1. 😭😭😭 Well I thought it was going well since the update on Monday but it just had another random reboot this morning with no errors in logs yet again. I'm running a memtest now to see if anything is funky there but I guess I should start looking at ordering a different motherboard or something - I hate just ordering hardware in the hopes that it'll fix issues but I don't really have a choice here at this point.
  2. Yeah it definitely smells like something hardware-related. As far as I can tell nothing temperature-wise has changed substantially. The ambient temp near the server is a bit high, but not enough for it to be anywhere close to thermal shutdown or anything like that. When the 1700X was in that machine, it was peaking at about 80C after a sustained Plex transcode load. I can't check thermals while booted into Unraid, but the couple of times that I've rebooted and immediately looked at BIOS afterwards it's usually been in the upper 30s/low 40s (which I expect given ambient temps). I was talking to a friend earlier who had the same motherboard I have (ROG Strix B450-F Gaming) and he mentioned that he swapped it out as he was getting reboots every couple of weeks (which I was also having), so there's a possibility there's some incompatibility there, but so far it's unsubstantiated. There was a new BIOS update published last month, so I updated to the latest version today (3103 at time of writing), started it back up, and am letting the parity check run its course. I had the machine open at the time and it looked like all fans were operating and the heatsink was latched on appropriately. I'm thinking I might run a memtest again if it crashes/reboots again to see if it's RAM or something, but if it ends up being the motherboard at fault I'm worried that memtest might pick up errors that are caused by the mobo instead of the RAM.
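On the point about not being able to check thermals while booted: Linux normally exposes sensor readings under `/sys/class/hwmon`, so a rough sketch like the one below could list them from a terminal. This assumes a sensor driver (for example `k10temp` on Ryzen) is actually loaded on the Unraid box, which is not confirmed anywhere in this thread; the function name `read_hwmon_temps` is made up for illustration.

```python
from pathlib import Path


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return {(chip_name, sensor_file): degrees_C} for every temp*_input found."""
    temps = {}
    for chip in sorted(Path(root).glob("hwmon*")):
        name_file = chip / "name"
        # Fall back to the directory name if the driver exposes no "name" file.
        chip_name = name_file.read_text().strip() if name_file.exists() else chip.name
        for sensor in sorted(chip.glob("temp*_input")):
            try:
                # The kernel reports millidegrees Celsius as an integer string.
                temps[(chip_name, sensor.name)] = int(sensor.read_text().strip()) / 1000.0
            except (OSError, ValueError):
                continue  # sensor present but unreadable; skip it
    return temps


if __name__ == "__main__":
    for (chip, sensor), deg in read_hwmon_temps().items():
        print(f"{chip} {sensor}: {deg:.1f} C")
```

If nothing is printed, no hwmon driver is exposing temperatures and the BIOS readout is still the only source.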
  3. Yeah I read through all of that when I was initially running into issues - my motherboard doesn't have that setting visible anywhere in BIOS that I can recall, though I can try double-checking again later tonight. I just find it odd that it went from pretty stable to practically unusable over the course of two days with no configuration changes.
  4. As a slight update, it seems to only do it occasionally (I got really unlucky at the time of posting because it had been doing it soon after boot). The server had been mostly fine over the week, but it's happened twice today. It "recovered" in that it rebooted this time, but I have no clue whatsoever how to troubleshoot this or isolate the issue, given that syslog says basically nothing.
  5. Hey there! My machine had been running pretty stable since my last post, with just a hard freeze or two since (which I can stomach if it's once every other month or something). Ever since yesterday, I've started having hard freezes within a couple of hours of turning on the machine and I can't discern why. I have attached both diagnostics and the syslog I pulled from my flash drive. I can't make heads or tails of anything specific on reboot, and there aren't any errors that the syslog is recording prior to the hang, so I'm at a loss. Any help would be appreciated!! I'm running several docker instances but none of them auto-update, so there had been no configuration changes at all for almost a week prior. I also updated my docker containers to see if that'd help, as well as disabling some of them, but it still happened sometime last night. One of my friends mentioned that it crashed seemingly around the same time he watched a particular episode on Plex, but after rebooting and trying to watch that episode neither of us had an issue afterwards, so it's unclear if that's related. General specs: 3600X, 64GB of RAM (was running at 2666 XMP but also tried turning off XMP and running at 2133, same thing), ROG Strix B450-F Gaming. Attachments: flashsyslog.log, beeg-box-diagnostics-20200728-1826.zip
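Since the syslog shows nothing obvious before each hang, one way to narrow the search is to print only the last few lines logged before every boot. Below is a minimal sketch of that idea. It assumes each boot starts with a `kernel: Linux version` line (the usual first kernel message, but worth confirming against the actual file); `lines_before_each_boot` and `BOOT_MARKER` are illustrative names, not part of any Unraid tool.

```python
from collections import deque

BOOT_MARKER = "kernel: Linux version"  # assumed first kernel line of a fresh boot


def lines_before_each_boot(lines, context=5):
    """Yield the last `context` lines logged immediately before each boot marker."""
    recent = deque(maxlen=context)
    for line in lines:
        line = line.rstrip("\n")
        if BOOT_MARKER in line and recent:
            # Everything in `recent` was written by the previous boot.
            yield list(recent)
            recent.clear()
        recent.append(line)


if __name__ == "__main__":
    import sys

    # Usage: python tail_before_boot.py flashsyslog.log
    with open(sys.argv[1], errors="replace") as fh:
        for i, tail in enumerate(lines_before_each_boot(fh, context=10), 1):
            print(f"--- last lines before boot #{i} ---")
            print("\n".join(tail))
```

If those tails are all routine messages with no errors, that points towards a hang or power loss that the kernel never got the chance to log, which fits the hardware theory better than a software one.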
  6. As an update, 1700X was stable at 1866 but my Plex performance hurt a bit (as somewhat expected) - put in the 3600 two nights ago and applied the DOCP profile to its rated 2666 and still pretty stable! Will just need/want to do one more parity check to see if there's anything odd but things seem to be working! Thanks all for the help!
  7. Yeah I'm hoping that it should be fine - I've got a 3600 ready for pickup tomorrow that I can try as well if I get any more crashes over the next couple days. I might end up just using the 3600 anyways just so I can have the RAM run at its rated speed (and performance-wise it's basically the same/slightly better than the 1700 anyways).
  8. Gotcha, sounds good. I underclocked to 1866 this morning and am running another parity check and it looks like it's corrected ~140 over the last two hours so that's likely because of the memory overclock/unexpected shutdowns I'd guess. I'll leave it running as-is for now and monitor. Side question on memory - if I get a 3600/other Zen2 CPU, should I be able to enable DOCP again to get me to the rated 2667 memory speeds and theoretically be OK, given that's the max that Zen2 should support with 4 DIMMs in dual channel? The main reason I ask is because I'm worried about potential Plex transcoding performance with Ryzen's horsepower being inherently tied to memory speed in some capacity. Thanks for your help!
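To put rough numbers on the memory speed question, the theoretical peak DRAM bandwidth at each speed mentioned in this thread can be worked out as below. This is only a back-of-the-envelope sketch: it assumes a standard 64-bit (8-byte) bus per channel in dual channel, and it says nothing about latency or how much of the difference Plex transcoding would actually see.

```python
def peak_bandwidth_gbs(mega_transfers_per_s, channels=2, bus_bytes=8):
    """Theoretical peak DDR bandwidth in GB/s (10^9 bytes per second)."""
    return mega_transfers_per_s * 1e6 * bus_bytes * channels / 1e9


for speed in (1866, 2133, 2666):
    print(f"DDR4-{speed}: {peak_bandwidth_gbs(speed):.1f} GB/s peak, dual channel")
```

Going from 1866 to 2666 is roughly 43% more raw bandwidth on paper; the real-world difference for a transcode load is likely smaller and would need measuring.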
  9. Got it, must have skipped over that. If I turn off DOCP my motherboard will default to running all 4 DIMMs at 2133, should I be actively underclocking them to 1866 in this case then?
  10. Hey y'all! I'm currently in the process of putting Unraid through its paces to see if I can finally have it replace my Win10-based NAS, and I have been running into some odd crashing issues. The basic setup I have for it right now will be listed in diags, but it's a Ryzen 1700X on a B450 Asus motherboard with 64GB of RAM running a handful of containers (Sonarr/Plex/Unifi/Jackett/qB). Diagnostics and the most recent syslog are attached, but here are a couple of pieces of behavior:

      Most of the crashes I've seen have varied in their "appearance" - sometimes it'll result in a hard reset, whereas other times it'll just lock up and I'll need to force a reset. The most recent one happened while I was asleep at ~7:19am (according to logs), with a couple of call traces and this (which is at line 13512 in the attached syslog if you want to see full results):

        May 4 07:19:58 BEEG-BOX kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
        May 4 07:19:58 BEEG-BOX kernel: PGD 0 P4D 0
        May 4 07:19:58 BEEG-BOX kernel: Oops: 0002 [#1] SMP NOPTI
        May 4 07:19:58 BEEG-BOX kernel: CPU: 11 PID: 3685 Comm: mdrecoveryd Tainted: G O 4.19.107-Unraid #1
        May 4 07:19:58 BEEG-BOX kernel: Hardware name: System manufacturer System Product Name/ROG STRIX B450-F GAMING, BIOS 3003 12/09/2019
        May 4 07:19:58 BEEG-BOX kernel: RIP: 0010:get_free_stripe+0x73/0x8e [md_mod]
        May 4 07:19:58 BEEG-BOX kernel: Code: 89 5b 08 7e 10 48 8b 73 20 48 c7 c7 29 41 10 a0 e8 f6 eb f8 e0 48 8b 53 f8 48 85 d2 74 20 48 8b 43 f0 48 85 c0 48 89 02 74 04 <48> 89 50 08 48 c7 43 f0 00 00 00 00 48 c7 43 f8 00 00 00 00 4c 89

      I had been getting other MCE errors, but those all seemed to line up with the "standard" Ryzen Zen 1 errors that I'd seen on some of the bugzilla threads, and interestingly they usually happened during active use (i.e., me messing with docker containers), as shown here:

        May 3 00:38:22 BEEG-BOX kernel: mce: [Hardware Error]: Machine check events logged
        May 3 00:38:22 BEEG-BOX kernel: mce: [Hardware Error]: CPU 14: Machine Check: 0 Bank 5: bea0000000000108
        May 3 00:38:22 BEEG-BOX kernel: mce: [Hardware Error]: TSC 0 ADDR 1ffff8168b28a MISC d012000100000000 SYND 4d000000 IPID 500b000000000
        May 3 00:38:22 BEEG-BOX kernel: mce: [Hardware Error]: PROCESSOR 2:800f11 TIME 1588491481 SOCKET 0 APIC d microcode 8001138

      I've already added zenstates --c6-disable to my go config, as well as disabling C-states within BIOS (I do not have an entry for the power supply idle setting, however, no matter where I looked). That seemed to make the errors less frequent, but it could also just be a placebo effect.

      I really want to run Unraid and NOT True/FreeNAS if at all possible - I'm going to pick up a 3600 tomorrow to see if that helps, but I wanted to get input from others here to see if they had any insight as to why I'm getting these errors, and/or if getting a 3600 would even help. I've heard that the newer Ryzen chips have better general support/fewer errors, but I really don't want to be running something that'll only crash every once in a while - I'd ideally like it to crash not at all. Thanks in advance!

      Attachments: beeg-box-diagnostics-20200504-2343.zip, syslog
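For reading the MCE line above, the 64-bit value after `Bank 5:` is the machine check status word, and its top bits are architectural x86 flags. The sketch below decodes only those common flag bits (63 down to 57) plus the low 16-bit error code. It deliberately does not try to interpret the AMD-specific fields or what bank 5 means on this CPU, since that would need the processor documentation; `decode_mce_status` is just an illustrative helper name.

```python
# Architectural flag bits in the upper part of an x86 machine check status value.
STATUS_FLAGS = {
    63: "VAL (entry valid)",
    62: "OVER (earlier error overwritten)",
    61: "UC (uncorrected)",
    60: "EN (reporting enabled)",
    59: "MISCV (MISC register valid)",
    58: "ADDRV (ADDR register valid)",
    57: "PCC (processor context corrupt)",
}


def decode_mce_status(status):
    """Return (names of the flags that are set, low 16-bit error code)."""
    flags = [name for bit, name in STATUS_FLAGS.items() if (status >> bit) & 1]
    return flags, status & 0xFFFF


flags, code = decode_mce_status(0xBEA0000000000108)
for name in flags:
    print(name)
print("error code:", hex(code))
```

For this status word, the UC and PCC bits are set and OVER is clear, so the hardware logged this as an uncorrected error with processor context marked corrupt, which would be consistent with a reset or lock-up rather than a recoverable event.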