
Unraid 6.6.7 consistently crashing, totally unresponsive


gillkohl


Hello community, here is where I am at.

 

Since day one (about a week ago now), the following has been true:

* Parity check finds and "corrects" 84 errors

* Somewhere between 7 and 13 hours of uptime, the server becomes completely unresponsive

* My Fix Common Problems (FCP) plugin does not appear to have a troubleshooting mode

 

Things I have done:

* An overnight memtest found no errors after 3 passes

* Swapped all my old random SATA cables for brand-new ones

 

System specs

Ryzen 7 1700X

4 × 8 GB G.Skill Ripjaws V 3200 MHz

Gigabyte Gaming 7 motherboard

EVGA 650W CQ

MSI GTX 750 Ti

EVGA GT 710

3 × 4 TB WD Red

Intel 256 GB NVMe

Samsung 500 GB SSD

1 TB 2.5" Barracuda drive

And a USB expansion in my 5.25" bay

 

I'd appreciate any help because I'm at a complete loss now.

 


I've never had any problems with 2000-series Ryzens but I have a very early 1700 that suffers from the C6-state issue. Essentially, when it gets into that state the core voltage drops so low that it can't wake up again. Make sure you're using the latest BIOS and follow Johnnie's suggestion in that thread:

I have found that option to work perfectly. There are other suggestions (and some misinformation) in that thread. In decreasing order of effectiveness I've found them to be:

  1. Power supply idle control in BIOS
  2. Running zenstates from the go file
  3. Adding the rcu_nocbs syslinux option
  4. Disabling C6 state
  5. Disabling C states globally
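As a concrete illustration of options 2 and 3, here's a minimal, hypothetical sketch. The script name (zenstates.py, from the community ZenStates-Linux project), the paths, and the core range are assumptions based on a 16-thread 1700X, not details confirmed in this thread:

```shell
# Hypothetical /boot/config/go excerpt (option 2). Assumes zenstates.py,
# the community ZenStates-Linux script, has been copied to the flash drive.
python /boot/zenstates.py --c6-disable

# Option 3: in /boot/syslinux/syslinux.cfg, add rcu_nocbs to the kernel
# append line. The range 0-15 assumes a 16-thread CPU such as the 1700X:
#   append rcu_nocbs=0-15 initrd=/bzroot
```

Both changes take effect on the next boot, so they survive Unraid's RAM-based root filesystem.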

I used to follow the mega bug report that Tom (limetech) refers to but I gave up on it when it descended into chaos. There's a lot of ranting, a lot of speculation and guessing, and little discipline.

I believe the problem affects only 1000-series chips, and not all of those - I have a newer 1600 that isn't affected. As AMD stopped making them a year ago it's probably time to move on. The 2000-series is much better in every respect and the 3000-series is coming soon. Similarly, X470 and B450 motherboards are better than X370 and B350 ones. That's all to be expected and with the latest BIOS the differences are minimised.

 

One other thing, don't overclock your memory controller. With four sticks of RAM you won't get anywhere close to 3200 MT/s. Look up the specs for your motherboard. Due to bus loading (the number of physical DRAM chips hanging across the bus) it's likely to be 2133 MT/s or lower.


Sounds like a great next step; I never considered that Ryzen had some compatibility issues.

 

I think the mobo defaults to XMP mode and that's why the memory is clocked at 3200 MHz, but good point. I'll spend some time over the weekend manually setting the BIOS, and hopefully that makes the server much more stable.

I don't know whether this will fix any of the 84 parity errors, though. Am I mistaken, or are there thoughts on that?

Just now, gillkohl said:

memory is clocked at 3200MHz

Ryzen with overclocked RAM is known to corrupt data, resulting, for example, in parity sync errors. That said, if you always get exactly 84 errors there could be other issues at play, so post the diagnostics after two successive parity checks. Either way, you need to respect the maximum RAM speed for your configuration.

 

[Attached image: table of 1st-gen Ryzen maximum supported memory speeds by DIMM count and rank]


I'll need to get the system stable first. I don't know why, but my FCP plugin doesn't appear to have a troubleshooting mode, so I lose the logs when it crashes, which happens around (though not exactly at) the time the parity check finishes.

Thanks for that table! I probably would have used the mobo default, which is 2133 I believe, but based on that table I need to use 1866.

21 minutes ago, gillkohl said:

I probably would have used the mobo default, which is 2133 I believe, but based on that table I need to use 1866.

8 GB DDR4 DIMMs could be either single or dual rank; you can look up your precise part number and check. I always use Corsair DIMMs that are on the motherboard's QVL and are single rank, and I would use a pair of larger-capacity DIMMs in preference to four lower-capacity ones. FWIW, Intel processors have similar limitations - it's a consequence of being designed to use cheap unbuffered RAM. It's just that Ryzen (well, the 1000- and 2000-series) benefits more from running the fastest RAM it can, because the Infinity Fabric is clocked at the same speed as the memory controller. The 3000-series will have the Infinity Fabric clock decoupled from the memory controller.
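If you'd rather not open the case or dig through part numbers, `dmidecode --type memory` reports a Rank field for each populated DIMM. A small sketch that pulls those values out of the output; the sample text here is illustrative, not captured from any system in this thread:

```python
# Illustrative excerpt of `dmidecode --type memory` output (not real data).
sample = """\
Memory Device
	Size: 8 GB
	Locator: DIMM_A1
	Rank: 2
Memory Device
	Size: 8 GB
	Locator: DIMM_B1
	Rank: 2
"""

def dimm_ranks(dmidecode_text):
    """Return a list of rank counts, one entry per populated DIMM."""
    return [int(line.split(":")[1])
            for line in dmidecode_text.splitlines()
            if line.strip().startswith("Rank:")]

ranks = dimm_ranks(sample)
print(ranks)       # -> [2, 2]: two dual-rank DIMMs
print(sum(ranks))  # -> 4: total ranks loading the memory bus
```

The total rank count is what matters for bus loading: four dual-rank sticks put eight ranks on two channels, which is why the supported clock drops.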

3 minutes ago, johnnie.black said:

Are you sure about this? I've never had any trouble running 4 DIMMs at max base speed with Intel CPUs, and I've never read about similar limits.

I thought so, but now you've made me question it. I've never built an Intel system with DDR4 memory. I thought there were similar limitations on UDIMM clock speeds due to bus loading, but I'm struggling to confirm whether this is a practical issue or just a theoretical one. The Ark entry for, say, the i7-8700K isn't helpful:

Quote

Memory Specifications

Max Memory Size (dependent on memory type): 128 GB
Memory Types: DDR4-2666
Max # of Memory Channels: 2
Max Memory Bandwidth: 41.6 GB/s
ECC Memory Supported: No

Motherboard manufacturers are similarly unhelpful. Maybe Intel use strong enough output transistors to be able to drive four dual rank UDIMMs at the full quoted speed or maybe you're actually overclocking yours and getting away with it. A lot of people run their Ryzen memory controllers beyond the limits set out in that table and get away with it, but it's technically an overclock and therefore not the best way to run a server. I'll keep digging and see if I can find any hard information.
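For what it's worth, the bandwidth figure in that Ark quote is just arithmetic on the memory spec: transfer rate × bus width × channels. A quick sketch in decimal GB/s (Ark's quoted 41.6 GB/s is slightly lower than the raw product, presumably due to a different rounding or rate convention on Intel's side):

```python
def peak_bandwidth_gbs(mt_per_s, channels=2, bus_bytes=8):
    """Theoretical peak DRAM bandwidth in decimal GB/s.

    mt_per_s:  transfer rate in megatransfers/second (e.g. 2666 for DDR4-2666)
    channels:  number of memory channels
    bus_bytes: bus width per channel in bytes (64-bit DDR4 channel = 8 bytes)
    """
    return mt_per_s * bus_bytes * channels / 1000

# DDR4-2666, dual channel: 2666 * 8 * 2 / 1000
print(peak_bandwidth_gbs(2666))  # -> 42.656 (GB/s)
```

Note this is a theoretical ceiling per clock, not a statement about how many ranks the controller can actually drive at that clock, which is the bus-loading question at hand.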

 

Some time later: this is the best I've been able to find. It's about Haswell Xeon servers using RDIMMs/LRDIMMs, so it's not what I was looking for at all. Pages 3 and 4 do show that the more DIMMs you add, the slower you have to clock them, though.

 

[Attached image: table showing Haswell Xeon memory speed decreasing as DIMMs per channel increase]

 

1 hour ago, John_M said:

It's about Haswell Xeon servers using RDIMMs/LRDIMMs so it's not what I was looking for at all.

I've seen it before for those CPUs, but for desktop CPUs and UP Xeons I believe there are no such limits. For example, Kaby Lake supports up to 2400 MT/s, and you can use one or four DIMMs at that speed without issues. I would guess the main difference is that Intel announces the max speed with all DIMM slots populated, while AMD, possibly so it can announce higher speeds, quotes figures that are only valid when one or two DIMMs are in use. There's nothing wrong with that as long as the max speed for each configuration is published, and that should mostly be done by the board manufacturers, like ASRock, who show it in the specs for most AMD boards. Still, I prefer the way Intel does it. This isn't new, either: I remember the exact same issue with AMD Athlon 64 CPUs, which had similar limitations.


Archived

This topic is now archived and is closed to further replies.
