unRAID Random Crashes (v 6.5.3)



I'm seeing random crashes that are becoming more and more frequent. When I first booted up I got about 2 days of uptime before the first crash; the most recent one happened after only 11 minutes. Hardware listed below:

 

Motherboard: ASRock AB350M Pro4 (BIOS v4.70 – there is one newer version)

CPU: AMD Ryzen 3 2200G

Memory: ADATA XPG Z1 DDR4-3000 8GB x 2

Storage: 240GB SSD, 4TB HDD

 

Attaching the FCP log, as well as the diagnostics. I also grabbed this photo of the unRAID console when the crash happened. Sometimes the console is non-responsive. I also have C-States disabled and have added the line of code that Lime Tech recommended for Ryzen users (which of course I can't find now).

 

Edit: Guess I should ask an actual question. Should I start by updating the BIOS? Are there any other steps I can take here? I'm not finding anything specific about the errors I see in the console.

 

New here so let me know if any info is missing

tayshserve-diagnostics-20180805-1852.zip

FCPsyslog_tail.txt

IMG_4347.jpeg

Edited by tayshserve
Add a couple of more specific questions

The screenshot shows file system corruption on the hard disk and worse on the SSD - "IO failure" looks like a hardware fault.

 

Aug  5 18:38:46 tayshserve kernel: pcieport 0000:00:01.2: AER: Corrected error received: id=0008
Aug  5 18:38:46 tayshserve kernel: pcieport 0000:00:01.2: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=000a(Receiver ID)
Aug  5 18:38:46 tayshserve kernel: pcieport 0000:00:01.2:   device [1022:15d3] error status/mask=00000080/00006000
Aug  5 18:38:46 tayshserve kernel: pcieport 0000:00:01.2:    [ 7] Bad DLLP              
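For anyone who wants to see how often these errors appear in a saved syslog, here is a minimal Python sketch. It is not part of unRAID; it assumes the log lines have the same shape as the excerpt above, and `summarize_aer` and `AER_LINE` are hypothetical names made up for this illustration.

```python
import re
from collections import Counter

# Assumption: lines look like the "PCIe Bus Error" entry quoted above.
AER_LINE = re.compile(
    r"pcieport (?P<dev>[0-9a-f:.]+): PCIe Bus Error: "
    r"severity=(?P<severity>[^,]+), type=(?P<type>[^,]+)"
)


def summarize_aer(lines):
    """Count PCIe AER reports per (device, severity, error type)."""
    counts = Counter()
    for line in lines:
        match = AER_LINE.search(line)
        if match:
            key = (match["dev"], match["severity"], match["type"])
            counts[key] += 1
    return counts


sample = [
    "Aug  5 18:38:46 tayshserve kernel: pcieport 0000:00:01.2: "
    "AER: Corrected error received: id=0008",
    "Aug  5 18:38:46 tayshserve kernel: pcieport 0000:00:01.2: "
    "PCIe Bus Error: severity=Corrected, type=Data Link Layer, "
    "id=000a(Receiver ID)",
]

for key, count in summarize_aer(sample).items():
    print(key, count)
```

A steadily growing count for one device would suggest looking at whatever sits behind that PCIe port, though the tally alone does not identify the cause.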

See here: 

 

ASRock's support for Raven Ridge has been poor until very recently, with a number of broken BIOSes, so I would certainly upgrade it to the very latest as @johnnie.black suggests.

 

If that doesn't help you might try adding the

pci=noaer

boot option. See the discussion here.
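On unRAID, boot options like this normally go on the `append` line in syslinux.cfg on the flash drive. As a rough sketch only, the Python below shows one way to add options to that line without duplicating them. The sample config text and the `add_boot_options` helper are assumptions for illustration, not an official tool, and this simplified version touches every `append` line it finds, so back up the file and check the result by hand.

```python
def add_boot_options(cfg_text, options):
    """Append missing kernel options to each 'append' line of a syslinux config."""
    out = []
    for line in cfg_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("append "):
            present = stripped.split()[1:]
            # Only add options that are not already on the line.
            missing = [opt for opt in options if opt not in present]
            if missing:
                line = line.rstrip() + " " + " ".join(missing)
        out.append(line)
    return "\n".join(out)


# Assumed example of a boot entry; your own file may differ.
sample_cfg = "label unRAID OS\n  kernel /bzimage\n  append initrd=/bzroot"

print(add_boot_options(sample_cfg, ["pci=noaer"]))
```

Running it twice with the same options leaves the line unchanged, which makes it safe to re-apply after editing the file by hand.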


Alright, so I tried adding

pci=noaer

to my syslinux.cfg file, as well as nomsi and pcie_aspm=off, none of which solved my problems. I also ended up seeing another error that read "SATA link down". I have also removed the SSD (which was a cheapo thing) from my setup, and now I'm going to actually return my motherboard for a new one since my current one was open box. I just want to eliminate all the possibilities here. Also, when I put everything back together tonight I'll be using some fresh SATA cables. Really hoping this isn't pointing to an HDD problem.

 

I'm hoping that switching from a B350 to a B450 board helps shore up any remaining issues I may have related to Ryzen.

37 minutes ago, PSYCHOPATHiO said:

As a Ryzen server owner with my fair share of crashes, check memory compatibility, as it can also cause some freezes & crashes.

I had to downclock my memory speed to increase stability.

Just another quick update @PSYCHOPATHiO – I went through the memory QVL and I can only find my memory listed as supported for pre-Raven Ridge CPUs. When I look at the B450 motherboards, they all show support. A motherboard upgrade is looking promising.

 


@tayshserve I have Corsair 3200 MHz memory that is incompatible with my board or CPU. After I downclocked it to 3000 MHz I was able to get over 8 days without a crash, and then I had to reboot for some updates.

Try downclocking as well, or revert to the memory's stock speed; that might help. Unfortunately, with Ryzen, memory speed makes a big difference when it comes to performance.

Also check for BIOS updates.

Good luck.

Edited by PSYCHOPATHiO

@tayshserve Did you update the BIOS? That was the first thing to do before trying the boot option. As I said, ASRock's support for Raven Ridge has been poor and all but the most recent (as in, released in the past few days) BIOSes have been broken. I have an ASRock Fatal1ty B350 mITX board with an R5 2400G (not using it for unRAID) and I've experienced this and it's confirmed on the ASRock support site. Anything earlier than August 2018 is bad if you're using a 2000-series processor.

 

Edit: I see that you did update the BIOS. I missed that earlier.

 

@PSYCHOPATHiO Even "downclocking" your 3200 MHz-capable memory to 3000 MHz is actually still overclocking your processor's memory controller. 1000-series Ryzens are specified to run memory at a maximum of 2666 MHz and 2000-series at 2933 MHz, so anything faster is technically an overclock and might need an increase in SoC voltage for stability. Depending on bus loading, the supported speed could be significantly lower: I notice you have four DIMMs, but I don't know their part numbers so I can't tell exactly how many SDRAM chips are loading the bus. If you run it at 2666 instead, your uptime should be better. There are good reasons why some DIMMs are not on the QVL!
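To make the arithmetic concrete, here is a tiny Python sketch that compares a configured memory speed against the limits quoted in this post. The table only holds the two figures given above and ignores the bus-loading caveat, and the names `OFFICIAL_MAX_MHZ` and `is_memory_overclock` are made up for this example.

```python
# Maximum officially supported memory speeds, as quoted in the post above.
OFFICIAL_MAX_MHZ = {
    "ryzen-1000": 2666,
    "ryzen-2000": 2933,
}


def is_memory_overclock(series, configured_mhz):
    """Return True if the configured speed exceeds the quoted limit."""
    return configured_mhz > OFFICIAL_MAX_MHZ[series]


# 3200 MHz DIMMs "downclocked" to 3000 MHz on a 2000-series part.
print(is_memory_overclock("ryzen-2000", 3000))
# The same DIMMs run at 2666 MHz.
print(is_memory_overclock("ryzen-2000", 2666))
```

By this measure 3000 MHz is still above the 2933 MHz limit for a 2000-series processor, while 2666 MHz is within it.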

 

Edited by John_M
BIOS was updated; typo

Sorry for the late reply here, we started our move last week so I didn't have much time for tinkering. Thanks for all the help and input @John_M and @PSYCHOPATHiO, I really appreciate it. So I went ahead and tried downclocking my memory to about 2400 MHz and got the same result. At that point I was really frustrated, but I did some more research on motherboards and, using the information John gave me about ASRock's compatibility issues with Ryzen, I noticed that other manufacturers are seeing the same problems. Add on top of that the fact that I was using an open box motherboard from Micro Center, and it started to become clear to me that I was "getting what I paid for".

So, super frustrated, I went to Micro Center and bought myself a brand new Asus TUF B450M-Plus motherboard and swapped out the (extremely) budget Inland Professional SSD for a Samsung model. I realized that the B450 chipset boards came out only a few weeks ago, so these come from Asus ready to roll with Ryzen 2000. So I put everything together, for the time being with one stick of RAM, still downclocked to 2400 MHz, and presto, things seem to be working fine.

 

I know this isn't much of a resolution, as all I did was replace my motherboard (rather blindly :D), but I wanted to post my story here so others going Ryzen can follow suit. Just spend the extra money (honestly they're like $10-15 more) and get a B450 board. Much better compatibility with Ryzen, and it plays nicer with memory. Also, if you're going to save some money and buy open box retail parts, maybe spend real money on the motherboard. Since most of us are looking to run 24/7 and expect some reliability, there's no reason to cheap out on the motherboard. I started to realize that retailers (or eBay sellers) aren't going to stress test their items for hours on end.

 

Anyway, TLDR: If you're going with a Ryzen processor, go with a motherboard that has supported it from its initial release. You'll have a much better experience in my opinion.
