Unraid constantly crashing (round 2)



Hi All,

 

I came to the forum back in April with this issue and, despite some seriously good effort on the community's part, was unable to resolve the issue of my Unraid setup hard crashing - with nothing in the syslog to denote an issue. The crash drops everything - graphics, networking, even USB power - but the fans continue to spin, and the machine has to be powered off and back on via the power button.

Since the last topic I've replaced the mobo (previously an ASRock B450 that was driving me nuts with hokey BIOS updates), the PSU (I'd gone cheap and figured it was worth a punt) and most recently the RAM (favouring a slightly slower speed and dual channel 2x8GB).

 

I ended up running Windows Server for 3 months with perfect reliability but it was never the solution I wanted to be left with.

 

In the last 48 hours I've used a clean USB with a fresh install (6.9.0-rc2) but continue to be plagued by crashes anywhere between 10 minutes and several hours in - I even had a few days of uptime last week that gave me a false sense of security!

 

FWIW, I've also run over 30hrs of memtest with clean results.

 

Any guidance would be enormously appreciated.

 

Current Hardware

- MSI B450 Tomahawk MAX

- Ryzen 7 2700

- 16GB (2x8GB) Corsair Vengeance LPX 2400MHz

- Seasonic GX550 PSU

- Corsair 240GB NVMe - Cache drive

- 2 x 4TB WD Red

 

BIOS

- Latest update

- Power Supply Idle Control set to Typical Current Idle

- IOMMU Enabled

- SVM Enabled

 

Attached are diagnostics.

 

 

tower-diagnostics-20210204-0051.zip

Edited by BishBashBodge
Link to comment

Hey @JorgeB does the following look right for BIOS?

 

[Attached image: BIOS settings screenshot]

 

[Attached image: BIOS settings screenshot]

 

@Vr2Io, followed your advice last night (actually, I disabled VMs and Docker w/out going into safe-mode for some reason) and I got a little over 16hrs of uptime before shutting it down. Good news.

 

In light of that, I reinstalled the parity disk and when starting the array it pretty much immediately crashed.

I can see a SMART test on the parity disk is reporting a read error. Could that cause a hard lockup at such a fundamental level? I'm currently of the (possibly naive) assumption that Unraid is designed to ride out shoddy drives!

 

[Attached image: SMART test result for the parity disk]

 

I've installed the preclear plugin and am currently running a preclear on this drive to see if it resolves the read error.

I'm aware that I probably want to be shot of it altogether, but perhaps the read error could have been caused by multiple nasty crashes? I'll certainly look to replace it, but would like to know if it's the culprit before doing so. Let me know if this is foolhardy.

Furthering my curiosity in this regard: in hindsight, the 3 months on Windows Server used a different (motley) assortment of drives for the purposes of testing.

Link to comment
19 minutes ago, BishBashBodge said:

I'm currently of the (possibly naive) assumption that Unraid is designed to ride out shoddy drives!

Nope. Unraid requires ALL remaining drives to be read perfectly from end to end when rebuilding a failed drive. Having a failing drive in the parity array jeopardizes the ability to properly rebuild a dead drive. Since drives can and do fail with little or no warning, it's imperative that any drive that warns you it might fail must be replaced ASAP.

Link to comment
21 minutes ago, BishBashBodge said:

common, or even plausible, for a failing drive to cause unraid to crash so drastically though?

No. Hard crashes are typically hardware. Have you been through all the Ryzen troubleshooting? Unfortunately Ryzen systems are much more prone to crashes due to the factory settings being too aggressive, and some boards just seem to be worse than others.

Link to comment

I believe so. The BIOS settings above followed that advice, and the replacement of the RAM was also spurred by it - picking a validated set for the mobo and with lower speeds (XMP or whatever it's called is turned off too). I took advice from a colleague that the B450 Tomahawk mobos were well regarded but I've no direct knowledge.

Given that it's a 2nd Gen (2700) Ryzen, I think the need to disable "Global C-States", along with the Unraid boot param (for C-state 6), is not applicable.

 

I'd estimated that Ryzen, with its generous core count, was going to be the only way I could achieve a multi-purpose nas/plex/vm box at an affordable price, but given that I've replaced pretty much everything but the CPU I'm beginning to accept that I chose poorly!

Link to comment
22 hours ago, BishBashBodge said:

I ended up running Windows Server for 3 months with perfect reliability but it was never the solution I wanted to be left with.

The problem is likely a software issue rather than hardware if it only crashes under Unraid.

My previous 1st gen Ryzen worked great with Unraid.

Link to comment

And to counter that... I had a 2200G with an Asus Prime B450 - I tried all the AMD tricks. None of them really made any difference one way or another.

The only thing that seemed to help was putting the RAM at its stock speed rather than what it was capable of (3200).

 

I never had any crashes in general use; it was only when the system was pushed that it would often freeze.

 

In the end I gave up and got an Intel 10400 system, and it has no issues and is much more powerful than the 2200G.

My old AMD system became a general purpose desktop with Ubuntu 20.04 and has no issues no matter how hard it's pushed.

 

From reading these forums, it seems AMD Ryzen is pretty hit or miss with Unraid.

My previous system was an AMD Athlon and it had no issues.

 

Link to comment
  • 3 weeks later...

Thought I'd check in to hopefully finally close off the topic.

 

I've abandoned Ryzen and have bought a new mobo w/ 10th Gen Intel i3. Everything, as you'd expect, is all a little less "exciting" with this setup. I've lost several cores in the process and will likely drop the ambition of running a passthrough VM, but with so much hardware spare as a result of this endeavour, I've now more than enough to build a standalone PC :D

On the plus side, now having access to Intel QuickSync hardware transcoding means that, if anything, Plex is even more performant during transcodes.

 

Many thanks for everyone's time and assistance.

Link to comment

Good news!!

 

It seems some posts about B450 boards report crashes under Unraid but no problems on other OSes, e.g. @rilles. I have a Gigabyte B450i (2400G), but have never run it with Unraid; it is rock stable with Windows.

 

1 hour ago, BishBashBodge said:

is all a little less "exciting"

Upgrading the CPU later is also fine.

 

I bought a used CPU recently and a Windows stress test showed memory errors. This made me nervous for a few days because the CPU can't be returned .....

Luckily, I finally found that one of the RAM modules was the cause. 👏👏

Link to comment
