Random Crashes/Freezes with seemingly no rhyme or reason


Clobes

Greetings, fine folks of the unRAID forum.

 

I have used unRAID for a number of years now but have never needed to make a call for help until now because, up until now, things were running flawlessly. I really do mean flawlessly.

 

unRAID version 6.9.2

 

It all started, as you have probably heard a thousand times already, when I upgraded my aging 4790K-based server to the hardware from my previous gaming PC, which I had recently replaced. Current hardware is as follows:

 

  • ASUS Crosshair VI Hero
  • Ryzen 2700X (stock speed)
  • 32GB Corsair Vengeance LPX 3600 (stock speed)
  • 1000 watt Corsair RM1000
  • GTX 1070 (for transcoding and whatnot)
  • GT 1030 (for passthrough / general desktop use in a Windows VM)
  • 3x 4TB WD Red drives (1 parity, 2 storage)
  • 2x ADATA SU800 512GB SSDs (mirrored) (I've heard that ADATA is not so great these days. I bought these in 2018 and they have been reliable)
  • Arctic Freezer 50 (beast of an air cooler. everything is silent and chilly)

 

I think that covers the important hardware bits.

 

I have attached the diagnostics file (anonymized). I also added to the zip a copy of the syslog that I have been mirroring to flash as of today. Since I enabled the mirror to flash, it has captured, I believe, 2 crashes/freezes. Hopefully there's something good in there, though I doubt it. I skimmed through it and couldn't find anything particularly notable on either side of a crash. I assume that's because the server can't log anything once it has crashed, but maybe something was captured just prior.

 

A bit more info:

When I first assembled the new Ryzen system, the crashes were immediate and frequent: within about 2 minutes of getting into the dashboard. I had already implemented the usual Ryzen fix of disabling C-states, per Spaceinvader One's somewhat recent video on upgrading unRAID to a Ryzen system. I determined these initial crashes to be a hardware/BIOS issue because the Q-code display on my motherboard was flashing codes suggesting as much. I eventually discovered the "Power Supply Idle Control" fix. After about 2 hours of combing through the BIOS options, I discovered that newer versions of the BIOS have removed that option for some unfathomable reason. I downgraded to what is (as far as I know) the last BIOS version that still had the Power Supply Idle Control option. (Set to "Typical", if that matters)

 

That seemed to resolve the issue! I was able to get through a full parity check with no issues. No issues the rest of the day.

 

Cue sadness: When I woke up the next morning, I checked the motherboard Q-code readout for any problems. No problems reported. All looks good. I go to access the dashboard and it won't load. I realize there's no hard drive LED activity. I try pinging, no response. I plugged a monitor into the primary GPU. The monitor backlight comes on but there's nothing on the screen, pretty much what it looks like when the unRAID command line goes to sleep: it blanks the screen but doesn't actually turn off the monitor. But of course now I can't wake it back up.

 

After exhausting my options I hold the power button and it goes down. I power it back on, I get back into the dashboard, everything seems normal for a while. Later in the day, I powered on an Ubuntu VM and shortly after accessing the VNC I attempt to move some files around and the whole thing freezes. Go through the same checks as I did that morning, unRAID is frozen again. Reboot. A bit wary, I don't boot the Ubuntu VM again. 

 

Later again that day, I am messing around with a Docker container I found: webtop. A full Linux desktop in Docker. No real reason other than I thought it was neat. Again, unRAID crashes.

 

I figure something about my system doesn't like Linux VMs, though that seems kind of ridiculous. Also at that time I change the "Power Supply Idle Control" setting from "Typical" to something like "low voltage" or whatever the other option is called. I don't boot up another Linux VM/Linux Docker/nothing. I then have just under a week of smooth sailing, using a Windows VM with passed-through GT 1030 as my daily driver. Couldn't be happier.

 

Today, Oct 26, shortly after lunch time (~1:30 pm CST), I go to access a share from my work PC (not an unRAID VM), and the shares are all dead. Check unRAID. Sure enough, it's crashed again. Same symptoms as before. At that point, I didn't even have the Windows VM running, and I had not made any changes to the server since the previous bout of crashes. I had assumed that I was in the clear (unless I ever wanted to run another Linux VM, I thought; I didn't have time for troubleshooting then. Or maybe it was the "Power Supply Idle Control" setting change I had made that stopped the crashes. I was too afraid to test.). Then this evening, 2 more crashes basically back-to-back. The first one occurred while I was watching a YouTube video in the Windows VM, which is basically the thing I do every evening after dinner and had done every night on this Windows VM up until now. I reboot the server. When it comes back up I reboot again just 'cuz. When it comes back online, I click Start Array, and it crashes again before the array even finishes starting.

 

A bit baffling, but I'm using the Windows VM right now to write this long-winded post. I did change "Power Supply Idle Control" back to "Typical", mainly out of a sense of desperation. I doubt that has helped anything, but I've been using the VM for about an hour now without issue. That brings us to the present.

 

So that's my story, long as it is. Hopefully that's enough information to go off of. If you need anything else, just let me know. I appreciate y'all.

cortana-diagnostics-20211026-2120.zip

 

edited to add:

- CPU cooler

- clarified a point

- add unRAID version

Edited by Clobes

Bumping.

 

Also an update: unRAID has been stable since my original post, but I don't trust it given how things have gone the last week.

 

Update: ffs…. It crashed not even 10 minutes after posting the previous update. Bizarre timing. Was not doing anything extraordinary. Just watching a YouTube video in the Windows VM. 

Edited by Clobes
unRAID crashed again after I had said that it hadn’t crashed in a while.
8 minutes ago, ChatNoir said:

You should set up a syslog server to capture what happens prior to a crash.

 

I don't know if you mean a syslog server separate from the syslog that is being mirrored to the flash (which I included in the diagnostic zip in the OP), but I also have one of those set up on another computer. I've attached those logs to this post. They contain many more days' worth of logs from before I started mirroring to flash. There have been several crashes since I set this up, so hopefully there's some good info in there.

 

I didn't originally attach the logs from the external syslog server because the diagnostic zip in my original post already included the syslog that was being mirrored to flash.

syslog.zip


Had probably 6 or 7 crashes in total yesterday, then 1 this morning. Still cannot find any pattern of activity that causes the crashes. Some crashes happen when I'm not using unRAID in any way, some happen while I'm using it. It's random. I've looked through the logs myself but I don't see anything noteworthy. Then again, maybe I just don't know what to look for. I also still don't know how the syslogs would be able to catch anything once the server has crashed, unless something gets logged before it completely locks up. idk.
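
For anyone who wants to check the same logs: one rough way to find where each freeze happened is to look for long silences between consecutive syslog timestamps and then read the last lines logged before each one. Below is a minimal sketch, assuming the traditional syslog timestamp prefix (e.g. `Oct 26 13:30:01`) and that the log year is 2021. The function name `find_gaps`, the 10-minute threshold, and the file name `syslog.txt` are all made up for illustration. A quiet server can also go silent for a while, so a gap is only a candidate, not proof of a crash.

```python
from datetime import datetime, timedelta

def find_gaps(lines, min_gap=timedelta(minutes=10), year=2021):
    """Return (last_line_before_gap, first_line_after_gap, gap) tuples.

    Assumes each line starts with a traditional syslog timestamp such as
    "Oct 26 13:30:01". The year is not in the log, so it is passed in.
    """
    gaps = []
    prev_time, prev_line = None, None
    for line in lines:
        try:
            # The first 15 characters hold "Mon DD HH:MM:SS"
            stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # skip lines without a parsable timestamp
        if prev_time is not None and stamp - prev_time >= min_gap:
            gaps.append((prev_line.rstrip(), line.rstrip(), stamp - prev_time))
        prev_time, prev_line = stamp, line
    return gaps
```

Usage would be something like `find_gaps(open("syslog.txt", errors="replace"))`, then printing each tuple and reading the lines just before every gap in the original file.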

 

Is there another place I should post to get help?

I traded away most of my old hardware, so I can't revert to what I know was stable. Unfortunately, I'm kinda just stuck with this mess for now.

46 minutes ago, JorgeB said:

There's nothing in those logs. In the first one you posted there's something, but nothing relevant before the crashes, which usually suggests a hardware problem. Unfortunately, it's difficult to diagnose without starting to swap some parts. If possible, try with Intel parts, which are much better for Linux/Unraid.

 

Damn. Not what I was hoping to hear, but I appreciate you taking a look. Guess I'll hit up eBay for some Intel stuff.
