Server Becoming Unresponsive/Stopping



Hello All,

 

Over the past few months I've been having strange issues with my Unraid box, ever since upgrading to 6.10.3. The issue initially started as suddenly losing network access to my server: I couldn't get to it via hostname or its IP. I could, however, get to the box directly, so the first time it happened I issued a reboot command via the CLI. Nothing happened, so I was forced to hold the power button to get it to come back up. It rebooted and did a parity check with zero issues. I thought it was weird, but since I've never had any issues with my Unraid box in the 3 years I've been running it, I figured it was a fluke.

 

Then maybe two weeks later the exact same thing happened. This time I had done enough googling to grab the diag, which is attached below. I looked through it but have to admit I didn't find anything glaring. I was doing some very light mining on my rig with the 1080ti that's in there and figured maybe that was the culprit, even though I'd had no issues with it for months at that point. Regardless, the reboot command did nothing from the CLI again, so I had to hard shut it down. It came back up and did its parity check, and I uninstalled the miner.

 

Today, after a little over a full month, my server was completely unresponsive. Even plugging directly into it didn't work like it had before, so I was unable to pull a diag file this time. I'm hoping something in the previous diag file can shed some light on the issue. I'm getting worried that if this continues, all the hard reboots will eventually lead to errors on my disks. I hard rebooted again and it's currently doing a parity check, which I'm hoping comes back with no errors.

 

Some brief background: the rack is battery backed and sits on a dedicated breaker shared with all of my networking gear, which is also battery backed, so I do not believe it is related to a power failure or anything like that. After the second time it happened, I figured that since I couldn't reach it via the network, it had something to do with me enabling the Unraid issued certificate, so I disabled that and went back to my original local host setup. I'm not sure what else to look at, and I'm hoping someone more savvy than I am can show me the way.

 

Thanks in advance for any help

edi-diagnostics-20220722-1610.zip


A lot of crashing going on, but it's not clear what the reason is. I would recommend running memtest for a few hours to rule out any major RAM issue. If that doesn't help, try the below, though the crashes don't mention it directly:

 

Switch docker network to ipvlan (Settings -> Docker Settings -> Docker custom network type -> ipvlan (advanced view must be enabled, top right))

 

 


Thank you @JorgeB for the quick reply. Once the parity check is done I'll reboot into the BIOS and run a memtest. Luckily, I think I have some spare RAM if that turns out to be the issue. But again, the system has been rock solid for over 3 years now, so who knows.

 

On the docker network change, the reason it's configured with macvlan is to facilitate inter-docker communication with the host. If I remember correctly, I followed one of spaceinvaderone's guides a few years back. Have things changed and should macvlan not be used anymore? Also, will ipvlan still allow for the same type of communication without me having to change my docker settings?

8 minutes ago, SamuraiMarv said:

Have things changed and should macvlan not be used anymore?

It's a recurring reason for server crashes. It doesn't affect all users, but when it does, it's usually after an Unraid update.
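For anyone wanting to check their own logs for this, here is a minimal sketch of a scanner that flags kernel call traces mentioning macvlan. It assumes that affected systems log a "Call Trace" header followed shortly by frames containing the string "macvlan"; as noted above, the crashes in this particular diag don't mention it directly, so a clean result doesn't rule it out. The function name `find_macvlan_traces`, the `window` parameter, and the log path in the comment are illustrative choices, not anything from Unraid itself.

```python
import re
from typing import Iterable, List, Tuple

# Assumed markers: a kernel "Call Trace" header and any frame naming macvlan.
TRACE_START = re.compile(r"Call Trace", re.IGNORECASE)
KEYWORD = re.compile(r"macvlan", re.IGNORECASE)


def find_macvlan_traces(lines: Iterable[str], window: int = 40) -> List[Tuple[int, str]]:
    """Return (1-based line number, header text) for every call trace whose
    next `window` lines mention macvlan."""
    buffered = list(lines)
    hits = []
    for i, line in enumerate(buffered):
        if TRACE_START.search(line):
            following = buffered[i + 1 : i + 1 + window]
            if any(KEYWORD.search(entry) for entry in following):
                hits.append((i + 1, line.rstrip("\n")))
    return hits


# Example use (the path is an assumption; point it at your own syslog copy):
#
#     with open("/var/log/syslog", errors="replace") as log:
#         for number, header in find_macvlan_traces(log):
#             print(number, header)
```

An empty result only means no trace matched these two patterns; it says nothing about other causes such as RAM or hardware faults.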

 

9 minutes ago, SamuraiMarv said:

Also, will ipvlan still allow for the same type of communication without me having to change my docker settings?

It should.

 


@JorgeB I ran 8 passes of memtest86 over 24 hours after the parity check came back with no errors. Both runs (the free version of memtest86 can only do 4 passes at a time) came back with zero errors. I feel pretty confident in saying the memory is good. I've made the ipvlan change like you suggested and everything is back up and running.

 

My question is: is there anything I should be looking out for, or any way to predict if this will occur again? If there are specific signs, is there anything I can do to prevent it?

 

Thanks again for all the help.

