Disappointed in stability



I have been running an Unraid server for about 3 years, and recently completed a build designed around lower energy consumption, lower temps and more energy-efficient parts. Prior to this build, I was running Unraid on an old gaming PC that was loud, huge and sucked down energy. My new server's footprint is much smaller, and all around I'm happy with the hardware side of things.

 

BUT: Something is tremendously wrong with my current setup. Specifically, my Docker setup seems very unstable and frankly quite fragile.

 

I am running the following: 

 

AdGuard-Home
binhex-krusader
Cloudflare-DDNS
HomeBridge
NginxProxyManager
Plex-Media-Server
Portainer
scrypted

 

And I *THINK* it's either scrypted or Plex that is causing problems. For the past few weeks, on roughly 25% of mornings I wake up to find the Unraid web UI not responding, and I can't SSH into the box. I need to do a hard power reset.
Of course, after the system comes back it wants to do a parity check.

 

During one of these resets it corrupted my Plex DB (that I migrated from my v1 server without issue) - while the media files are fine, I lost all the additional metadata (lists, closed captions, cover art) :( So I don't know how to recover that part of the equation. 

 

 

I thought the way Docker was designed is that it's not supposed to bring down the host. Why does this keep happening? 

 

 

Any idea where to start trying to get things back to a stable place? I am stumped. 

 

Thanks. 

 

Edited by Nexus
Update
Link to comment

Damn it. It did it again. I was streaming a movie from my Plex container, went to the UI to check whether it was using hardware encoding, and the UI was unstable.

My system crashed - AGAIN.
I can't SSH into the box and now I have to do ANOTHER power cycle. 

This is supremely frustrating. Is there another way I can get direct & timely support from Limetech? While I appreciate the peer-to-peer model, I'd prefer a more direct line to the company.

 

Attached are logs after this crash

 

 

 

More logs.zip

Link to comment

Your system reports hardware errors (that you've apparently ignored); you might want to check mcelog / run a memtest.

Your 2 sticks of RAM are mismatched and support different speeds; it could be worth trying them separately and seeing if you're stable that way.

Edited by Kilrah
Link to comment
1 hour ago, Nexus said:

Is there another way I can get direct & timely support from Limetech? While I appreciate the peer-to-peer model, I'd prefer a more direct line to the company.

There is paid support from Limetech.

But it seems that the page I had saved the link for is being rebuilt and currently lacks the relevant information. :(

Link to comment
56 minutes ago, Kilrah said:

Your system reports hardware errors (that you've apparently ignored); you might want to check mcelog / run a memtest.

Your 2 sticks of RAM are mismatched and support different speeds; it could be worth trying them separately and seeing if you're stable that way.

Curious. Thanks. Where can I see that error in the logs I submitted?  They are showing the same speed in the BIOS but clearly they are not?

Link to comment
10 minutes ago, Nexus said:

Where can I see that error in the logs I submitted?  

e.g.

Aug  2 17:25:32 Altair8800 kernel: mce: [Hardware Error]: Machine check events logged
Aug  2 17:42:04 Altair8800 root: Fix Common Problems: Error: Machine Check Events detected on your server ** Ignored

For the 2nd line, you likely got a notification and then manually set it to be ignored.
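
To pull lines like these out of a saved syslog yourself, something along these lines could work. It is only a rough sketch in Python: the two patterns come straight from the log lines quoted above, while the function name `find_mce_lines` and the third sample line are made up for illustration.

```python
import re

# Patterns taken from the two syslog lines quoted in this post.
MCE_PATTERNS = [
    re.compile(r"mce: \[Hardware Error\]"),
    re.compile(r"Machine Check Events detected"),
]


def find_mce_lines(lines):
    """Return every line that mentions a machine check event."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(pattern.search(line) for pattern in MCE_PATTERNS)
    ]


sample = [
    "Aug  2 17:25:32 Altair8800 kernel: mce: [Hardware Error]: Machine check events logged",
    "Aug  2 17:42:04 Altair8800 root: Fix Common Problems: Error: Machine Check Events detected on your server ** Ignored",
    # Hypothetical filler line, only here to show that unrelated lines are skipped.
    "Aug  2 17:43:00 Altair8800 kernel: some unrelated message",
]

for hit in find_mce_lines(sample):
    print(hit)
```

To run it against a real log, pass an open file instead of the sample list, e.g. `find_mce_lines(open(path))`, where `path` points at a syslog file extracted from your diagnostics.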

 

10 minutes ago, Nexus said:

They are showing the same speed in the BIOS but clearly they are not?

They are both shown to be running at 2133MHz, but looking up the part numbers, the 32GB one is rated for 3000MHz and the 16GB one for 2400MHz. Running fast RAM too slow can occasionally cause issues (rarely), but mostly, having different rated speeds means all the timings are likely to be different, and in that case mixing can be iffy.

 

 

Edited by Kilrah
Link to comment
3 minutes ago, Kilrah said:

e.g.

Aug  2 17:25:32 Altair8800 kernel: mce: [Hardware Error]: Machine check events logged
Aug  2 17:42:04 Altair8800 root: Fix Common Problems: Error: Machine Check Events detected on your server ** Ignored

 

 

They are both shown to be running at 2133MHz, but looking up the part numbers, the 32GB one is rated for 3000MHz and the 16GB one for 2400MHz. Running fast RAM too slow can occasionally cause issues (rarely), but mostly, having different rated speeds means all the timings are likely to be different, and in that case mixing can be iffy.

 

 

I’ll pull the 16GB stick out.

Re the machine check error: where can I find what was logged? I enabled mce, but I don’t know where to get the details.

Link to comment
Aug  2 11:26:53 Altair8800 kernel: macvlan_broadcast+0x116/0x144 [macvlan]
Aug  2 11:26:53 Altair8800 kernel: macvlan_process_broadcast+0xc7/0x110 [macvlan]

 

Besides the possible hardware issues already mentioned, this will also make Unraid crash. These call traces are usually the result of having Docker containers with a custom IP address; switching to ipvlan should fix it (Settings -> Docker Settings -> Docker custom network type -> ipvlan; advanced view must be enabled, top right).
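
If you want to check whether these traces show up again after the change, a small Python sketch like the one below could count them in a saved syslog. The regular expression is modelled on the two lines quoted above; the function name `count_macvlan_frames` is made up for illustration, and this is not an official Unraid tool.

```python
import re
from collections import Counter

# Matches call-trace frames tagged with the macvlan module, e.g.
# "kernel: macvlan_broadcast+0x116/0x144 [macvlan]"
MACVLAN_FRAME = re.compile(
    r"kernel:\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+ \[macvlan\]"
)


def count_macvlan_frames(lines):
    """Count how often each macvlan function appears in kernel call traces."""
    counts = Counter()
    for line in lines:
        match = MACVLAN_FRAME.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [
    "Aug  2 11:26:53 Altair8800 kernel: macvlan_broadcast+0x116/0x144 [macvlan]",
    "Aug  2 11:26:53 Altair8800 kernel: macvlan_process_broadcast+0xc7/0x110 [macvlan]",
]

for function_name, hits in count_macvlan_frames(sample).items():
    print(function_name, hits)
```

An empty result on a log collected after switching to ipvlan would suggest the macvlan traces are gone, though it says nothing about the separate hardware errors.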

Link to comment
8 hours ago, Kilrah said:
mcelog --client

should list what happened, some threads for reference:

 

 

 

 

 


I looked in the log, and all I see is the machine event error. I turned on mcelog and I don't see anything specific being logged.
 

Edited by Nexus
Link to comment
8 hours ago, JorgeB said:
Aug  2 11:26:53 Altair8800 kernel: macvlan_broadcast+0x116/0x144 [macvlan]
Aug  2 11:26:53 Altair8800 kernel: macvlan_process_broadcast+0xc7/0x110 [macvlan]

 

Besides the possible hardware issues already mentioned, this will also make Unraid crash. These call traces are usually the result of having Docker containers with a custom IP address; switching to ipvlan should fix it (Settings -> Docker Settings -> Docker custom network type -> ipvlan; advanced view must be enabled, top right).

Thanks. I made that change. Thank you

Link to comment
