Nexus Posted August 2, 2022 (edited)

I have been running an Unraid server for about 3 years, and recently completed a build designed around lower energy consumption, lower temperatures and more energy-efficient parts. Prior to this build, I was running Unraid on an old gaming PC that was loud, huge and sucked down energy. So my new server's footprint is much better (smaller) and I am happy all around with the hardware side of things.

BUT: Something is tremendously wrong with my current setup. Specifically, my Docker setup seems tremendously unstable and frankly quite fragile. I am running the following:

AdGuard-Home
binhex-krusader
Cloudflare-DDNS
HomeBridge
NginxProxyManager
Plex-Media-Server
Portainer
scrypted

And I *THINK* it's either scrypted or Plex that is causing problems.

For the past few weeks, on roughly 25% of mornings I wake up to find that the Unraid web UI is not responding and I can't SSH into the box. I need to do a hard power reset. Of course, after the system comes back it wants to do a parity check.

During one of these resets it corrupted my Plex DB (which I had migrated from my v1 server without issue). The media files are fine, but I lost all the additional metadata (lists, closed captions, cover art), and I don't know how to recover that part of the equation.

I thought Docker was designed so that a container is not supposed to bring down the host. Why does this keep happening? Any idea where to start trying to get things back to a stable place? I am stumped. Thanks.

Edited August 2, 2022 by Nexus
JorgeB Posted August 2, 2022

Enable the syslog server and post that together with the complete diagnostics after the next crash.
Nexus Posted August 2, 2022

Thanks Jorge. I have done that, and I will share the logs when the inevitable crash happens.
Nexus Posted August 3, 2022 (edited)

Following up: Today the Scrypted container seized up. I had to stop the Docker service, and then it would not restart. So I took the array offline and did a clean reboot.

Attached are logs from the USB boot drive and two system diagnostics: one from before the reboot and one from after.

Attachment: Archive 2.zip

Edited August 3, 2022 by Nexus
Nexus Posted August 3, 2022

Damn it. It did it again. I was streaming a movie from my Plex container and went to the UI to see whether it was using hardware encoding, and the UI was unstable. My system crashed - AGAIN. I can't SSH into the box and now I have to do ANOTHER power cycle.

This is supremely frustrating. Is there another way I can get direct & timely support from Limetech? While I appreciate the peer-to-peer model, I'd prefer a more direct line to the company.

Attached are logs after this crash.

Attachment: More logs.zip
Kilrah Posted August 3, 2022 (edited)

Your system reports hardware errors (that you've apparently ignored); you might want to check mcelog / run a memtest.

Your 2 sticks of RAM are mismatched and support different speeds. It could be worth trying them separately and seeing if you're stable that way.

Edited August 3, 2022 by Kilrah
ChatNoir Posted August 3, 2022

1 hour ago, Nexus said:
Is there another way I can get direct & timely support from Limetech? While I appreciate the peer to peer model, I'd prefer a more direct line to the company

There is paid support from Limetech. But it seems that the page I had saved a link to is being rebuilt and is currently lacking the relevant information.
Nexus Posted August 3, 2022

56 minutes ago, Kilrah said:
Your system reports hardware errors (that you've apparently ignored), might want to check mcelog / run a memtest. Your 2 sticks of RAM are mismatched and support different speeds, could be worth trying them separately and seeing if you're stable that way.

Curious. Thanks. Where can I see that error in the logs I submitted?

They are showing the same speed in the BIOS but clearly they are not?
Kilrah Posted August 3, 2022 (edited)

10 minutes ago, Nexus said:
Where can I see that error in the logs I submitted?

e.g.

Aug 2 17:25:32 Altair8800 kernel: mce: [Hardware Error]: Machine check events logged
Aug 2 17:42:04 Altair8800 root: Fix Common Problems: Error: Machine Check Events detected on your server ** Ignored

For the 2nd line you likely got a notification and then manually set it to be ignored.

10 minutes ago, Nexus said:
They are showing the same speed in the BIOS but clearly they are not?

They are both shown to be running at 2133MHz, but looking up the part numbers, the 32GB one is rated for 3000MHz and the 16GB one for 2400MHz. Running fast RAM too slow can cause issues (rarely), but the bigger concern is that different rated speeds mean all the timings are likely to be different, and in that case mixing sticks can sometimes be iffy.

Edited August 3, 2022 by Kilrah
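For anyone wanting to check their own saved syslog for the same kind of entries, here is a minimal sketch in Python. It assumes the log is available as plain text lines, and the two patterns are taken only from the lines quoted above; other kernel or plugin versions may word these messages differently. The function name `find_mce_lines` and the filler line in the sample are made up for illustration.

```python
import re

# Patterns copied from the two log lines quoted in this thread.
# Assumption: other Unraid/kernel versions may phrase these differently.
MCE_PATTERNS = (
    re.compile(r"mce: \[Hardware Error\]"),
    re.compile(r"Machine Check Events detected"),
)


def find_mce_lines(lines):
    """Return the log lines that mention a machine check event."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(pattern.search(line) for pattern in MCE_PATTERNS)
    ]


if __name__ == "__main__":
    sample = [
        "Aug 2 17:25:32 Altair8800 kernel: mce: [Hardware Error]: Machine check events logged",
        # Hypothetical unrelated line, included only to show it is skipped.
        "Aug 2 17:30:00 Altair8800 kernel: some unrelated message",
        "Aug 2 17:42:04 Altair8800 root: Fix Common Problems: Error: Machine Check Events detected on your server ** Ignored",
    ]
    for hit in find_mce_lines(sample):
        print(hit)
```

To run it against a real file, pass an open file object instead of the sample list, e.g. `find_mce_lines(open(path))`, where `path` is wherever the syslog was saved.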
Nexus Posted August 3, 2022

3 minutes ago, Kilrah said:
e.g.
Aug 2 17:25:32 Altair8800 kernel: mce: [Hardware Error]: Machine check events logged
Aug 2 17:42:04 Altair8800 root: Fix Common Problems: Error: Machine Check Events detected on your server ** Ignored
They are both shown to be running at 2133MHz but looking up part number the 32GB one is rated 3000MHz and the 16GB one is 2400. Sometimes running fast RAM too slow can cause issues (rarely), but mostly having different rated speeds means all timings are likely to be different and in this case mixing can sometimes be iffy.

I'll pull the 16GB stick out.

Re the machine check error - where can I find what was logged? I enabled mce but don't know where to get the details.
Kilrah Posted August 3, 2022 (edited)

mcelog --client should list what happened. Some threads for reference:

Edited August 3, 2022 by Kilrah
JorgeB Posted August 3, 2022

Aug 2 11:26:53 Altair8800 kernel: macvlan_broadcast+0x116/0x144 [macvlan]
Aug 2 11:26:53 Altair8800 kernel: macvlan_process_broadcast+0xc7/0x110 [macvlan]

Besides the possible hardware issues already mentioned, these macvlan call traces will also make Unraid crash. They are usually the result of having Docker containers with a custom IP address. Switching to ipvlan should fix it (Settings -> Docker Settings -> Docker custom network type -> ipvlan; advanced view must be enabled, top right).
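The macvlan frames above can be spotted the same way. The following is a rough Python sketch, not an official tool: it counts which kernel modules show up in stack-trace frames of the form `function+0xOFFSET/0xSIZE [module]`, as seen in the two quoted lines. The regex and the name `count_trace_modules` are assumptions made for illustration, and real traces may contain frame formats this pattern does not cover.

```python
import re
from collections import Counter

# Matches kernel stack-trace frames such as
#   "kernel: macvlan_broadcast+0x116/0x144 [macvlan]"
# An optional "? " before the function name is tolerated.
# Assumption: the format is inferred from the lines quoted in this thread.
FRAME = re.compile(
    r"kernel:\s+(?:\? )?(?P<func>\w+)\+0x[0-9a-f]+/0x[0-9a-f]+ \[(?P<module>\w+)\]"
)


def count_trace_modules(lines):
    """Count how often each kernel module appears in call-trace frames."""
    counts = Counter()
    for line in lines:
        match = FRAME.search(line)
        if match:
            counts[match.group("module")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Aug 2 11:26:53 Altair8800 kernel: macvlan_broadcast+0x116/0x144 [macvlan]",
        "Aug 2 11:26:53 Altair8800 kernel: macvlan_process_broadcast+0xc7/0x110 [macvlan]",
    ]
    for module, hits in count_trace_modules(sample).items():
        print(f"{module}: {hits} frame(s)")
```

If `macvlan` appears in the result for a log captured before a crash, that is a hint to try the ipvlan setting described above; it does not rule out the hardware errors discussed earlier in the thread.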
Nexus Posted August 3, 2022 (edited)

8 hours ago, Kilrah said:
mcelog --client should list what happened, some threads for reference:

I looked in the log, and all I see is the machine check event error. I turned on mcelog and I don't see anything specific being logged.

Edited August 3, 2022 by Nexus
Nexus Posted August 3, 2022

8 hours ago, JorgeB said:
Aug 2 11:26:53 Altair8800 kernel: macvlan_broadcast+0x116/0x144 [macvlan]
Aug 2 11:26:53 Altair8800 kernel: macvlan_process_broadcast+0xc7/0x110 [macvlan]
Beside the mentioned possible hardware issues this will also make Unraid crash, these are usually the result of having dockers with a custom IP address, switching to ipvlan should fix it (Settings -> Docker Settings -> Docker custom network type -> ipvlan (advanced view must be enable, top right)).

Thanks. I made that change. Thank you.