T_Matz Posted January 15: As the topic states, my Unraid server becomes unresponsive after a few days and I do not know what is causing it to crash. I'm unable to log into the webUI, and if I go to the actual computer the shell is also unresponsive. I just started having the logs mirrored to the flash. Running Unraid 6.12.6. In the past weeks I've had 5 crashes. The only changes I made to the server were adding a Home Assistant VM and removing 2 cores from a Windows VM. My setup: a Home Assistant VM, a Windows VM running Blue Iris with an NVIDIA GPU passed through to it, and 14 Docker containers. In normal operation the server only runs at 8% utilization. I can give more details where needed, and I will post logs if it happens again. Any advice on what I can start looking into in the meantime would be greatly appreciated. Rig details: Intel Core i7-12700K | ASUS ROG Strix Z690-G Gaming WiFi | CORSAIR Vengeance 64GB (2 x 32GB) RAM DDR5 5200 | 5 x Seagate IronWolf Pro 12TB | SAMSUNG 980 PRO M.2 2TB (cache drive) | SAMSUNG 970 EVO PLUS M.2 250GB (VM drive) | Gigabyte NVIDIA 3050 (VM GPU) | WD Purple Pro 14TB (VM passthrough) | SanDisk Ultra Dual 64GB (boot device)
itimpi Posted January 15: You should post your system's diagnostics zip file in your next post in this thread to get more informed feedback. It is always a good idea to post this if your question might involve us seeing how you have things set up or looking at recent logs.
T_Matz Posted January 15: I apologize, I should have posted the logs while it was running. The system crashed again last night, so here are the mirrored logs from the flash drive. I noticed that just before the crash all my cores were pegged at 100% utilization. I also tried to ping the server's IP address and got no response. Any help in fixing this would be greatly appreciated. Also, the Fix Common Problems plugin recently reported that macvlan and bridging were found and that this might cause stability issues on the server. If I switch to ipvlan, as the 6.12.4 (2023-08-31) upgrade notes suggest, most of my Docker containers do not work. If I disable bridging, my containers also stop working, so I'm not sure what I should be doing. Attached: syslog
T_Matz Posted January 15: Update: I did figure out one configuration issue. I accidentally had my Home Assistant VM and Windows VM pinned to 2 of the same cores because I selected the wrong ones. I will update if there are any changes to the system crashing.
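For anyone who wants to double-check pinning by script rather than by eye, here is a minimal sketch of the overlap check described above. The VM names and core numbers are made up for illustration and are not taken from the attached diagnostics, and `find_shared_cores` is a hypothetical helper, not an Unraid tool.

```python
from itertools import combinations


def find_shared_cores(pinning):
    """Return every pair of VMs that pin at least one core in common.

    `pinning` maps a VM name to the set of host core numbers assigned to it.
    """
    shared = {}
    for vm_a, vm_b in combinations(sorted(pinning), 2):
        overlap = pinning[vm_a] & pinning[vm_b]
        if overlap:
            shared[(vm_a, vm_b)] = overlap
    return shared


# Hypothetical pinning, assumed for this example only.
example_pinning = {
    "HomeAssistant": {8, 9},
    "Windows": {8, 9, 10, 11, 12, 13},
}

for (vm_a, vm_b), cores in find_shared_cores(example_pinning).items():
    print(f"{vm_a} and {vm_b} share cores {sorted(cores)}")
```

An empty result means no two VMs are pinned to the same core. The core sets would have to be copied by hand from each VM's CPU pinning settings.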
T_Matz Posted January 15: Sorry, I thought I included it in the last post. Diagnostics are attached to this post: tower-diagnostics-20240115-0957.zip
awkwrrd Posted January 15: This sounds similar to an issue I was having where I could not connect to Unraid and had to hard reset the server to regain access. I posted what ended up fixing it for me here. Check your Docker network settings and your BIOS settings.
T_Matz Posted January 16: That is a great idea, I'll have to check the C-states later tonight. I had mine disabled, but after a recent rebuild I'm not sure whether they were reset.
T_Matz Posted January 16: I edited my BIOS to disable EIST and made sure no VMs are using the same cores. I will update if the system locks up again.
T_Matz Posted January 17: Update: even with the changes, the Unraid server still becomes unresponsive. I woke up this morning to the server webUI down, and when I went to the physical machine I was unable to do anything, so I had to hard shut down the system. The latest logs and diagnostics are attached. The system was up with zero issues for 2 days and then it just went down. This makes the 7th hard shutdown in a little over a week. Attached: syslog, tower-diagnostics-20240117-0838.zip
JorgeB Posted January 17: Unfortunately there's nothing relevant logged, which usually points to a hardware issue. One thing you can try is to boot the server in safe mode with all Docker containers and VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem. If it doesn't, start turning the other services back on one by one.
T_Matz Posted January 17: Are there any other options for capturing logs to figure this out? I currently have the logs mirrored to my USB flash drive.
T_Matz Posted January 17: The Fix Common Problems plugin keeps warning that macvlan and bridging were found. The release notes say this can cause crashes. But if I disable bridging or enable ipvlan, my Docker containers are unreachable. If I make these changes, do I need to edit my containers as well?
JorgeB Posted January 17:
32 minutes ago, T_Matz said: "In the release notes it describes that this can cause issues with crashes."
It can, though usually there are macvlan call traces pointing to that.
33 minutes ago, T_Matz said: "But if i disable bridging or enable ipvlan my dockers are unreachable."
That's not normal, you should just need to make the change.
T_Matz Posted January 17: What files are you looking at to find issues? I'd like to be able to look at and watch them myself to find issues in the future.
JorgeB Posted January 17: The syslog from the syslog server. I assume that's where the syslog you posted came from?
T_Matz Posted January 18: The system froze again last night. It's strange because the system was stable for weeks, then it randomly started to freeze like this, and it's always at night when the system is idle. Sometimes it happens daily, other times it goes a day or two. I'll be putting the system in safe mode. How long should I run it in safe mode before turning services back on? Attached: syslog
JorgeB Posted January 18 Share Posted January 18 There is a NIC related issue in the last syslog, was at this time that you lost the server? Jan 18 09:05:48 Tower kernel: NETDEV WATCHDOG: eth0 (igc): transmit queue 0 timed out Quote Link to comment
T_Matz Posted January 18: I know for sure my server went down around 10:35 my time. Whether the webUI became unresponsive before that, I do not know. I looked at my Blue Iris recordings and all of them stopped at 10:35. But I had a parity check running, and it seems to have finished around 07:00 today, several hours after the VM lost connection. So I'm super confused.
JorgeB Posted January 18: So possibly the NIC hung but recovered. Try safe mode and run it without services for as long as you think is needed to confirm whether it still hangs. Then start enabling them one by one and retesting.
T_Matz Posted January 18: I was bringing down my server but looked at Pi-hole real quick and noticed that it was still running normally until I powered down the server. So, the things I know for sure:
- The server webUI is unresponsive.
- If I plug in directly to the server, I can't shut it down with a command.
- The server does not shut down gracefully when I press the power button once.
- The Windows VM running Blue Iris stops recording at 10:30 PM.
- Pi-hole continued to run until I shut the server down by holding the power button.
trurl Posted January 19:
2 minutes ago, T_Matz said: "Question should i be able to access my webui in safe mode? I can access my server on unraid connect but not via the webui"
Do you mean with a browser on another computer on your network?
T_Matz Posted January 19:
14 minutes ago, trurl said: "Do you mean with a browser on another computer on your network?"
I was able to access it and deleted the comment. I made a mistake: for some reason the laptop I was on was trying to reach the server at its old static IP address. Once I realized that and entered the correct IP address in the browser on another computer on my network, I got in. The server is 24 hours into safe mode and has not crashed. I'm going to start my Docker containers tomorrow, 2 at a time. I just wish I knew what was causing this and why it always happens at night.
T_Matz Posted January 21: Unraid has officially been up for 48 hours in safe mode. I turned on most of my Docker containers today. A few are still off, but usually by now the server would have crashed. I'm going to turn on the last 4 containers tomorrow and let the server run for two days. After that, if everything is running and not crashing, I want to start bringing my plugins back online, but I don't want to start them all at the same time. Is there a way to do this?
JorgeB Posted January 21: You can uninstall them all and then reinstall them one by one, or a few at a time, and retest.