count-zero Posted December 10, 2023
Hi, I recently upgraded to 6.12.6 from 6.10.something, and since then my server has been consistently becoming unresponsive. Remote access goes dead, the web UI times out, and I get connection refused when trying to SSH. I have a monitor connected to the server, and there's simply no display when this happens, and keyboard input does nothing. This has now happened 3 times in the last 5 days or so.
The only way to get it out of this state seems to be a hard reboot, which I'm always hesitant to do, because the array seems to still be started when this happens. The first time I did a hard reboot, I corrupted one of my disks and had to repair the filesystem to get it back into the array. The parity check that ran after the second hard reboot found ~700 errors across 8TB of parity, so I'm worried about further corruption if this keeps happening.
After the second crash, I configured the syslog to be written to the array, but looking at the log, nothing seems to be logged during the problem. I've attached both the diagnostics file and the syslog file from the array.
Attachments: syslog-192.168.1.2.log, cherry-diagnostics-20231210-1249.zip
JorgeB Posted December 11, 2023
Nothing relevant in the syslog. Does it cover a crash?
count-zero (author) Posted December 11, 2023
Yes. It crashed some time between Dec 9 19:42:03 (when I can see I logged in remotely) and Dec 10 12:30:59 (when I rebooted the server). There's simply nothing logged between those times, so I can't tell exactly when it went unresponsive. This is the only crash that has happened since I started the syslog. I've set up external monitoring now, so I should get a better idea of exactly when it crashes, but I don't know how much help that will be if nothing is logged during the crash...
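The external monitoring mentioned above can be as simple as a TCP probe run from another machine. Below is a minimal sketch in Python, assuming the web UI answers on port 80 and that the server is reachable at 192.168.1.2 (taken from the syslog file name in this thread, so treat it as an assumption). It only records the moments the server switches between reachable and unreachable, which is enough to timestamp a hang even when the server's own syslog goes silent. The helper names (`is_up`, `poll`, `transitions`, `monitor`) are hypothetical and not part of Unraid.

```python
import socket
import time
from datetime import datetime


def is_up(host, port=80, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or name lookup failed
        return False


def poll(host, port=80, interval=60):
    """Yield (timestamp, is_up) samples forever, one every `interval` seconds."""
    while True:
        yield datetime.now(), is_up(host, port)
        time.sleep(interval)


def transitions(samples):
    """Yield only the samples whose up/down state differs from the previous sample."""
    prev = None
    for ts, up in samples:
        if up != prev:
            yield ts, up
            prev = up


def monitor(host, port=80, interval=60):
    """Print one line each time host:port switches between reachable and unreachable."""
    # Assumption: port 80 is the Unraid web UI; use 22 instead to watch SSH.
    for ts, up in transitions(poll(host, port, interval)):
        print(f"{ts:%b %d %H:%M:%S} {host}:{port} {'UP' if up else 'DOWN'}", flush=True)
```

Left running as `monitor("192.168.1.2")` on a machine other than the server, it prints one line per UP/DOWN change. A plain TCP connect is used instead of ICMP ping because it needs no raw-socket privileges; the trade-off is that it only shows that the port stopped answering, not why.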
count-zero (author) Posted December 11, 2023
Quote: "There's simply nothing logged between those times"
I misspoke: it sent an email between those times, but that email said everything was green.
JorgeB Posted December 11, 2023
Having nothing relevant logged usually points to a hardware issue. One thing you can try is to boot the server in safe mode with all Docker containers/VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning the other services back on one by one.
count-zero (author) Posted December 12, 2023
It just crashed again this morning. Again, nothing in the syslog at the time of the crash. When you say hardware issue, do you mean the USB, or my physical hardware (motherboard/etc)? Are there any diags I can run to test the hardware and see what might be failing? Not sure if Unraid has anything like that built in... my motherboard is old enough that it doesn't have built-in diagnostics that I'm aware of...
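When nothing is logged at the time of a crash, the longest silence in the persistent syslog is still a useful clue: it brackets the window in which the server stopped writing. A minimal sketch in Python, assuming the classic `Mon DD HH:MM:SS` prefix seen in the log lines quoted in this thread, English month abbreviations, and a log that does not cross a year boundary (syslog lines carry no year, so the caller passes it in). `parse_ts` and `largest_gap` are hypothetical helper names.

```python
import re
from datetime import datetime, timedelta

# Classic syslog prefix, e.g. "Dec 14 15:13:40 cherry kernel: ..."; the day may be space-padded.
TS_RE = re.compile(r"^([A-Z][a-z]{2})\s+(\d{1,2})\s+(\d{2}:\d{2}:\d{2})")


def parse_ts(line, year):
    """Return the line's timestamp as a datetime, or None if the line has no syslog prefix."""
    m = TS_RE.match(line)
    if m is None:
        return None
    month, day, clock = m.groups()
    # Syslog omits the year, so it is supplied by the caller (assumed constant for the file).
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def largest_gap(lines, year):
    """Return (last_ts_before, first_ts_after, silence) for the longest gap, or None."""
    best = None
    prev = None
    for line in lines:
        ts = parse_ts(line, year)
        if ts is None:
            continue  # skip continuation lines and anything without a timestamp
        if prev is not None and (best is None or ts - prev > best[2]):
            best = (prev, ts, ts - prev)
        prev = ts
    return best
```

For example, `largest_gap(open("syslog-192.168.1.2.log", errors="replace"), 2023)` returns the last timestamp before the longest gap, the first one after it, and the length of the silence. A gap that ends with the boot messages of a hard reboot is a good candidate for the hang window.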
JorgeB Posted December 12, 2023
32 minutes ago, count-zero said: "When you say hardware issue, do you mean the USB, or my physical hardware (motherboard/etc)?"
Usually the latter. Try running memtest, or run with just one stick of RAM; if it still crashes, try a different stick. That will basically rule out RAM issues.
bastl Posted December 13, 2023
@count-zero I have similar issues: random crashes and nothing in the logs. For me it all started with the 6.12.x builds. What I noticed during the last few weeks: as long as I'm not logged in to Unraid's webUI, it doesn't crash. I now close any VNC VM windows and log out of the webUI when I don't need it. I administer the server from Firefox on a Windows machine. @JorgeB have you heard about this "phenomenon"?
I have tested basically everything. Disks are OK, no errors. Memtest, no errors. I switched to different power outlets, configured all sorts of things in the BIOS (with or without virtualization, different power-saving modes), and disabled all sorts of devices like the WiFi and BT cards. Disabling Docker or VMs, or even running Unraid in Safe Mode for a couple of days, didn't help either. It always crashes after anywhere from 4 hours to 3-4 days of uptime, with nothing logged.
Now the interesting part. During all those crashes I had a Firefox window open on another Windows box with different Unraid pages: sometimes the Docker page, other times the Dashboard or the Main tab. As soon as I log out and close the Firefox tabs, surprise surprise, the crashes are gone.
I've seen a lot of threads about random freezes and crashes with the 6.12.x Unraid builds. Most people had issues with MacVLAN; switching to IPVLAN didn't help for me. Maybe this is something you can ask people to test to pin down the problem.
JorgeB Posted December 13, 2023
12 minutes ago, bastl said: "As soon as I log out and close the Firefox tabs, surprise surprise, the crashes are gone."
There have been other reports of issues when browsers are left open, but usually there are nchan or similar errors logged.
bastl Posted December 13, 2023
1 hour ago, JorgeB said: "usually there are nchan or similar errors logged"
Unfortunately not for me. I already checked all the logs for nchan errors. Is there a way to increase Unraid's log level to get more output?
JorgeB Posted December 13, 2023
10 minutes ago, bastl said: "Is there a way to increase Unraid's log level to get more output?"
Not that I know of.
count-zero (author) Posted December 13, 2023
1 hour ago, bastl said: "What I noticed during the last few weeks: as long as I'm not logged in to Unraid's webUI, it doesn't crash."
Interesting… I do often leave the Unraid remote management window open. I will try closing it and see if my system is stable. If it's still unstable, I will continue with hardware diags, but this did all start after I upgraded to 6.12.x, so I'm interested to see if it's the same issue.
count-zero (author) Posted December 14, 2023
Okay, so keeping the remote management window closed did not stop it from crashing. It still crashed. THIS TIME, I have a bunch of kernel errors in the syslog from before the crash. I noticed it starting to become unresponsive around Dec 14 15:00:00, but looking at the logs, the kernel errors started well before that, around Dec 14 00:10:32. I've attached the logs.
@JorgeB could you take another look and let me know what you think of these logs? Do you still suspect a hardware issue, or something with Unraid? I haven't been able to leave it in basic NAS mode yet, so that test is still outstanding, but maybe these logs will shed more light on the issue.
Attachment: syslog-192.168.1.2(3).log
JorgeB Posted December 15, 2023 (marked as solution)
Dec 14 15:13:40 cherry kernel: macvlan_broadcast+0x10a/0x150 [macvlan]
Dec 14 15:13:40 cherry kernel: macvlan_process_broadcast+0xbc/0x12f [macvlan]
Macvlan call traces will usually end up crashing the server. Switching to ipvlan should fix it (Settings -> Docker Settings -> Docker custom network type -> ipvlan; Advanced View must be enabled, top right), then reboot.
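To check whether a persistent syslog contains the same kind of trace, the frames quoted above can be matched mechanically. A minimal sketch in Python; the pattern is built only from the two `[macvlan]` frames in this post, so it assumes other macvlan traces use the same `function+0xOFFSET/0xSIZE [module]` frame format, and `macvlan_frames` is a hypothetical helper name. A match suggests, but does not prove, that the crash is the macvlan issue.

```python
import re

# Kernel stack-trace frames from the macvlan module, modeled on lines such as
# "Dec 14 15:13:40 cherry kernel: macvlan_broadcast+0x10a/0x150 [macvlan]".
# Assumption: other macvlan frames follow the same function+offset/size [module] layout.
MACVLAN_FRAME = re.compile(r"kernel:.*\bmacvlan_\w+\+0x[0-9a-f]+/0x[0-9a-f]+ \[macvlan\]")


def macvlan_frames(lines):
    """Return, in order, the syslog lines that look like macvlan call-trace frames."""
    return [line.rstrip("\n") for line in lines if MACVLAN_FRAME.search(line)]
```

Calling `macvlan_frames(open("syslog-192.168.1.2(3).log", errors="replace"))` returns the matching lines in file order, so the first element shows when the traces started. An empty result only means none of these frames were logged; as the earlier crashes in this thread show, a hang can leave nothing in the log at all.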
count-zero (author) Posted December 15, 2023
Thanks, I've made that change and will monitor my system to see if it's stable.
JorgeB Posted December 15, 2023
Don't forget to reboot.
count-zero (author) Posted December 15, 2023
1 minute ago, JorgeB said: "Don't forget to reboot."
Yep, I rebooted after the Docker network type change. Stable so far, but the crashes usually took 18 hours or more to happen, so I'll be monitoring over the next few days. I appreciate your help so far!
SomeoneOnLine Posted December 18, 2023
Just came across this thread. Good to know it's not just me. My server seems to be locking up or something every few days or so. Then, all of a sudden, without my doing anything, it's fine again. Rebooting doesn't seem to matter. Also, the "Fix Common Issues" plugin seems to be broken as well. Haven't even looked into that yet.
returnofblank Posted December 19, 2023
Can confirm I'm having the exact same issues described in this thread. Really annoying.
RoboWatch Posted December 19, 2023
I think I may be in the same boat. Everything was working fine until today, when I found I could no longer connect to the Unraid server. The box was running, so after a reboot I started the GUI on the server and started the array, only to find that outgoing networking is failing. Now it also says it can't mount the XFS drives, which is odd, as I thought they were using ZFS before I upgraded to 6.12.6.
JorgeB Posted December 20, 2023
We need the persistent syslog to see if it's the macvlan issue.
count-zero (author) Posted December 22, 2023
I'm going to mark this as solved. I've been stable for a week after changing from macvlan to ipvlan and rebooting. Thank you for the help @JorgeB! Appreciate it!