Martsmac Posted July 17, 2022 (edited)
I'm hoping someone can make more sense of this than I can. I've been running Unraid for a while now and everything has been great. However, just recently (since I swapped my off-the-shelf router for a 2U PC running OPNsense) I've been having a weird issue. It doesn't seem to manifest after a set amount of time, although it's a good few hours until it happens. The Unraid interface keeps working and stays accessible from my desktop PC, which is connected to the same switch as the UNRAID box via 10 Gig SFP (MikroTik 10 Gig switch), but Unraid appears to max out 4 or 5 of the cores on my Xeon CPUs (running 2 x Xeon E5-2650 v2) and for all intents and purposes seems to freeze. Despite being able to open different pages in a browser, I'm unable to access any of the shares, and the restart and (graceful) power off buttons refuse to work. The ONLY way I can get it working again at that point is to physically reboot the system, which incurs a parity check every time.
I have no idea what is causing this, so I have included my diagnostics to see if any of you smart people can spot something obvious that someone untrained in the dark arts of UNRAID has so far been unable to decipher. I reverted to 6.10.1, which is what I am running now. Thanks tons in advance; any help/suggestions are welcome.
Another weird thing I just noticed: my shares used to be at \\UNRAID\XXXXX, where XXXXX is the name of the shared folder. Now they appear as \\TOWER\XXXXX, which makes literally zero sense to me.
Attachments: tower-diagnostics-20220717-1853.zip, tower-syslog-20220717-2321.zip
Edited July 17, 2022 by Martsmac. Edit: posted the syslog too
trurl Posted July 18, 2022
56 minutes ago, Martsmac said: \\UNRAID\XXXXX where XXXXX is the name of the folder shared , now it appears they are named \\TOWER\XXXXX which makes literally Zero sense to me.
You had named your server "UNRAID", but it reset to the default name "TOWER". You can change that in Settings - Identification. Possibly you set things back to default when you reverted to 6.10.1. How exactly did you roll back?
The syslog you attached is the same as the one in diagnostics, which only has information since the last reboot. You should set up the syslog server.
Martsmac (Author) Posted July 18, 2022
I'm not at my machine at the moment, but from memory I rolled back using the built-in function; either that or I followed a guide online which I can't seem to locate now. I am very interested in setting up the syslog server, so thanks for that. I will get it set up tonight and report back.
Martsmac (Author) Posted July 18, 2022 (edited)
So I enabled the syslog server and looked in the logs folder on the flash drive, but the latest file stored there is a syslog that was "last updated" at 10:51 am today, which doesn't make much sense unless that's when the issue last occurred. (I'm having the same issue now, by the way: dockers unresponsive, unable to restart the system through the UI, etc., but the web interface still loads various menu options, such as switching between Settings and Home.) Attached the latest two syslog files and tower-diagnostics-20220717-1853.zip.
Attachments: tower-diagnostics-20220717-1853.zip, syslog, syslog
Edited July 18, 2022 by Martsmac: uploaded a newer syslog after a crash
trurl Posted July 18, 2022
Jul 17 15:50:09 Tower emhttpd: unclean shutdown detected
Do you know why you had an unclean shutdown?
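For anyone wanting to check their own log for the same message, here is a minimal Python sketch. It assumes the syslog is plain text with one entry per line, shaped like the line quoted above. The function name `find_unclean_shutdowns` is made up for illustration; only the phrase "unclean shutdown detected" comes from the thread.

```python
def find_unclean_shutdowns(syslog_text):
    """Return the syslog lines that report an unclean shutdown."""
    # Phrase taken from the emhttpd line quoted in the thread.
    marker = "unclean shutdown detected"
    return [line.strip() for line in syslog_text.splitlines() if marker in line]


if __name__ == "__main__":
    sample = (
        "Jul 17 15:50:09 Tower emhttpd: unclean shutdown detected\n"
        "placeholder line, not from a real log\n"
    )
    for hit in find_unclean_shutdowns(sample):
        print(hit)
```

In practice the text would come from reading the saved syslog file. Because the diagnostics syslog only goes back to the last reboot, a hit here only says that the previous shutdown was not clean, not why.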
Martsmac (Author) Posted July 18, 2022
Probably because the only way for me to shut down or restart the machine when the issue occurs is a hard reset (using the reset button on the server), which is far from ideal for obvious reasons. The menu interface still responds, but certain functions in the GUI such as restart, shutdown, or turning dockers on/off do nothing. As an interesting aside, I also tried to reboot the system from a terminal with sudo shutdown -r, and that failed to do anything at all either.
I have now updated to 6.10.3 with none of the issues I had previously, so time will tell (and I will post back here if it happens again), but so far so good. As a disclaimer, when I got the "lock ups" before, it usually happened after being online for a fair few hours, so I am not out of the woods yet. One of the easiest telltale signs I've found is that 3-4 of my CPU threads are constantly maxed out when it happens, and I'm not running anything too taxing either, just the base system and a few dockers.
Martsmac (Author) Posted July 19, 2022
Yeah, I'm aware you want to do a clean shutdown when you can, but when unRAID isn't responding there isn't a whole lot you can do. (For the record, I had made most of the above changes shortly after adopting unRAID.) This issue didn't allow me to do a graceful shutdown despite those changes having been made. Like I said, time will tell, but so far I haven't had the issue since I updated to 6.10.3. We will see.
trurl Posted July 19, 2022
Way down in that thread I explain how flash drive problems can also result in "false" unclean shutdowns: https://forums.unraid.net/topic/69868-dealing-with-unclean-shutdowns/?do=findComment&comment=1087704
And flash drive problems might explain how your server name got reset.
What do you get from the command line with this?
ls -lah /boot
Martsmac (Author) Posted July 19, 2022
18 hours ago, trurl said: Way down in that thread I explain how flash drive problems can also result in "false" unclean shutdowns: https://forums.unraid.net/topic/69868-dealing-with-unclean-shutdowns/?do=findComment&comment=1087704 And flash drive problems might explain how your servername got reset. What do you get from command line with this? ls -lah /boot
drwx------ 10 root root 4.0K Dec 31 1969 ./
drwxr-xr-x 20 root root 440 Jul 19 14:31 ../
drwx------ 8 root root 4.0K Jul 19 14:33 .git/
-rw------- 1 root root 180 Apr 15 16:38 .gitattributes
drwx------ 3 root root 4.0K Jun 16 2020 EFI/
-rw------- 1 root root 4.0K Dec 31 1979 FSCK0000.REC
drwx------ 2 root root 4.0K Jun 16 2020 System\ Volume\ Information/
-rw------- 1 root root 111M Jun 14 10:32 bzfirmware
-rw------- 1 root root 65 Jun 14 10:37 bzfirmware.sha256
-rw------- 1 root root 5.9M Jun 14 10:32 bzimage
-rw------- 1 root root 65 Jun 14 10:37 bzimage.sha256
-rw------- 1 root root 18M Jun 14 10:32 bzmodules
-rw------- 1 root root 65 Jun 14 10:37 bzmodules.sha256
-rw------- 1 root root 135M Jun 14 10:37 bzroot
-rw------- 1 root root 26M Jun 14 10:33 bzroot-gui
-rw------- 1 root root 65 Jun 14 10:37 bzroot-gui.sha256
-rw------- 1 root root 65 Jun 14 10:37 bzroot.sha256
-rw------- 1 root root 30K Jun 14 10:32 changes.txt
drwx------ 11 root root 4.0K Jul 19 14:48 config/
-r-------- 1 root root 120K Jun 16 2020 ldlinux.c32
-r-------- 1 root root 68K Jun 16 2020 ldlinux.sys
-rw------- 1 root root 7.8K Jun 14 10:32 license.txt
drwx------ 2 root root 8.0K Jul 18 09:01 logs/
-rw------- 1 root root 1.8K Jun 14 10:32 make_bootable.bat
-rw------- 1 root root 3.3K Jun 14 10:32 make_bootable_linux
-rw------- 1 root root 2.4K Jun 14 10:32 make_bootable_mac
-rw------- 1 root root 147K Jun 14 10:32 memtest
drwx------ 2 root root 4.0K Aug 14 2020 preclear_reports/
drwx------ 2 root root 4.0K Jul 18 11:58 previous/
drwx------ 2 root root 4.0K Mar 4 2021 syslinux/
-rw------- 1 root root 492 Jun 14 10:32 syslinux.cfg-
trurl Posted July 19, 2022
That FSCK*.REC file is the result of checkdisk on the flash drive, so some file must have been corrupt, such as ident.cfg, where Identification settings are saved. You might put the flash drive in your PC and run checkdisk again. While it's there, make a backup.
It might be better not to mirror the syslog server to flash; you don't want to write a lot to it. When I use that I always put it on a share on cache or another fast pool.
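As a rough illustration of spotting those recovery files, the Python sketch below filters a list of file names for the FSCK*.REC pattern mentioned above. The helper name `find_fsck_recovery_files` is hypothetical, and the case-insensitive match is an assumption made because FAT file names are not case-sensitive.

```python
from fnmatch import fnmatchcase


def find_fsck_recovery_files(names):
    """Return the names that match the FSCK*.REC pattern, ignoring case."""
    # Upper-case each name so the match behaves the same on every platform.
    return [name for name in names if fnmatchcase(name.upper(), "FSCK*.REC")]


if __name__ == "__main__":
    # A few names taken from the /boot listing earlier in the thread.
    listing = ["bzimage", "FSCK0000.REC", "config", "syslinux.cfg-"]
    print(find_fsck_recovery_files(listing))
```

On a mounted flash drive the list of names could come from `os.listdir` on the drive's root. Finding such a file only shows that a check ran and recovered something; it does not say which file was damaged.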
Martsmac (Author) Posted July 22, 2022 (edited)
Hmm, so it started happening again. I must have set up the syslog server incorrectly though, as I'm not getting a log saved to the flash, to the share I've chosen, or to the remote location whose IP I entered. (Edit: I figured out what I was doing wrong with syslog. I had set the remote IP to a different PC that wasn't running the syslog program for Windows. I have since used the "trickery" method to force the syslog file to my cache share on the same Unraid box.) Now I just need to wait for my issue to happen again and grab the syslog.
When you say make a backup, do you mean literally just clone the USB?
Edited July 22, 2022 by Martsmac
itimpi Posted July 22, 2022
3 hours ago, Martsmac said: When you say make a backup do you mean literally just clone the usb?
Yes. The flash drive uses a standard FAT32 file system, so it is easy to copy all the files off to a location on the PC.
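Copying the files by hand in a file manager is all that is needed, but for anyone who prefers to script it, here is a minimal Python sketch of the same idea. The function name `backup_flash` and the dated `FlashBackup-` folder naming are assumptions made for illustration, not anything Unraid provides.

```python
import shutil
import time
from pathlib import Path


def backup_flash(flash_root, backup_parent):
    """Copy everything under flash_root into a new dated folder and return its path."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(backup_parent) / f"FlashBackup-{stamp}"
    # copytree raises an error if target already exists, so an older
    # backup is never silently overwritten.
    shutil.copytree(flash_root, target)
    return target
```

A call might look like `backup_flash("E:/", "C:/Backups")`, where both paths are hypothetical and depend on where the flash drive is mounted on the PC.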
kizer Posted July 22, 2022
9 hours ago, itimpi said: Yes. The flash drive is in a standard FAT32 file system so easy to copy all the files off onto a location on the PC.
Exactly what I did. I have a PC I mainly deal with Unraid on, and buried in everything on it is a folder called FlashBackup. Every once in a while I'll update it, just in case. I've had to do the "oh crap" recovery a couple of times, and luckily I had the files handy. Now that I'm deploying a backup server, I'll have a copy of each server's flash drive stored on the other, just in case, as well.
Martsmac (Author) Posted July 24, 2022
Yeah, I was doing that as a troubleshooting step; it wasn't on for long, I only enabled it yesterday. I haven't had the issue from the original post since I upgraded to 6.10.3. I do need to invest in a UPS though, as our power company is terribly unreliable, and I also want to replace my flash drive with a much newer/better one.
So just as I'm writing this it did it again. I can navigate the menus just fine from another PC or my Android phone, but as soon as I try to interact with any of my Dockers/VMs etc. it's all just hung, and 3 cores on the CPU are maxed out. To top it all off, the last entry recorded by the syslog server was at around 12 noon today (despite me knowing it was working much later than that this evening). I tried to run a manual diagnostic to the flash drive and it just stays on "downloading" but never actually creates a file in boot/logs. I'm at a complete loss with this; it seems to have made even obtaining log/diagnostic files impossible.
Martsmac (Author) Posted July 26, 2022
So this issue hasn't happened since I replaced my flash drive with a newer, bigger one (8GB as opposed to 4GB), although I suspect the fact that it's much newer has a lot more to do with the issue seemingly being resolved than its size does. Will update if the issue persists, but at the moment it seems to be resolved.
Martsmac (Author) Posted July 28, 2022
And of course, just when I think all is working fine again, it decides to lock up again. The last log generated was at 11pm tonight, so it's not even giving me a log I can use to diagnose the issue.
Martsmac (Author) Posted July 29, 2022
Attachment: syslog-10.10.1.114.log
So I managed to get a syslog. It covers from just before the latest hang to after it; the hang happened late last night!
JorgeB Posted July 29, 2022
Jul 21 20:30:06 UNRAID kernel: macvlan_broadcast+0x116/0x144 [macvlan]
Jul 21 20:30:06 UNRAID kernel: macvlan_process_broadcast+0xc7/0x110 [macvlan]
Macvlan call traces are usually the result of having dockers with a custom IP address. Switching to ipvlan should fix it (Settings -> Docker Settings -> Docker custom network type -> ipvlan; advanced view must be enabled, top right), or see below for more info.
https://forums.unraid.net/topic/70529-650-call-traces-when-assigning-ip-address-to-docker-containers/
See also here: https://forums.unraid.net/bug-reports/stable-releases/690691-kernel-panic-due-to-netfilter-nf_nat_setup_info-docker-static-ip-macvlan-r1356/
P.S. You should update to the latest stable, v6.10.3.
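To check a saved syslog for the same kind of entries, a small Python sketch along these lines could be used. The regular expression is an assumption modelled only on the two kernel lines quoted above, and the function name `macvlan_trace_lines` is made up for illustration; real call traces may contain other line shapes that this would miss.

```python
import re

# Matches kernel lines that mention a macvlan_* symbol with an offset/size
# pair, e.g. "macvlan_broadcast+0x116/0x144".
MACVLAN_TRACE = re.compile(r"kernel:.*\bmacvlan_\w+\+0x[0-9a-f]+/0x[0-9a-f]+")


def macvlan_trace_lines(syslog_text):
    """Return the syslog lines that look like macvlan call-trace entries."""
    return [
        line.strip()
        for line in syslog_text.splitlines()
        if MACVLAN_TRACE.search(line)
    ]


if __name__ == "__main__":
    sample = (
        "Jul 21 20:30:06 UNRAID kernel: macvlan_broadcast+0x116/0x144 [macvlan]\n"
        "Jul 21 20:30:06 UNRAID kernel: macvlan_process_broadcast+0xc7/0x110 [macvlan]\n"
        "placeholder line, not from a real log\n"
    )
    for hit in macvlan_trace_lines(sample):
        print(hit)
```

Any hits suggest the macvlan issue described above is worth ruling out before looking elsewhere.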
Martsmac (Author) Posted July 30, 2022
16 hours ago, JorgeB said: Jul 21 20:30:06 UNRAID kernel: macvlan_broadcast+0x116/0x144 [macvlan] Jul 21 20:30:06 UNRAID kernel: macvlan_process_broadcast+0xc7/0x110 [macvlan] Macvlan call traces are usually the result of having dockers with a custom IP address, switching to ipvlan should fix it (Settings -> Docker Settings -> Docker custom network type -> ipvlan (advanced view must be enable, top right)), or see below for more info. https://forums.unraid.net/topic/70529-650-call-traces-when-assigning-ip-address-to-docker-containers/ See also here: https://forums.unraid.net/bug-reports/stable-releases/690691-kernel-panic-due-to-netfilter-nf_nat_setup_info-docker-static-ip-macvlan-r1356/ P.S. you should update to latest stable, v6.10.3
Thanks, I appreciate that and have followed your instructions. I will post back should the issue arise again. Also, regarding updating to 6.10.3: I AM on 6.10.3 stable already.
Martsmac (Author) Posted August 1, 2022
So the issue hasn't reared its ugly head again yet. I must give you my thanks, @JorgeB; it seems to be working great now. I guess it's to do with my Mellanox NIC not playing nice with the macvlan setting.
Thanks again, Martin