Long-time reader, new-ish poster: issue with UNRAID



I'm hoping someone can make more sense of this than I can. I've been running Unraid for a while now and everything has been great. However, just recently (since I swapped my off-the-shelf router for a 2U PC running OPNsense) I've been having a weird issue. It doesn't seem to happen after a set amount of time, although it usually takes a good few hours to show up.

When it happens, the Unraid web interface keeps working and stays accessible from my desktop PC (connected to the same switch as the Unraid box via 10 Gig SFP, a MikroTik 10 Gig switch), but Unraid appears to max out 4 or 5 of the cores on my Xeon CPUs (2 x Xeon E5-2650 v2) and for all intents and purposes seems to freeze. I can open different pages in a browser, but I can't access any of the shares, and the restart and (graceful) power-off buttons refuse to work. The ONLY way I can get it working again at this point is to physically reset the system, which incurs a parity check every time.

 

I have no idea what is causing this, so I have included my diagnostics in case any of you smart people can spot something obvious that someone untrained in the dark arts of Unraid has so far been unable to decipher.

 

So I reverted to 6.10.1 which is what I am running now.

Thanks tons in advance; any help/suggestions are welcome.

 

Another weird thing I just noticed: my shares used to be at \\UNRAID\XXXXX, where XXXXX is the name of the shared folder. Now they appear as \\TOWER\XXXXX, which makes literally zero sense to me.

 

 

tower-diagnostics-20220717-1853.zip

 

 

tower-syslog-20220717-2321.zip

Edited by Martsmac
Edit: posted the syslog too
Link to comment
56 minutes ago, Martsmac said:

\\UNRAID\XXXXX where XXXXX is the name of the folder shared , now it appears they are named \\TOWER\XXXXX which makes literally Zero sense to me.

You had named your server "UNRAID", but it reset to the default name "TOWER"

 

You can change that in Settings - Identification.
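If you want to see what name is actually saved on the flash drive, something like the sketch below might help. It assumes the Identification settings are in config/ident.cfg on the flash (mentioned later in this thread) and that the name is stored on a line like NAME="Tower"; the NAME key and the function name saved_server_name are assumptions for illustration, so check your own file first.

```shell
# Print the server name saved in an Unraid-style ident.cfg.
# Assumption: the file holds a line such as NAME="Tower".
saved_server_name() {
    cfg="$1"   # e.g. /boot/config/ident.cfg
    # Keep only the NAME= line, then strip the quotes and any DOS carriage return.
    sed -n 's/^NAME=//p' "$cfg" | tr -d '"\r' | head -n 1
}
```

Usage would be along the lines of: saved_server_name /boot/config/ident.cfg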

 

Possibly you set things back to default when you reverted to 6.10.1. How exactly did you roll back?

 

The syslog you attached is the same as the one in diagnostics, which only has information since the last reboot. You should set up the syslog server.

Link to comment

I'm not at my machine atm, but from memory I rolled back using the built-in function, either that or I followed a guide online which I can't seem to locate now. I am very interested in setting up the syslog server, so thanks for that. I will get it set up tonight and report back.

Link to comment

So I enabled the syslog server and looked in the logs folder on the flash drive, but the latest file stored there is a syslog that was "last updated" at 10:51 am today, which doesn't make much sense unless that's when the issue last occurred. (I'm having the same issue now, BTW: dockers unresponsive, unable to restart the system through the UI, etc., but the web interface still loads various menu options, such as cycling between Settings and Home.)

 

Attached are the latest two syslog files and tower-diagnostics-20220717-1853.zip.

 

tower-diagnostics-20220717-1853.zip syslog

syslog

Edited by Martsmac
uploaded a newer syslog after a crash
Link to comment

Probably because, when the issue occurs, the only way for me to shut down or restart the machine is a hard reset (using the reset button on the server), which is far from ideal for obvious reasons. The menu interface responds, but certain GUI functions such as restart, shutdown, or turning dockers on/off do not.

As an interesting aside, I actually attempted to reboot the system from a terminal with sudo shutdown -r, and that also failed to do anything at all.

I have now updated to 6.10.3 and so far have had none of the issues I had previously, so time will tell (I will post back here if it happens again). As a disclaimer, when I got the "lock ups" before, it usually happened after being online for a fair few hours, so I am not out of the woods yet.

One of the easiest telltale signs I've discovered is that 3-4 of my CPU threads are constantly maxed out when it happens, and I'm not running anything too taxing either, just the base system and a few dockers.

 

Link to comment

Yeah, I'm aware you want to do a clean shutdown when you can, but when Unraid isn't responding there isn't a whole lot you can do. For the record, I had made most of the above changes shortly after adopting Unraid, and this issue still didn't allow me to do a graceful shutdown.

Like I said, time will tell, but I haven't had the issue since I updated to 6.10.3. We will see.

Link to comment
18 hours ago, trurl said:

Way down in that thread I explain how flash drive problems can also result in "false" unclean shutdowns:

 

https://forums.unraid.net/topic/69868-dealing-with-unclean-shutdowns/?do=findComment&comment=1087704

 

And flash drive problems might explain how your servername got reset.

 

What do you get from command line with this?

ls -lah /boot

drwx------ 10 root root 4.0K Dec 31  1969 ./
drwxr-xr-x 20 root root  440 Jul 19 14:31 ../
drwx------  8 root root 4.0K Jul 19 14:33 .git/
-rw-------  1 root root  180 Apr 15 16:38 .gitattributes
drwx------  3 root root 4.0K Jun 16  2020 EFI/
-rw-------  1 root root 4.0K Dec 31  1979 FSCK0000.REC
drwx------  2 root root 4.0K Jun 16  2020 System\ Volume\ Information/
-rw-------  1 root root 111M Jun 14 10:32 bzfirmware
-rw-------  1 root root   65 Jun 14 10:37 bzfirmware.sha256
-rw-------  1 root root 5.9M Jun 14 10:32 bzimage
-rw-------  1 root root   65 Jun 14 10:37 bzimage.sha256
-rw-------  1 root root  18M Jun 14 10:32 bzmodules
-rw-------  1 root root   65 Jun 14 10:37 bzmodules.sha256
-rw-------  1 root root 135M Jun 14 10:37 bzroot
-rw-------  1 root root  26M Jun 14 10:33 bzroot-gui
-rw-------  1 root root   65 Jun 14 10:37 bzroot-gui.sha256
-rw-------  1 root root   65 Jun 14 10:37 bzroot.sha256
-rw-------  1 root root  30K Jun 14 10:32 changes.txt
drwx------ 11 root root 4.0K Jul 19 14:48 config/
-r--------  1 root root 120K Jun 16  2020 ldlinux.c32
-r--------  1 root root  68K Jun 16  2020 ldlinux.sys
-rw-------  1 root root 7.8K Jun 14 10:32 license.txt
drwx------  2 root root 8.0K Jul 18 09:01 logs/
-rw-------  1 root root 1.8K Jun 14 10:32 make_bootable.bat
-rw-------  1 root root 3.3K Jun 14 10:32 make_bootable_linux
-rw-------  1 root root 2.4K Jun 14 10:32 make_bootable_mac
-rw-------  1 root root 147K Jun 14 10:32 memtest
drwx------  2 root root 4.0K Aug 14  2020 preclear_reports/
drwx------  2 root root 4.0K Jul 18 11:58 previous/
drwx------  2 root root 4.0K Mar  4  2021 syslinux/
-rw-------  1 root root  492 Jun 14 10:32 syslinux.cfg-

 

Link to comment

That FSCK*.REC file is the result of checkdisk on the flash drive, so some file must have been corrupt, such as ident.cfg where Identification settings are saved.
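A quick way to check whether any of those recovery files are sitting on the flash is sketched below. This is only a rough helper: the function name list_fsck_recovery_files is made up for illustration, and it assumes the flash is mounted at /boot as in the listing above.

```shell
# List checkdisk recovery files (FSCK0000.REC, FSCK0001.REC, ...) in the top
# level of a directory. Prints nothing if none are found.
list_fsck_recovery_files() {
    dir="$1"   # e.g. /boot, where the flash drive is mounted
    find "$dir" -maxdepth 1 -type f -name 'FSCK*.REC' | sort
}
```

For example: list_fsck_recovery_files /boot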

 

You might put the flash drive in your PC and run checkdisk again. While it's there, make a backup.

 

It might be better not to mirror the syslog server to flash, since you don't want to write a lot to it. When I use it, I always point it at a share on cache or another fast pool.

Link to comment

Hmm, so it started happening again. I must have set up the syslog server incorrectly though, as I'm not getting a log saved to the flash, the share I've chosen, or the remote location I entered the IP of. (Edit: I figured out what I was doing wrong with syslog. I had set the remote IP to a different PC that wasn't running the syslog program for Windows. I have since used the "trickery" method to force the syslog file to my cache share on the same Unraid box.)

Now I just need to wait for my issue to happen again and grab the syslog.

When you say make a backup, do you mean literally just clone the USB?

 

Edited by Martsmac
Link to comment
9 hours ago, itimpi said:

Yes.

 

The flash drive is in a standard FAT32 file system so easy to copy all the files off onto a location on the PC.

 

Exactly what I did. I have a PC I mainly deal with Unraid on, and buried in everything on it is a folder called FlashBackup. Every once in a while I'll update it just in case. I've had to do the "oh crap" recovery a couple of times, and luckily had the files handy.

Now that I'm deploying a backup server, I'll have each server store a copy of the other's flash drive, just in case as well.
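For anyone who would rather do the copy from the command line, here is a minimal sketch of the same idea: copy everything off the flash into a dated folder. The function name backup_flash and the destination path in the comments are hypothetical, so adjust them to your own setup.

```shell
# Copy everything from the flash drive into a new, dated folder and print
# the folder that was created.
backup_flash() {
    src="$1"    # e.g. /boot
    dest="$2"   # e.g. /mnt/user/backups/FlashBackup (hypothetical path)
    target="$dest/flash-$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$target" || return 1
    # "$src/." copies the directory contents, including hidden files.
    cp -R "$src/." "$target/" || return 1
    echo "$target"
}
```

Keeping each copy in its own dated folder means an older, known-good backup is not overwritten by a copy taken after the flash has already started to go bad.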

Link to comment

Yeah, I was doing that as a troubleshooting step. It wasn't on for long (I enabled it yesterday), but I haven't had the issue from the original post since I upgraded to 6.10.3. I do need to invest in a UPS though, as our power company is terribly unreliable, and I also want to replace my flash drive with a much newer/better one.

So just as I'm writing this it did it again. I can navigate the menus just fine from another PC or my Android phone, but as soon as I try to interact with any of my dockers, VMs, etc., it's all just hung. 3 cores on the CPU are maxed out, and to top it all off the last entry recorded by the syslog server was at around 12 noon today (despite me knowing it was working much later than that this evening). I tried to run a manual diagnostic to the flash drive and it just stays on "downloading" but never actually creates a file in boot/logs. I'm at a complete loss with this; it seems to have made even obtaining log/diagnostic files impossible.
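One rough way to notice that the syslog has gone quiet like this is to check how long ago the log file was last written. The sketch below is only an illustration: the function name log_is_stale and the example path are made up, and it relies on the -mmin option of find, which GNU and BusyBox find both support.

```shell
# Succeeds (exit 0) if FILE has not been modified in the last MINUTES minutes.
log_is_stale() {
    file="$1"      # e.g. the syslog file written by the syslog server
    minutes="$2"   # e.g. 30
    [ -n "$(find "$file" -mmin +"$minutes" 2>/dev/null)" ]
}
```

For example, with a hypothetical log location: log_is_stale /mnt/user/syslog/syslog.log 30 && echo "syslog has gone quiet"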

Link to comment

So this issue hasn't happened since I replaced my flash drive with a newer, bigger one (8GB as opposed to 4GB), although I suspect the fact that it's much newer has a lot more to do with the issue being seemingly resolved than its size does. Will update if the issue persists, but atm it seems to be resolved.

Link to comment
Jul 21 20:30:06 UNRAID kernel: macvlan_broadcast+0x116/0x144 [macvlan]
Jul 21 20:30:06 UNRAID kernel: macvlan_process_broadcast+0xc7/0x110 [macvlan]

 

Macvlan call traces are usually the result of having dockers with a custom IP address. Switching to ipvlan should fix it (Settings -> Docker Settings -> Docker custom network type -> ipvlan; advanced view must be enabled, top right), or see below for more info.

https://forums.unraid.net/topic/70529-650-call-traces-when-assigning-ip-address-to-docker-containers/

See also here:

https://forums.unraid.net/bug-reports/stable-releases/690691-kernel-panic-due-to-netfilter-nf_nat_setup_info-docker-static-ip-macvlan-r1356/

 

P.S. you should update to latest stable, v6.10.3
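To check a saved syslog for the same kind of lines, a simple count of entries tagged with the macvlan module may be enough. This is a minimal sketch, not an official diagnostic; the function name count_macvlan_lines is made up, and it only matches the [macvlan] tag seen in the two lines quoted above.

```shell
# Count syslog lines that mention the macvlan kernel module, e.g.
#   ... kernel: macvlan_broadcast+0x116/0x144 [macvlan]
count_macvlan_lines() {
    logfile="$1"
    grep -c '\[macvlan\]' "$logfile"
}
```

A count above zero after a lock-up suggests the call traces are still occurring; a count of zero after switching to ipvlan would be consistent with the change having helped.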

Link to comment
16 hours ago, JorgeB said:

Macvlan call traces are usually the result of having dockers with a custom IP address, switching to ipvlan should fix it [...]

P.S. you should update to latest stable, v6.10.3

Thanks, I appreciate that and have followed your instructions. I will post back should the issue arise again. Also, regarding updating to 6.10.3: I AM on 6.10.3 stable already.

 

Screenshot 2022-07-29 231013.png

Link to comment
