Posts posted by snowmirage

  1. I've had the WireGuard VPN built into Unraid running for a while but haven't needed it in months.  At some point, probably after an Unraid OS update if I had to guess, it stopped working.  I assumed I probably busted something in the process of moving from the onboard NIC to a new 10gig interface I installed, so I set out to set up the VPN again from scratch.

    After reading the docs and several guides, I'm fairly confident I have everything set up correctly.

    But I don't see Unraid respond to any connection requests on port 51820.  Running tcpdump on my firewall, I can see the traffic hit my external interface and get passed to the Unraid IP, but no response.

    I noticed after setting things up this active/inactive switch doesn't stay "active"

    image.thumb.png.cc8c249e1527712da8d81a9eb35d7830.png

    and noticed similar messaging on the dashboard.

    image.png.eb888225897ade0670c7451cc4a5962f.png

    I initially thought that just meant "no one is connected yet".

    Something I noticed that may be related: recently, when I rebooted the Unraid host, I noticed on its directly connected display that the login prompt no longer lists an IP address.  It did before I migrated to my new 10Gig NIC.

    Additional information:  I set up the 10Gig NIC with a trunk port on my local switch and have several VLANs connected directly to Unraid.

    My guess here is that the VPN service isn't listening on all of Unraid's interfaces, or at least not the one I intend (VLAN 2, aka br2.2, 10.2.0.16).

    I've attached the diag file.

    This is all I see directly in the system logs when I try to flip that "active/inactive" slider

     

    Mar 18 17:36:43 phoenix wireguard: Tunnel WireGuard-wg0 started
    Mar 18 17:36:43 phoenix network: update services: 1s
    Mar 18 17:36:45 phoenix network: reload service: nginx
    Mar 18 17:44:40 phoenix wireguard: Tunnel WireGuard-wg0 started
    Mar 18 17:44:40 phoenix network: update services: 1s
    Mar 18 17:44:41 phoenix network: reload service: nginx


    Doing some searching, I found another post that referenced "/var/log/wg-quick.log".

    There I found this:

     

    wg-quick down wg0
    wg-quick: `/etc/wireguard/wg0.conf' does not exist
    
    wg-quick up wg0
    [#] ip link add wg0 type wireguard
    [#] wg setconf wg0 /dev/fd/63
    [#] ip -4 address add 10.253.0.1 dev wg0
    [#] ip link set mtu 1420 up dev wg0
    [#] logger -t wireguard 'Tunnel WireGuard-wg0 started';/usr/local/emhttp/webGui/scripts/update_services
    [#] iptables -t nat -A POSTROUTING -s 10.253.0.0/24 -o br0 -j MASQUERADE;iptables -t nat -A POSTROUTING -s 10.253.0.0/24 -o vhost0 -j MASQUERADE
    
    wg-quick down wg0
    [#] ip link delete dev wg0
    [#] logger -t wireguard 'Tunnel WireGuard-wg0 stopped';/usr/local/emhttp/webGui/scripts/update_services
    [#] iptables -t nat -D POSTROUTING -s 10.253.0.0/24 -o br0 -j MASQUERADE;iptables -t nat -D POSTROUTING -s 10.253.0.0/24 -o vhost0 -j MASQUERADE
    
    wg-quick up wg0
    [#] ip link add wg0 type wireguard
    [#] wg setconf wg0 /dev/fd/63
    [#] ip -4 address add 10.253.0.1 dev wg0
    [#] ip link set mtu 1420 up dev wg0
    [#] ip -4 route add 10.253.0.2/32 dev wg0
    [#] logger -t wireguard 'Tunnel WireGuard-wg0 started';/usr/local/emhttp/webGui/scripts/update_services
    [#] iptables -t nat -A POSTROUTING -s 10.253.0.0/24 -o br0 -j MASQUERADE;iptables -t nat -A POSTROUTING -s 10.253.0.0/24 -o vhost0 -j MASQUERADE
    [#] ip -4 route flush table 200
    [#] ip -4 route add default via 10.253.0.1 dev wg0 table 200
    [#] ip -4 route add 0.0.0.0/0 via  dev br0 table 200
    Error: inet address is expected rather than "dev".
    [#] ip link delete dev wg0
    
    wg-quick down wg0
    wg-quick: `wg0' is not a WireGuard interface
    
    wg-quick down wg0
    wg-quick: `wg0' is not a WireGuard interface
    
    wg-quick down wg0
    wg-quick: `wg0' is not a WireGuard interface
    
    wg-quick down wg0
    wg-quick: `wg0' is not a WireGuard interface
    
    wg-quick up wg0
    [#] ip link add wg0 type wireguard
    [#] wg setconf wg0 /dev/fd/63
    [#] ip -4 address add 10.253.0.1 dev wg0
    [#] ip link set mtu 1420 up dev wg0
    [#] ip -4 route add 10.253.0.2/32 dev wg0
    [#] logger -t wireguard 'Tunnel WireGuard-wg0 started';/usr/local/emhttp/webGui/scripts/update_services
    [#] iptables -t nat -A POSTROUTING -s 10.253.0.0/24 -o br0 -j MASQUERADE;iptables -t nat -A POSTROUTING -s 10.253.0.0/24 -o vhost0 -j MASQUERADE
    [#] ip -4 route flush table 200
    [#] ip -4 route add default via 10.253.0.1 dev wg0 table 200
    [#] ip -4 route add 0.0.0.0/0 via  dev br0 table 200
    Error: inet address is expected rather than "dev".
    [#] ip link delete dev wg0
    
    wg-quick down wg0
    wg-quick: `wg0' is not a WireGuard interface


    Maybe I'm running into a problem simply because I'm using VLANs as my main interface for Unraid?
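    One thing that stands out in that log: the failing line `ip -4 route add 0.0.0.0/0 via  dev br0 table 200` has nothing between `via` and `dev`, which makes me think Unraid lost its IPv4 default gateway when I migrated interfaces, and the WireGuard setup script is substituting an empty value. A minimal sketch of the check (the helper function is mine, not part of Unraid):

```shell
# extract_gw: pull the gateway out of an "ip -4 route show default" line;
# prints "missing" when no default gateway is configured (the symptom above).
extract_gw() {
  gw=$(printf '%s\n' "$1" | awk '$1 == "default" { print $3; exit }')
  if [ -n "$gw" ]; then echo "$gw"; else echo "missing"; fi
}

# On the live host you would feed it the real routing table:
#   extract_gw "$(ip -4 route show default)"
extract_gw "default via 10.2.0.1 dev br2.2"  # healthy case
extract_gw ""                                # no default route set
```

    If it prints "missing" on the host, fixing the default gateway under Settings > Network Settings would be my first move before touching the VPN config again.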

    I certainly appreciate any ideas.  Maybe it's just simpler for me to run a dedicated VM or Docker container?

    phoenix-diagnostics-20240318-1748.zip

  2. I've had the NUT V2 plugin up and running for some time now; recently I seem to stop getting data from it after a few minutes to hours.

    phoenix-diagnostics-20240318-1711.zip

    I checked the USB cable and ran the auto-detect in the plugin again, which seems to work, but after a bit it fails again.

    I even tried replacing the USB cable with a new one.

    Might someone have some insight into what is breaking?

    I see this repeated in the logs.

     

    Mar 18 17:12:51 phoenix upsmon[1164]: Poll UPS [[email protected]] failed - Data stale
    Mar 18 17:12:53 phoenix usbhid-ups[978]: libusb1: Could not open any HID devices: insufficient permissions on everything
    Mar 18 17:12:55 phoenix usbhid-ups[978]: libusb1: Could not open any HID devices: insufficient permissions on everything


    From my searching, it seems this could be a failed USB cable?

    I'm just hoping the UPS (or rather the USB controller / port on it) isn't going bad.
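    From what I've read, that libusb error usually means the UPS re-enumerated on the USB bus and its device node under /dev/bus/usb came back with permissions the NUT driver can't use. A rough sketch of how I'd check (the mode-classifying helper is only illustrative):

```shell
# On the host: list the USB device nodes, find the UPS entry, then
# restart the NUT driver so it re-binds to the new node:
#   ls -l /dev/bus/usb/*/*
#   upsdrvctl stop && upsdrvctl start
# Illustrative helper: classify an "ls -l" mode string for the node.
usb_mode() {
  case "$1" in
    crw-rw-rw-*) echo "world-accessible" ;;
    crw-rw-*)    echo "group-only" ;;
    *)           echo "root-only" ;;
  esac
}
usb_mode "crw-rw-rw-"   # any driver user can open it
usb_mode "crw-------"   # only root can open it
```

    If the node keeps coming back root-only after each re-enumeration, that would point at the UPS's USB side dropping off the bus rather than the cable.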

  3. What I'm trying to do

    I recently did a network upgrade and installed a 10gig NIC in my unraid server.

    My intention is to separate my network into several VLANs. 

    I have all the networking stuff working, router + switches etc.

    I believe I have set up the networking in Unraid as I should.

    Settings > Networking 

    image.thumb.png.a1ab1dd77ce7fafdcc27118cde8d71c5.png

    I can reach the unraid server on that VLAN (2) as expected.

    I would like to be able to assign a given Docker Container (or VM but troubleshooting just docker here) to that VLAN.

    I started with the defaults I already had in the docker settings:

    Docker custom network type: ipvlan
    Host access to custom networks: Disabled    

    But added a new custom network on interface br2.2.
    image.thumb.png.f727a4fb6c0dae7669afad15fbd0b3a9.png


    And removed the previous custom network

    I made sure my local physical DHCP server was handing out IP addresses in a different range of the same subnet than the DHCP pool configured in the docker settings.

    Then, in each container I had running, I changed the network type and set a fixed IP address.

    image.thumb.png.cfeac9896ae004935efd41ac829460cc.png

     

    For several of my docker containers this seemed to work as expected.
    However I have two issues that I can't seem to explain.


    Issue 1

    I have an nzbgetvpn container running; when it starts, logs indicate it's working as normal.  I can ping the container's fixed IP address, but when I attempt to connect to the web interface on its IP + port, the connection times out.  Digging deeper with Wireshark, I can see the container's IP respond as expected to ICMP ping requests, but it doesn't respond at all to TCP traffic to the port for the web interface.

    During my troubleshooting I changed that container's network type back to "bridge".

    image.png.bd832dd04f409b2f9db94e5fcd30a132.png

    When I do so, on the Docker page I see it now has this IP address:

    image.png.69ef81c30e42055694c13dde45af60cf.png

    To my knowledge I don't have anything on my network using a 172.17.x.x address, so I have no idea where it is getting that from...

    On top of that, if I then try to access http://10.2.0.16:6789, I can pull up the web interface of the container???  (That's the IP address of the Unraid host on VLAN 2 + the port of the web interface of the container.)
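    From more searching, it seems 172.17.x.x is the subnet of Docker's built-in default "bridge" network, created by the Docker daemon itself rather than by anything on my LAN, and in bridge mode the container's ports get NAT-published on the host's own IPs, which would explain why the host address + port works. A quick illustrative check (the helper function is mine, not a Docker tool):

```shell
# On the host, Docker's default bridge subnet can be confirmed with:
#   docker network inspect bridge \
#     --format '{{ (index .IPAM.Config 0).Subnet }}'
# Illustrative helper: does an address fall in Docker's usual default range?
in_default_bridge() {
  case "$1" in
    172.17.*) echo "yes" ;;
    *)        echo "no" ;;
  esac
}
in_default_bridge "172.17.0.2"  # a container on the default bridge
in_default_bridge "10.2.0.73"   # an address on my VLAN 2
```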


    Issue 2

     

    Moving on to another container... I am running "nut-influxdb-exporter", which tries to connect to the Unraid host running the NUT plugin (UPS power stuff) on port 3493.

    I had this container running with the same custom network I just set up.

    image.thumb.png.63d08c81a945b939467a8bce4f516ceb.png

    When the container boots, I see errors that it can't connect to the Unraid host.

    Dropping into a console for the container, I noticed that it can successfully ping other Docker containers configured on the same network (10.2.0.68, for example, is my InfluxDB container) and other hosts on that network (10.2.0.1, for example, is my physical router's interface on this VLAN).  But it can't ping 10.2.0.16, the IP of the Unraid host on that VLAN.



    Even with all my searching over the last week I suspect I'm missing something fundamental about how to get this type of configuration correct.  Might anyone be able to point me in the right direction?
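    One pattern I did turn up while searching, which would explain Issue 2: with an ipvlan (or macvlan) custom network, the kernel deliberately drops traffic between a container on the sub-interface and the host's own IP on the same parent interface. A sketch of roughly what I believe Unraid creates for br2.2 (names and ranges are my guesses, not taken from the diagnostics):

```shell
# Roughly the equivalent of Unraid's custom network on br2.2
# (illustrative values only):
docker network create -d ipvlan \
  --subnet 10.2.0.0/24 --gateway 10.2.0.1 \
  -o parent=br2.2 vlan2
# On such a network:
#   container -> 10.2.0.1  (router)           reachable
#   container -> 10.2.0.68 (other container)  reachable
#   container -> 10.2.0.16 (the host itself)  dropped by design
# "Host access to custom networks: Enabled" adds a host-side shim
# interface so the host and containers can reach each other.
```

    If that's right, flipping that setting (with Docker stopped) might be all Issue 2 needs, but I'd appreciate confirmation before I change it.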

    phoenix-diagnostics-20240311-0624.zip

  4. On 3/9/2024 at 1:08 PM, snowmirage said:

    I think my nzbgetvpn container may have stopped working some time ago.  I got back to doing some upgrades to my home lab.  I've set up Unraid with multiple VLANs and just moved all my docker containers to one of the new VLANs, "Network br2.2".

    I can't seem to access the nzbget web ui.

    When I look at the logs 

     

    2024-03-09 12:57:09,145 DEBG 'start-script' stdout output:
    [info] Successfully assigned and bound incoming port '25361'
    
    2024-03-09 12:57:39,435 DEBG 'watchdog-script' stdout output:
    [info] nzbget not running
    
    2024-03-09 12:57:39,435 DEBG 'watchdog-script' stdout output:
    [info] Nzbget config file already exists, skipping copy
    
    2024-03-09 12:57:39,447 DEBG 'watchdog-script' stdout output:
    [info] Attempting to start nzbget...
    
    2024-03-09 12:57:39,456 DEBG 'watchdog-script' stdout output:
    [info] Nzbget process started
    [info] Waiting for Nzbget process to start listening on port 6789...
    
    2024-03-09 12:57:39,460 DEBG 'watchdog-script' stdout output:
    [info] Nzbget process is listening on port 6789


    The service seems to be listening on the port.

    I did update the LAN_NETWORK variable and added all my local networks.

    image.thumb.png.5c3ed9543f08d7263bca45de7bf9a3a0.png

     

    image.thumb.png.01e816304afc2e78d1e4f76c71b28d7a.png

    I can ping the docker container's IP address (10.2.0.73) and get a response, but I get no response at all when trying to access the webgui.  Looking at Wireshark, when I try to connect to the webui there's no return traffic at all from the container.

    Any ideas what might be wrong here?

    Well... I think I fixed it... but I haven't a clue how...

    I changed the container's network type from the VLAN interface above (br2.2, with its own IP address) to the "Bridge" network type.
    image.png.180c4775495c3efd0821eb7145369c8d.png
    After doing this, the UI reports the container has an IP of 172-something... I have nothing on my network or Unraid server using that IP range, to my knowledge...

    image.thumb.png.aa18b052bcec1ad96490f7c589a872f0.png

    But I happened to try putting the IP address of my Unraid server and the nzbget port in the browser and... there it is, working at 10.2.0.16:6789...

    No idea what that did or how it's working, unfortunately...

  5. I think my nzbgetvpn container may have stopped working some time ago.  I got back to doing some upgrades to my home lab.  I've set up Unraid with multiple VLANs and just moved all my docker containers to one of the new VLANs, "Network br2.2".

    I can't seem to access the nzbget web ui.

    When I look at the logs 

     

    2024-03-09 12:57:09,145 DEBG 'start-script' stdout output:
    [info] Successfully assigned and bound incoming port '25361'
    
    2024-03-09 12:57:39,435 DEBG 'watchdog-script' stdout output:
    [info] nzbget not running
    
    2024-03-09 12:57:39,435 DEBG 'watchdog-script' stdout output:
    [info] Nzbget config file already exists, skipping copy
    
    2024-03-09 12:57:39,447 DEBG 'watchdog-script' stdout output:
    [info] Attempting to start nzbget...
    
    2024-03-09 12:57:39,456 DEBG 'watchdog-script' stdout output:
    [info] Nzbget process started
    [info] Waiting for Nzbget process to start listening on port 6789...
    
    2024-03-09 12:57:39,460 DEBG 'watchdog-script' stdout output:
    [info] Nzbget process is listening on port 6789


    The service seems to be listening on the port.

    I did update the LAN_NETWORK variable and added all my local networks.

    image.thumb.png.5c3ed9543f08d7263bca45de7bf9a3a0.png

     

    image.thumb.png.01e816304afc2e78d1e4f76c71b28d7a.png

    I can ping the docker container's IP address (10.2.0.73) and get a response, but I get no response at all when trying to access the webgui.  Looking at Wireshark, when I try to connect to the webui there's no return traffic at all from the container.

    Any ideas what might be wrong here?
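    One thing I still want to rule out: from what I've read, VPN containers like this one install iptables rules inside the container that only accept incoming connections from the subnets listed in LAN_NETWORK, which could explain ping answering while TCP to the web port times out. The live rules can be listed from a console inside the container with `iptables -S INPUT`; the helper below is only an illustrative check that a client's subnet appears in the list:

```shell
# Illustrative helper (not part of the container): check whether a
# client's subnet appears in a comma-separated LAN_NETWORK value.
lan_covers() {
  case ",$1," in
    *",$2,"*) echo "covered" ;;
    *)        echo "not covered" ;;
  esac
}
# Inside the running container, the real rules can be dumped with:
#   iptables -S INPUT
lan_covers "192.168.0.0/24,10.2.0.0/24" "10.2.0.0/24"
lan_covers "192.168.0.0/24" "10.5.0.0/24"
```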

  6. I recently moved.  In the process a bunch of my "old" hardware got mixed up with my "new" hardware....

    I now find myself with a box of hard drives (more than 30).  I would like to set up some kind of test where I can get a simple "Pass" or "Fail" as to whether I should trust each drive with data.

    Short of just looking at a SMART report is there something else I can do to test each drive?

    I have an old Dell server I can install a few drives in at a time.  It actually has server blades, each with 3x 3.5" drive bays.  I was going to boot up Unraid on each and systematically try to run a preclear on each disk.

    If that works, the drive passes; if it doesn't, the drive fails.

    But doing so, I'd need to go buy 4 more Unraid licenses just to test this bunch of drives, which seems a bit of a waste.

    I suspect I could find the code for the preclear script someplace and run it from a Linux install of my choice, but this feels like one of those problems someone else must have already solved, and I'd be starting to reinvent the wheel.

    Am I already on the best path here or is there something I just don't know of yet?
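    For reference, the non-Unraid route I'm considering: smartctl plus badblocks are standard Linux tools and together do roughly what I want from preclear, assuming I accept a destructive test that erases each drive. The verdict helper is my own sketch, not an existing tool:

```shell
# On a throwaway Linux install, per drive (ERASES THE DRIVE):
#   smartctl -H /dev/sdX                # overall SMART health
#   badblocks -wsv -b 4096 /dev/sdX     # full write + verify pass
# Reduce the two results to the simple Pass/Fail I'm after:
verdict() {
  smart_health="$1"   # e.g. "PASSED" from smartctl -H
  bad_blocks="$2"     # error count reported by badblocks
  if [ "$smart_health" = "PASSED" ] && [ "$bad_blocks" -eq 0 ]; then
    echo "PASS"
  else
    echo "FAIL"
  fi
}
verdict PASSED 0    # healthy drive
verdict PASSED 12   # surface errors despite SMART health
```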

  7. So strange, I must have just shaken a power cable loose.  After double checking all connections and giving them a wiggle, the drives are back.
    image.png.94152217bdc7cd865c5d902aed69ee04.png
    But I'm assuming I have to get the disks reassigned in the right order, right?


    I do thankfully keep backups and was able to dig them up and find the disk assignments in /config/DISK_ASSIGNMENTS.txt from the backup of the Unraid flash drive.

    Thank you for the help

  8. 2 hours ago, trurl said:

    Diagnostics shows 6 slots for a pool named cache with nothing assigned to any of them. All the connected disks are part of the array, except for a single 2TB nvme which is unassigned. If there were ever any other attached drives they didn't show up for the reboot.


    Thanks for clarifying that, once this parity check finishes I'll do a clean shutdown and check all the cables.

    Now that I think about it I did move the rack a little bit yesterday maybe something got loose.

  9. Noticed this morning after a reboot last night

    image.png.e85a951d488d6ae5fbb2fcc6f4c8465d.png

    Started googling that issue, since I recalled having seen the docker image get corrupted before.

    But then noticed this

    image.png.9f0b88cef4ab980a1eb11020a1aeec50.png

    I have 6 (...I'm pretty sure it was 6) SSDs in a pool for the cache.  They are getting up there in age, so it wouldn't surprise me if one failed, but I wouldn't have expected them all to vanish.

    It's currently doing a parity check; once that finishes I'll stop the array and do a nice clean reboot + check the cables etc.

    In the meantime, maybe someone might see something in the logs that didn't catch my eye?

     

    phoenix-diagnostics-20240202-1057.zip

  10. Quick question... I'm looking at what was written to my "Backup destination" path after completing a manual backup to verify it worked, and it's unclear what I should see there.

    image.thumb.png.ba8cf90583b9eadbb02788ffa82538e2.png


    I was expecting to find copies of my appdata directories, as well as the VM meta data, and the flash drive.

    image.thumb.png.ebce2ed21ac053bf69d249581c0ce9f5.png

    It's clear enough that it has something for the VM metadata and the flash drive.

    But are all the .xml files just a representation of the appdata paths somehow?  I'm guessing not...

    I don't see anything else there to indicate the appdata paths were successfully backed up, but I also don't see anything from the backup script indicating it failed.

  11. 5 hours ago, JorgeB said:

    The last macvlan crash I see logged is from Nov 6th, and no other call traces after that; did you reboot after changing to ipvlan?


    It has crashed once since I changed to ipvlan.

    Found it that way the morning of Nov 8th, if I recall correctly, and had to do a hard reboot (reset switch).  As of 10am Nov 9th it's still up and running.

    If it crashes and I have to hard reset again, I'll grab new logs and diag files.  Hopefully it was just that macvlan setting or something else points to a fix.

    Thank you for the help

  12. I went back and filled in the server IP of my unraid server.

    I also adjusted the docker networking setting you suggested and changed it from macvlan to ipvlan

    I don't see anything different in the logs folder on the flash drive.  Attached as logs2.zip.

    logs2.zip

    But I did find a syslog file in the appdata folder; maybe that is being mirrored to the flash drive and I just didn't see it?

    syslog-192.168.0.216.log

  13. Thanks JorgeB, I was able to do that, and it looks like that syslog data is included in the diag.zip I was able to grab from the next crash.

    This morning I found Unraid again not responding when I tried to open the webGUI via its usual IP, and found a similar kernel panic error on screen when I checked it.

    After forcing another reboot I grabbed the diag.zip again.  Nothing is jumping out at me as I look through the logs; hopefully someone sees something I'm not.

     

    phoenix-diagnostics-20231105-0912.zip

  14. I recently rebuilt my array after some drives failed or went missing.  Things were fine for a day or two; then, one morning this week, Unraid wasn't responding, and checking on it I saw what looks to me like a kernel panic.

    <screen shot attached>

    I was forced to power cycle Unraid, and twice now during a parity sync the same thing has happened by the next morning.

    Anyone have any ideas what may be the problem?

    Should I just start by running old school memtest or something?

    image.thumb.png.25ffed494c5afed02f8f4ae8d6b40fbe.png

    phoenix-diagnostics-20231102-1258.zip

  15. I recently replaced some failing drives and followed the process in one of Space Invader One's videos to encrypt the last few drives in my array.

    After the rebuild process completed, I noticed the docker tab was missing.  I might have disabled docker sometime during that week-long process but can't recall for sure.  When I re-enabled docker, none of my containers show up in the GUI.

    I've spent a few days trying to figure out what went wrong.

    Firstly.... I think I may remember late one night seeing a warning / error from Fix Common Problems advising me to change the "primary storage" for the appdata share to Cache.

    image.thumb.png.15ef44e04c34110475d338d1db43e0dd.png

     

    But I'm not 100% sure on that.  Lesson learned... don't change multiple things at once... and don't try to fix extra stuff when half asleep at 2am...

    I also think I may have followed another of Space Invader One's guides quite some time ago to move appdata, or docker stuff, or maybe it was just the VMs, onto an unassigned device, an NVMe SSD I have in the system.  But I couldn't find that in my notes, or which guide it may have been, searching through his videos.

    There's one thing I could think of to try to fix this: I recalled my docker.img running out of space years back and all the containers vanishing.  Searching the forums and documentation, I found this section:

    https://docs.unraid.net/unraid-os/manual/docker-management/

    "Re-Create the Docker image file"

    Maybe that is what I need to do here, but I was hoping to get input on that decision before I do it.  I feel I've already made things bad enough and don't want to make them worse still.

    phoenix-diagnostics-20231025-0901.zip

  16. Earlier this week I had two drives, one parity and one data, show up as disabled: red X, "device disabled".

    I shut down, removed those devices, and installed new drives.  They're still showing up disabled.

    I even moved one of the drives to a free slot in my 24 bay chassis same problem.

    Am I missing something simple?  Or do I really have "part" of a cable gone bad?  (I say "part" because they are SFF-8087 cables between the SAS expander and the chassis backplane, so if one of the cables were bad I'd assume one whole "tray" of drives would be failing.)  Or is the backplane of the chassis already bad?  I've only had it a few months now; can you even get parts for such a case?

    I was able to run an extended SMART self-test on both drives, which passed.

    phoenix-diagnostics-20231022-2137.zip

  17. I just noticed an issue with my unifi poller as well.

    image.thumb.png.96c28cf37609b8795f8e3bc9ca66fe17.png

    Looks to me like it's authenticating with my Unifi Controller successfully, but attempting to parse data (the server version, in this case) is failing.

    If that's correct, I'm guessing this is an issue with unpoller (I checked; I am running the latest version, v2.9.2).

  18. On 9/8/2023 at 10:48 AM, FlamongOle said:

     

    Have you tried going to "Tray Allocations" and checking if the colors have been added as "Custom color"?  They override the common config.

     

    Also, you can try to "Reset All Colors" with the button at the bottom. If I remember correctly, this should only reset custom colors.

    That was exactly my issue thank you!

    I keep that window on a side monitor and never even noticed the custom color option as it was off the edge of the screen.

    That reset fixed it all.
