kmwoley

Posts posted by kmwoley

  1. If it's helpful for folks, here's what it took for me to stop the WebGUI/nginx crashes and enable tailnet access to my macvlan Docker containers.

     

    Background: I'm on unraid version 6.12.4. I had (and have) Wireguard working great. I could reach my local network, all of my Docker containers (those with host networks and those with macvlan), etc. etc.

     

    In order to get the Tailscale plugin to work the same as Wireguard I had to make the following changes:

     

    1) As advised in the release notes, I changed my macvlan network & Docker settings so that they use the eth0.X interfaces instead of the br0.X interfaces. To do this, you'll need to update your network configuration to disable bridging and update your Docker configuration to allow the host to access custom networks. This config will enable macvlan, but move the interface to eth0. I recommend stopping your array and making both changes at the same time; if you do that, it *should* automatically update all of your docker containers with the right config. I did it in two steps, which meant I had to go manually change the network config of all of my Docker containers to get them to restart.

     

    2) As you'd expect, I needed to use the command line to advertise the routes I wanted my remotes to connect to:

    tailscale up --accept-routes --advertise-exit-node --advertise-routes=192.168.10.0/24,192.168.20.0/24,192.168.60.0/24 --accept-dns=false

     

    3) In the Tailscale Plugin settings, I needed to set "Enable IP Forwarding" and "Unraid services listen on Tailscale IP" to Yes.

     

    4) In the Tailscale admin console, approve the subnet routes advertised by the server.
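
    A minimal sketch for step 2, assuming you keep your subnets in a list and want to review the command before running it. The function name `build_tailscale_up` is hypothetical; the flags are the ones from the command above.

```shell
#!/bin/bash
# Join the given subnet CIDRs with commas and print (not run) the
# `tailscale up` command from step 2, so it can be reviewed first.
build_tailscale_up() {
  local routes
  # Setting IFS inside the subshell makes "$*" join arguments with commas.
  routes=$(IFS=,; echo "$*")
  echo "tailscale up --accept-routes --advertise-exit-node --advertise-routes=${routes} --accept-dns=false"
}
```

    Calling `build_tailscale_up 192.168.10.0/24 192.168.20.0/24 192.168.60.0/24` prints the same command I ran in step 2.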

     

    The most important bit was step #1. Without it, I could not reach my Docker containers that were on their own IP addresses. And this change also appears to have solved the WebGUI crashing that others have reported. It's been stable now for 12 hours; I'll report back if I see a change.

     

    HTH

  2. If you find yourself here because your MinIO container stopped being able to start on or around October 31st, it's likely because MinIO removed support for "Filesystem" backends and didn't provide an automatic upgrade/migration path:

    https://min.io/docs/minio/linux/operations/install-deploy-manage/deploy-minio-single-node-single-drive.html

     

    Quote

    Important

    RELEASE.2022-10-29T06-21-33Z fully removes the deprecated Gateway/Filesystem backends. MinIO returns an error if it starts up and detects existing Filesystem backend files.

    To migrate from an FS-backend deployment, use mc mirror or mc cp to copy your data over to a new MinIO SNSD deployment. You should also recreate any necessary users, groups, policies, and bucket configurations on the SNSD deployment.

     

    Looks like the only solution is to run a previous container version, stand up a new version simultaneously and copy your data/settings to a new MinIO deployment. Really annoyed by this. Off I go to recreate and duplicate a few TB of data...
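
    A rough sketch of the copy step, assuming both deployments are already registered as `mc` aliases (via `mc alias set`) and that you know your bucket names. The alias and bucket names are hypothetical, and the function only prints the commands so you can check them before running anything. Users, groups, and policies still have to be recreated separately, as the quote says.

```shell
#!/bin/bash
# Print (not run) the mc commands that copy each bucket from the old
# FS-backend deployment to the new SNSD deployment.
# $1 = alias of the old deployment, $2 = alias of the new one
# (both hypothetical names), remaining arguments = bucket names.
print_minio_migration() {
  local src=$1 dst=$2 bucket
  shift 2
  for bucket in "$@"; do
    echo "mc mb ${dst}/${bucket}"                        # create the bucket on the new side
    echo "mc mirror ${src}/${bucket} ${dst}/${bucket}"   # copy its objects over
  done
}
```

    For example, `print_minio_migration old new backups` prints one `mc mb` and one `mc mirror` line for the `backups` bucket.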

  3. On 6/17/2021 at 8:46 AM, yayitazale said:

    Yes, map it as /dev/bus/usb/004/002:/dev/bus/usb/004/002

     

    Anyway, note that rebooting the system or disconnecting the USB will cause a change in the mapping, so this is for debugging purposes only. The application should work correctly with the assignment of /dev/bus/usb

     

    Thanks for the help on this, but I didn't find a solution that worked. The closest I came was by trying an external, powered USB hub. I actually saw the TPU do some detection in Frigate for a moment... only to stop working shortly thereafter. I think I might have a defective device so I'm going to return it. :/

  4. 4 hours ago, yayitazale said:

    You can try to map the specific USB device to Frigate too, to be sure that there are no compatibility issues...

     

    Are you using the original cable that comes with the coral?

     

    Yes, I'm using the original cable that came with the coral.

     

    How are you mapping the specific USB?

    $ lsusb
    ...
    Bus 004 Device 002: ID 1a6e:089a Global Unichip Corp. 
    ...

     

    In my case, would I map /dev/bus/usb/004/002 ?

     

    I don't know how to find, or cannot find, a specific /dev/tty* device for the coral.
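
    For reference, a small sketch of how the bus/device numbers in that `lsusb` line translate to a `/dev/bus/usb/...` path. The function name `usb_dev_path` is hypothetical, and it assumes the standard `lsusb` line layout shown above.

```shell
#!/bin/bash
# Read `lsusb` output on stdin and print the /dev/bus/usb/BUS/DEV path
# for every device whose vendor:product ID matches $1.
usb_dev_path() {
  awk -v id="$1" '
    $6 == id {
      dev = $4
      sub(/:$/, "", dev)                         # "002:" -> "002"
      printf "/dev/bus/usb/%s/%s\n", $2, dev
    }'
}
```

    Usage would be `lsusb | usb_dev_path 1a6e:089a`. The path can change after a reboot or replug, so it has to be looked up again each time.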

     

     

  5. 11 hours ago, yayitazale said:

     

    Not sure about what is happening. I've been using Frigate with a USB Coral for a year now and I've never gotten that error.

     

    Are you mapping any usb Port to another container or VM?

    Are you using a native USB 3.0 port? Maybe it's related to a lack of power?

     

    My box has 3 USB devices - the USB stick for the OS, a zwave stick, and the Coral. The Coral is on the USB 3 hub. The other two devices are on a USB 2 hub. Interesting theory that it could be lack of power; my motherboard is very, very old and it wouldn't surprise me if there's some quirk there.

     

    My USB Coral is mapped via the default settings:
    /dev/bus/usb -> /dev/bus/usb

     

    My USB zwave stick is mapped in another container to the specific device:

    /dev/ttyACM0 -> /dev/ttyACM0

     

    Any issue with those settings?

  6. Hey folks - I'm trying to get Frigate set up with a Google Coral USB stick. Everything in the setup has gone well, except that the USB device is behaving strangely. It appears to be connecting/disconnecting repeatedly. And it's awfully warm to the touch when it's not in use. It doesn't look like it's actually being used to detect anything, either.

     

    To make sure it's not an issue with the Coral, I've plugged the USB device into a Windows-based system and confirmed that it (1) doesn't get warm at idle and (2) works correctly following the Google getting started instructions.

    When trying to use the device on my Unraid box, I get these repeated errors in the log:

    ...
    Jun 9 00:02:39 lenny kernel: usb 4-2: LPM exit latency is zeroed, disabling LPM.
    Jun 9 00:03:29 lenny kernel: usb 4-2: reset SuperSpeed Gen 1 USB device number 3 using xhci_hcd
    Jun 9 00:03:29 lenny kernel: usb 4-2: LPM exit latency is zeroed, disabling LPM.
    Jun 9 00:04:19 lenny kernel: usb 4-2: reset SuperSpeed Gen 1 USB device number 3 using xhci_hcd
    ...

     

    And in Frigate, I get a corresponding set of repeated errors around the same time:

    ...
    detector.coral INFO : Starting detection process: 126
    frigate.edgetpu INFO : Attempting to load TPU as usb
    frigate.edgetpu INFO : TPU found
    frigate.watchdog INFO : Detection appears to be stuck. Restarting detection process...
    root INFO : Waiting for detection process to exit gracefully...
    root INFO : Detection process didnt exit. Force killing...
    detector.coral INFO : Starting detection process: 136
    frigate.edgetpu INFO : Attempting to load TPU as usb
    frigate.edgetpu INFO : TPU found
    frigate.watchdog INFO : Detection appears to be stuck. Restarting detection process...
    root INFO : Waiting for detection process to exit gracefully...
    frigate.watchdog INFO : Detection appears to be stuck. Restarting detection process...
    root INFO : Waiting for detection process to exit gracefully...
    root INFO : Detection process didnt exit. Force killing...
    detector.coral INFO : Starting detection process: 146
    ...

     

    This feels more like an OS, USB, hardware issue than something specific to Frigate... can anyone help point me in the right direction to diagnose this? Thanks in advance.
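
    To put a number on how often the device resets, here is a minimal sketch that tallies the kernel "reset" lines per USB port. The function name `count_usb_resets` is hypothetical, and it assumes the syslog line layout shown in the first log excerpt above.

```shell
#!/bin/bash
# Read syslog lines on stdin and print "<usb-port> <reset-count>" for
# each port that logged a kernel "reset ... USB device" message.
count_usb_resets() {
  awk '
    /kernel: usb [0-9.-]+: reset / {
      port = $7
      sub(/:$/, "", port)      # "4-2:" -> "4-2"
      n[port]++
    }
    END { for (p in n) print p, n[p] }'
}
```

    Usage would be something like `count_usb_resets < /var/log/syslog`, which for the excerpt above reports two resets on port 4-2.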

  7. On 12/17/2019 at 9:19 AM, limetech said:

     

    ...Hence I want to float this idea to you guys in this topic: suppose we rip out BT support from the Unraid kernel and everyone use the @Chuck Claunch method of using BT in a container?  Sure there will be some who don't want to change anything because current config works for them, however moving forward this might be the best overall solution?

    I dislike this solution for a few reasons. The primary one is that passing the entire BT bus into a Docker container only works if the container is running as privileged and on the host network, IIRC. In my configuration, I run my containers in VLANs for network isolation, so they can't access the Bluetooth devices. If BT support were removed from the host OS, I'd have to create and maintain a Docker container in a higher-permission config just to run a simple Bluetooth script. That feels like overkill to me.

     

    It's nice that there are multiple ways to do it so that if someone runs into a driver issue they need to work around, they can use Docker. But I don't think the host OS support necessitates the frequent/trivial OS updates that seem to be the desired reason to remove the support.

     

    TL;DR - I appreciate and use the native host OS support for bluetooth and would like to keep it.

  8. Thanks for your help. I kinda get it, but when looking at the disk usage before making these changes I would have also said that I had 31GB free. Am I interpreting that first `btrfs fi usage...` command wrong?

     

    How am I to know in the future when a balance operation is required before I "run out of space"?

     

    I ask because I'd like to write a cron job that warns me before I run out of space.

     

    Thanks again for the help.

  9. 23 hours ago, johnnie.black said:

    The btrfs errors are from the docker image, caused by the cache pool filesystem being fully allocated, i.e., it's giving not enough space errors.

     

    Thanks for the pointer. After reading that thread, here's what I did... I'd appreciate it if you could check my understanding and help me understand how to avoid this in the future.

     

    1) Checked the space reported by btrfs...

    root@lenny:~# btrfs fi usage /mnt/cache
    Overall:
        Device size:                 352.13GiB
        Device allocated:            238.48GiB
        Device unallocated:          113.64GiB
        Device missing:                  0.00B
        Used:                        182.85GiB
        Free (estimated):             84.24GiB      (min: 84.24GiB)
        Data ratio:                       2.00
        Metadata ratio:                   2.00
        Global reserve:              376.52MiB      (used: 0.00B)
    
    Data,RAID1: Size:118.21GiB, Used:90.79GiB
       /dev/sdb1     118.21GiB
       /dev/sdc1     118.21GiB
    
    Metadata,RAID1: Size:1.00GiB, Used:645.45MiB
       /dev/sdb1       1.00GiB
       /dev/sdc1       1.00GiB
    
    System,RAID1: Size:32.00MiB, Used:48.00KiB
       /dev/sdb1      32.00MiB
       /dev/sdc1      32.00MiB
    
    Unallocated:
       /dev/sdb1     113.64GiB
       /dev/sdc1       1.05MiB

    The way I read that, my RAID1 cache drives have 27.42GiB free (118.21GiB of data chunks minus 90.79GiB used). Except that only 1.05MiB is unallocated on /dev/sdc1, which I'm assuming is the problem...
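
    The arithmetic behind that number, as a small sketch: it subtracts "Used" from "Size" on the `Data,` line. The function name `data_slack_gib` is hypothetical, and it assumes both values are printed in GiB, as they are in the output above.

```shell
#!/bin/bash
# Read `btrfs fi usage` output on stdin and print Size minus Used for
# the Data profile line, assuming both values are reported in GiB.
data_slack_gib() {
  awk '
    /^[ \t]*Data,/ {
      split($2, size, ":")     # "Size:118.21GiB," -> size[2] = "118.21GiB,"
      split($3, used, ":")     # "Used:90.79GiB"   -> used[2] = "90.79GiB"
      # awk uses the leading numeric part of each string in arithmetic
      printf "%.2f\n", size[2] - used[2]
    }'
}
```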

     

    2) I deleted some files off the drive(s) to make some space and ran the balance command as recommended:

    btrfs balance start -dusage=75 /mnt/cache

    That command completed with no errors.

     

    Afterwards, here's the free space that's reported...

    root@lenny:/mnt/cache# btrfs fi usage /mnt/cache
    Overall:
        Device size:                 352.13GiB
        Device allocated:            188.16GiB
        Device unallocated:          163.97GiB
        Device missing:                  0.00B
        Used:                        179.96GiB
        Free (estimated):             84.62GiB      (min: 84.62GiB)
        Data ratio:                       2.00
        Metadata ratio:                   2.00
        Global reserve:              330.16MiB      (used: 0.00B)
    
    Data,RAID1: Size:92.05GiB, Used:89.41GiB
       /dev/sdb1      92.05GiB
       /dev/sdc1      92.05GiB
    
    Metadata,RAID1: Size:2.00GiB, Used:585.58MiB
       /dev/sdb1       2.00GiB
       /dev/sdc1       2.00GiB
    
    System,RAID1: Size:32.00MiB, Used:32.00KiB
       /dev/sdb1      32.00MiB
       /dev/sdc1      32.00MiB
    
    Unallocated:
       /dev/sdb1     138.80GiB
       /dev/sdc1      25.16GiB

     

    So, my question is... how much free space on the cache do I actually have now? How do I report on it and monitor it so I can set up some alerts when it's getting close to being a problem in the future?
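
    One possible shape for such an alert, as a sketch only: it watches the smallest per-device "Unallocated" value, since that was the number that had dropped to 1.05MiB before the balance. The function names and the threshold are hypothetical, and the parsing assumes the human-readable `btrfs fi usage` layout shown above.

```shell
#!/bin/bash
# Read `btrfs fi usage` output on stdin and print the smallest
# per-device value from the "Unallocated:" section, in whole MiB.
min_unallocated_mib() {
  awk '
    function to_mib(s,    n, u) {
      n = s + 0                       # leading numeric part
      u = s; gsub(/[0-9.]/, "", u)    # unit suffix
      if (u == "B")   return n / 1048576
      if (u == "KiB") return n / 1024
      if (u == "MiB") return n
      if (u == "GiB") return n * 1024
      if (u == "TiB") return n * 1048576
      return 0                        # unknown unit: treat as empty
    }
    /^[ \t]*Unallocated:/ { in_sec = 1; next }
    in_sec && NF == 0     { in_sec = 0 }
    in_sec && NF == 2 {
      v = to_mib($2)
      if (!seen || v < min) { min = v; seen = 1 }
    }
    END { if (seen) printf "%d\n", min }'
}

# Print a warning and return 1 when the smallest unallocated value is
# below the threshold (in MiB) given as $1.
check_unallocated() {
  local threshold_mib=$1 min
  min=$(min_unallocated_mib)
  if [ -n "$min" ] && [ "$min" -lt "$threshold_mib" ]; then
    echo "WARNING: a pool device has only ${min}MiB unallocated"
    return 1
  fi
}
```

    A cron job could then run something like `btrfs fi usage /mnt/cache | check_unallocated 10240` and send a notification on a non-zero exit. The 10GiB threshold is just a placeholder.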

     

    Given the catastrophic nature of running out of cache space (i.e. my entire infrastructure ground to a halt), I can't rely on this system if this is going to happen without warning in the future.

     

    Any guidance would be helpful.

    Thanks!

  10. Hey Folks,

    I could really use some help. I've run into trouble that I noticed after having upgraded to 6.7 from 6.6.7, but it might not be related to the upgrade.

     

    I first noticed that Docker was filling up /var/log and my Docker containers were stopping on me. I turned off a number of my Docker containers and rebooted to see if I could isolate which Docker was log-spamming. That helped, but eventually my docker containers stopped again after a few hours, complaining about running out of disk space (on /mnt/user/appdata) where there's clearly enough disk space.

     

    In that process, I was exploring the system logs and found the errors below.

     

    I downgraded back to 6.6.7 from 6.7 to see if it'd fix it and it hasn't. The ACPI BIOS errors are present in 6.6.7, as are the BTRFS write errors... so it's likely a coincidence that these errors showed up shortly after the 6.7 upgrade.

     

    I'm now seeing errors from Fix Common Problems that "**** Unable to write to cache ****   **** Unable to write to Docker Image ****"... which is no surprise if the cache disks are having issues.

     

    Any clue where to start on this issue?

    May 14 07:51:55 lenny kernel: ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT0.SPT3._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
    May 14 07:51:55 lenny kernel: ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT0.SPT2._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
    May 14 07:51:55 lenny kernel: ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT0.SPT1._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
    May 14 07:51:55 lenny kernel: ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT0.SPT5._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
    May 14 07:51:55 lenny kernel: ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT0.SPT0._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
    May 14 07:51:55 lenny kernel: ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT0.SPT1._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
    May 14 07:51:55 lenny kernel: ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT0.SPT2._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
    May 14 07:51:55 lenny kernel: ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT0.SPT0._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
    May 14 07:51:55 lenny kernel: ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT0.SPT3._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
    May 14 07:51:55 lenny kernel: ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT0.SPT5._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
    May 14 10:18:18 lenny kernel: loop: Write error at byte offset 852369408, length 4096.
    May 14 10:18:18 lenny kernel: print_req_error: I/O error, dev loop2, sector 1664768
    May 14 10:18:18 lenny kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 20, rd 0, flush 0, corrupt 0, gen 0
    May 14 11:07:15 lenny kernel: loop: Write error at byte offset 876773376, length 4096.
    May 14 11:07:15 lenny kernel: print_req_error: I/O error, dev loop2, sector 1712448
    May 14 11:07:15 lenny kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 21, rd 0, flush 0, corrupt 0, gen 0
    May 14 19:19:21 lenny kernel: loop: Write error at byte offset 1133211648, length 4096.
    May 14 19:19:21 lenny kernel: print_req_error: I/O error, dev loop2, sector 2213280
    May 14 19:19:21 lenny kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 22, rd 0, flush 0, corrupt 0, gen 0
    May 14 20:06:46 lenny kernel: loop: Write error at byte offset 12656640, length 4096.
    May 14 20:06:46 lenny kernel: print_req_error: I/O error, dev loop2, sector 24720
    May 14 20:06:46 lenny kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 23, rd 0, flush 0, corrupt 0, gen 0
    May 14 20:17:13 lenny kernel: dhcpcd[1674]: segfault at 88 ip 00000000004216d2 sp 00007ffd89a6dd70 error 4 in dhcpcd[407000+31000]
    May 14 20:17:50 lenny kernel: loop: Write error at byte offset 146325504, length 4096.
    May 14 20:17:50 lenny kernel: print_req_error: I/O error, dev loop2, sector 285792
    May 14 20:17:50 lenny kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 24, rd 0, flush 0, corrupt 0, gen 0
    May 14 20:20:20 lenny kernel: loop: Write error at byte offset 159694848, length 4096.
    May 14 20:20:20 lenny kernel: print_req_error: I/O error, dev loop2, sector 311904
    May 14 20:20:20 lenny kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 25, rd 0, flush 0, corrupt 0, gen 0
    May 14 20:22:21 lenny kernel: loop: Write error at byte offset 16392192, length 4096.
    May 14 20:22:21 lenny kernel: print_req_error: I/O error, dev loop2, sector 32016
    May 14 20:22:21 lenny kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 26, rd 0, flush 0, corrupt 0, gen 0
    May 14 20:25:00 lenny kernel: loop: Write error at byte offset 167190528, length 4096.
    May 14 20:25:00 lenny kernel: print_req_error: I/O error, dev loop2, sector 326528
    May 14 20:25:00 lenny kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 27, rd 0, flush 0, corrupt 0, gen 0
    May 14 20:26:51 lenny kernel: loop: Write error at byte offset 2207227904, length 4096.
    May 14 20:26:51 lenny kernel: print_req_error: I/O error, dev loop2, sector 4310784
    May 14 20:26:51 lenny kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 28, rd 0, flush 0, corrupt 0, gen 0
    May 14 20:26:51 lenny kernel: loop: Write error at byte offset 65536, length 4096.
    May 14 20:26:51 lenny kernel: print_req_error: I/O error, dev loop2, sector 128
    May 14 20:26:51 lenny kernel: print_req_error: I/O error, dev loop2, sector 128
    May 14 20:26:51 lenny kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 29, rd 0, flush 0, corrupt 0, gen 0
    May 14 20:26:51 lenny kernel: BTRFS error (device loop2): error writing primary super block to device 1
    May 14 20:26:51 lenny kernel: BTRFS: error (device loop2) in write_all_supers:3781: errno=-5 IO failure (1 errors while writing supers)
    May 14 20:26:51 lenny kernel: BTRFS: error (device loop2) in cleanup_transaction:1846: errno=-5 IO failure
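
    To summarize a log like this, here is a minimal sketch that pulls the most recent write-error counter per device out of the "errs: wr N" lines. The function name `btrfs_wr_errors` is hypothetical, and it assumes the syslog line layout shown above.

```shell
#!/bin/bash
# Read syslog lines on stdin and print "<device> <write-error-count>",
# using the last "bdev ... errs: wr N" line seen for each device.
btrfs_wr_errors() {
  awk '
    /BTRFS error .* errs: wr / {
      # Fields: ... bdev /dev/loop2 errs: wr 20, rd 0, ...
      wr[$11] = $14 + 0      # "20," -> 20
    }
    END { for (d in wr) print d, wr[d] }'
}
```

    Run against the excerpt above (for example `btrfs_wr_errors < /var/log/syslog`), it reports 29 write errors on /dev/loop2.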

     

     

  11. I cannot tell you why or how, but after downgrading from 6.6.7 -> 6.6.6 four days ago, and then upgrading again today from 6.6.6 -> 6.6.7 the issue I reported before (i.e. Custom Network Types not appearing in Docker Container configuration) appears to have gone away. 🤷‍♂️

  12. Hey folks - I ran into a problem with this upgrade.

     

    I have custom networks configured for Docker (MACVLAN).

     

    After the update, all of my containers started just fine. But one of them started to have some networking issues and I started investigating. I tried changing a setting in that container and when I saved the container, it disappeared!

     

    Looking deeper, I notice now that under the networking options for the container config any network which was using a MACVLAN network (e.g. br0.55) now shows Network Type of "None" and the custom networks for MACVLAN aren't an option in the container config.

     

    If I apply any changes to a container now, it functionally disappears.

     

    I'm going to try rolling back to 6.6.6 to see if it solves the issue... but thought you should know. Anyone else seeing this?

  13. You're right that masquerading is typically done for traffic exiting your local network, but it can also be used for things like the OpenVPN config.

     

    In the case of OpenVPN, it's handing out IP addresses to the clients in a different range than the local network. When I ran OpenVPN with a bridged Docker network, those remote client IP addresses were being masqueraded somewhere along the path before they left the host, so they showed up as my host server's IP on the local network.

     

    Switching to host networking exposed those client IPs to the local network. At some point I'd like to understand why that was... but for now, I can configure my local network to work around the problem. At this point it's mostly academic, but I'm curious.

  14. I honestly don't have a clue, then, how OpenVPN-AS (or any other VPN) running in Docker host networking mode is masquerading its client traffic. It just doesn't make sense to me that there's no way to masquerade the OpenVPN client IP addresses on their way out the door.

     

    In the end, I gave up trying to get my host to masquerade the traffic and instead added static routes for my OpenVPN client address space to my networking stack outside of the unraid host. It's a fine solution, just annoying.
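
    For anyone wanting to do the same, a sketch of what such a static route looks like, assuming a Linux-based router that uses iproute2. The function name and both example addresses are hypothetical; substitute your own OpenVPN client subnet and Unraid host IP. It only prints the command.

```shell
#!/bin/bash
# Print (not run) the route that sends replies for the OpenVPN client
# subnet ($1) back through the Unraid host's LAN address ($2).
print_return_route() {
  local client_subnet=$1 unraid_ip=$2
  echo "ip route add ${client_subnet} via ${unraid_ip}"
}
```

    For example, `print_return_route 10.8.0.0/24 192.168.10.5` prints `ip route add 10.8.0.0/24 via 192.168.10.5`.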

     

    If someone trips across this post in the future and finds a solution, I'd be interested in knowing what you figure out.

     

    Thanks.

     

     

  15. @bonienl thanks - that’s incredibly useful to know. Is that an unraid specific feature/limitation, or a general networking characteristic?

     

    Do you know how OpenVPN-AS (or other VPNs which run in host mode) and their hosts are configured to properly route the traffic?

     

    I could probably put a NAT entry on my router, but I don’t like the idea of solving this issue “off box” if there’s a cleaner way to do it.