BurningSky

Posts posted by BurningSky

  1. I've had Unraid hang twice in the last 24 hours, each time for around 20-40 minutes before recovering. The syslog looks the same both times: messages about disk temps being around 30C, followed by nginx errors such as "nginx: 2024/03/20 14:00:29 [error] 831#831: *31271114 limiting requests, excess: 21.000 by zone "authlimit", client: 192.168.3.30, server: , request: "GET /login HTTP/2.0"", and then php-fpm warnings such as "php-fpm[7716]: [WARNING] [pool www] child 10760 exited on signal 9 (SIGKILL) after 936.425570 seconds from start".

     

    Any insights would be appreciated!

    ragon-diagnostics-20240320-1616.zip
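
A minimal Python sketch for summarising the nginx rate-limit messages quoted above, so it is easier to see which client is hitting the "authlimit" zone and how often. The regular expression is written against the single line quoted in the post; the function name `count_rate_limited` is only illustrative, and other nginx versions may format the message differently.

```python
import re
from collections import Counter

# Written against the one line quoted in the post; treat the pattern as an assumption.
LIMIT_RE = re.compile(
    r'limiting requests, excess: ([\d.]+) by zone "([^"]+)", client: ([\d.]+)'
)

def count_rate_limited(lines):
    """Count rate-limit messages per (zone, client IP)."""
    counts = Counter()
    for line in lines:
        match = LIMIT_RE.search(line)
        if match:
            counts[(match.group(2), match.group(3))] += 1
    return counts

sample = [
    'nginx: 2024/03/20 14:00:29 [error] 831#831: *31271114 limiting requests, '
    'excess: 21.000 by zone "authlimit", client: 192.168.3.30, server: , '
    'request: "GET /login HTTP/2.0"',
]
rate_limited = count_rate_limited(sample)
for (zone, client), n in rate_limited.most_common():
    print(f"zone={zone} client={client} hits={n}")
```

Feeding it the full syslog (for example `open("/var/log/syslog")`, path assumed) would show whether a single client accounts for most of the `/login` requests.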

  2. 55 minutes ago, JorgeB said:

    The problem was the log tree, so that was the correct fix, if that was the only issue the pool should be fine now, if the same issue reoccurs in the near future, and that is not that uncommon, then I would recommend reformatting the pool.

    Thanks for the feedback, I'll keep an eye on it. Doesn't point to a disk failure though?

  3. I noticed a container had stopped with a message about the log being read-only (RO), so after looking at a couple of similar-looking issues I decided to delete the docker.img to see if that would help. After that the Docker service wouldn't restart, and the logs suggested a cache issue, so I rebooted to see if that would resolve it.

     

    That led to the cache showing an "Unmountable: unsupported or no file system" error. Based on another forum post I ran "btrfs rescue zero-log /dev/sdi1" and restarted the array, and the pool appears to be back now, but I'm worried there may be a deeper issue. I've attached 2 sets of logs: 1411 was taken before I ran the rescue command and 1414 after. Does it look like sdi1 is going to fail, or was there another potential cause?

    ragon-diagnostics-20240208-1414.zip ragon-diagnostics-20240208-1411.zip

  4. I just got a message that my /var/log is getting full and when I look there are hundreds of repeats of

     

    Jan 21 10:03:35 Ragon kernel: device eth0 left promiscuous mode
    Jan 21 10:06:51 Ragon kernel: device eth0 entered promiscuous mode

     

    I've seen some people mention similar issues caused by multiple Docker containers on the same port, but I can't see anything like that here: most of my containers that use a repeated port like 8080 are on br0 with their own IPs.

     

    I have a Realtek NIC but I've installed the driver from CA for that.

    ragon-diagnostics-20240121-1004.zip
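
A small Python sketch, assuming the syslog lines look exactly like the two quoted above, for counting how often each interface enters or leaves promiscuous mode. The function name `count_promisc_toggles` is made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as:
#   Jan 21 10:03:35 Ragon kernel: device eth0 left promiscuous mode
PROMISC_RE = re.compile(r"kernel: device (\S+) (entered|left) promiscuous mode")

def count_promisc_toggles(lines):
    """Return a Counter keyed by (interface, action)."""
    counts = Counter()
    for line in lines:
        match = PROMISC_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts

sample = [
    "Jan 21 10:03:35 Ragon kernel: device eth0 left promiscuous mode",
    "Jan 21 10:06:51 Ragon kernel: device eth0 entered promiscuous mode",
]
toggles = count_promisc_toggles(sample)
for (iface, action), n in sorted(toggles.items()):
    print(f"{iface} {action}: {n}")
```

Comparing the timestamps of the matched lines against container start and stop times might help narrow down what is toggling the interface.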

  5. I have an mPCIe Coral TPU connected to a PCIe to mPCIe adaptor, which is then passed through to the Frigate Docker container. That seems to be working, but today I got a notification that my syslog is filling up. I've had a look and see the error below relating to the Coral device, but I don't know what could be causing it. The Docker container is running in privileged mode and the device is passed through via /dev/apex_0.

     

    Nov  5 08:46:19 Ragon kernel: pcieport 0000:00:01.3: AER: Multiple Corrected error received: 0000:25:00.0
    Nov  5 08:46:19 Ragon kernel: apex 0000:25:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
    Nov  5 08:46:19 Ragon kernel: apex 0000:25:00.0:   device [1ac1:089a] error status/mask=00000041/00006000
    Nov  5 08:46:19 Ragon kernel: apex 0000:25:00.0:    [ 0] RxErr                 
    Nov  5 08:46:19 Ragon kernel: apex 0000:25:00.0:    [ 6] BadTLP    

     

    ragon-diagnostics-20231105-0852.zip
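
To get a feel for how often each corrected error type appears, the AER detail lines can be tallied. This is a rough Python sketch whose regular expression is an assumption based only on the lines quoted above; `count_aer_errors` is an illustrative name.

```python
import re
from collections import Counter

# Matches the per-error detail lines, e.g.
#   ... kernel: apex 0000:25:00.0:    [ 0] RxErr
AER_DETAIL_RE = re.compile(
    r"kernel: \S+ ([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]):\s+\[\s*(\d+)\]\s+(\w+)"
)

def count_aer_errors(lines):
    """Count AER error names per PCI address, keyed by (address, error name)."""
    counts = Counter()
    for line in lines:
        match = AER_DETAIL_RE.search(line)
        if match:
            counts[(match.group(1), match.group(3))] += 1
    return counts

sample = [
    "Nov  5 08:46:19 Ragon kernel: pcieport 0000:00:01.3: AER: Multiple Corrected error received: 0000:25:00.0",
    "Nov  5 08:46:19 Ragon kernel: apex 0000:25:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)",
    "Nov  5 08:46:19 Ragon kernel: apex 0000:25:00.0:   device [1ac1:089a] error status/mask=00000041/00006000",
    "Nov  5 08:46:19 Ragon kernel: apex 0000:25:00.0:    [ 0] RxErr                 ",
    "Nov  5 08:46:19 Ragon kernel: apex 0000:25:00.0:    [ 6] BadTLP    ",
]
aer_counts = count_aer_errors(sample)
for (address, name), n in sorted(aer_counts.items()):
    print(f"{address} {name}: {n}")
```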

  6.   

    I sent back the Coral USB I was having issues with and have swapped it with a mini PCIE one which I have connected to a PCIE to mPCIE adaptor but I am still having some issues. I have installed the drivers and it is being recognised by the Coral Driver app:

    Coral TPU1:
    Status:	ALIVE
    Temperature:	39.30 °C
    Frequency:	500 MHz
    Driver Version:	1.2
    Framework Version:	1.1.4 

     

    I can also see the device under sysdevs:

    	[1ac1:089a] 25:00.0 System peripheral: Global Unichip Corp. Coral Edge TPU

     

    I had previously deleted the section for mapping the TPU but I have readded it with the following:

    Config Type: Device
    Name: Coral TPU/NCS2 Mapping
    Value: /dev/apex_0
    Description: Use /dev/bus/usb for USB devices and /dev/apex_0 for PCIe devices (you must install the drivers first for PCIe devices). Remove this if you are not using it.

     

    In Frigate I added the following to detectors:

    detectors:
      coral:
        type: edgetpu
        device: pci

     

    But I get an error saying that no EdgeTPU was detected, have I misconfigured something?

    2023-11-03 12:49:18.767219029  [INFO] Preparing go2rtc config...
    2023-11-03 12:49:18.767580529  [INFO] Starting Frigate...
    2023-11-03 12:49:18.768623682  [INFO] Starting NGINX...
    2023-11-03 12:49:18.957519850  [WARN] Using go2rtc binary from '/config/go2rtc' instead of the embedded one
    2023-11-03 12:49:18.960160839  [INFO] Starting go2rtc...
    2023-11-03 12:49:19.063972373  12:49:19.063 INF go2rtc version 1.8.1 linux/amd64
    2023-11-03 12:49:19.064282260  12:49:19.064 INF [api] listen addr=0.0.0.0:1984
    2023-11-03 12:49:19.064866554  12:49:19.064 INF [rtsp] listen addr=0.0.0.0:8554
    2023-11-03 12:49:19.064871792  12:49:19.064 INF [webrtc] listen addr=0.0.0.0:8555/tcp
    2023-11-03 12:49:19.771751395  [2023-11-03 12:49:19] frigate.app                    INFO    : Starting Frigate (0.12.1-367d724)
    2023-11-03 12:49:19.811722914  [2023-11-03 12:49:19] peewee_migrate                 INFO    : Starting migrations
    2023-11-03 12:49:19.815743064  [2023-11-03 12:49:19] peewee_migrate                 INFO    : There is nothing to migrate
    2023-11-03 12:49:19.840242161  [2023-11-03 12:49:19] detector.coral                 INFO    : Starting detection process: 577
    2023-11-03 12:49:19.956163315  [2023-11-03 12:49:19] detector.cuda                  INFO    : Starting detection process: 580
    2023-11-03 12:49:19.956171347  [2023-11-03 12:49:19] frigate.app                    INFO    : Output process started: 585
    2023-11-03 12:49:19.956179867  [2023-11-03 12:49:19] frigate.detectors.plugins.edgetpu_tfl INFO    : Attempting to load TPU as pci
    2023-11-03 12:49:19.956187410  Process detector:coral:
    2023-11-03 12:49:19.956194534  [2023-11-03 12:49:19] frigate.detectors.plugins.edgetpu_tfl INFO    : TPU found
    2023-11-03 12:49:19.956218629  [2023-11-03 12:49:19] frigate.detectors.plugins.edgetpu_tfl ERROR   : No EdgeTPU was detected. If you do not have a Coral device yet, you must configure CPU detectors.

     

    Is there any way to check if the docker is actually seeing the device?
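
One possible way to check, sketched in Python and meant to be run inside the container (for example via `docker exec`, assuming a Python interpreter is available there): confirm that the device node exists, is a character device, and is readable and writable by the current user. The helper name `describe_device_node` is hypothetical, and passing this check does not by itself prove the Edge TPU runtime can use the device.

```python
import os
import stat

def describe_device_node(path):
    """Report whether `path` exists, is a character device, and is readable/writable."""
    info = {"path": path, "exists": os.path.exists(path)}
    if info["exists"]:
        mode = os.stat(path).st_mode
        info["is_char_device"] = stat.S_ISCHR(mode)
        info["readable"] = os.access(path, os.R_OK)
        info["writable"] = os.access(path, os.W_OK)
    return info

# /dev/apex_0 is the device path from the container template quoted in the post.
print(describe_device_node("/dev/apex_0"))
```

If `exists` comes back `False` inside the container but the node is present on the host, the device mapping is the first thing to look at.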

  7. 35 minutes ago, JorgeB said:

    Parity looks more like a power/connection issue, replace cables, disk4 may be failing, run an extended SMART test.

    Thanks for that, I'll check the cables for the parity disk. What did you notice in the logs that led you to believe that?

  8. Hoping someone might be able to help me here. I've had a few issues recently with Unraid, but a chunk of them seemed to come down to a bad SSD in my cache pool, which I have now taken out of the pool (but it is still in the server for now). I had noticed that my parity disk was showing increasing numbers of errors, but the count would fluctuate up and down and seemed to settle after the changes I made recently (replaced the SSD with SMART issues and swapped out my SAS controller).

     

    However, yesterday there were ~3000 errors on parity and now there are over 9000! I've taken a quick look in the syslog and I've started to see these errors repeating:

     

    Oct 16 05:26:35 Ragon kernel: I/O error, dev sdd, sector 3019156232 op 0x0:(READ) flags 0x0 phys_seg 29 prio class 2
    Oct 16 05:26:35 Ragon kernel: md: disk0 read error, sector=3019156168
    Oct 16 07:37:41 Ragon kernel: I/O error, dev sdg, sector 1016 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
    Oct 16 07:37:41 Ragon kernel: md: disk4 read error, sector=952

     

    sdd, which is the parity disk, is in there a lot more, so it looks like it might be failing, which is annoying as it's only a year or so old.

     

    Wondering if anyone could take a look and suggest if there are any other issues other than 2 failing disks, there aren't any SMART errors on either disk.
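
To back up the impression that sdd shows up far more often, the kernel I/O error lines can be counted per device. A minimal Python sketch, assuming the syslog format matches the lines quoted above; `count_io_errors` is an illustrative name.

```python
import re
from collections import Counter

# Matches lines such as:
#   ... kernel: I/O error, dev sdd, sector 3019156232 op 0x0:(READ) flags 0x0 ...
IO_ERROR_RE = re.compile(r"I/O error, dev (\w+), sector (\d+) op \S+:\((\w+)\)")

def count_io_errors(lines):
    """Count kernel I/O errors keyed by (device, operation)."""
    counts = Counter()
    for line in lines:
        match = IO_ERROR_RE.search(line)
        if match:
            counts[(match.group(1), match.group(3))] += 1
    return counts

sample = [
    "Oct 16 05:26:35 Ragon kernel: I/O error, dev sdd, sector 3019156232 op 0x0:(READ) flags 0x0 phys_seg 29 prio class 2",
    "Oct 16 05:26:35 Ragon kernel: md: disk0 read error, sector=3019156168",
    "Oct 16 07:37:41 Ragon kernel: I/O error, dev sdg, sector 1016 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2",
    "Oct 16 07:37:41 Ragon kernel: md: disk4 read error, sector=952",
]
io_errors = count_io_errors(sample)
for (device, op), n in io_errors.most_common():
    print(f"{device} {op}: {n}")
```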

     

  9. I ordered a Coral TPU to use with Frigate and at first it seemed to work but now it isn't. When I do lsusb on the host the device is showing as a Global Unichip Corp device rather than Google, and then when passed through to a container it just comes up with no name at all.

     

    I had installed ich777's Coral drivers but was told to remove that as maybe that causes issues so I'm wondering if maybe they haven't uninstalled.

     

    Has anyone come across this issue before and have any idea how to resolve?

  10. 51 minutes ago, yayitazale said:

    Maybe you need to reboot the host...

    I don't think I've misconfigured anything...

     

    docker run
      -d
      --name='frigate'
      --net='br0'
      --ip='192.168.0.48'
      --privileged=true
      -e TZ="Europe/London"
      -e HOST_OS="Unraid"
      -e HOST_HOSTNAME="Unraid"
      -e HOST_CONTAINERNAME="frigate"
      -e 'TCP_PORT_5000'='5000'
      -e 'TCP_PORT_8554'='8554'
      -e 'FRIGATE_RTSP_PASSWORD'='xxxxx'
      -e 'NVIDIA_VISIBLE_DEVICES'='GPU-6b2....5a0'
      -e 'NVIDIA_DRIVER_CAPABILITIES'='compute,utility,video'
      -e 'TCP_PORT_8555'='8555'
      -e 'UDP_PORT_8555'='8555'
      -e 'TCP_PORT_1984'='1984'
      -l net.unraid.docker.managed=dockerman
      -l net.unraid.docker.webui='http://[IP]:[PORT:5000]'
      -l net.unraid.docker.icon='https://raw.githubusercontent.com/yayitazale/unraid-templates/main/frigate.png'
      -v '/mnt/cache/appdata/frigate':'/config':'rw'
      -v '/mnt/user/Media/frigate':'/media/frigate':'rw'
      -v '/mnt/user/appdata/trt-models':'/trt-models':'ro'
      -v '/mnt/user/appdata/trt-models':'/trt-models':'ro'
      -v '/etc/localtime':'/etc/localtime':'rw'
      --device='/dev/bus/usb'
      --shm-size=256mb
      --mount type=tmpfs,target=/tmp/cache,tmpfs-size=100000000
      --restart unless-stopped
      --runtime=nvidia 'ghcr.io/blakeblackshear/frigate:stable-tensorrt'

     

  11. 13 hours ago, yayitazale said:

    Are you using the original cable and a 3.0 USB port?

    Just had a look at lsusb on the host and in the container and noticed it's started misbehaving...

     

    Unraid:

    root@Ragon:~# lsusb
    Bus 006 Device 002: ID 1a6e:089a Global Unichip Corp.
    Bus 006 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
    Bus 005 Device 002: ID 0781:5567 SanDisk Corp. Cruzer Blade
    Bus 005 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
    Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
    Bus 003 Device 002: ID 051d:0002 American Power Conversion Uninterruptible Power Supply
    Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
    Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
    Bus 001 Device 002: ID 1cf1:0030 Dresden Elektronik ZigBee gateway [ConBee II]
    Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

     

    Frigate:

    # lsusb
    Bus 006 Device 002: ID 1a6e:089a  
    Bus 006 Device 001: ID 1d6b:0003 Linux 6.1.49-Unraid xhci-hcd xHCI Host Controller
    Bus 005 Device 002: ID 0781:5567 SanDisk Cruzer Blade
    Bus 005 Device 001: ID 1d6b:0002 Linux 6.1.49-Unraid xhci-hcd xHCI Host Controller
    Bus 004 Device 001: ID 1d6b:0003 Linux 6.1.49-Unraid xhci-hcd xHCI Host Controller
    Bus 003 Device 002: ID 051d:0002 American Power Conversion Back-UPS RS 900G FW:879.L4 .I USB FW:L4  
    Bus 003 Device 001: ID 1d6b:0002 Linux 6.1.49-Unraid xhci-hcd xHCI Host Controller
    Bus 002 Device 001: ID 1d6b:0003 Linux 6.1.49-Unraid xhci-hcd xHCI Host Controller
    Bus 001 Device 002: ID 1cf1:0030 dresden elektronik ingenieurtechnik GmbH ConBee II
    Bus 001 Device 001: ID 1d6b:0002 Linux 6.1.49-Unraid xhci-hcd xHCI Host Controller
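
The difference between the two listings can be picked out programmatically. This Python sketch parses `lsusb` output and reports the IDs of devices that have no description string; the function names `parse_lsusb` and `unnamed_devices` are invented for illustration, and the sample text is trimmed from the output above.

```python
import re

# Matches lines such as: Bus 006 Device 002: ID 1a6e:089a Global Unichip Corp.
LSUSB_RE = re.compile(r"Bus (\d+) Device (\d+): ID ([0-9a-f]{4}:[0-9a-f]{4})\s*(.*)")

def parse_lsusb(text):
    """Map (bus, device) to (usb_id, description) for each lsusb line."""
    devices = {}
    for line in text.splitlines():
        match = LSUSB_RE.search(line)
        if match:
            bus, dev, usb_id, name = match.groups()
            devices[(bus, dev)] = (usb_id, name.strip())
    return devices

def unnamed_devices(devices):
    """Return the USB IDs whose description is empty."""
    return [usb_id for usb_id, name in devices.values() if not name]

host_output = """\
Bus 006 Device 002: ID 1a6e:089a Global Unichip Corp.
Bus 005 Device 002: ID 0781:5567 SanDisk Corp. Cruzer Blade
"""
container_output = """\
# lsusb
Bus 006 Device 002: ID 1a6e:089a  
Bus 005 Device 002: ID 0781:5567 SanDisk Cruzer Blade
"""
host = parse_lsusb(host_output)
container = parse_lsusb(container_output)
print("unnamed on host:", unnamed_devices(host))
print("unnamed in container:", unnamed_devices(container))
```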

     

  12. On 9/27/2023 at 9:31 PM, yayitazale said:

    Can you test it with another computer with any of the examples of https://coral.ai/examples/#code-examples?

     

    Can you see it listed on the devices on unraid?

    Looks like the module works, so I assume the problem is the USB passthrough? Is there another passthrough method I should try?

     

    python3 examples/classify_image.py \
    --model test_data/mobilenet_v2_1.0_224_inat_bird_quant_edgetpu.tflite \
    --labels test_data/inat_bird_labels.txt \
    --input test_data/parrot.jpg
    /Users/burningsky/Downloads/edgetpu_runtime/coral/pycoral/examples/classify_image.py:79: DeprecationWarning: ANTIALIAS is deprecated and will be removed in Pillow 10 (2023-07-01). Use LANCZOS or Resampling.LANCZOS instead.
      image = Image.open(args.input).convert('RGB').resize(size, Image.ANTIALIAS)
    ----INFERENCE TIME----
    Note: The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory.
    13.2ms
    2.9ms
    2.9ms
    2.8ms
    2.9ms
    -------RESULTS--------
    Ara macao (Scarlet Macaw): 0.75781

     
