mwasserman

Everything posted by mwasserman

  1. @exibit, no dedicated GPU on this system. Just using Intel Quick Sync for Plex transcoding. I've now been up 28 days running 6.11.5. No plans to move off this version for a while.
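     For anyone comparing a similar Quick Sync setup, here is a quick sanity check that the iGPU is exposed to Plex (a minimal sketch; "plex" is a placeholder container name, substitute your own):

        # Confirm the Intel iGPU render node exists on the host
        ls -l /dev/dri
        # And that it is visible inside the Plex container
        docker exec plex ls -l /dev/dri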
  2. Downgraded to 6.11.5 and have been up for 21 days. Definitely not a hardware issue; most likely just another bug in the 6.12.x line of Unraid. Going to stay with 6.11.5 for a while now.
  3. I've tried a few different changes; so far I'm still getting random crashes every 3-6 days. Here is what I have done, plus some new information. Can anyone help me make sense of the errors I was able to see on the monitor?
     • Attached a monitor and keyboard so I can see the terminal after a crash
     • Ran Memtest86+ v6.20; passed 1 round
     • Changed out the power supply
     • Upgraded to 6.12.3; the server ran for 6 days before a complete deadlock. Nothing on the monitor or keyboard; Num Lock didn't even work
     • Read on other posts that this can be caused by the duplicati docker, so I shut down the duplicati docker. Crashed after 2 days, but this time the terminal still worked. Screenshot of errors attached; OCR of the errors below to make this searchable:

        Tower login: crond[1420]: exit status 126 from user root /usr/bin/run-parts /etc/cron.hourly 1> /dev/null
        crond[11850]: unable to exec /usr/sbin/sendmail: cron output for user root /usr/bin/run-parts /etc/cron.hourly 1> /dev/null to /dev/null
        crond[1420]: exit status 135 from user root /usr/local/emhttp/plugins/dynamix/scripts/monitor &> /dev/null
        (the line above repeated 8 times in total)
        Hint: Num Lock on
        Tower login: crond[1420]: exit status 135 from user root /usr/local/emhttp/plugins/dynamix/scripts/monitor &> /dev/null
        crond[1420]: exit status 135 from user root /usr/local/emhttp/plugins/dynamix/scripts/monitor &> /dev/null

     I tried to run "diagnostics" from the command line to collect diagnostics but received "command not found". I just upgraded to 6.12.4; let's see if that makes any difference. Any other suggestions for things to try? My next step may be to roll back to a pre-6.12 version, as everything seems to have gone downhill as of 6.12.x.
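     A note for anyone searching on those exit codes (my own reading, not from the logs): a shell exit status above 128 generally means the process died from a signal (status minus 128), so 135 maps to signal 7 (SIGBUS), and 126 means the command was found but not executable. A quick way to check the mapping and to probe the script by hand:

        # Map exit status 135 to its signal name (prints BUS in bash)
        kill -l $((135 - 128))
        # Run the cron-invoked monitor script directly and capture its status
        /usr/local/emhttp/plugins/dynamix/scripts/monitor; echo "exit: $?"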
  4. Hi everyone, I've been running Unraid on this Lenovo ThinkServer TS140 for about 6 years without a single issue. As of about 2-3 months ago I've been getting random lockups roughly every 2-3 weeks.
     Unraid 6.12.2
     Processor: Intel Xeon E3-1246 v3
     Memory: 32GB ECC
     Running many dockers and VMs; nothing new installed between the stable period and the random crashes. Syslog to USB stick was enabled during the last crash; diagnostics dump and syslog attached. The last crash occurred sometime between these 2 lines:

        Aug 3 02:00:38 Tower root: /mnt/cache: 188.6 GiB (202545577984 bytes) trimmed on /dev/sdg1
        Aug 3 18:19:16 Tower kernel: microcode: microcode updated early to revision 0x28, date = 2019-11-12

     I'm in the process of running Memtest86+ v6.20 now to see if anything comes up. Any help figuring out what is going on here is much appreciated. tower-diagnostics-20230803-1828.zip syslog
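     A side note for anyone bracketing a crash the same way: the "microcode updated early" line is one of the first kernel messages after a boot, so grepping for it in the persisted syslog shows each boot marker plus the last line logged before it (a sketch; adjust the path to wherever your USB-stick syslog actually lands):

        # Show each boot marker with the line logged just before it
        grep -n -B1 "microcode: microcode updated early" /boot/logs/syslog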
  5. After reading this I really had high hopes this was everything I was doing wrong... No luck 😞 Still getting this error in the Unraid system log:

        Mar 28 14:37:37 Tower kernel: eth0: renamed from veth8c00d88
        Mar 28 14:37:51 Tower kernel: apex 0000:03:00.0: RAM did not enable within timeout (12000 ms)
        Mar 28 14:37:51 Tower kernel: apex 0000:03:00.0: Error in device open cb: -110
        Mar 28 14:38:59 Tower kernel: veth8c00d88: renamed from eth0

     The "eth0: renamed" message is really strange and new; I'm not sure if it is related to trying to use the PCIe Coral at all. Not giving up yet, but I'm putting in my order for a USB Coral (they look to be backordered for 2+ months).
     EDIT: Adding insult to injury, I pulled the PCIe to m-PCIe adapter and Coral card out of my Unraid box and put them into my Windows box. I tried the example at https://coral.ai/docs/m2/get-started/#4-run-a-model-on-the-edge-tpu and it worked perfectly. Now at least I know the Coral m-PCIe card and adapter are working correctly; it's just a matter of figuring out why Unraid won't handle the card. I tried to pass through the card to a VM on my Unraid box, but Unraid refuses to list the card as available for pass-through to a VM. I bound the IOMMU group (just the 1 Coral card) to VFIO, but the card will not list under "Other PCI Devices".
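     For anyone debugging the same pass-through dead end, it can help to check which kernel driver actually claimed the card (a sketch using the PCI address from the log above):

        # Show the Coral's PCI entry and the driver bound to it
        lspci -nnk -s 03:00.0
        # "Kernel driver in use: apex" means the host driver still owns it;
        # it needs to show vfio-pci before a VM can take it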
  6. This got me onto an interesting debug path. I had Unraid set up on a bonded network (802.3ad), and it appears that this network type either comes up after the plugins try to load, or ends up in a race condition where both need to happen at the same time and the network doesn't come up in time. I've removed the bonded network, and now the plugin loads after every reboot. Thank you for pointing me in the correct direction. I now get a new error that looks to be purely a Google Coral issue; I'm going to do some searching and ask in the Google Coral forums to see if this is a known problem.
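     If anyone wants to keep the bond and work around the race instead, the rough idea would be a wait loop at the top of /boot/config/go before anything network-dependent runs (a sketch only, not something I've tested; the gateway IP is a placeholder for your own):

        # Wait up to 60s for the network to come up before boot continues
        for i in $(seq 1 60); do
            ping -c1 -W1 192.168.1.1 >/dev/null 2>&1 && break
            sleep 1
        done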
  7. Thank you for correcting me; there goes that idea of why it isn't working. I double-checked my mapping and it is a device. Removed and recreated it just to be 100% sure. Same results as before: "No EdgeTPU detected". I was really hoping I had missed this and you were right. The search goes on. I do run pfBlockerNG on pfSense (not in a VM); I checked the logs on pfBlocker and didn't see it blocking anything from my Unraid box. No plugin is in an error state. This is my plugin page just before a reboot; I had just installed the Coral module driver. Now after a reboot (I collected the diagnostics logs at this point): tower-diagnostics-20210324-0917.zip
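     For others double-checking the same mapping, one way to confirm it really is a character device and that Docker actually applied it (a sketch; "frigate" is a placeholder container name):

        # A leading "c" in the mode confirms a character device on the host
        ls -l /dev/apex_0
        # Show the device mappings Docker applied to the container
        docker inspect frigate --format '{{json .HostConfig.Devices}}'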
  8. After seeing the great progress that was being made to get the Mini PCIe Coral working, I bought one with an adapter to try my luck. It's not going as smoothly as I had hoped; maybe someone here can point me in the correct direction or suggest next steps to help debug.
     Unraid 6.9.1
     Adapter I am using: Ableconn PEX-MP117 Mini PCI-E to PCI-E Adapter Card
     The card correctly shows up in Unraid, and I have installed the "Coral Accelerator Module Drivers". From the terminal, if I run the command below I get a return suggesting the card is correctly installed:

        root@Tower:~# ls /dev/apex_0
        /dev/apex_0

     I have also checked lsmod and can see apex and gasket loaded:

        root@Tower:~/apex/packages# lsmod
        Module                  Size  Used by
        apex                   16384  0
        gasket                 90112  1 apex

     I have the card passed through to the frigate container. My frigate container works great with CPU processing, so I believe my configuration is good, but the trouble starts when I switch to:

        detectors:
          coral:
            type: edgetpu
            device: pci

     After an Unraid system restart I get 1 start of Frigate where it says it finds the EdgeTPU, but it soon crashes. After that, every time I start the container I get the following errors:

        * Starting nginx nginx ...done.
        Starting migrations
        peewee_migrate INFO : Starting migrations
        There is nothing to migrate
        peewee_migrate INFO : There is nothing to migrate
        detector.coral INFO : Starting detection process: 41
        frigate.app INFO : Camera processor started for living_room: 44
        frigate.edgetpu INFO : Attempting to load TPU as pci
        frigate.app INFO : Camera processor started for kitchen: 46
        frigate.edgetpu INFO : No EdgeTPU detected.
        Process detector:coral:
        frigate.app INFO : Camera processor started for garage: 47
        frigate.app INFO : Camera processor started for backyard: 49
        frigate.app INFO : Capture process started for living_room: 50
        frigate.app INFO : Capture process started for kitchen: 52
        frigate.app INFO : Capture process started for garage: 57
        frigate.app INFO : Capture process started for backyard: 59
        frigate.mqtt INFO : MQTT connected
        Traceback (most recent call last):
          File "/usr/local/lib/python3.8/dist-packages/tflite_runtime/interpreter.py", line 152, in load_delegate
            delegate = Delegate(library, options)
          File "/usr/local/lib/python3.8/dist-packages/tflite_runtime/interpreter.py", line 111, in __init__
            raise ValueError(capture.message)
        ValueError

        During handling of the above exception, another exception occurred:

        Traceback (most recent call last):
          File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
            self.run()
          File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
            self._target(*self._args, **self._kwargs)
          File "/opt/frigate/frigate/edgetpu.py", line 124, in run_detector
            object_detector = LocalObjectDetector(tf_device=tf_device, num_threads=num_threads)
          File "/opt/frigate/frigate/edgetpu.py", line 63, in __init__
            edge_tpu_delegate = load_delegate('libedgetpu.so.1.0', device_config)
          File "/usr/local/lib/python3.8/dist-packages/tflite_runtime/interpreter.py", line 154, in load_delegate
            raise ValueError('Failed to load delegate from {}\n{}'.format(
        ValueError: Failed to load delegate from libedgetpu.so.1.0
        frigate.watchdog INFO : Detection appears to have stopped. Exiting frigate...
        frigate.app INFO : Stopping...
        frigate.record INFO : Exiting recording maintenance...
        frigate.object_processing INFO : Exiting object processor...
        frigate.events INFO : Exiting event processor...
        frigate.events INFO : Exiting event cleanup...
        frigate.watchdog INFO : Exiting watchdog...
        frigate.stats INFO : Exiting watchdog...
        peewee.sqliteq INFO : writer received shutdown request, exiting.
        root INFO : Waiting for detection process to exit gracefully...
        watchdog.backyard INFO : Terminating the existing ffmpeg process...

     Final questions:
     • Why does Unraid look to be seeing the EdgeTPU while the container can't talk to it? (My device pass-through setup is sketched below.)
     • Is there a way to keep the "Coral Accelerator Module Drivers" between reboots? They look to go away after every Unraid reboot.
     After many more hours of this, I think it just comes down to the driver being built for the wrong kernel. @ich777 Any chance you can build the Coral PCI driver for Unraid 6.9.1 (kernel 5.10.21)? Thank you in advance!
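     In case it helps someone comparing setups: the container only sees the TPU if the device node is mapped through, which in plain docker terms looks roughly like this (a sketch; Unraid's template does the equivalent via a Device entry, and the image tag is a placeholder):

        # Hand the Coral's character device straight through to Frigate
        docker run -d --name frigate \
          --device /dev/apex_0:/dev/apex_0 \
          blakeblackshear/frigate:stable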
  9. Just want to point out 2 issues I ran into and how I solved them after updating to 6.9.1.
     • My br0 network is an 802.3ad bonded pair with bridging enabled. After the first reboot, any docker container that was using br0 stopped working. To solve this I ran the following 2 lines from the terminal console:

        rm /var/lib/docker/network/files/local-kv.db
        /etc/rc.d/rc.docker restart

     • The Virtual Machine "VNC Remote" from within the web browser stopped working with a "SyntaxError: The requested module '../core/util/browser.js" error. Clearing Chrome's "Cached images and files" fixed this.
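     If you want to confirm the first fix took: local-kv.db holds Docker's network state and is rebuilt on restart, so afterwards the custom network should be listed again (a quick check, assuming the network is named br0):

        # br0 should reappear in the network list after the restart
        docker network ls | grep br0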
  10. I have attached my diagnostics zip file tower-diagnostics-20200401-0807.zip
  11. Hi all, I've been using Unraid for about a year now without any major issues. I looked in my log the other day and started to notice the following warning repeated many times:

        Mar 31 23:52:09 Tower kernel: BTRFS error (device sdg1): parent transid verify failed on 620778586112 wanted 16531233 found 15373503
        Mar 31 23:52:09 Tower kernel: BTRFS error (device sdg1): parent transid verify failed on 620778586112 wanted 16531233 found 15373503
        Mar 31 23:52:09 Tower kernel: BTRFS error (device sdg1): parent transid verify failed on 620778586112 wanted 16531233 found 15373503
        Mar 31 23:52:09 Tower kernel: BTRFS error (device sdg1): parent transid verify failed on 620778586112 wanted 16531233 found 15373503
        Mar 31 23:52:10 Tower kernel: BTRFS error (device sdg1): parent transid verify failed on 620778586112 wanted 16531233 found 15373503
        Mar 31 23:52:10 Tower kernel: BTRFS error (device sdg1): parent transid verify failed on 620778586112 wanted 16531233 found 15373503
        Mar 31 23:52:10 Tower kernel: BTRFS error (device sdg1): parent transid verify failed on 620778586112 wanted 16531233 found 15373503
        Mar 31 23:52:10 Tower kernel: BTRFS error (device sdg1): parent transid verify failed on 620778586112 wanted 16531233 found 15373503

      In my case sdg is an SSD cache drive in a pool of 2 cache drives. I'm assuming (sdg1) and sdg refer to the same drive; is this correct? I've been googling around and found many posts pointing to bad SATA cables and suggesting I run a scrub. I've run the scrub operation and it reports "no errors found". What are my next steps in trying to fix this warning?
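      For anyone landing here with the same transid errors, the per-device error counters are worth checking alongside scrub, since they can point at one specific drive or cable in the pool (a sketch; assumes the cache pool is mounted at /mnt/cache as on a stock setup):

        # Re-run a foreground scrub across the pool
        btrfs scrub start -B /mnt/cache
        # Per-device counters (read/write/corruption errors) persist across reboots
        btrfs device stats /mnt/cache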