mwasserman

October 21, 2023

@exibit, there's no dedicated GPU in this system. I'm just using Intel Quick Sync for Plex transcoding.

I've now been up for 28 days running 6.11.5. I have no plans to move off this version for a while.

October 15, 2023

I downgraded to 6.11.5 and have been up for 21 days. Definitely not a hardware issue; it's likely just another bug in Unraid 6.12.X. I'm going to stay on 6.11.5 for a while now.

September 5, 2023

I've tried a few different changes, but I'm still getting random crashes every 3-6 days.

Here is what I have done, plus some new information. Can anyone help me make sense of the errors I was able to see on the monitor?

Attached a monitor and keyboard so I can see the terminal after a crash
Ran Memtest86+ v6.20: passed 1 round
Changed out the power supply
Upgraded to 6.12.3
Server ran for 6 days before a complete deadlock. Nothing on the monitor and no response from the keyboard; Num Lock didn't even work
Read in other posts that this can be caused by the Duplicati docker, so I shut down the Duplicati container

It crashed after 2 days, but this time the terminal still worked, so I took a screenshot of the errors.

Here is an OCR of the errors, to make this searchable:

Tower login: crond [1420]: exit status 126 from user root /usr/bin/run-parts /etc/cron.hourly 1> /dev/null 
crond [11850]: unable to exec /usr/sbin/sendmail: cron output for user root /usr/bin/run-parts /etc/cron.hourly 1> /dev/null to /dev/null 
crond [1420]: exit status 135 from user root /usr/local/emhttp/plugins/dynamix/scripts/monitor &> /dev/null 
crond [1420]: exit status 135 from user root /usr/local/emhttp/plugins/dynamix/scripts/monitor &> /dev/null 
crond [1420]: exit status 135 from user root /usr/local/emhttp/plugins/dynamix/scripts/monitor &> /dev/null 
crond [1420]: exit status 135 from user root /usr/local/emhttp/plugins/dynamix/scripts/monitor &> /dev/null 
crond [1420]: exit status 135 from user root /usr/local/emhttp/plugins/dynamix/scripts/monitor &> /dev/null 
crond [1420]: exit status 135 from user root /usr/local/emhttp/plugins/dynamix/scripts/monitor &> /dev/null
crond [1420]: exit status 135 from user root /usr/local/emhttp/plugins/dynamix/scripts/monitor &> /dev/null 
crond [1420]: exit status 135 from user root /usr/local/emhttp/plugins/dynamix/scripts/monitor &> /dev/null
Hint: Num Lock on
Tower login: crond [1420]: exit status 135 from user root /usr/local/emhttp/plugins/dynamix/scripts/monitor &> /dev/null 
crond [1420]: exit status 135 from user root /usr/local/emhttp/plugins/dynamix/scripts/monitor &> /dev/null
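A long OCR dump like this is easier to read once the exit statuses are tallied. The sketch below is a hypothetical helper (`summarize_cron_exits` and `describe_status` are my own names, not anything shipped with Unraid). It assumes the lines follow the `crond [pid]: exit status N from user U command` shape shown above, and that crond is reporting the exit code of the shell that ran each job. It decodes statuses with the usual shell convention: 126 means the command was found but could not be executed, and anything above 128 is 128 plus the number of the signal that killed the process. Under that convention, 135 is signal 7, which is SIGBUS on x86 Linux.

```python
import re
import signal
from collections import Counter

# Shape of the OCR'd lines above, e.g.
#   crond [1420]: exit status 135 from user root /usr/local/emhttp/plugins/dynamix/scripts/monitor &> /dev/null
CRON_EXIT = re.compile(r"crond\s*\[(\d+)\]:\s*exit status (\d+) from user (\S+) (.+?)\s*$")


def summarize_cron_exits(lines):
    """Count how often each (exit status, command) pair appears in crond messages."""
    counts = Counter()
    for line in lines:
        match = CRON_EXIT.search(line)
        if match:
            counts[(int(match.group(2)), match.group(4))] += 1
    return counts


def describe_status(status):
    """Interpret an exit status using the common shell conventions."""
    if status == 126:
        return "command found but could not be executed"
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name  # signal numbering is platform-specific
        except ValueError:
            name = f"signal {signum}"
        return f"terminated by {name} (128 + {signum})"
    return f"exited with status {status}"


sample = [
    "Tower login: crond [1420]: exit status 126 from user root /usr/bin/run-parts /etc/cron.hourly 1> /dev/null ",
    "crond [1420]: exit status 135 from user root /usr/local/emhttp/plugins/dynamix/scripts/monitor &> /dev/null ",
    "crond [1420]: exit status 135 from user root /usr/local/emhttp/plugins/dynamix/scripts/monitor &> /dev/null",
    "Hint: Num Lock on",
]
for (status, command), n in summarize_cron_exits(sample).most_common():
    print(f"{n}x {describe_status(status)}: {command}")
```

Run on the Unraid host itself, the signal name comes from the host's own numbering, which is what matters for interpreting these jobs.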

I tried to run "diagnostics" from the command line to collect diagnostics, but received "command not found".
I just upgraded to 6.12.4; let's see if that makes any difference.

Any other suggestions for things to try? My next step may be to roll back to a pre-6.12 version, as everything seems to have gone downhill since 6.12.X.

August 4, 2023

Hi everyone,

I've been running Unraid on this Lenovo ThinkServer TS140 for about 6 years without a single issue. Starting about 2-3 months ago, I've been getting random lockups roughly every 2-3 weeks.

Unraid 6.12.2
Processor: Intel Xeon E3-1246 v3
Memory: 32GB ECC
Running many Docker containers and VMs; nothing new was added between the stable period and the start of the random crashes.

Syslog to USB stick was enabled during the last crash. The diagnostics dump and syslog are attached.

The last crash occurred sometime between these 2 lines.
Aug 3 02:00:38 Tower root: /mnt/cache: 188.6 GiB (202545577984 bytes) trimmed on /dev/sdg1
Aug 3 18:19:16 Tower kernel: microcode: microcode updated early to revision 0x28, date = 2019-11-12
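These two lines bound the crash window. The microcode message is printed early during boot, so the second line marks the restart rather than the crash itself; the crash happened somewhere in between. A minimal sketch of the arithmetic, assuming the year is 2023 (classic syslog timestamps omit it) and that the timestamp is always the first three whitespace-separated fields of a line; `syslog_time` is a hypothetical helper name:

```python
from datetime import datetime

# Classic syslog timestamps ("Aug 3 02:00:38") have no year, so one is
# supplied explicitly; 2023 is assumed from the date of this post.
SYSLOG_FORMAT = "%Y %b %d %H:%M:%S"


def syslog_time(line, year=2023):
    """Parse the leading 'Mon D HH:MM:SS' timestamp of a syslog line."""
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", SYSLOG_FORMAT)


last_before_crash = "Aug 3 02:00:38 Tower root: /mnt/cache: 188.6 GiB (202545577984 bytes) trimmed on /dev/sdg1"
first_after_boot = "Aug 3 18:19:16 Tower kernel: microcode: microcode updated early to revision 0x28, date = 2019-11-12"

window = syslog_time(first_after_boot) - syslog_time(last_before_crash)
print(window)  # 16:18:38
```

So the lockup happened at some point in a window of a little over 16 hours, and that window also includes however long the server sat frozen before it was restarted.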

I'm in the process of running Memtest86+ v6.20 now to see if anything comes up.

Any help figuring out what is going on here is much appreciated.

tower-diagnostics-20230803-1828.zip syslog

March 28, 2021

15 hours ago, digiblur said:

You don't need to pass in the device to the container. It really is as simple as loading the plugin to get the drivers going and then telling Frigate to use it.

detectors:
  coral_pci:
    type: edgetpu
    device: pci

After reading this I really had high hopes that this was what I was doing wrong... No luck 😞

I'm still getting this error in the Unraid system log:

Mar 28 14:37:37 Tower kernel: eth0: renamed from veth8c00d88
Mar 28 14:37:51 Tower kernel: apex 0000:03:00.0: RAM did not enable within timeout (12000 ms)
Mar 28 14:37:51 Tower kernel: apex 0000:03:00.0: Error in device open cb: -110
Mar 28 14:38:59 Tower kernel: veth8c00d88: renamed from eth0
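The `-110` in the second apex line is a negative errno value, which is how Linux kernel drivers usually report failures. On Linux, errno 110 is `ETIMEDOUT`, consistent with the 12000 ms timeout message just before it. A small sketch for pulling the apex lines out of a syslog and naming the error code; the regular expressions are assumptions based only on the lines quoted above, and `apex_events` is a hypothetical helper:

```python
import errno
import re

# Shape of the apex driver messages quoted above, e.g.
#   apex 0000:03:00.0: Error in device open cb: -110
APEX_LINE = re.compile(r"apex (\S+): (.+)$")
# A trailing negative number such as ": -110" (a negative errno value).
NEGATIVE_CODE = re.compile(r":\s*(-\d+)$")


def apex_events(lines):
    """Yield (pci_address, message, errno_name or None) for each apex driver line."""
    for line in lines:
        match = APEX_LINE.search(line.rstrip())
        if not match:
            continue
        address, message = match.groups()
        code = NEGATIVE_CODE.search(message)
        # errno.errorcode uses the numbering of the machine running the script.
        name = errno.errorcode.get(-int(code.group(1)), "unknown") if code else None
        yield address, message, name


log = [
    "Mar 28 14:37:51 Tower kernel: apex 0000:03:00.0: RAM did not enable within timeout (12000 ms)",
    "Mar 28 14:37:51 Tower kernel: apex 0000:03:00.0: Error in device open cb: -110",
]
for address, message, name in apex_events(log):
    print(address, name or "-", message)
```

Because errno numbers differ between operating systems, the name lookup is only meaningful when the script runs on Linux, for example on the Unraid host.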

The "eth0: renamed" message is really strange and new; I'm not sure whether it's related to trying to use the PCIe Coral at all. I'm not giving up yet, but I've put in an order for a USB Coral (they look to be backordered for 2+ months).

EDIT:

Adding insult to injury, I pulled the PCIe to m-PCIe adapter and Coral card out of my Unraid box and put it into my Windows box. Tried the example on https://coral.ai/docs/m2/get-started/#4-run-a-model-on-the-edge-tpu and it worked perfectly. Now at least I know the Coral m-PCIe card and adapter are working correctly. It's just a matter of figuring out why Unraid won't handle the card correctly.

I tried to pass the card through to a VM on my Unraid box, but Unraid refuses to list the card as available for passthrough. I bound the IOMMU group (containing just the one Coral card) to VFIO, but the card still won't show up under "Other PCI Devices".
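To check whether the Coral really sits alone in its IOMMU group, and which driver currently owns it, the kernel's sysfs tree can be read directly. A minimal sketch, assuming the standard sysfs layout (`/sys/kernel/iommu_groups/<n>/devices/` and `/sys/bus/pci/devices/<address>/driver`); `iommu_groups`, `group_of` and `bound_driver` are hypothetical helper names. One thing it may help confirm is whether the device is still claimed by the `apex` driver that the Coral plugin loads, rather than by `vfio-pci`.

```python
from pathlib import Path


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number to the sorted PCI addresses it contains."""
    groups = {}
    base = Path(root)
    if not base.is_dir():
        return groups  # IOMMU disabled, or not running on Linux
    for group_dir in base.iterdir():
        devices = group_dir / "devices"
        if group_dir.name.isdigit() and devices.is_dir():
            groups[int(group_dir.name)] = sorted(d.name for d in devices.iterdir())
    return dict(sorted(groups.items()))


def group_of(address, groups):
    """Return the group containing a PCI address such as '0000:03:00.0', or None."""
    for number, members in groups.items():
        if address in members:
            return number
    return None


def bound_driver(address, root="/sys/bus/pci/devices"):
    """Return the name of the driver bound to a PCI device, or None if unbound."""
    link = Path(root) / address / "driver"
    return link.resolve().name if link.exists() else None


if __name__ == "__main__":
    coral = "0000:03:00.0"  # example address taken from the syslog lines above
    groups = iommu_groups()
    number = group_of(coral, groups)
    print("IOMMU group:", number, groups.get(number))
    print("bound driver:", bound_driver(coral))
```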

March 25, 2021

13 hours ago, ich777 said:

I think from what I see something is preventing your Unraid box on boot from connecting to the internet itself, have you set a custom DNS or something like that in Unraid itself?

This led me down an interesting debugging path. I had Unraid set up on a bonded network (802.3ad), and it appears that this network type either comes up after Unraid tries to load the plugins, or ends up in a race condition where both need to happen at the same time and the network doesn't come up in time. I've removed the bonded network, and now the plugin loads after every reboot. Thank you for pointing me in the right direction.

I now get a new error that looks to be purely a Google Coral issue:

Mar 24 12:52:10 Tower kernel: x86/PAT: frigate.detecto:29004 map pfn RAM range req uncached-minus for [mem 0x6f1c4c000-0x6f1c4ffff], got write-back
Mar 24 12:53:13 Tower kernel: apex 0000:06:00.0: RAM did not enable within timeout (12000 ms)
Mar 24 12:53:25 Tower kernel: apex 0000:06:00.0: RAM did not enable within timeout (12000 ms)
Mar 24 12:53:25 Tower kernel: apex 0000:06:00.0: Error in device open cb: -110

I'm going to do some searching and ask in the Google Coral forums to see if this is a known problem.

March 24, 2021

7 hours ago, ich777 said:

I already built it for Unraid 6.9.1 Kernel v5.10.21 otherwise the Plugin won't work on this Unraid version.

Thank you for correcting me. There goes that theory about why it isn't working.

7 hours ago, ich777 said:

EDIT: I think I got what is wrong here, you passed over a path to the container but it's a device:

I double-checked my mapping and it is a device. I removed and recreated it just to be 100% sure. Same result as before: "No EdgeTPU detected". I was really hoping I had missed this and you were right. The search goes on.

7 hours ago, ich777 said:

This shouldn't happen, have you a active internet connection on boot or better speaking have you anything like PiHole or a VM that is your Firewall on your Unraid box?

I do run pfBlockerNG on pfSense (not in a VM). I checked the pfBlockerNG logs and didn't see it blocking anything from my Unraid box.

7 hours ago, ich777 said:

The Diagnostics (Tools -> Diagnostics -> Download -> drop the downloaded file here in the textbox) from a reboot with previously installed Coral Accelerator Module Drivers would be very helpful to troubleshoot why it disappear.

Have you looked into the Plugins tab in Unraid if there is a Plugin in Error state and removed that in the first place and installed it afterwards?

No plugin is in an error state. This is my plugin page just before a reboot; I had just installed the Coral module driver.

And this is after a reboot (I collected the diagnostics logs at this point):

tower-diagnostics-20210324-0917.zip

March 23, 2021

After seeing the great progress being made on getting the Mini PCIe Coral working, I bought one along with an adapter to try my luck. It's not going as smoothly as I had hoped. Maybe someone here can point me in the right direction or suggest next steps to help debug.

Unraid 6.9.1
Adapter I am using: Ableconn PEX-MP117 Mini PCI-E to PCI-E Adapter Card
Card correctly shows up in Unraid
I have installed "Coral Accelerator Module Drivers"
From the terminal, running the command below returns output suggesting the card is correctly installed:

root@Tower:~# ls /dev/apex_0
/dev/apex_0

I have also checked lsmod and can see that apex and gasket are loaded:

root@Tower:~/apex/packages# lsmod
Module                  Size  Used by
apex                   16384  0
gasket                 90112  1 apex

I have the card passed through to the Frigate container.
My Frigate container works great with CPU processing, so I believe my configuration is good, but when I switch to:

detectors:
  coral:
    type: edgetpu
    device: pci

After an Unraid system restart, Frigate starts once, reports that it found the EdgeTPU, and then crashes soon after. After that, every time I start the container I get the following errors:

* Starting nginx nginx
...done.
Starting migrations
peewee_migrate INFO : Starting migrations
There is nothing to migrate
peewee_migrate INFO : There is nothing to migrate
detector.coral INFO : Starting detection process: 41
frigate.app INFO : Camera processor started for living_room: 44
frigate.edgetpu INFO : Attempting to load TPU as pci
frigate.app INFO : Camera processor started for kitchen: 46
frigate.edgetpu INFO : No EdgeTPU detected.
Process detector:coral:
frigate.app INFO : Camera processor started for garage: 47
frigate.app INFO : Camera processor started for backyard: 49
frigate.app INFO : Capture process started for living_room: 50
frigate.app INFO : Capture process started for kitchen: 52
frigate.app INFO : Capture process started for garage: 57
frigate.app INFO : Capture process started for backyard: 59
frigate.mqtt INFO : MQTT connected
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/tflite_runtime/interpreter.py", line 152, in load_delegate
    delegate = Delegate(library, options)
  File "/usr/local/lib/python3.8/dist-packages/tflite_runtime/interpreter.py", line 111, in __init__
    raise ValueError(capture.message)
ValueError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/frigate/frigate/edgetpu.py", line 124, in run_detector
    object_detector = LocalObjectDetector(tf_device=tf_device, num_threads=num_threads)
  File "/opt/frigate/frigate/edgetpu.py", line 63, in __init__
    edge_tpu_delegate = load_delegate('libedgetpu.so.1.0', device_config)
  File "/usr/local/lib/python3.8/dist-packages/tflite_runtime/interpreter.py", line 154, in load_delegate
    raise ValueError('Failed to load delegate from {}\n{}'.format(

ValueError: Failed to load delegate from libedgetpu.so.1.0


frigate.watchdog INFO : Detection appears to have stopped. Exiting frigate...
frigate.app INFO : Stopping...
frigate.record INFO : Exiting recording maintenance...
frigate.object_processing INFO : Exiting object processor...
frigate.events INFO : Exiting event processor...
frigate.events INFO : Exiting event cleanup...
frigate.watchdog INFO : Exiting watchdog...
frigate.stats INFO : Exiting watchdog...
peewee.sqliteq INFO : writer received shutdown request, exiting.
root INFO : Waiting for detection process to exit gracefully...
watchdog.backyard INFO : Terminating the existing ffmpeg process...
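To take Frigate out of the equation, the call that fails at the bottom of the traceback (`load_delegate('libedgetpu.so.1.0', device_config)`) can be tried on its own, for example from a Python shell inside the container. A minimal sketch, assuming `tflite_runtime` and `libedgetpu.so.1.0` are present in the container, as the traceback paths suggest; `try_load_edgetpu` is a hypothetical helper name:

```python
def try_load_edgetpu(device="pci", library="libedgetpu.so.1.0"):
    """Try to load the Edge TPU delegate directly, outside Frigate.

    Returns (ok, message). `device` follows the Coral naming, e.g. "pci",
    "pci:0" or "usb".
    """
    try:
        from tflite_runtime.interpreter import load_delegate
    except ImportError as exc:
        return False, f"tflite_runtime is not importable here: {exc}"
    try:
        load_delegate(library, {"device": device})
    except (ValueError, OSError) as exc:
        # ValueError is what the Frigate traceback ends with; OSError can be
        # raised when the shared library itself cannot be loaded.
        return False, str(exc)
    return True, f"Edge TPU delegate loaded for device {device!r}"


if __name__ == "__main__":
    ok, message = try_load_edgetpu()
    print("OK" if ok else "FAILED", message)
```

If this fails the same way outside Frigate, the problem is more likely in the driver or device access than in the Frigate configuration.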

Final questions

Why does Unraid appear to see the EdgeTPU while the container can't talk to it?
Is there a way to keep the "Coral Accelerator Module Drivers" installed between reboots? They seem to disappear after every Unraid reboot.

After many more hours on this, I think it just comes down to the driver being built for the wrong kernel.

@ich777 Any chance you can build the Coral PCI driver for Unraid 6.9.1 (kernel 5.10.21)? Thank you in advance!

March 16, 2021

I just want to point out two issues I ran into after updating to 6.9.1, and how I solved them.

My br0 network is an 802.3ad bonded pair with bridging enabled. After the first reboot, any Docker container using br0 stopped working. To solve this, I ran the following two commands from the terminal:

rm /var/lib/docker/network/files/local-kv.db
/etc/rc.d/rc.docker restart
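A slightly more cautious variant of the same two steps keeps a timestamped copy of `local-kv.db` instead of deleting it outright. This is a sketch, not an Unraid tool: it assumes it runs as root on the Unraid host, and the `.bak` naming and the `reset_docker_network_db` name are my own choices.

```python
import shutil
import subprocess
import time
from pathlib import Path

# Paths taken from the two commands above.
KV_DB = Path("/var/lib/docker/network/files/local-kv.db")
RC_DOCKER = ["/etc/rc.d/rc.docker", "restart"]


def reset_docker_network_db(db=KV_DB, restart_cmd=RC_DOCKER):
    """Move Docker's network database aside (instead of deleting it), then restart Docker.

    Returns the backup path, or None if there was no database to move.
    Pass restart_cmd=None to skip the restart.
    """
    backup = None
    if db.exists():
        backup = db.with_name(f"{db.name}.{int(time.time())}.bak")
        shutil.move(str(db), str(backup))
    if restart_cmd:
        subprocess.run(restart_cmd, check=True)  # raises if the restart fails
    return backup
```

Calling `reset_docker_network_db()` with no arguments does the same thing as the two commands, except that the old database is kept next to the original path.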

The virtual machine "VNC Remote" console in the web browser stopped working with a "SyntaxError: The requested module '../core/util/browser.js" error.

Clearing Chrome's "Cached images and files" fixed this.

April 1, 2020

I have attached my diagnostics zip file

tower-diagnostics-20200401-0807.zip

April 1, 2020

Hi all, I've been using Unraid for about a year now without any major issues. I looked at my log the other day and noticed many instances of the following warning:

Mar 31 23:52:09 Tower kernel: BTRFS error (device sdg1): parent transid verify failed on 620778586112 wanted 16531233 found 15373503
Mar 31 23:52:09 Tower kernel: BTRFS error (device sdg1): parent transid verify failed on 620778586112 wanted 16531233 found 15373503
Mar 31 23:52:09 Tower kernel: BTRFS error (device sdg1): parent transid verify failed on 620778586112 wanted 16531233 found 15373503
Mar 31 23:52:09 Tower kernel: BTRFS error (device sdg1): parent transid verify failed on 620778586112 wanted 16531233 found 15373503
Mar 31 23:52:10 Tower kernel: BTRFS error (device sdg1): parent transid verify failed on 620778586112 wanted 16531233 found 15373503
Mar 31 23:52:10 Tower kernel: BTRFS error (device sdg1): parent transid verify failed on 620778586112 wanted 16531233 found 15373503
Mar 31 23:52:10 Tower kernel: BTRFS error (device sdg1): parent transid verify failed on 620778586112 wanted 16531233 found 15373503
Mar 31 23:52:10 Tower kernel: BTRFS error (device sdg1): parent transid verify failed on 620778586112 wanted 16531233 found 15373503
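All eight lines above are the same event repeated. On a longer log, a quick way to confirm that is to group the messages by device, block and generation numbers. A minimal sketch; the regular expression assumes the exact message format quoted above, and `transid_errors` is a hypothetical helper name:

```python
import re
from collections import Counter

TRANSID = re.compile(
    r"BTRFS error \(device (\S+)\): parent transid verify failed on (\d+) wanted (\d+) found (\d+)"
)


def transid_errors(lines):
    """Count identical transid failures, keyed by (device, block, wanted, found)."""
    counts = Counter()
    for line in lines:
        match = TRANSID.search(line)
        if match:
            device, block, wanted, found = match.groups()
            counts[(device, int(block), int(wanted), int(found))] += 1
    return counts


line = ("Mar 31 23:52:09 Tower kernel: BTRFS error (device sdg1): parent transid "
        "verify failed on 620778586112 wanted 16531233 found 15373503")
for (device, block, wanted, found), n in transid_errors([line] * 8).items():
    # How many transaction generations older the block on disk is than expected.
    print(device, block, n, "lag:", wanted - found)  # sdg1 620778586112 8 lag: 1157730
```

For these lines, the block at 620778586112 on sdg1 is expected to carry generation 16531233 but has 15373503, so it is roughly 1.16 million transactions older than its parent expects.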

In my case, sdg is an SSD cache drive in a pool of 2 cache drives. I'm assuming sdg1 and sdg refer to the same drive. Is this correct?

I've been googling around and found many posts pointing to bad SATA cables, and suggestions to run a scrub. I've run the scrub operation and it reports "no errors found". What are my next steps in trying to fix this warning?

Posts posted by mwasserman

6.12.X Random Crashes roughly every 2-3 weeks

[SUPPORT] blakeblackshear - Frigate

Unraid OS version 6.9.1 available

Tower kernel: BTRFS error (device sdg1)