Kernel Bug Error

  • Author

It's a puzzle all right!

 

But it has to be something Docker-related, I think. The network was a place to begin, but probably not the cause....

 

With docker running, but with all containers stopped, it is currently 57% of the way through the parity check. I would have expected a crash by now, but we will let it finish this time. Assuming there is no crash, it will not be the HBA: if it can complete the whole check and not skip a beat then there is no cause to suspect it....

 

It is also worth noting that appdata runs on a ZFS m.2 pool - so it isn't on the array, and it isn't on the HBA. So that is pretty much ruled out as well. I could destroy all dockers and start from scratch but it is a lot of work to set them all up....and since there is no (easy) docker-compose and most dockers (like the Arr stack) don't seem to have a simple export/import configuration option.... though presumably I can find the config files in the appdata folder....

 

In terms of docker:

  1. Weird network stuff? Unlikely, should be able to handle lots of dockers either bridged or hosted on own IPs, should be able to handle lots of traffic. 
     
  2. External devices - 2x coral.ai for Frigate. There are 'reports' of these causing problems but logs and root causes are not well documented. Options:
    - Run docker with no Frigate
    - Run docker with Frigate and use CPU detection only
    - Run docker with Frigate and coral.ai (one instance)
    - Run docker with Frigate and coral.ai (different USB bus USB2 vs USB3)
     
  3. Accessing data on the array via dockers...maybe I'm doing this completely wrong.
    - I typically make the container path and the host path the same, that way I don't need to define different paths/aliases for each.
    - For SABnzbd (for example) both are set to /mnt/user/www_downloads/complete/usenet/ for completed downloads. All containers are similarly configured.
    - Could badly configured container read/writes to the system make Unraid unhappy? Presumably....is my approach OK?
     
  4. Can still put everything back to SATA, but if parity check works then this is probably moot. 

Computer parts will all be here today except for power supply - apparently they never had the quantity they claimed, so need to wait until tomorrow for that (all going well).

 

And that concludes the lunch break (mostly). 🙂

  • Community Expert

?
image.png.b0aa700bf562367158fc071d87cd5069.png

there's a docker compose plugin (I use this)
I was exploring that for other network-related things for your answer, but it appeared you wanted to use the Unraid template system.

as example netprobe:
image.thumb.png.1cdc53fb166c18b75b16e0ff5f2bf247.png

It runs multiple containers in their own bridge network.

Since you are using ZFS and a Docker image, are you using the xfs version?
image.png.1b4b6a9e4aa1cb5bab0f8d762a214653.png
What storage driver? 7 did become rc1 and I have yet to find an issue/bug with it.

https://docs.unraid.net/unraid-os/release-notes/7.0.0/#add-support-for-overlay2-storage-driver

Quote

Add support for overlay2 storage driver

If you are using Docker data-root=directory on a ZFS volume, we recommend that you navigate to Settings → Docker and switch the Docker storage driver to overlay2, then delete the directory contents and let Docker re-download the image layers.

If retaining the ability to downgrade to earlier releases is important, then switch to Docker data-root=xfs vDisk instead.
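
To see which storage driver and data-root a box is using right now before changing anything, something like this minimal sketch may help. It assumes you paste in the output of `docker info --format '{{json .}}'`; the sample JSON below is made up and trimmed, and the function name is just illustrative.

```python
import json


def docker_storage_summary(info_json: str) -> dict:
    """Pick the storage-related fields out of `docker info --format '{{json .}}'` output."""
    info = json.loads(info_json)
    return {
        "driver": info.get("Driver"),          # e.g. overlay2, btrfs, zfs
        "root_dir": info.get("DockerRootDir"),  # where Docker keeps its image layers
    }


# Made-up, trimmed sample of the JSON the command prints (assumption, not real output)
sample = '{"Driver": "overlay2", "DockerRootDir": "/var/lib/docker", "Containers": 12}'
print(docker_storage_summary(sample))
```

If the driver is not what the release notes recommend for your data-root type, that is the setting to revisit under Settings → Docker.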

 

  • Author

The parity check is still running - now at nearly 86%. Logs are squeaky clean. Power supply is delayed...so won't be here until Friday at the earliest as they're shipping it with one of the worst companies we have...so not likely to be here Friday....live in hope, but sigh.

 

Yes, on Unraid I prefer to stay with the 'template' approach, if only because I want this thing to be an appliance. The more I tinker the harder it becomes, especially if I break it after a number of months (or longer) and cannot remember the hacks. I have that in other places already (like Home Assistant) so need to concentrate my pain in certain areas and pay for an off-the-shelf solution in others (Unraid)...though not going well at present clearly!

 

In terms of docker storage I just used all the defaults:

image.thumb.png.dd968cb6f073edb59cb0837a2e650fc4.png

 

Mine doesn't show the Docker version....

 

Was there anything wrong with how I map volumes to docker containers?

 

Cheers!

  • Community Expert

Version 7 beta 1 was showing the docker version for testing and which version was installed. I'm running v7 rc1 and happy with it.


When parity is done, turn off Docker.

I would suggest deleting docker.img by ticking the checkbox to delete the vDisk file, then changing the Docker data-root to use an xfs vDisk image.

(This may fix some things, including later if/when you upgrade to version 7 once it is fully released.)

 

This will remove all of the containers listed in the Docker tab. (Data is not lost...)
To bring them back, go to the bottom of the Docker tab:

Add Container > Template drop-down > click Apply.

 

No data is lost, as the data is stored in the appdata folder on the disks and not in the Docker image.

Only the pulled image data is lost, which is why the Docker tab will go empty...

If you suspect it's Docker related, it may be due to the Docker data-root option. I don't recall having a full trace, but I do recall an update to 6.x.x (I forget which one) where this fixed a bunch of Docker issues for me.

 

I also recommend installing and setting up a swap file for dockers.

The swap plugin needs a btrfs disk on which to place the swap file.

Edited by bmartino1

  • Author

OK....update. The parity completed last night while I was asleep. No errors, no crashes.

 

Docker disk has been deleted and re-created. Now have an xfs image running on a ZFS pool:

image.png.6e64cfcf255b86d0ba22c43e029d038a.png

 

I have re-created just the Gluetun and Arr stack based on the previous no-bridge configuration and have given it some work to do. No issues encountered, no crashes. Currently I've triggered the mover to push 500GB onto the array from the cache just so I can give the array a bit of a workout with the stack up.

 

I will add back one 'set' of dockers at a time and leave them for at least half a day to see if there are problems. I have divided them like this:

  1. Arrs - first group, exerts load on array, network, cache. So far so good.
  2. Randoms (e.g. mealie, calibre, Twingate, Syncthing, TubeSync) - These add more load, more compute, add extra IP addresses to eth0, but otherwise should be benign.
  3. Plex - runs privileged, does decoding, not really expecting an issue as when we had the crashes I'm 99.99% sure nothing was streaming so it should just have been pretty much idle.
  4. Jellyfin (ditto plex)
  5. Stats/Helper containers (Tautulli, JellyStat, JellySync, Postgres) - should be benign, just more load
  6. Frigate x2 - these were running all the time (as it is a high priority) so could just as easily be a problem cause. These run privileged, access both CPU for decoding and also USB for coral.ai object detection, and write to two separate pool drives (instead of array). The USB corals definitely do some weird stuff, though never near a crash, but that doesn't mean there's no link.
     
    Nov 27 16:02:11 Svalbard kernel: usb 2-9.2: reset SuperSpeed USB device number 5 using xhci_hcd
    Nov 27 16:02:13 Svalbard kernel: usb 2-9.2: reset SuperSpeed USB device number 5 using xhci_hcd
    Nov 27 16:02:13 Svalbard kernel: usb 2-9.2: LPM exit latency is zeroed, disabling LPM.
    Nov 27 16:02:13 Svalbard kernel: usb 2-9.2: device firmware changed
    Nov 27 16:02:13 Svalbard kernel: usb 2-9.2: USB disconnect, device number 5
    Nov 27 16:02:13 Svalbard kernel: usb 2-9.2: new SuperSpeed USB device number 6 using xhci_hcd
    Nov 27 16:02:13 Svalbard kernel: usb 2-9.2: LPM exit latency is zeroed, disabling LPM.


    Might not be material, but it is a little weird. I found a post about this here (right at the bottom) and an Unraid forum article on this here. So that powertop thing is now on my list to do as well just to eliminate another possible issue...unless you think that's a bad idea....
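
To judge whether these USB resets cluster around anything, it can help to count them per port across the whole syslog. Here is a minimal sketch, assuming the line format matches the excerpt above; the function name and the three event categories are my own choices, not anything Unraid provides.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   "... kernel: usb 2-9.2: reset SuperSpeed USB device number 5 using xhci_hcd"
USB_EVENT = re.compile(r"kernel: usb (?P<port>[\d.-]+): (?P<msg>.+)$")


def summarise_usb_events(lines):
    """Count resets, disconnects and LPM warnings per USB port path."""
    counts = Counter()
    for line in lines:
        m = USB_EVENT.search(line.strip())
        if not m:
            continue
        msg = m.group("msg")
        if msg.startswith("reset "):
            kind = "reset"
        elif msg.startswith("USB disconnect"):
            kind = "disconnect"
        elif "disabling LPM" in msg:
            kind = "lpm_disabled"
        else:
            continue  # e.g. "device firmware changed", which is expected for the Coral
        counts[(m.group("port"), kind)] += 1
    return counts


# The excerpt from the post, used as sample input
SAMPLE = [
    "Nov 27 16:02:11 Svalbard kernel: usb 2-9.2: reset SuperSpeed USB device number 5 using xhci_hcd",
    "Nov 27 16:02:13 Svalbard kernel: usb 2-9.2: reset SuperSpeed USB device number 5 using xhci_hcd",
    "Nov 27 16:02:13 Svalbard kernel: usb 2-9.2: LPM exit latency is zeroed, disabling LPM.",
    "Nov 27 16:02:13 Svalbard kernel: usb 2-9.2: device firmware changed",
    "Nov 27 16:02:13 Svalbard kernel: usb 2-9.2: USB disconnect, device number 5",
    "Nov 27 16:02:13 Svalbard kernel: usb 2-9.2: new SuperSpeed USB device number 6 using xhci_hcd",
    "Nov 27 16:02:13 Svalbard kernel: usb 2-9.2: LPM exit latency is zeroed, disabling LPM.",
]

for (port, kind), n in sorted(summarise_usb_events(SAMPLE).items()):
    print(port, kind, n)
```

Feeding it the full syslog instead of the sample would show whether one port resets far more often than the other.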

 

Lastly, I have also tried to install the swap plugin - I only have one btrfs drive (an SSD) as the array is xfs and the swap is zfs. When I try to start the swap file I get this in the logs:

Dec  5 18:25:34 Svalbard rc.swapfile[13294]: Plugin configuration written
Dec  5 18:26:15 Svalbard emhttpd: cmd: /usr/local/emhttp/plugins/swapfile/scripts/rc.swapfile start
Dec  5 18:26:15 Svalbard rc.swapfile[20304]: Creating swap file /mnt/scratch/swapfile please wait ...
Dec  5 18:26:19 Svalbard rc.swapfile[20621]: Swap file /mnt/scratch/swapfile created and started
Dec  5 18:26:19 Svalbard kernel: BTRFS warning (device sdd1): swapfile must not be copy-on-write
Dec  5 18:26:19 Svalbard rc.swapfile[20622]: Setting swappiness to 60
Dec  5 18:26:41 Svalbard emhttpd: cmd: /usr/local/emhttp/plugins/swapfile/scripts/rc.swapfile updatecfg true true /mnt/scratch swapfile UNRAID-SWAP 2048 60
Dec  5 18:26:42 Svalbard rc.swapfile[23332]: Plugin configuration written
Dec  5 18:26:48 Svalbard emhttpd: cmd: /usr/local/emhttp/plugins/swapfile/scripts/rc.swapfile start
Dec  5 18:26:48 Svalbard rc.swapfile[24495]: Swap file /mnt/scratch/swapfile is on a BTRFS file system but does not have the No_COW attribute.

 

How now brown cow....no cow? Found your post with the script, ran it, and sorted:
image.png.59b952598c44bc61c5083737d86c12f3.png
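
For anyone else hitting the No_COW message: btrfs wants the attribute set while the file is still empty, so the usual order is create, `chattr +C`, then allocate. A minimal sketch of that order plus a check of `lsattr` output follows. This is not the script referred to above, the helper names are made up, and nothing here is executed; it only builds the command list.

```python
def has_no_cow(lsattr_line: str) -> bool:
    """True if the attribute field of one `lsattr` output line contains 'C' (No_COW)."""
    if not lsattr_line.strip():
        return False
    flags = lsattr_line.split(maxsplit=1)[0]
    return "C" in flags  # case-sensitive: lowercase 'c' means compressed


def swapfile_commands(path: str, size_mib: int) -> list:
    """Commands, in order, for creating a swap file on btrfs (returned, not run)."""
    return [
        ["truncate", "-s", "0", path],               # create an empty file first
        ["chattr", "+C", path],                      # set No_COW while it is empty
        ["fallocate", "-l", f"{size_mib}MiB", path],  # then allocate the space
        ["chmod", "600", path],
        ["mkswap", path],
        ["swapon", path],
    ]


for cmd in swapfile_commands("/mnt/scratch/swapfile", 2048):
    print(" ".join(cmd))
```

The path and size are taken from the log above; adjust both for your own pool.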

 

Might be coincidence but after setting that up I got my first fault (not crash) in two days:

Dec  5 19:47:56 Svalbard kernel: Adding 4194300k swap on /mnt/scratch/swapfile.  Priority:-2 extents:11 across:130568864k 
Dec  5 19:51:23 Svalbard kernel: cgroup: fork rejected by pids controller in /docker/e7b0ac6467266b5fb595bca74d953c400e486175fd98e22fb74df13af3942211
Dec  5 19:54:54 Svalbard kernel: device_list[2339]: segfault at 0 ip 000000000093454b sp 00007ffeeeedd200 error 6 in php[600000+3b3000] likely on CPU 12 (core 24, socket 0)
Dec  5 19:54:54 Svalbard kernel: Code: 08 e9 0a ab ff ff e8 14 1b ff ff 41 ff 27 e8 8c 0d fe ff 41 ff 27 e8 14 0c fe ff 41 ff 27 e8 4c 1f ff ff 41 ff 27 49 83 c7 20 <83> 02 01 41 ff 27 e8 ea 17 fb ff e9 65 c7 ff ff e8 e0 17 fb ff e9
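
The "fork rejected by pids controller" line means a container hit its process/thread limit, and the long hex string is that container's ID. A minimal sketch for pulling the IDs out of the log, assuming the message format shown above (the function name is made up):

```python
import re

FORK_REJECTED = re.compile(
    r"cgroup: fork rejected by pids controller in /docker/(?P<cid>[0-9a-f]{64})"
)


def rejected_container_ids(lines):
    """Short (12-character) IDs of containers that hit their pids limit, in first-seen order."""
    seen = []
    for line in lines:
        m = FORK_REJECTED.search(line)
        if m:
            short = m.group("cid")[:12]
            if short not in seen:
                seen.append(short)
    return seen


SAMPLE = [
    "Dec  5 19:51:23 Svalbard kernel: cgroup: fork rejected by pids controller in "
    "/docker/e7b0ac6467266b5fb595bca74d953c400e486175fd98e22fb74df13af3942211",
]
print(rejected_container_ids(SAMPLE))
```

The short ID can then be matched against `docker ps`, or passed to `docker inspect --format '{{.Name}}' <id>`, to see which container it was. Whether raising that container's `--pids-limit` is the right fix is a separate question.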

 

Most of my dockers also run with these parameters to give them a RAM-based /tmp (a tmpfs mount rather than a true swap file), since I have 64GB to burn:

--mount type=tmpfs,target=/tmp,tmpfs-mode=1777,tmpfs-size=256M --log-driver none --no-healthcheck

 

Anyway, just shy of 48 hours with no crash.

 

New PC is mostly built, but still waiting on power supply (tomorrow one hopes) and 4 drives from ServerPartDeals (Tue/Wed). For now, still chipping away on the old one....

 

 

  • Community Expert

Looks like great progress, and hopefully it's fixed. IDK what changed with the btrfs image and ZFS over Unraid's evolution, but I'm glad that it's more stable than it was before.

? Are you using a USB hard disk in the ZFS pool or the disk array?
*This may be the underlying cause of the original kernel bugs:

Nov 27 16:02:13 Svalbard kernel: usb 2-9.2: LPM exit latency is zeroed, disabling LPM.
Nov 27 16:02:13 Svalbard kernel: usb 2-9.2: device firmware changed
Nov 27 16:02:13 Svalbard kernel: usb 2-9.2: USB disconnect, device number 5

*This is why Unraid doesn't want USB devices in a pool or the array: USB can randomly disconnect.

When using USB enclosures or USB/Thunderbolt-attached disks, it is best to use them with the Unassigned Devices plugin and not with the Unraid array/pool mechanics.

There are other syslinux boot options you could set for the powertop issues as well.
Look at the autotweak plugin.

You would be installing a third-party package outside of Unraid's control, so it is more use-at-your-own-risk, but not a problem.
*There are other plugins that may help there as well.
https://slackware.pkgs.org/15.0/slackware-x86_64/powertop-2.13-x86_64-3.txz.html

 

# Packages placed in /boot/extra are installed at boot
mkdir -p /boot/extra
cd /boot/extra
wget https://slackware.uk/slackware/slackware64-15.0/slackware64/ap/powertop-2.13-x86_64-3.txz
# Reboot to install


With Unraid 7 RC1 they implemented the ability to keep udev rules across reboots. I can see udev being used here to help with power or connection commands...

I usually disable USB sleep states (this is what causes the USB device to break...)
Syslinux options:

#Disable LPM Globally: If you're unsure about the specific device or want a global solution, you can try disabling LPM entirely
append initrd=/bzroot usbcore.autosuspend=-1 usbcore.quirks=0:k

 

*Per-device form: this needs the device's identifiers (from lsusb). The commands are examples...

append initrd=/bzroot usbcore.autosuspend=-1 usbcore.quirks=0x1234:0x5678:k

Replace 0x1234 and 0x5678 (used here as examples) with the Vendor ID and Product ID of the USB device experiencing issues. You can find these values using lsusb or in the Unraid logs.
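
As a rough illustration of turning lsusb output into that parameter, here is a minimal sketch. The lsusb lines are hypothetical and reuse the 1234:5678 placeholder from above, and the function names are made up. The kernel documents each quirk entry as VendorID:ProductID:Flags, with multiple entries separated by commas.

```python
import re

# lsusb prints lines like: "Bus 002 Device 006: ID 1234:5678 Some Vendor Some Device"
LSUSB_ID = re.compile(r"ID (?P<vid>[0-9a-fA-F]{4}):(?P<pid>[0-9a-fA-F]{4})")


def quirk_entries(lsusb_lines, flag="k"):
    """Build VID:PID:flag entries (k = no Link Power Management) without duplicates."""
    entries = []
    for line in lsusb_lines:
        m = LSUSB_ID.search(line)
        if m:
            entry = f"{m.group('vid').lower()}:{m.group('pid').lower()}:{flag}"
            if entry not in entries:
                entries.append(entry)
    return entries


def quirks_parameter(lsusb_lines, flag="k"):
    """Join the entries into one usbcore.quirks= boot parameter."""
    return "usbcore.quirks=" + ",".join(quirk_entries(lsusb_lines, flag))


# Hypothetical lsusb lines for two identical devices (placeholder IDs, not real ones)
SAMPLE = [
    "Bus 002 Device 005: ID 1234:5678 Example Vendor Example Accelerator",
    "Bus 002 Device 006: ID 1234:5678 Example Vendor Example Accelerator",
]
print(quirks_parameter(SAMPLE))
```

Pass in only the lines for the devices you actually want to exempt from LPM, not the whole lsusb listing.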


So go to Main > Flash and scroll down to Syslinux Configuration.

My recommended full Unraid syslinux configuration:
 

kernel /bzimage
append initrd=/bzroot default_hugepagesz=1G hugepagesz=1G transparent_hugepage=always acpi=force pci=nocrs usbcore.autosuspend=-1 usbcore.quirks=0:k libata.allow_tpm=1 nvme_core.default_ps_max_latency_us=5500 pci=noaer pcie_aspm=off intremap=no_x2apic_optout


 

Boot Parameters

Here’s what each parameter does:

 

default_hugepagesz=1G:

Sets the default hugepage size to 1 GB. Hugepages are used to allocate large chunks of memory, which can improve performance for applications requiring large memory segments, such as virtual machines or databases.

 

hugepagesz=1G:

Explicitly specifies that hugepages should use a size of 1 GB.

 

transparent_hugepage=always:

Enables transparent hugepages. This allows the kernel to automatically use hugepages for memory allocation when possible, which can improve performance for some workloads.

 

acpi=force:

Forces ACPI (Advanced Configuration and Power Interface) to be enabled, even if the hardware or BIOS indicates it should not be.

 

pci=nocrs:

Prevents the kernel from using PCI host bridge resource entries provided by the ACPI firmware. This can be useful to avoid issues with devices being misconfigured.

 

usbcore.autosuspend=-1:

Disables USB autosuspend. This can prevent USB devices from being put into low-power states, which may resolve issues with devices disconnecting or behaving erratically.

 

usbcore.quirks=0:k:

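Adds an entry to the USB core quirk list. The k flag marks a device as unable to handle Link Power Management (LPM), which can fix problems with devices that don't handle LPM well. The kernel documents each entry as VendorID:ProductID:Flags, so the 0:k shorthand used here is intended to disable LPM for all USB devices but may not be accepted; the per-device form shown earlier is the documented one.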

 

libata.allow_tpm=1:

Permits TPM (trusted computing) commands to be sent to ATA devices; this is off by default. It is mainly relevant for managing self-encrypting drives.

 

nvme_core.default_ps_max_latency_us=5500:

Sets the maximum power-saving latency for NVMe devices to 5500 microseconds. Reducing this value can prevent NVMe devices from entering deeper power-saving states that may cause delays or performance issues.

 

pci=noaer:

Disables Advanced Error Reporting (AER) on PCI devices. This prevents noisy error messages in the logs, especially with hardware that doesn't fully support AER.

 

pcie_aspm=off:

Disables PCI Express Active State Power Management (ASPM), which can prevent power management issues that affect device stability.

 

intremap=no_x2apic_optout:

Tells the kernel to ignore a BIOS request to opt out of x2APIC (an advanced interrupt controller mode), so interrupt remapping with x2APIC stays available. This can be important for stability in certain virtualization or hardware setups.
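
 

When merging a long append line like this with parameters you already have, it is easy to end up with the same key twice. A minimal sketch for splitting an append line into keys and values and listing the repeats follows. The function names are made up, and a repeated key is not necessarily wrong (pci= appears twice above on purpose), just worth a second look.

```python
from collections import defaultdict


def parse_append(line: str) -> dict:
    """Split a syslinux 'append' line into {key: [values]}; bare flags get [None]."""
    tokens = line.split()
    if tokens and tokens[0] == "append":
        tokens = tokens[1:]
    params = defaultdict(list)
    for tok in tokens:
        key, sep, value = tok.partition("=")  # split on the first '=' only
        params[key].append(value if sep else None)
    return dict(params)


def repeated_keys(params: dict) -> list:
    """Keys that occur more than once on the line."""
    return sorted(k for k, v in params.items() if len(v) > 1)


# The append line from the post above
APPEND = (
    "append initrd=/bzroot default_hugepagesz=1G hugepagesz=1G "
    "transparent_hugepage=always acpi=force pci=nocrs usbcore.autosuspend=-1 "
    "usbcore.quirks=0:k libata.allow_tpm=1 "
    "nvme_core.default_ps_max_latency_us=5500 pci=noaer pcie_aspm=off "
    "intremap=no_x2apic_optout"
)

params = parse_append(APPEND)
print(repeated_keys(params))
print(params["pci"])
```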

  • Author

Just popped on to say that between work today and other commitments not much progress has been made, but I do now have all the dockers running again except for Frigate (x2). They will come back tomorrow all going well as it is quick and easy to reactivate the containers.

 

I will update my boot parameters tomorrow (or Sunday) - some of these I have already, but there's some really good tweaks in that mix. Apparently there are "plans" for tomorrow which means we will not be home for most of the day so probably will not have a lot of time. So Sunday is a good day to update that, reinstall the additional NIC and relocate the server back to where it belongs.
 

On the USB front I have no USB disks (at all), but the USB messages relate to the coral.ai devices that Frigate uses to do object detection. The "firmware change" message is normal and happens when the device is activated, but the LPM stuff is probably not what we want. Hopefully some of the boot tweaks (or power tweaks) will help to stop some of the errant behaviour. I've also read somewhere that if you have two of them you should put them on separate USB busses.....so I will plug one into the USB3 port and another into a USB 3.2 or USB C port (which should be separate from default USB3).

 

We have now been up over three days with no crash....so I think the hardware is all perfectly fine. If we are still here this time tomorrow I will see what Frigate does....

 

🙂

  • Author

A few days have passed.....it is hard being patient....but everything was going well until this afternoon and then it crashed....I'm pretty certain that it is related to Frigate because everything was fine up until I restarted that. I'd noticed a lot of cache writes and so I stopped everything until I found the containers responsible for the writes...and just after restarting it crashed:

 

Dec  9 14:14:09 Svalbard kernel: eth0: renamed from veth1ee9a31
Dec  9 14:14:09 Svalbard kernel: python3[7289]: segfault at 1f00000049 ip 0000000000544235 sp 00007ffee9511880 error 4 in python3.9[41f000+288000] likely on CPU 12 (core 24, socket 0)
Dec  9 14:14:09 Svalbard kernel: Code: 3d d0 57 8f 00 0f 84 26 01 00 00 48 83 c4 08 5b 5d 41 5c 41 5d c3 0f 1f 80 00 00 00 00 4c 8b 4f 60 4d 85 c9 0f 84 81 01 00 00 <4f> 8b 2c 01 4c 39 eb 0f 84 e2 00 00 00 48 85 db 74 23 4d 85 ed 75
Dec  9 14:14:14 Svalbard kernel: veth1ee9a31: renamed from eth0
Dec  9 14:14:14 Svalbard kernel: eth0: renamed from vethc744450
Dec  9 14:14:33 Svalbard kernel: usb 2-9.1: reset SuperSpeed USB device number 6 using xhci_hcd
Dec  9 14:14:33 Svalbard kernel: usb 2-9.1: LPM exit latency is zeroed, disabling LPM.
Dec  9 14:19:13 Svalbard kernel: BUG: kernel NULL pointer dereference, address: 0000000000000038
Dec  9 14:19:13 Svalbard kernel: #PF: supervisor read access in kernel mode
Dec  9 14:19:13 Svalbard kernel: #PF: error_code(0x0000) - not-present page
Dec  9 14:19:13 Svalbard kernel: PGD 3c6c80067 P4D 3c6c80067 PUD 3d7958067 PMD 0 
Dec  9 14:19:13 Svalbard kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
Dec  9 14:19:13 Svalbard kernel: CPU: 12 PID: 26181 Comm: lsof Tainted: P           O       6.1.118-Unraid #1
Dec  9 14:19:13 Svalbard kernel: Hardware name: Gigabyte Technology Co., Ltd. Z790M AORUS ELITE AX/Z790M AORUS ELITE AX, BIOS F10 09/27/2024
Dec  9 14:19:13 Svalbard kernel: RIP: 0010:memcg_slab_free_hook+0x28/0xcf
Dec  9 14:19:13 Svalbard kernel: Code: cc cc 41 57 41 56 49 89 d6 41 55 41 54 55 48 89 f5 53 48 89 fb 48 83 ec 10 89 4c 24 0c e8 60 e1 ff ff 84 c0 0f 84 94 00 00 00 <4c> 8b 65 38 49 83 fc 03 0f 86 86 00 00 00 49 83 e4 fc 45 31 ed 41
Dec  9 14:19:13 Svalbard kernel: RSP: 0018:ffffc90090e5fe28 EFLAGS: 00010202
Dec  9 14:19:13 Svalbard kernel: RAX: 0000000000000001 RBX: ffff8881001dee00 RCX: 0000000000000001
Dec  9 14:19:13 Svalbard kernel: RDX: ffffc90090e5fe78 RSI: 0000000000000000 RDI: ffff8881001dee00
Dec  9 14:19:13 Svalbard kernel: RBP: 0000000000000000 R08: 0000000000004000 R09: ffffffff8125541e
Dec  9 14:19:13 Svalbard kernel: R10: 0000000000000000 R11: 0000000000000fe0 R12: 0000000000000000
Dec  9 14:19:13 Svalbard kernel: R13: 0000000000496870 R14: ffffc90090e5fe78 R15: 0000000000000002
Dec  9 14:19:13 Svalbard kernel: FS:  000014a1223e9e00(0000) GS:ffff88907f300000(0000) knlGS:0000000000000000
Dec  9 14:19:13 Svalbard kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec  9 14:19:13 Svalbard kernel: CR2: 0000000000000038 CR3: 000000027755a000 CR4: 0000000000750ee0
Dec  9 14:19:13 Svalbard kernel: PKRU: 55555554
Dec  9 14:19:13 Svalbard kernel: Call Trace:
Dec  9 14:19:13 Svalbard kernel: <TASK>
Dec  9 14:19:13 Svalbard kernel: ? __die_body+0x1a/0x5c
Dec  9 14:19:13 Svalbard kernel: ? page_fault_oops+0x329/0x376
Dec  9 14:19:13 Svalbard kernel: ? do_user_addr_fault+0x12e/0x465
Dec  9 14:19:13 Svalbard kernel: ? exc_page_fault+0xfb/0x11d
Dec  9 14:19:13 Svalbard kernel: ? asm_exc_page_fault+0x22/0x30
Dec  9 14:19:13 Svalbard kernel: ? user_path_at_empty+0x42/0x4f
Dec  9 14:19:13 Svalbard kernel: ? memcg_slab_free_hook+0x28/0xcf
Dec  9 14:19:13 Svalbard kernel: ? memcg_slab_free_hook+0x20/0xcf
Dec  9 14:19:13 Svalbard kernel: ? kmem_cache_alloc+0x122/0x14d
Dec  9 14:19:13 Svalbard kernel: kmem_cache_free+0xb7/0x154
Dec  9 14:19:13 Svalbard kernel: ? user_path_at_empty+0x42/0x4f
Dec  9 14:19:13 Svalbard kernel: user_path_at_empty+0x42/0x4f
Dec  9 14:19:13 Svalbard kernel: do_readlinkat+0x61/0x106
Dec  9 14:19:13 Svalbard kernel: __x64_sys_readlink+0x1a/0x21
Dec  9 14:19:13 Svalbard kernel: do_syscall_64+0x65/0x7b
Dec  9 14:19:13 Svalbard kernel: entry_SYSCALL_64_after_hwframe+0x6e/0xd8
Dec  9 14:19:13 Svalbard kernel: RIP: 0033:0x14a122677197
Dec  9 14:19:13 Svalbard kernel: Code: 73 01 c3 48 8b 0d 81 2c 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 59 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 51 2c 0e 00 f7 d8 64 89 02 48
Dec  9 14:19:13 Svalbard kernel: RSP: 002b:00007ffdd4212428 EFLAGS: 00000206 ORIG_RAX: 0000000000000059
Dec  9 14:19:13 Svalbard kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000014a122677197
Dec  9 14:19:13 Svalbard kernel: RDX: 0000000000001000 RSI: 00007ffdd42124a0 RDI: 0000000000496870
Dec  9 14:19:13 Svalbard kernel: RBP: 00007ffdd4212460 R08: 0000000000000064 R09: 0000000000000000
Dec  9 14:19:13 Svalbard kernel: R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
Dec  9 14:19:13 Svalbard kernel: R13: 00007ffdd4215b98 R14: 0000000000433dd0 R15: 000014a1227dc000
Dec  9 14:19:13 Svalbard kernel: </TASK>
Dec  9 14:19:13 Svalbard kernel: Modules linked in: vhost_net vhost kvm_intel kvm xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle vhost_iotlb wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha tun nft_compat nf_tables xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter bridge stp llc xfs md_mod tcp_diag inet_diag it87(O) hwmon_vid ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs macvtap macvlan tap igb r8169 realtek zfs(PO) i915 intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp zunicode(PO) zzstd(O) coretemp iosf_mbi drm_buddy ttm zlua(O) drm_display_helper btusb zavl(PO) icp(PO) drm_kms_helper btrtl btbcm btintel bluetooth drm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 sha256_ssse3
Dec  9 14:19:13 Svalbard kernel: sha1_ssse3 zcommon(PO) aesni_intel znvpair(PO) crypto_simd cryptd spl(O) rapl ecdh_generic mei_hdcp mei_pxp gigabyte_wmi wmi_bmof intel_cstate ecc intel_gtt i2c_algo_bit mpt3sas nvme i2c_i801 agpgart intel_uncore i2c_smbus mei_me ahci nvme_core i2c_core mei raid_class libahci scsi_transport_sas syscopyarea sysfillrect sysimgblt fb_sys_fops thermal fan tpm_crb video tpm_tis tpm_tis_core wmi tpm intel_pmc_core backlight acpi_pad acpi_tad button unix [last unloaded: kvm]
Dec  9 14:19:13 Svalbard kernel: CR2: 0000000000000038
Dec  9 14:19:13 Svalbard kernel: ---[ end trace 0000000000000000 ]---
Dec  9 14:19:13 Svalbard kernel: RIP: 0010:memcg_slab_free_hook+0x28/0xcf
Dec  9 14:19:13 Svalbard kernel: Code: cc cc 41 57 41 56 49 89 d6 41 55 41 54 55 48 89 f5 53 48 89 fb 48 83 ec 10 89 4c 24 0c e8 60 e1 ff ff 84 c0 0f 84 94 00 00 00 <4c> 8b 65 38 49 83 fc 03 0f 86 86 00 00 00 49 83 e4 fc 45 31 ed 41
Dec  9 14:19:13 Svalbard kernel: RSP: 0018:ffffc90090e5fe28 EFLAGS: 00010202
Dec  9 14:19:13 Svalbard kernel: RAX: 0000000000000001 RBX: ffff8881001dee00 RCX: 0000000000000001
Dec  9 14:19:13 Svalbard kernel: RDX: ffffc90090e5fe78 RSI: 0000000000000000 RDI: ffff8881001dee00
Dec  9 14:19:13 Svalbard kernel: RBP: 0000000000000000 R08: 0000000000004000 R09: ffffffff8125541e
Dec  9 14:19:13 Svalbard kernel: R10: 0000000000000000 R11: 0000000000000fe0 R12: 0000000000000000
Dec  9 14:19:13 Svalbard kernel: R13: 0000000000496870 R14: ffffc90090e5fe78 R15: 0000000000000002
Dec  9 14:19:13 Svalbard kernel: FS:  000014a1223e9e00(0000) GS:ffff88907f300000(0000) knlGS:0000000000000000
Dec  9 14:19:13 Svalbard kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec  9 14:19:13 Svalbard kernel: CR2: 0000000000000038 CR3: 000000027755a000 CR4: 0000000000750ee0
Dec  9 14:19:13 Svalbard kernel: PKRU: 55555554
Dec  9 14:19:13 Svalbard kernel: note: lsof[26181] exited with irqs disabled
Dec  9 14:23:37 Svalbard emhttpd: spinning down /dev/sdh
Dec  9 14:25:19 Svalbard emhttpd: spinning down /dev/sdb
Dec  9 14:25:40 Svalbard emhttpd: spinning down /dev/sdg
Dec  9 14:25:44 Svalbard emhttpd: spinning down /dev/sdj
Dec  9 14:29:26 Svalbard kernel: BUG: kernel NULL pointer dereference, address: 0000000000000038
Dec  9 14:29:26 Svalbard kernel: #PF: supervisor read access in kernel mode
Dec  9 14:29:26 Svalbard kernel: #PF: error_code(0x0000) - not-present page
Dec  9 14:29:26 Svalbard kernel: PGD 2cfb02067 P4D 2cfb02067 PUD 385ec1067 PMD 0 
Dec  9 14:29:26 Svalbard kernel: Oops: 0000 [#2] PREEMPT SMP NOPTI
Dec  9 14:29:26 Svalbard kernel: CPU: 12 PID: 336 Comm: lsof Tainted: P      D    O       6.1.118-Unraid #1
Dec  9 14:29:26 Svalbard kernel: Hardware name: Gigabyte Technology Co., Ltd. Z790M AORUS ELITE AX/Z790M AORUS ELITE AX, BIOS F10 09/27/2024
Dec  9 14:29:26 Svalbard kernel: RIP: 0010:memcg_slab_free_hook+0x28/0xcf
Dec  9 14:29:26 Svalbard kernel: Code: cc cc 41 57 41 56 49 89 d6 41 55 41 54 55 48 89 f5 53 48 89 fb 48 83 ec 10 89 4c 24 0c e8 60 e1 ff ff 84 c0 0f 84 94 00 00 00 <4c> 8b 65 38 49 83 fc 03 0f 86 86 00 00 00 49 83 e4 fc 45 31 ed 41
Dec  9 14:29:26 Svalbard kernel: RSP: 0018:ffffc900259bbdd0 EFLAGS: 00010202
Dec  9 14:29:26 Svalbard kernel: RAX: 0000000000000001 RBX: ffff8881001dee00 RCX: 0000000000000001
Dec  9 14:29:26 Svalbard kernel: RDX: ffffc900259bbe20 RSI: 0000000000000000 RDI: ffff8881001dee00
Dec  9 14:29:26 Svalbard kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffff8124c0c2
Dec  9 14:29:26 Svalbard kernel: R10: ffffc900259bbd20 R11: ffffc900259bbe94 R12: 0000000000000000
Dec  9 14:29:26 Svalbard kernel: R13: ffffc900259bbe90 R14: ffffc900259bbe20 R15: 0000000000000000
Dec  9 14:29:26 Svalbard kernel: FS:  00001514d185be00(0000) GS:ffff88907f300000(0000) knlGS:0000000000000000
Dec  9 14:29:26 Svalbard kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec  9 14:29:26 Svalbard kernel: CR2: 0000000000000038 CR3: 0000000303d1c000 CR4: 0000000000750ee0
Dec  9 14:29:26 Svalbard kernel: PKRU: 55555554
Dec  9 14:29:26 Svalbard kernel: Call Trace:
Dec  9 14:29:26 Svalbard kernel: <TASK>
Dec  9 14:29:26 Svalbard kernel: ? __die_body+0x1a/0x5c
Dec  9 14:29:26 Svalbard kernel: ? page_fault_oops+0x329/0x376
Dec  9 14:29:26 Svalbard kernel: ? do_user_addr_fault+0x12e/0x465
Dec  9 14:29:26 Svalbard kernel: ? exc_page_fault+0xfb/0x11d
Dec  9 14:29:26 Svalbard kernel: ? asm_exc_page_fault+0x22/0x30
Dec  9 14:29:26 Svalbard kernel: ? vfs_fstatat+0x52/0x62
Dec  9 14:29:26 Svalbard kernel: ? memcg_slab_free_hook+0x28/0xcf
Dec  9 14:29:26 Svalbard kernel: kmem_cache_free+0xb7/0x154
Dec  9 14:29:26 Svalbard kernel: ? vfs_fstatat+0x52/0x62
Dec  9 14:29:26 Svalbard kernel: vfs_fstatat+0x52/0x62
Dec  9 14:29:26 Svalbard kernel: __do_sys_newfstatat+0x26/0x5c
Dec  9 14:29:26 Svalbard kernel: do_syscall_64+0x65/0x7b
Dec  9 14:29:26 Svalbard kernel: entry_SYSCALL_64_after_hwframe+0x6e/0xd8
Dec  9 14:29:26 Svalbard kernel: RIP: 0033:0x1514d1ae71ca
Dec  9 14:29:26 Svalbard kernel: Code: 48 89 f2 b9 00 01 00 00 48 89 fe bf 9c ff ff ff e9 0b 00 00 00 66 2e 0f 1f 84 00 00 00 00 00 90 41 89 ca b8 06 01 00 00 0f 05 <3d> 00 f0 ff ff 77 07 31 c0 c3 0f 1f 40 00 48 8b 15 19 4c 0e 00 f7
Dec  9 14:29:26 Svalbard kernel: RSP: 002b:00007fff7debeb98 EFLAGS: 00000246 ORIG_RAX: 0000000000000106
Dec  9 14:29:26 Svalbard kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00001514d1ae71ca
Dec  9 14:29:26 Svalbard kernel: RDX: 00007fff7debecb0 RSI: 00007fff7debebc0 RDI: 00000000ffffff9c
Dec  9 14:29:26 Svalbard kernel: RBP: 00007fff7dec0e10 R08: 0000000000000073 R09: 0000000000000000
Dec  9 14:29:26 Svalbard kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
Dec  9 14:29:26 Svalbard kernel: R13: 00007fff7dec4548 R14: 0000000000433dd0 R15: 00001514d1c4e000
Dec  9 14:29:26 Svalbard kernel: </TASK>
Dec  9 14:29:26 Svalbard kernel: Modules linked in: vhost_net vhost kvm_intel kvm xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle vhost_iotlb wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha tun nft_compat nf_tables xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter bridge stp llc xfs md_mod tcp_diag inet_diag it87(O) hwmon_vid ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs macvtap macvlan tap igb r8169 realtek zfs(PO) i915 intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp zunicode(PO) zzstd(O) coretemp iosf_mbi drm_buddy ttm zlua(O) drm_display_helper btusb zavl(PO) icp(PO) drm_kms_helper btrtl btbcm btintel bluetooth drm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 sha256_ssse3
Dec  9 14:29:26 Svalbard kernel: sha1_ssse3 zcommon(PO) aesni_intel znvpair(PO) crypto_simd cryptd spl(O) rapl ecdh_generic mei_hdcp mei_pxp gigabyte_wmi wmi_bmof intel_cstate ecc intel_gtt i2c_algo_bit mpt3sas nvme i2c_i801 agpgart intel_uncore i2c_smbus mei_me ahci nvme_core i2c_core mei raid_class libahci scsi_transport_sas syscopyarea sysfillrect sysimgblt fb_sys_fops thermal fan tpm_crb video tpm_tis tpm_tis_core wmi tpm intel_pmc_core backlight acpi_pad acpi_tad button unix [last unloaded: kvm]
Dec  9 14:29:26 Svalbard kernel: CR2: 0000000000000038
Dec  9 14:29:26 Svalbard kernel: ---[ end trace 0000000000000000 ]---
Dec  9 14:29:26 Svalbard kernel: RIP: 0010:memcg_slab_free_hook+0x28/0xcf
Dec  9 14:29:26 Svalbard kernel: Code: cc cc 41 57 41 56 49 89 d6 41 55 41 54 55 48 89 f5 53 48 89 fb 48 83 ec 10 89 4c 24 0c e8 60 e1 ff ff 84 c0 0f 84 94 00 00 00 <4c> 8b 65 38 49 83 fc 03 0f 86 86 00 00 00 49 83 e4 fc 45 31 ed 41
Dec  9 14:29:26 Svalbard kernel: RSP: 0018:ffffc90090e5fe28 EFLAGS: 00010202
Dec  9 14:29:26 Svalbard kernel: RAX: 0000000000000001 RBX: ffff8881001dee00 RCX: 0000000000000001
Dec  9 14:29:26 Svalbard kernel: RDX: ffffc90090e5fe78 RSI: 0000000000000000 RDI: ffff8881001dee00
Dec  9 14:29:26 Svalbard kernel: RBP: 0000000000000000 R08: 0000000000004000 R09: ffffffff8125541e
Dec  9 14:29:26 Svalbard kernel: R10: 0000000000000000 R11: 0000000000000fe0 R12: 0000000000000000
Dec  9 14:29:26 Svalbard kernel: R13: 0000000000496870 R14: ffffc90090e5fe78 R15: 0000000000000002
Dec  9 14:29:26 Svalbard kernel: FS:  00001514d185be00(0000) GS:ffff88907f300000(0000) knlGS:0000000000000000
Dec  9 14:29:26 Svalbard kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec  9 14:29:26 Svalbard kernel: CR2: 0000000000000038 CR3: 0000000303d1c000 CR4: 0000000000750ee0
Dec  9 14:29:26 Svalbard kernel: PKRU: 55555554
Dec  9 14:29:26 Svalbard kernel: note: lsof[336] exited with irqs disabled

 

My second Unraid is now built, the new disks are pre-clearing / testing so soon that will be up and running.

 

In the meantime I have to take a side trip to see if I can figure out why Frigate is writing so much to the cache - it certainly isn't the share - that's pointed to a single disk pool with no caching.  So it is some sort of appdata activity.....

 

 

  • Community Expert

Sometimes a docker may have hidden settings that map to other locations by default.

which one are you running?

image.png.d1c0c6b80f147242999a6a2dbf16ea9a.png

 

This docker has some other warnings as well; check the support thread in case it's misconfigured.

image.png.b4a7e2cd70269f99dd5945810927a363.png

 

Also, which variant? Stable?
image.png.9770ccba7cf4ede49dcf8776bd7ae52e.png

 

This one did have some hidden settings... but none that would point to another write source:
image.thumb.png.96532c54f4f35dc1b06aa2836cbdafc3.png

  • Author

I am running two copies of the default repository:
image.png.3f1e5775aab5f8b902448bcff16c36b0.png

 

And I'm pointing to ghcr.io/blakeblackshear/frigate:stable.  I'm running coral.ai via USB so I have the containers as privileged - everything works just fine, but during a parity sync if Frigate is running it will crash. 

 

I had a hard crash last night (Unraid dropped off the network) and so on restart it started a parity check....which crashed in the usual way at 6.5% progress. 

 

I have gradually been reinstating my old settings (second NIC with VLANs) but the crashes are still doing my head in....everything is fine and then bang - Unraid kernel bug and we're dead in the water.

 

There's an outside chance that the cause is running a pair of Frigate instances....but I can't see why that would cause parity-check problems when the Frigate data paths are nowhere near the array itself.

 

It just makes ZERO sense......

 

  • Community Expert

Are you passing the Coral USB to one of the dockers? (How? With --device?) I think it may be a limitation of the Coral USB being called by 2 different docker instances...

Coral Detector:

https://coral.ai/products/accelerator/

 

Does the crash happen with the USB connected?

Edited by bmartino1

  • Author

Yes...the crash happens with the TPU connected and used (with Frigate not running there is no crash, but the TPUs are still connected)....but there are two TPUs and the Frigate instances just take one each.....

 

Even if that was a cause....how does that then relate to an Unraid kernel crash? I can see how that might cause a problem in the dockers if they were fighting for it. I followed a guide (from somewhere) to set it up.

 

In the first instance it is configured like this:

detectors:
  coral:
    type: edgetpu
    device: usb:0

 

In the second the device is set to usb:1.

 

With one Frigate running only one TPU is active (can tell by the flashing light). When a second instance is started the other TPU comes on line and both start flashing. 

 

We might be getting somewhere though as this is definitely Frigate related one way or another I think.... 

 

I will dig some more, and I can (once the other box is ready) move a Frigate instance onto another platform so there are no longer dual TPUs.....

  • Community Expert

OK, I assume you renamed them to have separate templates.
image.png.fe19f48498f423a85212cd9acf700f74.png

That looks like the YAML inside Frigate. What I'm asking is how you are passing the TPU into Frigate.

For example, I pass this USB device from Unraid into my Plex for tuner operations:
image.png.a4090cfcdcde7cb6d4815558ea0b3b66.png

 

lsblk
Since there are 2, you will need to be device-specific...
Example:
--device=/dev/bus/usb/001/002
?

This is a Docker host/container issue. If multiple Docker containers attempt to access the same USB device simultaneously, it could indeed cause conflicts or errors, as these devices are typically not designed for concurrent access by multiple clients. You would need to manage access carefully, possibly by coordinating access through software or limiting the device to one container at a time.

https://coral.ai/docs/accelerator/get-started/

 

Or are you using the default template add-on:
image.thumb.png.013be3706931ad52bd86ba63506a51c0.png

Per recent docker support on that one they recommend a fresh template download.

 

On 10/21/2020 at 2:31 AM, yayitazale said:

image.png.27f30d24af87bac720b628bb9ad305ff.png

 

 

IMPORTANT PLEASE

UPDATE V0.14.0 IS A BREAKING CHANGE:

 

It is recommended to uninstall the current app and reinstall it to meet the needs of the new template.

 

PLEASE READ THE CHANGELOG TO UPDATE YOUR CONFIG FILE

 

 

Support for Frigate docker container. Uses OpenCV and Tensorflow to perform realtime object detection locally for IP cameras. Designed for integration with HomeAssistant or others via MQTT.

 

Application Name: Frigate
Application Site: https://github.com/blakeblackshear/frigate

Docker Hub: https://hub.docker.com/r/blakeblackshear/frigate/
Github: https://github.com/blakeblackshear/frigate

Documentation: https://docs.frigate.video/

 

This container is only for AMD64 architecture CPUs (Intel/AMD) and its intended use is with a Coral Edge TPU accelerator to reduce the CPU usage.

 

Make sure to look at the complete documentation available on Github! Any question about the usage of the app and runtime errors please use the github issue page.

 

PS: To use an M.2 or PCI Coral instead of a USB Edge TPU, install the drivers easily thanks to @ich777 by going to CA Apps and installing the 'Coral Accelerator Module Driver' app. To use an Nvidia dedicated graphics card, install the 'Nvidia-Driver' plugin from CA Apps.

 

 

  • Author

Yup - two different names for the templates - like frigate-baker and frigate-jones.

 

Here are my two TPU instances:

image.png.2c2e105da666b87dcb2cf1b6e50ccf38.png

 

On the templates I'm using the default /dev/bus/usb notation for both. 

 

I can update this to /dev/bus/usb/002/004 and 005 respectively in the docker configuration and it will start fine. In theory I suppose this hard-codes the TPU (based on USB location) to the Frigate instance (so also no more changing ports when unplugging / replugging the devices).

 

I cannot set this in the application configuration as the system fails to start successfully. The default in any case is also just USB...the usb:0 and usb:1 I believe are to keep them separate (somehow). 

 

I have the latest template (I think) as I only really started building this in August, when this had only just been released - certainly I can't see any obvious differences so probably OK.

 

[Also, which of your donate options results in the most cash actually getting to you?]

  • Community Expert

PM about donation stuff.

I think to help fix some issues you may need to do a dedicated device per docker.

/dev/bus/usb/002/004

and 

 

/dev/bus/usb/002/005

This may change the internal numbering, as each Frigate would then only see usb:0.
You would have to console into Frigate and use lsblk to confirm, or cd to /dev/bus/usb/ and type ls and go from there to see if only one device is present per container.

  • Author

Hmmmm....the lsblk gives me nothing - just disk mounts.....but the ls on the /dev/bus/usb yields the same result on both containers:

# cd /dev/bus/usb
# ls
001  002

 

If both claim it then it seems that sharing isn't possible after all? Or is there some other way of preventing containers from helping themselves?

 

A bit more digging - the 001 and 002 are the two usb busses....inside each there are more 001, 002, 003 etc....  so this isn't definitive - it seems the container can see everything.

 

I've pondered using the docker compose manager approach....but interestingly, when the machine rebooted after this afternoon's crash the USB addresses had actually moved even though the USB devices have *not* been moved to different ports. Compare this to the previous screenshot:

image.png.ba8d146785ef73fa594a61ac8b5adde6.png

 

So assigning a specific address is a bust as they keep moving.
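
One possible way around the moving addresses is to look the device up by its port path rather than its bus/device number. This is only a minimal sketch: it assumes the port path (the `bus-port.port` form the kernel log prints, with `2-9.1` used below purely as an example value) usually stays the same while the device stays in the same physical socket, and it reads the standard sysfs files `busnum` and `devnum`. The function name and the `sysfs_root` parameter are made up for illustration.

```python
from pathlib import Path


def device_node_for_port(port: str, sysfs_root: str = "/sys/bus/usb/devices") -> str:
    """Return the /dev/bus/usb/BBB/DDD node of whatever sits on a USB port path.

    `port` is a kernel port path such as "2-9.1" (example value only).
    Raises FileNotFoundError if nothing is currently enumerated on that port.
    """
    dev_dir = Path(sysfs_root) / port
    # sysfs exposes the current bus and device numbers as small text files.
    busnum = int((dev_dir / "busnum").read_text().strip())
    devnum = int((dev_dir / "devnum").read_text().strip())
    # /dev/bus/usb uses zero-padded three-digit directory and file names.
    return f"/dev/bus/usb/{busnum:03d}/{devnum:03d}"
```

Calling `device_node_for_port("2-9.1")` on the host just before starting the container would give the path to put after `--device=`; it does not make the mapping survive a replug while the container is running.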

Edited by ChirpyTurnip
Updated details

  • Author

Righto. So one crash later and a new plan....fooled around for a while to see if I could pin the coral devices, but no. Might be possible, but definitely too much hassle. So the instance with the fewest cameras has had the coral removed and must now use CPU-based detection. So that leaves me no shared USB device as *only* one instance will be using it.

 

And now we wait....hopefully a long time....I'll be a bit disappointed if it dies again within an hour or so....

 

 

  • Author

It died as well - I saw it when I woke up in the middle of the night. So I removed one of the Frigate instances, left the coral.ai units plugged in, shut it down (again it didn't actually power down), forced it off, and powered it back on.

 

So now running with just one instance of Frigate, as privileged, with TPU detection enabled, and now we wait again....

 

I did see now in reviewing the logs some odd USB behaviour during the boot (look for usb 2-9.1):

Dec 11 02:28:41 Svalbard kernel: IPMI message handler: version 39.2
Dec 11 02:28:41 Svalbard kernel: Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
Dec 11 02:28:41 Svalbard kernel: Freeing initrd memory: 30324K
Dec 11 02:28:41 Svalbard kernel: lp: driver loaded but no devices found
Dec 11 02:28:41 Svalbard kernel: hpet_acpi_add: no address or irqs in _CRS
Dec 11 02:28:41 Svalbard kernel: Hangcheck: starting hangcheck timer 0.9.1 (tick is 180 seconds, margin is 60 seconds).
Dec 11 02:28:41 Svalbard kernel: AMD-Vi: AMD IOMMUv2 functionality not available on this system - This is not a bug.
Dec 11 02:28:41 Svalbard kernel: Floppy drive(s): fd1 is 1.2M
Dec 11 02:28:41 Svalbard kernel: loop: module loaded
Dec 11 02:28:41 Svalbard kernel: Rounding down aligned max_sectors from 4294967295 to 4294967288
Dec 11 02:28:41 Svalbard kernel: db_root: cannot open: /etc/target
Dec 11 02:28:41 Svalbard kernel: VFIO - User Level meta-driver version: 0.3
Dec 11 02:28:41 Svalbard kernel: xhci_hcd 0000:00:14.0: xHCI Host Controller
Dec 11 02:28:41 Svalbard kernel: xhci_hcd 0000:00:14.0: new USB bus registered, assigned bus number 1
Dec 11 02:28:41 Svalbard kernel: xhci_hcd 0000:00:14.0: hcc params 0x20007fc1 hci version 0x120 quirks 0x0000000200009810
Dec 11 02:28:41 Svalbard kernel: xhci_hcd 0000:00:14.0: xHCI Host Controller
Dec 11 02:28:41 Svalbard kernel: xhci_hcd 0000:00:14.0: new USB bus registered, assigned bus number 2
Dec 11 02:28:41 Svalbard kernel: xhci_hcd 0000:00:14.0: Host supports USB 3.2 Enhanced SuperSpeed
Dec 11 02:28:41 Svalbard kernel: hub 1-0:1.0: USB hub found
Dec 11 02:28:41 Svalbard kernel: hub 1-0:1.0: 16 ports detected
Dec 11 02:28:41 Svalbard kernel: hub 2-0:1.0: USB hub found
Dec 11 02:28:41 Svalbard kernel: hub 2-0:1.0: 9 ports detected
Dec 11 02:28:41 Svalbard kernel: usbcore: registered new interface driver usb-storage
Dec 11 02:28:41 Svalbard kernel: i8042: PNP: No PS/2 controller found.
Dec 11 02:28:41 Svalbard kernel: mousedev: PS/2 mouse device common for all mice
Dec 11 02:28:41 Svalbard kernel: usbcore: registered new interface driver synaptics_usb
Dec 11 02:28:41 Svalbard kernel: input: PC Speaker as /devices/platform/pcspkr/input/input0
Dec 11 02:28:41 Svalbard kernel: rtc_cmos rtc_cmos: RTC can wake from S4
Dec 11 02:28:41 Svalbard kernel: rtc_cmos rtc_cmos: registered as rtc0
Dec 11 02:28:41 Svalbard kernel: rtc_cmos rtc_cmos: setting system clock to 2024-12-10T13:27:52 UTC (1733837272)
Dec 11 02:28:41 Svalbard kernel: rtc_cmos rtc_cmos: alarms up to one month, y3k, 114 bytes nvram
Dec 11 02:28:41 Svalbard kernel: intel_pstate: Intel P-state driver initializing
Dec 11 02:28:41 Svalbard kernel: intel_pstate: HWP enabled
Dec 11 02:28:41 Svalbard kernel: pstore: Registered efi as persistent store backend
Dec 11 02:28:41 Svalbard kernel: hid: raw HID events driver (C) Jiri Kosina
Dec 11 02:28:41 Svalbard kernel: usbcore: registered new interface driver usbhid
Dec 11 02:28:41 Svalbard kernel: usbhid: USB HID core driver
Dec 11 02:28:41 Svalbard kernel: ipip: IPv4 and MPLS over IPv4 tunneling driver
Dec 11 02:28:41 Svalbard kernel: NET: Registered PF_INET6 protocol family
Dec 11 02:28:41 Svalbard kernel: Segment Routing with IPv6
Dec 11 02:28:41 Svalbard kernel: RPL Segment Routing with IPv6
Dec 11 02:28:41 Svalbard kernel: In-situ OAM (IOAM) with IPv6
Dec 11 02:28:41 Svalbard kernel: 9pnet: Installing 9P2000 support
Dec 11 02:28:41 Svalbard kernel: microcode: sig=0xb0671, pf=0x2, revision=0x12b
Dec 11 02:28:41 Svalbard kernel: microcode: Microcode Update Driver: v2.2.
Dec 11 02:28:41 Svalbard kernel: IPI shorthand broadcast: enabled
Dec 11 02:28:41 Svalbard kernel: sched_clock: Marking stable (2528000652, 6582841)->(2556612702, -22029209)
Dec 11 02:28:41 Svalbard kernel: registered taskstats version 1
Dec 11 02:28:41 Svalbard kernel: Btrfs loaded, crc32c=crc32c-generic, zoned=no, fsverity=no
Dec 11 02:28:41 Svalbard kernel: pstore: Using crash dump compression: deflate
Dec 11 02:28:41 Svalbard kernel: clk: Disabling unused clocks
Dec 11 02:28:41 Svalbard kernel: usb 1-5: new high-speed USB device number 2 using xhci_hcd
Dec 11 02:28:41 Svalbard kernel: hub 1-5:1.0: USB hub found
Dec 11 02:28:41 Svalbard kernel: hub 1-5:1.0: 4 ports detected
Dec 11 02:28:41 Svalbard kernel: usb 2-8: new SuperSpeed USB device number 2 using xhci_hcd
Dec 11 02:28:41 Svalbard kernel: hub 2-8:1.0: USB hub found
Dec 11 02:28:41 Svalbard kernel: hub 2-8:1.0: 4 ports detected
Dec 11 02:28:41 Svalbard kernel: usb 1-6: new high-speed USB device number 3 using xhci_hcd
Dec 11 02:28:41 Svalbard kernel: hub 1-6:1.0: USB hub found
Dec 11 02:28:41 Svalbard kernel: hub 1-6:1.0: 4 ports detected
Dec 11 02:28:41 Svalbard kernel: usb 2-9: new SuperSpeed USB device number 3 using xhci_hcd
Dec 11 02:28:41 Svalbard kernel: hub 2-9:1.0: USB hub found
Dec 11 02:28:41 Svalbard kernel: hub 2-9:1.0: 4 ports detected
Dec 11 02:28:41 Svalbard kernel: usb 1-9: new high-speed USB device number 4 using xhci_hcd
Dec 11 02:28:41 Svalbard kernel: hub 1-9:1.0: USB hub found
Dec 11 02:28:41 Svalbard kernel: hub 1-9:1.0: 4 ports detected
Dec 11 02:28:41 Svalbard kernel: usb 1-6.1: new low-speed USB device number 5 using xhci_hcd
Dec 11 02:28:41 Svalbard kernel: hid-generic 0003:0665:5161.0001: hiddev96,hidraw0: USB HID v1.00 Device [INNO TECH USB to Serial] on usb-0000:00:14.0-6.1/input0
Dec 11 02:28:41 Svalbard kernel: floppy0: no floppy controllers found
Dec 11 02:28:41 Svalbard kernel: Freeing unused kernel image (initmem) memory: 1884K
Dec 11 02:28:41 Svalbard kernel: Write protecting the kernel read-only data: 18432k
Dec 11 02:28:41 Svalbard kernel: Freeing unused kernel image (text/rodata gap) memory: 2040K
Dec 11 02:28:41 Svalbard kernel: Freeing unused kernel image (rodata/data gap) memory: 140K
Dec 11 02:28:41 Svalbard kernel: rodata_test: all tests were successful
Dec 11 02:28:41 Svalbard kernel: Run /init as init process
Dec 11 02:28:41 Svalbard kernel:  with arguments:
Dec 11 02:28:41 Svalbard kernel:    /init
Dec 11 02:28:41 Svalbard kernel:  with environment:
Dec 11 02:28:41 Svalbard kernel:    HOME=/
Dec 11 02:28:41 Svalbard kernel:    TERM=linux
Dec 11 02:28:41 Svalbard kernel:    BOOT_IMAGE=/bzimage
Dec 11 02:28:41 Svalbard kernel: xhci_hcd 0000:00:14.0: Timeout while waiting for setup device command
Dec 11 02:28:41 Svalbard kernel: xhci_hcd 0000:00:14.0: Timeout while waiting for setup device command
Dec 11 02:28:41 Svalbard kernel: usb 2-9.1: device not accepting address 4, error -62
Dec 11 02:28:41 Svalbard kernel: xhci_hcd 0000:00:14.0: Timeout while waiting for setup device command
Dec 11 02:28:41 Svalbard kernel: xhci_hcd 0000:00:14.0: Timeout while waiting for setup device command
Dec 11 02:28:41 Svalbard kernel: usb 2-9.1: device not accepting address 5, error -62
Dec 11 02:28:41 Svalbard kernel: usb 2-9-port1: attempt power cycle
Dec 11 02:28:41 Svalbard kernel: usb 1-10: new high-speed USB device number 6 using xhci_hcd
Dec 11 02:28:41 Svalbard kernel: hub 1-10:1.0: USB hub found
Dec 11 02:28:41 Svalbard kernel: hub 1-10:1.0: 4 ports detected
Dec 11 02:28:41 Svalbard kernel: usb 1-6.3: new high-speed USB device number 7 using xhci_hcd
Dec 11 02:28:41 Svalbard kernel: usb-storage 1-6.3:1.0: USB Mass Storage device detected
Dec 11 02:28:41 Svalbard kernel: scsi host0: usb-storage 1-6.3:1.0
Dec 11 02:28:41 Svalbard kernel: usb 1-11: new full-speed USB device number 8 using xhci_hcd
Dec 11 02:28:41 Svalbard kernel: hid-generic 0003:048D:5702.0002: hiddev97,hidraw1: USB HID v1.12 Device [ITE Tech. Inc. ITE Device] on usb-0000:00:14.0-11/input0
Dec 11 02:28:41 Svalbard kernel: scsi 0:0:0:0: Direct-Access     Kingston DataTraveler 3.0 PMAP PQ: 0 ANSI: 6
Dec 11 02:28:41 Svalbard kernel: sd 0:0:0:0: Attached scsi generic sg0 type 0
Dec 11 02:28:41 Svalbard kernel: sd 0:0:0:0: [sda] 121110528 512-byte logical blocks: (62.0 GB/57.8 GiB)
Dec 11 02:28:41 Svalbard kernel: sd 0:0:0:0: [sda] Write Protect is off
Dec 11 02:28:41 Svalbard kernel: sd 0:0:0:0: [sda] Mode Sense: 45 00 00 00
Dec 11 02:28:41 Svalbard kernel: sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
Dec 11 02:28:41 Svalbard kernel: sda: sda1
Dec 11 02:28:41 Svalbard kernel: sd 0:0:0:0: [sda] Attached SCSI removable disk
Dec 11 02:28:41 Svalbard kernel: random: crng init done
Dec 11 02:28:41 Svalbard kernel: xhci_hcd 0000:00:14.0: Timeout while waiting for setup device command
Dec 11 02:28:41 Svalbard kernel: loop0: detected capacity change from 0 to 130016
Dec 11 02:28:41 Svalbard kernel: xhci_hcd 0000:00:14.0: Timeout while waiting for setup device command
Dec 11 02:28:41 Svalbard kernel: usb 2-9.1: device not accepting address 6, error -62
Dec 11 02:28:41 Svalbard kernel: loop1: detected capacity change from 0 to 713824
Dec 11 02:28:41 Svalbard kernel: NET: Registered PF_UNIX/PF_LOCAL protocol family
Dec 11 02:28:41 Svalbard kernel: xhci_hcd 0000:00:14.0: Timeout while waiting for setup device command
Dec 11 02:28:41 Svalbard kernel: input: Sleep Button as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0E:00/input/input1
Dec 11 02:28:41 Svalbard kernel: ACPI: button: Sleep Button [SLPB]
Dec 11 02:28:41 Svalbard kernel: input: Power Button as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0C:00/input/input2
Dec 11 02:28:41 Svalbard kernel: ACPI: button: Power Button [PWRB]
Dec 11 02:28:41 Svalbard kernel: input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input3
Dec 11 02:28:41 Svalbard kernel: intel_pmc_core INT33A1:00:  initialized
Dec 11 02:28:41 Svalbard kernel: ACPI: button: Power Button [PWRF]
Dec 11 02:28:41 Svalbard kernel: ahci 0000:00:17.0: version 3.0
Dec 11 02:28:41 Svalbard kernel: ahci 0000:00:17.0: AHCI 0001.0301 32 slots 8 ports 6 Gbps 0xff impl SATA mode
Dec 11 02:28:41 Svalbard kernel: ahci 0000:00:17.0: flags: 64bit ncq sntf led clo only pio slum part ems deso sadm sds 
Dec 11 02:28:41 Svalbard kernel: mei_me 0000:00:16.0: enabling device (0000 -> 0002)

The full boot log is attached....
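
For anyone wanting to pull the same errors out of a full syslog, here is a rough sketch that counts the "device not accepting address" lines per USB port path. The function name is made up; the message format is taken from the log lines above. On Linux, error -62 is ETIME (timer expired), which fits the "Timeout while waiting for setup device command" messages next to it.

```python
import re
from collections import Counter

# Matches e.g. "usb 2-9.1: device not accepting address 4, error -62"
PATTERN = re.compile(r"usb (\S+): device not accepting address (\d+), error (-\d+)")


def enumeration_failures(lines):
    """Count failed address assignments per USB port path in kernel log lines."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1  # key is the port path, e.g. "2-9.1"
    return counts
```

Usage would be something like `enumeration_failures(open("unraid_boot.log"))`, which returns a `Counter` keyed by port path.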

 

With these errors however something has gone wrong with one of the coral.ai devices as now there is only one "google" device showing:

image.thumb.png.2d2cc5e78f2dc1eccbc622831ec9eb6b.png

 

So it seems one of the units definitely rejected the address supplied and stayed off line. If I unplug the device that isn't being used and plug it back in it shows up in the system device list now as a completely different device:

image.png.e59a36201cce8074440bb6fa59a87c86.png

 

If I plug it into another port I also get the same weird result:

image.png.5eda76c67342a7f2fdb06693240b2883.png

 

So for now I've left it unplugged. Parity check is on 16%, which is again higher than it has managed in the last day....

 

And now. We. Wait. Again.

 

unraid_boot.log

  • Community Expert

Definitely looks like the device at USB 2-9.1 didn't do well with a power cycle and may need to be unplugged and replugged to work again.
*BIOS option for USB suspend/power/sleep states?

My recommendation is to click Remove on that device option in the template and use Extra Parameters to add the device instead.

image.png.2074fa1d7815404193d7e09b51f8351c.png

 

This will delete that from the template

In the extra parameter, add the device via 

--device=/dev/bus/usb/001

*Select the correct parent path for the USB device.

 

You may need to do device attach and pathing for the container. Example for a Plex NVIDIA GPU:
 

--device=/dev/dri:/dev/dri

Unraid host /dev : Container /dev

So /dev/dri on the host (the GPU device path) is written first, then ":", then the path the container sees, which here is also /dev/dri.

*sometimes the driver and runtime docker options are not enough...
 

Either use the extra parameter or the template option. I think you may have had both, or didn't set the TPU mapping in the template correctly when trying to separate them, as the mapping in the picture would pass all USB devices, the same as:
devices:
- "/dev/bus/usb:/dev/bus/usb" # Mount the entire USB bus

*Since Unraid is USB-based, when that side of the kernel crashes, you get these kernel errors.
Try moving the USB flash drive to another USB port (Unraid prefers USB 2.0 ports), as it may be sharing the bus with the TPUs. I recommend going for an internal motherboard header and attaching the flash drive there to separate it from that bus:
https://a.co/d/idjePjN

I honestly prefer the /dev/serial/by-id option:
the same --device path, but via /dev/serial/by-id
 

ls /dev/serial/by-id

and passing it via ID to make sure to grab that one device and only that device... I usually see USB docker passthrough done this way with Zigbee/Z-Wave in Home Assistant.


Review:


I have to take another look at Unraid and the kernel options to find the other USB quirks and options to disable selective power and sleep states.
Essentially I'm looking for the Linux equivalent of this option in the Windows advanced power options:
image.png.70b5f20abf21750a09cb591777c70a9e.png

 

So the syslinux/grub parameter needs to be set to "-1":
 

usbcore.autosuspend=-1

 

In the context of the usbcore.autosuspend kernel parameter, the value you assign controls the default autosuspend delay for all USB devices. Here’s what the values mean:

-1: Autosuspend is disabled for all USB devices. This means the devices won't enter the power-saving mode automatically.

0 or positive integers (e.g., 1, 2, etc.): These values set the delay in seconds before a USB device is autosuspended after it becomes idle.

So, if you set usbcore.autosuspend=0, it means autosuspend is enabled with no delay—devices can suspend immediately when they become idle. Setting it to -1 completely disables the autosuspend feature, keeping the devices powered all the time, similar to the "disabled" setting for USB selective suspend in Windows.
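
To confirm what the running kernel actually picked up, the current value can be read back from sysfs. This is a small sketch, assuming the standard path `/sys/module/usbcore/parameters/autosuspend`; the function names are made up and the wording of the returned strings is just for illustration.

```python
from pathlib import Path


def describe_autosuspend(raw: str) -> str:
    """Interpret the text of the usbcore autosuspend parameter."""
    value = int(raw.strip())
    if value < 0:
        return "autosuspend disabled"
    if value == 0:
        return "autosuspend enabled, no delay"
    return f"autosuspend after {value}s idle"


def read_autosuspend(path: str = "/sys/module/usbcore/parameters/autosuspend") -> str:
    """Read the live default from sysfs and describe it."""
    return describe_autosuspend(Path(path).read_text())
```

With `usbcore.autosuspend=-1` on the boot line, `read_autosuspend()` should report that autosuspend is disabled; anything else suggests the boot option was not applied.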

  • Author

And.....we're dead. A hard crash this time. Completely fell off the network again. The last syslog entry offers a potential clue:

Dec 11 08:25:08 Svalbard kernel: veth6c58b38: renamed from eth0
Dec 11 08:25:10 Svalbard kernel: eth0: renamed from veth970b4eb

 

Anyway...where to from here:

  1. Power saving in the BIOS is disabled (as best I can tell)
  2. I'm already running usbcore.autosuspend=-1 as a boot option, so that's not a fix
  3. The Unraid flash is on a USB2 bus, so separate from USB3 (where the TPU is). Aside from the UPS, the (now single) TPU, and the boot flash there are no other USB devices.  
  4. I've tried the TPU mapping as both /dev/bus/usb and also as /dev/bus/usb/002/004 but the bus numbering keeps changing so it's not a static setting. 
  5. ls /dev/serial/by-id returns nothing as there is no /dev/serial path

  6. lsusb -t returns:
     

    /:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/9p, 20000M/x2
        |__ Port 8: Dev 2, If 0, Class=Hub, Driver=hub/4p, 5000M
        |__ Port 9: Dev 3, If 0, Class=Hub, Driver=hub/4p, 5000M
            |__ Port 1: Dev 5, If 0, Class=Vendor Specific Class, Driver=usbfs, 5000M
    /:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/16p, 480M
        |__ Port 5: Dev 2, If 0, Class=Hub, Driver=hub/4p, 480M
        |__ Port 6: Dev 3, If 0, Class=Hub, Driver=hub/4p, 480M
            |__ Port 3: Dev 5, If 0, Class=Mass Storage, Driver=usb-storage, 480M
        |__ Port 9: Dev 4, If 0, Class=Hub, Driver=hub/4p, 480M
        |__ Port 10: Dev 6, If 0, Class=Hub, Driver=hub/4p, 480M
            |__ Port 4: Dev 8, If 0, Class=Human Interface Device, Driver=usbfs, 1.5M
        |__ Port 11: Dev 7, If 0, Class=Human Interface Device, Driver=usbhid, 12M
        |__ Port 14: Dev 9, If 0, Class=Wireless, Driver=btusb, 12M
        |__ Port 14: Dev 9, If 1, Class=Wireless, Driver=btusb, 12M

     

  7. lsusb returns:
     

    Bus 002 Device 005: ID 18d1:9302 Google Inc. 
    Bus 002 Device 003: ID 0bda:0411 Realtek Semiconductor Corp. Hub
    Bus 002 Device 002: ID 0bda:0411 Realtek Semiconductor Corp. Hub
    Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
    Bus 001 Device 004: ID 0bda:5411 Realtek Semiconductor Corp. RTS5411 Hub
    Bus 001 Device 005: ID 0951:1666 Kingston Technology DataTraveler 100 G3/G4/SE9 G2/50
    Bus 001 Device 003: ID 05e3:0608 Genesys Logic, Inc. Hub
    Bus 001 Device 002: ID 05e3:0608 Genesys Logic, Inc. Hub
    Bus 001 Device 009: ID 8087:0033 Intel Corp. 
    Bus 001 Device 007: ID 048d:5702 Integrated Technology Express, Inc. ITE Device
    Bus 001 Device 008: ID 0665:5161 Cypress Semiconductor USB to Serial
    Bus 001 Device 006: ID 0bda:5411 Realtek Semiconductor Corp. RTS5411 Hub
    Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

     

  8. The lsusb -v of the Google devices returns (with no serial number):
     

    Bus 002 Device 005: ID 18d1:9302 Google Inc. 
    Device Descriptor:
      bLength                18
      bDescriptorType         1
      bcdUSB               3.10
      bDeviceClass            0 
      bDeviceSubClass         0 
      bDeviceProtocol         0 
      bMaxPacketSize0         9
      idVendor           0x18d1 Google Inc.
      idProduct          0x9302 
      bcdDevice            1.00
      iManufacturer           0 
      iProduct                0 
      iSerial                 0 
      bNumConfigurations      1
      Configuration Descriptor:
        bLength                 9
        bDescriptorType         2
        wTotalLength       0x0060
        bNumInterfaces          1
        bConfigurationValue     1
        iConfiguration          0 
        bmAttributes         0x80
          (Bus Powered)
        MaxPower              896mA
        Interface Descriptor:
          bLength                 9
          bDescriptorType         4
          bInterfaceNumber        0
          bAlternateSetting       0
          bNumEndpoints           6
          bInterfaceClass       255 Vendor Specific Class
          bInterfaceSubClass    255 Vendor Specific Subclass
          bInterfaceProtocol    255 Vendor Specific Protocol
          iInterface              0 
          Endpoint Descriptor:
            bLength                 7
            bDescriptorType         5
            bEndpointAddress     0x01  EP 1 OUT
            bmAttributes            2
              Transfer Type            Bulk
              Synch Type               None
              Usage Type               Data
            wMaxPacketSize     0x0400  1x 1024 bytes
            bInterval               0
            bMaxBurst              15
          Endpoint Descriptor:
            bLength                 7
            bDescriptorType         5
            bEndpointAddress     0x02  EP 2 OUT
            bmAttributes            2
              Transfer Type            Bulk
              Synch Type               None
              Usage Type               Data
            wMaxPacketSize     0x0400  1x 1024 bytes
            bInterval               0
            bMaxBurst              15
          Endpoint Descriptor:
            bLength                 7
            bDescriptorType         5
            bEndpointAddress     0x03  EP 3 OUT
            bmAttributes            2
              Transfer Type            Bulk
              Synch Type               None
              Usage Type               Data
            wMaxPacketSize     0x0400  1x 1024 bytes
            bInterval               0
            bMaxBurst              15
          Endpoint Descriptor:
            bLength                 7
            bDescriptorType         5
            bEndpointAddress     0x81  EP 1 IN
            bmAttributes            2
              Transfer Type            Bulk
              Synch Type               None
              Usage Type               Data
            wMaxPacketSize     0x0400  1x 1024 bytes
            bInterval               0
            bMaxBurst              15
          Endpoint Descriptor:
            bLength                 7
            bDescriptorType         5
            bEndpointAddress     0x82  EP 2 IN
            bmAttributes            2
              Transfer Type            Bulk
              Synch Type               None
              Usage Type               Data
            wMaxPacketSize     0x0400  1x 1024 bytes
            bInterval               0
            bMaxBurst              15
          Endpoint Descriptor:
            bLength                 7
            bDescriptorType         5
            bEndpointAddress     0x83  EP 3 IN
            bmAttributes            3
              Transfer Type            Interrupt
              Synch Type               None
              Usage Type               Data
            wMaxPacketSize     0x0040  1x 64 bytes
            bInterval               1
            bMaxBurst               0
    Binary Object Store Descriptor:
      bLength                 5
      bDescriptorType        15
      wTotalLength       0x0016
      bNumDeviceCaps          2
      USB 2.0 Extension Device Capability:
        bLength                 7
        bDescriptorType        16
        bDevCapabilityType      2
        bmAttributes   0x00000002
          HIRD Link Power Management (LPM) Supported
      SuperSpeed USB Device Capability:
        bLength                10
        bDescriptorType        16
        bDevCapabilityType      3
        bmAttributes         0x00
        wSpeedsSupported   0x000c
          Device can operate at High Speed (480Mbps)
          Device can operate at SuperSpeed (5Gbps)
        bFunctionalitySupport   2
          Lowest fully-functional device speed is High Speed (480Mbps)
        bU1DevExitLat           0 micro seconds
        bU2DevExitLat           0 micro seconds
    Device Status:     0x0000
      (Bus Powered)

     

  9. Not sure if that's useful or not.....

In the meantime running again, and waiting for the next crash. If it happens again I'm pulling the TPU out completely and running on CPU-based detection. Interestingly when I did that on the other instance yesterday I *still* had to run it as privileged as it wouldn't connect to the cameras without that....unexpected I think as I thought privileged was only for the TPU - however it is possible that it is also needed for the Intel GPU access....

 

 

  • Community Expert

Dang... I'm not sure of a potential solution at the moment. Do you have another USB 2.0 port to try the Coral USB device in? This may be a kernel USB 3 issue. Also, a USB-attached UPS could be flagging the error in the kernel as the bus tries to reset/recover.

Double-check the UPS USB connection and the Unraid NUT/power settings...

*I'm lacking in that area.


The privileged option is required when using other devices, as this grants root access to the host for these devices.
Usually when I have set up Frigate in the past it was with an NVIDIA GPU.

 

There are other /dev/ paths for USB if there is no serial ID. That is weird to me, being on a serial bus...
 

The bus changing all the time is a different issue... Maybe a PCIe USB add-on card, where you can pass the USB PCIe device through instead?
Or maybe go with a Coral AI PCIe device instead?

From the information you've provided, it appears the Google device is located at Bus 002 Device 005 with the ID 18d1:9302. This device will need to be passed to the Frigate container so it can utilize the Coral AI capabilities.

 

*Since they have the same device ID, we won't be able to use the ID as a selector...
Based on what you have provided, the single-device pass would be this extra parameter:

--device /dev/bus/usb/002/005

But you need to review your Docker template and make sure Frigate only gets that device.


This is why I would want udev rules. In the Unraid 7 beta/rc1 you could make a udev rule:

nano /boot/udev/99-usb-coral.rules

ACTION=="add", ATTRS{idVendor}=="18d1", ATTRS{idProduct}=="9302", SYMLINK+="coral_ai"

 

ls -l /dev/coral_ai
--device /dev/coral_ai
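
For anyone scripting this: since the bus/device numbers can change after a reset, a small helper can look the Coral up by vendor:product ID and print the matching /dev/bus/usb path. This is only a rough sketch, not something from the Frigate or Unraid docs. The function name is made up, and it assumes the standard Linux sysfs layout under /sys/bus/usb/devices (idVendor, idProduct, busnum and devnum files per device).

```python
from pathlib import Path


def find_usb_device_nodes(vendor, product, sysfs_root="/sys/bus/usb/devices"):
    """Return /dev/bus/usb/BBB/DDD paths for every device matching vendor:product.

    Hypothetical helper: assumes the usual sysfs files idVendor, idProduct,
    busnum and devnum exist for each real USB device.
    """
    nodes = []
    for dev in sorted(Path(sysfs_root).iterdir()):
        try:
            if (dev / "idVendor").read_text().strip() != vendor:
                continue
            if (dev / "idProduct").read_text().strip() != product:
                continue
            bus = int((dev / "busnum").read_text())
            devnum = int((dev / "devnum").read_text())
        except (OSError, ValueError):
            # Interface entries such as 2-9.1:1.0 have no idVendor file; skip them.
            continue
        nodes.append(f"/dev/bus/usb/{bus:03d}/{devnum:03d}")
    return nodes


if __name__ == "__main__":
    # 18d1:9302 is the ID reported by lsusb earlier in this thread.
    for node in find_usb_device_nodes("18d1", "9302"):
        print(f"--device {node}")
```

With two Corals sharing the same ID this prints two lines, so you would still have to decide which one goes to which Frigate instance.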

 

...

 

  • Author

Definitely getting closer.....

 

Just had another crash, but it was non-fatal:

Dec 11 13:05:19 Svalbard emhttpd: read SMART /dev/sdc
Dec 11 13:54:53 Svalbard emhttpd: spinning down /dev/sdc
Dec 11 14:42:52 Svalbard kernel: ------------[ cut here ]------------
Dec 11 14:42:52 Svalbard kernel: WARNING: CPU: 12 PID: 32398 at fs/dcache.c:430 retain_dentry+0x52/0xa5
Dec 11 14:42:52 Svalbard kernel: Modules linked in: wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha tun nft_compat nf_tables xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter bridge xfs md_mod tcp_diag inet_diag it87(O) hwmon_vid ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs 8021q garp mrp stp llc macvtap macvlan tap igb r8169 realtek zfs(PO) i915 zunicode(PO) zzstd(O) intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp iosf_mbi drm_buddy kvm_intel ttm zlua(O) drm_display_helper zavl(PO) icp(PO) drm_kms_helper btusb btrtl btbcm kvm btintel bluetooth crct10dif_pclmul drm crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 aesni_intel zcommon(PO) crypto_simd cryptd znvpair(PO) rapl ecdh_generic spl(O) ecc
Dec 11 14:42:52 Svalbard kernel: mei_hdcp mei_pxp intel_gtt intel_cstate gigabyte_wmi wmi_bmof mpt3sas agpgart i2c_algo_bit i2c_i801 nvme intel_uncore mei_me i2c_smbus ahci i2c_core nvme_core mei raid_class libahci scsi_transport_sas syscopyarea sysfillrect sysimgblt fb_sys_fops thermal fan tpm_crb video tpm_tis tpm_tis_core wmi tpm backlight intel_pmc_core acpi_tad acpi_pad button unix [last unloaded: igb]
Dec 11 14:42:52 Svalbard kernel: CPU: 12 PID: 32398 Comm: lsof Tainted: P           O       6.1.118-Unraid #1
Dec 11 14:42:52 Svalbard kernel: Hardware name: Gigabyte Technology Co., Ltd. Z790M AORUS ELITE AX/Z790M AORUS ELITE AX, BIOS F10 09/27/2024
Dec 11 14:42:52 Svalbard kernel: RIP: 0010:retain_dentry+0x52/0xa5
Dec 11 14:42:52 Svalbard kernel: Code: 74 18 eb e9 48 8b 43 60 48 89 df 48 8b 40 20 ff d0 0f 1f 00 85 c0 74 e4 eb d3 ff 4b 5c 0f ba e0 13 72 49 a9 00 04 08 00 74 02 <0f> 0b 0d 00 00 08 00 89 03 65 48 ff 05 fe ea dc 7e f7 03 00 00 70
Dec 11 14:42:52 Svalbard kernel: RSP: 0018:ffffc9002debbd98 EFLAGS: 00010206
Dec 11 14:42:52 Svalbard kernel: RAX: 0000000000600c00 RBX: ffff88841b174900 RCX: 0000000000000064
Dec 11 14:42:52 Svalbard kernel: RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff88841b174900
Dec 11 14:42:52 Svalbard kernel: RBP: ffffc9002debbe65 R08: 00000000009461d4 R09: 000000000000000a
Dec 11 14:42:52 Svalbard kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
Dec 11 14:42:52 Svalbard kernel: R13: ffffffff812b1d42 R14: ffff88841b174900 R15: ffff88841b174900
Dec 11 14:42:52 Svalbard kernel: FS:  00001540446a8e00(0000) GS:ffff88907f300000(0000) knlGS:0000000000000000
Dec 11 14:42:52 Svalbard kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 11 14:42:52 Svalbard kernel: CR2: 00005590cfeb8000 CR3: 00000004b5f4e000 CR4: 0000000000750ee0
Dec 11 14:42:52 Svalbard kernel: PKRU: 55555554
Dec 11 14:42:52 Svalbard kernel: Call Trace:
Dec 11 14:42:52 Svalbard kernel: <TASK>
Dec 11 14:42:52 Svalbard kernel: ? __warn+0xab/0x122
Dec 11 14:42:52 Svalbard kernel: ? report_bug+0x109/0x17e
Dec 11 14:42:52 Svalbard kernel: ? retain_dentry+0x52/0xa5
Dec 11 14:42:52 Svalbard kernel: ? handle_bug+0x41/0x6f
Dec 11 14:42:52 Svalbard kernel: ? exc_invalid_op+0x13/0x60
Dec 11 14:42:52 Svalbard kernel: ? asm_exc_invalid_op+0x16/0x20
Dec 11 14:42:52 Svalbard kernel: ? tid_fd_update_inode+0x4d/0x4d
Dec 11 14:42:52 Svalbard kernel: ? retain_dentry+0x52/0xa5
Dec 11 14:42:52 Svalbard kernel: dput+0x41/0x17b
Dec 11 14:42:52 Svalbard kernel: proc_fill_cache+0x110/0x156
Dec 11 14:42:52 Svalbard kernel: ? compat_filldir+0x17a/0x17a
Dec 11 14:42:52 Svalbard kernel: proc_readfd_common+0x16b/0x1bc
Dec 11 14:42:52 Svalbard kernel: ? tid_fd_update_inode+0x4d/0x4d
Dec 11 14:42:52 Svalbard kernel: iterate_dir+0x94/0x149
Dec 11 14:42:52 Svalbard kernel: __do_sys_getdents64+0x6b/0xd8
Dec 11 14:42:52 Svalbard kernel: ? compat_filldir+0x17a/0x17a
Dec 11 14:42:52 Svalbard kernel: do_syscall_64+0x65/0x7b
Dec 11 14:42:52 Svalbard kernel: entry_SYSCALL_64_after_hwframe+0x6e/0xd8
Dec 11 14:42:52 Svalbard kernel: RIP: 0033:0x154044908283
Dec 11 14:42:52 Svalbard kernel: Code: 89 df e8 20 05 fb ff 48 83 c4 08 48 89 e8 5b 5d c3 66 0f 1f 44 00 00 b8 ff ff ff 7f 48 39 c2 48 0f 47 d0 b8 d9 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 61 0b 11 00 f7 d8
Dec 11 14:42:52 Svalbard kernel: RSP: 002b:00007ffde2feab28 EFLAGS: 00000293 ORIG_RAX: 00000000000000d9
Dec 11 14:42:52 Svalbard kernel: RAX: ffffffffffffffda RBX: 00000000004c8c80 RCX: 0000154044908283
Dec 11 14:42:52 Svalbard kernel: RDX: 0000000000008000 RSI: 00000000004c8cb0 RDI: 0000000000000004
Dec 11 14:42:52 Svalbard kernel: RBP: 00000000004c8c84 R08: 0000154044a1a2d0 R09: 0000154044a1a2d0
Dec 11 14:42:52 Svalbard kernel: R10: 0000000000000000 R11: 0000000000000293 R12: ffffffffffffff88
Dec 11 14:42:52 Svalbard kernel: R13: 0000000000000002 R14: 0000000000433dd0 R15: 0000154044a9b000
Dec 11 14:42:52 Svalbard kernel: </TASK>
Dec 11 14:42:52 Svalbard kernel: ---[ end trace 0000000000000000 ]---
Dec 11 14:43:02 Svalbard kernel: ------------[ cut here ]------------
Dec 11 14:43:02 Svalbard kernel: WARNING: CPU: 10 PID: 314 at fs/dcache.c:472 dentry_lru_isolate+0x44/0xb1
Dec 11 14:43:02 Svalbard kernel: Modules linked in: wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha tun nft_compat nf_tables xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter bridge xfs md_mod tcp_diag inet_diag it87(O) hwmon_vid ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs 8021q garp mrp stp llc macvtap macvlan tap igb r8169 realtek zfs(PO) i915 zunicode(PO) zzstd(O) intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp iosf_mbi drm_buddy kvm_intel ttm zlua(O) drm_display_helper zavl(PO) icp(PO) drm_kms_helper btusb btrtl btbcm kvm btintel bluetooth crct10dif_pclmul drm crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 aesni_intel zcommon(PO) crypto_simd cryptd znvpair(PO) rapl ecdh_generic spl(O) ecc
Dec 11 14:43:02 Svalbard kernel: mei_hdcp mei_pxp intel_gtt intel_cstate gigabyte_wmi wmi_bmof mpt3sas agpgart i2c_algo_bit i2c_i801 nvme intel_uncore mei_me i2c_smbus ahci i2c_core nvme_core mei raid_class libahci scsi_transport_sas syscopyarea sysfillrect sysimgblt fb_sys_fops thermal fan tpm_crb video tpm_tis tpm_tis_core wmi tpm backlight intel_pmc_core acpi_tad acpi_pad button unix [last unloaded: igb]
Dec 11 14:43:02 Svalbard kernel: CPU: 10 PID: 314 Comm: kswapd0 Tainted: P        W  O       6.1.118-Unraid #1
Dec 11 14:43:02 Svalbard kernel: Hardware name: Gigabyte Technology Co., Ltd. Z790M AORUS ELITE AX/Z790M AORUS ELITE AX, BIOS F10 09/27/2024
Dec 11 14:43:02 Svalbard kernel: RIP: 0010:dentry_lru_isolate+0x44/0xb1
Dec 11 14:43:02 Svalbard kernel: Code: ef e8 d3 d1 62 00 89 c2 b8 03 00 00 00 85 d2 74 7b 83 7b dc 00 8b 43 80 74 40 89 c2 81 e2 00 04 08 00 81 fa 00 00 08 00 74 02 <0f> 0b 25 ff ff f7 ff 89 43 80 65 48 ff 0d 34 e8 dc 7e f7 43 80 00
Dec 11 14:43:02 Svalbard kernel: RSP: 0018:ffffc90000cd7ab8 EFLAGS: 00010206
Dec 11 14:43:02 Svalbard kernel: RAX: 0000000000888c40 RBX: ffff88841b174980 RCX: ffffc90000cd7b78
Dec 11 14:43:02 Svalbard kernel: RDX: 0000000000080400 RSI: ffff888109889888 RDI: ffff88841b174958
Dec 11 14:43:02 Svalbard kernel: RBP: ffff88841b174958 R08: 0000000000000000 R09: 0000000000000014
Dec 11 14:43:02 Svalbard kernel: R10: ffff888106454380 R11: ffff888145095240 R12: ffff888109889888
Dec 11 14:43:02 Svalbard kernel: R13: ffffc90000cd7b78 R14: ffffffff8125cee6 R15: ffff88841b174980
Dec 11 14:43:02 Svalbard kernel: FS:  0000000000000000(0000) GS:ffff88907f280000(0000) knlGS:0000000000000000
Dec 11 14:43:02 Svalbard kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 11 14:43:02 Svalbard kernel: CR2: 00000000004d8098 CR3: 000000000420a000 CR4: 0000000000750ee0
Dec 11 14:43:02 Svalbard kernel: PKRU: 55555554
Dec 11 14:43:02 Svalbard kernel: Call Trace:
Dec 11 14:43:02 Svalbard kernel: <TASK>
Dec 11 14:43:02 Svalbard kernel: ? __warn+0xab/0x122
Dec 11 14:43:02 Svalbard kernel: ? report_bug+0x109/0x17e
Dec 11 14:43:02 Svalbard kernel: ? dentry_lru_isolate+0x44/0xb1
Dec 11 14:43:02 Svalbard kernel: ? handle_bug+0x41/0x6f
Dec 11 14:43:02 Svalbard kernel: ? exc_invalid_op+0x13/0x60
Dec 11 14:43:02 Svalbard kernel: ? asm_exc_invalid_op+0x16/0x20
Dec 11 14:43:02 Svalbard kernel: ? d_lru_shrink_move+0x38/0x38
Dec 11 14:43:02 Svalbard kernel: ? dentry_lru_isolate+0x44/0xb1
Dec 11 14:43:02 Svalbard kernel: ? dentry_lru_isolate+0x20/0xb1
Dec 11 14:43:02 Svalbard kernel: __list_lru_walk_one+0x90/0x123
Dec 11 14:43:02 Svalbard kernel: list_lru_walk_one+0x60/0x7d
Dec 11 14:43:02 Svalbard kernel: ? d_lru_shrink_move+0x38/0x38
Dec 11 14:43:02 Svalbard kernel: prune_dcache_sb+0x46/0x73
Dec 11 14:43:02 Svalbard kernel: super_cache_scan+0xf4/0x17c
Dec 11 14:43:02 Svalbard kernel: do_shrink_slab+0x188/0x2a1
Dec 11 14:43:02 Svalbard kernel: shrink_slab+0x1f9/0x267
Dec 11 14:43:02 Svalbard kernel: shrink_node+0x334/0x588
Dec 11 14:43:02 Svalbard kernel: balance_pgdat+0x4e9/0x6a2
Dec 11 14:43:02 Svalbard kernel: ? update_cfs_rq_load_avg+0x176/0x189
Dec 11 14:43:02 Svalbard kernel: ? update_load_avg+0x46/0x398
Dec 11 14:43:02 Svalbard kernel: kswapd+0x2f0/0x333
Dec 11 14:43:02 Svalbard kernel: ? _raw_spin_rq_lock_irqsave+0x20/0x20
Dec 11 14:43:02 Svalbard kernel: ? balance_pgdat+0x6a2/0x6a2
Dec 11 14:43:02 Svalbard kernel: kthread+0xe4/0xef
Dec 11 14:43:02 Svalbard kernel: ? kthread_complete_and_exit+0x1b/0x1b
Dec 11 14:43:02 Svalbard kernel: ret_from_fork+0x1f/0x30
Dec 11 14:43:02 Svalbard kernel: </TASK>
Dec 11 14:43:02 Svalbard kernel: ---[ end trace 0000000000000000 ]---
Dec 11 14:54:15 Svalbard kernel: vetha28b7f0: renamed from eth0
Dec 11 14:54:17 Svalbard kernel: eth0: renamed from vethb9c7ce8
Dec 11 14:54:35 Svalbard kernel: usb 2-9.1: reset SuperSpeed USB device number 5 using xhci_hcd
Dec 11 14:54:35 Svalbard kernel: usb 2-9.1: LPM exit latency is zeroed, disabling LPM.
Dec 11 15:10:47 Svalbard emhttpd: read SMART /dev/sdc

 

The system has been up for 6.5 hours but not Frigate:

image.thumb.png.9298c504fadbe08f06c43d718c8fb881.png

 

The crash happened a minute or two after a detection, and then a few minutes later Frigate restarted.

 

What's *really* interesting is that this time the parity sync didn't stop....normally it is immediately dead. That will probably still happen, but for now it is still hanging in there....

 

Next time it dies I will remove one of the Frigate drives to put into the test machine (so I can run it there too), reset this problem machine's BIOS settings back to defaults (except for some boot options - the settings changes have all made no difference anyway), and then I might upgrade to v7 for the newer kernel...

 

 

 

  • Author

So a fun new problem today.....it's always nice when things mix it up a little...lets you focus on something else for a bit!

 

This morning I get up, sign in and see what's what. This is new:

image.png.11b0c2785470d014eb045c8fdf77f113.png

 

Locked CPU cores. Oh...and docker is running, but all the containers bar one are stopped, and the docker tab opens but nothing loads - you just get the unraid 'wave', and Apps doesn't load at all. 

 

Can get to tools and see the log - looks like we had a crash when AppBackup was running:

Dec 12 00:01:28 Svalbard kernel: veth7196d3b: renamed from eth0
Dec 12 00:01:28 Svalbard kernel: veth3a161a9: renamed from eth0
Dec 12 00:01:28 Svalbard kernel: vethe140ebc: renamed from eth0
Dec 12 00:01:31 Svalbard kernel: vethdcf506a: renamed from eth0
Dec 12 00:02:08 Svalbard kernel: veth4b3afd2: renamed from eth0
Dec 12 00:02:08 Svalbard kernel: veth6aec7d4: renamed from eth0
Dec 12 00:02:19 Svalbard kernel: veth56d4257: renamed from eth0
Dec 12 00:02:22 Svalbard kernel: vethbdb3af6: renamed from eth0
Dec 12 00:02:30 Svalbard kernel: vethf7bebb7: renamed from eth0
Dec 12 00:02:31 Svalbard kernel: veth6493f9c: renamed from eth0
Dec 12 00:02:41 Svalbard kernel: veth2a00935: renamed from eth0
Dec 12 00:02:45 Svalbard kernel: vethd666ffc: renamed from eth0
Dec 12 00:02:48 Svalbard kernel: veth2a06c9e: renamed from eth0
Dec 12 00:04:05 Svalbard kernel: BUG: kernel NULL pointer dereference, address: 0000000000000020
Dec 12 00:04:05 Svalbard kernel: #PF: supervisor read access in kernel mode
Dec 12 00:04:05 Svalbard kernel: #PF: error_code(0x0000) - not-present page
Dec 12 00:04:05 Svalbard kernel: PGD 0 P4D 0 
Dec 12 00:04:05 Svalbard kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
Dec 12 00:04:05 Svalbard kernel: CPU: 14 PID: 1346 Comm: arc_evict Tainted: P           O       6.1.118-Unraid #1
Dec 12 00:04:05 Svalbard kernel: Hardware name: Gigabyte Technology Co., Ltd. Z790M AORUS ELITE AX/Z790M AORUS ELITE AX, BIOS F10 09/27/2024
Dec 12 00:04:05 Svalbard kernel: RIP: 0010:buf_hash_remove+0x2b/0x83 [zfs]
Dec 12 00:04:05 Svalbard kernel: Code: 1f 44 00 00 53 48 89 fb 48 8b 57 10 48 8b 7f 30 48 89 de e8 5b fe ff ff 48 8b 15 d7 07 22 00 48 23 05 c8 07 22 00 48 8d 14 c2 <48> 8b 0a 48 39 cb 74 06 48 8d 51 20 eb f2 48 8b 4b 20 48 89 0a 31
Dec 12 00:04:05 Svalbard kernel: RSP: 0018:ffffc90000cdfd48 EFLAGS: 00010286
Dec 12 00:04:05 Svalbard kernel: RAX: 000000000009c334 RBX: ffff8881809f5400 RCX: 0000000000000000
Dec 12 00:04:05 Svalbard kernel: RDX: 0000000000000020 RSI: ce70e04a9d9425bb RDI: 2b1292b03e72bbdb
Dec 12 00:04:05 Svalbard kernel: RBP: ffffffffa110e400 R08: 9ae16a3b2f90408f R09: 9ae16a3b2f90404f
Dec 12 00:04:05 Svalbard kernel: R10: ffff8881611dc140 R11: 0000000000032d40 R12: ffffffffa110e280
Dec 12 00:04:05 Svalbard kernel: R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000001
Dec 12 00:04:05 Svalbard kernel: FS:  0000000000000000(0000) GS:ffff88907f780000(0000) knlGS:0000000000000000
Dec 12 00:04:05 Svalbard kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 12 00:04:05 Svalbard kernel: CR2: 0000000000000020 CR3: 000000000420a000 CR4: 0000000000750ee0
Dec 12 00:04:05 Svalbard kernel: PKRU: 55555554
Dec 12 00:04:05 Svalbard kernel: Call Trace:
Dec 12 00:04:05 Svalbard kernel: <TASK>
Dec 12 00:04:05 Svalbard kernel: ? __die_body+0x1a/0x5c
Dec 12 00:04:05 Svalbard kernel: ? page_fault_oops+0x329/0x376
Dec 12 00:04:05 Svalbard kernel: ? do_user_addr_fault+0x12e/0x465
Dec 12 00:04:05 Svalbard kernel: ? common_interrupt+0xb7/0xd0
Dec 12 00:04:05 Svalbard kernel: ? exc_page_fault+0xfb/0x11d
Dec 12 00:04:05 Svalbard kernel: ? asm_exc_page_fault+0x22/0x30
Dec 12 00:04:05 Svalbard kernel: ? buf_hash_remove+0x2b/0x83 [zfs]
Dec 12 00:04:05 Svalbard kernel: ? buf_hash_remove+0x19/0x83 [zfs]
Dec 12 00:04:05 Svalbard kernel: arc_change_state.constprop.0+0x195/0x347 [zfs]
Dec 12 00:04:05 Svalbard kernel: arc_evict_state+0x30d/0x701 [zfs]
Dec 12 00:04:05 Svalbard kernel: ? random_get_pseudo_bytes+0xc4/0xf8 [spl]
Dec 12 00:04:05 Svalbard kernel: arc_evict_cb+0x424/0x564 [zfs]
Dec 12 00:04:05 Svalbard kernel: ? _raw_spin_unlock_irq+0x1a/0x2f
Dec 12 00:04:05 Svalbard kernel: ? sigprocmask+0x6e/0x8e
Dec 12 00:04:05 Svalbard kernel: zthr_procedure+0x89/0x12c [zfs]
Dec 12 00:04:05 Svalbard kernel: ? zrl_is_locked+0x15/0x15 [zfs]
Dec 12 00:04:05 Svalbard kernel: ? __thread_exit+0x13/0x13 [spl]
Dec 12 00:04:05 Svalbard kernel: thread_generic_wrapper+0x57/0x65 [spl]
Dec 12 00:04:05 Svalbard kernel: kthread+0xe4/0xef
Dec 12 00:04:05 Svalbard kernel: ? kthread_complete_and_exit+0x1b/0x1b
Dec 12 00:04:05 Svalbard kernel: ret_from_fork+0x1f/0x30
Dec 12 00:04:05 Svalbard kernel: </TASK>
Dec 12 00:04:05 Svalbard kernel: Modules linked in: wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha tun nft_compat nf_tables xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter bridge xfs nfsd auth_rpcgss oid_registry lockd grace sunrpc md_mod tcp_diag inet_diag it87(O) hwmon_vid ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs 8021q garp mrp stp llc macvtap macvlan tap igb r8169 realtek zfs(PO) i915 intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp zunicode(PO) coretemp zzstd(O) iosf_mbi drm_buddy kvm_intel ttm zlua(O) drm_display_helper btusb btrtl zavl(PO) btbcm drm_kms_helper icp(PO) btintel kvm bluetooth crct10dif_pclmul crc32_pclmul drm crc32c_intel ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 zcommon(PO) aesni_intel znvpair(PO)
Dec 12 00:04:05 Svalbard kernel: crypto_simd cryptd ecdh_generic spl(O) rapl mei_pxp mei_hdcp gigabyte_wmi wmi_bmof ecc intel_cstate mpt3sas i2c_i801 intel_gtt nvme intel_uncore agpgart i2c_smbus mei_me i2c_algo_bit nvme_core ahci i2c_core mei libahci raid_class syscopyarea scsi_transport_sas sysfillrect sysimgblt fb_sys_fops thermal fan video tpm_crb tpm_tis tpm_tis_core wmi tpm backlight intel_pmc_core acpi_tad acpi_pad button unix [last unloaded: igb]
Dec 12 00:04:05 Svalbard kernel: CR2: 0000000000000020
Dec 12 00:04:05 Svalbard kernel: ---[ end trace 0000000000000000 ]---
Dec 12 00:04:05 Svalbard kernel: RIP: 0010:buf_hash_remove+0x2b/0x83 [zfs]
Dec 12 00:04:05 Svalbard kernel: Code: 1f 44 00 00 53 48 89 fb 48 8b 57 10 48 8b 7f 30 48 89 de e8 5b fe ff ff 48 8b 15 d7 07 22 00 48 23 05 c8 07 22 00 48 8d 14 c2 <48> 8b 0a 48 39 cb 74 06 48 8d 51 20 eb f2 48 8b 4b 20 48 89 0a 31
Dec 12 00:04:05 Svalbard kernel: RSP: 0018:ffffc90000cdfd48 EFLAGS: 00010286
Dec 12 00:04:05 Svalbard kernel: RAX: 000000000009c334 RBX: ffff8881809f5400 RCX: 0000000000000000
Dec 12 00:04:05 Svalbard kernel: RDX: 0000000000000020 RSI: ce70e04a9d9425bb RDI: 2b1292b03e72bbdb
Dec 12 00:04:05 Svalbard kernel: RBP: ffffffffa110e400 R08: 9ae16a3b2f90408f R09: 9ae16a3b2f90404f
Dec 12 00:04:05 Svalbard kernel: R10: ffff8881611dc140 R11: 0000000000032d40 R12: ffffffffa110e280
Dec 12 00:04:05 Svalbard kernel: R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000001
Dec 12 00:04:05 Svalbard kernel: FS:  0000000000000000(0000) GS:ffff88907f780000(0000) knlGS:0000000000000000
Dec 12 00:04:05 Svalbard kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 12 00:04:05 Svalbard kernel: CR2: 0000000000000020 CR3: 000000015ebfa000 CR4: 0000000000750ee0
Dec 12 00:04:05 Svalbard kernel: PKRU: 55555554
Dec 12 00:04:05 Svalbard kernel: note: arc_evict[1346] exited with irqs disabled
Dec 12 00:32:50 Svalbard emhttpd: spinning down /dev/sdd
Dec 12 01:03:25 Svalbard emhttpd: spinning down /dev/sde
Dec 12 01:03:27 Svalbard emhttpd: read SMART /dev/sde
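
With this many traces it gets hard to keep the timeline straight, so here is a minimal sketch for boiling a syslog down to one line per kernel WARNING/BUG plus the task that hit it. The regexes are my own assumption based on the log format pasted in this thread (timestamp, hostname, "kernel:", then the message), and the function name is made up.

```python
import re

# Matches e.g. "Dec 11 14:42:52 Svalbard kernel: WARNING: CPU: 12 PID: ..."
EVENT = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ kernel: (?P<msg>(?:WARNING|BUG): .*)$"
)
# Matches the "CPU: 12 PID: 32398 Comm: lsof Tainted: ..." line of a trace.
COMM = re.compile(r"kernel: CPU: \d+ PID: \d+ Comm: (?P<comm>\S+)")


def kernel_events(lines):
    """Return a list of dicts (ts, msg, comm) for each WARNING/BUG in the log."""
    events = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = EVENT.match(line)
        if match:
            events.append({"ts": match["ts"], "msg": match["msg"], "comm": None})
            continue
        comm = COMM.search(line)
        # Attach the first Comm: line that follows an event to that event.
        if comm and events and events[-1]["comm"] is None:
            events[-1]["comm"] = comm["comm"]
    return events


if __name__ == "__main__":
    import sys

    # Usage: python kernel_events.py < syslog.txt
    for event in kernel_events(sys.stdin):
        print(event["ts"], event["comm"], event["msg"], sep=" | ")
```

Run against the logs above, this would list lsof and kswapd0 for the two dcache warnings and arc_evict for the NULL pointer dereference, which at least makes it obvious that a different task is hitting the problem each time.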

 

As to the CPU locking, top returns:

top - 06:58:28 up  9:15,  0 users,  load average: 19.02, 18.76, 17.97
Tasks: 636 total,   1 running, 635 sleeping,   0 stopped,   0 zombie
%Cpu(s):  3.6 us,  0.7 sy,  0.0 ni, 78.1 id, 17.6 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  64082.5 total,    569.5 free,  10164.9 used,  53348.1 buff/cache
MiB Swap:   2048.0 total,   2014.0 free,     34.0 used.  53033.9 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                      
20777 root      20   0 1240368  18272   8404 S 100.0   0.0 402:23.88 unpackerr                                                                    
12200 root      20   0       0      0      0 S   5.6   0.0  43:57.54 unraidd0                                                                     
 7527 root      20   0       0      0      0 D   1.7   0.0  12:30.74 mdrecoveryd                                                                  
14632 root      20   0 4559768  92920  38868 S   0.3   0.1   0:30.29 dockerd                                                                      
25476 root      20   0   96460  17032   9936 S   0.3   0.0   0:00.02 php-fpm                                                                      
25498 root      20   0   95916  14456   7904 S   0.3   0.0   0:00.01 php-fpm                                                                      
27345 root      20   0   95916  14844   8288 S   0.3   0.0   0:00.02 php-fpm                                                                      
28084 root      20   0   95996  30948  24156 S   0.3   0.0   0:00.15 update_3                                                                     
    1 root      20   0    2592   1808   1688 S   0.0   0.0   0:01.06 init                                                                         
    2 root      20   0       0      0      0 S   0.0   0.0   0:01.43 kthreadd                     

 

Which is odd for a couple of reasons - firstly, there's only one thread holding the CPU high, so why are multiple cores high? So I kill the process:

Ending process 20777...
Checking...
Process 20777 could not be gently killed... will use SIGKILL...
Process 20777 ()...
Success...

 

And my reward is that the CPU stays high:

image.png.ff5e922e91f666bf417b99a9b6defe3c.png

 

CPU10 is released, but now CPU2 has gone high. And top still thinks there's nothing happening:

top - 07:04:08 up  9:21,  0 users,  load average: 18.17, 18.70, 18.23
Tasks: 636 total,   1 running, 635 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.1 us,  0.4 sy,  0.0 ni, 78.2 id, 21.3 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  64082.5 total,    561.8 free,  10171.1 used,  53349.6 buff/cache
MiB Swap:   2048.0 total,   2014.0 free,     34.0 used.  53027.7 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                      
12200 root      20   0       0      0      0 S   7.3   0.0  44:19.24 unraidd0                                                                     
 7527 root      20   0       0      0      0 D   2.0   0.0  12:36.75 mdrecoveryd                                                                  
 4060 root      20   0    6204   2188   1880 S   0.3   0.0   0:02.50 blazer_usb                                                                   
 7505 root      20   0  279852   5080   4360 S   0.3   0.0   1:11.19 emhttpd                                                                      
28084 root      20   0   95996  30948  24156 S   0.3   0.0   0:00.81 update_3                                                                     
    1 root      20   0    2592   1808   1688 S   0.0   0.0   0:01.06 init                                                                         
    2 root      20   0       0      0      0 S   0.0   0.0   0:01.43 kthreadd                                                                     
    3 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_gp                                                                       
    4 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_par_gp                                                                   
    5 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 slub_flushwq                 

 

Do we trust top or do we trust the dashboard? Personally I'm with top - so then given that the dashboard is updating why is it lying?
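
One possible way to reconcile the two, offered as a guess rather than a diagnosis: top's per-process %CPU only counts time actually spent executing, while the "wa" figure in the header is idle time spent waiting on I/O. If the dashboard counts iowait as busy (an assumption, I have not checked how it is calculated), both could be telling the truth. The sketch below just splits top's header into "really working" and "waiting on I/O"; the function name is made up.

```python
import re


def parse_top_header(text):
    """Pull load average, running-task count and the CPU split out of top's header."""
    load = re.search(r"load average: ([\d.]+), ([\d.]+), ([\d.]+)", text)
    running = re.search(r"(\d+) running", text)
    cpu_line = next(l for l in text.splitlines() if l.startswith("%Cpu"))
    cpu = {
        name: float(value)
        for value, name in re.findall(r"([\d.]+) (us|sy|ni|id|wa|hi|si|st)\b", cpu_line)
    }
    # Everything except idle and iowait is time the CPUs spent executing something.
    active = sum(cpu[k] for k in ("us", "sy", "ni", "hi", "si", "st"))
    return {
        "load_1min": float(load.group(1)),
        "running": int(running.group(1)),
        "active_pct": round(active, 1),
        "iowait_pct": cpu["wa"],
        "idle_pct": cpu["id"],
    }


if __name__ == "__main__":
    header = (
        "top - 07:04:08 up  9:21,  0 users,  load average: 18.17, 18.70, 18.23\n"
        "Tasks: 636 total,   1 running, 635 sleeping,   0 stopped,   0 zombie\n"
        "%Cpu(s):  0.1 us,  0.4 sy,  0.0 ni, 78.2 id, 21.3 wa,  0.0 hi,  0.0 si,  0.0 st\n"
    )
    print(parse_top_header(header))
```

For the second top output above that gives 0.5% active against 21.3% iowait, with a load average of about 18 and only one running task. On Linux the load average also counts tasks in uninterruptible sleep, so a high load with almost no CPU use usually points at things blocked on I/O rather than at a runaway process.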

 

Then, on top of this, docker has curled up its toes and died:

image.png.2e400eab804a91ed80df17f6c52c2b21.png

 

And the logs have nothing to say:

Dec 12 06:53:14 Svalbard webGUI: Successful login user root from 192.168.2.101
Dec 12 07:00:01 Svalbard Plugin Auto Update: Checking for available plugin updates
Dec 12 07:00:09 Svalbard Plugin Auto Update: Community Applications Plugin Auto Update finished
Dec 12 07:01:14 Svalbard emhttpd: cmd: /usr/local/emhttp/plugins/open.files/scripts/killprocess 20777

 

But the parity check, now at 29%, is still running....which would normally have packed up and gone home by now. 

 

What to do? OK....let's go to Settings > Docker > Disable...then we can restart docker...nope....just the unraid wave again. The status bar says "Services starting...." but it never ends. Now Docker is set to "n" but its status is still "Running". The logs say:

Dec 12 07:01:14 Svalbard emhttpd: cmd: /usr/local/emhttp/plugins/open.files/scripts/killprocess 20777
Dec 12 07:07:29 Svalbard ool www[17934]: /usr/local/emhttp/plugins/dynamix/scripts/emcmd 'cmdStatus=Apply'
Dec 12 07:07:29 Svalbard emhttpd: Starting services...
Dec 12 07:07:29 Svalbard emhttpd: shcmd (27301): /etc/rc.d/rc.samba restart
Dec 12 07:07:29 Svalbard winbindd[14469]: [2024/12/12 07:07:29.359923,  0] ../../source3/winbindd/winbindd_dual.c:1964(winbindd_sig_term_handler)
Dec 12 07:07:29 Svalbard winbindd[14469]:   Got sig[15] terminate (is_parent=1)
Dec 12 07:07:29 Svalbard wsdd2[14466]: 'Terminated' signal received.
Dec 12 07:07:29 Svalbard wsdd2[14466]: terminating.
Dec 12 07:07:31 Svalbard root: Starting Samba:  /usr/sbin/smbd -D
Dec 12 07:07:31 Svalbard root:                  /usr/sbin/wsdd2 -d -4
Dec 12 07:07:31 Svalbard wsdd2[21498]: starting.
Dec 12 07:07:31 Svalbard root:                  /usr/sbin/winbindd -D
Dec 12 07:07:31 Svalbard emhttpd: shcmd (27306): /etc/rc.d/rc.avahidaemon restart
Dec 12 07:07:31 Svalbard root: Stopping Avahi mDNS/DNS-SD Daemon: stopped
Dec 12 07:07:31 Svalbard avahi-daemon[14538]: Got SIGTERM, quitting.
Dec 12 07:07:31 Svalbard avahi-daemon[14538]: Leaving mDNS multicast group on interface eth0.IPv4 with address 192.168.6.2.
Dec 12 07:07:31 Svalbard avahi-dnsconfd[14547]: read(): EOF
Dec 12 07:07:31 Svalbard avahi-daemon[14538]: avahi-daemon 0.8 exiting.
Dec 12 07:07:31 Svalbard root: Starting Avahi mDNS/DNS-SD Daemon: /usr/sbin/avahi-daemon -D
Dec 12 07:07:31 Svalbard avahi-daemon[21571]: Found user 'avahi' (UID 61) and group 'avahi' (GID 214).
Dec 12 07:07:31 Svalbard avahi-daemon[21571]: Successfully dropped root privileges.
Dec 12 07:07:31 Svalbard avahi-daemon[21571]: avahi-daemon 0.8 starting up.
Dec 12 07:07:31 Svalbard avahi-daemon[21571]: Successfully called chroot().
Dec 12 07:07:31 Svalbard avahi-daemon[21571]: Successfully dropped remaining capabilities.
Dec 12 07:07:31 Svalbard avahi-daemon[21571]: Loading service file /services/sftp-ssh.service.
Dec 12 07:07:31 Svalbard avahi-daemon[21571]: Loading service file /services/smb.service.
Dec 12 07:07:31 Svalbard avahi-daemon[21571]: Loading service file /services/ssh.service.
Dec 12 07:07:31 Svalbard avahi-daemon[21571]: Joining mDNS multicast group on interface eth0.IPv4 with address 192.168.6.2.
Dec 12 07:07:31 Svalbard avahi-daemon[21571]: New relevant interface eth0.IPv4 for mDNS.
Dec 12 07:07:31 Svalbard avahi-daemon[21571]: Network interface enumeration completed.
Dec 12 07:07:31 Svalbard avahi-daemon[21571]: Registering new address record for 192.168.6.2 on eth0.IPv4.
Dec 12 07:07:31 Svalbard emhttpd: shcmd (27307): /etc/rc.d/rc.avahidnsconfd restart
Dec 12 07:07:31 Svalbard root: Stopping Avahi mDNS/DNS-SD DNS Server Configuration Daemon: stopped
Dec 12 07:07:31 Svalbard root: Starting Avahi mDNS/DNS-SD DNS Server Configuration Daemon:  /usr/sbin/avahi-dnsconfd -D
Dec 12 07:07:31 Svalbard avahi-dnsconfd[21580]: Successfully connected to Avahi daemon.
Dec 12 07:07:32 Svalbard emhttpd: shcmd (27312): /etc/rc.d/rc.docker stop
Dec 12 07:07:32 Svalbard avahi-daemon[21571]: Server startup complete. Host name is Svalbard.local. Local service cookie is 3045481838.
Dec 12 07:07:33 Svalbard avahi-daemon[21571]: Service "Svalbard" (/services/ssh.service) successfully established.
Dec 12 07:07:33 Svalbard avahi-daemon[21571]: Service "Svalbard" (/services/smb.service) successfully established.
Dec 12 07:07:33 Svalbard avahi-daemon[21571]: Service "Svalbard" (/services/sftp-ssh.service) successfully established.

So it definitely did something....
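
Since "rc.docker stop" is the last thing logged and it never reports back, it might be worth checking for tasks stuck in uninterruptible sleep (state D, like mdrecoveryd in the top output). Those count toward the load average without using any CPU, and after the arc_evict oops it seems plausible (only a guess) that anything touching the ZFS pool is blocked. A minimal sketch, assuming the standard Linux /proc/<pid>/stat layout of "pid (comm) state ..."; the function name is made up, and it only looks at top-level PIDs, not individual threads under /proc/<pid>/task.

```python
from pathlib import Path


def uninterruptible_tasks(proc_root="/proc"):
    """Return sorted (pid, comm) pairs for processes currently in state D."""
    blocked = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip /proc/self, /proc/meminfo and friends
        try:
            stat = (entry / "stat").read_text()
        except OSError:
            continue  # process exited while we were looking
        # comm is wrapped in parentheses and may itself contain spaces or
        # parentheses, so split on the first "(" and the last ")".
        start, end = stat.find("("), stat.rfind(")")
        if start == -1 or end == -1:
            continue
        fields = stat[end + 2:].split()
        if fields and fields[0] == "D":
            blocked.append((int(entry.name), stat[start + 1:end]))
    return sorted(blocked)


if __name__ == "__main__":
    for pid, comm in uninterruptible_tasks():
        print(pid, comm)
```

If the list is long and full of docker or ZFS related names, that would fit the picture of the pool being wedged rather than docker itself being the culprit.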

 

While we wait let's go to AppBackup to see if there's anything in its log....Settings > AppBackup yields a blank page but after a few minutes it comes up. Check the log, and it still thinks it is running...so hit stop and copy the log:

[12.12.2024 00:00:02][ℹ️][Main] 👋 WELCOME TO APPDATA.BACKUP!! :D
[12.12.2024 00:00:03][ℹ️][Main] Backing up from: /mnt/user/appdata, /mnt/cache/appdata
[12.12.2024 00:00:03][ℹ️][Main] Backing up to: /mnt/scratch/archive_unraid/appdata_backups/ab_20241212_000003
[12.12.2024 00:00:03][ℹ️][Main] Selected containers: Calibre, Fenrus, Jellyfin, Mealie, Overseerr, PhotoPrism, Plex, PostgreSQL15, Starr, TubeSync, frigate-1, syncthing, tautulli
[12.12.2024 00:00:03][ℹ️][Main] Saving container XML files...
[12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'Bazarr' is enabled and update is available! Schedule update after backup...
[12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'Calibre' is enabled and update is available! Schedule update after backup...
[12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'Czkawka' is enabled but no update is available.
[12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'Fenrus' is enabled but no update is available.
[12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'Flaresolverr' is enabled but no update is available.
[12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'frigate-1' is enabled but no update is available.
[12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'Jellyfin' is enabled but no update is available.
[12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'Jellystat' is enabled but no update is available.
[12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'lidarr' is enabled and update is available! Schedule update after backup...
[12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'Mealie' is enabled but no update is available.
[12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'Overseerr' is enabled but no update is available.
[12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'PhotoPrism' is enabled but no update is available.
[12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'Plex' is enabled but no update is available.
[12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'PostgreSQL15' is enabled but no update is available.
[12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'Prowlarr' is enabled but no update is available.
[12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'qBittorrent' is enabled but no update is available.
[12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'Radarr' is enabled but no update is available.
[12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'Readarr' is enabled but no update is available.
[12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'SABnzbd' is enabled but no update is available.
[12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'Sonarr' is enabled but no update is available.
[12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'syncthing' is enabled but no update is available.
[12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'tautulli' is enabled but no update is available.
[12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'TubeSync' is enabled but no update is available.
[12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'Unpackerr' is enabled but no update is available.
[12.12.2024 00:01:24][ℹ️][Main] Method: Stop all container before continuing.
[12.12.2024 00:01:24][ℹ️][Calibre] Stopping Calibre... done! (took 4 seconds)
[12.12.2024 00:01:28][ℹ️][Fenrus] Stopping Fenrus... done! (took 0 seconds)
[12.12.2024 00:01:28][ℹ️][PhotoPrism] Stopping PhotoPrism... done! (took 0 seconds)
[12.12.2024 00:01:28][ℹ️][TubeSync] Stopping TubeSync... done! (took 3 seconds)
[12.12.2024 00:01:31][ℹ️][Starr][Bazarr] Stopping Bazarr... done! (took 10 seconds)
[12.12.2024 00:01:41][ℹ️][Starr][Readarr] Stopping Readarr... done! (took 2 seconds)
[12.12.2024 00:01:43][ℹ️][Starr][lidarr] Stopping lidarr... done! (took 4 seconds)
[12.12.2024 00:01:47][ℹ️][Starr][Radarr] Stopping Radarr... done! (took 5 seconds)
[12.12.2024 00:01:52][ℹ️][Starr][Sonarr] Stopping Sonarr... done! (took 3 seconds)
[12.12.2024 00:01:55][ℹ️][Starr][Prowlarr] Stopping Prowlarr... done! (took 4 seconds)
[12.12.2024 00:01:59][ℹ️][Starr][qBittorrent] Stopping qBittorrent... done! (took 6 seconds)
[12.12.2024 00:02:05][ℹ️][Starr][SABnzbd] Stopping SABnzbd... done! (took 3 seconds)
[12.12.2024 00:02:08][ℹ️][Starr][GluetunVPN] Stopping GluetunVPN... done! (took 0 seconds)
[12.12.2024 00:02:08][ℹ️][PostgreSQL15] Stopping PostgreSQL15... done! (took 0 seconds)
[12.12.2024 00:02:08][ℹ️][Jellyfin] Stopping Jellyfin... done! (took 11 seconds)
[12.12.2024 00:02:19][ℹ️][tautulli] Stopping tautulli... done! (took 3 seconds)
[12.12.2024 00:02:22][ℹ️][Plex] Stopping Plex... done! (took 8 seconds)
[12.12.2024 00:02:30][ℹ️][Mealie] Stopping Mealie... done! (took 1 seconds)
[12.12.2024 00:02:31][ℹ️][frigate-1] Stopping frigate-pembroke... done! (took 10 seconds)
[12.12.2024 00:02:41][ℹ️][syncthing] Stopping syncthing... done! (took 4 seconds)
[12.12.2024 00:02:45][ℹ️][Overseerr] Stopping Overseerr... done! (took 3 seconds)
[12.12.2024 00:02:48][ℹ️][Main] Starting backup for containers
[12.12.2024 00:02:48][ℹ️][Calibre] Should NOT backup external volumes, sanitizing them...
[12.12.2024 00:02:48][ℹ️][Calibre] Calculated volumes to back up: /mnt/user/appdata/calibre
[12.12.2024 00:02:48][ℹ️][Calibre] Backing up Calibre...
[12.12.2024 00:02:49][ℹ️][Calibre] Backup created without issues (took 00:00:01 (hours:mins:secs))
[12.12.2024 00:02:49][ℹ️][Calibre] Verifying backup...
[12.12.2024 00:02:49][ℹ️][Calibre] Verification ended without issues (took 00:00:00 (hours:mins:secs))
[12.12.2024 00:02:49][ℹ️][Calibre] Installing planned update for Calibre...
[12.12.2024 00:03:04][ℹ️][Fenrus] Should NOT backup external volumes, sanitizing them...
[12.12.2024 00:03:04][ℹ️][Fenrus] Calculated volumes to back up: /mnt/user/appdata/fenrus/data
[12.12.2024 00:03:04][ℹ️][Fenrus] Backing up Fenrus...
[12.12.2024 00:03:04][ℹ️][Fenrus] Backup created without issues (took 00:00:00 (hours:mins:secs))
[12.12.2024 00:03:04][ℹ️][Fenrus] Verifying backup...
[12.12.2024 00:03:04][ℹ️][Fenrus] Verification ended without issues (took 00:00:00 (hours:mins:secs))
[12.12.2024 00:03:04][ℹ️][PhotoPrism] Should NOT backup external volumes, sanitizing them...
[12.12.2024 00:03:04][ℹ️][PhotoPrism] Calculated volumes to back up: /mnt/user/appdata/photoprism
[12.12.2024 00:03:04][ℹ️][PhotoPrism] Backing up PhotoPrism...
[12.12.2024 07:23:01][][PhotoPrism] tar creation failed! Tar said:
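
The interesting part of that log is the silence between "Backing up PhotoPrism..." and the tar failure. A minimal sketch for finding the longest gap between consecutive log lines, assuming the stamps are `DD.MM.YYYY HH:MM:SS` as they appear above (the function names are made up for illustration, they are not part of the plugin):

```python
import re
from datetime import datetime

# Matches the leading "[DD.MM.YYYY HH:MM:SS]" stamp on each log line.
# The day-first order is an assumption based on the log shown above.
STAMP = re.compile(r"^\[(\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}:\d{2})\]")

def parse_stamp(line):
    """Return the line's timestamp, or None if it has no leading stamp."""
    m = STAMP.match(line)
    if m is None:
        return None
    return datetime.strptime(m.group(1), "%d.%m.%Y %H:%M:%S")

def largest_gap(lines):
    """Return (seconds, line_before, line_after) for the longest silence."""
    best = (0.0, None, None)
    prev_time, prev_line = None, None
    for line in lines:
        t = parse_stamp(line)
        if t is None:
            continue  # skip blank lines and continuation text
        if prev_time is not None:
            gap = (t - prev_time).total_seconds()
            if gap > best[0]:
                best = (gap, prev_line, line)
        prev_time, prev_line = t, line
    return best

# Three lines copied from the log above.
sample = [
    "[12.12.2024 00:03:04][ℹ️][PhotoPrism] Calculated volumes to back up: /mnt/user/appdata/photoprism",
    "[12.12.2024 00:03:04][ℹ️][PhotoPrism] Backing up PhotoPrism...",
    "[12.12.2024 07:23:01][][PhotoPrism] tar creation failed! Tar said:",
]
seconds, before, after = largest_gap(sample)
print(f"{seconds / 3600:.2f} h of silence after: {before}")
```

On these lines the gap comes out at a bit over seven hours, which suggests tar sat there the whole night before it gave up.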

 

But....spoiler alert.....it still thinks it's running. 

 

Jump back to console:

root@Svalbard:~# docker container list
CONTAINER ID   IMAGE                       COMMAND                  CREATED      STATUS                  PORTS     NAMES
0ca070369ca0   golift/unpackerr            "/unpackerr"             4 days ago   Up 10 hours                       Unpackerr
d96d888e2e34   flaresolverr/flaresolverr   "/usr/bin/dumb-init …"   4 days ago   Up 10 hours                       Flaresolverr
32a5d349b413   cyfershepard/jellystat      "docker-entrypoint.s…"   4 days ago   Up 10 hours (healthy)             Jellystat

 

But no amount of kill commands will make anything stop. So we have zombies. And the only solution for that is a reboot....so that's now the only way out from here. 
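
It may be worth checking whether these really are zombies (state `Z`) or processes stuck in uninterruptible sleep (state `D`), since neither responds to kill. A rough sketch that reads the state field from `/proc/<pid>/stat` on Linux; the helper names are hypothetical, and on a box this sick the listing itself might hang:

```python
import os

def parse_stat(stat_text):
    """Split a /proc/<pid>/stat line into (pid, command, state)."""
    # The command name sits in parentheses and may itself contain spaces
    # or ')', so cut at the LAST closing parenthesis.
    open_idx = stat_text.index("(")
    close_idx = stat_text.rindex(")")
    pid = int(stat_text[:open_idx].strip())
    comm = stat_text[open_idx + 1:close_idx]
    state = stat_text[close_idx + 1:].split()[0]
    return pid, comm, state

def stuck_processes(proc_root="/proc"):
    """Yield (pid, command, state) for processes in state D or Z."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError):
            continue  # process exited or stat was unreadable
        if state in ("D", "Z"):
            yield pid, comm, state

if __name__ == "__main__":
    if os.path.isdir("/proc"):
        for pid, comm, state in stuck_processes():
            print(pid, state, comm)
```

If most of what it prints is in state `D`, the processes are waiting inside the kernel (typically on I/O) rather than being true zombies, which would fit with kill having no effect.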

 

In the meantime, though, the docker listing is suddenly back on the dashboard, with everything stopped and three 'unknown' containers (presumably the ones I tried to kill). The system is also super slow....things appear, but it takes a very long time...

 

Interestingly the parity check is still running, the read speeds are still updating, but the progress is stuck at 4.65TB / 29.1% - so that also isn't right. That should have progressed by now....

 

And the CPU keeps locking up:

[Screenshot attached: image.png]

 

It is a sick, sick machine. But not in any way that is possible to meaningfully diagnose....so far....

 

Can't reboot now either - not from console and not from GUI.....  :-(

 

There is already a discussion about this and we didn't get anywhere:

https://github.com/blakeblackshear/frigate/issues/8470
 

I managed to stop these hard locks just by using lower-resolution camera feeds, so it is probably related to memory usage, but I was not able to find a root cause.

  • Community Expert

Looks like golift/unpackerr may need some CPU pinning, as it tried to use all cores and hung. Maybe assign 4 CPUs to that docker.
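
For illustration, a small sketch of what pinning to four cores could look like from the command line, using Docker's `--cpuset-cpus` option on `docker update`. The core ids 0-3 are only an assumption, the helper names are made up, and on Unraid the pinning would normally be set in the container template or under Settings > CPU Pinning rather than by hand:

```python
def cpuset_string(cores):
    """Turn core ids such as [0, 1, 2, 3] into Docker's cpuset form '0-3'."""
    ids = sorted(set(cores))
    if not ids:
        raise ValueError("need at least one core id")
    ranges = []
    start = prev = ids[0]
    for core in ids[1:]:
        if core == prev + 1:
            prev = core  # still inside a consecutive run
            continue
        ranges.append((start, prev))
        start = prev = core
    ranges.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)

def pin_command(container, cores):
    """Build (but do not run) a 'docker update' call that pins the container."""
    return ["docker", "update", "--cpuset-cpus", cpuset_string(cores), container]

# Container name taken from the 'docker container list' output earlier in the thread.
print(" ".join(pin_command("Unpackerr", [0, 1, 2, 3])))
# prints: docker update --cpuset-cpus 0-3 Unpackerr
```

The script only builds the command so it can be checked before anything is run against a machine that is already struggling.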

Edited by bmartino1
