December 3, 2024 · Author

It's a puzzle all right! But it has to be something Docker-related, I think. The network was a place to begin, but it is probably not the cause.

With Docker running but all containers stopped, the parity check is currently 57% of the way through. I would have expected a crash by now, but we will let it finish this time. Assuming there is no crash, it will not be the HBA: if it can complete the whole check and not skip a beat, there is no cause to suspect it.

It is also worth noting that appdata runs on a ZFS M.2 pool, so it isn't on the array and it isn't on the HBA. So that is pretty much ruled out as well.

I could destroy all the dockers and start from scratch, but it is a lot of work to set them all up, since there is no (easy) docker-compose and most dockers (like the Arr stack) don't seem to have a simple export/import configuration option... though presumably I can find the config files in the appdata folder.

In terms of Docker:

Weird network stuff? Unlikely. It should be able to handle lots of dockers, either bridged or hosted on their own IPs, and it should be able to handle lots of traffic.

External devices: 2x coral.ai for Frigate. There are 'reports' of these causing problems, but logs and root causes are not well documented. Options:
- Run docker with no Frigate
- Run docker with Frigate and use CPU detection only
- Run docker with Frigate and coral.ai (one instance)
- Run docker with Frigate and coral.ai (different USB bus, USB2 vs USB3)

Accessing data on the array via dockers: maybe I'm doing this completely wrong.
- I typically make the container path and the host path the same; that way I don't need to define different paths/aliases for each.
- For SABnzbd (for example) both are set to /mnt/user/www_downloads/complete/usenet/ for completed downloads. All containers are similarly configured.
- Could badly configured container reads/writes to the system make Unraid unhappy? Presumably... is my approach OK?

I can still put everything back on SATA, but if the parity check works then this is probably moot. Computer parts will all be here today except for the power supply: apparently they never had the quantity they claimed, so I need to wait until tomorrow for that (all going well). And that concludes the lunch break (mostly). 🙂
December 3, 2024 · Community Expert

There's a Docker Compose plugin (I use it). I was exploring that for other network-related things for your answer, but it appeared you wanted to use the Unraid template system. As an example, netprobe: run multiple dockers in their own bridge network.

Since you have ZFS and a Docker image: are you using the xfs version? What storage driver? Version 7 did become rc1 and I have yet to find an issue/bug with it.

https://docs.unraid.net/unraid-os/release-notes/7.0.0/#add-support-for-overlay2-storage-driver

Quote:
Add support for overlay2 storage driver
If you are using Docker data-root=directory on a ZFS volume, we recommend that you navigate to Settings → Docker and switch the Docker storage driver to overlay2, then delete the directory contents and let Docker re-download the image layers. If retaining the ability to downgrade to earlier releases is important, then switch to Docker data-root=xfs vDisk instead.
December 4, 2024 · Author

The parity check is still running, now at nearly 86%. Logs are squeaky clean.

The power supply is delayed, so it won't be here until Friday at the earliest, as they're shipping it with one of the worst companies we have... so it is not likely to be here Friday. Live in hope, but sigh.

Yes, on Unraid I prefer to stay with the 'template' approach, if only because I want this thing to be an appliance. The more I tinker the harder it becomes, especially if I break it after a number of months (or longer) and cannot remember the hacks. I have that in other places already (like Home Assistant), so I need to concentrate my pain in certain areas and pay for an off-the-shelf solution in others (Unraid)... though that is clearly not going well at present!

In terms of Docker storage I just used all the defaults:

Mine doesn't show the Docker version...

Was there anything wrong with how I map volumes to Docker containers?

Cheers!
December 4, 2024 · Community Expert

Version 7 beta 1 was showing the Docker version for testing and to show which version was installed. I'm running v7 rc1 and am happy with it.

When parity is done, turn off Docker. I would suggest deleting docker.img (tick the checkbox to delete the vDisk file), then changing the Docker data-root to use an xfs vDisk image. This may fix some things, including later if/when you upgrade to version 7 once it is fully released.

This will remove all Docker images from the Docker tab (data is not lost). In the Docker tab, at the bottom: Add Container > Template drop-down > click Apply to bring them back. No data is lost, as the data is stored in the appdata folder on the disks and not in the Docker image. What is lost is the pulled image data, which is why the Docker tab will go empty.

If you suspect it's Docker-related, it may be due to the Docker data-root option. I don't recall having a full trace, but I do recall an update to 6.x.x (I forget which one) where this fixed a bunch of Docker issues for me.

I also recommend installing and setting up a swap file for the dockers. The swap plugin needs a btrfs disk to place the swap file on.

Edited December 4, 2024 by bmartino1
December 5, 2024 · Author

OK... update. The parity check completed last night while I was asleep. No errors, no crashes.

The Docker disk has been deleted and re-created. I now have an xfs image running on a ZFS pool:

I have re-created just the Gluetun and Arr stack based on the previous no-bridge configuration and have given it some work to do. No issues encountered, no crashes. Currently I've triggered the mover to push 500GB onto the array from the cache, just so I can give the array a bit of a workout with the stack up.

I will add back one 'set' of dockers at a time and leave them for at least half a day to see if there are problems. I have divided them like this:
- Arrs: first group, exerts load on array, network, cache. So far so good.
- Randoms (e.g. mealie, calibre, Twingate, Syncthing, TubeSync): these add more load, more compute, and extra IP addresses on eth0, but otherwise should be benign.
- Plex: runs privileged, does decoding. Not really expecting an issue, as when we had the crashes I'm 99.99% sure nothing was streaming, so it should have been pretty much idle.
- Jellyfin (ditto Plex)
- Stats/helper containers (Tautulli, JellyStat, JellySync, Postgres): should be benign, just more load.
- Frigate x2: these were running all the time (as it is a high priority), so they could just as easily be a cause of the problem. These run privileged, access both the CPU for decoding and USB for coral.ai object detection, and write to two separate pool drives (instead of the array).

The USB corals definitely do some weird stuff, though never near the time of a crash, but that doesn't mean there's no link.

Nov 27 16:02:11 Svalbard kernel: usb 2-9.2: reset SuperSpeed USB device number 5 using xhci_hcd
Nov 27 16:02:13 Svalbard kernel: usb 2-9.2: reset SuperSpeed USB device number 5 using xhci_hcd
Nov 27 16:02:13 Svalbard kernel: usb 2-9.2: LPM exit latency is zeroed, disabling LPM.
Nov 27 16:02:13 Svalbard kernel: usb 2-9.2: device firmware changed
Nov 27 16:02:13 Svalbard kernel: usb 2-9.2: USB disconnect, device number 5
Nov 27 16:02:13 Svalbard kernel: usb 2-9.2: new SuperSpeed USB device number 6 using xhci_hcd
Nov 27 16:02:13 Svalbard kernel: usb 2-9.2: LPM exit latency is zeroed, disabling LPM.

Might not be material, but it is a little weird. I found a post about this here (right at the bottom) and an Unraid forum article on this here. So that powertop thing is now on my to-do list as well, just to eliminate another possible issue... unless you think that's a bad idea.

Lastly, I have also tried to install the swap plugin. I only have one btrfs drive (an SSD), as the array is xfs and the swap is zfs. When I try to start the swap file I get this in the logs:

Dec 5 18:25:34 Svalbard rc.swapfile[13294]: Plugin configuration written
Dec 5 18:26:15 Svalbard emhttpd: cmd: /usr/local/emhttp/plugins/swapfile/scripts/rc.swapfile start
Dec 5 18:26:15 Svalbard rc.swapfile[20304]: Creating swap file /mnt/scratch/swapfile please wait ...
Dec 5 18:26:19 Svalbard rc.swapfile[20621]: Swap file /mnt/scratch/swapfile created and started
Dec 5 18:26:19 Svalbard kernel: BTRFS warning (device sdd1): swapfile must not be copy-on-write
Dec 5 18:26:19 Svalbard rc.swapfile[20622]: Setting swappiness to 60
Dec 5 18:26:41 Svalbard emhttpd: cmd: /usr/local/emhttp/plugins/swapfile/scripts/rc.swapfile updatecfg true true /mnt/scratch swapfile UNRAID-SWAP 2048 60
Dec 5 18:26:42 Svalbard rc.swapfile[23332]: Plugin configuration written
Dec 5 18:26:48 Svalbard emhttpd: cmd: /usr/local/emhttp/plugins/swapfile/scripts/rc.swapfile start
Dec 5 18:26:48 Svalbard rc.swapfile[24495]: Swap file /mnt/scratch/swapfile is on a BTRFS file system but does not have the No_COW attribute.

How now brown cow... no cow?
Found your post with the script, ran it, and sorted:

Might be coincidence, but after setting that up I got my first fault (not a crash) in two days:

Dec 5 19:47:56 Svalbard kernel: Adding 4194300k swap on /mnt/scratch/swapfile. Priority:-2 extents:11 across:130568864k
Dec 5 19:51:23 Svalbard kernel: cgroup: fork rejected by pids controller in /docker/e7b0ac6467266b5fb595bca74d953c400e486175fd98e22fb74df13af3942211
Dec 5 19:54:54 Svalbard kernel: device_list[2339]: segfault at 0 ip 000000000093454b sp 00007ffeeeedd200 error 6 in php[600000+3b3000] likely on CPU 12 (core 24, socket 0)
Dec 5 19:54:54 Svalbard kernel: Code: 08 e9 0a ab ff ff e8 14 1b ff ff 41 ff 27 e8 8c 0d fe ff 41 ff 27 e8 14 0c fe ff 41 ff 27 e8 4c 1f ff ff 41 ff 27 49 83 c7 20 <83> 02 01 41 ff 27 e8 ea 17 fb ff e9 65 c7 ff ff e8 e0 17 fb ff e9

Most of my dockers also run with these parameters to create a RAM-based scratch area on /tmp (since I have 64GB to burn):

--mount type=tmpfs,target=/tmp,tmpfs-mode=1777,tmpfs-size=256M --log-driver none --no-healthcheck

Anyway, just shy of 48 hours with no crash. The new PC is mostly built, but I'm still waiting on the power supply (tomorrow, one hopes) and 4 drives from ServerPartDeals (Tue/Wed). For now, still chipping away on the old one...
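To see whether the coral resets cluster around the faults, the USB kernel messages quoted in this post can be tallied per port. This is only a minimal sketch, assuming syslog lines in the format shown in this thread; the function name count_usb_resets and the regular expression are illustrative, not part of any Unraid tool.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   "... kernel: usb 2-9.2: reset SuperSpeed USB device number 5 using xhci_hcd"
# The port path (e.g. "2-9.2") identifies the physical USB port.
RESET_RE = re.compile(r"kernel: usb (?P<port>[\d.-]+): reset \S+ USB device")


def count_usb_resets(lines):
    """Return a Counter mapping USB port path to the number of reset events."""
    counts = Counter()
    for line in lines:
        match = RESET_RE.search(line)
        if match:
            counts[match.group("port")] += 1
    return counts


# Sample lines copied from the syslog excerpts in this thread.
sample = [
    "Nov 27 16:02:11 Svalbard kernel: usb 2-9.2: reset SuperSpeed USB device number 5 using xhci_hcd",
    "Nov 27 16:02:13 Svalbard kernel: usb 2-9.2: reset SuperSpeed USB device number 5 using xhci_hcd",
    "Nov 27 16:02:13 Svalbard kernel: usb 2-9.2: LPM exit latency is zeroed, disabling LPM.",
]

for port, resets in count_usb_resets(sample).items():
    print(port, resets)
```

Feeding it the full syslog (for example, the lines of a saved syslog file) gives a per-port count that can be compared before and after changing USB ports or boot parameters.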
December 5, 2024 · Community Expert

Looks like great progress, and hopefully it is fixed. I don't know what changed with the btrfs image and ZFS in Unraid's evolution, but I'm glad it is more stable than it was before.

Are you using a USB hard disk in the ZFS pool/disk array? This may be the underlying cause of the original kernel bugs:

Nov 27 16:02:13 Svalbard kernel: usb 2-9.2: LPM exit latency is zeroed, disabling LPM.
Nov 27 16:02:13 Svalbard kernel: usb 2-9.2: device firmware changed
Nov 27 16:02:13 Svalbard kernel: usb 2-9.2: USB disconnect, device number 5

This is why Unraid doesn't want USB devices within the pool/array disks, as USB can randomly disconnect. When using USB enclosures or USB/Thunderbolt-attached disks, it is best to use them with the Unassigned Devices plugin and not with the Unraid array/pool mechanics.

There are other grub/syslinux boot options you could use for the powertop issues as well. Look at the autotweak plugin. You would be installing a third-party package outside of Unraid's control, so it is more "use at your own risk", but not a problem. There are other plugins that may help there as well.

https://slackware.pkgs.org/15.0/slackware-x86_64/powertop-2.13-x86_64-3.txz.html

cd /boot/extra
wget https://slackware.uk/slackware/slackware64-15.0/slackware64/ap/powertop-2.13-x86_64-3.txz
#reboot to install

With Unraid 7 RC1 they implemented the ability to save udev rules across reboots. I can see udev being used here to help with power or connection commands...

I usually disable USB sleep states (this is what causes the USB device to break). Grub/syslinux options:

#Disable LPM globally: if you're unsure about the specific device or want a global solution, you can try disabling LPM entirely
append initrd=/bzroot usbcore.autosuspend=-1 usbcore.quirks=0:k

*Needs lspci and vfio device identifiers... the commands are examples.
*lsusb:
append initrd=/bzroot usbcore.autosuspend=-1 usbcore.quirks=0x1234:0x5678:k

Replace 0x1234 and 0x5678 (used here as an example) with the Vendor ID and Product ID of the USB device experiencing issues. You can find these values using lsusb or in the Unraid logs.

So go to Main > Flash and scroll down to Syslinux Configuration.

My recommended full Unraid grub/syslinux entry:

kernel /bzimage
append initrd=/bzroot default_hugepagesz=1G hugepagesz=1G transparent_hugepage=always acpi=force pci=nocrs usbcore.autosuspend=-1 usbcore.quirks=0:k libata.allow_tpm=1 nvme_core.default_ps_max_latency_us=5500 pci=noaer pcie_aspm=off intremap=no_x2apic_optout

Boot Parameters

Here's what each parameter does:

- default_hugepagesz=1G: Sets the default hugepage size to 1 GB. Hugepages are used to allocate large chunks of memory, which can improve performance for applications requiring large memory segments, such as virtual machines or databases.
- hugepagesz=1G: Explicitly specifies that hugepages should use a size of 1 GB.
- transparent_hugepage=always: Enables transparent hugepages. This allows the kernel to automatically use hugepages for memory allocation when possible, which can improve performance for some workloads.
- acpi=force: Forces ACPI (Advanced Configuration and Power Interface) to be enabled, even if the hardware or BIOS indicates it should not be.
- pci=nocrs: Prevents the kernel from using the PCI host bridge resource windows provided by the ACPI firmware. This can be useful to avoid issues with devices being misconfigured.
- usbcore.autosuspend=-1: Disables USB autosuspend. This prevents USB devices from being put into low-power states, which may resolve issues with devices disconnecting or behaving erratically.
- usbcore.quirks=0:k: Intended to apply the k quirk, which disables Link Power Management (LPM), to all USB devices. This can fix problems with devices that don't handle LPM well. Note that the kernel documentation describes quirk entries as VendorID:ProductID:Flags, so the 0:k shorthand may not be parsed; the per-device form above is the safer one.
- libata.allow_tpm=1: Permits the use of TPM (trusted computing) commands on ATA devices, which is off by default. This is typically relevant for managing self-encrypting drives.
- nvme_core.default_ps_max_latency_us=5500: Sets the maximum power-saving latency for NVMe devices to 5500 microseconds. Reducing this value can prevent NVMe devices from entering deeper power-saving states that may cause delays or performance issues.
- pci=noaer: Disables Advanced Error Reporting (AER) on PCI devices. This prevents noisy error messages in the logs, especially with hardware that doesn't fully support AER.
- pcie_aspm=off: Disables PCI Express Active State Power Management (ASPM), which can prevent power management issues that affect device stability.
- intremap=no_x2apic_optout: Tells the kernel to ignore a BIOS request to opt out of x2APIC (an advanced interrupt controller mode) when interrupt remapping is in use. This can be important for stability in certain virtualization or hardware setups.
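Because the recommended append line is long and repeats some keys (pci= appears twice), it can help to split it into individual parameters before editing it. The following is a rough sketch, not an Unraid utility: parse_append_line and malformed_quirks are hypothetical helper names, and the quirk check assumes the VendorID:ProductID:Flags entry form described in the kernel's parameter documentation.

```python
def parse_append_line(line):
    """Split a syslinux 'append' line into {parameter: [values...]}.

    Values are kept in a list because a key such as 'pci' may be repeated.
    Flags without '=' are stored with a value of None.
    """
    tokens = line.split()
    if tokens and tokens[0] == "append":
        tokens = tokens[1:]
    params = {}
    for token in tokens:
        key, sep, value = token.partition("=")
        params.setdefault(key, []).append(value if sep else None)
    return params


def malformed_quirks(value):
    """Return the usbcore.quirks entries that are not VID:PID:Flags.

    Assumption: each comma-separated entry needs two hexadecimal IDs
    followed by one or more flag letters.
    """
    bad = []
    for entry in value.split(","):
        fields = entry.split(":")
        ok = len(fields) == 3 and fields[2].isalpha()
        if ok:
            try:
                int(fields[0], 16)  # vendor ID, optional 0x prefix
                int(fields[1], 16)  # product ID, optional 0x prefix
            except ValueError:
                ok = False
        if not ok:
            bad.append(entry)
    return bad


line = ("append initrd=/bzroot pci=nocrs usbcore.autosuspend=-1 "
        "usbcore.quirks=0:k pci=noaer pcie_aspm=off")
params = parse_append_line(line)
print(params["pci"])
for value in params.get("usbcore.quirks", []):
    print("entries to double-check:", malformed_quirks(value))
```

Running the check on the per-device form (0x1234:0x5678:k) returns an empty list, while the 0:k shorthand is flagged for a second look.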
December 6, 2024 · Author

Just popped on to say that between work today and other commitments not much progress has been made, but I do now have all the dockers running again except for Frigate (x2). They will come back tomorrow, all going well, as it is quick and easy to reactivate the containers.

I will update my boot parameters tomorrow (or Sunday). Some of these I have already, but there are some really good tweaks in that mix. Apparently there are "plans" for tomorrow, which means we will not be home for most of the day, so I probably will not have a lot of time. So Sunday is a good day to update that, reinstall the additional NIC and relocate the server back to where it belongs.

On the USB front, I have no USB disks (at all); the USB messages relate to the coral.ai devices that Frigate uses to do object detection. The "firmware changed" message is normal and happens when the device is activated, but the LPM stuff is probably not what we want. Hopefully some of the boot tweaks (or power tweaks) will help to stop some of the errant behaviour. I've also read somewhere that if you have two of them you should put them on separate USB buses, so I will plug one into the USB3 port and the other into a USB 3.2 or USB-C port (which should be separate from the default USB3).

We have now been up over three days with no crash, so I think the hardware is all perfectly fine. If we are still here this time tomorrow I will see what Frigate does... 🙂
December 9, 2024 · Author

A few days have passed... it is hard being patient... but everything was going well until this afternoon, and then it crashed. I'm pretty certain that it is related to Frigate, because everything was fine up until I restarted it. I'd noticed a lot of cache writes, so I stopped everything until I found the containers responsible for the writes... and just after restarting, it crashed:

Dec 9 14:14:09 Svalbard kernel: eth0: renamed from veth1ee9a31
Dec 9 14:14:09 Svalbard kernel: python3[7289]: segfault at 1f00000049 ip 0000000000544235 sp 00007ffee9511880 error 4 in python3.9[41f000+288000] likely on CPU 12 (core 24, socket 0)
Dec 9 14:14:09 Svalbard kernel: Code: 3d d0 57 8f 00 0f 84 26 01 00 00 48 83 c4 08 5b 5d 41 5c 41 5d c3 0f 1f 80 00 00 00 00 4c 8b 4f 60 4d 85 c9 0f 84 81 01 00 00 <4f> 8b 2c 01 4c 39 eb 0f 84 e2 00 00 00 48 85 db 74 23 4d 85 ed 75
Dec 9 14:14:14 Svalbard kernel: veth1ee9a31: renamed from eth0
Dec 9 14:14:14 Svalbard kernel: eth0: renamed from vethc744450
Dec 9 14:14:33 Svalbard kernel: usb 2-9.1: reset SuperSpeed USB device number 6 using xhci_hcd
Dec 9 14:14:33 Svalbard kernel: usb 2-9.1: LPM exit latency is zeroed, disabling LPM.
Dec 9 14:19:13 Svalbard kernel: BUG: kernel NULL pointer dereference, address: 0000000000000038
Dec 9 14:19:13 Svalbard kernel: #PF: supervisor read access in kernel mode
Dec 9 14:19:13 Svalbard kernel: #PF: error_code(0x0000) - not-present page
Dec 9 14:19:13 Svalbard kernel: PGD 3c6c80067 P4D 3c6c80067 PUD 3d7958067 PMD 0
Dec 9 14:19:13 Svalbard kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
Dec 9 14:19:13 Svalbard kernel: CPU: 12 PID: 26181 Comm: lsof Tainted: P O 6.1.118-Unraid #1
Dec 9 14:19:13 Svalbard kernel: Hardware name: Gigabyte Technology Co., Ltd.
Z790M AORUS ELITE AX/Z790M AORUS ELITE AX, BIOS F10 09/27/2024 Dec 9 14:19:13 Svalbard kernel: RIP: 0010:memcg_slab_free_hook+0x28/0xcf Dec 9 14:19:13 Svalbard kernel: Code: cc cc 41 57 41 56 49 89 d6 41 55 41 54 55 48 89 f5 53 48 89 fb 48 83 ec 10 89 4c 24 0c e8 60 e1 ff ff 84 c0 0f 84 94 00 00 00 <4c> 8b 65 38 49 83 fc 03 0f 86 86 00 00 00 49 83 e4 fc 45 31 ed 41 Dec 9 14:19:13 Svalbard kernel: RSP: 0018:ffffc90090e5fe28 EFLAGS: 00010202 Dec 9 14:19:13 Svalbard kernel: RAX: 0000000000000001 RBX: ffff8881001dee00 RCX: 0000000000000001 Dec 9 14:19:13 Svalbard kernel: RDX: ffffc90090e5fe78 RSI: 0000000000000000 RDI: ffff8881001dee00 Dec 9 14:19:13 Svalbard kernel: RBP: 0000000000000000 R08: 0000000000004000 R09: ffffffff8125541e Dec 9 14:19:13 Svalbard kernel: R10: 0000000000000000 R11: 0000000000000fe0 R12: 0000000000000000 Dec 9 14:19:13 Svalbard kernel: R13: 0000000000496870 R14: ffffc90090e5fe78 R15: 0000000000000002 Dec 9 14:19:13 Svalbard kernel: FS: 000014a1223e9e00(0000) GS:ffff88907f300000(0000) knlGS:0000000000000000 Dec 9 14:19:13 Svalbard kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Dec 9 14:19:13 Svalbard kernel: CR2: 0000000000000038 CR3: 000000027755a000 CR4: 0000000000750ee0 Dec 9 14:19:13 Svalbard kernel: PKRU: 55555554 Dec 9 14:19:13 Svalbard kernel: Call Trace: Dec 9 14:19:13 Svalbard kernel: <TASK> Dec 9 14:19:13 Svalbard kernel: ? __die_body+0x1a/0x5c Dec 9 14:19:13 Svalbard kernel: ? page_fault_oops+0x329/0x376 Dec 9 14:19:13 Svalbard kernel: ? do_user_addr_fault+0x12e/0x465 Dec 9 14:19:13 Svalbard kernel: ? exc_page_fault+0xfb/0x11d Dec 9 14:19:13 Svalbard kernel: ? asm_exc_page_fault+0x22/0x30 Dec 9 14:19:13 Svalbard kernel: ? user_path_at_empty+0x42/0x4f Dec 9 14:19:13 Svalbard kernel: ? memcg_slab_free_hook+0x28/0xcf Dec 9 14:19:13 Svalbard kernel: ? memcg_slab_free_hook+0x20/0xcf Dec 9 14:19:13 Svalbard kernel: ? 
kmem_cache_alloc+0x122/0x14d Dec 9 14:19:13 Svalbard kernel: kmem_cache_free+0xb7/0x154 Dec 9 14:19:13 Svalbard kernel: ? user_path_at_empty+0x42/0x4f Dec 9 14:19:13 Svalbard kernel: user_path_at_empty+0x42/0x4f Dec 9 14:19:13 Svalbard kernel: do_readlinkat+0x61/0x106 Dec 9 14:19:13 Svalbard kernel: __x64_sys_readlink+0x1a/0x21 Dec 9 14:19:13 Svalbard kernel: do_syscall_64+0x65/0x7b Dec 9 14:19:13 Svalbard kernel: entry_SYSCALL_64_after_hwframe+0x6e/0xd8 Dec 9 14:19:13 Svalbard kernel: RIP: 0033:0x14a122677197 Dec 9 14:19:13 Svalbard kernel: Code: 73 01 c3 48 8b 0d 81 2c 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 59 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 51 2c 0e 00 f7 d8 64 89 02 48 Dec 9 14:19:13 Svalbard kernel: RSP: 002b:00007ffdd4212428 EFLAGS: 00000206 ORIG_RAX: 0000000000000059 Dec 9 14:19:13 Svalbard kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000014a122677197 Dec 9 14:19:13 Svalbard kernel: RDX: 0000000000001000 RSI: 00007ffdd42124a0 RDI: 0000000000496870 Dec 9 14:19:13 Svalbard kernel: RBP: 00007ffdd4212460 R08: 0000000000000064 R09: 0000000000000000 Dec 9 14:19:13 Svalbard kernel: R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000 Dec 9 14:19:13 Svalbard kernel: R13: 00007ffdd4215b98 R14: 0000000000433dd0 R15: 000014a1227dc000 Dec 9 14:19:13 Svalbard kernel: </TASK> Dec 9 14:19:13 Svalbard kernel: Modules linked in: vhost_net vhost kvm_intel kvm xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle vhost_iotlb wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha tun nft_compat nf_tables xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter bridge stp llc xfs md_mod tcp_diag inet_diag it87(O) hwmon_vid ip6table_filter ip6_tables 
iptable_filter ip_tables x_tables efivarfs macvtap macvlan tap igb r8169 realtek zfs(PO) i915 intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp zunicode(PO) zzstd(O) coretemp iosf_mbi drm_buddy ttm zlua(O) drm_display_helper btusb zavl(PO) icp(PO) drm_kms_helper btrtl btbcm btintel bluetooth drm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 sha256_ssse3 Dec 9 14:19:13 Svalbard kernel: sha1_ssse3 zcommon(PO) aesni_intel znvpair(PO) crypto_simd cryptd spl(O) rapl ecdh_generic mei_hdcp mei_pxp gigabyte_wmi wmi_bmof intel_cstate ecc intel_gtt i2c_algo_bit mpt3sas nvme i2c_i801 agpgart intel_uncore i2c_smbus mei_me ahci nvme_core i2c_core mei raid_class libahci scsi_transport_sas syscopyarea sysfillrect sysimgblt fb_sys_fops thermal fan tpm_crb video tpm_tis tpm_tis_core wmi tpm intel_pmc_core backlight acpi_pad acpi_tad button unix [last unloaded: kvm] Dec 9 14:19:13 Svalbard kernel: CR2: 0000000000000038 Dec 9 14:19:13 Svalbard kernel: ---[ end trace 0000000000000000 ]--- Dec 9 14:19:13 Svalbard kernel: RIP: 0010:memcg_slab_free_hook+0x28/0xcf Dec 9 14:19:13 Svalbard kernel: Code: cc cc 41 57 41 56 49 89 d6 41 55 41 54 55 48 89 f5 53 48 89 fb 48 83 ec 10 89 4c 24 0c e8 60 e1 ff ff 84 c0 0f 84 94 00 00 00 <4c> 8b 65 38 49 83 fc 03 0f 86 86 00 00 00 49 83 e4 fc 45 31 ed 41 Dec 9 14:19:13 Svalbard kernel: RSP: 0018:ffffc90090e5fe28 EFLAGS: 00010202 Dec 9 14:19:13 Svalbard kernel: RAX: 0000000000000001 RBX: ffff8881001dee00 RCX: 0000000000000001 Dec 9 14:19:13 Svalbard kernel: RDX: ffffc90090e5fe78 RSI: 0000000000000000 RDI: ffff8881001dee00 Dec 9 14:19:13 Svalbard kernel: RBP: 0000000000000000 R08: 0000000000004000 R09: ffffffff8125541e Dec 9 14:19:13 Svalbard kernel: R10: 0000000000000000 R11: 0000000000000fe0 R12: 0000000000000000 Dec 9 14:19:13 Svalbard kernel: R13: 0000000000496870 R14: ffffc90090e5fe78 R15: 0000000000000002 Dec 9 14:19:13 Svalbard kernel: FS: 000014a1223e9e00(0000) GS:ffff88907f300000(0000) 
knlGS:0000000000000000 Dec 9 14:19:13 Svalbard kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Dec 9 14:19:13 Svalbard kernel: CR2: 0000000000000038 CR3: 000000027755a000 CR4: 0000000000750ee0 Dec 9 14:19:13 Svalbard kernel: PKRU: 55555554 Dec 9 14:19:13 Svalbard kernel: note: lsof[26181] exited with irqs disabled Dec 9 14:23:37 Svalbard emhttpd: spinning down /dev/sdh Dec 9 14:25:19 Svalbard emhttpd: spinning down /dev/sdb Dec 9 14:25:40 Svalbard emhttpd: spinning down /dev/sdg Dec 9 14:25:44 Svalbard emhttpd: spinning down /dev/sdj Dec 9 14:29:26 Svalbard kernel: BUG: kernel NULL pointer dereference, address: 0000000000000038 Dec 9 14:29:26 Svalbard kernel: #PF: supervisor read access in kernel mode Dec 9 14:29:26 Svalbard kernel: #PF: error_code(0x0000) - not-present page Dec 9 14:29:26 Svalbard kernel: PGD 2cfb02067 P4D 2cfb02067 PUD 385ec1067 PMD 0 Dec 9 14:29:26 Svalbard kernel: Oops: 0000 [#2] PREEMPT SMP NOPTI Dec 9 14:29:26 Svalbard kernel: CPU: 12 PID: 336 Comm: lsof Tainted: P D O 6.1.118-Unraid #1 Dec 9 14:29:26 Svalbard kernel: Hardware name: Gigabyte Technology Co., Ltd. 
Z790M AORUS ELITE AX/Z790M AORUS ELITE AX, BIOS F10 09/27/2024 Dec 9 14:29:26 Svalbard kernel: RIP: 0010:memcg_slab_free_hook+0x28/0xcf Dec 9 14:29:26 Svalbard kernel: Code: cc cc 41 57 41 56 49 89 d6 41 55 41 54 55 48 89 f5 53 48 89 fb 48 83 ec 10 89 4c 24 0c e8 60 e1 ff ff 84 c0 0f 84 94 00 00 00 <4c> 8b 65 38 49 83 fc 03 0f 86 86 00 00 00 49 83 e4 fc 45 31 ed 41 Dec 9 14:29:26 Svalbard kernel: RSP: 0018:ffffc900259bbdd0 EFLAGS: 00010202 Dec 9 14:29:26 Svalbard kernel: RAX: 0000000000000001 RBX: ffff8881001dee00 RCX: 0000000000000001 Dec 9 14:29:26 Svalbard kernel: RDX: ffffc900259bbe20 RSI: 0000000000000000 RDI: ffff8881001dee00 Dec 9 14:29:26 Svalbard kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffff8124c0c2 Dec 9 14:29:26 Svalbard kernel: R10: ffffc900259bbd20 R11: ffffc900259bbe94 R12: 0000000000000000 Dec 9 14:29:26 Svalbard kernel: R13: ffffc900259bbe90 R14: ffffc900259bbe20 R15: 0000000000000000 Dec 9 14:29:26 Svalbard kernel: FS: 00001514d185be00(0000) GS:ffff88907f300000(0000) knlGS:0000000000000000 Dec 9 14:29:26 Svalbard kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Dec 9 14:29:26 Svalbard kernel: CR2: 0000000000000038 CR3: 0000000303d1c000 CR4: 0000000000750ee0 Dec 9 14:29:26 Svalbard kernel: PKRU: 55555554 Dec 9 14:29:26 Svalbard kernel: Call Trace: Dec 9 14:29:26 Svalbard kernel: <TASK> Dec 9 14:29:26 Svalbard kernel: ? __die_body+0x1a/0x5c Dec 9 14:29:26 Svalbard kernel: ? page_fault_oops+0x329/0x376 Dec 9 14:29:26 Svalbard kernel: ? do_user_addr_fault+0x12e/0x465 Dec 9 14:29:26 Svalbard kernel: ? exc_page_fault+0xfb/0x11d Dec 9 14:29:26 Svalbard kernel: ? asm_exc_page_fault+0x22/0x30 Dec 9 14:29:26 Svalbard kernel: ? vfs_fstatat+0x52/0x62 Dec 9 14:29:26 Svalbard kernel: ? memcg_slab_free_hook+0x28/0xcf Dec 9 14:29:26 Svalbard kernel: kmem_cache_free+0xb7/0x154 Dec 9 14:29:26 Svalbard kernel: ? 
vfs_fstatat+0x52/0x62 Dec 9 14:29:26 Svalbard kernel: vfs_fstatat+0x52/0x62 Dec 9 14:29:26 Svalbard kernel: __do_sys_newfstatat+0x26/0x5c Dec 9 14:29:26 Svalbard kernel: do_syscall_64+0x65/0x7b Dec 9 14:29:26 Svalbard kernel: entry_SYSCALL_64_after_hwframe+0x6e/0xd8 Dec 9 14:29:26 Svalbard kernel: RIP: 0033:0x1514d1ae71ca Dec 9 14:29:26 Svalbard kernel: Code: 48 89 f2 b9 00 01 00 00 48 89 fe bf 9c ff ff ff e9 0b 00 00 00 66 2e 0f 1f 84 00 00 00 00 00 90 41 89 ca b8 06 01 00 00 0f 05 <3d> 00 f0 ff ff 77 07 31 c0 c3 0f 1f 40 00 48 8b 15 19 4c 0e 00 f7 Dec 9 14:29:26 Svalbard kernel: RSP: 002b:00007fff7debeb98 EFLAGS: 00000246 ORIG_RAX: 0000000000000106 Dec 9 14:29:26 Svalbard kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00001514d1ae71ca Dec 9 14:29:26 Svalbard kernel: RDX: 00007fff7debecb0 RSI: 00007fff7debebc0 RDI: 00000000ffffff9c Dec 9 14:29:26 Svalbard kernel: RBP: 00007fff7dec0e10 R08: 0000000000000073 R09: 0000000000000000 Dec 9 14:29:26 Svalbard kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 Dec 9 14:29:26 Svalbard kernel: R13: 00007fff7dec4548 R14: 0000000000433dd0 R15: 00001514d1c4e000 Dec 9 14:29:26 Svalbard kernel: </TASK> Dec 9 14:29:26 Svalbard kernel: Modules linked in: vhost_net vhost kvm_intel kvm xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle vhost_iotlb wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha tun nft_compat nf_tables xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter bridge stp llc xfs md_mod tcp_diag inet_diag it87(O) hwmon_vid ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs macvtap macvlan tap igb r8169 realtek zfs(PO) i915 intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp zunicode(PO) zzstd(O) coretemp 
iosf_mbi drm_buddy ttm zlua(O) drm_display_helper btusb zavl(PO) icp(PO) drm_kms_helper btrtl btbcm btintel bluetooth drm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 sha256_ssse3 Dec 9 14:29:26 Svalbard kernel: sha1_ssse3 zcommon(PO) aesni_intel znvpair(PO) crypto_simd cryptd spl(O) rapl ecdh_generic mei_hdcp mei_pxp gigabyte_wmi wmi_bmof intel_cstate ecc intel_gtt i2c_algo_bit mpt3sas nvme i2c_i801 agpgart intel_uncore i2c_smbus mei_me ahci nvme_core i2c_core mei raid_class libahci scsi_transport_sas syscopyarea sysfillrect sysimgblt fb_sys_fops thermal fan tpm_crb video tpm_tis tpm_tis_core wmi tpm intel_pmc_core backlight acpi_pad acpi_tad button unix [last unloaded: kvm] Dec 9 14:29:26 Svalbard kernel: CR2: 0000000000000038 Dec 9 14:29:26 Svalbard kernel: ---[ end trace 0000000000000000 ]--- Dec 9 14:29:26 Svalbard kernel: RIP: 0010:memcg_slab_free_hook+0x28/0xcf Dec 9 14:29:26 Svalbard kernel: Code: cc cc 41 57 41 56 49 89 d6 41 55 41 54 55 48 89 f5 53 48 89 fb 48 83 ec 10 89 4c 24 0c e8 60 e1 ff ff 84 c0 0f 84 94 00 00 00 <4c> 8b 65 38 49 83 fc 03 0f 86 86 00 00 00 49 83 e4 fc 45 31 ed 41 Dec 9 14:29:26 Svalbard kernel: RSP: 0018:ffffc90090e5fe28 EFLAGS: 00010202 Dec 9 14:29:26 Svalbard kernel: RAX: 0000000000000001 RBX: ffff8881001dee00 RCX: 0000000000000001 Dec 9 14:29:26 Svalbard kernel: RDX: ffffc90090e5fe78 RSI: 0000000000000000 RDI: ffff8881001dee00 Dec 9 14:29:26 Svalbard kernel: RBP: 0000000000000000 R08: 0000000000004000 R09: ffffffff8125541e Dec 9 14:29:26 Svalbard kernel: R10: 0000000000000000 R11: 0000000000000fe0 R12: 0000000000000000 Dec 9 14:29:26 Svalbard kernel: R13: 0000000000496870 R14: ffffc90090e5fe78 R15: 0000000000000002 Dec 9 14:29:26 Svalbard kernel: FS: 00001514d185be00(0000) GS:ffff88907f300000(0000) knlGS:0000000000000000 Dec 9 14:29:26 Svalbard kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Dec 9 14:29:26 Svalbard kernel: CR2: 0000000000000038 CR3: 0000000303d1c000 CR4: 0000000000750ee0 
Dec 9 14:29:26 Svalbard kernel: PKRU: 55555554
Dec 9 14:29:26 Svalbard kernel: note: lsof[336] exited with irqs disabled

My second Unraid box is now built; the new disks are pre-clearing/testing, so that will soon be up and running. In the meantime I have to take a side trip to see if I can figure out why Frigate is writing so much to the cache. It certainly isn't the share: that's pointed at a single-disk pool with no caching. So it is some sort of appdata activity...
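The two traces above are long, but the useful comparison is short: which process triggered each oops and which kernel function it faulted in. A small sketch of that extraction follows, assuming the trace format shown in this thread; summarize_oopses is a hypothetical helper name, and the sample text is trimmed from the log lines quoted above.

```python
import re

# "Comm: lsof" names the process that was running when the oops occurred.
COMM_RE = re.compile(r"Comm: (\S+)")
# "RIP: 0010:symbol+0x28/0xcf" is the kernel-mode instruction pointer;
# the 0033 (user-mode) RIP lines are deliberately not matched.
RIP_RE = re.compile(r"RIP: 0010:([A-Za-z_][\w.]*)\+0x")


def summarize_oopses(text):
    """Return a list of (process, faulting kernel symbol), one per oops."""
    summaries = []
    # Each report starts with "BUG: kernel ..."; text before the first is skipped.
    for chunk in text.split("BUG: kernel")[1:]:
        comm = COMM_RE.search(chunk)
        rip = RIP_RE.search(chunk)
        summaries.append((comm.group(1) if comm else None,
                          rip.group(1) if rip else None))
    return summaries


sample = (
    "Dec 9 14:19:13 Svalbard kernel: BUG: kernel NULL pointer dereference, address: 0000000000000038 "
    "Dec 9 14:19:13 Svalbard kernel: CPU: 12 PID: 26181 Comm: lsof Tainted: P O 6.1.118-Unraid #1 "
    "Dec 9 14:19:13 Svalbard kernel: RIP: 0010:memcg_slab_free_hook+0x28/0xcf "
    "Dec 9 14:29:26 Svalbard kernel: BUG: kernel NULL pointer dereference, address: 0000000000000038 "
    "Dec 9 14:29:26 Svalbard kernel: CPU: 12 PID: 336 Comm: lsof Tainted: P D O 6.1.118-Unraid #1 "
    "Dec 9 14:29:26 Svalbard kernel: RIP: 0010:memcg_slab_free_hook+0x28/0xcf"
)

for process, symbol in summarize_oopses(sample):
    print(process, symbol)
```

On the excerpts in this post, both reports come from lsof and both fault in memcg_slab_free_hook, which is the kind of repeat signature worth quoting when asking for help.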
December 9, 2024 · Community Expert

Sometimes a docker may have hidden settings mapped by default to other locations. Which one are you running?

This docker has some other warnings as well; check its support thread in case it is misconfigured.

Also, which variant? Stable? This one did have some hidden settings, but none that would point to another write source:
December 9, 2024 · Author

I am running two copies of the default repository:

And I'm pointing to ghcr.io/blakeblackshear/frigate:stable.

I'm running coral.ai via USB, so I have the containers set as privileged. Everything works just fine, but if Frigate is running during a parity sync it will crash. I had a hard crash last night (Unraid dropped off the network), so on restart it started a parity check... which crashed in the usual way at 6.5% progress.

I have gradually been reinstating my old settings (second NIC with VLANs), but the crashes are still doing my head in. Everything is fine and then bang: Unraid kernel bug and we're dead in the water.

There's an outside chance that the cause is running a pair of Frigate instances, but I can't see why that would cause parity-check problems when the Frigate data paths are nowhere near the array itself. It just makes ZERO sense...
December 9, 2024 Community Expert
Are you passing the Coral USB to one of the dockers? (How? --device?) I think it may be a limitation of the Coral USB being called by 2 different docker instances... Coral Detector: https://coral.ai/products/accelerator/ Does the crash happen with the USB connected? Edited December 9, 2024 by bmartino1
December 9, 2024 Author
Yes...the crash happens with the TPU connected and used (with Frigate not running there is no crash, but the TPUs are still connected)....but there are two TPUs and the Frigate instances just take one each..... Even if that was a cause....how does that then relate to an Unraid kernel crash? I can see how that might cause a problem in the dockers if they were fighting for it. I followed a guide (from somewhere) to set it up. In the first instance it is configured like this:

detectors:
  coral:
    type: edgetpu
    device: usb:0

In the second the device is set to usb:1. With one Frigate running only one TPU is active (can tell by the flashing light). When a second instance is started the other TPU comes online and both start flashing. We might be getting somewhere though as this is definitely Frigate related one way or another I think.... I will dig some more, and I can (once the other box is ready) move a Frigate instance onto another platform so there are no longer dual TPUs.....
December 10, 2024 Community Expert
OK, I assume you renamed them to have separate templates. That looks like the YAML inside Frigate; what I'm asking is how you are passing the TPU into Frigate. Example: I pass this USB device from Unraid into my Plex for tuner operations:
lsblk
Since there are 2 you will need to be device specific... example --device=/dev/bus/usb/001/002 ?

This is a docker host/container issue. If multiple Docker containers attempt to access the same USB device simultaneously, it could indeed cause conflicts or errors, as typically these devices are not designed for concurrent access by multiple clients. You would need to manage access carefully, possibly by coordinating access through software or limiting the device to one container at a time. https://coral.ai/docs/accelerator/get-started/

Or are you using the default template add-on? Per recent docker support on that one they recommend a fresh template download.

On 10/21/2020 at 2:31 AM, yayitazale said:
IMPORTANT PLEASE UPDATE V0.14.0 IS A BREAKING CHANGE: It is recommended to uninstall the current app and reinstall it to meet the needs of the new template. PLEASE READ THE CHANGELOG TO UPDATE YOUR CONFIG FILE
Support for Frigate docker container. Uses OpenCV and Tensorflow to perform realtime object detection locally for IP cameras. Designed for integration with HomeAssistant or others via MQTT.
Application Name: Frigate
Application Site: https://github.com/blakeblackshear/frigate
Docker Hub: https://hub.docker.com/r/blakeblackshear/frigate/
Github: https://github.com/blakeblackshear/frigate
Documentation: https://docs.frigate.video/
This container is only for AMD64 architecture CPUs (Intel/AMD) and its intended use is with a Coral Edge TPU accelerator to reduce the CPU usage. Make sure to look at the complete documentation available on Github! For any question about the usage of the app and runtime errors please use the github issue page.
PS: To use an M.2 or PCI Coral instead of a USB Edge TPU, install the drivers easily thanks to @ich777 by going to CA Apps and installing the 'Coral Accelerator Module Driver' app. To use a dedicated Nvidia graphics card install the 'Nvidia-Driver' plugin from CA Apps.
December 10, 2024 Author
Yup - two different names for the templates - like frigate-baker and frigate-jones. Here are my two TPU instances: On the templates I'm using the default /dev/bus/usb notation for both. I can update this to /dev/bus/usb/002/004 and 005 respectively in the docker configuration and it will start fine. In theory I suppose this hard codes the TPU (based on USB location) to the Frigate instance (so also no more changing ports when unplugging / replugging the devices). I cannot set this in the application configuration as the system then fails to start. The default in any case is also just usb...the usb:0 and usb:1 I believe are to keep them separate (somehow). I have the latest template (I think) as I only really started building this in August when it had only just been released - certainly I can't see any obvious differences so probably OK. [Also, which of your donate options results in the most cash actually getting to you?]
December 10, 2024 Community Expert
PM me about donation stuff. I think that to help fix some issues you may need to give each docker a dedicated device: /dev/bus/usb/002/004 and /dev/bus/usb/002/005. This may change the internal numbering, as each Frigate would then only see usb:0. You would have to console into Frigate and use lsblk to confirm, or cd to /dev/bus/usb/, type ls, and go from there to see whether only one device was passed in per container.
December 10, 2024 Author
Hmmmm....lsblk gives me nothing - just disk mounts.....but the ls on /dev/bus/usb yields the same result on both containers:

# cd /dev/bus/usb
# ls
001 002

If both claim it then it seems that sharing isn't possible after all? Or is there some other way of preventing containers from helping themselves? A bit more digging - the 001 and 002 are the two USB buses....inside each there are more 001, 002, 003 etc.... so this isn't definitive - it seems the container can see everything. I've pondered using the docker compose manager approach....but interestingly, when the machine rebooted after this afternoon's crash the USB addresses had actually moved even though the USB devices have *not* been moved to different ports. Compare this to the previous screenshot: So assigning a specific address is a bust as they keep moving. Edited December 10, 2024 by ChirpyTurnip Updated details
December 10, 2024 Author
Righto. So one crash later and a new plan....fooled around for a while to see if I could pin the Coral devices, but no. Might be possible, but definitely too much hassle. So the instance with the fewest cameras has had the Coral removed and must now use CPU-based detection. So that leaves me with no shared USB device as *only* one instance will be using it. And now we wait....hopefully a long time....I'll be a bit disappointed if it dies again within an hour or so....
December 10, 2024 Author
It died as well - I saw it when I woke up in the middle of the night. So I removed one of the Frigate instances, left the coral.ai units plugged in, shut it down (again it didn't actually power down), forced it off, and powered it back on. So now running with just one instance of Frigate, as privileged, with TPU detection enabled, and now we wait again.... I did see now in reviewing the logs some odd USB behaviour during the boot (look for usb 2-9.1):

Dec 11 02:28:41 Svalbard kernel: IPMI message handler: version 39.2
Dec 11 02:28:41 Svalbard kernel: Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
Dec 11 02:28:41 Svalbard kernel: Freeing initrd memory: 30324K
Dec 11 02:28:41 Svalbard kernel: lp: driver loaded but no devices found
Dec 11 02:28:41 Svalbard kernel: hpet_acpi_add: no address or irqs in _CRS
Dec 11 02:28:41 Svalbard kernel: Hangcheck: starting hangcheck timer 0.9.1 (tick is 180 seconds, margin is 60 seconds).
Dec 11 02:28:41 Svalbard kernel: AMD-Vi: AMD IOMMUv2 functionality not available on this system - This is not a bug.
Dec 11 02:28:41 Svalbard kernel: Floppy drive(s): fd1 is 1.2M Dec 11 02:28:41 Svalbard kernel: loop: module loaded Dec 11 02:28:41 Svalbard kernel: Rounding down aligned max_sectors from 4294967295 to 4294967288 Dec 11 02:28:41 Svalbard kernel: db_root: cannot open: /etc/target Dec 11 02:28:41 Svalbard kernel: VFIO - User Level meta-driver version: 0.3 Dec 11 02:28:41 Svalbard kernel: xhci_hcd 0000:00:14.0: xHCI Host Controller Dec 11 02:28:41 Svalbard kernel: xhci_hcd 0000:00:14.0: new USB bus registered, assigned bus number 1 Dec 11 02:28:41 Svalbard kernel: xhci_hcd 0000:00:14.0: hcc params 0x20007fc1 hci version 0x120 quirks 0x0000000200009810 Dec 11 02:28:41 Svalbard kernel: xhci_hcd 0000:00:14.0: xHCI Host Controller Dec 11 02:28:41 Svalbard kernel: xhci_hcd 0000:00:14.0: new USB bus registered, assigned bus number 2 Dec 11 02:28:41 Svalbard kernel: xhci_hcd 0000:00:14.0: Host supports USB 3.2 Enhanced SuperSpeed Dec 11 02:28:41 Svalbard kernel: hub 1-0:1.0: USB hub found Dec 11 02:28:41 Svalbard kernel: hub 1-0:1.0: 16 ports detected Dec 11 02:28:41 Svalbard kernel: hub 2-0:1.0: USB hub found Dec 11 02:28:41 Svalbard kernel: hub 2-0:1.0: 9 ports detected Dec 11 02:28:41 Svalbard kernel: usbcore: registered new interface driver usb-storage Dec 11 02:28:41 Svalbard kernel: i8042: PNP: No PS/2 controller found. 
Dec 11 02:28:41 Svalbard kernel: mousedev: PS/2 mouse device common for all mice Dec 11 02:28:41 Svalbard kernel: usbcore: registered new interface driver synaptics_usb Dec 11 02:28:41 Svalbard kernel: input: PC Speaker as /devices/platform/pcspkr/input/input0 Dec 11 02:28:41 Svalbard kernel: rtc_cmos rtc_cmos: RTC can wake from S4 Dec 11 02:28:41 Svalbard kernel: rtc_cmos rtc_cmos: registered as rtc0 Dec 11 02:28:41 Svalbard kernel: rtc_cmos rtc_cmos: setting system clock to 2024-12-10T13:27:52 UTC (1733837272) Dec 11 02:28:41 Svalbard kernel: rtc_cmos rtc_cmos: alarms up to one month, y3k, 114 bytes nvram Dec 11 02:28:41 Svalbard kernel: intel_pstate: Intel P-state driver initializing Dec 11 02:28:41 Svalbard kernel: intel_pstate: HWP enabled Dec 11 02:28:41 Svalbard kernel: pstore: Registered efi as persistent store backend Dec 11 02:28:41 Svalbard kernel: hid: raw HID events driver (C) Jiri Kosina Dec 11 02:28:41 Svalbard kernel: usbcore: registered new interface driver usbhid Dec 11 02:28:41 Svalbard kernel: usbhid: USB HID core driver Dec 11 02:28:41 Svalbard kernel: ipip: IPv4 and MPLS over IPv4 tunneling driver Dec 11 02:28:41 Svalbard kernel: NET: Registered PF_INET6 protocol family Dec 11 02:28:41 Svalbard kernel: Segment Routing with IPv6 Dec 11 02:28:41 Svalbard kernel: RPL Segment Routing with IPv6 Dec 11 02:28:41 Svalbard kernel: In-situ OAM (IOAM) with IPv6 Dec 11 02:28:41 Svalbard kernel: 9pnet: Installing 9P2000 support Dec 11 02:28:41 Svalbard kernel: microcode: sig=0xb0671, pf=0x2, revision=0x12b Dec 11 02:28:41 Svalbard kernel: microcode: Microcode Update Driver: v2.2. 
Dec 11 02:28:41 Svalbard kernel: IPI shorthand broadcast: enabled Dec 11 02:28:41 Svalbard kernel: sched_clock: Marking stable (2528000652, 6582841)->(2556612702, -22029209) Dec 11 02:28:41 Svalbard kernel: registered taskstats version 1 Dec 11 02:28:41 Svalbard kernel: Btrfs loaded, crc32c=crc32c-generic, zoned=no, fsverity=no Dec 11 02:28:41 Svalbard kernel: pstore: Using crash dump compression: deflate Dec 11 02:28:41 Svalbard kernel: clk: Disabling unused clocks Dec 11 02:28:41 Svalbard kernel: usb 1-5: new high-speed USB device number 2 using xhci_hcd Dec 11 02:28:41 Svalbard kernel: hub 1-5:1.0: USB hub found Dec 11 02:28:41 Svalbard kernel: hub 1-5:1.0: 4 ports detected Dec 11 02:28:41 Svalbard kernel: usb 2-8: new SuperSpeed USB device number 2 using xhci_hcd Dec 11 02:28:41 Svalbard kernel: hub 2-8:1.0: USB hub found Dec 11 02:28:41 Svalbard kernel: hub 2-8:1.0: 4 ports detected Dec 11 02:28:41 Svalbard kernel: usb 1-6: new high-speed USB device number 3 using xhci_hcd Dec 11 02:28:41 Svalbard kernel: hub 1-6:1.0: USB hub found Dec 11 02:28:41 Svalbard kernel: hub 1-6:1.0: 4 ports detected Dec 11 02:28:41 Svalbard kernel: usb 2-9: new SuperSpeed USB device number 3 using xhci_hcd Dec 11 02:28:41 Svalbard kernel: hub 2-9:1.0: USB hub found Dec 11 02:28:41 Svalbard kernel: hub 2-9:1.0: 4 ports detected Dec 11 02:28:41 Svalbard kernel: usb 1-9: new high-speed USB device number 4 using xhci_hcd Dec 11 02:28:41 Svalbard kernel: hub 1-9:1.0: USB hub found Dec 11 02:28:41 Svalbard kernel: hub 1-9:1.0: 4 ports detected Dec 11 02:28:41 Svalbard kernel: usb 1-6.1: new low-speed USB device number 5 using xhci_hcd Dec 11 02:28:41 Svalbard kernel: hid-generic 0003:0665:5161.0001: hiddev96,hidraw0: USB HID v1.00 Device [INNO TECH USB to Serial] on usb-0000:00:14.0-6.1/input0 Dec 11 02:28:41 Svalbard kernel: floppy0: no floppy controllers found Dec 11 02:28:41 Svalbard kernel: Freeing unused kernel image (initmem) memory: 1884K Dec 11 02:28:41 Svalbard kernel: Write 
protecting the kernel read-only data: 18432k Dec 11 02:28:41 Svalbard kernel: Freeing unused kernel image (text/rodata gap) memory: 2040K Dec 11 02:28:41 Svalbard kernel: Freeing unused kernel image (rodata/data gap) memory: 140K Dec 11 02:28:41 Svalbard kernel: rodata_test: all tests were successful Dec 11 02:28:41 Svalbard kernel: Run /init as init process Dec 11 02:28:41 Svalbard kernel: with arguments: Dec 11 02:28:41 Svalbard kernel: /init Dec 11 02:28:41 Svalbard kernel: with environment: Dec 11 02:28:41 Svalbard kernel: HOME=/ Dec 11 02:28:41 Svalbard kernel: TERM=linux Dec 11 02:28:41 Svalbard kernel: BOOT_IMAGE=/bzimage Dec 11 02:28:41 Svalbard kernel: xhci_hcd 0000:00:14.0: Timeout while waiting for setup device command Dec 11 02:28:41 Svalbard kernel: xhci_hcd 0000:00:14.0: Timeout while waiting for setup device command Dec 11 02:28:41 Svalbard kernel: usb 2-9.1: device not accepting address 4, error -62 Dec 11 02:28:41 Svalbard kernel: xhci_hcd 0000:00:14.0: Timeout while waiting for setup device command Dec 11 02:28:41 Svalbard kernel: xhci_hcd 0000:00:14.0: Timeout while waiting for setup device command Dec 11 02:28:41 Svalbard kernel: usb 2-9.1: device not accepting address 5, error -62 Dec 11 02:28:41 Svalbard kernel: usb 2-9-port1: attempt power cycle Dec 11 02:28:41 Svalbard kernel: usb 1-10: new high-speed USB device number 6 using xhci_hcd Dec 11 02:28:41 Svalbard kernel: hub 1-10:1.0: USB hub found Dec 11 02:28:41 Svalbard kernel: hub 1-10:1.0: 4 ports detected Dec 11 02:28:41 Svalbard kernel: usb 1-6.3: new high-speed USB device number 7 using xhci_hcd Dec 11 02:28:41 Svalbard kernel: usb-storage 1-6.3:1.0: USB Mass Storage device detected Dec 11 02:28:41 Svalbard kernel: scsi host0: usb-storage 1-6.3:1.0 Dec 11 02:28:41 Svalbard kernel: usb 1-11: new full-speed USB device number 8 using xhci_hcd Dec 11 02:28:41 Svalbard kernel: hid-generic 0003:048D:5702.0002: hiddev97,hidraw1: USB HID v1.12 Device [ITE Tech. Inc. 
ITE Device] on usb-0000:00:14.0-11/input0 Dec 11 02:28:41 Svalbard kernel: scsi 0:0:0:0: Direct-Access Kingston DataTraveler 3.0 PMAP PQ: 0 ANSI: 6 Dec 11 02:28:41 Svalbard kernel: sd 0:0:0:0: Attached scsi generic sg0 type 0 Dec 11 02:28:41 Svalbard kernel: sd 0:0:0:0: [sda] 121110528 512-byte logical blocks: (62.0 GB/57.8 GiB) Dec 11 02:28:41 Svalbard kernel: sd 0:0:0:0: [sda] Write Protect is off Dec 11 02:28:41 Svalbard kernel: sd 0:0:0:0: [sda] Mode Sense: 45 00 00 00 Dec 11 02:28:41 Svalbard kernel: sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA Dec 11 02:28:41 Svalbard kernel: sda: sda1 Dec 11 02:28:41 Svalbard kernel: sd 0:0:0:0: [sda] Attached SCSI removable disk Dec 11 02:28:41 Svalbard kernel: random: crng init done Dec 11 02:28:41 Svalbard kernel: xhci_hcd 0000:00:14.0: Timeout while waiting for setup device command Dec 11 02:28:41 Svalbard kernel: loop0: detected capacity change from 0 to 130016 Dec 11 02:28:41 Svalbard kernel: xhci_hcd 0000:00:14.0: Timeout while waiting for setup device command Dec 11 02:28:41 Svalbard kernel: usb 2-9.1: device not accepting address 6, error -62 Dec 11 02:28:41 Svalbard kernel: loop1: detected capacity change from 0 to 713824 Dec 11 02:28:41 Svalbard kernel: NET: Registered PF_UNIX/PF_LOCAL protocol family Dec 11 02:28:41 Svalbard kernel: xhci_hcd 0000:00:14.0: Timeout while waiting for setup device command Dec 11 02:28:41 Svalbard kernel: input: Sleep Button as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0E:00/input/input1 Dec 11 02:28:41 Svalbard kernel: ACPI: button: Sleep Button [SLPB] Dec 11 02:28:41 Svalbard kernel: input: Power Button as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0C:00/input/input2 Dec 11 02:28:41 Svalbard kernel: ACPI: button: Power Button [PWRB] Dec 11 02:28:41 Svalbard kernel: input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input3 Dec 11 02:28:41 Svalbard kernel: intel_pmc_core INT33A1:00: initialized Dec 11 02:28:41 Svalbard kernel: ACPI: 
button: Power Button [PWRF]
Dec 11 02:28:41 Svalbard kernel: ahci 0000:00:17.0: version 3.0
Dec 11 02:28:41 Svalbard kernel: ahci 0000:00:17.0: AHCI 0001.0301 32 slots 8 ports 6 Gbps 0xff impl SATA mode
Dec 11 02:28:41 Svalbard kernel: ahci 0000:00:17.0: flags: 64bit ncq sntf led clo only pio slum part ems deso sadm sds
Dec 11 02:28:41 Svalbard kernel: mei_me 0000:00:16.0: enabling device (0000 -> 0002)

The full boot log is attached.... With these errors, however, something has gone wrong with one of the coral.ai devices as now there is only one "google" device showing: So it seems one of the units definitely rejected the address supplied and stayed offline. If I unplug the device that isn't being used and plug it back in it shows up in the system device list now as a completely different device: If I plug it into another port I also get the same weird result: So for now I've left it unplugged. Parity check is on 16%, which is again higher than it has managed in the last day.... And now. We. Wait. Again.

unraid_boot.log
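Picking the usb 2-9.1 lines out of a long boot log by eye is error-prone, so here is a minimal Python sketch that filters syslog text for one USB port and tallies the error codes it reports. The function names are made up for illustration, and the sample lines are copied from the boot log above. On Linux, errno 62 is ETIME ("timer expired"), which is consistent with the "Timeout while waiting for setup device command" messages around it.

```python
import re
from collections import Counter

def usb_port_messages(lines, port):
    """Return the kernel message text for one USB port, e.g. port='2-9.1'."""
    prefix = f"kernel: usb {port}: "
    found = []
    for line in lines:
        idx = line.find(prefix)
        if idx != -1:
            # Keep only the message that follows the "usb <port>: " prefix.
            found.append(line[idx + len(prefix):].strip())
    return found

def count_error_codes(messages):
    """Count 'error -NN' codes that appear in the messages."""
    codes = Counter()
    for msg in messages:
        match = re.search(r"error (-\d+)", msg)
        if match:
            codes[match.group(1)] += 1
    return codes

# Sample lines taken from the boot log in this thread.
SAMPLE = [
    "Dec 11 02:28:41 Svalbard kernel: usb 2-9.1: device not accepting address 4, error -62",
    "Dec 11 02:28:41 Svalbard kernel: usb 2-9.1: device not accepting address 5, error -62",
    "Dec 11 02:28:41 Svalbard kernel: usb 1-10: new high-speed USB device number 6 using xhci_hcd",
    "Dec 11 02:28:41 Svalbard kernel: usb 2-9.1: device not accepting address 6, error -62",
]

messages = usb_port_messages(SAMPLE, "2-9.1")
print(len(messages), "messages for usb 2-9.1")
print(dict(count_error_codes(messages)))
```

To run it against a real log, read the file with `open(path).read().splitlines()` and pass the resulting list in place of `SAMPLE`.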
December 10, 2024 Community Expert
It definitely looks like usb 2-9.1 didn't cope well with the power cycles and may need to be unplugged and re-plugged to work again.

*Is there a BIOS option for USB suspend/power/sleep states?

My recommendation: in the template, click remove on the device option and use the extra parameter to add the device instead. This will delete it from the template. In the extra parameter, add the device via --device=/dev/bus/usb/001

*Select the correct parent path for the USB device; you may need to do device attach and pathing for the container. Example for a Plex NVIDIA GPU: --device=/dev/dri:/dev/dri
The format is Unraid host /dev : container /dev, so the first /dev/dri is the device path on the host and, separated by ":", the second is the path the container sees.

*Sometimes the driver and runtime docker options are not enough... Either use the extra parameter or the template option. I think you may have had both, or didn't set the TPU mapping in the template correctly when trying to separate them, as the mapping in the picture would pass all USB devices, as in:

devices:
  - "/dev/bus/usb:/dev/bus/usb" # Mount the entire USB bus

*Since Unraid is USB based, when that side of the kernel crashes, you get these kernel errors. Try moving the USB flash drive to another USB port (Unraid prefers USB 2.0 ports); it may be sharing the bus with the TPUs. I recommend using an internal motherboard USB header and attaching the flash drive there to separate it from that bus: https://a.co/d/idjePjN

I honestly prefer the /dev/serial/by-id option: the same --device path but via /dev/serial/by-id. Run ls /dev/serial/by-id and pass the device via its ID to make sure the container grabs one and only that device... I usually see USB docker passing done this way with the Zigbee/Z-Wave sticks in Home Assistant.

Review: I have to relook at Unraid and the kernel options to find the other USB quirks and options to disable selective power and sleep states. Essentially I'm looking for the Linux equivalent of the option in the Windows advanced power options, so the syslinux/grub setting needs to be "-1":

usbcore.autosuspend=-1

In the context of the usbcore.autosuspend kernel parameter, the value you assign controls the default autosuspend delay for all USB devices. Here's what the values mean:
- -1: Autosuspend is disabled for all USB devices. This means the devices won't enter the power-saving mode automatically.
- 0 or positive integers (e.g., 1, 2, etc.): These values set the delay in seconds before a USB device is autosuspended after it becomes idle.
So, if you set usbcore.autosuspend=0, autosuspend is enabled with no delay and devices can suspend immediately when they become idle. Setting it to -1 completely disables the autosuspend feature, keeping the devices powered all the time, similar to the "disabled" setting for USB selective suspend in Windows.
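To check that the boot option actually took effect, the running value can be read back from sysfs. The sketch below is a small Python helper, not an Unraid tool: the function names are hypothetical, and it assumes the standard Linux module-parameter path /sys/module/usbcore/parameters/autosuspend is present on the host.

```python
from pathlib import Path

# Standard sysfs location for the usbcore module parameter (assumed present on the host).
AUTOSUSPEND_PARAM = Path("/sys/module/usbcore/parameters/autosuspend")

def describe_autosuspend(value):
    """Explain a usbcore.autosuspend value as described in the post above."""
    if value < 0:
        return "autosuspend disabled"
    if value == 0:
        return "autosuspend enabled, no idle delay"
    return f"autosuspend enabled after {value} s idle"

def current_autosuspend(path=AUTOSUSPEND_PARAM):
    """Return the running value as an int, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None

value = current_autosuspend()
if value is None:
    print("usbcore autosuspend parameter not readable on this system")
else:
    print(f"usbcore.autosuspend={value}: {describe_autosuspend(value)}")
```

Note that this is only the default applied to newly detected devices; an individual device can still be switched between "on" and "auto" through its own power/control file under /sys/bus/usb/devices/, so it may be worth checking the Coral's entry as well.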
December 10, 2024 Author
And.....we're dead. A hard crash this time. Completely fell off the network again. The last syslog entry offers a potential clue:

Dec 11 08:25:08 Svalbard kernel: veth6c58b38: renamed from eth0
Dec 11 08:25:10 Svalbard kernel: eth0: renamed from veth970b4eb

Anyway...where to from here:
- Power saving in the BIOS is disabled (as best I can tell)
- I'm already running usbcore.autosuspend=-1 as a boot option, so that's not a fix
- The Unraid flash is on a USB2 bus, so separate from USB3 (where the TPU is). Aside from the UPS, the (now single) TPU, and the boot flash there are no other USB devices.
- I've tried the TPU mapping as both /dev/bus/usb and also as /dev/bus/usb/002/004 but the bus numbering keeps changing so it's not a static setting.
- ls /dev/serial/by-id returns nothing as there is no /dev/serial path

lsusb -t returns:

/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/9p, 20000M/x2
    |__ Port 8: Dev 2, If 0, Class=Hub, Driver=hub/4p, 5000M
    |__ Port 9: Dev 3, If 0, Class=Hub, Driver=hub/4p, 5000M
        |__ Port 1: Dev 5, If 0, Class=Vendor Specific Class, Driver=usbfs, 5000M
/: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/16p, 480M
    |__ Port 5: Dev 2, If 0, Class=Hub, Driver=hub/4p, 480M
    |__ Port 6: Dev 3, If 0, Class=Hub, Driver=hub/4p, 480M
        |__ Port 3: Dev 5, If 0, Class=Mass Storage, Driver=usb-storage, 480M
    |__ Port 9: Dev 4, If 0, Class=Hub, Driver=hub/4p, 480M
    |__ Port 10: Dev 6, If 0, Class=Hub, Driver=hub/4p, 480M
        |__ Port 4: Dev 8, If 0, Class=Human Interface Device, Driver=usbfs, 1.5M
    |__ Port 11: Dev 7, If 0, Class=Human Interface Device, Driver=usbhid, 12M
    |__ Port 14: Dev 9, If 0, Class=Wireless, Driver=btusb, 12M
    |__ Port 14: Dev 9, If 1, Class=Wireless, Driver=btusb, 12M

lsusb returns:

Bus 002 Device 005: ID 18d1:9302 Google Inc.
Bus 002 Device 003: ID 0bda:0411 Realtek Semiconductor Corp. Hub
Bus 002 Device 002: ID 0bda:0411 Realtek Semiconductor Corp. Hub
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 004: ID 0bda:5411 Realtek Semiconductor Corp. RTS5411 Hub
Bus 001 Device 005: ID 0951:1666 Kingston Technology DataTraveler 100 G3/G4/SE9 G2/50
Bus 001 Device 003: ID 05e3:0608 Genesys Logic, Inc. Hub
Bus 001 Device 002: ID 05e3:0608 Genesys Logic, Inc. Hub
Bus 001 Device 009: ID 8087:0033 Intel Corp.
Bus 001 Device 007: ID 048d:5702 Integrated Technology Express, Inc. ITE Device
Bus 001 Device 008: ID 0665:5161 Cypress Semiconductor USB to Serial
Bus 001 Device 006: ID 0bda:5411 Realtek Semiconductor Corp. RTS5411 Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

The lsusb -v of the Google device returns (with no serial number):

Bus 002 Device 005: ID 18d1:9302 Google Inc.
Device Descriptor:
  bLength 18
  bDescriptorType 1
  bcdUSB 3.10
  bDeviceClass 0
  bDeviceSubClass 0
  bDeviceProtocol 0
  bMaxPacketSize0 9
  idVendor 0x18d1 Google Inc.
  idProduct 0x9302
  bcdDevice 1.00
  iManufacturer 0
  iProduct 0
  iSerial 0
  bNumConfigurations 1
  Configuration Descriptor:
    bLength 9
    bDescriptorType 2
    wTotalLength 0x0060
    bNumInterfaces 1
    bConfigurationValue 1
    iConfiguration 0
    bmAttributes 0x80
      (Bus Powered)
    MaxPower 896mA
    Interface Descriptor:
      bLength 9
      bDescriptorType 4
      bInterfaceNumber 0
      bAlternateSetting 0
      bNumEndpoints 6
      bInterfaceClass 255 Vendor Specific Class
      bInterfaceSubClass 255 Vendor Specific Subclass
      bInterfaceProtocol 255 Vendor Specific Protocol
      iInterface 0
      Endpoint Descriptor:
        bLength 7
        bDescriptorType 5
        bEndpointAddress 0x01 EP 1 OUT
        bmAttributes 2
          Transfer Type Bulk
          Synch Type None
          Usage Type Data
        wMaxPacketSize 0x0400 1x 1024 bytes
        bInterval 0
        bMaxBurst 15
      Endpoint Descriptor:
        bLength 7
        bDescriptorType 5
        bEndpointAddress 0x02 EP 2 OUT
        bmAttributes 2
          Transfer Type Bulk
          Synch Type None
          Usage Type Data
        wMaxPacketSize 0x0400 1x 1024 bytes
        bInterval 0
        bMaxBurst 15
      Endpoint Descriptor:
        bLength 7
        bDescriptorType 5
        bEndpointAddress 0x03 EP 3 OUT
        bmAttributes 2
          Transfer Type Bulk
          Synch Type None
          Usage Type Data
        wMaxPacketSize 0x0400 1x 1024 bytes
        bInterval 0
        bMaxBurst 15
      Endpoint Descriptor:
        bLength 7
        bDescriptorType 5
        bEndpointAddress 0x81 EP 1 IN
        bmAttributes 2
          Transfer Type Bulk
          Synch Type None
          Usage Type Data
        wMaxPacketSize 0x0400 1x 1024 bytes
        bInterval 0
        bMaxBurst 15
      Endpoint Descriptor:
        bLength 7
        bDescriptorType 5
        bEndpointAddress 0x82 EP 2 IN
        bmAttributes 2
          Transfer Type Bulk
          Synch Type None
          Usage Type Data
        wMaxPacketSize 0x0400 1x 1024 bytes
        bInterval 0
        bMaxBurst 15
      Endpoint Descriptor:
        bLength 7
        bDescriptorType 5
        bEndpointAddress 0x83 EP 3 IN
        bmAttributes 3
          Transfer Type Interrupt
          Synch Type None
          Usage Type Data
        wMaxPacketSize 0x0040 1x 64 bytes
        bInterval 1
        bMaxBurst 0
Binary Object Store Descriptor:
  bLength 5
  bDescriptorType 15
  wTotalLength 0x0016
  bNumDeviceCaps 2
  USB 2.0 Extension Device Capability:
    bLength 7
    bDescriptorType 16
    bDevCapabilityType 2
    bmAttributes 0x00000002
      HIRD Link Power Management (LPM) Supported
  SuperSpeed USB Device Capability:
    bLength 10
    bDescriptorType 16
    bDevCapabilityType 3
    bmAttributes 0x00
    wSpeedsSupported 0x000c
      Device can operate at High Speed (480Mbps)
      Device can operate at SuperSpeed (5Gbps)
    bFunctionalitySupport 2
      Lowest fully-functional device speed is High Speed (480Mbps)
    bU1DevExitLat 0 micro seconds
    bU2DevExitLat 0 micro seconds
Device Status: 0x0000
  (Bus Powered)

Not sure if that's useful or not..... In the meantime running again, and waiting for the next crash. If it happens again I'm pulling the TPU out completely and running on CPU-based detection. Interestingly when I did that on the other instance yesterday I *still* had to run it as privileged as it wouldn't connect to the cameras without that....unexpected I think as I thought privileged was only for the TPU - however it is possible that it is also needed for the Intel GPU access....
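Since the bus/device numbers move between boots, one workaround is to look the Coral up by its USB ID each time rather than hard-coding /dev/bus/usb/002/004. The following is a minimal Python sketch of that lookup over plain lsusb output. The function name is hypothetical; 18d1:9302 comes from the lsusb listing above, while 1a6e:089a is an assumption on my part (the ID a Coral USB stick is commonly reported to show before its firmware has been loaded, which might be the "completely different device" seen earlier).

```python
import re

# 18d1:9302 is from the lsusb output in this thread.
# 1a6e:089a is an ASSUMPTION: the ID reported before the Coral firmware is loaded.
CORAL_IDS = {"18d1:9302", "1a6e:089a"}

LSUSB_LINE = re.compile(r"^Bus (\d{3}) Device (\d{3}): ID ([0-9a-f]{4}:[0-9a-f]{4})")

def coral_device_nodes(lsusb_output, ids=CORAL_IDS):
    """Map matching lsusb lines to /dev/bus/usb/BBB/DDD device node paths."""
    nodes = []
    for line in lsusb_output.splitlines():
        match = LSUSB_LINE.match(line.strip())
        if match and match.group(3) in ids:
            bus, dev = match.group(1), match.group(2)
            nodes.append(f"/dev/bus/usb/{bus}/{dev}")
    return nodes

# First three lines of the lsusb output posted above.
SAMPLE = """\
Bus 002 Device 005: ID 18d1:9302 Google Inc.
Bus 002 Device 003: ID 0bda:0411 Realtek Semiconductor Corp. Hub
Bus 002 Device 002: ID 0bda:0411 Realtek Semiconductor Corp. Hub
"""

for node in coral_device_nodes(SAMPLE):
    print(f"--device={node}")
```

On the host the input would be the text printed by `lsusb`. The result is only valid until the next re-enumeration, so the container's --device value would have to be regenerated after each reboot or replug.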
December 10, 2024 Community Expert
Dang... I'm not sure of a potential solution, ATM. Do you have another USB 2.0 port to try the Coral USB device in? This may be a kernel USB 3 issue. ALSO, with a USB-attached UPS, that could also be flagging the error in the kernel as the bus tries to reset/recover. Double check the UPS USB connection and the Unraid NUT/power settings... *I'm lacking in that area.

The privileged option is required when using other devices, as this grants root access to the host for these devices. Usually when I have set up Frigate in the past it was with an NVIDIA GPU.

There are other /dev/ calls for USB if there is no serial ID. That is weird to me, being on a serial bus... The bus numbering changing is a different issue... A PCIe USB add-on card, where you can pass the USB PCIe device instead? Maybe go with a Coral PCIe device instead?

From the information you've provided, it appears the Google device is located at Bus 002 Device 005 with the ID 18d1:9302. This device will need to be passed to the Frigate container so it can utilize the Coral AI capabilities. *Since both units have the same device ID we won't be able to use the ID as a selector... Based on what you have provided, the single device pass would be this extra parameter:

--device /dev/bus/usb/002/005

But you need to review your docker template and make sure Frigate only gets that device. This is why I would want udev rules.. in Beta 7 rc1 you could make a udev rule, e.g. a rule file such as /boot/udev/99-usb-coral.rules containing:

ACTION=="add", ATTRS{idVendor}=="18d1", ATTRS{idProduct}=="9302", SYMLINK+="coral_ai"

then check it with:

ls -l /dev/coral_ai

and pass it with:

--device /dev/coral_ai ...
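With two identical Corals, a rule keyed only on vendor and product ID would give both sticks the same symlink. A possible way around that, sketched below in Python, is to also match on the physical port path (the "2-9.1" style name from the kernel log). The helper name is hypothetical, the extra SUBSYSTEM, ENV{DEVTYPE} and KERNEL match keys are my additions to the rule quoted above, and none of this has been tested on Unraid.

```python
def coral_udev_rule(symlink, vendor="18d1", product="9302", port=None):
    """Build one udev rule line that creates /dev/<symlink> for a Coral USB stick.

    port is the kernel port path (e.g. "2-9.1"); if given, only a device
    plugged into that physical port matches the rule.
    """
    parts = [
        'ACTION=="add"',
        'SUBSYSTEM=="usb"',
        'ENV{DEVTYPE}=="usb_device"',   # match the device itself, not its interfaces
        f'ATTR{{idVendor}}=="{vendor}"',
        f'ATTR{{idProduct}}=="{product}"',
    ]
    if port is not None:
        parts.append(f'KERNEL=="{port}"')
    parts.append(f'SYMLINK+="{symlink}"')
    return ", ".join(parts)

# One rule per physical port; "2-9.1" is the port seen in the boot log,
# "2-8.1" is a made-up second port for illustration.
print(coral_udev_rule("coral_0", port="2-9.1"))
print(coral_udev_rule("coral_1", port="2-8.1"))
```

The port path stays the same as long as the stick is not moved to a different socket, which is the property the changing bus/device numbers lack.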
December 11, 20241 yr Author Definitely getting closer..... Just had another crash, but it was non-fatal: Dec 11 13:05:19 Svalbard emhttpd: read SMART /dev/sdc Dec 11 13:54:53 Svalbard emhttpd: spinning down /dev/sdc Dec 11 14:42:52 Svalbard kernel: ------------[ cut here ]------------ Dec 11 14:42:52 Svalbard kernel: WARNING: CPU: 12 PID: 32398 at fs/dcache.c:430 retain_dentry+0x52/0xa5 Dec 11 14:42:52 Svalbard kernel: Modules linked in: wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha tun nft_compat nf_tables xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter bridge xfs md_mod tcp_diag inet_diag it87(O) hwmon_vid ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs 8021q garp mrp stp llc macvtap macvlan tap igb r8169 realtek zfs(PO) i915 zunicode(PO) zzstd(O) intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp iosf_mbi drm_buddy kvm_intel ttm zlua(O) drm_display_helper zavl(PO) icp(PO) drm_kms_helper btusb btrtl btbcm kvm btintel bluetooth crct10dif_pclmul drm crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 aesni_intel zcommon(PO) crypto_simd cryptd znvpair(PO) rapl ecdh_generic spl(O) ecc Dec 11 14:42:52 Svalbard kernel: mei_hdcp mei_pxp intel_gtt intel_cstate gigabyte_wmi wmi_bmof mpt3sas agpgart i2c_algo_bit i2c_i801 nvme intel_uncore mei_me i2c_smbus ahci i2c_core nvme_core mei raid_class libahci scsi_transport_sas syscopyarea sysfillrect sysimgblt fb_sys_fops thermal fan tpm_crb video tpm_tis tpm_tis_core wmi tpm backlight intel_pmc_core acpi_tad acpi_pad button unix [last unloaded: igb] Dec 11 14:42:52 Svalbard kernel: CPU: 12 PID: 32398 Comm: lsof Tainted: P O 6.1.118-Unraid #1 Dec 11 14:42:52 Svalbard kernel: Hardware name: Gigabyte Technology Co., Ltd. 
Z790M AORUS ELITE AX/Z790M AORUS ELITE AX, BIOS F10 09/27/2024 Dec 11 14:42:52 Svalbard kernel: RIP: 0010:retain_dentry+0x52/0xa5 Dec 11 14:42:52 Svalbard kernel: Code: 74 18 eb e9 48 8b 43 60 48 89 df 48 8b 40 20 ff d0 0f 1f 00 85 c0 74 e4 eb d3 ff 4b 5c 0f ba e0 13 72 49 a9 00 04 08 00 74 02 <0f> 0b 0d 00 00 08 00 89 03 65 48 ff 05 fe ea dc 7e f7 03 00 00 70 Dec 11 14:42:52 Svalbard kernel: RSP: 0018:ffffc9002debbd98 EFLAGS: 00010206 Dec 11 14:42:52 Svalbard kernel: RAX: 0000000000600c00 RBX: ffff88841b174900 RCX: 0000000000000064 Dec 11 14:42:52 Svalbard kernel: RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff88841b174900 Dec 11 14:42:52 Svalbard kernel: RBP: ffffc9002debbe65 R08: 00000000009461d4 R09: 000000000000000a Dec 11 14:42:52 Svalbard kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001 Dec 11 14:42:52 Svalbard kernel: R13: ffffffff812b1d42 R14: ffff88841b174900 R15: ffff88841b174900 Dec 11 14:42:52 Svalbard kernel: FS: 00001540446a8e00(0000) GS:ffff88907f300000(0000) knlGS:0000000000000000 Dec 11 14:42:52 Svalbard kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Dec 11 14:42:52 Svalbard kernel: CR2: 00005590cfeb8000 CR3: 00000004b5f4e000 CR4: 0000000000750ee0 Dec 11 14:42:52 Svalbard kernel: PKRU: 55555554 Dec 11 14:42:52 Svalbard kernel: Call Trace: Dec 11 14:42:52 Svalbard kernel: <TASK> Dec 11 14:42:52 Svalbard kernel: ? __warn+0xab/0x122 Dec 11 14:42:52 Svalbard kernel: ? report_bug+0x109/0x17e Dec 11 14:42:52 Svalbard kernel: ? retain_dentry+0x52/0xa5 Dec 11 14:42:52 Svalbard kernel: ? handle_bug+0x41/0x6f Dec 11 14:42:52 Svalbard kernel: ? exc_invalid_op+0x13/0x60 Dec 11 14:42:52 Svalbard kernel: ? asm_exc_invalid_op+0x16/0x20 Dec 11 14:42:52 Svalbard kernel: ? tid_fd_update_inode+0x4d/0x4d Dec 11 14:42:52 Svalbard kernel: ? retain_dentry+0x52/0xa5 Dec 11 14:42:52 Svalbard kernel: dput+0x41/0x17b Dec 11 14:42:52 Svalbard kernel: proc_fill_cache+0x110/0x156 Dec 11 14:42:52 Svalbard kernel: ? 
compat_filldir+0x17a/0x17a Dec 11 14:42:52 Svalbard kernel: proc_readfd_common+0x16b/0x1bc Dec 11 14:42:52 Svalbard kernel: ? tid_fd_update_inode+0x4d/0x4d Dec 11 14:42:52 Svalbard kernel: iterate_dir+0x94/0x149 Dec 11 14:42:52 Svalbard kernel: __do_sys_getdents64+0x6b/0xd8 Dec 11 14:42:52 Svalbard kernel: ? compat_filldir+0x17a/0x17a Dec 11 14:42:52 Svalbard kernel: do_syscall_64+0x65/0x7b Dec 11 14:42:52 Svalbard kernel: entry_SYSCALL_64_after_hwframe+0x6e/0xd8 Dec 11 14:42:52 Svalbard kernel: RIP: 0033:0x154044908283 Dec 11 14:42:52 Svalbard kernel: Code: 89 df e8 20 05 fb ff 48 83 c4 08 48 89 e8 5b 5d c3 66 0f 1f 44 00 00 b8 ff ff ff 7f 48 39 c2 48 0f 47 d0 b8 d9 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 61 0b 11 00 f7 d8 Dec 11 14:42:52 Svalbard kernel: RSP: 002b:00007ffde2feab28 EFLAGS: 00000293 ORIG_RAX: 00000000000000d9 Dec 11 14:42:52 Svalbard kernel: RAX: ffffffffffffffda RBX: 00000000004c8c80 RCX: 0000154044908283 Dec 11 14:42:52 Svalbard kernel: RDX: 0000000000008000 RSI: 00000000004c8cb0 RDI: 0000000000000004 Dec 11 14:42:52 Svalbard kernel: RBP: 00000000004c8c84 R08: 0000154044a1a2d0 R09: 0000154044a1a2d0 Dec 11 14:42:52 Svalbard kernel: R10: 0000000000000000 R11: 0000000000000293 R12: ffffffffffffff88 Dec 11 14:42:52 Svalbard kernel: R13: 0000000000000002 R14: 0000000000433dd0 R15: 0000154044a9b000 Dec 11 14:42:52 Svalbard kernel: </TASK> Dec 11 14:42:52 Svalbard kernel: ---[ end trace 0000000000000000 ]--- Dec 11 14:43:02 Svalbard kernel: ------------[ cut here ]------------ Dec 11 14:43:02 Svalbard kernel: WARNING: CPU: 10 PID: 314 at fs/dcache.c:472 dentry_lru_isolate+0x44/0xb1 Dec 11 14:43:02 Svalbard kernel: Modules linked in: wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha tun nft_compat nf_tables xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack 
nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter bridge xfs md_mod tcp_diag inet_diag it87(O) hwmon_vid ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs 8021q garp mrp stp llc macvtap macvlan tap igb r8169 realtek zfs(PO) i915 zunicode(PO) zzstd(O) intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp iosf_mbi drm_buddy kvm_intel ttm zlua(O) drm_display_helper zavl(PO) icp(PO) drm_kms_helper btusb btrtl btbcm kvm btintel bluetooth crct10dif_pclmul drm crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 aesni_intel zcommon(PO) crypto_simd cryptd znvpair(PO) rapl ecdh_generic spl(O) ecc Dec 11 14:43:02 Svalbard kernel: mei_hdcp mei_pxp intel_gtt intel_cstate gigabyte_wmi wmi_bmof mpt3sas agpgart i2c_algo_bit i2c_i801 nvme intel_uncore mei_me i2c_smbus ahci i2c_core nvme_core mei raid_class libahci scsi_transport_sas syscopyarea sysfillrect sysimgblt fb_sys_fops thermal fan tpm_crb video tpm_tis tpm_tis_core wmi tpm backlight intel_pmc_core acpi_tad acpi_pad button unix [last unloaded: igb] Dec 11 14:43:02 Svalbard kernel: CPU: 10 PID: 314 Comm: kswapd0 Tainted: P W O 6.1.118-Unraid #1 Dec 11 14:43:02 Svalbard kernel: Hardware name: Gigabyte Technology Co., Ltd. 
Z790M AORUS ELITE AX/Z790M AORUS ELITE AX, BIOS F10 09/27/2024 Dec 11 14:43:02 Svalbard kernel: RIP: 0010:dentry_lru_isolate+0x44/0xb1 Dec 11 14:43:02 Svalbard kernel: Code: ef e8 d3 d1 62 00 89 c2 b8 03 00 00 00 85 d2 74 7b 83 7b dc 00 8b 43 80 74 40 89 c2 81 e2 00 04 08 00 81 fa 00 00 08 00 74 02 <0f> 0b 25 ff ff f7 ff 89 43 80 65 48 ff 0d 34 e8 dc 7e f7 43 80 00 Dec 11 14:43:02 Svalbard kernel: RSP: 0018:ffffc90000cd7ab8 EFLAGS: 00010206 Dec 11 14:43:02 Svalbard kernel: RAX: 0000000000888c40 RBX: ffff88841b174980 RCX: ffffc90000cd7b78 Dec 11 14:43:02 Svalbard kernel: RDX: 0000000000080400 RSI: ffff888109889888 RDI: ffff88841b174958 Dec 11 14:43:02 Svalbard kernel: RBP: ffff88841b174958 R08: 0000000000000000 R09: 0000000000000014 Dec 11 14:43:02 Svalbard kernel: R10: ffff888106454380 R11: ffff888145095240 R12: ffff888109889888 Dec 11 14:43:02 Svalbard kernel: R13: ffffc90000cd7b78 R14: ffffffff8125cee6 R15: ffff88841b174980 Dec 11 14:43:02 Svalbard kernel: FS: 0000000000000000(0000) GS:ffff88907f280000(0000) knlGS:0000000000000000 Dec 11 14:43:02 Svalbard kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Dec 11 14:43:02 Svalbard kernel: CR2: 00000000004d8098 CR3: 000000000420a000 CR4: 0000000000750ee0 Dec 11 14:43:02 Svalbard kernel: PKRU: 55555554 Dec 11 14:43:02 Svalbard kernel: Call Trace: Dec 11 14:43:02 Svalbard kernel: <TASK> Dec 11 14:43:02 Svalbard kernel: ? __warn+0xab/0x122 Dec 11 14:43:02 Svalbard kernel: ? report_bug+0x109/0x17e Dec 11 14:43:02 Svalbard kernel: ? dentry_lru_isolate+0x44/0xb1 Dec 11 14:43:02 Svalbard kernel: ? handle_bug+0x41/0x6f Dec 11 14:43:02 Svalbard kernel: ? exc_invalid_op+0x13/0x60 Dec 11 14:43:02 Svalbard kernel: ? asm_exc_invalid_op+0x16/0x20 Dec 11 14:43:02 Svalbard kernel: ? d_lru_shrink_move+0x38/0x38 Dec 11 14:43:02 Svalbard kernel: ? dentry_lru_isolate+0x44/0xb1 Dec 11 14:43:02 Svalbard kernel: ? 
dentry_lru_isolate+0x20/0xb1
Dec 11 14:43:02 Svalbard kernel: __list_lru_walk_one+0x90/0x123
Dec 11 14:43:02 Svalbard kernel: list_lru_walk_one+0x60/0x7d
Dec 11 14:43:02 Svalbard kernel: ? d_lru_shrink_move+0x38/0x38
Dec 11 14:43:02 Svalbard kernel: prune_dcache_sb+0x46/0x73
Dec 11 14:43:02 Svalbard kernel: super_cache_scan+0xf4/0x17c
Dec 11 14:43:02 Svalbard kernel: do_shrink_slab+0x188/0x2a1
Dec 11 14:43:02 Svalbard kernel: shrink_slab+0x1f9/0x267
Dec 11 14:43:02 Svalbard kernel: shrink_node+0x334/0x588
Dec 11 14:43:02 Svalbard kernel: balance_pgdat+0x4e9/0x6a2
Dec 11 14:43:02 Svalbard kernel: ? update_cfs_rq_load_avg+0x176/0x189
Dec 11 14:43:02 Svalbard kernel: ? update_load_avg+0x46/0x398
Dec 11 14:43:02 Svalbard kernel: kswapd+0x2f0/0x333
Dec 11 14:43:02 Svalbard kernel: ? _raw_spin_rq_lock_irqsave+0x20/0x20
Dec 11 14:43:02 Svalbard kernel: ? balance_pgdat+0x6a2/0x6a2
Dec 11 14:43:02 Svalbard kernel: kthread+0xe4/0xef
Dec 11 14:43:02 Svalbard kernel: ? kthread_complete_and_exit+0x1b/0x1b
Dec 11 14:43:02 Svalbard kernel: ret_from_fork+0x1f/0x30
Dec 11 14:43:02 Svalbard kernel: </TASK>
Dec 11 14:43:02 Svalbard kernel: ---[ end trace 0000000000000000 ]---
Dec 11 14:54:15 Svalbard kernel: vetha28b7f0: renamed from eth0
Dec 11 14:54:17 Svalbard kernel: eth0: renamed from vethb9c7ce8
Dec 11 14:54:35 Svalbard kernel: usb 2-9.1: reset SuperSpeed USB device number 5 using xhci_hcd
Dec 11 14:54:35 Svalbard kernel: usb 2-9.1: LPM exit latency is zeroed, disabling LPM.
Dec 11 15:10:47 Svalbard emhttpd: read SMART /dev/sdc

The system has been up for 6.5 hours, but Frigate has not: the crash happened a minute or two after a detection, and then a few minutes later Frigate restarted.

What's *really* interesting is that this time the parity sync didn't stop....normally it is immediately dead. That will probably still happen, but for now it is still hanging in there....
Next time it dies I will remove one of the Frigate drives to put into the test machine (so I can run it there too), reset the problem machine's BIOS settings back to defaults (except for some boot options - since the settings changes have all made no difference), and then I might upgrade to v7 for the newer kernel...
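Since the same kinds of trace keep turning up, it may help to tally them rather than read each one. Below is a minimal Python sketch that counts kernel `WARNING` and `BUG` lines by source location and function. The function name `summarize_kernel_events` and the regexes are my own; the sample lines are copied from the logs in this thread, and reading the live log from `/var/log/syslog` is an assumption about where the log sits on this box.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   "... kernel: WARNING: CPU: 10 PID: 314 at fs/dcache.c:472 dentry_lru_isolate+0x44/0xb1"
WARNING_RE = re.compile(r"kernel: WARNING: CPU: \d+ PID: \d+ at (\S+) ([\w.]+)\+0x")
# Matches kernel lines such as:
#   "... kernel: BUG: kernel NULL pointer dereference, address: 0000000000000020"
BUG_RE = re.compile(r"kernel: BUG: (.+?)(?:, address: [0-9a-f]+)?$")


def summarize_kernel_events(syslog_text):
    """Count WARNING and BUG events in syslog text, keyed by a short label."""
    counts = Counter()
    for line in syslog_text.splitlines():
        line = line.rstrip()
        match = WARNING_RE.search(line)
        if match:
            counts[f"WARNING {match.group(1)} {match.group(2)}"] += 1
            continue
        match = BUG_RE.search(line)
        if match:
            counts[f"BUG {match.group(1)}"] += 1
    return counts


# Sample lines taken from the logs posted in this thread.
SAMPLE = (
    "Dec 11 14:43:02 Svalbard kernel: WARNING: CPU: 10 PID: 314 at "
    "fs/dcache.c:472 dentry_lru_isolate+0x44/0xb1\n"
    "Dec 12 00:04:05 Svalbard kernel: BUG: kernel NULL pointer dereference, "
    "address: 0000000000000020\n"
    "Dec 12 00:32:50 Svalbard emhttpd: spinning down /dev/sdd\n"
)

for label, count in summarize_kernel_events(SAMPLE).items():
    print(count, label)
```

To run it against the real log, pass the file contents instead of `SAMPLE`, for example `summarize_kernel_events(open("/var/log/syslog").read())`, adjusting the path if the log lives elsewhere.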
December 11, 2024 Author

So a fun new problem today.....it's always nice when things mix it up a little...lets you focus on something else for a bit!

This morning I get up, sign in and see what's what. This is new: locked CPU cores.

Oh...and docker is running, but all the containers bar one are stopped, and the docker tab opens but nothing loads - you just get the unraid 'wave', and Apps doesn't load at all.

I can get to Tools and see the log - looks like we had a crash when AppBackup was running:

Dec 12 00:01:28 Svalbard kernel: veth7196d3b: renamed from eth0
Dec 12 00:01:28 Svalbard kernel: veth3a161a9: renamed from eth0
Dec 12 00:01:28 Svalbard kernel: vethe140ebc: renamed from eth0
Dec 12 00:01:31 Svalbard kernel: vethdcf506a: renamed from eth0
Dec 12 00:02:08 Svalbard kernel: veth4b3afd2: renamed from eth0
Dec 12 00:02:08 Svalbard kernel: veth6aec7d4: renamed from eth0
Dec 12 00:02:19 Svalbard kernel: veth56d4257: renamed from eth0
Dec 12 00:02:22 Svalbard kernel: vethbdb3af6: renamed from eth0
Dec 12 00:02:30 Svalbard kernel: vethf7bebb7: renamed from eth0
Dec 12 00:02:31 Svalbard kernel: veth6493f9c: renamed from eth0
Dec 12 00:02:41 Svalbard kernel: veth2a00935: renamed from eth0
Dec 12 00:02:45 Svalbard kernel: vethd666ffc: renamed from eth0
Dec 12 00:02:48 Svalbard kernel: veth2a06c9e: renamed from eth0
Dec 12 00:04:05 Svalbard kernel: BUG: kernel NULL pointer dereference, address: 0000000000000020
Dec 12 00:04:05 Svalbard kernel: #PF: supervisor read access in kernel mode
Dec 12 00:04:05 Svalbard kernel: #PF: error_code(0x0000) - not-present page
Dec 12 00:04:05 Svalbard kernel: PGD 0 P4D 0
Dec 12 00:04:05 Svalbard kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
Dec 12 00:04:05 Svalbard kernel: CPU: 14 PID: 1346 Comm: arc_evict Tainted: P O 6.1.118-Unraid #1
Dec 12 00:04:05 Svalbard kernel: Hardware name: Gigabyte Technology Co., Ltd.
Z790M AORUS ELITE AX/Z790M AORUS ELITE AX, BIOS F10 09/27/2024 Dec 12 00:04:05 Svalbard kernel: RIP: 0010:buf_hash_remove+0x2b/0x83 [zfs] Dec 12 00:04:05 Svalbard kernel: Code: 1f 44 00 00 53 48 89 fb 48 8b 57 10 48 8b 7f 30 48 89 de e8 5b fe ff ff 48 8b 15 d7 07 22 00 48 23 05 c8 07 22 00 48 8d 14 c2 <48> 8b 0a 48 39 cb 74 06 48 8d 51 20 eb f2 48 8b 4b 20 48 89 0a 31 Dec 12 00:04:05 Svalbard kernel: RSP: 0018:ffffc90000cdfd48 EFLAGS: 00010286 Dec 12 00:04:05 Svalbard kernel: RAX: 000000000009c334 RBX: ffff8881809f5400 RCX: 0000000000000000 Dec 12 00:04:05 Svalbard kernel: RDX: 0000000000000020 RSI: ce70e04a9d9425bb RDI: 2b1292b03e72bbdb Dec 12 00:04:05 Svalbard kernel: RBP: ffffffffa110e400 R08: 9ae16a3b2f90408f R09: 9ae16a3b2f90404f Dec 12 00:04:05 Svalbard kernel: R10: ffff8881611dc140 R11: 0000000000032d40 R12: ffffffffa110e280 Dec 12 00:04:05 Svalbard kernel: R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000001 Dec 12 00:04:05 Svalbard kernel: FS: 0000000000000000(0000) GS:ffff88907f780000(0000) knlGS:0000000000000000 Dec 12 00:04:05 Svalbard kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Dec 12 00:04:05 Svalbard kernel: CR2: 0000000000000020 CR3: 000000000420a000 CR4: 0000000000750ee0 Dec 12 00:04:05 Svalbard kernel: PKRU: 55555554 Dec 12 00:04:05 Svalbard kernel: Call Trace: Dec 12 00:04:05 Svalbard kernel: <TASK> Dec 12 00:04:05 Svalbard kernel: ? __die_body+0x1a/0x5c Dec 12 00:04:05 Svalbard kernel: ? page_fault_oops+0x329/0x376 Dec 12 00:04:05 Svalbard kernel: ? do_user_addr_fault+0x12e/0x465 Dec 12 00:04:05 Svalbard kernel: ? common_interrupt+0xb7/0xd0 Dec 12 00:04:05 Svalbard kernel: ? exc_page_fault+0xfb/0x11d Dec 12 00:04:05 Svalbard kernel: ? asm_exc_page_fault+0x22/0x30 Dec 12 00:04:05 Svalbard kernel: ? buf_hash_remove+0x2b/0x83 [zfs] Dec 12 00:04:05 Svalbard kernel: ? 
buf_hash_remove+0x19/0x83 [zfs] Dec 12 00:04:05 Svalbard kernel: arc_change_state.constprop.0+0x195/0x347 [zfs] Dec 12 00:04:05 Svalbard kernel: arc_evict_state+0x30d/0x701 [zfs] Dec 12 00:04:05 Svalbard kernel: ? random_get_pseudo_bytes+0xc4/0xf8 [spl] Dec 12 00:04:05 Svalbard kernel: arc_evict_cb+0x424/0x564 [zfs] Dec 12 00:04:05 Svalbard kernel: ? _raw_spin_unlock_irq+0x1a/0x2f Dec 12 00:04:05 Svalbard kernel: ? sigprocmask+0x6e/0x8e Dec 12 00:04:05 Svalbard kernel: zthr_procedure+0x89/0x12c [zfs] Dec 12 00:04:05 Svalbard kernel: ? zrl_is_locked+0x15/0x15 [zfs] Dec 12 00:04:05 Svalbard kernel: ? __thread_exit+0x13/0x13 [spl] Dec 12 00:04:05 Svalbard kernel: thread_generic_wrapper+0x57/0x65 [spl] Dec 12 00:04:05 Svalbard kernel: kthread+0xe4/0xef Dec 12 00:04:05 Svalbard kernel: ? kthread_complete_and_exit+0x1b/0x1b Dec 12 00:04:05 Svalbard kernel: ret_from_fork+0x1f/0x30 Dec 12 00:04:05 Svalbard kernel: </TASK> Dec 12 00:04:05 Svalbard kernel: Modules linked in: wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha tun nft_compat nf_tables xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter bridge xfs nfsd auth_rpcgss oid_registry lockd grace sunrpc md_mod tcp_diag inet_diag it87(O) hwmon_vid ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs 8021q garp mrp stp llc macvtap macvlan tap igb r8169 realtek zfs(PO) i915 intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp zunicode(PO) coretemp zzstd(O) iosf_mbi drm_buddy kvm_intel ttm zlua(O) drm_display_helper btusb btrtl zavl(PO) btbcm drm_kms_helper icp(PO) btintel kvm bluetooth crct10dif_pclmul crc32_pclmul drm crc32c_intel ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 zcommon(PO) aesni_intel znvpair(PO) Dec 12 00:04:05 Svalbard kernel: crypto_simd cryptd 
ecdh_generic spl(O) rapl mei_pxp mei_hdcp gigabyte_wmi wmi_bmof ecc intel_cstate mpt3sas i2c_i801 intel_gtt nvme intel_uncore agpgart i2c_smbus mei_me i2c_algo_bit nvme_core ahci i2c_core mei libahci raid_class syscopyarea scsi_transport_sas sysfillrect sysimgblt fb_sys_fops thermal fan video tpm_crb tpm_tis tpm_tis_core wmi tpm backlight intel_pmc_core acpi_tad acpi_pad button unix [last unloaded: igb] Dec 12 00:04:05 Svalbard kernel: CR2: 0000000000000020 Dec 12 00:04:05 Svalbard kernel: ---[ end trace 0000000000000000 ]--- Dec 12 00:04:05 Svalbard kernel: RIP: 0010:buf_hash_remove+0x2b/0x83 [zfs] Dec 12 00:04:05 Svalbard kernel: Code: 1f 44 00 00 53 48 89 fb 48 8b 57 10 48 8b 7f 30 48 89 de e8 5b fe ff ff 48 8b 15 d7 07 22 00 48 23 05 c8 07 22 00 48 8d 14 c2 <48> 8b 0a 48 39 cb 74 06 48 8d 51 20 eb f2 48 8b 4b 20 48 89 0a 31 Dec 12 00:04:05 Svalbard kernel: RSP: 0018:ffffc90000cdfd48 EFLAGS: 00010286 Dec 12 00:04:05 Svalbard kernel: RAX: 000000000009c334 RBX: ffff8881809f5400 RCX: 0000000000000000 Dec 12 00:04:05 Svalbard kernel: RDX: 0000000000000020 RSI: ce70e04a9d9425bb RDI: 2b1292b03e72bbdb Dec 12 00:04:05 Svalbard kernel: RBP: ffffffffa110e400 R08: 9ae16a3b2f90408f R09: 9ae16a3b2f90404f Dec 12 00:04:05 Svalbard kernel: R10: ffff8881611dc140 R11: 0000000000032d40 R12: ffffffffa110e280 Dec 12 00:04:05 Svalbard kernel: R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000001 Dec 12 00:04:05 Svalbard kernel: FS: 0000000000000000(0000) GS:ffff88907f780000(0000) knlGS:0000000000000000 Dec 12 00:04:05 Svalbard kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Dec 12 00:04:05 Svalbard kernel: CR2: 0000000000000020 CR3: 000000015ebfa000 CR4: 0000000000750ee0 Dec 12 00:04:05 Svalbard kernel: PKRU: 55555554 Dec 12 00:04:05 Svalbard kernel: note: arc_evict[1346] exited with irqs disabled Dec 12 00:32:50 Svalbard emhttpd: spinning down /dev/sdd Dec 12 01:03:25 Svalbard emhttpd: spinning down /dev/sde Dec 12 01:03:27 Svalbard emhttpd: read SMART /dev/sde 
As to the CPU locking, top returns:

top - 06:58:28 up 9:15, 0 users, load average: 19.02, 18.76, 17.97
Tasks: 636 total, 1 running, 635 sleeping, 0 stopped, 0 zombie
%Cpu(s): 3.6 us, 0.7 sy, 0.0 ni, 78.1 id, 17.6 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 64082.5 total, 569.5 free, 10164.9 used, 53348.1 buff/cache
MiB Swap: 2048.0 total, 2014.0 free, 34.0 used. 53033.9 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
20777 root 20 0 1240368 18272 8404 S 100.0 0.0 402:23.88 unpackerr
12200 root 20 0 0 0 0 S 5.6 0.0 43:57.54 unraidd0
7527 root 20 0 0 0 0 D 1.7 0.0 12:30.74 mdrecoveryd
14632 root 20 0 4559768 92920 38868 S 0.3 0.1 0:30.29 dockerd
25476 root 20 0 96460 17032 9936 S 0.3 0.0 0:00.02 php-fpm
25498 root 20 0 95916 14456 7904 S 0.3 0.0 0:00.01 php-fpm
27345 root 20 0 95916 14844 8288 S 0.3 0.0 0:00.02 php-fpm
28084 root 20 0 95996 30948 24156 S 0.3 0.0 0:00.15 update_3
1 root 20 0 2592 1808 1688 S 0.0 0.0 0:01.06 init
2 root 20 0 0 0 0 S 0.0 0.0 0:01.43 kthreadd

Which is odd for a couple of reasons - firstly, there's only one thread holding the CPU high, so why are multiple cores high? So I kill the process:

Ending process 20777...
Checking...
Process 20777 could not be gently killed... will use SIGKILL...
Process 20777 ()...
Success...

And my reward is that the CPU stays high: CPU10 is released, but now CPU2 has gone high. And top still thinks there's nothing happening:

top - 07:04:08 up 9:21, 0 users, load average: 18.17, 18.70, 18.23
Tasks: 636 total, 1 running, 635 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.1 us, 0.4 sy, 0.0 ni, 78.2 id, 21.3 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 64082.5 total, 561.8 free, 10171.1 used, 53349.6 buff/cache
MiB Swap: 2048.0 total, 2014.0 free, 34.0 used. 53027.7 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12200 root 20 0 0 0 0 S 7.3 0.0 44:19.24 unraidd0
7527 root 20 0 0 0 0 D 2.0 0.0 12:36.75 mdrecoveryd
4060 root 20 0 6204 2188 1880 S 0.3 0.0 0:02.50 blazer_usb
7505 root 20 0 279852 5080 4360 S 0.3 0.0 1:11.19 emhttpd
28084 root 20 0 95996 30948 24156 S 0.3 0.0 0:00.81 update_3
1 root 20 0 2592 1808 1688 S 0.0 0.0 0:01.06 init
2 root 20 0 0 0 0 S 0.0 0.0 0:01.43 kthreadd
3 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_gp
4 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_par_gp
5 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 slub_flushwq

Do we trust top or do we trust the dashboard? Personally I'm with top - so then, given that the dashboard is updating, why is it lying?

Then, on top of this, docker has curled up its toes and died:

And the logs have nothing to say:

Dec 12 06:53:14 Svalbard webGUI: Successful login user root from 192.168.2.101
Dec 12 07:00:01 Svalbard Plugin Auto Update: Checking for available plugin updates
Dec 12 07:00:09 Svalbard Plugin Auto Update: Community Applications Plugin Auto Update finished
Dec 12 07:01:14 Svalbard emhttpd: cmd: /usr/local/emhttp/plugins/open.files/scripts/killprocess 20777

But the parity check, now at 29%, is still running....which would normally have packed up and gone home by now.

What to do? OK....let's go to Settings > Docker > Disable...then we can restart docker...nope....just the unraid wave again. The status bar says "Services starting...." but it never ends. Now Docker is set to "n" but its status is still "Running". The logs say:

Dec 12 07:01:14 Svalbard emhttpd: cmd: /usr/local/emhttp/plugins/open.files/scripts/killprocess 20777
Dec 12 07:07:29 Svalbard ool www[17934]: /usr/local/emhttp/plugins/dynamix/scripts/emcmd 'cmdStatus=Apply'
Dec 12 07:07:29 Svalbard emhttpd: Starting services...
Dec 12 07:07:29 Svalbard emhttpd: shcmd (27301): /etc/rc.d/rc.samba restart Dec 12 07:07:29 Svalbard winbindd[14469]: [2024/12/12 07:07:29.359923, 0] ../../source3/winbindd/winbindd_dual.c:1964(winbindd_sig_term_handler) Dec 12 07:07:29 Svalbard winbindd[14469]: Got sig[15] terminate (is_parent=1) Dec 12 07:07:29 Svalbard wsdd2[14466]: 'Terminated' signal received. Dec 12 07:07:29 Svalbard wsdd2[14466]: terminating. Dec 12 07:07:31 Svalbard root: Starting Samba: /usr/sbin/smbd -D Dec 12 07:07:31 Svalbard root: /usr/sbin/wsdd2 -d -4 Dec 12 07:07:31 Svalbard wsdd2[21498]: starting. Dec 12 07:07:31 Svalbard root: /usr/sbin/winbindd -D Dec 12 07:07:31 Svalbard emhttpd: shcmd (27306): /etc/rc.d/rc.avahidaemon restart Dec 12 07:07:31 Svalbard root: Stopping Avahi mDNS/DNS-SD Daemon: stopped Dec 12 07:07:31 Svalbard avahi-daemon[14538]: Got SIGTERM, quitting. Dec 12 07:07:31 Svalbard avahi-daemon[14538]: Leaving mDNS multicast group on interface eth0.IPv4 with address 192.168.6.2. Dec 12 07:07:31 Svalbard avahi-dnsconfd[14547]: read(): EOF Dec 12 07:07:31 Svalbard avahi-daemon[14538]: avahi-daemon 0.8 exiting. Dec 12 07:07:31 Svalbard root: Starting Avahi mDNS/DNS-SD Daemon: /usr/sbin/avahi-daemon -D Dec 12 07:07:31 Svalbard avahi-daemon[21571]: Found user 'avahi' (UID 61) and group 'avahi' (GID 214). Dec 12 07:07:31 Svalbard avahi-daemon[21571]: Successfully dropped root privileges. Dec 12 07:07:31 Svalbard avahi-daemon[21571]: avahi-daemon 0.8 starting up. Dec 12 07:07:31 Svalbard avahi-daemon[21571]: Successfully called chroot(). Dec 12 07:07:31 Svalbard avahi-daemon[21571]: Successfully dropped remaining capabilities. Dec 12 07:07:31 Svalbard avahi-daemon[21571]: Loading service file /services/sftp-ssh.service. Dec 12 07:07:31 Svalbard avahi-daemon[21571]: Loading service file /services/smb.service. Dec 12 07:07:31 Svalbard avahi-daemon[21571]: Loading service file /services/ssh.service. 
Dec 12 07:07:31 Svalbard avahi-daemon[21571]: Joining mDNS multicast group on interface eth0.IPv4 with address 192.168.6.2. Dec 12 07:07:31 Svalbard avahi-daemon[21571]: New relevant interface eth0.IPv4 for mDNS. Dec 12 07:07:31 Svalbard avahi-daemon[21571]: Network interface enumeration completed. Dec 12 07:07:31 Svalbard avahi-daemon[21571]: Registering new address record for 192.168.6.2 on eth0.IPv4. Dec 12 07:07:31 Svalbard emhttpd: shcmd (27307): /etc/rc.d/rc.avahidnsconfd restart Dec 12 07:07:31 Svalbard root: Stopping Avahi mDNS/DNS-SD DNS Server Configuration Daemon: stopped Dec 12 07:07:31 Svalbard root: Starting Avahi mDNS/DNS-SD DNS Server Configuration Daemon: /usr/sbin/avahi-dnsconfd -D Dec 12 07:07:31 Svalbard avahi-dnsconfd[21580]: Successfully connected to Avahi daemon. Dec 12 07:07:32 Svalbard emhttpd: shcmd (27312): /etc/rc.d/rc.docker stop Dec 12 07:07:32 Svalbard avahi-daemon[21571]: Server startup complete. Host name is Svalbard.local. Local service cookie is 3045481838. Dec 12 07:07:33 Svalbard avahi-daemon[21571]: Service "Svalbard" (/services/ssh.service) successfully established. Dec 12 07:07:33 Svalbard avahi-daemon[21571]: Service "Svalbard" (/services/smb.service) successfully established. Dec 12 07:07:33 Svalbard avahi-daemon[21571]: Service "Svalbard" (/services/sftp-ssh.service) successfully established. So it definitely did something.... While we wait lets go to AppBackup to see if there's anything in its log....Settings > AppBackup yields a blank page but after a few minutes it comes up. Check the log, and it still thinks it is running...so hit stop and copy the log: [12.12.2024 00:00:02][ℹ️][Main] 👋 WELCOME TO APPDATA.BACKUP!! 
:D [12.12.2024 00:00:03][ℹ️][Main] Backing up from: /mnt/user/appdata, /mnt/cache/appdata [12.12.2024 00:00:03][ℹ️][Main] Backing up to: /mnt/scratch/archive_unraid/appdata_backups/ab_20241212_000003 [12.12.2024 00:00:03][ℹ️][Main] Selected containers: Calibre, Fenrus, Jellyfin, Mealie, Overseerr, PhotoPrism, Plex, PostgreSQL15, Starr, TubeSync, frigate-1, syncthing, tautulli [12.12.2024 00:00:03][ℹ️][Main] Saving container XML files... [12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'Bazarr' is enabled and update is available! Schedule update after backup... [12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'Calibre' is enabled and update is available! Schedule update after backup... [12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'Czkawka' is enabled but no update is available. [12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'Fenrus' is enabled but no update is available. [12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'Flaresolverr' is enabled but no update is available. [12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'frigate-1' is enabled but no update is available. [12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'Jellyfin' is enabled but no update is available. [12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'Jellystat' is enabled but no update is available. [12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'lidarr' is enabled and update is available! Schedule update after backup... [12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'Mealie' is enabled but no update is available. [12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'Overseerr' is enabled but no update is available. [12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'PhotoPrism' is enabled but no update is available. [12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'Plex' is enabled but no update is available. [12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'PostgreSQL15' is enabled but no update is available. [12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'Prowlarr' is enabled but no update is available. 
[12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'qBittorrent' is enabled but no update is available. [12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'Radarr' is enabled but no update is available. [12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'Readarr' is enabled but no update is available. [12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'SABnzbd' is enabled but no update is available. [12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'Sonarr' is enabled but no update is available. [12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'syncthing' is enabled but no update is available. [12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'tautulli' is enabled but no update is available. [12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'TubeSync' is enabled but no update is available. [12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'Unpackerr' is enabled but no update is available. [12.12.2024 00:01:24][ℹ️][Main] Method: Stop all container before continuing. [12.12.2024 00:01:24][ℹ️][Calibre] Stopping Calibre... done! (took 4 seconds) [12.12.2024 00:01:28][ℹ️][Fenrus] Stopping Fenrus... done! (took 0 seconds) [12.12.2024 00:01:28][ℹ️][PhotoPrism] Stopping PhotoPrism... done! (took 0 seconds) [12.12.2024 00:01:28][ℹ️][TubeSync] Stopping TubeSync... done! (took 3 seconds) [12.12.2024 00:01:31][ℹ️][Starr][Bazarr] Stopping Bazarr... done! (took 10 seconds) [12.12.2024 00:01:41][ℹ️][Starr][Readarr] Stopping Readarr... done! (took 2 seconds) [12.12.2024 00:01:43][ℹ️][Starr][lidarr] Stopping lidarr... done! (took 4 seconds) [12.12.2024 00:01:47][ℹ️][Starr][Radarr] Stopping Radarr... done! (took 5 seconds) [12.12.2024 00:01:52][ℹ️][Starr][Sonarr] Stopping Sonarr... done! (took 3 seconds) [12.12.2024 00:01:55][ℹ️][Starr][Prowlarr] Stopping Prowlarr... done! (took 4 seconds) [12.12.2024 00:01:59][ℹ️][Starr][qBittorrent] Stopping qBittorrent... done! (took 6 seconds) [12.12.2024 00:02:05][ℹ️][Starr][SABnzbd] Stopping SABnzbd... done! 
(took 3 seconds) [12.12.2024 00:02:08][ℹ️][Starr][GluetunVPN] Stopping GluetunVPN... done! (took 0 seconds) [12.12.2024 00:02:08][ℹ️][PostgreSQL15] Stopping PostgreSQL15... done! (took 0 seconds) [12.12.2024 00:02:08][ℹ️][Jellyfin] Stopping Jellyfin... done! (took 11 seconds) [12.12.2024 00:02:19][ℹ️][tautulli] Stopping tautulli... done! (took 3 seconds) [12.12.2024 00:02:22][ℹ️][Plex] Stopping Plex... done! (took 8 seconds) [12.12.2024 00:02:30][ℹ️][Mealie] Stopping Mealie... done! (took 1 seconds) [12.12.2024 00:02:31][ℹ️][frigate-1] Stopping frigate-pembroke... done! (took 10 seconds) [12.12.2024 00:02:41][ℹ️][syncthing] Stopping syncthing... done! (took 4 seconds) [12.12.2024 00:02:45][ℹ️][Overseerr] Stopping Overseerr... done! (took 3 seconds) [12.12.2024 00:02:48][ℹ️][Main] Starting backup for containers [12.12.2024 00:02:48][ℹ️][Calibre] Should NOT backup external volumes, sanitizing them... [12.12.2024 00:02:48][ℹ️][Calibre] Calculated volumes to back up: /mnt/user/appdata/calibre [12.12.2024 00:02:48][ℹ️][Calibre] Backing up Calibre... [12.12.2024 00:02:49][ℹ️][Calibre] Backup created without issues (took 00:00:01 (hours:mins:secs)) [12.12.2024 00:02:49][ℹ️][Calibre] Verifying backup... [12.12.2024 00:02:49][ℹ️][Calibre] Verification ended without issues (took 00:00:00 (hours:mins:secs)) [12.12.2024 00:02:49][ℹ️][Calibre] Installing planned update for Calibre... [12.12.2024 00:03:04][ℹ️][Fenrus] Should NOT backup external volumes, sanitizing them... [12.12.2024 00:03:04][ℹ️][Fenrus] Calculated volumes to back up: /mnt/user/appdata/fenrus/data [12.12.2024 00:03:04][ℹ️][Fenrus] Backing up Fenrus... [12.12.2024 00:03:04][ℹ️][Fenrus] Backup created without issues (took 00:00:00 (hours:mins:secs)) [12.12.2024 00:03:04][ℹ️][Fenrus] Verifying backup... [12.12.2024 00:03:04][ℹ️][Fenrus] Verification ended without issues (took 00:00:00 (hours:mins:secs)) [12.12.2024 00:03:04][ℹ️][PhotoPrism] Should NOT backup external volumes, sanitizing them... 
[12.12.2024 00:03:04][ℹ️][PhotoPrism] Calculated volumes to back up: /mnt/user/appdata/photoprism
[12.12.2024 00:03:04][ℹ️][PhotoPrism] Backing up PhotoPrism...
[12.12.2024 07:23:01][❌][PhotoPrism] tar creation failed! Tar said:

But....spoiler alert.....it still thinks it's running. Jump back to console:

root@Svalbard:~# docker container list
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
0ca070369ca0 golift/unpackerr "/unpackerr" 4 days ago Up 10 hours Unpackerr
d96d888e2e34 flaresolverr/flaresolverr "/usr/bin/dumb-init …" 4 days ago Up 10 hours Flaresolverr
32a5d349b413 cyfershepard/jellystat "docker-entrypoint.s…" 4 days ago Up 10 hours (healthy) Jellystat

But no amount of kill commands will make anything stop. So we have zombies. And the only solution for that is a reboot....so that's now the only way out from here.

In the meantime though, on the dashboard all of a sudden the docker listing is back, with everything stopped and three 'unknown' containers (presumably the ones I tried to kill). The system is also super slow....things appear but it takes a super long time...

Interestingly the parity check is still running and the read speeds are still updating, but the progress is stuck at 4.65TB / 29.1% - so that also isn't right. That should have progressed by now....

And the CPU keeps locking up:

It is a sick sick machine. But not in any way that is possible to meaningfully diagnose....so far....

Can't reboot now either - not from console and not from GUI..... :-(
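On the top-versus-dashboard question: on Linux the load average counts tasks in uninterruptible sleep (state `D`, usually waiting on I/O) as well as runnable ones, so a load of about 19 with roughly 78% idle and 17-21% `wa` is consistent with many tasks blocked on I/O rather than busy cores. Whether the dashboard bars include I/O wait is an assumption I have not confirmed. A minimal Python sketch for listing the blocked tasks follows; the function names are my own, and it only looks at each process's main thread, so threads stuck inside a multi-threaded process are not shown.

```python
import os


def parse_stat_line(text):
    """Return (pid, comm, state) from the contents of a /proc/<pid>/stat file."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    open_paren = text.index("(")
    close_paren = text.rindex(")")
    pid = int(text[:open_paren])
    comm = text[open_paren + 1:close_paren]
    state = text[close_paren + 1:].split()[0]
    return pid, comm, state


def blocked_tasks(proc_root="/proc"):
    """List (pid, comm) for processes whose main thread is in state D."""
    if not os.path.isdir(proc_root):
        return []  # not a Linux-style /proc, nothing to report
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat_line(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited while being read, or entry unreadable
        if state == "D":
            blocked.append((pid, comm))
    return sorted(blocked)


if __name__ == "__main__":
    for pid, comm in blocked_tasks():
        print(pid, comm)
```

If the list is long and dominated by kernel threads such as `mdrecoveryd` or ZFS workers, that would point at stalled I/O rather than at any one container.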
December 11, 2024

There is already a discussion about this and we didn't get anywhere: https://github.com/blakeblackshear/frigate/issues/8470

I managed to stop these hard locks just by using lower-resolution camera feeds, so it should be something related to memory usage, but I was not able to find a root cause.
December 11, 2024 Community Expert

Looks like golift/unpackerr may need some CPU pinning, as it tried to use all cores and hung. Maybe assign 4 CPUs to that docker.

Edited December 11, 2024 by bmartino1
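For reference, the pinning suggestion maps onto Docker's `--cpuset-cpus` option. On Unraid the CPU Pinning setting in the container template is the usual place to set this, and a manual `docker update` may not survive the container being recreated from its template. The Python sketch below only builds the equivalent command line; the helper name `cpuset_update_command` is hypothetical, the container name `Unpackerr` comes from the listing earlier in the thread, and CPU IDs 0-3 are arbitrary placeholders rather than a recommendation for this machine.

```python
import shlex


def cpuset_update_command(container, cpus):
    """Build a `docker update` argv that restricts a container to the given CPU IDs."""
    if not cpus:
        raise ValueError("at least one CPU id is required")
    # Deduplicate and sort so the resulting list is stable and readable.
    cpu_list = ",".join(str(cpu) for cpu in sorted(set(cpus)))
    return ["docker", "update", "--cpuset-cpus", cpu_list, container]


# Placeholder choice of four CPUs; pick IDs that suit the actual core layout.
command = cpuset_update_command("Unpackerr", [0, 1, 2, 3])
print(shlex.join(command))
```

The printed line can be pasted into the console, or the list can be handed to `subprocess.run(command, check=True)` on the server itself.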