ChirpyTurnip

Everything posted by ChirpyTurnip

  1. Mine updated and now all my folders in the Docker view are gone. They still show in the dashboard view, and they are there in the settings, but on the Docker tab there is only a legacy view. Clicking "Add Folder" does nothing. I have cleared the cache and reloaded (not restarted Unraid) but no joy.... :-( I also have the problem that many dialogue messages / buttons have no visible text. It looks very promising....just going to be some teething pain. ;-)
  2. So, it has been a minute. Where am I at? After building a new machine based on a Core Ultra CPU I'm happy to report that everything is back on one machine: two copies of Frigate, all of the *-arrs (with Gluetun), and so on and so forth. Absolutely stable, as it should be, on both 6.x and 7.x. Current uptime is nearly 21 days, no issues at all! The old machine that was so very crash prone is now my desktop PC (on which I write now) and it too is rock solid. My ultimate suspicion? That the microcode changes in the BIOS for the Core i7 bug made the Linux OS unstable....but running Windows there are no issues. With new HW Unraid is also happy.... So sorted FINALLY, I think!
  3. Same here. In System Dynamix hitting "Detect" returns nothing....but running "sensors" from the console gives me this:

     it8689-isa-0a40
     Adapter: ISA adapter
     in0: 1.44 V (min = +0.00 V, max = +3.06 V)
     in1: 1.99 V (min = +0.00 V, max = +3.06 V)
     in2: 2.02 V (min = +0.00 V, max = +3.06 V)
     in3: 2.02 V (min = +0.00 V, max = +3.06 V)
     in4: 1.04 V (min = +0.00 V, max = +3.06 V)
     in5: 1.13 V (min = +0.00 V, max = +3.06 V)
     in6: 1.99 V (min = +0.00 V, max = +3.06 V)
     3VSB: 3.31 V (min = +0.00 V, max = +6.12 V)
     Vbat: 3.14 V
     fan1: 1607 RPM (min = 10 RPM)
     Array Fan: 1923 RPM (min = 0 RPM)
     Array Fan: 1415 RPM (min = 0 RPM)
     temp1: +30.0°C (low = +127.0°C, high = +127.0°C) sensor = thermistor
     temp2: +48.0°C (low = +127.0°C, high = +127.0°C) sensor = thermistor
     temp3: +59.0°C (low = +127.0°C, high = +127.0°C) sensor = AMD AMDSI
     temp4: +32.0°C (low = +0.0°C, high = +127.0°C) sensor = thermistor
     temp5: +44.0°C (low = +0.0°C, high = -125.0°C) sensor = thermistor
     temp6: +44.0°C (low = +0.0°C, high = -125.0°C) sensor = thermistor
     intrusion0: ALARM

     acpitz-acpi-0
     Adapter: ACPI interface
     temp1: +16.8°C

     nvme-pci-0200
     Adapter: PCI adapter
     Composite: +33.9°C (low = -20.1°C, high = +83.8°C) (crit = +88.8°C)
     Sensor 2: +54.9°C

     k10temp-pci-00c3
     Adapter: PCI adapter
     Tctl: +59.0°C
     Tccd1: +57.2°C

     amdgpu-pci-1000
     Adapter: PCI adapter
     vddgfx: 1.42 V
     vddnb: 1.02 V
     edge: +42.0°C
     PPT: 63.13 W

     gigabyte_wmi-virtual-0
     Adapter: Virtual device
     temp1: +30.0°C
     temp2: +48.0°C
     temp3: +59.0°C
     temp4: +32.0°C
     temp5: +44.0°C
     temp6: +44.0°C

     nvme-pci-0d00
     Adapter: PCI adapter
     Composite: +38.9°C (low = -273.1°C, high = +81.8°C) (crit = +84.8°C)
     Sensor 1: +38.9°C (low = -273.1°C, high = +65261.8°C)
     Sensor 2: +43.9°C (low = -273.1°C, high = +65261.8°C)

     So it does have k10, and it definitely can return data....but other than that nothing.... Also running 7 RC2, prior to that 6.12x - neither worked. 😞
  4. Yup. Check out my tale of woe here: Kernel Bug Error. Nothing has improved for me - but I've pretty much eliminated everything except the CPU (previously replaced) and the MB. I now have a new AMD machine (sadly only running a 7600X - so less than half the compute power of the i7, and no GPU transcoding) but this machine can now do all my Frigate work + Arr apps. The CCTV storage is local, but the Arr apps write back to the main Unraid machine via NFS shares. In the new year I plan to convert the failing Unraid server into a Windows machine (new desktop PC) and then I will build a Core Ultra-based machine as my new main Unraid platform. According to Lime Tech there should be support for Arc GPUs from v7.1; this should be a small hop from 7.0 once that is released, so hopefully early next year. If the flaky Unraid machine stays up with just storage + Plex as loads I will keep it running until 7.1 is available; if it stays really bad then I will move early and the bandwidth-impaired will get minimal transcoding. So we will see....personally I'm leaning towards this being another CPU problem....but it might work fine under Windows....so we shall see! In the meantime I think I will soon upgrade both Unraid systems as I hear there is a nasty NFS bug - I've not come across this yet, but after a nightmare run of late I don't really need to stumble into another cesspit of despair!
  5. So they were fully isolated and not sharing at all.....each had its own TPU, own pool, own config folder. And the nice thing about Frigate is that it integrates really well with Home Assistant....Blue Iris is very complex but a bit of a dog when it comes to detection and a big memory hog - which is one of the reasons I moved to Frigate, which needed Docker, which brought me to Unraid. I have been a bit busy this weekend with very little time to do much. But I can report that both Frigate instances have been moved to the AMD machine, and for now both are happy, and very much running as they should be! The AMD machine has nothing else on it yet, but for now it is showing all the right signs. It was a literal transplant of the configuration except for needing to set AMD flags instead of Intel flags - so nothing else is different except that we've moved from Intel to AMD. Certainly it is running the machine harder - the CPU is permanently at 20% (give or take) but then to be fair an AM5 7600 is about half the compute power of an i7 14700K so it's probably not that bad. On the other side the Intel machine is also happier and more stable - without Frigate there it is not crashing either. I've gradually been putting the old config back (vpn_bridge for GlueTun, back to ipvlan, etc.) just to see if anything unstable comes back, but so far so good. If this turns out to be *the* fix then I'm pretty sure I will never buy another Intel again. Now I just wish I could get the Fan Control stuff working....but I've posted that on their support forum - it will control the fans down to 0 rpm, but won't bring them up again, or stop before 0 rpm, even if you set the minimum to 50% / 128. So some things are going better, but I'm still finding problems wherever I look!
  6. So I have two interesting problems - I have installed it87 and Dynamix System Temperature on a Gigabyte B650M Gaming Plus Wifi board.

     PROBLEM #1: If I run sensors I get this:

     root@Skadi:~# sensors
     amdgpu-pci-1000
     Adapter: PCI adapter
     vddgfx: 1.25 V
     vddnb: 1.02 V
     edge: +37.0°C
     PPT: 29.17 W

     nvme-pci-0200
     Adapter: PCI adapter
     Composite: +34.9°C (low = -20.1°C, high = +83.8°C) (crit = +88.8°C)
     Sensor 2: +45.9°C

     k10temp-pci-00c3
     Adapter: PCI adapter
     Tctl: +39.6°C
     Tccd1: +30.0°C

     it8689-isa-0a40
     Adapter: ISA adapter
     in0: 780.00 mV (min = +0.00 V, max = +3.06 V)
     in1: 1.99 V (min = +0.00 V, max = +3.06 V)
     in2: 2.02 V (min = +0.00 V, max = +3.06 V)
     in3: 2.03 V (min = +0.00 V, max = +3.06 V)
     in4: 1.03 V (min = +0.00 V, max = +3.06 V)
     in5: 1.12 V (min = +0.00 V, max = +3.06 V)
     in6: 1.99 V (min = +0.00 V, max = +3.06 V)
     3VSB: 3.31 V (min = +0.00 V, max = +6.12 V)
     Vbat: 3.12 V
     fan1: 1352 RPM (min = 10 RPM)
     Array Fan: 1268 RPM (min = 0 RPM)
     Array Fan: 1013 RPM (min = 0 RPM)
     temp1: +31.0°C (low = +127.0°C, high = +127.0°C) sensor = thermistor
     temp2: +54.0°C (low = +127.0°C, high = +127.0°C) sensor = thermistor
     temp3: +39.0°C (low = +127.0°C, high = +127.0°C) sensor = AMD AMDSI
     temp4: +34.0°C (low = +0.0°C, high = +127.0°C) sensor = thermistor
     temp5: +37.0°C (low = +0.0°C, high = -125.0°C) sensor = thermistor
     temp6: +39.0°C (low = +0.0°C, high = -125.0°C) sensor = thermistor
     intrusion0: ALARM

     acpitz-acpi-0
     Adapter: ACPI interface
     temp1: +16.8°C (crit = +20.8°C)

     nvme-pci-0d00
     Adapter: PCI adapter
     Composite: +39.9°C (low = -273.1°C, high = +81.8°C) (crit = +84.8°C)
     Sensor 1: +39.9°C (low = -273.1°C, high = +65261.8°C)
     Sensor 2: +44.9°C (low = -273.1°C, high = +65261.8°C)

     But Dynamix System Temp returns nada: I would have expected it to detect something since sensors definitely returns a pile of data! What am I missing?
PROBLEM #2: I had a go at Dynamix Fan Control, with these BIOS settings: fan mode set to PWM, fan speed set to Full, Fan Stop = Disabled, Fan Alarm = Disabled. In System Temp (as per above) it detects the CPU fan and the two case fans. I can click Detect on PWM2 and PWM3 and it finds the fans, and turns them down to 0. And then nothing....a big fat nothing. The fans will not come back to any sort of speed. I have tried setting a minimum PWM of 50 or 128 - it will always spin the fan down and then stop. So it definitely can control the fans, but it doesn't succeed in restarting a fan, or in holding it at a lower speed (it ramps down, but it doesn't hold at say 50 or 128). I've tried with Fan Alarm Enabled/Disabled and Fan Stop Enabled/Disabled. The result is always the same. Other interesting points to note: once the fan drops to zero Dynamix System Temp loses the two array fans (it shows only CPU). Dynamix Fan Control will continue to try to control the fans (spin up to some value like 63%), but this always fails. In addition it actually starts (unsuccessfully) trying to play with PWM1 - the CPU fan. This is the only fan still reporting rpm at this point....but why does it try to control the wrong fan? I get the sense that this should work....but I'm not having a lot of luck!
  7. No joy I'm afraid. The Proxmox script is looking for file paths that just don't exist in a default Debian install. Getting this running may be a lot harder than running a script. And even then, there is no guarantee of success.... 😞 I've also been reading through the other thread(s) specifically on Frigate. There is a suspicion that it might be something at the kernel or CPU level....on my side it was rock solid until it wasn't....and really aside from some networking changes (which we now know don't make the problem better or worse), more/fewer docker containers (which don't seem to have any effect), changes to disks / addition of an HBA (also ruled out), and more plugins (also ruled out as I crash in safe mode), the *only* other things that have changed are that my old CPU died and got replaced (but I was stable after that for a month), and the adoption of the latest Intel microcode through a BIOS update. This problem has been so perplexing - but I am glad we can now firmly park it on Frigate's interaction with the host - and I'm grateful to have been pointed to the other thread as it makes it clear this is not just a random 'me' thing. It may yet come to pass that it runs fine on an AM5 7000 series CPU, and just not on an Intel Core i7 system. If Frigate-2 stays up I'm going to try moving Frigate-1 to the AMD host as well. It is only a matter of moving over one HDD, so not a major effort. To put the burn on that machine I've just kicked off a parity check - it doesn't need to finish, but if it runs for the day (18+ hours) without any issues that might point to better stability, because the other one won't go for more than a few hours without collapsing. After that there are also the other detection modes to try - there are reports in the other thread that some combinations of CPU/GPU/OpenVINO/YOLOv8 work better than others...
  8. I'm going to make a start on this now. I can report that using the CPU/GPU and openvino as the detector made things much worse (unless I did something wrong). Within a few minutes I could log into Unraid via GUI or SSH but could not get anywhere beyond authentication (no GUI, no console prompt). Also, before it burned to the ground again it crashed and many of the docker containers were tagged as 'unhealthy'. So the Frigate container is now off and everything is (so far) calm. On the new AMD-based machine the Frigate Docker is still running with no issues (four cameras, versus six). But it will go when the other instance goes....no point having non-standard configurations. The only downside of any VM is that you have another host to maintain, patch, etc.... I'll get working and loop back if I manage to stick the landing (or not). 🙂
  9. Just saw your linux-based VM...that might indeed be a better option! Something to look at after work....
  10. So on v7 it still crashes. I've unplugged the TPU, so will try CPU detection now.... I don't really want to run a low resolution camera feed as that means no clear images (which is the point). I don't really want to go back to Blue Iris either where it is running on a Windows VM, or worse yet, natively on Windows - but we might yet end up there....
  11. Well dang! I have memory to burn, and feeds are not super high resolution (except for a 180 camera). Moving to CPU/GPU detection might be the answer....that's a pity...the coral.ai unit really lowered the CPU%.
  12. Pinned. And upgraded to Unraid v7RC1 now.
  13. So a fun new problem today.....it's always nice when things mix it up a little...lets you focus on something else for a bit! This morning I get up, sign in and see what's what. This is new: Locked CPU cores. Oh...and docker is running, but all the containers bar one are stopped, and the docker tab opens but nothing loads - you just get the unraid 'wave', and Apps doesn't load at all. Can get to tools and see the log - looks like we had a crash when AppBackup was running: Dec 12 00:01:28 Svalbard kernel: veth7196d3b: renamed from eth0 Dec 12 00:01:28 Svalbard kernel: veth3a161a9: renamed from eth0 Dec 12 00:01:28 Svalbard kernel: vethe140ebc: renamed from eth0 Dec 12 00:01:31 Svalbard kernel: vethdcf506a: renamed from eth0 Dec 12 00:02:08 Svalbard kernel: veth4b3afd2: renamed from eth0 Dec 12 00:02:08 Svalbard kernel: veth6aec7d4: renamed from eth0 Dec 12 00:02:19 Svalbard kernel: veth56d4257: renamed from eth0 Dec 12 00:02:22 Svalbard kernel: vethbdb3af6: renamed from eth0 Dec 12 00:02:30 Svalbard kernel: vethf7bebb7: renamed from eth0 Dec 12 00:02:31 Svalbard kernel: veth6493f9c: renamed from eth0 Dec 12 00:02:41 Svalbard kernel: veth2a00935: renamed from eth0 Dec 12 00:02:45 Svalbard kernel: vethd666ffc: renamed from eth0 Dec 12 00:02:48 Svalbard kernel: veth2a06c9e: renamed from eth0 Dec 12 00:04:05 Svalbard kernel: BUG: kernel NULL pointer dereference, address: 0000000000000020 Dec 12 00:04:05 Svalbard kernel: #PF: supervisor read access in kernel mode Dec 12 00:04:05 Svalbard kernel: #PF: error_code(0x0000) - not-present page Dec 12 00:04:05 Svalbard kernel: PGD 0 P4D 0 Dec 12 00:04:05 Svalbard kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI Dec 12 00:04:05 Svalbard kernel: CPU: 14 PID: 1346 Comm: arc_evict Tainted: P O 6.1.118-Unraid #1 Dec 12 00:04:05 Svalbard kernel: Hardware name: Gigabyte Technology Co., Ltd. 
Z790M AORUS ELITE AX/Z790M AORUS ELITE AX, BIOS F10 09/27/2024 Dec 12 00:04:05 Svalbard kernel: RIP: 0010:buf_hash_remove+0x2b/0x83 [zfs] Dec 12 00:04:05 Svalbard kernel: Code: 1f 44 00 00 53 48 89 fb 48 8b 57 10 48 8b 7f 30 48 89 de e8 5b fe ff ff 48 8b 15 d7 07 22 00 48 23 05 c8 07 22 00 48 8d 14 c2 <48> 8b 0a 48 39 cb 74 06 48 8d 51 20 eb f2 48 8b 4b 20 48 89 0a 31 Dec 12 00:04:05 Svalbard kernel: RSP: 0018:ffffc90000cdfd48 EFLAGS: 00010286 Dec 12 00:04:05 Svalbard kernel: RAX: 000000000009c334 RBX: ffff8881809f5400 RCX: 0000000000000000 Dec 12 00:04:05 Svalbard kernel: RDX: 0000000000000020 RSI: ce70e04a9d9425bb RDI: 2b1292b03e72bbdb Dec 12 00:04:05 Svalbard kernel: RBP: ffffffffa110e400 R08: 9ae16a3b2f90408f R09: 9ae16a3b2f90404f Dec 12 00:04:05 Svalbard kernel: R10: ffff8881611dc140 R11: 0000000000032d40 R12: ffffffffa110e280 Dec 12 00:04:05 Svalbard kernel: R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000001 Dec 12 00:04:05 Svalbard kernel: FS: 0000000000000000(0000) GS:ffff88907f780000(0000) knlGS:0000000000000000 Dec 12 00:04:05 Svalbard kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Dec 12 00:04:05 Svalbard kernel: CR2: 0000000000000020 CR3: 000000000420a000 CR4: 0000000000750ee0 Dec 12 00:04:05 Svalbard kernel: PKRU: 55555554 Dec 12 00:04:05 Svalbard kernel: Call Trace: Dec 12 00:04:05 Svalbard kernel: <TASK> Dec 12 00:04:05 Svalbard kernel: ? __die_body+0x1a/0x5c Dec 12 00:04:05 Svalbard kernel: ? page_fault_oops+0x329/0x376 Dec 12 00:04:05 Svalbard kernel: ? do_user_addr_fault+0x12e/0x465 Dec 12 00:04:05 Svalbard kernel: ? common_interrupt+0xb7/0xd0 Dec 12 00:04:05 Svalbard kernel: ? exc_page_fault+0xfb/0x11d Dec 12 00:04:05 Svalbard kernel: ? asm_exc_page_fault+0x22/0x30 Dec 12 00:04:05 Svalbard kernel: ? buf_hash_remove+0x2b/0x83 [zfs] Dec 12 00:04:05 Svalbard kernel: ? 
buf_hash_remove+0x19/0x83 [zfs] Dec 12 00:04:05 Svalbard kernel: arc_change_state.constprop.0+0x195/0x347 [zfs] Dec 12 00:04:05 Svalbard kernel: arc_evict_state+0x30d/0x701 [zfs] Dec 12 00:04:05 Svalbard kernel: ? random_get_pseudo_bytes+0xc4/0xf8 [spl] Dec 12 00:04:05 Svalbard kernel: arc_evict_cb+0x424/0x564 [zfs] Dec 12 00:04:05 Svalbard kernel: ? _raw_spin_unlock_irq+0x1a/0x2f Dec 12 00:04:05 Svalbard kernel: ? sigprocmask+0x6e/0x8e Dec 12 00:04:05 Svalbard kernel: zthr_procedure+0x89/0x12c [zfs] Dec 12 00:04:05 Svalbard kernel: ? zrl_is_locked+0x15/0x15 [zfs] Dec 12 00:04:05 Svalbard kernel: ? __thread_exit+0x13/0x13 [spl] Dec 12 00:04:05 Svalbard kernel: thread_generic_wrapper+0x57/0x65 [spl] Dec 12 00:04:05 Svalbard kernel: kthread+0xe4/0xef Dec 12 00:04:05 Svalbard kernel: ? kthread_complete_and_exit+0x1b/0x1b Dec 12 00:04:05 Svalbard kernel: ret_from_fork+0x1f/0x30 Dec 12 00:04:05 Svalbard kernel: </TASK> Dec 12 00:04:05 Svalbard kernel: Modules linked in: wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha tun nft_compat nf_tables xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter bridge xfs nfsd auth_rpcgss oid_registry lockd grace sunrpc md_mod tcp_diag inet_diag it87(O) hwmon_vid ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs 8021q garp mrp stp llc macvtap macvlan tap igb r8169 realtek zfs(PO) i915 intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp zunicode(PO) coretemp zzstd(O) iosf_mbi drm_buddy kvm_intel ttm zlua(O) drm_display_helper btusb btrtl zavl(PO) btbcm drm_kms_helper icp(PO) btintel kvm bluetooth crct10dif_pclmul crc32_pclmul drm crc32c_intel ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 zcommon(PO) aesni_intel znvpair(PO) Dec 12 00:04:05 Svalbard kernel: crypto_simd cryptd 
ecdh_generic spl(O) rapl mei_pxp mei_hdcp gigabyte_wmi wmi_bmof ecc intel_cstate mpt3sas i2c_i801 intel_gtt nvme intel_uncore agpgart i2c_smbus mei_me i2c_algo_bit nvme_core ahci i2c_core mei libahci raid_class syscopyarea scsi_transport_sas sysfillrect sysimgblt fb_sys_fops thermal fan video tpm_crb tpm_tis tpm_tis_core wmi tpm backlight intel_pmc_core acpi_tad acpi_pad button unix [last unloaded: igb] Dec 12 00:04:05 Svalbard kernel: CR2: 0000000000000020 Dec 12 00:04:05 Svalbard kernel: ---[ end trace 0000000000000000 ]--- Dec 12 00:04:05 Svalbard kernel: RIP: 0010:buf_hash_remove+0x2b/0x83 [zfs] Dec 12 00:04:05 Svalbard kernel: Code: 1f 44 00 00 53 48 89 fb 48 8b 57 10 48 8b 7f 30 48 89 de e8 5b fe ff ff 48 8b 15 d7 07 22 00 48 23 05 c8 07 22 00 48 8d 14 c2 <48> 8b 0a 48 39 cb 74 06 48 8d 51 20 eb f2 48 8b 4b 20 48 89 0a 31 Dec 12 00:04:05 Svalbard kernel: RSP: 0018:ffffc90000cdfd48 EFLAGS: 00010286 Dec 12 00:04:05 Svalbard kernel: RAX: 000000000009c334 RBX: ffff8881809f5400 RCX: 0000000000000000 Dec 12 00:04:05 Svalbard kernel: RDX: 0000000000000020 RSI: ce70e04a9d9425bb RDI: 2b1292b03e72bbdb Dec 12 00:04:05 Svalbard kernel: RBP: ffffffffa110e400 R08: 9ae16a3b2f90408f R09: 9ae16a3b2f90404f Dec 12 00:04:05 Svalbard kernel: R10: ffff8881611dc140 R11: 0000000000032d40 R12: ffffffffa110e280 Dec 12 00:04:05 Svalbard kernel: R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000001 Dec 12 00:04:05 Svalbard kernel: FS: 0000000000000000(0000) GS:ffff88907f780000(0000) knlGS:0000000000000000 Dec 12 00:04:05 Svalbard kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Dec 12 00:04:05 Svalbard kernel: CR2: 0000000000000020 CR3: 000000015ebfa000 CR4: 0000000000750ee0 Dec 12 00:04:05 Svalbard kernel: PKRU: 55555554 Dec 12 00:04:05 Svalbard kernel: note: arc_evict[1346] exited with irqs disabled Dec 12 00:32:50 Svalbard emhttpd: spinning down /dev/sdd Dec 12 01:03:25 Svalbard emhttpd: spinning down /dev/sde Dec 12 01:03:27 Svalbard emhttpd: read SMART /dev/sde 
As to the CPU locking, top returns: top - 06:58:28 up 9:15, 0 users, load average: 19.02, 18.76, 17.97 Tasks: 636 total, 1 running, 635 sleeping, 0 stopped, 0 zombie %Cpu(s): 3.6 us, 0.7 sy, 0.0 ni, 78.1 id, 17.6 wa, 0.0 hi, 0.0 si, 0.0 st MiB Mem : 64082.5 total, 569.5 free, 10164.9 used, 53348.1 buff/cache MiB Swap: 2048.0 total, 2014.0 free, 34.0 used. 53033.9 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 20777 root 20 0 1240368 18272 8404 S 100.0 0.0 402:23.88 unpackerr 12200 root 20 0 0 0 0 S 5.6 0.0 43:57.54 unraidd0 7527 root 20 0 0 0 0 D 1.7 0.0 12:30.74 mdrecoveryd 14632 root 20 0 4559768 92920 38868 S 0.3 0.1 0:30.29 dockerd 25476 root 20 0 96460 17032 9936 S 0.3 0.0 0:00.02 php-fpm 25498 root 20 0 95916 14456 7904 S 0.3 0.0 0:00.01 php-fpm 27345 root 20 0 95916 14844 8288 S 0.3 0.0 0:00.02 php-fpm 28084 root 20 0 95996 30948 24156 S 0.3 0.0 0:00.15 update_3 1 root 20 0 2592 1808 1688 S 0.0 0.0 0:01.06 init 2 root 20 0 0 0 0 S 0.0 0.0 0:01.43 kthreadd Which is odd: there's only one thread holding the CPU high, so why are multiple cores high? So I kill the process: Ending process 20777... Checking... Process 20777 could not be gently killed... will use SIGKILL... Process 20777 ()... Success... And my reward is that the CPU stays high: CPU10 is released, but now CPU2 has gone high. And top still thinks there's nothing happening: top - 07:04:08 up 9:21, 0 users, load average: 18.17, 18.70, 18.23 Tasks: 636 total, 1 running, 635 sleeping, 0 stopped, 0 zombie %Cpu(s): 0.1 us, 0.4 sy, 0.0 ni, 78.2 id, 21.3 wa, 0.0 hi, 0.0 si, 0.0 st MiB Mem : 64082.5 total, 561.8 free, 10171.1 used, 53349.6 buff/cache MiB Swap: 2048.0 total, 2014.0 free, 34.0 used.
53027.7 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 12200 root 20 0 0 0 0 S 7.3 0.0 44:19.24 unraidd0 7527 root 20 0 0 0 0 D 2.0 0.0 12:36.75 mdrecoveryd 4060 root 20 0 6204 2188 1880 S 0.3 0.0 0:02.50 blazer_usb 7505 root 20 0 279852 5080 4360 S 0.3 0.0 1:11.19 emhttpd 28084 root 20 0 95996 30948 24156 S 0.3 0.0 0:00.81 update_3 1 root 20 0 2592 1808 1688 S 0.0 0.0 0:01.06 init 2 root 20 0 0 0 0 S 0.0 0.0 0:01.43 kthreadd 3 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_gp 4 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_par_gp 5 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 slub_flushwq Do we trust top or do we trust the dashboard? Personally I'm with top - so then, given that the dashboard is updating, why is it lying? Then, on top of this, docker has curled up its toes and died: And the logs have nothing to say: Dec 12 06:53:14 Svalbard webGUI: Successful login user root from 192.168.2.101 Dec 12 07:00:01 Svalbard Plugin Auto Update: Checking for available plugin updates Dec 12 07:00:09 Svalbard Plugin Auto Update: Community Applications Plugin Auto Update finished Dec 12 07:01:14 Svalbard emhttpd: cmd: /usr/local/emhttp/plugins/open.files/scripts/killprocess 20777 But the parity check, now at 29%, is still running....which would normally have packed up and gone home by now. What to do? OK....let's go to Settings > Docker > Disable...then we can restart docker...nope....just the unraid wave again. The status bar says "Services starting...." but it never ends. Now Docker is set to "n" but its status is still "Running". The logs say: Dec 12 07:01:14 Svalbard emhttpd: cmd: /usr/local/emhttp/plugins/open.files/scripts/killprocess 20777 Dec 12 07:07:29 Svalbard ool www[17934]: /usr/local/emhttp/plugins/dynamix/scripts/emcmd 'cmdStatus=Apply' Dec 12 07:07:29 Svalbard emhttpd: Starting services...
Dec 12 07:07:29 Svalbard emhttpd: shcmd (27301): /etc/rc.d/rc.samba restart Dec 12 07:07:29 Svalbard winbindd[14469]: [2024/12/12 07:07:29.359923, 0] ../../source3/winbindd/winbindd_dual.c:1964(winbindd_sig_term_handler) Dec 12 07:07:29 Svalbard winbindd[14469]: Got sig[15] terminate (is_parent=1) Dec 12 07:07:29 Svalbard wsdd2[14466]: 'Terminated' signal received. Dec 12 07:07:29 Svalbard wsdd2[14466]: terminating. Dec 12 07:07:31 Svalbard root: Starting Samba: /usr/sbin/smbd -D Dec 12 07:07:31 Svalbard root: /usr/sbin/wsdd2 -d -4 Dec 12 07:07:31 Svalbard wsdd2[21498]: starting. Dec 12 07:07:31 Svalbard root: /usr/sbin/winbindd -D Dec 12 07:07:31 Svalbard emhttpd: shcmd (27306): /etc/rc.d/rc.avahidaemon restart Dec 12 07:07:31 Svalbard root: Stopping Avahi mDNS/DNS-SD Daemon: stopped Dec 12 07:07:31 Svalbard avahi-daemon[14538]: Got SIGTERM, quitting. Dec 12 07:07:31 Svalbard avahi-daemon[14538]: Leaving mDNS multicast group on interface eth0.IPv4 with address 192.168.6.2. Dec 12 07:07:31 Svalbard avahi-dnsconfd[14547]: read(): EOF Dec 12 07:07:31 Svalbard avahi-daemon[14538]: avahi-daemon 0.8 exiting. Dec 12 07:07:31 Svalbard root: Starting Avahi mDNS/DNS-SD Daemon: /usr/sbin/avahi-daemon -D Dec 12 07:07:31 Svalbard avahi-daemon[21571]: Found user 'avahi' (UID 61) and group 'avahi' (GID 214). Dec 12 07:07:31 Svalbard avahi-daemon[21571]: Successfully dropped root privileges. Dec 12 07:07:31 Svalbard avahi-daemon[21571]: avahi-daemon 0.8 starting up. Dec 12 07:07:31 Svalbard avahi-daemon[21571]: Successfully called chroot(). Dec 12 07:07:31 Svalbard avahi-daemon[21571]: Successfully dropped remaining capabilities. Dec 12 07:07:31 Svalbard avahi-daemon[21571]: Loading service file /services/sftp-ssh.service. Dec 12 07:07:31 Svalbard avahi-daemon[21571]: Loading service file /services/smb.service. Dec 12 07:07:31 Svalbard avahi-daemon[21571]: Loading service file /services/ssh.service. 
Dec 12 07:07:31 Svalbard avahi-daemon[21571]: Joining mDNS multicast group on interface eth0.IPv4 with address 192.168.6.2. Dec 12 07:07:31 Svalbard avahi-daemon[21571]: New relevant interface eth0.IPv4 for mDNS. Dec 12 07:07:31 Svalbard avahi-daemon[21571]: Network interface enumeration completed. Dec 12 07:07:31 Svalbard avahi-daemon[21571]: Registering new address record for 192.168.6.2 on eth0.IPv4. Dec 12 07:07:31 Svalbard emhttpd: shcmd (27307): /etc/rc.d/rc.avahidnsconfd restart Dec 12 07:07:31 Svalbard root: Stopping Avahi mDNS/DNS-SD DNS Server Configuration Daemon: stopped Dec 12 07:07:31 Svalbard root: Starting Avahi mDNS/DNS-SD DNS Server Configuration Daemon: /usr/sbin/avahi-dnsconfd -D Dec 12 07:07:31 Svalbard avahi-dnsconfd[21580]: Successfully connected to Avahi daemon. Dec 12 07:07:32 Svalbard emhttpd: shcmd (27312): /etc/rc.d/rc.docker stop Dec 12 07:07:32 Svalbard avahi-daemon[21571]: Server startup complete. Host name is Svalbard.local. Local service cookie is 3045481838. Dec 12 07:07:33 Svalbard avahi-daemon[21571]: Service "Svalbard" (/services/ssh.service) successfully established. Dec 12 07:07:33 Svalbard avahi-daemon[21571]: Service "Svalbard" (/services/smb.service) successfully established. Dec 12 07:07:33 Svalbard avahi-daemon[21571]: Service "Svalbard" (/services/sftp-ssh.service) successfully established. So it definitely did something.... While we wait lets go to AppBackup to see if there's anything in its log....Settings > AppBackup yields a blank page but after a few minutes it comes up. Check the log, and it still thinks it is running...so hit stop and copy the log: [12.12.2024 00:00:02][ℹ️][Main] 👋 WELCOME TO APPDATA.BACKUP!! 
:D [12.12.2024 00:00:03][ℹ️][Main] Backing up from: /mnt/user/appdata, /mnt/cache/appdata [12.12.2024 00:00:03][ℹ️][Main] Backing up to: /mnt/scratch/archive_unraid/appdata_backups/ab_20241212_000003 [12.12.2024 00:00:03][ℹ️][Main] Selected containers: Calibre, Fenrus, Jellyfin, Mealie, Overseerr, PhotoPrism, Plex, PostgreSQL15, Starr, TubeSync, frigate-1, syncthing, tautulli [12.12.2024 00:00:03][ℹ️][Main] Saving container XML files... [12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'Bazarr' is enabled and update is available! Schedule update after backup... [12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'Calibre' is enabled and update is available! Schedule update after backup... [12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'Czkawka' is enabled but no update is available. [12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'Fenrus' is enabled but no update is available. [12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'Flaresolverr' is enabled but no update is available. [12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'frigate-1' is enabled but no update is available. [12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'Jellyfin' is enabled but no update is available. [12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'Jellystat' is enabled but no update is available. [12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'lidarr' is enabled and update is available! Schedule update after backup... [12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'Mealie' is enabled but no update is available. [12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'Overseerr' is enabled but no update is available. [12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'PhotoPrism' is enabled but no update is available. [12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'Plex' is enabled but no update is available. [12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'PostgreSQL15' is enabled but no update is available. [12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'Prowlarr' is enabled but no update is available. 
[12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'qBittorrent' is enabled but no update is available. [12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'Radarr' is enabled but no update is available. [12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'Readarr' is enabled but no update is available. [12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'SABnzbd' is enabled but no update is available. [12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'Sonarr' is enabled but no update is available. [12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'syncthing' is enabled but no update is available. [12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'tautulli' is enabled but no update is available. [12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'TubeSync' is enabled but no update is available. [12.12.2024 00:01:24][ℹ️][Main] Auto-Update for 'Unpackerr' is enabled but no update is available. [12.12.2024 00:01:24][ℹ️][Main] Method: Stop all container before continuing. [12.12.2024 00:01:24][ℹ️][Calibre] Stopping Calibre... done! (took 4 seconds) [12.12.2024 00:01:28][ℹ️][Fenrus] Stopping Fenrus... done! (took 0 seconds) [12.12.2024 00:01:28][ℹ️][PhotoPrism] Stopping PhotoPrism... done! (took 0 seconds) [12.12.2024 00:01:28][ℹ️][TubeSync] Stopping TubeSync... done! (took 3 seconds) [12.12.2024 00:01:31][ℹ️][Starr][Bazarr] Stopping Bazarr... done! (took 10 seconds) [12.12.2024 00:01:41][ℹ️][Starr][Readarr] Stopping Readarr... done! (took 2 seconds) [12.12.2024 00:01:43][ℹ️][Starr][lidarr] Stopping lidarr... done! (took 4 seconds) [12.12.2024 00:01:47][ℹ️][Starr][Radarr] Stopping Radarr... done! (took 5 seconds) [12.12.2024 00:01:52][ℹ️][Starr][Sonarr] Stopping Sonarr... done! (took 3 seconds) [12.12.2024 00:01:55][ℹ️][Starr][Prowlarr] Stopping Prowlarr... done! (took 4 seconds) [12.12.2024 00:01:59][ℹ️][Starr][qBittorrent] Stopping qBittorrent... done! (took 6 seconds) [12.12.2024 00:02:05][ℹ️][Starr][SABnzbd] Stopping SABnzbd... done! 
(took 3 seconds) [12.12.2024 00:02:08][ℹ️][Starr][GluetunVPN] Stopping GluetunVPN... done! (took 0 seconds) [12.12.2024 00:02:08][ℹ️][PostgreSQL15] Stopping PostgreSQL15... done! (took 0 seconds) [12.12.2024 00:02:08][ℹ️][Jellyfin] Stopping Jellyfin... done! (took 11 seconds) [12.12.2024 00:02:19][ℹ️][tautulli] Stopping tautulli... done! (took 3 seconds) [12.12.2024 00:02:22][ℹ️][Plex] Stopping Plex... done! (took 8 seconds) [12.12.2024 00:02:30][ℹ️][Mealie] Stopping Mealie... done! (took 1 seconds) [12.12.2024 00:02:31][ℹ️][frigate-1] Stopping frigate-pembroke... done! (took 10 seconds) [12.12.2024 00:02:41][ℹ️][syncthing] Stopping syncthing... done! (took 4 seconds) [12.12.2024 00:02:45][ℹ️][Overseerr] Stopping Overseerr... done! (took 3 seconds) [12.12.2024 00:02:48][ℹ️][Main] Starting backup for containers [12.12.2024 00:02:48][ℹ️][Calibre] Should NOT backup external volumes, sanitizing them... [12.12.2024 00:02:48][ℹ️][Calibre] Calculated volumes to back up: /mnt/user/appdata/calibre [12.12.2024 00:02:48][ℹ️][Calibre] Backing up Calibre... [12.12.2024 00:02:49][ℹ️][Calibre] Backup created without issues (took 00:00:01 (hours:mins:secs)) [12.12.2024 00:02:49][ℹ️][Calibre] Verifying backup... [12.12.2024 00:02:49][ℹ️][Calibre] Verification ended without issues (took 00:00:00 (hours:mins:secs)) [12.12.2024 00:02:49][ℹ️][Calibre] Installing planned update for Calibre... [12.12.2024 00:03:04][ℹ️][Fenrus] Should NOT backup external volumes, sanitizing them... [12.12.2024 00:03:04][ℹ️][Fenrus] Calculated volumes to back up: /mnt/user/appdata/fenrus/data [12.12.2024 00:03:04][ℹ️][Fenrus] Backing up Fenrus... [12.12.2024 00:03:04][ℹ️][Fenrus] Backup created without issues (took 00:00:00 (hours:mins:secs)) [12.12.2024 00:03:04][ℹ️][Fenrus] Verifying backup... [12.12.2024 00:03:04][ℹ️][Fenrus] Verification ended without issues (took 00:00:00 (hours:mins:secs)) [12.12.2024 00:03:04][ℹ️][PhotoPrism] Should NOT backup external volumes, sanitizing them... 
[12.12.2024 00:03:04][ℹ️][PhotoPrism] Calculated volumes to back up: /mnt/user/appdata/photoprism
[12.12.2024 00:03:04][ℹ️][PhotoPrism] Backing up PhotoPrism...
[12.12.2024 07:23:01][❌][PhotoPrism] tar creation failed! Tar said:

But....spoiler alert.....it still thinks it's running. Jump back to console:

root@Svalbard:~# docker container list
CONTAINER ID   IMAGE                       COMMAND                  CREATED      STATUS                  PORTS     NAMES
0ca070369ca0   golift/unpackerr            "/unpackerr"             4 days ago   Up 10 hours                       Unpackerr
d96d888e2e34   flaresolverr/flaresolverr   "/usr/bin/dumb-init …"   4 days ago   Up 10 hours                       Flaresolverr
32a5d349b413   cyfershepard/jellystat      "docker-entrypoint.s…"   4 days ago   Up 10 hours (healthy)             Jellystat

But no amount of kill commands will make anything stop. So we have zombies. And the only solution for that is a reboot....so that's now the only way out from here.

In the meantime though on the dashboard all of a sudden the docker listing is back, with everything stopped and three 'unknown' containers (presumably the ones I tried to kill). The system is also super slow....things appear but it takes a super long time...

Interestingly the parity check is still running, the read speeds are still updating, but the progress is stuck at 4.65TB / 29.1% - so that also isn't right. That should have progressed by now.... And the CPU keeps locking up:

It is a sick sick machine. But not in any way that is possible to meaningfully diagnose....so far.... Can't reboot now either - not from console and not from GUI..... :-(
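For anyone wanting to spot this kind of stall without reading the whole backup log, here is a rough Python sketch. It assumes every entry starts with the "[dd.mm.yyyy HH:MM:SS][level][source]" prefix seen above; the function names are my own and the sample text is just the three PhotoPrism entries from the log.

```python
import re
from datetime import datetime

# Matches the "[dd.mm.yyyy HH:MM:SS][level][source]" prefix of each log entry.
# Assumption: every entry in the backup log starts this way.
ENTRY = re.compile(r"\[(\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}:\d{2})\]\[([^\]]*)\]\[([^\]]*)\]")

def parse_entries(log_text):
    """Return (timestamp, level, source) for every entry found in the text."""
    entries = []
    for m in ENTRY.finditer(log_text):
        ts = datetime.strptime(m.group(1), "%d.%m.%Y %H:%M:%S")
        entries.append((ts, m.group(2), m.group(3)))
    return entries

def largest_gap(entries):
    """Return (seconds, before, after) for the longest silence between two entries."""
    best = None
    for before, after in zip(entries, entries[1:]):
        gap = (after[0] - before[0]).total_seconds()
        if best is None or gap > best[0]:
            best = (gap, before, after)
    return best

SAMPLE = (
    "[12.12.2024 00:03:04][ℹ️][PhotoPrism] Calculated volumes to back up: /mnt/user/appdata/photoprism "
    "[12.12.2024 00:03:04][ℹ️][PhotoPrism] Backing up PhotoPrism... "
    "[12.12.2024 07:23:01][❌][PhotoPrism] tar creation failed! Tar said:"
)

entries = parse_entries(SAMPLE)
gap_seconds, before, after = largest_gap(entries)
print(f"{gap_seconds:.0f} s of silence before the next entry from {after[2]}")
```

On the sample this reports a gap of a bit over seven hours between "Backing up PhotoPrism..." and the tar failure, which matches when the machine hung.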
  14. Definitely getting closer..... Just had another crash, but it was non-fatal: Dec 11 13:05:19 Svalbard emhttpd: read SMART /dev/sdc Dec 11 13:54:53 Svalbard emhttpd: spinning down /dev/sdc Dec 11 14:42:52 Svalbard kernel: ------------[ cut here ]------------ Dec 11 14:42:52 Svalbard kernel: WARNING: CPU: 12 PID: 32398 at fs/dcache.c:430 retain_dentry+0x52/0xa5 Dec 11 14:42:52 Svalbard kernel: Modules linked in: wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha tun nft_compat nf_tables xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter bridge xfs md_mod tcp_diag inet_diag it87(O) hwmon_vid ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs 8021q garp mrp stp llc macvtap macvlan tap igb r8169 realtek zfs(PO) i915 zunicode(PO) zzstd(O) intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp iosf_mbi drm_buddy kvm_intel ttm zlua(O) drm_display_helper zavl(PO) icp(PO) drm_kms_helper btusb btrtl btbcm kvm btintel bluetooth crct10dif_pclmul drm crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 aesni_intel zcommon(PO) crypto_simd cryptd znvpair(PO) rapl ecdh_generic spl(O) ecc Dec 11 14:42:52 Svalbard kernel: mei_hdcp mei_pxp intel_gtt intel_cstate gigabyte_wmi wmi_bmof mpt3sas agpgart i2c_algo_bit i2c_i801 nvme intel_uncore mei_me i2c_smbus ahci i2c_core nvme_core mei raid_class libahci scsi_transport_sas syscopyarea sysfillrect sysimgblt fb_sys_fops thermal fan tpm_crb video tpm_tis tpm_tis_core wmi tpm backlight intel_pmc_core acpi_tad acpi_pad button unix [last unloaded: igb] Dec 11 14:42:52 Svalbard kernel: CPU: 12 PID: 32398 Comm: lsof Tainted: P O 6.1.118-Unraid #1 Dec 11 14:42:52 Svalbard kernel: Hardware name: Gigabyte Technology Co., Ltd. 
Z790M AORUS ELITE AX/Z790M AORUS ELITE AX, BIOS F10 09/27/2024 Dec 11 14:42:52 Svalbard kernel: RIP: 0010:retain_dentry+0x52/0xa5 Dec 11 14:42:52 Svalbard kernel: Code: 74 18 eb e9 48 8b 43 60 48 89 df 48 8b 40 20 ff d0 0f 1f 00 85 c0 74 e4 eb d3 ff 4b 5c 0f ba e0 13 72 49 a9 00 04 08 00 74 02 <0f> 0b 0d 00 00 08 00 89 03 65 48 ff 05 fe ea dc 7e f7 03 00 00 70 Dec 11 14:42:52 Svalbard kernel: RSP: 0018:ffffc9002debbd98 EFLAGS: 00010206 Dec 11 14:42:52 Svalbard kernel: RAX: 0000000000600c00 RBX: ffff88841b174900 RCX: 0000000000000064 Dec 11 14:42:52 Svalbard kernel: RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff88841b174900 Dec 11 14:42:52 Svalbard kernel: RBP: ffffc9002debbe65 R08: 00000000009461d4 R09: 000000000000000a Dec 11 14:42:52 Svalbard kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001 Dec 11 14:42:52 Svalbard kernel: R13: ffffffff812b1d42 R14: ffff88841b174900 R15: ffff88841b174900 Dec 11 14:42:52 Svalbard kernel: FS: 00001540446a8e00(0000) GS:ffff88907f300000(0000) knlGS:0000000000000000 Dec 11 14:42:52 Svalbard kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Dec 11 14:42:52 Svalbard kernel: CR2: 00005590cfeb8000 CR3: 00000004b5f4e000 CR4: 0000000000750ee0 Dec 11 14:42:52 Svalbard kernel: PKRU: 55555554 Dec 11 14:42:52 Svalbard kernel: Call Trace: Dec 11 14:42:52 Svalbard kernel: <TASK> Dec 11 14:42:52 Svalbard kernel: ? __warn+0xab/0x122 Dec 11 14:42:52 Svalbard kernel: ? report_bug+0x109/0x17e Dec 11 14:42:52 Svalbard kernel: ? retain_dentry+0x52/0xa5 Dec 11 14:42:52 Svalbard kernel: ? handle_bug+0x41/0x6f Dec 11 14:42:52 Svalbard kernel: ? exc_invalid_op+0x13/0x60 Dec 11 14:42:52 Svalbard kernel: ? asm_exc_invalid_op+0x16/0x20 Dec 11 14:42:52 Svalbard kernel: ? tid_fd_update_inode+0x4d/0x4d Dec 11 14:42:52 Svalbard kernel: ? retain_dentry+0x52/0xa5 Dec 11 14:42:52 Svalbard kernel: dput+0x41/0x17b Dec 11 14:42:52 Svalbard kernel: proc_fill_cache+0x110/0x156 Dec 11 14:42:52 Svalbard kernel: ? 
compat_filldir+0x17a/0x17a Dec 11 14:42:52 Svalbard kernel: proc_readfd_common+0x16b/0x1bc Dec 11 14:42:52 Svalbard kernel: ? tid_fd_update_inode+0x4d/0x4d Dec 11 14:42:52 Svalbard kernel: iterate_dir+0x94/0x149 Dec 11 14:42:52 Svalbard kernel: __do_sys_getdents64+0x6b/0xd8 Dec 11 14:42:52 Svalbard kernel: ? compat_filldir+0x17a/0x17a Dec 11 14:42:52 Svalbard kernel: do_syscall_64+0x65/0x7b Dec 11 14:42:52 Svalbard kernel: entry_SYSCALL_64_after_hwframe+0x6e/0xd8 Dec 11 14:42:52 Svalbard kernel: RIP: 0033:0x154044908283 Dec 11 14:42:52 Svalbard kernel: Code: 89 df e8 20 05 fb ff 48 83 c4 08 48 89 e8 5b 5d c3 66 0f 1f 44 00 00 b8 ff ff ff 7f 48 39 c2 48 0f 47 d0 b8 d9 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 61 0b 11 00 f7 d8 Dec 11 14:42:52 Svalbard kernel: RSP: 002b:00007ffde2feab28 EFLAGS: 00000293 ORIG_RAX: 00000000000000d9 Dec 11 14:42:52 Svalbard kernel: RAX: ffffffffffffffda RBX: 00000000004c8c80 RCX: 0000154044908283 Dec 11 14:42:52 Svalbard kernel: RDX: 0000000000008000 RSI: 00000000004c8cb0 RDI: 0000000000000004 Dec 11 14:42:52 Svalbard kernel: RBP: 00000000004c8c84 R08: 0000154044a1a2d0 R09: 0000154044a1a2d0 Dec 11 14:42:52 Svalbard kernel: R10: 0000000000000000 R11: 0000000000000293 R12: ffffffffffffff88 Dec 11 14:42:52 Svalbard kernel: R13: 0000000000000002 R14: 0000000000433dd0 R15: 0000154044a9b000 Dec 11 14:42:52 Svalbard kernel: </TASK> Dec 11 14:42:52 Svalbard kernel: ---[ end trace 0000000000000000 ]--- Dec 11 14:43:02 Svalbard kernel: ------------[ cut here ]------------ Dec 11 14:43:02 Svalbard kernel: WARNING: CPU: 10 PID: 314 at fs/dcache.c:472 dentry_lru_isolate+0x44/0xb1 Dec 11 14:43:02 Svalbard kernel: Modules linked in: wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha tun nft_compat nf_tables xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack 
nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter bridge xfs md_mod tcp_diag inet_diag it87(O) hwmon_vid ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs 8021q garp mrp stp llc macvtap macvlan tap igb r8169 realtek zfs(PO) i915 zunicode(PO) zzstd(O) intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp iosf_mbi drm_buddy kvm_intel ttm zlua(O) drm_display_helper zavl(PO) icp(PO) drm_kms_helper btusb btrtl btbcm kvm btintel bluetooth crct10dif_pclmul drm crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 aesni_intel zcommon(PO) crypto_simd cryptd znvpair(PO) rapl ecdh_generic spl(O) ecc Dec 11 14:43:02 Svalbard kernel: mei_hdcp mei_pxp intel_gtt intel_cstate gigabyte_wmi wmi_bmof mpt3sas agpgart i2c_algo_bit i2c_i801 nvme intel_uncore mei_me i2c_smbus ahci i2c_core nvme_core mei raid_class libahci scsi_transport_sas syscopyarea sysfillrect sysimgblt fb_sys_fops thermal fan tpm_crb video tpm_tis tpm_tis_core wmi tpm backlight intel_pmc_core acpi_tad acpi_pad button unix [last unloaded: igb] Dec 11 14:43:02 Svalbard kernel: CPU: 10 PID: 314 Comm: kswapd0 Tainted: P W O 6.1.118-Unraid #1 Dec 11 14:43:02 Svalbard kernel: Hardware name: Gigabyte Technology Co., Ltd. 
Z790M AORUS ELITE AX/Z790M AORUS ELITE AX, BIOS F10 09/27/2024 Dec 11 14:43:02 Svalbard kernel: RIP: 0010:dentry_lru_isolate+0x44/0xb1 Dec 11 14:43:02 Svalbard kernel: Code: ef e8 d3 d1 62 00 89 c2 b8 03 00 00 00 85 d2 74 7b 83 7b dc 00 8b 43 80 74 40 89 c2 81 e2 00 04 08 00 81 fa 00 00 08 00 74 02 <0f> 0b 25 ff ff f7 ff 89 43 80 65 48 ff 0d 34 e8 dc 7e f7 43 80 00 Dec 11 14:43:02 Svalbard kernel: RSP: 0018:ffffc90000cd7ab8 EFLAGS: 00010206 Dec 11 14:43:02 Svalbard kernel: RAX: 0000000000888c40 RBX: ffff88841b174980 RCX: ffffc90000cd7b78 Dec 11 14:43:02 Svalbard kernel: RDX: 0000000000080400 RSI: ffff888109889888 RDI: ffff88841b174958 Dec 11 14:43:02 Svalbard kernel: RBP: ffff88841b174958 R08: 0000000000000000 R09: 0000000000000014 Dec 11 14:43:02 Svalbard kernel: R10: ffff888106454380 R11: ffff888145095240 R12: ffff888109889888 Dec 11 14:43:02 Svalbard kernel: R13: ffffc90000cd7b78 R14: ffffffff8125cee6 R15: ffff88841b174980 Dec 11 14:43:02 Svalbard kernel: FS: 0000000000000000(0000) GS:ffff88907f280000(0000) knlGS:0000000000000000 Dec 11 14:43:02 Svalbard kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Dec 11 14:43:02 Svalbard kernel: CR2: 00000000004d8098 CR3: 000000000420a000 CR4: 0000000000750ee0 Dec 11 14:43:02 Svalbard kernel: PKRU: 55555554 Dec 11 14:43:02 Svalbard kernel: Call Trace: Dec 11 14:43:02 Svalbard kernel: <TASK> Dec 11 14:43:02 Svalbard kernel: ? __warn+0xab/0x122 Dec 11 14:43:02 Svalbard kernel: ? report_bug+0x109/0x17e Dec 11 14:43:02 Svalbard kernel: ? dentry_lru_isolate+0x44/0xb1 Dec 11 14:43:02 Svalbard kernel: ? handle_bug+0x41/0x6f Dec 11 14:43:02 Svalbard kernel: ? exc_invalid_op+0x13/0x60 Dec 11 14:43:02 Svalbard kernel: ? asm_exc_invalid_op+0x16/0x20 Dec 11 14:43:02 Svalbard kernel: ? d_lru_shrink_move+0x38/0x38 Dec 11 14:43:02 Svalbard kernel: ? dentry_lru_isolate+0x44/0xb1 Dec 11 14:43:02 Svalbard kernel: ? 
dentry_lru_isolate+0x20/0xb1
Dec 11 14:43:02 Svalbard kernel: __list_lru_walk_one+0x90/0x123
Dec 11 14:43:02 Svalbard kernel: list_lru_walk_one+0x60/0x7d
Dec 11 14:43:02 Svalbard kernel: ? d_lru_shrink_move+0x38/0x38
Dec 11 14:43:02 Svalbard kernel: prune_dcache_sb+0x46/0x73
Dec 11 14:43:02 Svalbard kernel: super_cache_scan+0xf4/0x17c
Dec 11 14:43:02 Svalbard kernel: do_shrink_slab+0x188/0x2a1
Dec 11 14:43:02 Svalbard kernel: shrink_slab+0x1f9/0x267
Dec 11 14:43:02 Svalbard kernel: shrink_node+0x334/0x588
Dec 11 14:43:02 Svalbard kernel: balance_pgdat+0x4e9/0x6a2
Dec 11 14:43:02 Svalbard kernel: ? update_cfs_rq_load_avg+0x176/0x189
Dec 11 14:43:02 Svalbard kernel: ? update_load_avg+0x46/0x398
Dec 11 14:43:02 Svalbard kernel: kswapd+0x2f0/0x333
Dec 11 14:43:02 Svalbard kernel: ? _raw_spin_rq_lock_irqsave+0x20/0x20
Dec 11 14:43:02 Svalbard kernel: ? balance_pgdat+0x6a2/0x6a2
Dec 11 14:43:02 Svalbard kernel: kthread+0xe4/0xef
Dec 11 14:43:02 Svalbard kernel: ? kthread_complete_and_exit+0x1b/0x1b
Dec 11 14:43:02 Svalbard kernel: ret_from_fork+0x1f/0x30
Dec 11 14:43:02 Svalbard kernel: </TASK>
Dec 11 14:43:02 Svalbard kernel: ---[ end trace 0000000000000000 ]---
Dec 11 14:54:15 Svalbard kernel: vetha28b7f0: renamed from eth0
Dec 11 14:54:17 Svalbard kernel: eth0: renamed from vethb9c7ce8
Dec 11 14:54:35 Svalbard kernel: usb 2-9.1: reset SuperSpeed USB device number 5 using xhci_hcd
Dec 11 14:54:35 Svalbard kernel: usb 2-9.1: LPM exit latency is zeroed, disabling LPM.
Dec 11 15:10:47 Svalbard emhttpd: read SMART /dev/sdc

The system has been up for 6.5 hours but not Frigate:

The crash happened a minute or two after a detection, and then a few minutes later Frigate restarted. What's *really* interesting is that this time the parity sync didn't stop....normally it is immediately dead. That will probably still happen, but for now it is still hanging in there....
Next time it dies I will remove one of the frigate drives to put into the test machine (so I can run it there too), reset this problem machine's BIOS settings back to defaults (except for some boot options - since the settings changes have all made no difference), and then I might upgrade to v7 for the newer kernel...
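Since these traces are long, a small Python sketch like the one below can pull out just the headline of each kernel WARNING (time, CPU, PID, source location, function) so successive crashes are easier to compare. It assumes the syslog format shown in this post; the function name is mine, and pairing each warning with the next "Comm:" field is a heuristic, not something the kernel guarantees.

```python
import re

# Headline of a kernel warning as it appears in the syslog excerpts in this post.
WARNING = re.compile(
    r"(\w{3} +\d+ \d{2}:\d{2}:\d{2}) \S+ kernel: WARNING: CPU: (\d+) PID: (\d+) at (\S+) (\S+)"
)
# The process name is printed a few entries later as "Comm: <name>".
COMM = re.compile(r"Comm: (\S+)")

def kernel_warnings(syslog_text):
    """Summarise every kernel WARNING found in the given syslog text."""
    results = []
    for m in WARNING.finditer(syslog_text):
        # Heuristic: take the first "Comm:" field that follows the warning line.
        comm = COMM.search(syslog_text, m.end())
        results.append({
            "time": m.group(1),
            "cpu": int(m.group(2)),
            "pid": int(m.group(3)),
            "location": m.group(4),
            "function": m.group(5).split("+")[0],
            "comm": comm.group(1) if comm else None,
        })
    return results

SAMPLE = (
    "Dec 11 14:42:52 Svalbard kernel: WARNING: CPU: 12 PID: 32398 at fs/dcache.c:430 retain_dentry+0x52/0xa5 "
    "Dec 11 14:42:52 Svalbard kernel: CPU: 12 PID: 32398 Comm: lsof Tainted: P O 6.1.118-Unraid #1 "
    "Dec 11 14:43:02 Svalbard kernel: WARNING: CPU: 10 PID: 314 at fs/dcache.c:472 dentry_lru_isolate+0x44/0xb1 "
    "Dec 11 14:43:02 Svalbard kernel: CPU: 10 PID: 314 Comm: kswapd0 Tainted: P W O 6.1.118-Unraid #1"
)

warnings_found = kernel_warnings(SAMPLE)
for w in warnings_found:
    print(w["time"], w["comm"], w["location"], w["function"])
```

On the two traces above it shows both warnings coming from fs/dcache.c ten seconds apart, first in lsof and then in kswapd0.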
  15. And.....we're dead. A hard crash this time. Completely fell off the network again. The last syslog entry offers a potential clue:

Dec 11 08:25:08 Svalbard kernel: veth6c58b38: renamed from eth0
Dec 11 08:25:10 Svalbard kernel: eth0: renamed from veth970b4eb

Anyway...where to from here:

Power saving in the BIOS is disabled (as best I can tell)
I'm already running usbcore.autosuspend=-1 as a boot option, so that's not a fix
The Unraid flash is on a USB2 bus, so separate from USB3 (where the TPU is). Aside from the UPS, the (now single) TPU, and the boot flash there are no other USB devices.
I've tried the TPU mapping as both /dev/bus/usb and also as /dev/bus/usb/002/004 but the bus numbering keeps changing so it's not a static setting.
ls /dev/serial/by-id returns nothing as there is no /dev/serial path

lsusb -t returns:

/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/9p, 20000M/x2
|__ Port 8: Dev 2, If 0, Class=Hub, Driver=hub/4p, 5000M
|__ Port 9: Dev 3, If 0, Class=Hub, Driver=hub/4p, 5000M
|__ Port 1: Dev 5, If 0, Class=Vendor Specific Class, Driver=usbfs, 5000M
/: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/16p, 480M
|__ Port 5: Dev 2, If 0, Class=Hub, Driver=hub/4p, 480M
|__ Port 6: Dev 3, If 0, Class=Hub, Driver=hub/4p, 480M
|__ Port 3: Dev 5, If 0, Class=Mass Storage, Driver=usb-storage, 480M
|__ Port 9: Dev 4, If 0, Class=Hub, Driver=hub/4p, 480M
|__ Port 10: Dev 6, If 0, Class=Hub, Driver=hub/4p, 480M
|__ Port 4: Dev 8, If 0, Class=Human Interface Device, Driver=usbfs, 1.5M
|__ Port 11: Dev 7, If 0, Class=Human Interface Device, Driver=usbhid, 12M
|__ Port 14: Dev 9, If 0, Class=Wireless, Driver=btusb, 12M
|__ Port 14: Dev 9, If 1, Class=Wireless, Driver=btusb, 12M

lsusb returns:

Bus 002 Device 005: ID 18d1:9302 Google Inc.
Bus 002 Device 003: ID 0bda:0411 Realtek Semiconductor Corp. Hub
Bus 002 Device 002: ID 0bda:0411 Realtek Semiconductor Corp.
Hub Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub Bus 001 Device 004: ID 0bda:5411 Realtek Semiconductor Corp. RTS5411 Hub Bus 001 Device 005: ID 0951:1666 Kingston Technology DataTraveler 100 G3/G4/SE9 G2/50 Bus 001 Device 003: ID 05e3:0608 Genesys Logic, Inc. Hub Bus 001 Device 002: ID 05e3:0608 Genesys Logic, Inc. Hub Bus 001 Device 009: ID 8087:0033 Intel Corp. Bus 001 Device 007: ID 048d:5702 Integrated Technology Express, Inc. ITE Device Bus 001 Device 008: ID 0665:5161 Cypress Semiconductor USB to Serial Bus 001 Device 006: ID 0bda:5411 Realtek Semiconductor Corp. RTS5411 Hub Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub The lsusb -v of the Google devices returns (with no serial number): Bus 002 Device 005: ID 18d1:9302 Google Inc. Device Descriptor: bLength 18 bDescriptorType 1 bcdUSB 3.10 bDeviceClass 0 bDeviceSubClass 0 bDeviceProtocol 0 bMaxPacketSize0 9 idVendor 0x18d1 Google Inc. idProduct 0x9302 bcdDevice 1.00 iManufacturer 0 iProduct 0 iSerial 0 bNumConfigurations 1 Configuration Descriptor: bLength 9 bDescriptorType 2 wTotalLength 0x0060 bNumInterfaces 1 bConfigurationValue 1 iConfiguration 0 bmAttributes 0x80 (Bus Powered) MaxPower 896mA Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 0 bAlternateSetting 0 bNumEndpoints 6 bInterfaceClass 255 Vendor Specific Class bInterfaceSubClass 255 Vendor Specific Subclass bInterfaceProtocol 255 Vendor Specific Protocol iInterface 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x01 EP 1 OUT bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0400 1x 1024 bytes bInterval 0 bMaxBurst 15 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x02 EP 2 OUT bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0400 1x 1024 bytes bInterval 0 bMaxBurst 15 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x03 EP 3 OUT bmAttributes 2 Transfer Type Bulk Synch 
Type None Usage Type Data wMaxPacketSize 0x0400 1x 1024 bytes bInterval 0 bMaxBurst 15 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x81 EP 1 IN bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0400 1x 1024 bytes bInterval 0 bMaxBurst 15 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x82 EP 2 IN bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0400 1x 1024 bytes bInterval 0 bMaxBurst 15 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x83 EP 3 IN bmAttributes 3 Transfer Type Interrupt Synch Type None Usage Type Data wMaxPacketSize 0x0040 1x 64 bytes bInterval 1 bMaxBurst 0 Binary Object Store Descriptor: bLength 5 bDescriptorType 15 wTotalLength 0x0016 bNumDeviceCaps 2 USB 2.0 Extension Device Capability: bLength 7 bDescriptorType 16 bDevCapabilityType 2 bmAttributes 0x00000002 HIRD Link Power Management (LPM) Supported SuperSpeed USB Device Capability: bLength 10 bDescriptorType 16 bDevCapabilityType 3 bmAttributes 0x00 wSpeedsSupported 0x000c Device can operate at High Speed (480Mbps) Device can operate at SuperSpeed (5Gbps) bFunctionalitySupport 2 Lowest fully-functional device speed is High Speed (480Mbps) bU1DevExitLat 0 micro seconds bU2DevExitLat 0 micro seconds Device Status: 0x0000 (Bus Powered)

Not sure if that's useful or not..... In the meantime running again, and waiting for the next crash. If it happens again I'm pulling the TPU out completely and running on CPU-based detection. Interestingly when I did that on the other instance yesterday I *still* had to run it as privileged as it wouldn't connect to the cameras without that....unexpected I think as I thought privileged was only for the TPU - however it is possible that it is also needed for the Intel GPU access....
  16. It died as well - I saw it when I woke up in the middle of the night. So I removed one of the frigate instances, left the coral.ai units plugged in, shut it down (again it didn't actually power down), forced it off, and powered it back on. So now running with just one instance of Frigate, as privileged, with TPU detection enabled, and now we wait again.... I did see now in reviewing the logs some odd USB behaviour during the boot (look for usb 2-9.1):

Dec 11 02:28:41 Svalbard kernel: IPMI message handler: version 39.2
Dec 11 02:28:41 Svalbard kernel: Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
Dec 11 02:28:41 Svalbard kernel: Freeing initrd memory: 30324K
Dec 11 02:28:41 Svalbard kernel: lp: driver loaded but no devices found
Dec 11 02:28:41 Svalbard kernel: hpet_acpi_add: no address or irqs in _CRS
Dec 11 02:28:41 Svalbard kernel: Hangcheck: starting hangcheck timer 0.9.1 (tick is 180 seconds, margin is 60 seconds).
Dec 11 02:28:41 Svalbard kernel: AMD-Vi: AMD IOMMUv2 functionality not available on this system - This is not a bug.
Dec 11 02:28:41 Svalbard kernel: Floppy drive(s): fd1 is 1.2M Dec 11 02:28:41 Svalbard kernel: loop: module loaded Dec 11 02:28:41 Svalbard kernel: Rounding down aligned max_sectors from 4294967295 to 4294967288 Dec 11 02:28:41 Svalbard kernel: db_root: cannot open: /etc/target Dec 11 02:28:41 Svalbard kernel: VFIO - User Level meta-driver version: 0.3 Dec 11 02:28:41 Svalbard kernel: xhci_hcd 0000:00:14.0: xHCI Host Controller Dec 11 02:28:41 Svalbard kernel: xhci_hcd 0000:00:14.0: new USB bus registered, assigned bus number 1 Dec 11 02:28:41 Svalbard kernel: xhci_hcd 0000:00:14.0: hcc params 0x20007fc1 hci version 0x120 quirks 0x0000000200009810 Dec 11 02:28:41 Svalbard kernel: xhci_hcd 0000:00:14.0: xHCI Host Controller Dec 11 02:28:41 Svalbard kernel: xhci_hcd 0000:00:14.0: new USB bus registered, assigned bus number 2 Dec 11 02:28:41 Svalbard kernel: xhci_hcd 0000:00:14.0: Host supports USB 3.2 Enhanced SuperSpeed Dec 11 02:28:41 Svalbard kernel: hub 1-0:1.0: USB hub found Dec 11 02:28:41 Svalbard kernel: hub 1-0:1.0: 16 ports detected Dec 11 02:28:41 Svalbard kernel: hub 2-0:1.0: USB hub found Dec 11 02:28:41 Svalbard kernel: hub 2-0:1.0: 9 ports detected Dec 11 02:28:41 Svalbard kernel: usbcore: registered new interface driver usb-storage Dec 11 02:28:41 Svalbard kernel: i8042: PNP: No PS/2 controller found. 
Dec 11 02:28:41 Svalbard kernel: mousedev: PS/2 mouse device common for all mice Dec 11 02:28:41 Svalbard kernel: usbcore: registered new interface driver synaptics_usb Dec 11 02:28:41 Svalbard kernel: input: PC Speaker as /devices/platform/pcspkr/input/input0 Dec 11 02:28:41 Svalbard kernel: rtc_cmos rtc_cmos: RTC can wake from S4 Dec 11 02:28:41 Svalbard kernel: rtc_cmos rtc_cmos: registered as rtc0 Dec 11 02:28:41 Svalbard kernel: rtc_cmos rtc_cmos: setting system clock to 2024-12-10T13:27:52 UTC (1733837272) Dec 11 02:28:41 Svalbard kernel: rtc_cmos rtc_cmos: alarms up to one month, y3k, 114 bytes nvram Dec 11 02:28:41 Svalbard kernel: intel_pstate: Intel P-state driver initializing Dec 11 02:28:41 Svalbard kernel: intel_pstate: HWP enabled Dec 11 02:28:41 Svalbard kernel: pstore: Registered efi as persistent store backend Dec 11 02:28:41 Svalbard kernel: hid: raw HID events driver (C) Jiri Kosina Dec 11 02:28:41 Svalbard kernel: usbcore: registered new interface driver usbhid Dec 11 02:28:41 Svalbard kernel: usbhid: USB HID core driver Dec 11 02:28:41 Svalbard kernel: ipip: IPv4 and MPLS over IPv4 tunneling driver Dec 11 02:28:41 Svalbard kernel: NET: Registered PF_INET6 protocol family Dec 11 02:28:41 Svalbard kernel: Segment Routing with IPv6 Dec 11 02:28:41 Svalbard kernel: RPL Segment Routing with IPv6 Dec 11 02:28:41 Svalbard kernel: In-situ OAM (IOAM) with IPv6 Dec 11 02:28:41 Svalbard kernel: 9pnet: Installing 9P2000 support Dec 11 02:28:41 Svalbard kernel: microcode: sig=0xb0671, pf=0x2, revision=0x12b Dec 11 02:28:41 Svalbard kernel: microcode: Microcode Update Driver: v2.2. 
Dec 11 02:28:41 Svalbard kernel: IPI shorthand broadcast: enabled Dec 11 02:28:41 Svalbard kernel: sched_clock: Marking stable (2528000652, 6582841)->(2556612702, -22029209) Dec 11 02:28:41 Svalbard kernel: registered taskstats version 1 Dec 11 02:28:41 Svalbard kernel: Btrfs loaded, crc32c=crc32c-generic, zoned=no, fsverity=no Dec 11 02:28:41 Svalbard kernel: pstore: Using crash dump compression: deflate Dec 11 02:28:41 Svalbard kernel: clk: Disabling unused clocks Dec 11 02:28:41 Svalbard kernel: usb 1-5: new high-speed USB device number 2 using xhci_hcd Dec 11 02:28:41 Svalbard kernel: hub 1-5:1.0: USB hub found Dec 11 02:28:41 Svalbard kernel: hub 1-5:1.0: 4 ports detected Dec 11 02:28:41 Svalbard kernel: usb 2-8: new SuperSpeed USB device number 2 using xhci_hcd Dec 11 02:28:41 Svalbard kernel: hub 2-8:1.0: USB hub found Dec 11 02:28:41 Svalbard kernel: hub 2-8:1.0: 4 ports detected Dec 11 02:28:41 Svalbard kernel: usb 1-6: new high-speed USB device number 3 using xhci_hcd Dec 11 02:28:41 Svalbard kernel: hub 1-6:1.0: USB hub found Dec 11 02:28:41 Svalbard kernel: hub 1-6:1.0: 4 ports detected Dec 11 02:28:41 Svalbard kernel: usb 2-9: new SuperSpeed USB device number 3 using xhci_hcd Dec 11 02:28:41 Svalbard kernel: hub 2-9:1.0: USB hub found Dec 11 02:28:41 Svalbard kernel: hub 2-9:1.0: 4 ports detected Dec 11 02:28:41 Svalbard kernel: usb 1-9: new high-speed USB device number 4 using xhci_hcd Dec 11 02:28:41 Svalbard kernel: hub 1-9:1.0: USB hub found Dec 11 02:28:41 Svalbard kernel: hub 1-9:1.0: 4 ports detected Dec 11 02:28:41 Svalbard kernel: usb 1-6.1: new low-speed USB device number 5 using xhci_hcd Dec 11 02:28:41 Svalbard kernel: hid-generic 0003:0665:5161.0001: hiddev96,hidraw0: USB HID v1.00 Device [INNO TECH USB to Serial] on usb-0000:00:14.0-6.1/input0 Dec 11 02:28:41 Svalbard kernel: floppy0: no floppy controllers found Dec 11 02:28:41 Svalbard kernel: Freeing unused kernel image (initmem) memory: 1884K Dec 11 02:28:41 Svalbard kernel: Write 
protecting the kernel read-only data: 18432k Dec 11 02:28:41 Svalbard kernel: Freeing unused kernel image (text/rodata gap) memory: 2040K Dec 11 02:28:41 Svalbard kernel: Freeing unused kernel image (rodata/data gap) memory: 140K Dec 11 02:28:41 Svalbard kernel: rodata_test: all tests were successful Dec 11 02:28:41 Svalbard kernel: Run /init as init process Dec 11 02:28:41 Svalbard kernel: with arguments: Dec 11 02:28:41 Svalbard kernel: /init Dec 11 02:28:41 Svalbard kernel: with environment: Dec 11 02:28:41 Svalbard kernel: HOME=/ Dec 11 02:28:41 Svalbard kernel: TERM=linux Dec 11 02:28:41 Svalbard kernel: BOOT_IMAGE=/bzimage Dec 11 02:28:41 Svalbard kernel: xhci_hcd 0000:00:14.0: Timeout while waiting for setup device command Dec 11 02:28:41 Svalbard kernel: xhci_hcd 0000:00:14.0: Timeout while waiting for setup device command Dec 11 02:28:41 Svalbard kernel: usb 2-9.1: device not accepting address 4, error -62 Dec 11 02:28:41 Svalbard kernel: xhci_hcd 0000:00:14.0: Timeout while waiting for setup device command Dec 11 02:28:41 Svalbard kernel: xhci_hcd 0000:00:14.0: Timeout while waiting for setup device command Dec 11 02:28:41 Svalbard kernel: usb 2-9.1: device not accepting address 5, error -62 Dec 11 02:28:41 Svalbard kernel: usb 2-9-port1: attempt power cycle Dec 11 02:28:41 Svalbard kernel: usb 1-10: new high-speed USB device number 6 using xhci_hcd Dec 11 02:28:41 Svalbard kernel: hub 1-10:1.0: USB hub found Dec 11 02:28:41 Svalbard kernel: hub 1-10:1.0: 4 ports detected Dec 11 02:28:41 Svalbard kernel: usb 1-6.3: new high-speed USB device number 7 using xhci_hcd Dec 11 02:28:41 Svalbard kernel: usb-storage 1-6.3:1.0: USB Mass Storage device detected Dec 11 02:28:41 Svalbard kernel: scsi host0: usb-storage 1-6.3:1.0 Dec 11 02:28:41 Svalbard kernel: usb 1-11: new full-speed USB device number 8 using xhci_hcd Dec 11 02:28:41 Svalbard kernel: hid-generic 0003:048D:5702.0002: hiddev97,hidraw1: USB HID v1.12 Device [ITE Tech. Inc. 
ITE Device] on usb-0000:00:14.0-11/input0 Dec 11 02:28:41 Svalbard kernel: scsi 0:0:0:0: Direct-Access Kingston DataTraveler 3.0 PMAP PQ: 0 ANSI: 6 Dec 11 02:28:41 Svalbard kernel: sd 0:0:0:0: Attached scsi generic sg0 type 0 Dec 11 02:28:41 Svalbard kernel: sd 0:0:0:0: [sda] 121110528 512-byte logical blocks: (62.0 GB/57.8 GiB) Dec 11 02:28:41 Svalbard kernel: sd 0:0:0:0: [sda] Write Protect is off Dec 11 02:28:41 Svalbard kernel: sd 0:0:0:0: [sda] Mode Sense: 45 00 00 00 Dec 11 02:28:41 Svalbard kernel: sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA Dec 11 02:28:41 Svalbard kernel: sda: sda1 Dec 11 02:28:41 Svalbard kernel: sd 0:0:0:0: [sda] Attached SCSI removable disk Dec 11 02:28:41 Svalbard kernel: random: crng init done Dec 11 02:28:41 Svalbard kernel: xhci_hcd 0000:00:14.0: Timeout while waiting for setup device command Dec 11 02:28:41 Svalbard kernel: loop0: detected capacity change from 0 to 130016 Dec 11 02:28:41 Svalbard kernel: xhci_hcd 0000:00:14.0: Timeout while waiting for setup device command Dec 11 02:28:41 Svalbard kernel: usb 2-9.1: device not accepting address 6, error -62 Dec 11 02:28:41 Svalbard kernel: loop1: detected capacity change from 0 to 713824 Dec 11 02:28:41 Svalbard kernel: NET: Registered PF_UNIX/PF_LOCAL protocol family Dec 11 02:28:41 Svalbard kernel: xhci_hcd 0000:00:14.0: Timeout while waiting for setup device command Dec 11 02:28:41 Svalbard kernel: input: Sleep Button as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0E:00/input/input1 Dec 11 02:28:41 Svalbard kernel: ACPI: button: Sleep Button [SLPB] Dec 11 02:28:41 Svalbard kernel: input: Power Button as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0C:00/input/input2 Dec 11 02:28:41 Svalbard kernel: ACPI: button: Power Button [PWRB] Dec 11 02:28:41 Svalbard kernel: input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input3 Dec 11 02:28:41 Svalbard kernel: intel_pmc_core INT33A1:00: initialized Dec 11 02:28:41 Svalbard kernel: ACPI: 
button: Power Button [PWRF]
Dec 11 02:28:41 Svalbard kernel: ahci 0000:00:17.0: version 3.0
Dec 11 02:28:41 Svalbard kernel: ahci 0000:00:17.0: AHCI 0001.0301 32 slots 8 ports 6 Gbps 0xff impl SATA mode
Dec 11 02:28:41 Svalbard kernel: ahci 0000:00:17.0: flags: 64bit ncq sntf led clo only pio slum part ems deso sadm sds
Dec 11 02:28:41 Svalbard kernel: mei_me 0000:00:16.0: enabling device (0000 -> 0002)

The full boot log is attached....

With these errors however something has gone wrong with one of the coral.ai devices as now there is only one "google" device showing:

So it seems one of the units definitely rejected the address supplied and stayed offline. If I unplug the device that isn't being used and plug it back in it shows up in the system device list now as a completely different device:

If I plug it into another port I also get the same weird result:

So for now I've left it unplugged. Parity check is on 16%, which is again higher than it has managed in the last day.... And now. We. Wait. Again.

unraid_boot.log
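To find the failed enumerations in a boot log without scrolling, something like the Python sketch below works. It only looks for the "device not accepting address N, error E" wording from the log above and groups the hits by USB port path; the function name is made up. For reference, errno 62 on Linux is ETIME ("Timer expired"), which fits the xhci timeout messages around it.

```python
import re
from collections import defaultdict

# "usb 2-9.1: device not accepting address 4, error -62"
ADDR_FAIL = re.compile(
    r"usb (\d+-[\d.]+): device not accepting address (\d+), error (-?\d+)"
)

def enumeration_failures(log_text):
    """Map each USB port path to the (address, error) pairs that were rejected."""
    failures = defaultdict(list)
    for m in ADDR_FAIL.finditer(log_text):
        failures[m.group(1)].append((int(m.group(2)), int(m.group(3))))
    return dict(failures)

SAMPLE = (
    "Dec 11 02:28:41 Svalbard kernel: usb 2-9.1: device not accepting address 4, error -62 "
    "Dec 11 02:28:41 Svalbard kernel: usb 2-9.1: device not accepting address 5, error -62 "
    "Dec 11 02:28:41 Svalbard kernel: usb 2-9-port1: attempt power cycle "
    "Dec 11 02:28:41 Svalbard kernel: usb 2-9.1: device not accepting address 6, error -62"
)

failed = enumeration_failures(SAMPLE)
for port, attempts in failed.items():
    print(port, attempts)
```

On the excerpt this lists three rejected addresses (4, 5 and 6), all on port 2-9.1.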
  17. Righto. So one crash later and a new plan....fooled around for a while to see if I could pin the coral devices, but no. Might be possible, but definitely too much hassle. So the instance with the fewest cameras has had the coral removed and must now use CPU-based detection. So that leaves me no shared USB device as *only* one instance will be using it. And now we wait....hopefully a long time....I'll be a bit disappointed if it dies again within an hour or so....
  18. Hmmmm....lsblk gives me nothing - just disk mounts.....but an ls on /dev/bus/usb yields the same result in both containers:

      # cd /dev/bus/usb
      # ls
      001 002

      If both claim it then it seems that sharing isn't possible after all? Or is there some other way of preventing containers from helping themselves? A bit more digging - 001 and 002 are the two USB busses....inside each there are more entries (001, 002, 003 etc.)....so this isn't definitive - it seems the container can see everything. I've pondered using the docker compose manager approach....but interestingly, when the machine rebooted after this afternoon's crash the USB addresses actually moved, even though the USB devices have *not* been moved to different ports. Compare this to the previous screenshot: So assigning a specific address is a bust as they keep moving.
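Since the bus/device numbers shift between reboots, one workaround is to look the devices up by USB vendor/product ID each time rather than hard-coding an address. The following is only a sketch under assumptions: the Coral IDs in `CORAL_IDS` are not taken from the post (the accelerator is commonly reported as `1a6e:089a` before its firmware loads and `18d1:9302` afterwards), and `find_usb_nodes` is a hypothetical helper name. It reads the standard sysfs attributes `idVendor`, `idProduct`, `busnum` and `devnum`.

```python
from pathlib import Path

# Assumed Coral USB IDs (not stated in the post): commonly 1a6e:089a before
# the firmware is loaded and 18d1:9302 ("Google") afterwards.
CORAL_IDS = {("1a6e", "089a"), ("18d1", "9302")}


def find_usb_nodes(ids=CORAL_IDS, sysfs_root="/sys/bus/usb/devices"):
    """Return sorted /dev/bus/usb/BBB/DDD paths for devices whose IDs match."""
    nodes = []
    for dev in Path(sysfs_root).glob("*"):
        try:
            vendor = (dev / "idVendor").read_text().strip().lower()
            product = (dev / "idProduct").read_text().strip().lower()
            busnum = int((dev / "busnum").read_text())
            devnum = int((dev / "devnum").read_text())
        except (OSError, ValueError):
            # Interface entries such as "2-9.1:1.0" lack these attribute files.
            continue
        if (vendor, product) in ids:
            nodes.append(f"/dev/bus/usb/{busnum:03d}/{devnum:03d}")
    return sorted(nodes)
```

Calling `find_usb_nodes()` on the host after each boot would print the current node for each matching device, which could then be fed into the container's device mapping. It does not say which physical stick is which, only where the matching devices currently live.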
  19. Yup - two different names for the templates - like frigate-baker and frigate-jones. Here are my two TPU instances: On the templates I'm using the default /dev/bus/usb notation for both. I can update this to /dev/bus/usb/002/004 and 005 respectively in the docker configuration and it will start fine. In theory I suppose this hard-codes the TPU (based on USB location) to the Frigate instance (so also no more changing ports when unplugging / replugging the devices). I cannot set this in the application configuration as the system then fails to start. The default in any case is also just USB...the usb:0 and usb:1 I believe are to keep them separate (somehow). I have the latest template (I think), as I only really started building this in August when it had only just been released - certainly I can't see any obvious differences so probably OK. [Also, which of your donate options results in the most cash actually getting to you?]
  20. Yes...the crash happens with the TPU connected and used (with Frigate not running there is no crash, but the TPUs are still connected)....but there are two TPUs and the Frigate instances just take one each..... Even if that was a cause....how does that then relate to an Unraid kernel crash? I can see how that might cause a problem in the dockers if they were fighting for it. I followed a guide (from somewhere) to set it up. In the first instance it is configured like this:

      detectors:
        coral:
          type: edgetpu
          device: usb:0

      In the second the device is set to usb:1. With one Frigate running only one TPU is active (can tell by the flashing light). When a second instance is started the other TPU comes online and both start flashing. We might be getting somewhere though, as this is definitely Frigate related one way or another I think.... I will dig some more, and I can (once the other box is ready) move a Frigate instance onto another platform so there are no longer dual TPUs.....
  21. I am running two copies of the default repository: And I'm pointing to ghcr.io/blakeblackshear/frigate:stable. I'm running coral.ai via USB so I have the containers as privileged - everything works just fine, but if Frigate is running during a parity sync it will crash. I had a hard crash last night (Unraid dropped off the network) and so on restart it started a parity check....which crashed in the usual way at 6.5% progress. I have gradually been re-instating my old settings (second NIC with vLANs) but the crashes are still doing my head in....everything is fine and then bang - Unraid kernel bug and we're dead in the water. There's an outside chance that the cause is running a pair of Frigate instances....but I can't see why that would cause parity-check problems when the Frigate data paths are nowhere near the array itself. It just makes ZERO sense......
  22. Hi! I've poked around but am struggling to find an answer so I thought I'd ask. In short, everything is working fine, BUT I have very high writes to the cache drive (bursting 20 - 40 MB/s). If I stop Frigate this stops completely....if I start it then the activity resumes - so I'm 100% certain that Frigate is to blame. I'm running:

      • 2x Frigate instances on Unraid (6.12.14)
      • Each instance of Frigate has its own TPU
      • Each instance has its own dedicated disk to write to (a single disk pool, formatted btrfs)
      • The appdata is on a mirrored NVMe cache pool (formatted zfs)
      • The instances run as privileged
      • The extra parameters are: --shm-size=2048mb --log-opt max-size=50m --log-opt max-file=1 --mount type=tmpfs,target=/tmp,tmpfs-mode=1777,tmpfs-size=2G --restart unless-stopped
      • The config path is: /mnt/user/appdata/frigate_1/
      • The media path is: /mnt/user/cctv_1/

      Aside from appdata there should be NOTHING on the NVMe pool, as no connected share (except appdata) is on the cache drives or routing via the cache. What can I do to stop this excessive writing? Thanks!
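To put a number on the writes described above, per-device write throughput can be derived from two snapshots of `/proc/diskstats`. This is a rough sketch rather than anything from the post: `parse_diskstats` and `write_rates` are hypothetical helper names, and it relies on the documented layout in which the tenth field is sectors written, counted in 512-byte units regardless of the device's real sector size.

```python
def parse_diskstats(text):
    """Map device name -> total bytes written, from /proc/diskstats content."""
    written = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10:
            continue  # skip blank or malformed lines
        name, sectors_written = fields[2], int(fields[9])
        written[name] = sectors_written * 512  # diskstats sectors are 512 bytes
    return written


def write_rates(before, after, seconds):
    """Return MB/s written per device between two parse_diskstats() snapshots."""
    return {
        name: (after[name] - before[name]) / seconds / 1e6
        for name in after
        if name in before
    }
```

In use, one would read `/proc/diskstats`, sleep for some interval (say 10 seconds), read it again, and pass both parsed snapshots to `write_rates` with that interval. Comparing the figures for the NVMe devices with Frigate stopped and started would confirm whether the burst lands on the cache pool members.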
  23. A few days have passed.....it is hard being patient....but everything was going well until this afternoon, and then it crashed....I'm pretty certain that it is related to Frigate because everything was fine up until I restarted that. I'd noticed a lot of cache writes, so I stopped everything until I found the containers responsible for the writes...and just after restarting, it crashed:

Dec 9 14:14:09 Svalbard kernel: eth0: renamed from veth1ee9a31
Dec 9 14:14:09 Svalbard kernel: python3[7289]: segfault at 1f00000049 ip 0000000000544235 sp 00007ffee9511880 error 4 in python3.9[41f000+288000] likely on CPU 12 (core 24, socket 0)
Dec 9 14:14:09 Svalbard kernel: Code: 3d d0 57 8f 00 0f 84 26 01 00 00 48 83 c4 08 5b 5d 41 5c 41 5d c3 0f 1f 80 00 00 00 00 4c 8b 4f 60 4d 85 c9 0f 84 81 01 00 00 <4f> 8b 2c 01 4c 39 eb 0f 84 e2 00 00 00 48 85 db 74 23 4d 85 ed 75
Dec 9 14:14:14 Svalbard kernel: veth1ee9a31: renamed from eth0
Dec 9 14:14:14 Svalbard kernel: eth0: renamed from vethc744450
Dec 9 14:14:33 Svalbard kernel: usb 2-9.1: reset SuperSpeed USB device number 6 using xhci_hcd
Dec 9 14:14:33 Svalbard kernel: usb 2-9.1: LPM exit latency is zeroed, disabling LPM.
Dec 9 14:19:13 Svalbard kernel: BUG: kernel NULL pointer dereference, address: 0000000000000038
Dec 9 14:19:13 Svalbard kernel: #PF: supervisor read access in kernel mode
Dec 9 14:19:13 Svalbard kernel: #PF: error_code(0x0000) - not-present page
Dec 9 14:19:13 Svalbard kernel: PGD 3c6c80067 P4D 3c6c80067 PUD 3d7958067 PMD 0
Dec 9 14:19:13 Svalbard kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
Dec 9 14:19:13 Svalbard kernel: CPU: 12 PID: 26181 Comm: lsof Tainted: P O 6.1.118-Unraid #1
Dec 9 14:19:13 Svalbard kernel: Hardware name: Gigabyte Technology Co., Ltd.
Z790M AORUS ELITE AX/Z790M AORUS ELITE AX, BIOS F10 09/27/2024 Dec 9 14:19:13 Svalbard kernel: RIP: 0010:memcg_slab_free_hook+0x28/0xcf Dec 9 14:19:13 Svalbard kernel: Code: cc cc 41 57 41 56 49 89 d6 41 55 41 54 55 48 89 f5 53 48 89 fb 48 83 ec 10 89 4c 24 0c e8 60 e1 ff ff 84 c0 0f 84 94 00 00 00 <4c> 8b 65 38 49 83 fc 03 0f 86 86 00 00 00 49 83 e4 fc 45 31 ed 41 Dec 9 14:19:13 Svalbard kernel: RSP: 0018:ffffc90090e5fe28 EFLAGS: 00010202 Dec 9 14:19:13 Svalbard kernel: RAX: 0000000000000001 RBX: ffff8881001dee00 RCX: 0000000000000001 Dec 9 14:19:13 Svalbard kernel: RDX: ffffc90090e5fe78 RSI: 0000000000000000 RDI: ffff8881001dee00 Dec 9 14:19:13 Svalbard kernel: RBP: 0000000000000000 R08: 0000000000004000 R09: ffffffff8125541e Dec 9 14:19:13 Svalbard kernel: R10: 0000000000000000 R11: 0000000000000fe0 R12: 0000000000000000 Dec 9 14:19:13 Svalbard kernel: R13: 0000000000496870 R14: ffffc90090e5fe78 R15: 0000000000000002 Dec 9 14:19:13 Svalbard kernel: FS: 000014a1223e9e00(0000) GS:ffff88907f300000(0000) knlGS:0000000000000000 Dec 9 14:19:13 Svalbard kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Dec 9 14:19:13 Svalbard kernel: CR2: 0000000000000038 CR3: 000000027755a000 CR4: 0000000000750ee0 Dec 9 14:19:13 Svalbard kernel: PKRU: 55555554 Dec 9 14:19:13 Svalbard kernel: Call Trace: Dec 9 14:19:13 Svalbard kernel: <TASK> Dec 9 14:19:13 Svalbard kernel: ? __die_body+0x1a/0x5c Dec 9 14:19:13 Svalbard kernel: ? page_fault_oops+0x329/0x376 Dec 9 14:19:13 Svalbard kernel: ? do_user_addr_fault+0x12e/0x465 Dec 9 14:19:13 Svalbard kernel: ? exc_page_fault+0xfb/0x11d Dec 9 14:19:13 Svalbard kernel: ? asm_exc_page_fault+0x22/0x30 Dec 9 14:19:13 Svalbard kernel: ? user_path_at_empty+0x42/0x4f Dec 9 14:19:13 Svalbard kernel: ? memcg_slab_free_hook+0x28/0xcf Dec 9 14:19:13 Svalbard kernel: ? memcg_slab_free_hook+0x20/0xcf Dec 9 14:19:13 Svalbard kernel: ? 
kmem_cache_alloc+0x122/0x14d Dec 9 14:19:13 Svalbard kernel: kmem_cache_free+0xb7/0x154 Dec 9 14:19:13 Svalbard kernel: ? user_path_at_empty+0x42/0x4f Dec 9 14:19:13 Svalbard kernel: user_path_at_empty+0x42/0x4f Dec 9 14:19:13 Svalbard kernel: do_readlinkat+0x61/0x106 Dec 9 14:19:13 Svalbard kernel: __x64_sys_readlink+0x1a/0x21 Dec 9 14:19:13 Svalbard kernel: do_syscall_64+0x65/0x7b Dec 9 14:19:13 Svalbard kernel: entry_SYSCALL_64_after_hwframe+0x6e/0xd8 Dec 9 14:19:13 Svalbard kernel: RIP: 0033:0x14a122677197 Dec 9 14:19:13 Svalbard kernel: Code: 73 01 c3 48 8b 0d 81 2c 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 59 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 51 2c 0e 00 f7 d8 64 89 02 48 Dec 9 14:19:13 Svalbard kernel: RSP: 002b:00007ffdd4212428 EFLAGS: 00000206 ORIG_RAX: 0000000000000059 Dec 9 14:19:13 Svalbard kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000014a122677197 Dec 9 14:19:13 Svalbard kernel: RDX: 0000000000001000 RSI: 00007ffdd42124a0 RDI: 0000000000496870 Dec 9 14:19:13 Svalbard kernel: RBP: 00007ffdd4212460 R08: 0000000000000064 R09: 0000000000000000 Dec 9 14:19:13 Svalbard kernel: R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000 Dec 9 14:19:13 Svalbard kernel: R13: 00007ffdd4215b98 R14: 0000000000433dd0 R15: 000014a1227dc000 Dec 9 14:19:13 Svalbard kernel: </TASK> Dec 9 14:19:13 Svalbard kernel: Modules linked in: vhost_net vhost kvm_intel kvm xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle vhost_iotlb wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha tun nft_compat nf_tables xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter bridge stp llc xfs md_mod tcp_diag inet_diag it87(O) hwmon_vid ip6table_filter ip6_tables 
iptable_filter ip_tables x_tables efivarfs macvtap macvlan tap igb r8169 realtek zfs(PO) i915 intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp zunicode(PO) zzstd(O) coretemp iosf_mbi drm_buddy ttm zlua(O) drm_display_helper btusb zavl(PO) icp(PO) drm_kms_helper btrtl btbcm btintel bluetooth drm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 sha256_ssse3 Dec 9 14:19:13 Svalbard kernel: sha1_ssse3 zcommon(PO) aesni_intel znvpair(PO) crypto_simd cryptd spl(O) rapl ecdh_generic mei_hdcp mei_pxp gigabyte_wmi wmi_bmof intel_cstate ecc intel_gtt i2c_algo_bit mpt3sas nvme i2c_i801 agpgart intel_uncore i2c_smbus mei_me ahci nvme_core i2c_core mei raid_class libahci scsi_transport_sas syscopyarea sysfillrect sysimgblt fb_sys_fops thermal fan tpm_crb video tpm_tis tpm_tis_core wmi tpm intel_pmc_core backlight acpi_pad acpi_tad button unix [last unloaded: kvm] Dec 9 14:19:13 Svalbard kernel: CR2: 0000000000000038 Dec 9 14:19:13 Svalbard kernel: ---[ end trace 0000000000000000 ]--- Dec 9 14:19:13 Svalbard kernel: RIP: 0010:memcg_slab_free_hook+0x28/0xcf Dec 9 14:19:13 Svalbard kernel: Code: cc cc 41 57 41 56 49 89 d6 41 55 41 54 55 48 89 f5 53 48 89 fb 48 83 ec 10 89 4c 24 0c e8 60 e1 ff ff 84 c0 0f 84 94 00 00 00 <4c> 8b 65 38 49 83 fc 03 0f 86 86 00 00 00 49 83 e4 fc 45 31 ed 41 Dec 9 14:19:13 Svalbard kernel: RSP: 0018:ffffc90090e5fe28 EFLAGS: 00010202 Dec 9 14:19:13 Svalbard kernel: RAX: 0000000000000001 RBX: ffff8881001dee00 RCX: 0000000000000001 Dec 9 14:19:13 Svalbard kernel: RDX: ffffc90090e5fe78 RSI: 0000000000000000 RDI: ffff8881001dee00 Dec 9 14:19:13 Svalbard kernel: RBP: 0000000000000000 R08: 0000000000004000 R09: ffffffff8125541e Dec 9 14:19:13 Svalbard kernel: R10: 0000000000000000 R11: 0000000000000fe0 R12: 0000000000000000 Dec 9 14:19:13 Svalbard kernel: R13: 0000000000496870 R14: ffffc90090e5fe78 R15: 0000000000000002 Dec 9 14:19:13 Svalbard kernel: FS: 000014a1223e9e00(0000) GS:ffff88907f300000(0000) 
knlGS:0000000000000000 Dec 9 14:19:13 Svalbard kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Dec 9 14:19:13 Svalbard kernel: CR2: 0000000000000038 CR3: 000000027755a000 CR4: 0000000000750ee0 Dec 9 14:19:13 Svalbard kernel: PKRU: 55555554 Dec 9 14:19:13 Svalbard kernel: note: lsof[26181] exited with irqs disabled Dec 9 14:23:37 Svalbard emhttpd: spinning down /dev/sdh Dec 9 14:25:19 Svalbard emhttpd: spinning down /dev/sdb Dec 9 14:25:40 Svalbard emhttpd: spinning down /dev/sdg Dec 9 14:25:44 Svalbard emhttpd: spinning down /dev/sdj Dec 9 14:29:26 Svalbard kernel: BUG: kernel NULL pointer dereference, address: 0000000000000038 Dec 9 14:29:26 Svalbard kernel: #PF: supervisor read access in kernel mode Dec 9 14:29:26 Svalbard kernel: #PF: error_code(0x0000) - not-present page Dec 9 14:29:26 Svalbard kernel: PGD 2cfb02067 P4D 2cfb02067 PUD 385ec1067 PMD 0 Dec 9 14:29:26 Svalbard kernel: Oops: 0000 [#2] PREEMPT SMP NOPTI Dec 9 14:29:26 Svalbard kernel: CPU: 12 PID: 336 Comm: lsof Tainted: P D O 6.1.118-Unraid #1 Dec 9 14:29:26 Svalbard kernel: Hardware name: Gigabyte Technology Co., Ltd. 
Z790M AORUS ELITE AX/Z790M AORUS ELITE AX, BIOS F10 09/27/2024 Dec 9 14:29:26 Svalbard kernel: RIP: 0010:memcg_slab_free_hook+0x28/0xcf Dec 9 14:29:26 Svalbard kernel: Code: cc cc 41 57 41 56 49 89 d6 41 55 41 54 55 48 89 f5 53 48 89 fb 48 83 ec 10 89 4c 24 0c e8 60 e1 ff ff 84 c0 0f 84 94 00 00 00 <4c> 8b 65 38 49 83 fc 03 0f 86 86 00 00 00 49 83 e4 fc 45 31 ed 41 Dec 9 14:29:26 Svalbard kernel: RSP: 0018:ffffc900259bbdd0 EFLAGS: 00010202 Dec 9 14:29:26 Svalbard kernel: RAX: 0000000000000001 RBX: ffff8881001dee00 RCX: 0000000000000001 Dec 9 14:29:26 Svalbard kernel: RDX: ffffc900259bbe20 RSI: 0000000000000000 RDI: ffff8881001dee00 Dec 9 14:29:26 Svalbard kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffff8124c0c2 Dec 9 14:29:26 Svalbard kernel: R10: ffffc900259bbd20 R11: ffffc900259bbe94 R12: 0000000000000000 Dec 9 14:29:26 Svalbard kernel: R13: ffffc900259bbe90 R14: ffffc900259bbe20 R15: 0000000000000000 Dec 9 14:29:26 Svalbard kernel: FS: 00001514d185be00(0000) GS:ffff88907f300000(0000) knlGS:0000000000000000 Dec 9 14:29:26 Svalbard kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Dec 9 14:29:26 Svalbard kernel: CR2: 0000000000000038 CR3: 0000000303d1c000 CR4: 0000000000750ee0 Dec 9 14:29:26 Svalbard kernel: PKRU: 55555554 Dec 9 14:29:26 Svalbard kernel: Call Trace: Dec 9 14:29:26 Svalbard kernel: <TASK> Dec 9 14:29:26 Svalbard kernel: ? __die_body+0x1a/0x5c Dec 9 14:29:26 Svalbard kernel: ? page_fault_oops+0x329/0x376 Dec 9 14:29:26 Svalbard kernel: ? do_user_addr_fault+0x12e/0x465 Dec 9 14:29:26 Svalbard kernel: ? exc_page_fault+0xfb/0x11d Dec 9 14:29:26 Svalbard kernel: ? asm_exc_page_fault+0x22/0x30 Dec 9 14:29:26 Svalbard kernel: ? vfs_fstatat+0x52/0x62 Dec 9 14:29:26 Svalbard kernel: ? memcg_slab_free_hook+0x28/0xcf Dec 9 14:29:26 Svalbard kernel: kmem_cache_free+0xb7/0x154 Dec 9 14:29:26 Svalbard kernel: ? 
vfs_fstatat+0x52/0x62 Dec 9 14:29:26 Svalbard kernel: vfs_fstatat+0x52/0x62 Dec 9 14:29:26 Svalbard kernel: __do_sys_newfstatat+0x26/0x5c Dec 9 14:29:26 Svalbard kernel: do_syscall_64+0x65/0x7b Dec 9 14:29:26 Svalbard kernel: entry_SYSCALL_64_after_hwframe+0x6e/0xd8 Dec 9 14:29:26 Svalbard kernel: RIP: 0033:0x1514d1ae71ca Dec 9 14:29:26 Svalbard kernel: Code: 48 89 f2 b9 00 01 00 00 48 89 fe bf 9c ff ff ff e9 0b 00 00 00 66 2e 0f 1f 84 00 00 00 00 00 90 41 89 ca b8 06 01 00 00 0f 05 <3d> 00 f0 ff ff 77 07 31 c0 c3 0f 1f 40 00 48 8b 15 19 4c 0e 00 f7 Dec 9 14:29:26 Svalbard kernel: RSP: 002b:00007fff7debeb98 EFLAGS: 00000246 ORIG_RAX: 0000000000000106 Dec 9 14:29:26 Svalbard kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00001514d1ae71ca Dec 9 14:29:26 Svalbard kernel: RDX: 00007fff7debecb0 RSI: 00007fff7debebc0 RDI: 00000000ffffff9c Dec 9 14:29:26 Svalbard kernel: RBP: 00007fff7dec0e10 R08: 0000000000000073 R09: 0000000000000000 Dec 9 14:29:26 Svalbard kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 Dec 9 14:29:26 Svalbard kernel: R13: 00007fff7dec4548 R14: 0000000000433dd0 R15: 00001514d1c4e000 Dec 9 14:29:26 Svalbard kernel: </TASK> Dec 9 14:29:26 Svalbard kernel: Modules linked in: vhost_net vhost kvm_intel kvm xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle vhost_iotlb wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha tun nft_compat nf_tables xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter bridge stp llc xfs md_mod tcp_diag inet_diag it87(O) hwmon_vid ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs macvtap macvlan tap igb r8169 realtek zfs(PO) i915 intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp zunicode(PO) zzstd(O) coretemp 
iosf_mbi drm_buddy ttm zlua(O) drm_display_helper btusb zavl(PO) icp(PO) drm_kms_helper btrtl btbcm btintel bluetooth drm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 sha256_ssse3 Dec 9 14:29:26 Svalbard kernel: sha1_ssse3 zcommon(PO) aesni_intel znvpair(PO) crypto_simd cryptd spl(O) rapl ecdh_generic mei_hdcp mei_pxp gigabyte_wmi wmi_bmof intel_cstate ecc intel_gtt i2c_algo_bit mpt3sas nvme i2c_i801 agpgart intel_uncore i2c_smbus mei_me ahci nvme_core i2c_core mei raid_class libahci scsi_transport_sas syscopyarea sysfillrect sysimgblt fb_sys_fops thermal fan tpm_crb video tpm_tis tpm_tis_core wmi tpm intel_pmc_core backlight acpi_pad acpi_tad button unix [last unloaded: kvm] Dec 9 14:29:26 Svalbard kernel: CR2: 0000000000000038 Dec 9 14:29:26 Svalbard kernel: ---[ end trace 0000000000000000 ]--- Dec 9 14:29:26 Svalbard kernel: RIP: 0010:memcg_slab_free_hook+0x28/0xcf Dec 9 14:29:26 Svalbard kernel: Code: cc cc 41 57 41 56 49 89 d6 41 55 41 54 55 48 89 f5 53 48 89 fb 48 83 ec 10 89 4c 24 0c e8 60 e1 ff ff 84 c0 0f 84 94 00 00 00 <4c> 8b 65 38 49 83 fc 03 0f 86 86 00 00 00 49 83 e4 fc 45 31 ed 41 Dec 9 14:29:26 Svalbard kernel: RSP: 0018:ffffc90090e5fe28 EFLAGS: 00010202 Dec 9 14:29:26 Svalbard kernel: RAX: 0000000000000001 RBX: ffff8881001dee00 RCX: 0000000000000001 Dec 9 14:29:26 Svalbard kernel: RDX: ffffc90090e5fe78 RSI: 0000000000000000 RDI: ffff8881001dee00 Dec 9 14:29:26 Svalbard kernel: RBP: 0000000000000000 R08: 0000000000004000 R09: ffffffff8125541e Dec 9 14:29:26 Svalbard kernel: R10: 0000000000000000 R11: 0000000000000fe0 R12: 0000000000000000 Dec 9 14:29:26 Svalbard kernel: R13: 0000000000496870 R14: ffffc90090e5fe78 R15: 0000000000000002 Dec 9 14:29:26 Svalbard kernel: FS: 00001514d185be00(0000) GS:ffff88907f300000(0000) knlGS:0000000000000000 Dec 9 14:29:26 Svalbard kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Dec 9 14:29:26 Svalbard kernel: CR2: 0000000000000038 CR3: 0000000303d1c000 CR4: 0000000000750ee0 
Dec 9 14:29:26 Svalbard kernel: PKRU: 55555554
Dec 9 14:29:26 Svalbard kernel: note: lsof[336] exited with irqs disabled

My second Unraid is now built and the new disks are pre-clearing / testing, so soon that will be up and running. In the meantime I have to take a side trip to see if I can figure out why Frigate is writing so much to the cache - it certainly isn't the share, as that's pointed at a single disk pool with no caching. So it is some sort of appdata activity.....
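Because the pasted log runs its entries together, it can help to split it back into one entry per timestamp and pull out just the process name and faulting function for each oops. The sketch below is illustrative only: `split_entries` and `summarize_oopses` are hypothetical helpers, and the regular expressions assume the exact prefix and field layout seen in the excerpts above (month, day, time, hostname, then `Comm:` and a kernel-mode `RIP: 0010:` line).

```python
import re

# Zero-width match just before a "Dec 9 14:19:13 Svalbard " style prefix.
ENTRY_START = re.compile(r"(?=[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \S+ )")


def split_entries(blob):
    """Split run-together syslog text into one string per entry."""
    return [part.strip() for part in ENTRY_START.split(blob) if part.strip()]


def summarize_oopses(blob):
    """Return (timestamp, process name, faulting function) per kernel oops."""
    results = []
    stamp = comm = None
    for entry in split_entries(blob):
        if "BUG: kernel NULL pointer dereference" in entry:
            # Start of a new report: remember when it happened.
            stamp, comm = " ".join(entry.split()[:3]), None
        elif stamp and comm is None and (m := re.search(r"Comm: (\S+)", entry)):
            comm = m.group(1)
        elif stamp and comm and (m := re.search(r"RIP: 0010:(\S+)", entry)):
            # First kernel-mode RIP after the CPU line names the faulting code.
            results.append((stamp, comm, m.group(1)))
            stamp = comm = None
    return results
```

Run over the log in this post, a summary like this would make it easy to see that both reports came from the same process (`lsof`) faulting in the same function (`memcg_slab_free_hook`), which is the kind of pattern worth quoting when asking for help.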
  24. Just popped on to say that between work today and other commitments not much progress has been made, but I do now have all the dockers running again except for Frigate (x2). They will come back tomorrow, all going well, as it is quick and easy to reactivate the containers. I will update my boot parameters tomorrow (or Sunday) - some of these I have already, but there are some really good tweaks in that mix. Apparently there are "plans" for tomorrow, which means we will not be home for most of the day, so I probably will not have a lot of time. So Sunday is a good day to update that, reinstall the additional NIC and relocate the server back to where it belongs. On the USB front I have no USB disks (at all); the USB messages relate to the coral.ai devices that Frigate uses to do object detection. The "firmware changed" message is normal and happens when the device is activated, but the LPM stuff is probably not what we want. Hopefully some of the boot tweaks (or power tweaks) will help to stop some of the errant behaviour. I've also read somewhere that if you have two of them you should put them on separate USB busses.....so I will plug one into the USB3 port and another into a USB 3.2 or USB C port (which should be separate from the default USB3). We have now been up over three days with no crash....so I think the hardware is all perfectly fine. If we are still here this time tomorrow I will see what Frigate does.... 🙂
  25. OK....update. The parity check completed last night while I was asleep. No errors, no crashes. The Docker disk has been deleted and re-created. I now have an xfs image running on a ZFS pool: I have re-created just the Gluetun and Arr stack based on the previous no-bridge configuration and have given it some work to do. No issues encountered, no crashes. Currently I've triggered the mover to push 500GB onto the array from the cache, just so I can give the array a bit of a workout with the stack up. I will add back one 'set' of dockers at a time and leave them for at least half a day to see if there are problems. I have divided them like this:

      • Arrs - first group, exerts load on array, network, cache. So far so good.
      • Randoms (e.g. mealie, calibre, Twingate, Syncthing, TubeSync) - these add more load, more compute, and extra IP addresses on eth0, but otherwise should be benign.
      • Plex - runs privileged, does decoding. Not really expecting an issue, as when we had the crashes I'm 99.99% sure nothing was streaming, so it should have been pretty much idle.
      • Jellyfin (ditto Plex)
      • Stats/Helper containers (Tautulli, JellyStat, JellySync, Postgres) - should be benign, just more load.
      • Frigate x2 - these were running all the time (as it is a high priority) so could just as easily be a cause. These run privileged, access both the CPU for decoding and USB for coral.ai object detection, and write to two separate pool drives (instead of the array).

      The USB corals definitely do some weird stuff, though never near the time of a crash, but that doesn't mean there's no link.

Nov 27 16:02:11 Svalbard kernel: usb 2-9.2: reset SuperSpeed USB device number 5 using xhci_hcd
Nov 27 16:02:13 Svalbard kernel: usb 2-9.2: reset SuperSpeed USB device number 5 using xhci_hcd
Nov 27 16:02:13 Svalbard kernel: usb 2-9.2: LPM exit latency is zeroed, disabling LPM.
Nov 27 16:02:13 Svalbard kernel: usb 2-9.2: device firmware changed
Nov 27 16:02:13 Svalbard kernel: usb 2-9.2: USB disconnect, device number 5
Nov 27 16:02:13 Svalbard kernel: usb 2-9.2: new SuperSpeed USB device number 6 using xhci_hcd
Nov 27 16:02:13 Svalbard kernel: usb 2-9.2: LPM exit latency is zeroed, disabling LPM.

Might not be material, but it is a little weird. I found a post about this here (right at the bottom) and an Unraid forum article on this here. So that powertop thing is now on my list to do as well, just to eliminate another possible issue...unless you think that's a bad idea.... Lastly, I have also tried to install the swap plugin - I only have one btrfs drive (an SSD) as the array is xfs and the swap is zfs. When I try to start the swap file I get this in the logs:

Dec 5 18:25:34 Svalbard rc.swapfile[13294]: Plugin configuration written
Dec 5 18:26:15 Svalbard emhttpd: cmd: /usr/local/emhttp/plugins/swapfile/scripts/rc.swapfile start
Dec 5 18:26:15 Svalbard rc.swapfile[20304]: Creating swap file /mnt/scratch/swapfile please wait ...
Dec 5 18:26:19 Svalbard rc.swapfile[20621]: Swap file /mnt/scratch/swapfile created and started
Dec 5 18:26:19 Svalbard kernel: BTRFS warning (device sdd1): swapfile must not be copy-on-write
Dec 5 18:26:19 Svalbard rc.swapfile[20622]: Setting swappiness to 60
Dec 5 18:26:41 Svalbard emhttpd: cmd: /usr/local/emhttp/plugins/swapfile/scripts/rc.swapfile updatecfg true true /mnt/scratch swapfile UNRAID-SWAP 2048 60
Dec 5 18:26:42 Svalbard rc.swapfile[23332]: Plugin configuration written
Dec 5 18:26:48 Svalbard emhttpd: cmd: /usr/local/emhttp/plugins/swapfile/scripts/rc.swapfile start
Dec 5 18:26:48 Svalbard rc.swapfile[24495]: Swap file /mnt/scratch/swapfile is on a BTRFS file system but does not have the No_COW attribute.

How now brown cow....no cow?
Found your post with the script, ran it, and sorted: Might be coincidence, but after setting that up I got my first fault (not crash) in two days:

Dec 5 19:47:56 Svalbard kernel: Adding 4194300k swap on /mnt/scratch/swapfile. Priority:-2 extents:11 across:130568864k
Dec 5 19:51:23 Svalbard kernel: cgroup: fork rejected by pids controller in /docker/e7b0ac6467266b5fb595bca74d953c400e486175fd98e22fb74df13af3942211
Dec 5 19:54:54 Svalbard kernel: device_list[2339]: segfault at 0 ip 000000000093454b sp 00007ffeeeedd200 error 6 in php[600000+3b3000] likely on CPU 12 (core 24, socket 0)
Dec 5 19:54:54 Svalbard kernel: Code: 08 e9 0a ab ff ff e8 14 1b ff ff 41 ff 27 e8 8c 0d fe ff 41 ff 27 e8 14 0c fe ff 41 ff 27 e8 4c 1f ff ff 41 ff 27 49 83 c7 20 <83> 02 01 41 ff 27 e8 ea 17 fb ff e9 65 c7 ff ff e8 e0 17 fb ff e9

Most of my dockers also run with these parameters to give each container a RAM-backed /tmp via tmpfs (since I have 64GB to burn): --mount type=tmpfs,target=/tmp,tmpfs-mode=1777,tmpfs-size=256M --log-driver none --no-healthcheck

Anyway, just shy of 48 hours with no crash. The new PC is mostly built, but I'm still waiting on the power supply (tomorrow, one hopes) and 4 drives from ServerPartDeals (Tue/Wed). For now, still chipping away on the old one....
