
JorgeB

Moderators · 67,871 posts · 708 days won

Everything posted by JorgeB

  1. As mentioned, cloning them with ddrescue is probably the best bet to recover the data, but keep the array intact for now: if you can recover a lot of data from one disk and less from the other, it might be possible to use the clone in the array to rebuild the other one, assuming all the other disks are OK.
  2. Both disks failed, so single parity is not enough to recover them. You can use ddrescue to try to recover as much data as possible from both disks.
  3. The Memtest boot option only works with CSM (legacy BIOS) boot; it won't work with UEFI boot.
  4. Strange that this would help. Did you look for a BIOS update for the board?
  5. Try booting Unraid in safe mode to rule out any plugin.
  6. Stop the check. The replacement disk can be the same size as or larger than the current disk, as long as it's no larger than the current parity; if it's larger than parity, you'd need to do a parity swap.
  7. The problem appears to start with a USB controller issue:

       Oct 11 22:40:33 unraid kernel: xhci_hcd 0000:02:00.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
       Oct 11 22:40:33 unraid kernel: xhci_hcd 0000:02:00.0: WARN Successful completion on short TX
       Oct 11 22:40:33 unraid kernel: xhci_hcd 0000:02:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 1
       Oct 11 22:40:33 unraid kernel: xhci_hcd 0000:02:00.0: Looking for event-dma 0000000106065080 trb-start 0000000106065090 trb-end 0000000106065090 seg-start 0000000106065000 seg-end 0000000106065ff0

     This controller:

       02:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] 500 Series Chipset USB 3.1 XHCI Controller [1022:43ee]
         Subsystem: ASMedia Technology Inc. ASM1042A USB 3.0 Host Controller [1b21:1142]
         Kernel driver in use: xhci_hcd
  8. You'll need to update manually.
  9. Try switching to ipvlan: Settings -> Docker Settings -> Docker custom network type -> ipvlan (advanced view must be enabled, top right).
  10. Please post the output of:

        ls -lh /var/log
  11. Possibly not the only problem, but it's a problem, so start here:

        Oct 7 10:21:22 Tower kernel: macvlan_broadcast+0x10a/0x150 [macvlan]
        Oct 7 10:21:22 Tower kernel: macvlan_process_broadcast+0xbc/0x12f [macvlan]

      Macvlan call traces are usually the result of having Docker containers with a custom IP address, and they will eventually crash the server. Switching to ipvlan should fix it (Settings -> Docker Settings -> Docker custom network type -> ipvlan; advanced view must be enabled, top right). Then reboot to clear the logs and post new diags if there are other issues.
  12. Disk3 has a SMART attribute that is currently failing (FAILING_NOW), so you need to replace it. Reboot to clear the logs, unassign disk3, start the array and post new diags.
  13. This is a device error. You can try this: first reboot, then wipe the device (this erases everything on it) with

        blkdiscard -f /dev/nvme0n1

      and then try to format it again.
  14. EDIT: The issue was traced to libtorrent 2.x; it's not an Unraid problem. More info in this post: https://forums.unraid.net/bug-reports/stable-releases/crashes-since-updating-to-v611x-for-qbittorrent-and-deluge-users-r2153/?do=findComment&comment=21671

      Original post:

      I'm creating this to better track an issue that some users have been reporting, where Unraid started crashing after updating to v6.11.x (it happens with both 6.11.0 and 6.11.1). A very similar call trace is logged in all cases, e.g.:

        Oct 12 04:18:27 zaBOX kernel: BUG: kernel NULL pointer dereference, address: 00000000000000b6
        Oct 12 04:18:27 zaBOX kernel: #PF: supervisor read access in kernel mode
        Oct 12 04:18:27 zaBOX kernel: #PF: error_code(0x0000) - not-present page
        Oct 12 04:18:27 zaBOX kernel: PGD 0 P4D 0
        Oct 12 04:18:27 zaBOX kernel: Oops: 0000 [#1] PREEMPT SMP PTI
        Oct 12 04:18:27 zaBOX kernel: CPU: 4 PID: 28596 Comm: Disk Tainted: P U W O 5.19.14-Unraid #1
        Oct 12 04:18:27 zaBOX kernel: Hardware name: Gigabyte Technology Co., Ltd. Z390 AORUS PRO WIFI/Z390 AORUS PRO WIFI-CF, BIOS F12 11/05/2021
        Oct 12 04:18:27 zaBOX kernel: RIP: 0010:folio_try_get_rcu+0x0/0x21
        Oct 12 04:18:27 zaBOX kernel: Code: e8 8e 61 63 00 48 8b 84 24 80 00 00 00 65 48 2b 04 25 28 00 00 00 74 05 e8 9e 9b 64 00 48 81 c4 88 00 00 00 5b c3 cc cc cc cc <8b> 57 34 85 d2 74 10 8d 4a 01 89 d0 f0 0f b1 4f 34 74 04 89 c2 eb
        Oct 12 04:18:27 zaBOX kernel: RSP: 0000:ffffc900070dbcc0 EFLAGS: 00010246
        Oct 12 04:18:27 zaBOX kernel: RAX: 0000000000000082 RBX: 0000000000000082 RCX: 0000000000000082
        Oct 12 04:18:27 zaBOX kernel: RDX: 0000000000000001 RSI: ffff888757426fe8 RDI: 0000000000000082
        Oct 12 04:18:27 zaBOX kernel: RBP: 0000000000000000 R08: 0000000000000028 R09: ffffc900070dbcd0
        Oct 12 04:18:27 zaBOX kernel: R10: ffffc900070dbcd0 R11: ffffc900070dbd48 R12: 0000000000000000
        Oct 12 04:18:27 zaBOX kernel: R13: ffff88824f95d138 R14: 000000000007292c R15: ffff88824f95d140
        Oct 12 04:18:27 zaBOX kernel: FS: 000014ed38204b38(0000) GS:ffff8888a0500000(0000) knlGS:0000000000000000
        Oct 12 04:18:27 zaBOX kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        Oct 12 04:18:27 zaBOX kernel: CR2: 00000000000000b6 CR3: 0000000209854005 CR4: 00000000003706e0
        Oct 12 04:18:27 zaBOX kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        Oct 12 04:18:27 zaBOX kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Oct 12 04:18:27 zaBOX kernel: Call Trace:
        Oct 12 04:18:27 zaBOX kernel: <TASK>
        Oct 12 04:18:27 zaBOX kernel: __filemap_get_folio+0x98/0x1ff
        Oct 12 04:18:27 zaBOX kernel: ? _raw_spin_unlock_irqrestore+0x24/0x3a
        Oct 12 04:18:27 zaBOX kernel: filemap_fault+0x6e/0x524
        Oct 12 04:18:27 zaBOX kernel: __do_fault+0x2d/0x6e
        Oct 12 04:18:27 zaBOX kernel: __handle_mm_fault+0x9a5/0xc7d
        Oct 12 04:18:27 zaBOX kernel: handle_mm_fault+0x113/0x1d7
        Oct 12 04:18:27 zaBOX kernel: do_user_addr_fault+0x36a/0x514
        Oct 12 04:18:27 zaBOX kernel: exc_page_fault+0xfc/0x11e
        Oct 12 04:18:27 zaBOX kernel: asm_exc_page_fault+0x22/0x30
        Oct 12 04:18:27 zaBOX kernel: RIP: 0033:0x14ed3a0ae7b5
        Oct 12 04:18:27 zaBOX kernel: Code: 8b 48 08 48 8b 32 48 8b 00 48 39 f0 73 09 48 8d 14 08 48 39 d6 eb 0c 48 39 c6 73 0b 48 8d 14 0e 48 39 d0 73 02 0f 0b 48 89 c7 <f3> a4 66 48 8d 3d 59 b7 22 00 66 66 48 e8 d9 d8 f6 ff 48 89 28 48
        Oct 12 04:18:27 zaBOX kernel: RSP: 002b:000014ed38203960 EFLAGS: 00010206
        Oct 12 04:18:27 zaBOX kernel: RAX: 000014ed371aa160 RBX: 000014ed38203ad0 RCX: 0000000000004000
        Oct 12 04:18:27 zaBOX kernel: RDX: 000014c036530000 RSI: 000014c03652c000 RDI: 000014ed371aa160
        Oct 12 04:18:27 zaBOX kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 000014ed38203778
        Oct 12 04:18:27 zaBOX kernel: R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000000
        Oct 12 04:18:27 zaBOX kernel: R13: 000014ed38203b40 R14: 000014ed384fe940 R15: 000014ed38203ac0
        Oct 12 04:18:27 zaBOX kernel: </TASK>
        Oct 12 04:18:27 zaBOX kernel: Modules linked in: macvlan xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle vhost_net vhost vhost_iotlb tap tun veth xt_nat xt_tcpudp xt_conntrack nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype br_netfilter xfs md_mod kvmgt mdev i915 iosf_mbi drm_buddy i2c_algo_bit ttm drm_display_helper intel_gtt agpgart hwmon_vid iptable_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha ip6table_filter ip6_tables iptable_filter ip_tables x_tables af_packet 8021q garp mrp bridge stp llc bonding tls ipv6 nvidia_drm(PO) nvidia_modeset(PO) nvidia(PO) x86_pkg_temp_thermal intel_powerclamp drm_kms_helper btusb btrtl i2c_i801 btbcm coretemp gigabyte_wmi wmi_bmof intel_wmi_thunderbolt mxm_wmi kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd
        Oct 12 04:18:27 zaBOX kernel: btintel rapl intel_cstate intel_uncore e1000e i2c_smbus bluetooth drm nvme nvme_core ahci i2c_core libahci ecdh_generic ecc syscopyarea sysfillrect input_leds sysimgblt led_class joydev nzxt_kraken2 intel_pch_thermal fb_sys_fops thermal fan video tpm_crb wmi tpm_tis backlight tpm_tis_core tpm acpi_pad button unix
        Oct 12 04:18:27 zaBOX kernel: CR2: 00000000000000b6
        Oct 12 04:18:27 zaBOX kernel: ---[ end trace 0000000000000000 ]---

      Another example with very different hardware:

        Oct 11 21:32:08 Impulse kernel: BUG: kernel NULL pointer dereference, address: 0000000000000056
        Oct 11 21:32:08 Impulse kernel: #PF: supervisor read access in kernel mode
        Oct 11 21:32:08 Impulse kernel: #PF: error_code(0x0000) - not-present page
        Oct 11 21:32:08 Impulse kernel: PGD 0 P4D 0
        Oct 11 21:32:08 Impulse kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
        Oct 11 21:32:08 Impulse kernel: CPU: 1 PID: 5236 Comm: Disk Not tainted 5.19.14-Unraid #1
        Oct 11 21:32:08 Impulse kernel: Hardware name: System manufacturer System Product Name/ROG STRIX B450-F GAMING II, BIOS 4301 03/04/2021
        Oct 11 21:32:08 Impulse kernel: RIP: 0010:folio_try_get_rcu+0x0/0x21
        Oct 11 21:32:08 Impulse kernel: Code: e8 8e 61 63 00 48 8b 84 24 80 00 00 00 65 48 2b 04 25 28 00 00 00 74 05 e8 9e 9b 64 00 48 81 c4 88 00 00 00 5b e9 cc 5f 86 00 <8b> 57 34 85 d2 74 10 8d 4a 01 89 d0 f0 0f b1 4f 34 74 04 89 c2 eb
        Oct 11 21:32:08 Impulse kernel: RSP: 0000:ffffc900026ffcc0 EFLAGS: 00010246
        Oct 11 21:32:08 Impulse kernel: RAX: 0000000000000022 RBX: 0000000000000022 RCX: 0000000000000022
        Oct 11 21:32:08 Impulse kernel: RDX: 0000000000000001 RSI: ffff88801e450b68 RDI: 0000000000000022
        Oct 11 21:32:08 Impulse kernel: RBP: 0000000000000000 R08: 000000000000000c R09: ffffc900026ffcd0
        Oct 11 21:32:08 Impulse kernel: R10: ffffc900026ffcd0 R11: ffffc900026ffd48 R12: 0000000000000000
        Oct 11 21:32:08 Impulse kernel: R13: ffff888428441cb8 R14: 00000000000028cd R15: ffff888428441cc0
        Oct 11 21:32:08 Impulse kernel: FS: 00001548d34fa6c0(0000) GS:ffff88842e840000(0000) knlGS:0000000000000000
        Oct 11 21:32:08 Impulse kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        Oct 11 21:32:08 Impulse kernel: CR2: 0000000000000056 CR3: 00000001a3fe6000 CR4: 00000000003506e0
        Oct 11 21:32:08 Impulse kernel: Call Trace:
        Oct 11 21:32:08 Impulse kernel: <TASK>
        Oct 11 21:32:08 Impulse kernel: __filemap_get_folio+0x98/0x1ff
        Oct 11 21:32:08 Impulse kernel: filemap_fault+0x6e/0x524
        Oct 11 21:32:08 Impulse kernel: __do_fault+0x30/0x6e
        Oct 11 21:32:08 Impulse kernel: __handle_mm_fault+0x9a5/0xc7d
        Oct 11 21:32:08 Impulse kernel: handle_mm_fault+0x113/0x1d7
        Oct 11 21:32:08 Impulse kernel: do_user_addr_fault+0x36a/0x514
        Oct 11 21:32:08 Impulse kernel: exc_page_fault+0xfc/0x11e
        Oct 11 21:32:08 Impulse kernel: asm_exc_page_fault+0x22/0x30
        Oct 11 21:32:08 Impulse kernel: RIP: 0033:0x1548dbc04741
        Oct 11 21:32:08 Impulse kernel: Code: 48 01 d0 eb 1b 0f 1f 40 00 f3 0f 1e fa 48 39 d1 0f 82 73 28 fc ff 0f 1f 00 f3 0f 1e fa 48 89 f8 48 83 fa 20 0f 82 af 00 00 00 <c5> fe 6f 06 48 83 fa 40 0f 87 3e 01 00 00 c5 fe 6f 4c 16 e0 c5 fe
        Oct 11 21:32:08 Impulse kernel: RSP: 002b:00001548d34f9808 EFLAGS: 00010202
        Oct 11 21:32:08 Impulse kernel: RAX: 000015480c010d30 RBX: 000015480c018418 RCX: 00001548d34f9a40
        Oct 11 21:32:08 Impulse kernel: RDX: 0000000000004000 RSI: 000015471f8cd50f RDI: 000015480c010d30
        Oct 11 21:32:08 Impulse kernel: RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000000
        Oct 11 21:32:08 Impulse kernel: R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000000
        Oct 11 21:32:08 Impulse kernel: R13: 00001548d34f9ac0 R14: 0000000000000003 R15: 0000154814013d10
        Oct 11 21:32:08 Impulse kernel: </TASK>
        Oct 11 21:32:08 Impulse kernel: Modules linked in: xt_connmark xt_comment iptable_raw wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha xt_mark xt_nat xt_CHECKSUM ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle vhost_net tun vhost vhost_iotlb tap veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter xfs md_mod ip6table_filter ip6_tables iptable_filter ip_tables x_tables af_packet 8021q garp mrp bridge stp llc ipv6 mlx4_en mlx4_core igb i2c_algo_bit edac_mce_amd edac_core kvm_amd kvm wmi_bmof mxm_wmi asus_wmi_sensors crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel mpt3sas aesni_intel crypto_simd nvme cryptd ahci i2c_piix4 raid_class rapl k10temp i2c_core nvme_core ccp scsi_transport_sas libahci wmi button acpi_cpufreq unix [last unloaded: mlx4_core]
        Oct 11 21:32:08 Impulse kernel: CR2: 0000000000000056
        Oct 11 21:32:08 Impulse kernel: ---[ end trace 0000000000000000 ]---

      So they always start with this (the end address will change):

        Oct 11 05:02:02 Cogsworth kernel: BUG: kernel NULL pointer dereference, address: 0000000000000076

      and always have this:

        Oct 11 05:02:02 Cogsworth kernel: Call Trace:
        Oct 11 05:02:02 Cogsworth kernel: <TASK>
        Oct 11 05:02:02 Cogsworth kernel: __filemap_get_folio+0x98/0x1ff

      The fact that it's happening to various users with very different hardware, both Intel and AMD, makes me think it's not a hardware/firmware issue, so we can try to find out whether they are running anything in common. These are the plugins I've found in common between the 4 or 5 cases found so far. They are some of the most used plugins, so it's not surprising they are installed in all of them, but it's also easy to rule them out:

        ca.backup2.plg - 2022.07.23 (Up to date)
        community.applications.plg - 2022.09.30 (Up to date)
        dynamix.active.streams.plg - 2020.06.17 (Up to date)
        file.activity.plg - 2022.08.19 (Up to date)
        fix.common.problems.plg - 2022.10.09 (Up to date)
        unassigned.devices.plg - 2022.10.03 (Up to date)
        unassigned.devices-plus.plg - 2022.08.19 (Up to date)

      So anyone having this issue, try temporarily uninstalling/disabling these plugins to see if there's any difference.
  15. This error:

        read time tree block corruption detected

      suggests some old, previously undetected corruption that is now being detected by the newer kernel. If you have difficulty backing it up now, you can downgrade to the previous Unraid release and it should mount; then back up the pool and re-format it.
  16. The used space is currently so low that it's not possible to balance any further. There's only 1 data chunk (data chunks are 1GiB in size) and about half of it is being used, hence the 49%. You cannot have fewer than one data chunk, so there's nothing more a balance can do, and nothing to worry about.
  17. Looks good for now; see here for better pool monitoring so you're notified if a device drops offline.
  18. Check/replace cables on disk5 and try again.
  19. This has been reported to help: https://forums.unraid.net/bug-reports/stable-releases/6103-samsung-980-temp-warning-r2007/?do=findComment&comment=20180
  20. It should; you just need the /config folder.
  21. It might not be that, but you can confirm by blacklisting the iGPU driver.
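For items 1 and 2 (deciding which ddrescue clone is more complete), a minimal sketch that totals a GNU ddrescue mapfile by block status. It assumes the mapfile layout described in the GNU ddrescue manual (comment lines starting with `#`, one status line, then `pos size status` lines); `summarize_mapfile` and `rescued_fraction` are hypothetical helper names, not part of ddrescue.

```python
from collections import defaultdict

# Block status characters used in GNU ddrescue mapfiles.
STATUS_NAMES = {
    "?": "non-tried",
    "*": "non-trimmed",
    "/": "non-scraped",
    "-": "bad-sector",
    "+": "finished",
}


def summarize_mapfile(text: str) -> dict[str, int]:
    """Return the number of bytes in each block status of a ddrescue mapfile."""
    totals: dict[str, int] = defaultdict(int)
    seen_status_line = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        if not seen_status_line:
            # First non-comment line is: current_pos current_status [current_pass]
            seen_status_line = True
            continue
        _pos, size, status = line.split()[:3]
        # int(x, 0) accepts both the 0x-prefixed hex ddrescue writes and plain decimal.
        totals[STATUS_NAMES.get(status, status)] += int(size, 0)
    return dict(totals)


def rescued_fraction(text: str) -> float:
    """Fraction of the mapped area that ddrescue reports as finished ('+')."""
    totals = summarize_mapfile(text)
    total = sum(totals.values())
    return totals.get("finished", 0) / total if total else 0.0
```

Running it on each disk's mapfile gives a quick comparison of how much was rescued from each one, which is what decides whether one clone can be used in the array to rebuild the other.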
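The sizing rule from item 6 can be written down as a small decision function. This is a sketch only: `plan_replacement` and `ReplacementPlan` are hypothetical names, sizes are in bytes, and with dual parity it assumes you pass the size of the smaller parity disk.

```python
from enum import Enum


class ReplacementPlan(Enum):
    TOO_SMALL = "replacement is smaller than the disk it replaces"
    DIRECT_REBUILD = "assign the replacement and rebuild"
    PARITY_SWAP = "replacement is larger than parity: a parity swap is needed"


def plan_replacement(old_disk_bytes: int, new_disk_bytes: int, parity_bytes: int) -> ReplacementPlan:
    """Apply the item 6 rule: same size or larger than the old disk, but not larger than parity."""
    if new_disk_bytes < old_disk_bytes:
        return ReplacementPlan.TOO_SMALL
    if new_disk_bytes <= parity_bytes:
        return ReplacementPlan.DIRECT_REBUILD
    return ReplacementPlan.PARITY_SWAP
```

It only captures the size check; whether the other disks are healthy enough for a rebuild still has to be confirmed from the diags.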
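For item 7, a sketch that counts `xhci_hcd` messages per PCI address in a syslog, so the noisy controller can be matched against lspci output (e.g. `lspci -k -s 02:00.0`). It assumes the syslog line format quoted in item 7; `xhci_controllers` is a hypothetical helper.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   "... kernel: xhci_hcd 0000:02:00.0: WARN Set TR Deq Ptr cmd failed ..."
XHCI_RE = re.compile(
    r"kernel: xhci_hcd (?P<addr>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):",
    re.IGNORECASE,
)


def xhci_controllers(log_text: str) -> Counter:
    """Count xhci_hcd messages per PCI address, in the short form lspci prints."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        m = XHCI_RE.search(line)
        if m:
            # lspci omits the "0000:" PCI domain by default, so strip it to match.
            counts[m.group("addr").split(":", 1)[1]] += 1
    return counts
```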
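For items 9 and 11, a sketch that flags the macvlan broadcast frames quoted in item 11, so you can check whether a syslog still shows them after switching to ipvlan. It only looks for those two frames (`macvlan_broadcast` and `macvlan_process_broadcast`); other macvlan-related traces may look different. `macvlan_trace_lines` is a hypothetical helper.

```python
import re

# Call-trace frames from the macvlan driver, as quoted in item 11.
MACVLAN_FRAME_RE = re.compile(
    r"\bmacvlan_(?:process_)?broadcast\+0x[0-9a-f]+/0x[0-9a-f]+ \[macvlan\]"
)


def macvlan_trace_lines(log_text: str) -> list[str]:
    """Return syslog lines that contain macvlan broadcast call-trace frames."""
    return [line for line in log_text.splitlines() if MACVLAN_FRAME_RE.search(line)]
```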
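For item 12, a sketch that lists attributes whose WHEN_FAILED column reads FAILING_NOW. It assumes the default attribute table printed by `smartctl -A` (ID, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE); the `-f brief` layout has different columns and is not handled. `failing_now` is a hypothetical helper.

```python
def failing_now(smartctl_attrs: str) -> list[str]:
    """Names of SMART attributes whose WHEN_FAILED column reads FAILING_NOW."""
    failing = []
    for line in smartctl_attrs.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # WHEN_FAILED is the 9th column (index 8).
        if len(fields) >= 10 and fields[0].isdigit() and fields[8] == "FAILING_NOW":
            failing.append(fields[1])
    return failing
```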
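The crash signature described in item 14 (a "BUG: kernel NULL pointer dereference" line whose call trace starts with `__filemap_get_folio` right after `<TASK>`) can be checked mechanically. A minimal sketch, assuming syslog lines in the format quoted there (timestamp, host, `kernel:` prefix); `filemap_oops_addresses` is a hypothetical helper that returns the fault address of each matching oops.

```python
import re

BUG_RE = re.compile(r"BUG: kernel NULL pointer dereference, address: ([0-9a-f]+)")
FRAME_RE = re.compile(r"kernel: (?:\? )?(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def filemap_oops_addresses(log_text: str) -> list[str]:
    """Fault addresses of NULL-dereference oopses whose first trace frame is __filemap_get_folio."""
    hits = []
    address = None    # address from the most recent NULL-dereference BUG line
    in_trace = False  # True after "<TASK>" until the first frame is seen
    for line in log_text.splitlines():
        bug = BUG_RE.search(line)
        if bug:
            address, in_trace = bug.group(1), False
            continue
        if address is None:
            continue
        if line.rstrip().endswith("<TASK>"):
            in_trace = True
            continue
        if in_trace:
            frame = FRAME_RE.search(line)
            if frame:
                # Only the first frame after <TASK> is compared.
                if frame.group(1) == "__filemap_get_folio":
                    hits.append(address)
                address, in_trace = None, False
    return hits
```

As the post notes, the address changes between crashes, so the sketch reports it instead of matching on it.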
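The arithmetic behind item 16 can be sketched as follows, assuming fixed 1 GiB data chunks as stated in the post (real chunk sizes can vary; `btrfs filesystem usage` shows the actual allocation). Even a perfect balance has to keep at least one data chunk, so with about half a GiB used the best achievable usage is about 50%. `min_data_chunks` and `best_case_usage_pct` are hypothetical helpers.

```python
GIB = 1024 ** 3


def min_data_chunks(used_bytes: int, chunk_bytes: int = GIB) -> int:
    """Smallest number of data chunks that can hold used_bytes (never fewer than one)."""
    return max(1, -(-used_bytes // chunk_bytes))  # integer ceiling division


def best_case_usage_pct(used_bytes: int, chunk_bytes: int = GIB) -> float:
    """Data-chunk usage after an ideal balance: used space over allocated chunk space."""
    return 100 * used_bytes / (min_data_chunks(used_bytes, chunk_bytes) * chunk_bytes)
```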