
BTRFS error (device loop2): block=4600659968 write time tree block corruption detected


je82


It seems my Unraid installation, which has been stable for years, is starting to show miscellaneous errors; now all the Dockers have dropped. What is your suggested approach here? The system is an expensive one running ECC RAM, and it passed 10 passes of memtest when it was built.

 

Maybe it's the SSD that is failing? I tried to look at the scrub stats of the docker image, and it found no issues:

 

"UUID: 7430bd32-eee4-4538-9616-eb39918be503 Scrub started: Sun Nov 5 12:03:46 2023 Status: aborted Duration: 0:00:00 Total to scrub: 13.33GiB Rate: 0.00B/s Error summary: no errors found"

 

Diags are attached in case you experts can find what could be causing it; any help is appreciated.

 

(Removed diags for privacy reasons)

 

Edited by je82

Trying to shut down the system gracefully now, capturing these logs:

 


Nov  5 12:12:01 NAS kernel: CPU: 13 PID: 2589 Comm: umount Tainted: P        W  O      5.15.46-Unraid #1
Nov  5 12:12:01 NAS kernel: Hardware name: Supermicro Super Server/X12SPI-TF, BIOS 1.4 07/11/2022
Nov  5 12:12:01 NAS kernel: RIP: 0010:d_walk+0x8a/0x206
Nov  5 12:12:01 NAS kernel: Code: 45 31 ff 44 89 e0 49 89 ed 83 e0 01 89 44 24 14 49 8b 95 a0 00 00 00 4c 89 eb 48 8d 83 a0 00 00 00 48 39 c2 0f 84 8d 00 00 00 <f6> 82 73 ff ff ff 20 4c 8d aa 70 ff ff ff 4c 8b 32 75 72 4c 8d 42
Nov  5 12:12:01 NAS kernel: RSP: 0018:ffffc9000b3b3db0 EFLAGS: 00010207
Nov  5 12:12:01 NAS kernel: RAX: ffff8881c481ba10 RBX: ffff8881c481b970 RCX: 0000000000000007
Nov  5 12:12:01 NAS kernel: RDX: 0000000000000000 RSI: ffffc9000b3b3e28 RDI: ffff8893d0470418
Nov  5 12:12:01 NAS kernel: RBP: ffff88814cab8780 R08: ffff8881c481b9c8 R09: ffffc9000b3b3e68
Nov  5 12:12:01 NAS kernel: R10: ffffffff822d19f0 R11: 000000000000003c R12: 0000000002c6f9b6
Nov  5 12:12:01 NAS kernel: R13: ffff8881c481b970 R14: ffff8893d0470458 R15: 0000000000000000
Nov  5 12:12:01 NAS kernel: FS:  0000147c4d9e4740(0000) GS:ffff889fffd40000(0000) knlGS:0000000000000000
Nov  5 12:12:01 NAS kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov  5 12:12:01 NAS kernel: CR2: ffffffffffffff73 CR3: 00000017f1648006 CR4: 0000000000770ee0
Nov  5 12:12:01 NAS kernel: PKRU: 55555554
Nov  5 12:12:01 NAS kernel: Call Trace:
Nov  5 12:12:01 NAS kernel: <TASK>
Nov  5 12:12:01 NAS kernel: ? select_collect2+0x7c/0x7c
Nov  5 12:12:01 NAS kernel: shrink_dcache_parent+0x4c/0x11e
Nov  5 12:12:01 NAS kernel: do_one_tree+0xe/0x31
Nov  5 12:12:01 NAS kernel: shrink_dcache_for_umount+0x36/0x6a
Nov  5 12:12:01 NAS kernel: generic_shutdown_super+0x1a/0x104
Nov  5 12:12:01 NAS kernel: kill_block_super+0x21/0x40
Nov  5 12:12:01 NAS kernel: deactivate_locked_super+0x33/0x6d
Nov  5 12:12:01 NAS kernel: cleanup_mnt+0x67/0xda
Nov  5 12:12:01 NAS kernel: task_work_run+0x6f/0x83
Nov  5 12:12:01 NAS kernel: exit_to_user_mode_prepare+0x9a/0x131
Nov  5 12:12:01 NAS kernel: syscall_exit_to_user_mode+0x18/0x23
Nov  5 12:12:01 NAS kernel: do_syscall_64+0x9f/0xa5
Nov  5 12:12:01 NAS kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
Nov  5 12:12:01 NAS kernel: RIP: 0033:0x147c4db154e7
Nov  5 12:12:01 NAS kernel: Code: 89 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 31 f6 e9 09 00 00 00 66 0f 1f 84 00 00 00 00 00 b8 a6 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 51 89 0c 00 f7 d8 64 89 02 b8
Nov  5 12:12:01 NAS kernel: RSP: 002b:00007ffc46f5abc8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
Nov  5 12:12:01 NAS kernel: RAX: 0000000000000000 RBX: 0000147c4dca1f64 RCX: 0000147c4db154e7
Nov  5 12:12:01 NAS kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000040a5b0
Nov  5 12:12:01 NAS kernel: RBP: 000000000040a380 R08: 0000000000000000 R09: 00000000ffffffff
Nov  5 12:12:01 NAS kernel: R10: 0000147c4da18ec8 R11: 0000000000000246 R12: 0000000000000000
Nov  5 12:12:01 NAS kernel: R13: 000000000040a5b0 R14: 000000000040a490 R15: 0000000000000000
Nov  5 12:12:01 NAS kernel: </TASK>
Nov  5 12:12:01 NAS kernel: Modules linked in: nvidia_modeset(PO) nvidia_uvm(PO) xt_mark xt_CHECKSUM ipt_REJECT nf_reject_ipv4 xt_nat xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle macvlan xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter xfs dm_crypt dm_mod dax cmac cifs asn1_decoder cifs_arc4 cifs_md4 oid_registry md_mod nvidia(PO) ipmi_devintf wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha ip6table_filter ip6_tables iptable_filter ip_tables x_tables ast drm_vram_helper i2c_algo_bit drm_ttm_helper ttm drm_kms_helper i10nm_edac x86_pkg_temp_thermal intel_powerclamp coretemp ipmi_ssif crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel drm mpt3sas aesni_intel crypto_simd cryptd rapl backlight intel_cstate i2c_i801 nvme agpgart i2c_smbus syscopyarea input_leds sysfillrect raid_class ahci sysimgblt ixgbe
Nov  5 12:12:01 NAS kernel: nvme_core scsi_transport_sas i2c_core fb_sys_fops led_class intel_uncore libahci mdio intel_pch_thermal acpi_ipmi ipmi_si acpi_power_meter acpi_pad button [last unloaded: tun]
Nov  5 12:12:01 NAS kernel: CR2: ffffffffffffff73
Nov  5 12:12:01 NAS kernel: ---[ end trace 478da811436c28f1 ]---
Nov  5 12:12:01 NAS kernel: RIP: 0010:d_walk+0x8a/0x206
Nov  5 12:12:01 NAS kernel: Code: 45 31 ff 44 89 e0 49 89 ed 83 e0 01 89 44 24 14 49 8b 95 a0 00 00 00 4c 89 eb 48 8d 83 a0 00 00 00 48 39 c2 0f 84 8d 00 00 00 <f6> 82 73 ff ff ff 20 4c 8d aa 70 ff ff ff 4c 8b 32 75 72 4c 8d 42
Nov  5 12:12:01 NAS kernel: RSP: 0018:ffffc9000b3b3db0 EFLAGS: 00010207
Nov  5 12:12:01 NAS kernel: RAX: ffff8881c481ba10 RBX: ffff8881c481b970 RCX: 0000000000000007
Nov  5 12:12:01 NAS kernel: RDX: 0000000000000000 RSI: ffffc9000b3b3e28 RDI: ffff8893d0470418
Nov  5 12:12:01 NAS kernel: RBP: ffff88814cab8780 R08: ffff8881c481b9c8 R09: ffffc9000b3b3e68
Nov  5 12:12:01 NAS kernel: R10: ffffffff822d19f0 R11: 000000000000003c R12: 0000000002c6f9b6
Nov  5 12:12:01 NAS kernel: R13: ffff8881c481b970 R14: ffff8893d0470458 R15: 0000000000000000
Nov  5 12:12:01 NAS kernel: FS:  0000147c4d9e4740(0000) GS:ffff889fffd40000(0000) knlGS:0000000000000000
Nov  5 12:12:01 NAS kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov  5 12:12:01 NAS kernel: CR2: ffffffffffffff73 CR3: 00000017f1648006 CR4: 0000000000770ee0
Nov  5 12:12:01 NAS kernel: PKRU: 55555554
Nov  5 12:12:01 NAS emhttpd: shcmd (2689340): exit status: 137
Nov  5 12:12:01 NAS emhttpd: shcmd (2689341): rmdir /mnt/disk8
Nov  5 12:12:01 NAS emhttpd: shcmd (2689342): umount /mnt/disk9
Nov  5 12:12:01 NAS kernel: XFS (dm-8): Unmounting Filesystem
Nov  5 12:12:01 NAS emhttpd: shcmd (2689343): rmdir /mnt/disk9
Nov  5 12:12:01 NAS emhttpd: shcmd (2689344): umount /mnt/cache_appdata
Nov  5 12:12:02 NAS kernel: XFS (dm-9): Unmounting Filesystem
Nov  5 12:12:03 NAS emhttpd: shcmd (2689345): rmdir /mnt/cache_appdata
Nov  5 12:12:03 NAS emhttpd: shcmd (2689346): umount /mnt/cache_incoming
Nov  5 12:12:03 NAS kernel: XFS (dm-10): Unmounting Filesystem
Nov  5 12:12:03 NAS emhttpd: shcmd (2689347): rmdir /mnt/cache_incoming
Nov  5 12:12:03 NAS emhttpd: shcmd (2689348): /usr/sbin/cryptsetup luksClose md1
Nov  5 12:12:03 NAS emhttpd: shcmd (2689349): /usr/sbin/cryptsetup luksClose md2
Nov  5 12:12:03 NAS emhttpd: shcmd (2689350): /usr/sbin/cryptsetup luksClose md3
Nov  5 12:12:03 NAS emhttpd: shcmd (2689351): /usr/sbin/cryptsetup luksClose md4
Nov  5 12:12:03 NAS emhttpd: shcmd (2689352): /usr/sbin/cryptsetup luksClose md5
Nov  5 12:12:03 NAS emhttpd: shcmd (2689353): /usr/sbin/cryptsetup luksClose md6
Nov  5 12:12:03 NAS emhttpd: shcmd (2689354): /usr/sbin/cryptsetup luksClose md7
Nov  5 12:12:03 NAS emhttpd: shcmd (2689355): /usr/sbin/cryptsetup luksClose md8
Nov  5 12:12:03 NAS root: Device md8 is still in use.
Nov  5 12:12:03 NAS emhttpd: shcmd (2689355): exit status: 5
Nov  5 12:12:03 NAS emhttpd: shcmd (2689356): /usr/sbin/cryptsetup luksClose md9
Nov  5 12:12:03 NAS emhttpd: shcmd (2689357): /usr/sbin/cryptsetup luksClose sdb1
Nov  5 12:12:03 NAS emhttpd: shcmd (2689358): /usr/sbin/cryptsetup luksClose sdc1
Nov  5 12:12:03 NAS kernel: mdcmd (65): stop
Nov  5 12:12:03 NAS kernel: md: 1 devices still in use.
Nov  5 12:12:03 NAS emhttpd: error: mdcmd, 3289: Device or resource busy (16): write
Nov  5 12:12:03 NAS emhttpd: shcmd (2689359): rm -f /boot/config/forcesync
Nov  5 12:12:03 NAS emhttpd: shcmd (2689360): sync

 


root@NAS:/mnt# sudo /usr/sbin/cryptsetup luksClose md8
Device md8 is still in use.

 sudo dmsetup remove /dev/mapper/md8
device-mapper: remove ioctl on md8  failed: Device or resource busy

 

Can't remove it no matter what I do, so I guess a graceful shutdown is not going to happen this time. I'm going to shut down the server and run memtest; something is definitely wrong with the hardware. Your suggestions are welcome.
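For the record, when luksClose and dmsetup both report busy, the holder can usually be found from sysfs and fuser; a minimal sketch (the dm-8 minor number is a guess here, take the real one from dmsetup info):

# show the open count and minor number for the mapping
dmsetup info md8
# any kernel block devices stacked on top of it
ls /sys/block/dm-8/holders
# any processes using a filesystem mounted from it
fuser -vm /dev/mapper/md8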

Edited by je82

Worth mentioning: I did have an XFS corruption issue pop up out of nowhere a few weeks ago:

 


Oct 20 09:37:11 NAS kernel: XFS (dm-3): Corruption detected. Unmount and run xfs_repair

 

That time I ran all the XFS checks I could on the affected device, and they returned zero errors.
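For reference, the read-only version of that check on an encrypted array disk looks roughly like this; a sketch assuming the affected dm-3 corresponds to the md3 LUKS mapping seen in the log above, and that the disk is unmounted (array in maintenance mode):

# dry run: report problems without writing to the filesystem
xfs_repair -n /dev/mapper/md3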

Now here we are again; the docker image went into a locked state:
 


 

Nov  5 11:11:38 NAS kernel: BTRFS critical (device loop2): corrupt leaf: root=2 block=4600659968 slot=57, bad key order, prev (9313379941377294336 136 65535) current (4681564160 169 0)

followed again by "Device md8 is still in use." and "md: 1 devices still in use."

 

 

The array could not be unmounted; I had to shut down via the power button.
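Since loop2 is the loopback device behind the docker image, the image itself can be checked offline once nothing has it mounted; a sketch, assuming the usual Unraid loop assignment still holds after reboot:

# confirm which image file backs loop2
losetup -a | grep loop2
# read-only check of the btrfs filesystem inside the image
btrfs check --readonly /dev/loop2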

 

My thinking is it's either RAM or a bad cable/backplane, but could it be something else? The server has been stable for a year or so, and now it is suddenly hitting issues within a short span of time.
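Since this is an ECC platform (and the i10nm_edac module shows in the trace above), the kernel's EDAC counters can be read live; corrected errors landing there would point straight at the RAM, no downtime needed:

# corrected / uncorrected error counts per memory controller
grep . /sys/devices/system/edac/mc/mc*/ce_count /sys/devices/system/edac/mc/mc*/ue_count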

Edited by je82

Memtest passed 2 passes up to test 5, so memory isn't 100% ruled out yet.

 

Found this in the BIOS changelog; the installed BIOS is 2 years old:

"1.4b (01/09/2023)
1. Updated 5.22_WhitleyCrb_0ACMS_ICX_74 (Intel BKC WW46 IPU2023.1).
2. Updated SEL for processor error.
3. Fixed the memory error that is not reported in SMBIOS event log.
4. Enabled all boot options first for test case 220, 271, 356 and 457.
5. Resolved abnormal SOL resolution"

 

Updating to the latest just in case.

 

Do any of you know a way to test memory while the server is running? I can't really take it offline for 2 days for a complete 4-pass memtest run. Or if you think the culprit could be something else, I'm all ears. Thanks.
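One option that doesn't need downtime is memtester, which locks a chunk of free RAM and runs test patterns over it while everything else keeps running; a sketch, assuming memtester has been installed (it is not part of stock Unraid) and that roughly 8G of RAM is currently free:

# lock 8 GiB of free memory and run 3 iterations of the test patterns over it (needs root)
memtester 8G 3

It only exercises memory that is free at the time, so it is a spot check, not a substitute for a full offline memtest pass.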

Edited by je82

Bleh, is shutting down the array while docker auto-start is still running a bad idea? I thought it would handle it, but now it is stuck in a "Nov  5 14:41:18 NAS emhttpd: shcmd (603): umount /mnt/cache_appdata" loop, and I cannot force unmount it via the CLI either. Why is it always impossible to force unmount stuff in Unraid?

 

lsof /mnt/cache_appdata returns nothing, yet:

umount -f /mnt/cache_appdata
umount: /mnt/cache_appdata: target is busy.

 

I've never been able to unmount something when Unraid thinks it's in use; even the force commands are denied. Maybe it's unrelated to shutting down the array while docker auto-starts are still running?
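When lsof returns nothing, the holder is often not a process at all but another mount or a loop device layered on top of the pool; a sketch of where to look, plus the lazy-unmount fallback:

# loop devices (e.g. docker.img) backed by files on the pool
losetup -a
# submounts or bind mounts under the mountpoint
grep cache_appdata /proc/mounts
# detach the mountpoint now, finish the unmount when the last reference goes away
umount -l /mnt/cache_appdata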

Edited by je82
