je82 Posted November 5, 2023

It seems my previous Unraid installation, which has been stable for years, is starting to show miscellaneous errors; now all the Docker containers have dropped. What is your suggested approach here? The system is an expensive one running ECC RAM, and it passed 10 passes of memtest when it was built. Maybe it's the SSD that is failing? I tried to look at the scrub stats of the docker image and it found no issues:

Quote
UUID: 7430bd32-eee4-4538-9616-eb39918be503
Scrub started: Sun Nov 5 12:03:46 2023
Status: aborted
Duration: 0:00:00
Total to scrub: 13.33GiB
Rate: 0.00B/s
Error summary: no errors found

Attached diags in case you experts can find what could be causing it; I appreciate any help. (Removed diags for privacy reasons)
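For anyone wanting to repeat the check from the command line, here is a sketch of a manual scrub of the docker image's btrfs filesystem, assuming it is loop-mounted at /var/lib/docker as on stock Unraid (the path is an assumption). Note that a scrub with Status "aborted" and a 0:00:00 duration has not actually read the data, so its "no errors found" proves little on its own.

```shell
# Sketch: scrub the docker image's btrfs filesystem and read its error counters.
# /var/lib/docker is an assumption (Unraid loop-mounts docker.img there).
DOCKER_MNT=/var/lib/docker
if mountpoint -q "$DOCKER_MNT" 2>/dev/null && command -v btrfs >/dev/null; then
    btrfs scrub start -B "$DOCKER_MNT"   # -B: run in foreground, print a summary when done
    btrfs dev stats "$DOCKER_MNT"        # cumulative read/write/corruption counters
else
    echo "no btrfs docker mount available here"
fi
```

`btrfs dev stats` counters persist across reboots, so nonzero corruption counts there are a stronger signal than a single clean scrub.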
je82 Posted November 5, 2023

Trying to shut down the system gracefully now, capturing these logs:

Quote
Nov 5 12:12:01 NAS kernel: CPU: 13 PID: 2589 Comm: umount Tainted: P W O 5.15.46-Unraid #1
Nov 5 12:12:01 NAS kernel: Hardware name: Supermicro Super Server/X12SPI-TF, BIOS 1.4 07/11/2022
Nov 5 12:12:01 NAS kernel: RIP: 0010:d_walk+0x8a/0x206
Nov 5 12:12:01 NAS kernel: Code: 45 31 ff 44 89 e0 49 89 ed 83 e0 01 89 44 24 14 49 8b 95 a0 00 00 00 4c 89 eb 48 8d 83 a0 00 00 00 48 39 c2 0f 84 8d 00 00 00 <f6> 82 73 ff ff ff 20 4c 8d aa 70 ff ff ff 4c 8b 32 75 72 4c 8d 42
Nov 5 12:12:01 NAS kernel: RSP: 0018:ffffc9000b3b3db0 EFLAGS: 00010207
Nov 5 12:12:01 NAS kernel: RAX: ffff8881c481ba10 RBX: ffff8881c481b970 RCX: 0000000000000007
Nov 5 12:12:01 NAS kernel: RDX: 0000000000000000 RSI: ffffc9000b3b3e28 RDI: ffff8893d0470418
Nov 5 12:12:01 NAS kernel: RBP: ffff88814cab8780 R08: ffff8881c481b9c8 R09: ffffc9000b3b3e68
Nov 5 12:12:01 NAS kernel: R10: ffffffff822d19f0 R11: 000000000000003c R12: 0000000002c6f9b6
Nov 5 12:12:01 NAS kernel: R13: ffff8881c481b970 R14: ffff8893d0470458 R15: 0000000000000000
Nov 5 12:12:01 NAS kernel: FS: 0000147c4d9e4740(0000) GS:ffff889fffd40000(0000) knlGS:0000000000000000
Nov 5 12:12:01 NAS kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 5 12:12:01 NAS kernel: CR2: ffffffffffffff73 CR3: 00000017f1648006 CR4: 0000000000770ee0
Nov 5 12:12:01 NAS kernel: PKRU: 55555554
Nov 5 12:12:01 NAS kernel: Call Trace:
Nov 5 12:12:01 NAS kernel: <TASK>
Nov 5 12:12:01 NAS kernel: ? select_collect2+0x7c/0x7c
Nov 5 12:12:01 NAS kernel: shrink_dcache_parent+0x4c/0x11e
Nov 5 12:12:01 NAS kernel: do_one_tree+0xe/0x31
Nov 5 12:12:01 NAS kernel: shrink_dcache_for_umount+0x36/0x6a
Nov 5 12:12:01 NAS kernel: generic_shutdown_super+0x1a/0x104
Nov 5 12:12:01 NAS kernel: kill_block_super+0x21/0x40
Nov 5 12:12:01 NAS kernel: deactivate_locked_super+0x33/0x6d
Nov 5 12:12:01 NAS kernel: cleanup_mnt+0x67/0xda
Nov 5 12:12:01 NAS kernel: task_work_run+0x6f/0x83
Nov 5 12:12:01 NAS kernel: exit_to_user_mode_prepare+0x9a/0x131
Nov 5 12:12:01 NAS kernel: syscall_exit_to_user_mode+0x18/0x23
Nov 5 12:12:01 NAS kernel: do_syscall_64+0x9f/0xa5
Nov 5 12:12:01 NAS kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
Nov 5 12:12:01 NAS kernel: RIP: 0033:0x147c4db154e7
Nov 5 12:12:01 NAS kernel: Code: 89 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 31 f6 e9 09 00 00 00 66 0f 1f 84 00 00 00 00 00 b8 a6 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 51 89 0c 00 f7 d8 64 89 02 b8
Nov 5 12:12:01 NAS kernel: RSP: 002b:00007ffc46f5abc8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
Nov 5 12:12:01 NAS kernel: RAX: 0000000000000000 RBX: 0000147c4dca1f64 RCX: 0000147c4db154e7
Nov 5 12:12:01 NAS kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000040a5b0
Nov 5 12:12:01 NAS kernel: RBP: 000000000040a380 R08: 0000000000000000 R09: 00000000ffffffff
Nov 5 12:12:01 NAS kernel: R10: 0000147c4da18ec8 R11: 0000000000000246 R12: 0000000000000000
Nov 5 12:12:01 NAS kernel: R13: 000000000040a5b0 R14: 000000000040a490 R15: 0000000000000000
Nov 5 12:12:01 NAS kernel: </TASK>
Nov 5 12:12:01 NAS kernel: Modules linked in: nvidia_modeset(PO) nvidia_uvm(PO) xt_mark xt_CHECKSUM ipt_REJECT nf_reject_ipv4 xt_nat xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle macvlan xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter xfs dm_crypt dm_mod dax cmac cifs asn1_decoder cifs_arc4 cifs_md4 oid_registry md_mod nvidia(PO) ipmi_devintf wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha ip6table_filter ip6_tables iptable_filter ip_tables x_tables ast drm_vram_helper i2c_algo_bit drm_ttm_helper ttm drm_kms_helper i10nm_edac x86_pkg_temp_thermal intel_powerclamp coretemp ipmi_ssif crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel drm mpt3sas aesni_intel crypto_simd cryptd rapl backlight intel_cstate i2c_i801 nvme agpgart i2c_smbus syscopyarea input_leds sysfillrect raid_class ahci sysimgblt ixgbe
Nov 5 12:12:01 NAS kernel: nvme_core scsi_transport_sas i2c_core fb_sys_fops led_class intel_uncore libahci mdio intel_pch_thermal acpi_ipmi ipmi_si acpi_power_meter acpi_pad button [last unloaded: tun]
Nov 5 12:12:01 NAS kernel: CR2: ffffffffffffff73
Nov 5 12:12:01 NAS kernel: ---[ end trace 478da811436c28f1 ]---
Nov 5 12:12:01 NAS kernel: RIP: 0010:d_walk+0x8a/0x206
Nov 5 12:12:01 NAS kernel: Code: 45 31 ff 44 89 e0 49 89 ed 83 e0 01 89 44 24 14 49 8b 95 a0 00 00 00 4c 89 eb 48 8d 83 a0 00 00 00 48 39 c2 0f 84 8d 00 00 00 <f6> 82 73 ff ff ff 20 4c 8d aa 70 ff ff ff 4c 8b 32 75 72 4c 8d 42
Nov 5 12:12:01 NAS kernel: RSP: 0018:ffffc9000b3b3db0 EFLAGS: 00010207
Nov 5 12:12:01 NAS kernel: RAX: ffff8881c481ba10 RBX: ffff8881c481b970 RCX: 0000000000000007
Nov 5 12:12:01 NAS kernel: RDX: 0000000000000000 RSI: ffffc9000b3b3e28 RDI: ffff8893d0470418
Nov 5 12:12:01 NAS kernel: RBP: ffff88814cab8780 R08: ffff8881c481b9c8 R09: ffffc9000b3b3e68
Nov 5 12:12:01 NAS kernel: R10: ffffffff822d19f0 R11: 000000000000003c R12: 0000000002c6f9b6
Nov 5 12:12:01 NAS kernel: R13: ffff8881c481b970 R14: ffff8893d0470458 R15: 0000000000000000
Nov 5 12:12:01 NAS kernel: FS: 0000147c4d9e4740(0000) GS:ffff889fffd40000(0000) knlGS:0000000000000000
Nov 5 12:12:01 NAS kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 5 12:12:01 NAS kernel: CR2: ffffffffffffff73 CR3: 00000017f1648006 CR4: 0000000000770ee0
Nov 5 12:12:01 NAS kernel: PKRU: 55555554
Nov 5 12:12:01 NAS emhttpd: shcmd (2689340): exit status: 137
Nov 5 12:12:01 NAS emhttpd: shcmd (2689341): rmdir /mnt/disk8
Nov 5 12:12:01 NAS emhttpd: shcmd (2689342): umount /mnt/disk9
Nov 5 12:12:01 NAS kernel: XFS (dm-8): Unmounting Filesystem
Nov 5 12:12:01 NAS emhttpd: shcmd (2689343): rmdir /mnt/disk9
Nov 5 12:12:01 NAS emhttpd: shcmd (2689344): umount /mnt/cache_appdata
Nov 5 12:12:02 NAS kernel: XFS (dm-9): Unmounting Filesystem
Nov 5 12:12:03 NAS emhttpd: shcmd (2689345): rmdir /mnt/cache_appdata
Nov 5 12:12:03 NAS emhttpd: shcmd (2689346): umount /mnt/cache_incoming
Nov 5 12:12:03 NAS kernel: XFS (dm-10): Unmounting Filesystem
Nov 5 12:12:03 NAS emhttpd: shcmd (2689347): rmdir /mnt/cache_incoming
Nov 5 12:12:03 NAS emhttpd: shcmd (2689348): /usr/sbin/cryptsetup luksClose md1
Nov 5 12:12:03 NAS emhttpd: shcmd (2689349): /usr/sbin/cryptsetup luksClose md2
Nov 5 12:12:03 NAS emhttpd: shcmd (2689350): /usr/sbin/cryptsetup luksClose md3
Nov 5 12:12:03 NAS emhttpd: shcmd (2689351): /usr/sbin/cryptsetup luksClose md4
Nov 5 12:12:03 NAS emhttpd: shcmd (2689352): /usr/sbin/cryptsetup luksClose md5
Nov 5 12:12:03 NAS emhttpd: shcmd (2689353): /usr/sbin/cryptsetup luksClose md6
Nov 5 12:12:03 NAS emhttpd: shcmd (2689354): /usr/sbin/cryptsetup luksClose md7
Nov 5 12:12:03 NAS emhttpd: shcmd (2689355): /usr/sbin/cryptsetup luksClose md8
Nov 5 12:12:03 NAS root: Device md8 is still in use.
Nov 5 12:12:03 NAS emhttpd: shcmd (2689355): exit status: 5
Nov 5 12:12:03 NAS emhttpd: shcmd (2689356): /usr/sbin/cryptsetup luksClose md9
Nov 5 12:12:03 NAS emhttpd: shcmd (2689357): /usr/sbin/cryptsetup luksClose sdb1
Nov 5 12:12:03 NAS emhttpd: shcmd (2689358): /usr/sbin/cryptsetup luksClose sdc1
Nov 5 12:12:03 NAS kernel: mdcmd (65): stop
Nov 5 12:12:03 NAS kernel: md: 1 devices still in use.
Nov 5 12:12:03 NAS emhttpd: error: mdcmd, 3289: Device or resource busy (16): write
Nov 5 12:12:03 NAS emhttpd: shcmd (2689359): rm -f /boot/config/forcesync
Nov 5 12:12:03 NAS emhttpd: shcmd (2689360): sync
je82 Posted November 5, 2023

Quote
root@NAS:/mnt# sudo /usr/sbin/cryptsetup luksClose md8
Device md8 is still in use.
sudo dmsetup remove /dev/mapper/md8
device-mapper: remove ioctl on md8 failed: Device or resource busy

Can't remove it no matter what I do, so I guess a graceful shutdown is not going to happen this time. I'm going to shut down the server and run memtest; something is definitely wrong with the hardware. Your suggestions are welcome.
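When both cryptsetup and dmsetup report the mapping busy, the holder can be looked up from sysfs instead of guessed; a sketch, with the md8 name taken from the log above (paths are standard device-mapper/sysfs locations):

```shell
# Sketch: find what keeps a device-mapper node open.
NAME=md8
if [ -e "/dev/mapper/$NAME" ]; then
    dmsetup info -c "$NAME"                   # the "Open" column is the reference count
    dm=$(basename "$(readlink -f "/dev/mapper/$NAME")")
    ls "/sys/block/$dm/holders/"              # kernel-side holders (other dm devices, loops)
    fuser -vm "/dev/mapper/$NAME" || true     # userspace openers, if any
else
    echo "no /dev/mapper/$NAME on this system"
fi
```

A nonzero open count with no userspace opener points at a kernel-side holder, which matches the "still in use" errors during the array stop.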
je82 Posted November 5, 2023

If someone knows how to identify which drive md8 is, that would be helpful. I tried a bunch of ways but can't really figure out which drive it correlates to. My guess: md8 = disk8?
itimpi Posted November 5, 2023

37 minutes ago, je82 said:
"my guess MD8 = disk8?"

Yes
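This can also be confirmed on a live system rather than guessed: cryptsetup reports which block device backs the md8 LUKS mapping. A sketch, assuming Unraid's convention of /dev/mdX being the array device for Disk X:

```shell
# Sketch: show the backing device of the md8 LUKS mapping and the block tree.
NAME=md8
if [ -e "/dev/mapper/$NAME" ]; then
    cryptsetup status "$NAME"            # the "device:" line names the backing /dev/mdX
    lsblk -o NAME,SIZE,TYPE,MOUNTPOINT   # tree of disks -> md -> crypt -> mount points
else
    echo "no /dev/mapper/$NAME on this system"
fi
```

The lsblk tree also shows which physical /dev/sdX sits under each md device, which can then be matched to a serial number on the Main tab.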
je82 Posted November 5, 2023

Good to mention: I did have an XFS corruption issue pop up out of nowhere a few weeks ago:

Quote
Oct 20 09:37:11 NAS kernel: XFS (dm-3): Corruption detected. Unmount and run xfs_repair

That time I ran all the XFS checks I could on the affected device, and they returned zero errors. Now here we are again; the docker image went into a locked state:

Quote
Nov 5 11:11:38 NAS kernel: BTRFS critical (device loop2): corrupt leaf: root=2 block=4600659968 slot=57, bad key order, prev (9313379941377294336 136 65535) current (4681564160 169 0)

With "Device md8 is still in use." and "md: 1 devices still in use." the array could not be unmounted, and I had to shut down via the power button. My idea is: either RAM, a bad cable/backplane, or could it be something else? The server has been stable for a year or so, and now it's suddenly getting issues within a short span of time.
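For reference, the read-only check form of xfs_repair (run with the array stopped or in maintenance mode, so the filesystem is not mounted) looks like this; the device path is an assumption for an encrypted disk8 on Unraid:

```shell
# Read-only XFS check: -n reports problems without modifying the device.
# /dev/mapper/md8 is an assumption (the opened LUKS mapping for disk8).
DEV=/dev/mapper/md8
if [ -e "$DEV" ]; then
    xfs_repair -n "$DEV"
else
    echo "device $DEV not present on this system"
fi
```

Without -n, xfs_repair will actually rewrite metadata, so the -n pass is the safe first step when the goal is only to confirm whether corruption exists.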
je82 Posted November 5, 2023

Memtest passed 2 passes up to test 5, so it isn't 100% confirmed not to be a memory issue. Found this in the BIOS changelog (the BIOS is 2 years old):

Quote
1.4b (01/09/2023)
1. Updated 5.22_WhitleyCrb_0ACMS_ICX_74 (Intel BKC WW46 IPU2023.1).
2. Updated SEL for processor error.
3. Fixed the memory error that is not reported in SMBIOS event log.
4. Enabled all boot options first for test case 220, 271, 356 and 457.
5. Resolved abnormal SOL resolution

Updating to the latest just in case. Do any of you know a way to test memory while the server is running? I can't really take it offline for two days for a complete 4-pass run of memtest. Or if you think the culprit could be something else, I'm all ears. Thanks.
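One option for in-place testing: memtester locks and exercises a chunk of RAM from userspace while the system stays online. It can only touch memory that is currently free, so a clean run is weaker evidence than a full offline memtest pass. A sketch; the size is an assumption, and availability of the memtester package on Unraid (e.g. via a plugin) is also an assumption:

```shell
# memtester <size> <iterations>: locks the given amount of RAM and runs
# pattern tests against it while the system stays up.
# SIZE is an assumption -- leave enough free RAM for the array and containers.
SIZE=2048M
PASSES=1
if command -v memtester >/dev/null; then
    memtester "$SIZE" "$PASSES"
else
    echo "memtester not installed; skipping"
fi
```

With ECC RAM it is also worth checking the IPMI SEL and EDAC counters for corrected-error events, since ECC hardware usually logs failing DIMMs before errors become visible.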
je82 Posted November 5, 2023

Bleh. Is shutting down the array while Docker auto-start is still running a bad idea? I thought it would handle it, but now it's stuck in a loop on:

Quote
Nov 5 14:41:18 NAS emhttpd: shcmd (603): umount /mnt/cache_appdata

and via the CLI I cannot force-unmount it either. Why is it always impossible to force-unmount stuff in Unraid?

Quote
lsof /mnt/cache_appdata (returns nothing)
umount -f /mnt/cache_appdata
umount: /mnt/cache_appdata: target is busy.

I've never been able to unmount something when Unraid thinks it's in use; even the force commands are denied. Or maybe it's unrelated to shutting down the array while Docker auto-starts are still running?
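On the force-unmount question: when lsof shows nothing but umount still reports busy, the reference is often kernel-side (an overlapping bind mount, a loop device backed by a file on the share, or an NFS/SMB export) rather than a process. A sketch of where to look, with the mount point taken from the log above:

```shell
# Sketch: hunt for kernel-side holders of a busy mount point.
MNT=/mnt/cache_appdata
if mountpoint -q "$MNT" 2>/dev/null; then
    fuser -vm "$MNT" || true     # processes with open files on the mount
    grep "$MNT" /proc/mounts     # overlapping or bind mounts on the same path
    losetup -a                   # loop devices (e.g. docker.img) backed by files here
else
    echo "$MNT is not mounted on this system"
fi
```

As a last resort, `umount -l` detaches the mount lazily, but during an Unraid array stop that mostly hides the holder rather than releasing it.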