  • 6.12-rc5: process z_wr_iss Tainted when ZFS is under heavy load


    seanhe26
    • Closed Annoyance

    some background info:

    N5105 board with early microcode in the BIOS. It works fine with TrueNAS SCALE, except that VMs in TrueNAS crash randomly, which is a known issue with the early microcode (see https://forums.servethehome.com/index.php?threads/jasper-lake-proxmox-kvm-qemu-vm-guest-stability.38824/), so I stopped using VMs on TrueNAS. I ran some memory tests last night and all passed, so I assume the hardware is OK.

     

    Two days ago I tried to set up Unraid on this machine. All went well, except that the ZFS volumes were not recognized, so I had to reformat them, but I have a backup. When I mounted the USB hard drive that contains my backup and used the native copy function in Unraid, an error occurred. I did not download the log at the time, but I clearly remember it was something like  *process*tainted*call*trace ....  and both "rsync" and "z_wr_iss" had such a "tainted" error.

     

    Today I used another PC and connected to the SMB share of the ZFS pool. After a few minutes of copying, similar errors emerged:

     

    May 17 11:05:18 Tower kernel: general protection fault, probably for non-canonical address 0xfffb8883e4dbc650: 0000 [#1] PREEMPT SMP NOPTI
    May 17 11:05:18 Tower kernel: CPU: 1 PID: 2113 Comm: z_wr_iss Tainted: P           O       6.1.27-Unraid #1
    May 17 11:05:18 Tower kernel: Hardware name: UGREEN DX4600/To be filled by O.E.M, BIOS 5.19 06/16/2022
    May 17 11:05:18 Tower kernel: RIP: 0010:metaslab_alloc_dva+0xdf9/0xfce [zfs]
    May 17 11:05:18 Tower kernel: Code: 03 4d 58 48 01 d8 48 39 c8 73 bc 48 8b 44 24 58 bf ff ff ff 00 49 c1 ec 09 48 8b 74 24 50 48 c1 e7 20 48 8b 5c 24 18 48 01 c6 <48> 8b 0e 48 89 c8 48 c1 e8 20 48 33 03 48 c1 e0 20 48 21 f8 8b bc
    May 17 11:05:18 Tower kernel: RSP: 0018:ffffc90001b47ba8 EFLAGS: 00010286
    May 17 11:05:18 Tower kernel: RAX: 0000000000000000 RBX: ffff8881652d4000 RCX: 0000000000300000
    May 17 11:05:18 Tower kernel: RDX: 0000000000000000 RSI: fffb8883e4dbc650 RDI: 00ffffff00000000
    May 17 11:05:18 Tower kernel: RBP: ffff888102badc00 R08: 0000000000000000 R09: ffffffffa0daebbe
    May 17 11:05:18 Tower kernel: R10: ffff8884cc768000 R11: 0000000000000002 R12: 00000001ae20b7d0
    May 17 11:05:18 Tower kernel: R13: ffff888101401400 R14: ffff888102badc00 R15: 0000000000000001
    May 17 11:05:18 Tower kernel: FS:  0000000000000000(0000) GS:ffff888c4fe80000(0000) knlGS:0000000000000000
    May 17 11:05:18 Tower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    May 17 11:05:18 Tower kernel: CR2: 000000c00027c000 CR3: 0000000285fc2000 CR4: 0000000000350ee0
    May 17 11:05:18 Tower kernel: Call Trace:
    May 17 11:05:18 Tower kernel: <TASK>
    May 17 11:05:18 Tower kernel: ? preempt_latency_start+0x2b/0x46
    May 17 11:05:18 Tower kernel: metaslab_alloc+0x107/0x1fd [zfs]
    May 17 11:05:18 Tower kernel: zio_dva_allocate+0xee/0x73f [zfs]
    May 17 11:05:18 Tower kernel: ? preempt_latency_start+0x2b/0x46
    May 17 11:05:18 Tower kernel: ? _raw_spin_lock+0x13/0x1c
    May 17 11:05:18 Tower kernel: ? _raw_spin_unlock+0x14/0x29
    May 17 11:05:18 Tower kernel: ? zio_wait_for_children+0xa9/0xb7 [zfs]
    May 17 11:05:18 Tower kernel: ? preempt_latency_start+0x2b/0x46
    May 17 11:05:18 Tower kernel: ? _raw_spin_lock+0x13/0x1c
    May 17 11:05:18 Tower kernel: ? _raw_spin_unlock+0x14/0x29
    May 17 11:05:18 Tower kernel: ? tsd_hash_search+0x70/0x7d [spl]
    May 17 11:05:18 Tower kernel: zio_execute+0xb1/0xdf [zfs]
    May 17 11:05:18 Tower kernel: taskq_thread+0x266/0x38a [spl]
    May 17 11:05:18 Tower kernel: ? wake_up_q+0x44/0x44
    May 17 11:05:18 Tower kernel: ? zio_subblock+0x22/0x22 [zfs]
    May 17 11:05:18 Tower kernel: ? taskq_dispatch_delay+0x106/0x106 [spl]
    May 17 11:05:18 Tower kernel: kthread+0xe4/0xef
    May 17 11:05:18 Tower kernel: ? kthread_complete_and_exit+0x1b/0x1b
    May 17 11:05:18 Tower kernel: ret_from_fork+0x1f/0x30
    May 17 11:05:18 Tower kernel: </TASK>
    May 17 11:05:18 Tower kernel: Modules linked in: xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle iptable_mangle vhost_net vhost vhost_iotlb tap ipvlan xt_nat xt_tcpudp veth xt_conntrack nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype br_netfilter xfs xt_MASQUERADE xt_mark iptable_nat nfsd auth_rpcgss oid_registry lockd grace sunrpc ip6table_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 tun md_mod tcp_diag inet_diag nct6775_core hwmon_vid efivarfs ip6table_filter ip6_tables iptable_filter ip_tables x_tables af_packet 8021q garp mrp bridge stp llc bonding tls igc zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) i915 mei_pxp mei_hdcp wmi_bmof x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel drm_buddy i2c_algo_bit ttm drm_display_helper kvm drm_kms_helper drm crct10dif_pclmul intel_gtt crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 aesni_intel agpgart mei_me i2c_i801 crypto_simd cryptd intel_cstate tpm_crb nvme i2c_smbus
    May 17 11:05:18 Tower kernel: nvme_core mei processor_thermal_device_pci_legacy i2c_core processor_thermal_device processor_thermal_rfim processor_thermal_mbox int340x_thermal_zone intel_soc_dts_iosf tpm_tis syscopyarea ahci sysfillrect sysimgblt libahci iosf_mbi fb_sys_fops thermal tpm_tis_core video fan wmi backlight tpm acpi_pad acpi_tad intel_pmc_core button unix [last unloaded: igc]
    May 17 11:05:18 Tower kernel: ---[ end trace 0000000000000000 ]---
    May 17 11:05:18 Tower kernel: RIP: 0010:metaslab_alloc_dva+0xdf9/0xfce [zfs]
    May 17 11:05:18 Tower kernel: Code: 03 4d 58 48 01 d8 48 39 c8 73 bc 48 8b 44 24 58 bf ff ff ff 00 49 c1 ec 09 48 8b 74 24 50 48 c1 e7 20 48 8b 5c 24 18 48 01 c6 <48> 8b 0e 48 89 c8 48 c1 e8 20 48 33 03 48 c1 e0 20 48 21 f8 8b bc
    May 17 11:05:18 Tower kernel: RSP: 0018:ffffc90001b47ba8 EFLAGS: 00010286
    May 17 11:05:18 Tower kernel: RAX: 0000000000000000 RBX: ffff8881652d4000 RCX: 0000000000300000
    May 17 11:05:18 Tower kernel: RDX: 0000000000000000 RSI: fffb8883e4dbc650 RDI: 00ffffff00000000
    May 17 11:05:18 Tower kernel: RBP: ffff888102badc00 R08: 0000000000000000 R09: ffffffffa0daebbe
    May 17 11:05:18 Tower kernel: R10: ffff8884cc768000 R11: 0000000000000002 R12: 00000001ae20b7d0
    May 17 11:05:18 Tower kernel: R13: ffff888101401400 R14: ffff888102badc00 R15: 0000000000000001
    May 17 11:05:18 Tower kernel: FS:  0000000000000000(0000) GS:ffff888c4fe80000(0000) knlGS:0000000000000000
    May 17 11:05:18 Tower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    May 17 11:05:18 Tower kernel: CR2: 000000c00027c000 CR3: 0000000285fc2000 CR4: 0000000000350ee0
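The word "Tainted" in the trace above is a status flag on the kernel, not the error itself; the actual failure is the general protection fault on the first line. As a rough illustration, the sketch below pulls the process name and taint letters out of the `Comm: ... Tainted: ...` line. The `decode_taint` helper and the `TAINT_FLAGS` table are hypothetical names for this example, and the table is only a small subset of the kernel's taint letters.

```python
import re

# Small subset of kernel taint letters; anything else is reported as unknown.
TAINT_FLAGS = {
    "P": "proprietary (non-GPL) module loaded",
    "O": "out-of-tree module loaded",
    "W": "kernel warning occurred earlier",
    "D": "kernel oops or BUG occurred earlier",
    "M": "machine check exception reported",
    "B": "bad page state detected",
}

def decode_taint(log_line):
    """Return (comm, [flag descriptions]) for a kernel 'Tainted:' line, or None."""
    # The taint field is followed by the kernel version, e.g. "6.1.27-Unraid".
    match = re.search(r"Comm: (\S+) Tainted: (.*?)\s+\d+\.\d+", log_line)
    if match is None:
        return None
    comm, flags = match.group(1), match.group(2)
    # "G" only means no proprietary module is loaded, so it is skipped here.
    letters = [c for c in flags if not c.isspace() and c != "G"]
    return comm, [f"{c}: {TAINT_FLAGS.get(c, 'unknown flag')}" for c in letters]

line = ("May 17 11:05:18 Tower kernel: CPU: 1 PID: 2113 Comm: z_wr_iss "
        "Tainted: P           O       6.1.27-Unraid #1")
print(decode_taint(line))
```

For this log line the flags are P and O, which match the `zfs(PO)` and `spl(O)` entries in the "Modules linked in" list, so the taint most likely just reflects that the ZFS modules are loaded.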

     

    I just made a flash drive with memtest86+. It has passed the full test 2 times so far and is still running, which makes me confident the hardware is OK. Is this related to how Unraid handles ZFS memory? Please help investigate using the logs I attached.

    tower-diagnostics-20230517-1123.zip tower-syslog-20230517-0309.zip




    User Feedback

    Recommended Comments

    I did not have the patience to wait for the 3rd round of memtest to finish; still no errors found. So I did more searching and found that this is more likely to be related to hardware, specifically memory. So I removed the memory modules, cleaned the pins, and inserted them back. I have now been copying for one hour with no errors in Unraid so far.
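One detail in the trace is at least consistent with a memory problem: the faulting address 0xfffb8883e4dbc650 (also in RSI) is non-canonical. A minimal sketch of that check follows, assuming 48-bit virtual addresses (4-level paging); `is_canonical` and `single_bit_repairs` are hypothetical helper names for this example. It tests whether the upper bits are a proper sign extension, then lists every single-bit flip that would make the address canonical again.

```python
VA_BITS = 48  # assumption: 4-level paging, 48-bit virtual addresses

def is_canonical(addr, va_bits=VA_BITS):
    """True if bits 63..(va_bits-1) are all equal, i.e. properly sign-extended."""
    top = addr >> (va_bits - 1)
    return top == 0 or top == (1 << (64 - va_bits + 1)) - 1

def single_bit_repairs(addr, va_bits=VA_BITS):
    """List (bit, repaired_addr) for each one-bit flip that yields a canonical address."""
    return [(bit, addr ^ (1 << bit)) for bit in range(64)
            if is_canonical(addr ^ (1 << bit), va_bits)]

fault = 0xfffb8883e4dbc650  # address from the general protection fault line
print(is_canonical(fault))
for bit, fixed in single_bit_repairs(fault):
    print(bit, hex(fixed))
```

Under that assumption the only single-bit repair is bit 50, giving 0xffff8883e4dbc650, which sits in the same ffff888... range as other pointers in the register dump (for example RBX). A pointer that is one bit away from a plausible value is suggestive of a bit flip, but it is not proof; a software bug could also produce such an address.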

    Some noticeable but maybe useless information: on paper the N5105 only supports 16GB of memory, but I used 32GB+16GB. This could cause a problem with the system, but TrueNAS/Windows seem to handle it well according to my tests.

    Anyway, I'll close this request for now.



