Pre-release 6.12.0-rc2 ZFS testing raidz1 - kernel: BUG: Bad page state in process z_wr_iss pfn:17a4c1



Hi there.

 

This post should be in Bug Report, but I can't post there. I'm not asking for support; I understand this is an RC release, so this is just an FYI.

 

I've been running FreeBSD for a very long time for its stability, security, jails, and so on, and started testing Unraid because of the ZFS announcement. A few things I tested didn't work in the UI but worked in the CLI.

 

  1. Importing zpools from FreeBSD. Works fine on the CLI (rough example commands just after this list). Pools import, zvols mount and are writable. Snapshots can be rolled back, and data integrity is good (checked with md5sum). The Unraid UI doesn't recognise them and won't mount them; it wants to format them. I was hoping this could've been an in-place upgrade, like a magic trick, without a 30TB data migration. The zfs master plugin sees the pools and vols.
  2. Attempting to zfs send | zfs recv from FreeBSD to Unraid throws an error. I didn't expect that to work but was hoping it would. Just letting you know.
  3. Zpools/zvols created on the CLI in Unraid aren't usable by the UI. Screenshot attached.
  4. Native zvols aren't shareable in the UI. The sharing UI wants "disks" rather than "pools" and can't select zvols. I worked around it by sharing mount points for the zvols and making symlinks on the CLI, but then Samba complains about getxattr on the symlinks. Fair enough. The second workaround was to vi /etc/samba/smb-shares.conf and change the mounts. A third option (a bad one) would be to change the "mountpoint" path for the zvol to a disk-based mount point and mount them there. Keen to get someone's thoughts on this.
  5. The showstopper... I'm primarily testing a mirrored raidz1 pool, just because it's important to me. I've created a native raidz1 pool in the UI using two SSDs and shared it. After 1-20GB of data movement, I get kernel exceptions. Data movement continues to work for a little while, until IO stops altogether and the zpool and array become unresponsive, requiring a reboot (init 6, shutdown -r now). Occasionally the kernel hangs without a panic. This is reproducible on my system.
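
For reference, the CLI workflow behind points 1 and 2 looks roughly like this. The pool, dataset, and snapshot names are placeholders rather than my real layout, and the send/recv is shown in its usual over-ssh form purely as an illustration:

zpool import                           # list pools visible to this host
zpool import -f tank                   # import the FreeBSD-created pool (force past the hostid mismatch)
zfs list -t all -r tank                # confirm datasets, zvols and snapshots came across
zfs rollback tank/data@pre-migration   # roll back a snapshot, then spot-check files with md5sum

# point 2, run from the FreeBSD side:
zfs send -R tank/data@pre-migration | ssh root@unraid zfs recv -F tank2/data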

 

Here's an example exception:

 

Mar 31 16:18:06 server kernel: BUG: Bad page state in process z_wr_iss  pfn:17a4c1
Mar 31 16:18:06 server kernel: page:000000006bfb9957 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x17a4c1
Mar 31 16:18:06 server kernel: memcg:10000000
Mar 31 16:18:06 server kernel: flags: 0x17fff8000000000(node=0|zone=2|lastcpupid=0xffff)
Mar 31 16:18:06 server kernel: raw: 017fff8000000000 ffffea0005e93048 ffffea0005e93048 0000000000000000
Mar 31 16:18:06 server kernel: raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000010000000
Mar 31 16:18:06 server kernel: page dumped because: page still charged to cgroup
Mar 31 16:18:06 server kernel: Modules linked in: xt_CHECKSUM ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle vhost_net tun vhost vhost_iotlb tap xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter md_mod efivarfs ip6table_filter ip6_tables iptable_filter ip_tables x_tables bridge stp llc bonding tls zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) edac_mce_amd icp(PO) edac_core kvm_amd zcommon(PO) znvpair(PO) kvm spl(O) crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 aesni_intel crypto_simd cryptd wmi_bmof rapl mpt3sas i2c_piix4 input_leds i2c_core led_class k10temp ahci r8169 joydev ccp libahci raid_class scsi_transport_sas realtek wmi button acpi_cpufreq unix
Mar 31 16:18:06 server kernel: CPU: 7 PID: 2474 Comm: z_wr_iss Tainted: P           O       6.1.20-Unraid #1
Mar 31 16:18:06 server kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X570S PG Riptide, BIOS P1.10 06/24/2021
Mar 31 16:18:06 server kernel: Call Trace:
Mar 31 16:18:06 server kernel: <TASK>
Mar 31 16:18:06 server kernel: dump_stack_lvl+0x44/0x5c
Mar 31 16:18:06 server kernel: bad_page+0xcc/0xe4
Mar 31 16:18:06 server kernel: check_new_pages+0xa7/0xb5
Mar 31 16:18:06 server kernel: get_page_from_freelist+0x590/0x89f
Mar 31 16:18:06 server kernel: __alloc_pages+0xfa/0x1e8
Mar 31 16:18:06 server kernel: abd_alloc_chunks+0xc8/0x31c [zfs]
Mar 31 16:18:06 server kernel: abd_alloc+0x76/0x97 [zfs]
Mar 31 16:18:06 server kernel: arc_hdr_alloc_abd+0x58/0x87 [zfs]
Mar 31 16:18:06 server kernel: arc_write_ready+0x3b9/0x3f8 [zfs]
Mar 31 16:18:06 server kernel: zio_ready+0x5c/0x2be [zfs]
Mar 31 16:18:06 server kernel: ? _raw_spin_lock+0x13/0x1c
Mar 31 16:18:06 server kernel: ? _raw_spin_unlock+0x14/0x29
Mar 31 16:18:06 server kernel: zio_execute+0xb4/0xdf [zfs]
Mar 31 16:18:06 server kernel: taskq_thread+0x269/0x38a [spl]
Mar 31 16:18:06 server kernel: ? wake_up_q+0x44/0x44
Mar 31 16:18:06 server kernel: ? zio_subblock+0x22/0x22 [zfs]
Mar 31 16:18:06 server kernel: ? taskq_dispatch_delay+0x106/0x106 [spl]
Mar 31 16:18:06 server kernel: kthread+0xe7/0xef
Mar 31 16:18:06 server kernel: ? kthread_complete_and_exit+0x1b/0x1b
Mar 31 16:18:06 server kernel: ret_from_fork+0x22/0x30
Mar 31 16:18:06 server kernel: </TASK>

 

pool_import.png

server-diagnostics-20230401-1024.zip

7 hours ago, error_303 said:

The Unraid UI doesn't recognise them and won't mount them.

Were these pools created with FreeNAS/TrueNAS? If so, they can't be imported yet because ZFS is on partition #2; that should be supported in the near future.

 

Zvol functionality in the GUI is not implemented yet; you can use zvols, but only from the CLI.

 

#5 could be a hardware issue, since the zfs module is crashing; run memtest if you haven't yet.

 

 

 


Thanks for reaching out.

 

Re #1. They were created on native FreeBSD, and they imported into Unraid just fine, but *only* on the CLI; the UI didn't see them. The screenshot I shared shows the UI reporting disk 1 as "Unmountable: unsupported", and right below it, in the CLI, I imported and listed the zpool, listed the zvols and snaps, followed by a happy comment. I can confirm the actual import works and IO works.

 

Re #5. I'll run memtest. The disks and the SAS controller they're attached to came out of a stable system. It could be kernel 6.1.20-Unraid not liking the SAS controller firmware.

[server /var/log] # lspci -knn | grep -i sas
03:00.0 Serial Attached SCSI controller [0107]: Broadcom / LSI SAS3004 PCI-Express Fusion-MPT SAS-3 [1000:0096] (rev 02)
	Subsystem: Broadcom / LSI SAS3004 PCI-Express Fusion-MPT SAS-3 [1000:3110]
	Kernel driver in use: mpt3sas
	Kernel modules: mpt3sas

[server /var/log] # grep LSI syslog 
Mar 31 06:50:35 server kernel: mpt3sas_cm0: LSISAS3004: FWVersion(15.00.02.00), ChipRevision(0x02), BiosVersion(16.00.00.00)

 

Any ideas re #4? Starting with zeroed disks, I used the UI to create a mirrored zpool (raidz1). The pool appears and has a zvol, but the zvol can't be shared. When I said native, I meant it was all done natively in Unraid, as opposed to imported from a foreign system.
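
For context, the second workaround from point 4 amounted to pointing the Samba share straight at the dataset's mount point. Roughly like this, with placeholder pool/share names and only the bare options shown (the stanza Unraid generates will have more in it):

zfs get mountpoint tank/media    # find where the dataset is mounted

Then in /etc/samba/smb-shares.conf:

[media]
    path = /mnt/tank/media
    browseable = yes
    writeable = yes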

 

Thanks again!!

13 minutes ago, error_303 said:

Re #1. They were created in native FreeBSD.

Please import the pool using the CLI, then post the output of:

zpool status -LP

 

15 minutes ago, error_303 said:

Any ideas re #4?

As mentioned, there's no GUI support for zvols for now, though they can be used for VMs by manually entering the correct device path as the vdisk, e.g.:

/dev/zvol/<pool name>/<zvol name>
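
For example, creating a zvol on the CLI and finding the path to enter (pool/zvol names below are only illustrative):

zfs create -V 30G cache/domains/win11    # create a 30G zvol
ls -l /dev/zvol/cache/domains/win11      # this is the path to enter as the vdisk location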

 
