
[SOLVED] Parity Swap Procedure - Asking to Copy Again



Posted (edited)

  

4 minutes ago, trurl said:

Were you following the current documentation for parity swap?

 

There is no reason to enter mdcmd.


I did follow the parity swap procedure, but as the original poster described, I messed up by keeping the screen open after the parity copy, meaning I would have had to do it again. I opted for the command-line solution as it seemed straightforward and had a positive outcome for the original poster.

Edited by saltz
Posted (edited)
1 hour ago, JorgeB said:

Unraid driver crashed, you will need to reboot; having just parity2 should not be a problem.


I rebooted, but now the disk cannot be mounted:
 

May 13 16:01:23 Newton kernel: mdcmd (32): set md_num_stripes 1280
May 13 16:01:23 Newton kernel: mdcmd (33): set md_queue_limit 80
May 13 16:01:23 Newton kernel: mdcmd (34): set md_sync_limit 5
May 13 16:01:23 Newton kernel: mdcmd (35): set md_write_method
May 13 16:01:23 Newton kernel: mdcmd (36): start NEW_ARRAY
May 13 16:01:23 Newton kernel: md: invalidslota=3
May 13 16:01:23 Newton kernel: md: invalidslotb=99
May 13 16:01:23 Newton kernel: unraid: allocating 36230K for 1280 stripes (7 disks)
May 13 16:01:23 Newton kernel: md1p1: running, size: 3907018532 blocks
May 13 16:01:23 Newton kernel: md2p1: running, size: 3907018532 blocks
May 13 16:01:23 Newton kernel: md3p1: running, size: 7814026532 blocks
May 13 16:01:23 Newton kernel: md4p1: running, size: 7814026532 blocks
May 13 16:01:23 Newton kernel: md5p1: running, size: 7814026532 blocks
May 13 16:01:23 Newton emhttpd: shcmd (270): udevadm settle
May 13 16:01:23 Newton emhttpd: Opening encrypted volumes...
May 13 16:01:23 Newton emhttpd: shcmd (277): touch /boot/config/forcesync
May 13 16:01:24 Newton emhttpd: Mounting disks...
May 13 16:01:24 Newton emhttpd: mounting /mnt/disk1
May 13 16:01:24 Newton emhttpd: shcmd (278): mkdir -p /mnt/disk1
May 13 16:01:24 Newton emhttpd: shcmd (279): mount -t xfs -o noatime,nouuid /dev/md1p1 /mnt/disk1
May 13 16:01:24 Newton kernel: SGI XFS with ACLs, security attributes, no debug enabled
May 13 16:01:24 Newton kernel: XFS (md1p1): Mounting V5 Filesystem
May 13 16:01:24 Newton kernel: BUG: unable to handle page fault for address: ffffffff81e5ab00
May 13 16:01:24 Newton kernel: #PF: supervisor write access in kernel mode
May 13 16:01:24 Newton kernel: #PF: error_code(0x0003) - permissions violation
May 13 16:01:24 Newton kernel: PGD 220e067 P4D 220e067 PUD 220f063 PMD 8000000001e001e1 
May 13 16:01:24 Newton kernel: Oops: 0003 [#1] PREEMPT SMP NOPTI
May 13 16:01:24 Newton kernel: CPU: 11 PID: 11685 Comm: unraidd1 Tainted: P           O       6.1.64-Unraid #1
May 13 16:01:24 Newton kernel: Hardware name: System manufacturer System Product Name/TUF B450-PLUS GAMING, BIOS 2008 12/06/2019
May 13 16:01:24 Newton kernel: RIP: 0010:raid6_avx24_xor_syndrome+0x20b/0x264
May 13 16:01:24 Newton kernel: Code: 0d fc f6 c5 d5 db e8 c5 c5 db f8 c5 15 db e8 c5 05 db f8 c5 dd ef e5 c5 cd ef f7 c4 41 1d ef e5 c4 41 0d ef f7 41 ff c8 eb 9e <c4> c1 7d e7 55 00 c4 c1 7d e7 1b c4 41 7d e7 12 c4 41 7d e7 19 c5
May 13 16:01:24 Newton kernel: RSP: 0018:ffffc9000c63fd50 EFLAGS: 00010286
May 13 16:01:24 Newton kernel: RAX: ffff888104937040 RBX: ffff88816a7e1260 RCX: 0000000000000060
May 13 16:01:24 Newton kernel: RDX: ffff888104937000 RSI: 0000000000000020 RDI: 0000000000000000
May 13 16:01:24 Newton kernel: RBP: ffffffff81e5ab00 R08: 00000000ffffffff R09: ffffffff81e5ab60
May 13 16:01:24 Newton kernel: R10: ffffffff81e5ab40 R11: ffffffff81e5ab20 R12: ffff888104937000
May 13 16:01:24 Newton kernel: R13: ffffffff81e5ab00 R14: 0000000000000000 R15: 0000000000000000
May 13 16:01:24 Newton kernel: FS:  0000000000000000(0000) GS:ffff8887feac0000(0000) knlGS:0000000000000000
May 13 16:01:24 Newton kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 13 16:01:24 Newton kernel: CR2: ffffffff81e5ab00 CR3: 000000000220a000 CR4: 0000000000350ee0
May 13 16:01:24 Newton kernel: Call Trace:
May 13 16:01:24 Newton kernel: <TASK>
May 13 16:01:24 Newton kernel: ? __die_body+0x1a/0x5c
May 13 16:01:24 Newton kernel: ? page_fault_oops+0x329/0x376
May 13 16:01:24 Newton kernel: ? fixup_exception+0x22/0x24b
May 13 16:01:24 Newton kernel: ? exc_page_fault+0xf4/0x11d
May 13 16:01:24 Newton kernel: ? asm_exc_page_fault+0x22/0x30
May 13 16:01:24 Newton kernel: ? raid6_avx24_xor_syndrome+0x20b/0x264
May 13 16:01:24 Newton kernel: rmw6_write_data+0xe1/0x1a9 [md_mod]
May 13 16:01:24 Newton kernel: unraidd+0x851/0x1140 [md_mod]
May 13 16:01:24 Newton kernel: md_thread+0xf7/0x122 [md_mod]
May 13 16:01:24 Newton kernel: ? _raw_spin_rq_lock_irqsave+0x20/0x20
May 13 16:01:24 Newton kernel: ? signal_pending+0x1d/0x1d [md_mod]
May 13 16:01:24 Newton kernel: kthread+0xe7/0xef
May 13 16:01:24 Newton kernel: ? kthread_complete_and_exit+0x1b/0x1b
May 13 16:01:24 Newton kernel: ret_from_fork+0x22/0x30
May 13 16:01:24 Newton kernel: </TASK>
May 13 16:01:24 Newton kernel: Modules linked in: xfs nfsd auth_rpcgss oid_registry lockd grace sunrpc md_mod zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) iptable_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha ip6table_filter ip6_tables iptable_filter ip_tables x_tables bridge stp llc bonding tls nvidia_drm(PO) nvidia_modeset(PO) edac_mce_amd edac_core intel_rapl_msr intel_rapl_common iosf_mbi kvm_amd nvidia(PO) kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel video sha512_ssse3 sha256_ssse3 sha1_ssse3 drm_kms_helper aesni_intel wmi_bmof crypto_simd drm cryptd mpt3sas rapl i2c_piix4 backlight i2c_core r8169 k10temp ahci raid_class ccp syscopyarea sysfillrect scsi_transport_sas sysimgblt fb_sys_fops libahci realtek wmi button acpi_cpufreq unix
May 13 16:01:24 Newton kernel: CR2: ffffffff81e5ab00
May 13 16:01:24 Newton kernel: ---[ end trace 0000000000000000 ]---
May 13 16:01:24 Newton kernel: RIP: 0010:raid6_avx24_xor_syndrome+0x20b/0x264
May 13 16:01:24 Newton kernel: Code: 0d fc f6 c5 d5 db e8 c5 c5 db f8 c5 15 db e8 c5 05 db f8 c5 dd ef e5 c5 cd ef f7 c4 41 1d ef e5 c4 41 0d ef f7 41 ff c8 eb 9e <c4> c1 7d e7 55 00 c4 c1 7d e7 1b c4 41 7d e7 12 c4 41 7d e7 19 c5
May 13 16:01:24 Newton kernel: RSP: 0018:ffffc9000c63fd50 EFLAGS: 00010286
May 13 16:01:24 Newton kernel: RAX: ffff888104937040 RBX: ffff88816a7e1260 RCX: 0000000000000060
May 13 16:01:24 Newton kernel: RDX: ffff888104937000 RSI: 0000000000000020 RDI: 0000000000000000
May 13 16:01:24 Newton kernel: RBP: ffffffff81e5ab00 R08: 00000000ffffffff R09: ffffffff81e5ab60
May 13 16:01:24 Newton kernel: R10: ffffffff81e5ab40 R11: ffffffff81e5ab20 R12: ffff888104937000
May 13 16:01:24 Newton kernel: R13: ffffffff81e5ab00 R14: 0000000000000000 R15: 0000000000000000
May 13 16:01:24 Newton kernel: FS:  0000000000000000(0000) GS:ffff8887feac0000(0000) knlGS:0000000000000000
May 13 16:01:24 Newton kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 13 16:01:24 Newton kernel: CR2: ffffffff81e5ab00 CR3: 000000000220a000 CR4: 0000000000350ee0
May 13 16:01:24 Newton kernel: note: unraidd1[11685] exited with irqs disabled
May 13 16:01:24 Newton kernel: note: unraidd1[11685] exited with preempt_count 1
May 13 16:01:24 Newton kernel: ------------[ cut here ]------------
May 13 16:01:24 Newton kernel: WARNING: CPU: 11 PID: 11685 at kernel/exit.c:814 do_exit+0x87/0x923
May 13 16:01:24 Newton kernel: Modules linked in: xfs nfsd auth_rpcgss oid_registry lockd grace sunrpc md_mod zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) iptable_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha ip6table_filter ip6_tables iptable_filter ip_tables x_tables bridge stp llc bonding tls nvidia_drm(PO) nvidia_modeset(PO) edac_mce_amd edac_core intel_rapl_msr intel_rapl_common iosf_mbi kvm_amd nvidia(PO) kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel video sha512_ssse3 sha256_ssse3 sha1_ssse3 drm_kms_helper aesni_intel wmi_bmof crypto_simd drm cryptd mpt3sas rapl i2c_piix4 backlight i2c_core r8169 k10temp ahci raid_class ccp syscopyarea sysfillrect scsi_transport_sas sysimgblt fb_sys_fops libahci realtek wmi button acpi_cpufreq unix
May 13 16:01:24 Newton kernel: CPU: 11 PID: 11685 Comm: unraidd1 Tainted: P      D    O       6.1.64-Unraid #1
May 13 16:01:24 Newton kernel: Hardware name: System manufacturer System Product Name/TUF B450-PLUS GAMING, BIOS 2008 12/06/2019
May 13 16:01:24 Newton kernel: RIP: 0010:do_exit+0x87/0x923
May 13 16:01:24 Newton kernel: Code: 24 74 04 75 13 b8 01 00 00 00 41 89 6c 24 60 48 c1 e0 22 49 89 44 24 70 4c 89 ef e8 76 dd 80 00 48 83 bb b0 07 00 00 00 74 02 <0f> 0b 48 8b bb d8 06 00 00 e8 78 dc 80 00 48 8b 83 d0 06 00 00 83
May 13 16:01:24 Newton kernel: RSP: 0018:ffffc9000c63fee0 EFLAGS: 00010286
May 13 16:01:24 Newton kernel: RAX: 0000000000000000 RBX: ffff88817ff89f80 RCX: 0000000000000000
May 13 16:01:24 Newton kernel: RDX: 0000000000000001 RSI: 0000000000002710 RDI: 00000000ffffffff
May 13 16:01:24 Newton kernel: RBP: 0000000000000009 R08: 0000000000000000 R09: 74706d6565727020
May 13 16:01:24 Newton kernel: R10: 20746e756f635f74 R11: 706d656572702068 R12: ffff88810776d000
May 13 16:01:24 Newton kernel: R13: ffff88816a504a40 R14: 0000000000000000 R15: 0000000000000000
May 13 16:01:24 Newton kernel: FS:  0000000000000000(0000) GS:ffff8887feac0000(0000) knlGS:0000000000000000
May 13 16:01:24 Newton kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 13 16:01:24 Newton kernel: CR2: ffffffff81e5ab00 CR3: 000000000220a000 CR4: 0000000000350ee0
May 13 16:01:24 Newton kernel: Call Trace:
May 13 16:01:24 Newton kernel: <TASK>
May 13 16:01:24 Newton kernel: ? __warn+0xab/0x122
May 13 16:01:24 Newton kernel: ? report_bug+0x109/0x17e
May 13 16:01:24 Newton kernel: ? do_exit+0x87/0x923
May 13 16:01:24 Newton kernel: ? handle_bug+0x41/0x6f
May 13 16:01:24 Newton kernel: ? exc_invalid_op+0x13/0x60
May 13 16:01:24 Newton kernel: ? asm_exc_invalid_op+0x16/0x20
May 13 16:01:24 Newton kernel: ? do_exit+0x87/0x923
May 13 16:01:24 Newton kernel: make_task_dead+0x11c/0x11c
May 13 16:01:24 Newton kernel: rewind_stack_and_make_dead+0x17/0x17
May 13 16:01:24 Newton kernel: RIP: 0000:0x0
May 13 16:01:24 Newton kernel: Code: Unable to access opcode bytes at 0xffffffffffffffd6.
May 13 16:01:24 Newton kernel: RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000
May 13 16:01:24 Newton kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
May 13 16:01:24 Newton kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
May 13 16:01:24 Newton kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
May 13 16:01:24 Newton kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
May 13 16:01:24 Newton kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
May 13 16:01:24 Newton kernel: </TASK>
May 13 16:01:24 Newton kernel: ---[ end trace 0000000000000000 ]---


Any idea?

newton-diagnostics-20240513-1602.zip

Edited by saltz
13 minutes ago, JorgeB said:

Unraid driver is still crashing. Was disk1 involved in the swap, or was it an existing good disk?


It was an existing good disk.

Bad drive = ST4000DM005-2DP166_ZDH1MCX9 - 4 TB (sdg) —> Removed
Replacement drive = WDC_WD80EZAZ-11TDBA0_1EJVDAUZ - 8 TB —> Reassigned
New parity = 5HUH721010ALE601_1DGERYNZ - 10 TB (sdf) —> New disk
Old parity = WDC_WD80EZAZ-11TDBA0_1EJVDAUZ - 8 TB

Posted (edited)
45 minutes ago, JorgeB said:

Try checking the filesystem on that disk; run it without -n, then reboot and see if it mounts. If it's the same, the server could be having a hardware issue.


I rebooted the system by hand and started the array in maintenance mode, but since the disk's filesystem type is set to auto, I could not use the web GUI. I know it is an XFS disk, so I ran the command:

 

xfs_repair -v /dev/sdb1



Output:

xfs_repair -v /dev/sdb1
Phase 1 - find and verify superblock...
        - block cache size set to 1504920 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 2203007 tail block 2203007
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 3
        - agno = 1
        - agno = 2
Phase 5 - rebuild AG headers and trees...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...

        XFS_REPAIR Summary    Mon May 13 17:48:32 2024

Phase		Start		End		Duration
Phase 1:	05/13 17:48:24	05/13 17:48:24	
Phase 2:	05/13 17:48:24	05/13 17:48:24	
Phase 3:	05/13 17:48:24	05/13 17:48:28	4 seconds
Phase 4:	05/13 17:48:28	05/13 17:48:28	
Phase 5:	05/13 17:48:28	05/13 17:48:28	
Phase 6:	05/13 17:48:28	05/13 17:48:32	4 seconds
Phase 7:	05/13 17:48:32	05/13 17:48:32	

Total run time: 8 seconds
done


Looks okay? I rebooted again and attempted to start the array as normal, which again seems to freeze/crash the driver. After yet another restart, I manually mounted the disk like so:

mkdir /mnt/tempdisk1
mount /dev/sdb1 /mnt/tempdisk1

 

Which succeeds!

Output of cat /proc/mounts:
 

/dev/sdb1 /mnt/tempdisk1 xfs rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota 0 0
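As a sanity check when mounting disks by hand like this, a small helper can confirm that a mount point shows up in /proc/mounts with the expected filesystem type. A sketch — the helper name is mine, not an Unraid tool; it defaults to /proc/mounts but accepts a file argument so it can be tested against a sample:

```shell
#!/bin/sh
# mounted_as MOUNTPOINT FSTYPE [MOUNTS_FILE]
# Succeeds if MOUNTPOINT appears with filesystem type FSTYPE in the
# mounts table; MOUNTS_FILE defaults to /proc/mounts.
mounted_as() {
    awk -v mp="$1" -v fs="$2" \
        '$2 == mp && $3 == fs { found = 1 } END { exit !found }' \
        "${3:-/proc/mounts}"
}

if mounted_as /mnt/tempdisk1 xfs; then
    echo "tempdisk1 is mounted as xfs"
else
    echo "tempdisk1 is not mounted as xfs"
fi
```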


First off, thanks for all the help so far! I really appreciate it. Is there any more information I could gather that would help find out why the Unraid driver keeps crashing? I still cannot start the array, and I need to rebuild disk3 from parity.

newton-diagnostics-20240513-1754.zip

Edited by saltz
59 minutes ago, saltz said:

Which succeeds!

Looks like that filesystem is for some reason making the Unraid driver crash; when you mount it manually it won't use the md driver, same as if you mount it with a pool.

 

Don't remember ever seeing this before, but you may need to mount the disk with UD or a pool, then back up and reformat. First, though, with the array stopped, click on that disk and change the filesystem to btrfs, for example. The disk won't mount, but it's just to see if the others do, to confirm the problem is with that disk only. Don't format any disk while doing this.

Posted (edited)

Very strange indeed.

 

My disks are fine; they all worked, and only the bad disk failed, which is why I performed the parity swap procedure.

 

I performed all steps of the parity swap procedure up to and including step 14. I was not able to start the array because I had refreshed the tab and it wanted me to perform the copy again. So I found this thread and followed the steps to circumvent it: create a new config with "preserve all" selected, then run the command mdcmd set invalidslot 3 29 (it so happens it was my third disk that failed). I then started the array, which crashed before mounting the disks; it just showed "starting array". That is when I sought help in this thread. And now it keeps crashing on the mount step.

Edited by saltz
17 minutes ago, JorgeB said:

Do you have a spare disk you could format in the array to see if it works or if you get the same issue?

Yes I do! I have one extra disk I was planning to add after the data rebuild was done.

 

Is it safe to add it to the configuration at this point? And if so, could you help me detail the steps?


I assume it's going to be the same, but before trying the new disk, set disk2 to btrfs as well, unassign disk3, and start the array. This is to see if the emulated disk3 mounts, and if it does, I would expect it will also crash, but post diags afterwards anyway.

Posted (edited)
38 minutes ago, JorgeB said:

I assume it's going to be the same, but before trying the new disk, set disk2 to btrfs as well, unassign disk3, and start the array. This is to see if the emulated disk3 mounts, and if it does, I would expect it will also crash, but post diags afterwards anyway.

I performed the steps, but the disk is shown with a mount error (see screenshot).

And indeed, as suspected, it got stuck on the next disk trying to mount.

Screenshot_20240514-153910.png

newton-diagnostics-20240514-1538.zip

Edited by saltz

Disk4 also getting stuck is kind of expected, but the emulated disk3 not mounting is not, meaning something in the process didn't go well. I assume disk3 was also xfs? Try starting the array in maintenance mode and check the filesystem on that disk (set it to xfs first), because if that disk is not mounting there is no point in rebuilding anyway.

Posted (edited)
25 minutes ago, JorgeB said:

Disk4 also getting stuck is kind of expected, but the emulated disk3 not mounting is not, meaning something in the process didn't go well. I assume disk3 was also xfs? Try starting the array in maintenance mode and check the filesystem on that disk (set it to xfs first), because if that disk is not mounting there is no point in rebuilding anyway.


Disk3 was indeed also xfs. Okay, I just performed those steps and it is not looking good...
 

xfs_repair -v /dev/md3p1
Phase 1 - find and verify superblock...
bad primary superblock - bad magic number !!!

attempting to find secondary superblock...
.........................


(I shortened the number of dots in the output)
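For context, "bad magic number" means the first superblock on the device doesn't start with the 4-byte XFS magic "XFSB" (0x58465342), which is what xfs_repair checks before anything else. This can be verified directly with a read-only peek at the device; a sketch, where the helper name is mine and the device path is the one from this thread:

```shell
#!/bin/sh
# has_xfs_magic DEVICE_OR_IMAGE
# Succeeds if the first 4 bytes are "XFSB", the XFS primary superblock
# magic number that xfs_repair failed to find here. Read-only, so it
# cannot make anything worse.
has_xfs_magic() {
    [ "$(dd if="$1" bs=4 count=1 2>/dev/null)" = "XFSB" ]
}

if has_xfs_magic /dev/md3p1; then
    echo "primary XFS superblock present"
else
    echo "no primary XFS superblock"
fi
```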

Is there any way I can revert to my configuration before the parity swap procedure? Since my old parity disk has never been mounted, its contents should still be valid?
So I would start my array with my old parity disk and all the other disks in their original positions, except disk3, which I have pulled, and perform the whole procedure from the top?

If that is not possible, I assume I have lost all the content that was on disk3. How would I continue from this point? How can I get my array to start again? :D

Edited by saltz
1 hour ago, saltz said:

Is there any way I can revert to my configuration before the parity swap procedure?

 

You can try:

 

-Tools -> New Config -> Retain current configuration: All -> Apply
-Check all assignments and assign any missing disk(s) if needed, including the old parity2 disk. You also need a spare disk to assign as disk3; it can be temporary for now, but it should be the same size or larger than the old disk3.
-IMPORTANT - Check both "parity is already valid" and "maintenance mode" and start the array (note that the GUI will still show that data on the parity disk(s) will be overwritten; this is normal, as it doesn't account for the checkbox, and nothing will be overwritten as long as the box is checked)
-Stop array
-Unassign disk3
-Start array (in normal mode now) and post new diags.
 


I have been a slight idiot...

I performed the steps but did not have a spare disk in my system that could go in the disk3 slot and be smaller than my old parity (8 TB). So now it looks like disk3 is no longer part of the array. But my array is able to start again....

 

I am already happy I'm at this point! So now I'm thinking to leave it as is and upgrade my parity disk as I initially planned. And try to read the data on my old bad disk to see if and what I can restore from it. Does that make any sense?

Thanks again for all the help, and I'm glad my array is starting again with most of my original data.

8 minutes ago, saltz said:

But my array is able to start again....

That's good. I don't know what was causing the md crashing; I've never seen that before, possibly something specific you did.

 

17 minutes ago, saltz said:

And try to read the data on my old bad disk to see if and what I can restore from it. Does that make any sense?

It does; most of the time the disks are not completely dead, and maybe the disk itself is OK.
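If the old disk spins up but has bad sectors, one common approach (not specific to Unraid) is GNU ddrescue, which copies the readable areas first and records bad regions in a map file so interrupted runs can resume. A hedged sketch — ddrescue is assumed to be installed, the wrapper name is mine, and the device paths are placeholders that must be verified before running anything:

```shell
#!/bin/sh
# rescue_disk SOURCE DEST MAPFILE
# Two-pass GNU ddrescue copy of a failing disk: the first pass (-n)
# skips the slow scraping of bad areas to grab the easy data quickly,
# the second retries bad areas up to three times (-r3). The map file
# tracks progress so the run can be resumed. -f is required when the
# destination is a block device rather than a regular file.
rescue_disk() {
    src=$1; dst=$2; map=$3
    ddrescue -f -n "$src" "$dst" "$map" &&
    ddrescue -f -r3 "$src" "$dst" "$map"
}

# Example invocation (placeholders -- double-check the device letters,
# since swapping source and destination destroys the source):
# rescue_disk /dev/sdg /dev/sdX /boot/rescue-disk3.map
```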

 

