Unassigned Devices Preclear - a utility to preclear disks before adding them to the array


dlandon


1 minute ago, Mojo Ryzen said:

 

You've done so much for the Unraid community, I'm happy to help you any way I can. 

I appreciate that as I don't have any disks near that large to test with.

 

Right now the 'post-read' reads every sector on the disk into memory to verify it is zeroed.  This is really overkill and seems to test memory more than the disk.  I want to do the verification without reading all the sectors into memory.
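
For illustration only, one way to check a device for zeroes without holding it all in memory is to stream it in fixed-size chunks and stop at the first non-zero byte. The sketch below is Python, not the plugin's actual code (the plugin is a shell script); the function name `first_nonzero_offset` and the 4 MiB chunk size are assumptions made up for this example.

```python
import io

CHUNK_SIZE = 4 * 1024 * 1024  # hypothetical chunk size, not taken from the plugin


def first_nonzero_offset(stream, chunk_size=CHUNK_SIZE):
    """Return the byte offset of the first non-zero byte, or None if all zero.

    Only one chunk is held in memory at a time.
    """
    offset = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return None  # reached the end without finding a non-zero byte
        rest = chunk.lstrip(b"\x00")  # drop the leading run of zero bytes
        if rest:
            return offset + len(chunk) - len(rest)
        offset += len(chunk)


if __name__ == "__main__":
    # In-memory stand-in for a disk: 5000 zero bytes, one stray byte, more zeroes.
    fake_disk = io.BytesIO(bytes(5000) + b"\x01" + bytes(100))
    print(first_nonzero_offset(fake_disk, chunk_size=1024))
```

The same function would accept a block device opened with `open(path, "rb")`, where `path` is whatever device node applies on the system in question.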

Link to comment

A pre clear ended early as "successful" (obviously something went wrong, 14TB finished in only 20 minutes). This is the log.

 

malloc: jobs.c:1435: assertion botched
free: underflow detected; magic8 corrupted
Aborting...
malloc: jobs.c:1435: assertion botched
free: underflow detected; magic8 corrupted
Aborting...
/usr/local/emhttp/plugins/unassigned.devices.preclear/scripts/preclear_disk.sh: line 1678: 100/2 - /2 : syntax error: operand expected (error token is "/2 ")

malloc: jobs.c:1435: assertion botched
free: underflow detected; magic8 corrupted
Aborting...
malloc: jobs.c:1435: assertion botched
free: underflow detected; magic8 corrupted
Aborting...
malloc: jobs.c:1435: assertion botched
free: underflow detected; magic8 corrupted
Aborting...
malloc: jobs.c:1435: assertion botched
free: underflow detected; magic8 corrupted
Aborting...
malloc: jobs.c:1435: assertion botched
free: underflow detected; magic8 corrupted
Aborting...
/usr/local/emhttp/plugins/unassigned.devices.preclear/scripts/preclear_disk.sh: line 2709: [: : integer expression expected

malloc: jobs.c:1435: assertion botched
free: underflow detected; magic8 corrupted
Aborting...
malloc: jobs.c:1435: assertion botched
free: underflow detected; magic8 corrupted
Aborting...
malloc: jobs.c:1435: assertion botched
free: underflow detected; magic8 corrupted
Aborting...
malloc: jobs.c:1435: assertion botched
free: underflow detected; magic8 corrupted
Aborting...
malloc: jobs.c:1435: assertion botched
free: underflow detected; magic8 corrupted
Aborting...
/usr/local/emhttp/plugins/unassigned.devices.preclear/scripts/preclear_disk.sh: line 1678: 100/2 - /2 : syntax error: operand expected (error token is "/2 ")

malloc: jobs.c:1435: assertion botched
free: underflow detected; magic8 corrupted
Aborting...
malloc: jobs.c:1435: assertion botched
free: underflow detected; magic8 corrupted
Aborting...
/usr/local/emhttp/plugins/unassigned.devices.preclear/scripts/preclear_disk.sh: line 2725: /boot/preclear_reports/: Is a directory

 

What is going on here? I've googled magic8 but found nothing. The drives are entirely empty and new. Thanks for any assistance.
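
As a rough sanity check on why 20 minutes cannot be a real pass over a 14TB drive, the required average throughput can be worked out directly. This is a minimal Python sketch; the function name is hypothetical and the capacity is taken as decimal terabytes.

```python
def required_throughput_mb_s(capacity_tb: float, minutes: float) -> float:
    """Average MB/s needed to touch every byte once in the given time."""
    total_bytes = capacity_tb * 1e12  # decimal terabytes, as drives are marketed
    seconds = minutes * 60
    return total_bytes / seconds / 1e6


if __name__ == "__main__":
    print(f"{required_throughput_mb_s(14, 20):.0f} MB/s")
```

That comes to about 11,667 MB/s for a single pass, far beyond what one hard drive can sustain, so the "successful" result is consistent with the script aborting early rather than finishing.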

Preclear Error.rtf

Link to comment
On 6/23/2023 at 11:14 AM, adammerkley said:

Apologies if this has been asked previously, I searched the thread and couldn't find it.

 

I started a 3 cycle pre clear on a 16TB drive a few days ago.  It's almost finished with the post-read on cycle 1.  Is it possible to change the settings to just 1 cycle at this point?

 

On 6/23/2023 at 3:07 PM, dlandon said:

No.  You can pause and restart where it left off, but you can't change the test parameters.

Potential feature enhancement or is it fundamentally never possible?  I bought a 2 cycle preclear and have buyer's remorse.  After 8 hours, 1st cycle is on step 1 of 5 and 60% progress.  Just ride the wave?

Link to comment
On 8/12/2023 at 8:35 AM, dlandon said:

I'll be testing a different approach to post reads and will have an updated preclear in a day or two.  If you are willing, I'd like you to give it a try.

 

Hey @dlandon, I just saw your update to the preclear plugin come out. Wanted to check to see if this is what I need for my 20TB drive. Standing by ready to go, just give the word. 

Link to comment
5 hours ago, dlandon said:

Yes, please give it a go.

@dlandon, so my server crashed unexpectedly between 7:21 PM and 1:20 AM. I don't think it's the preclear process, but I've been having unclean shutdowns for unknown reasons, so I mirrored my syslogs to flash; the last one, from 7:21 PM, is attached. My cache drive had some corrupt files on it and I fixed those, but it's still crashing.

 

Is there anything you see that would explain the repeated crashes? Sorry if there's a better forum, this is the only thread I have to pull on right now.

syslog

Link to comment
5 hours ago, Mojo Ryzen said:

 

Is there anything you see that would explain the repeated crashes? Sorry if there's a better forum, this is the only thread I have to pull on right now.

Looks like the macvlan issue got you:

Aug 13 09:42:12 Tower kernel: ------------[ cut here ]------------
Aug 13 09:42:12 Tower kernel: WARNING: CPU: 8 PID: 19104 at net/netfilter/nf_conntrack_core.c:1210 __nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
Aug 13 09:42:12 Tower kernel: Modules linked in: xt_mark udp_diag xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle vhost_net tun vhost vhost_iotlb tap veth macvlan xt_nat xt_tcpudp xt_conntrack nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype br_netfilter xfs ccp nvidia_uvm(PO) md_mod zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) tcp_diag inet_diag amdgpu gpu_sched drm_ttm_helper nct6775 nct6775_core hwmon_vid iptable_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha ip6table_filter ip6_tables iptable_filter ip_tables x_tables efivarfs af_packet 8021q garp mrp bridge stp llc bonding tls nvidia_drm(PO) nvidia_modeset(PO) i915 intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel nvidia(PO) iosf_mbi drm_buddy kvm i2c_algo_bit
Aug 13 09:42:12 Tower kernel: crct10dif_pclmul crc32_pclmul ttm crc32c_intel ghash_clmulni_intel sha512_ssse3 drm_display_helper aesni_intel btusb drm_kms_helper crypto_simd mei_hdcp mei_pxp intel_gtt btrtl cryptd btbcm btintel rapl wmi_bmof intel_cstate bluetooth mpt3sas i2c_i801 drm intel_uncore nvme mei_me agpgart input_leds i2c_smbus ecdh_generic ahci raid_class e1000e i2c_core mei scsi_transport_sas led_class ecc joydev nvme_core libahci syscopyarea sysfillrect vmd sysimgblt fb_sys_fops thermal fan video tpm_crb tpm_tis tpm_tis_core wmi tpm backlight intel_pmc_core acpi_tad acpi_pad button unix
Aug 13 09:42:12 Tower kernel: CPU: 8 PID: 19104 Comm: kworker/u40:1 Tainted: P S         O       6.1.38-Unraid #2
Aug 13 09:42:12 Tower kernel: Hardware name: ASUS System Product Name/PRIME Z690M-PLUS D4, BIOS 1203 03/04/2022
Aug 13 09:42:12 Tower kernel: Workqueue: events_unbound macvlan_process_broadcast [macvlan]
Aug 13 09:42:12 Tower kernel: RIP: 0010:__nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
Aug 13 09:42:12 Tower kernel: Code: 44 24 10 e8 e2 e1 ff ff 8b 7c 24 04 89 ea 89 c6 89 04 24 e8 7e e6 ff ff 84 c0 75 a2 48 89 df e8 9b e2 ff ff 85 c0 89 c5 74 18 <0f> 0b 8b 34 24 8b 7c 24 04 e8 18 dd ff ff e8 93 e3 ff ff e9 72 01
Aug 13 09:42:12 Tower kernel: RSP: 0018:ffffc900003a8d98 EFLAGS: 00010202
Aug 13 09:42:12 Tower kernel: RAX: 0000000000000001 RBX: ffff88813c3f7d00 RCX: 3e2bda644a39e8b0
Aug 13 09:42:12 Tower kernel: RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff88813c3f7d00
Aug 13 09:42:12 Tower kernel: RBP: 0000000000000001 R08: bc81bf30a6d60746 R09: b81ee9915673b8ee
Aug 13 09:42:12 Tower kernel: R10: 02dbe46e6b3802c4 R11: ffffc900003a8d60 R12: ffffffff82a11d00
Aug 13 09:42:12 Tower kernel: R13: 0000000000037cac R14: ffff8882781d6200 R15: 0000000000000000
Aug 13 09:42:12 Tower kernel: FS:  0000000000000000(0000) GS:ffff88885f400000(0000) knlGS:0000000000000000
Aug 13 09:42:12 Tower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 13 09:42:12 Tower kernel: CR2: 000035729acd8000 CR3: 000000000420a000 CR4: 0000000000750ee0
Aug 13 09:42:12 Tower kernel: PKRU: 55555554
Aug 13 09:42:12 Tower kernel: Call Trace:
Aug 13 09:42:12 Tower kernel: <IRQ>
Aug 13 09:42:12 Tower kernel: ? __warn+0xab/0x122
Aug 13 09:42:12 Tower kernel: ? report_bug+0x109/0x17e
Aug 13 09:42:12 Tower kernel: ? __nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
Aug 13 09:42:12 Tower kernel: ? handle_bug+0x41/0x6f
Aug 13 09:42:12 Tower kernel: ? exc_invalid_op+0x13/0x60
Aug 13 09:42:12 Tower kernel: ? asm_exc_invalid_op+0x16/0x20
Aug 13 09:42:12 Tower kernel: ? __nf_conntrack_confirm+0xa4/0x2b0 [nf_conntrack]
Aug 13 09:42:12 Tower kernel: ? __nf_conntrack_confirm+0x9e/0x2b0 [nf_conntrack]
Aug 13 09:42:12 Tower kernel: ? nf_nat_inet_fn+0x123/0x1a8 [nf_nat]
Aug 13 09:42:12 Tower kernel: nf_conntrack_confirm+0x25/0x54 [nf_conntrack]
Aug 13 09:42:12 Tower kernel: nf_hook_slow+0x3a/0x96
Aug 13 09:42:12 Tower kernel: ? ip_protocol_deliver_rcu+0x164/0x164
Aug 13 09:42:12 Tower kernel: NF_HOOK.constprop.0+0x79/0xd9
Aug 13 09:42:12 Tower kernel: ? ip_protocol_deliver_rcu+0x164/0x164
Aug 13 09:42:12 Tower kernel: __netif_receive_skb_one_core+0x77/0x9c
Aug 13 09:42:12 Tower kernel: process_backlog+0x8c/0x116
Aug 13 09:42:12 Tower kernel: __napi_poll.constprop.0+0x28/0x124
Aug 13 09:42:12 Tower kernel: net_rx_action+0x159/0x24f
Aug 13 09:42:12 Tower kernel: __do_softirq+0x126/0x288
Aug 13 09:42:12 Tower kernel: do_softirq+0x7f/0xab
Aug 13 09:42:12 Tower kernel: </IRQ>
Aug 13 09:42:12 Tower kernel: <TASK>
Aug 13 09:42:12 Tower kernel: __local_bh_enable_ip+0x4c/0x6b
Aug 13 09:42:12 Tower kernel: netif_rx+0x52/0x5a
Aug 13 09:42:12 Tower kernel: macvlan_broadcast+0x10a/0x150 [macvlan]
Aug 13 09:42:12 Tower kernel: ? _raw_spin_unlock+0x14/0x29
Aug 13 09:42:12 Tower kernel: macvlan_process_broadcast+0xbc/0x12f [macvlan]
Aug 13 09:42:12 Tower kernel: process_one_work+0x1a8/0x295
Aug 13 09:42:12 Tower kernel: worker_thread+0x18b/0x244
Aug 13 09:42:12 Tower kernel: ? rescuer_thread+0x281/0x281
Aug 13 09:42:12 Tower kernel: kthread+0xe4/0xef
Aug 13 09:42:12 Tower kernel: ? kthread_complete_and_exit+0x1b/0x1b
Aug 13 09:42:12 Tower kernel: ret_from_fork+0x1f/0x30
Aug 13 09:42:12 Tower kernel: </TASK>
Aug 13 09:42:12 Tower kernel: ---[ end trace 0000000000000000 ]---

 

Some other issues:

  • Turn off mover logging.
  • Do you have an NVMe drive issue?
Aug 13 09:37:36 Tower kernel: BTRFS warning (device nvme0n1p1): csum failed root 5 ino 75100532 off 12288 csum 0x154af3bd expected csum 0x90be6d72 mirror 1
Aug 13 09:37:36 Tower kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 29, gen 0
Aug 13 09:37:36 Tower kernel: BTRFS warning (device nvme0n1p1): csum failed root 5 ino 75100532 off 16384 csum 0xeb756e35 expected csum 0x8188ffff mirror 1
Aug 13 09:37:36 Tower kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 30, gen 0
Aug 13 09:37:36 Tower kernel: BTRFS warning (device nvme0n1p1): csum failed root 5 ino 75100532 off 12288 csum 0x154af3bd expected csum 0x90be6d72 mirror 1
Aug 13 09:37:36 Tower kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 31, gen 0
Aug 13 09:37:36 Tower kernel: BTRFS warning (device nvme0n1p1): csum failed root 5 ino 75100532 off 12288 csum 0x154af3bd expected csum 0x90be6d72 mirror 1
Aug 13 09:37:36 Tower kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 32, gen 0
Aug 13 09:37:36 Tower kernel: BTRFS warning (device nvme0n1p1): csum failed root 5 ino 75100532 off 12288 csum 0x154af3bd expected csum 0x90be6d72 mirror 1
Aug 13 09:37:36 Tower kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 33, gen 0
Aug 13 09:41:24 Tower ool www[20856]: /usr/local/emhttp/plugins/dynamix/scripts/btrfs_scrub 'start' '/mnt/cache' ''
  • Display error:
    Aug 13 03:46:05 Tower kernel: EDID block 0 (tag 0x00) checksum is invalid, remainder is 125

    Check your monitor connection and possibly replace the cable.
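
The BTRFS lines quoted above carry running error counters (`wr`, `rd`, `flush`, `corrupt`, `gen`), so a mirrored syslog can be scanned to see how far each counter has climbed. The following is a small Python sketch written against the line format shown in this post only; the regular expression and the function name `latest_btrfs_counters` are assumptions for the example, not part of Unraid or btrfs tooling.

```python
import re

# Matches the "BTRFS error ... errs: wr 0, rd 0, ..." lines as quoted in this thread.
BTRFS_ERRS = re.compile(
    r"BTRFS error \(device (?P<dev>[^)]+)\): bdev \S+ errs: "
    r"wr (?P<wr>\d+), rd (?P<rd>\d+), flush (?P<flush>\d+), "
    r"corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def latest_btrfs_counters(lines):
    """Map each device to the most recent error counters seen in the log."""
    latest = {}
    for line in lines:
        match = BTRFS_ERRS.search(line)
        if match:
            fields = match.groupdict()
            device = fields.pop("dev")
            latest[device] = {name: int(value) for name, value in fields.items()}
    return latest


if __name__ == "__main__":
    sample = [
        "Aug 13 09:37:36 Tower kernel: BTRFS error (device nvme0n1p1): "
        "bdev /dev/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 29, gen 0",
        "Aug 13 09:37:36 Tower kernel: BTRFS error (device nvme0n1p1): "
        "bdev /dev/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 33, gen 0",
    ]
    print(latest_btrfs_counters(sample))
```

A `corrupt` count that keeps rising between scrubs, as it does in the excerpt, points at checksum failures on that device rather than at the preclear.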

Link to comment
On 8/14/2023 at 7:29 PM, dlandon said:

Yes, please give it a go.

@dlandon, sorry for the delay. It took me some time to work through the issues you pointed out and gave me guidance on, and man, this combination of cache plus macvlan issues was a pain in the ass. I ended up reformatting my cache drive and rebuilding from a backup, plus I deleted my Docker vdisk and rebuilt that from the saved app templates. There may have been a better way, but I got everything running again and then ran your pre-clear plugin (took 3 days), and it finished successfully!

 

Thanks again for everything, hopefully it's smooth sailing from here!

Link to comment
4 minutes ago, Mojo Ryzen said:

There may have been a better way, but I got everything running again and then ran your pre-clear plugin (took 3 days), and it finished successfully!

I'm glad you got everything back to normal and the preclear worked properly.  It was a bit of a guess on my part, but it looks like I got it right.

 

You should consider the 6.12.4.rc18 version.  We think the macvlan issues are resolved with this version.

Link to comment
12 minutes ago, dlandon said:

You should consider the 6.12.4.rc18 version.  We think the macvlan issues are resolved with this version.

 

I saw that this morning on the r/unraid subreddit and I sighed audibly because of all the time I spent reading posts and troubleshooting. So many restarts and logs and googling and searching the forum. I have a few remaining issues to troubleshoot unrelated to macvlan but I'm going to move to 6.12.4rc as soon as I can. Thanks for the continued support. 

Link to comment

Hi Guys

Very new to unraid and trying to get my discs precleared before adding them to my array. (I apologise, I have not read all 19 pages of this thread).

I have managed to install the plugin and I have discovered the destructive mode setting that I have enabled. I am not able to delete the partitions on my discs though. I click the red x on either the disc or the partition, I then click the clean button and nothing happens. The dialog stays there and nothing changes...

I have tried this on 3 identical discs and 1 different disc and I get the same behavior.

 

Is this user error or something else?

 

I am on unraid 6.12.3

 

Thanks

 

 

image.png

image.png

image.png

Link to comment
9 hours ago, wgstarks said:

Did you also type in the word Yes?

So the answer to my question is: "yes, it's user error"!

Ok obviously this was stupid of me. I never saw the text box. It was incredibly faint on my screen, I only read "remove partition 1" and I assumed that as the button was not disabled it was ready to do its thing!

Making the button disabled and highlighting the text box if the button is clicked would make this more idiot proof.

Thanks for pointing out my stupidity!

 

I now have an issue with the whole system crashing and needing to be power cycled when I start a preclear with no pre check. I suspect I might have some hardware issues...

Link to comment

I started pre-clearing three 16TB disks; two started pre-read, but one pre-read started and then almost immediately paused. I tried restarting the pre-clear on that disk, but it went straight to the paused pre-read state. The log shows "Pause requested by queue manager". Should I worry about this?

 

Dashboard shows CPU at under 25% and RAM usage is at 4%. So those should not be causing any problems.

 

Also, having read a bit of this thread, can I set the number of cycles back to 1 during the pre-clear? I currently have the number set to 2, which I understand from an early posting in this thread is unnecessary.

Link to comment

I had originally set my brand new disk pre-clear to 2 cycles, but decided it wasn't worth taking the extra time. So I stopped the pre-clear after the post-read had completed (and the next cycle had started with pre-read).

 

What do I need to do now to be able to add this disk to the array? I read through the various choices in the first post, but it's not clear to me what is required now.

 

 

Link to comment
2 hours ago, Kilrah said:

Just add it, unraid will clear if needed anyway.

 

That part I understand. My concern is that, having completed the pre-clear cycle and therefore having written zeroes to the entire 16TB drive, adding it will result in writing 16TB of zeroes again, somewhat unnecessarily.

 

Since I have completed one pre-clear cycle and only barely started the pre-read step of the second cycle before stopping, I am hoping that the disk has the two requirements for adding to the array as a cleared disk:

- the drive is filled with zeroes

- the drive has a signature that affirms this

 

So if I add the drive in its current state to the array, will unRAID recognise it as a cleared drive and therefore allow me to format it as a new array drive?

 

Or can I start a new pre-clear and select just the Verify Signature option?

 

I'm trying to avoid doing an unnecessary re-zeroing of the drive if possible.

Link to comment
10 minutes ago, sonofdbn said:

I'm trying to avoid doing an unnecessary re-zeroing of the drive if possible.

Well, you already stopped it, so there's nothing you can do now. If it currently has the preclear signature then it'll go straight in; if not, the whole drive will have to be rewritten regardless. You can do a Verify Signature to know, but... you'll also know once it's added to the array by whether it starts a clear or not.

Link to comment
