SHRINKING THE ARRAY QUESTION with multiple Drive Emulations

January 12, 2024

Good morning,

I had 2 drives with SMART errors and a 3rd with read errors (disk2, disk8 and disk11). I decided to replace two and remove one. I didn't want to lose parity protection in the event more drives failed, so I thought shrinking the array would be best. I ordered new drives, pre-cleared them, and began the #2 procedure (from the legacy documentation) to shrink the array. Two drives ended up failing: the one being zeroed (disk11) and another (disk7). Now I'm faced with a decision: continue the shrinking procedure to remove disk11 while disk7 is emulated, OR allow a rebuild even though disk2 and disk8 still have SMART errors (btw, I've checked/replaced cables, but still have those SMART errors). I am looking for advice on a couple of fronts. 1) Does the fact that disk11 failed during zeroing, and continued zeroing as an emulated drive, cause problems with continuing the shrinking procedure? 2) Can I use the New Config tool to remove disk11 and leave disk7 emulated, or will the New Config tool (keeping assignments for parity and disk7, but not disk11, AND checking the valid parity option) wipe out disk7's emulation?

I hope these questions make sense as presented. Thank you for your time!

marlin-diagnostics-20240112-1037.zip

January 12, 2024

Community Expert

Diags are after rebooting, so we can't see what happened, but disks 2 and 8 look healthy. They show some UDMA CRC errors, which usually means a bad SATA cable; just replace it and check that the errors stop increasing.
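One way to check whether the CRC errors have stopped increasing is to compare the raw value of SMART attribute 199 (UDMA_CRC_Error_Count) from two readings taken some time apart. The following is only a minimal sketch: it assumes the text comes from `smartctl -A` with the usual ten-column attribute table, the helper names are hypothetical, and the two sample rows are made up for illustration rather than taken from the posted diagnostics.

```python
def udma_crc_count(smart_attributes_text):
    """Return the raw value of SMART attribute 199, or None if it is not present."""
    for line in smart_attributes_text.splitlines():
        fields = line.split()
        # Attribute rows have ten columns: ID, name, flag, value, worst,
        # thresh, type, updated, when_failed, raw_value.
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[9])
    return None


def crc_errors_increased(before_text, after_text):
    """True if the raw CRC error count grew between the two readings."""
    before = udma_crc_count(before_text)
    after = udma_crc_count(after_text)
    if before is None or after is None:
        raise ValueError("attribute 199 not found in one of the readings")
    return after > before


# Made-up sample rows shaped like `smartctl -A` output (not real data).
SAMPLE_BEFORE = "199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       12"
SAMPLE_AFTER = "199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       15"

print(crc_errors_increased(SAMPLE_BEFORE, SAMPLE_AFTER))
```

The raw count never resets, so a nonzero value on its own only shows that errors occurred at some point. What matters after a cable swap is whether the number keeps growing.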

If you want to remove disk7 I would just do a new config without it and re-sync parity (this assuming all other disks are healthy). You can still zero the emulated disk, but IMHO not the best option, since it will also read all the other disks.

P.S: disk 7 also looks healthy.

January 12, 2024

Author

1 hour ago, JorgeB said:

Diags are after rebooting, so we can't see what happened

My apologies

1 hour ago, JorgeB said:

but disks 2 and 8 look healthy. They show some UDMA CRC errors, which usually means a bad SATA cable; just replace it and check that the errors stop increasing.

I've replaced all the cables and still had the UDMA CRC errors increasing. Might need to consider replacing the JBOD case cables / backplanes next. I have reseated them.

2 hours ago, JorgeB said:

this assuming all other disks are healthy

disk7 and disk11 are disabled and emulated

2 hours ago, JorgeB said:

you can still zero the emulated disk

disk11 has already been zero'd

2 hours ago, JorgeB said:

but IMHO not the best option, since it will also read all the other disks

I believe that is only the case in reconstruct write (aka turbo write) mode?

2 hours ago, JorgeB said:

P.S: disk 7 also looks healthy

Maybe so, but Unraid disabled it due to read errors.

2 hours ago, sniggil said:

1) Does the fact that disk11 failed during zeroing, and continued zeroing as an emulated drive, cause problems with continuing the shrinking procedure? 2) Can I use the New Config tool to remove disk11 and leave disk7 emulated?

Not sure which way to go at this point. I thought removing disk11 and rebuilding 7 was the way to go.

January 13, 2024

Community Expert

13 hours ago, sniggil said:

I believe that is only the case in reconstruct write (aka turbo write) mode?

Since the disk is being emulated it will happen in any mode.
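To illustrate why an emulated disk involves every other disk regardless of write mode, here is a toy model that assumes plain single-parity XOR. It is a simplified sketch, not Unraid's actual implementation; the disk names and byte values are made up, and `xor_blocks` is a hypothetical helper.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Toy "disks", each holding a single 4-byte block (made-up data).
disk1 = bytes([0x11, 0x22, 0x33, 0x44])
disk7 = bytes([0xDE, 0xAD, 0xBE, 0xEF])  # pretend this one is disabled
disk8 = bytes([0x01, 0x02, 0x03, 0x04])
parity = xor_blocks([disk1, disk7, disk8])

# Emulating disk7 means recomputing its contents from parity plus
# every remaining data disk, so each of them has to be read.
emulated_disk7 = xor_blocks([parity, disk1, disk8])
print(emulated_disk7 == disk7)

# An all-zero disk contributes nothing to the XOR, which is why a fully
# zeroed disk can be dropped without invalidating parity.
zeroed = bytes(4)
print(xor_blocks([disk1, disk7, disk8, zeroed]) == parity)
```

In this model a disk that is not physically readable has no contents of its own to consult, so any read or write aimed at it has to go through parity and all the other data disks.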

If you keep getting UDMA CRC errors (and apparently healthy disks getting disabled) with new cables it could be a backplane/controller issue.

13 hours ago, sniggil said:

Not sure which way to go at this point. I thought removing disk11 and rebuilding 7 was the way to go.

You can. Does disk7 still have data, or has it already been emptied?

February 20, 2024

Author

On 1/13/2024 at 4:05 AM, JorgeB said:

If you keep getting UDMA CRC errors (and apparently healthy disks getting disabled) with new cables it could be a backplane/controller issue.

I’ve rebuilt the server. I got rid of the JBOD box in case there were backplane issues and moved everything internal, into a new case with a new controller. Currently I’m getting high CPU usage, with BTRFS errors and warnings along with device resets. Is there a chance someone could take a look and see if I’m missing something obvious?

marlin-diagnostics-20240219-2032.zip

February 20, 2024

Community Expert

Connection problems with each of the drives in cache pool, corruption on disk2.

February 20, 2024

Author

17 minutes ago, trurl said:

Connection problems with each of the drives in cache pool

I’ll double check the physical connections.

18 minutes ago, trurl said:

corruption on disk2

Any suggestions to correct this?

February 20, 2024

Community Expert

Run a scrub on that disk, then look at the syslog for a list of corrupt files; those should be deleted or restored from a backup.
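Pulling the affected file names out of the syslog after a scrub could look roughly like the sketch below. It assumes the kernel logs checksum errors in the commonly seen `BTRFS warning (device ...): checksum error ... (path: ...)` shape. The exact wording varies between kernel versions, so the pattern may need adjusting. The sample log lines, device name, and file paths are made up for illustration, and `corrupt_paths` is a hypothetical helper.

```python
import re

# Assumed message shape; adjust if the kernel in use words it differently.
PATH_PATTERN = re.compile(
    r"BTRFS (?:warning|error) \(device [^)]+\): checksum error.*\(path: (.+)\)\s*$"
)


def corrupt_paths(syslog_text):
    """Return the unique file paths named in checksum-error lines, in order seen."""
    paths = []
    for line in syslog_text.splitlines():
        match = PATH_PATTERN.search(line)
        if match and match.group(1) not in paths:
            paths.append(match.group(1))
    return paths


# Made-up sample lines (not taken from the posted diagnostics).
SAMPLE_SYSLOG = """\
Feb 20 08:01:12 marlin kernel: BTRFS warning (device md2): checksum error at logical 1234567168 on dev /dev/md2, physical 1234567168, root 5, inode 257, offset 4096, length 4096, links 1 (path: Media/example one.mkv)
Feb 20 08:01:12 marlin kernel: BTRFS warning (device md2): checksum error at logical 1234571264 on dev /dev/md2, physical 1234571264, root 5, inode 257, offset 8192, length 4096, links 1 (path: Media/example one.mkv)
Feb 20 08:03:40 marlin kernel: BTRFS warning (device md2): checksum error at logical 2234567168 on dev /dev/md2, physical 2234567168, root 5, inode 300, offset 0, length 4096, links 1 (path: Docs/example.pdf)
Feb 20 08:05:00 marlin kernel: BTRFS info (device md2): scrub finished
"""

for path in corrupt_paths(SAMPLE_SYSLOG):
    print(path)
```

A file usually shows up once per damaged block, so duplicates are collapsed. The paths are reported relative to the filesystem root of the scrubbed disk.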
