January 12, 2024
Good morning,

I had two drives with SMART errors and a third with read errors (disk2, disk8 and disk11). I decided to replace two and remove one. I didn't want to lose parity protection in case more drives failed, so I thought shrinking the array might be best. I ordered new drives, pre-cleared them, and began procedure #2 (from the legacy documentation) to shrink the array. Two drives ended up failing: the one being zeroed (disk11) and another one (disk7).

Now I'm faced with a decision: continue the shrinking procedure to remove disk11 while disk7 is emulated, OR allow a rebuild even though disk2 and disk8 still have SMART errors (btw, I've checked/replaced cables, but still have those SMART errors).

I am looking for advice on a couple of fronts:
1) Does the fact that disk11 failed during zeroing, and continued zeroing as an emulated drive, cause problems for continuing with the shrinking procedure?
2) Can I use the New Config tool to remove disk11 and leave disk7 emulated, or will New Config (keeping the assignments for parity and disk7 but not disk11, AND checking the valid parity option) wipe out disk7's emulation?

I hope these questions make sense as presented. Thank you for your time!

marlin-diagnostics-20240112-1037.zip
January 12, 2024 · Community Expert
Diags are after rebooting, so we can't see what happened, but disks 2 and 8 look healthy. They show some UDMA CRC errors; that is usually a bad SATA cable, so just replace it and check that the errors stop increasing. If you want to remove disk7, I would just do a new config without it and re-sync parity (this assumes all other disks are healthy). You can still zero the emulated disk, but IMHO that's not the best option, since it will also read all the other disks.

P.S.: disk7 also looks healthy.
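Because the UDMA CRC counter is typically cumulative over the drive's life and does not reset, the useful check after a cable swap is whether it keeps rising, not whether it is zero. Below is a minimal Python sketch of that comparison (not from the thread), assuming you saved `smartctl -A` output for the same drive before and after the swap, and that the drive prints attribute 199 in the usual ATA table layout with the raw value in the last column. The helper names are hypothetical.

```python
from typing import Optional


def udma_crc_raw(smartctl_attrs: str) -> Optional[int]:
    """Return the raw value of SMART attribute 199 (UDMA_CRC_Error_Count), or None.

    Assumes the standard ATA attribute table printed by `smartctl -A`:
    ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    for line in smartctl_attrs.splitlines():
        fields = line.split()
        # Attribute rows start with the numeric ID; RAW_VALUE is the 10th column.
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[9])
    return None


def crc_still_increasing(before: str, after: str) -> bool:
    """True if the CRC count went up between two snapshots of the same drive."""
    old, new = udma_crc_raw(before), udma_crc_raw(after)
    if old is None or new is None:
        raise ValueError("attribute 199 not found in one of the snapshots")
    return new > old


if __name__ == "__main__":
    # Hypothetical snapshots of one drive, taken before and after a cable swap.
    before = "199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 12"
    after = "199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 15"
    print(crc_still_increasing(before, after))
```

A count that stays flat across several checks under normal load suggests the new cable fixed the link; a count that keeps climbing points further along the path.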
January 12, 2024 · Author
1 hour ago, JorgeB said: "Diags are after rebooting, so we can't see what happened"
My apologies.
1 hour ago, JorgeB said: "but disks 2 and 8 look healthy, they show some UDMA CRC errors, that is usually a bad SATA cable, just replace it and check that the errors stop increasing."
I've replaced all the cables and the UDMA CRC errors were still increasing. I might need to consider replacing the JBOD case cables/backplanes next; I have reseated them.
2 hours ago, JorgeB said: "this assuming all other disks are healthy"
disk7 and disk11 are disabled and emulated.
2 hours ago, JorgeB said: "you can still zero the emulated disk"
disk11 has already been zeroed.
2 hours ago, JorgeB said: "but IMHO not the best option, since it will also read all the other disks"
I believe that only happens in reconstruct write (aka turbo write) mode?
2 hours ago, JorgeB said: "P.S: disk 7 also looks healthy"
Maybe so, but Unraid disabled it because of read errors.
2 hours ago, sniggil said: "1) Does the fact that disk11 failed during zeroing, and continued zeroing as an emulated drive, cause problems for continuing with the shrinking procedure? 2) Can I use the New Config tool to remove disk11 and leave disk7 emulated?"
Not sure which way to go at this point. I thought removing disk11 and rebuilding disk7 was the way to go.
January 13, 2024 · Community Expert
13 hours ago, sniggil said: "I believe that only happens in reconstruct write (aka turbo write) mode?"
Since the disk is being emulated, it will happen in any mode.

If you keep getting UDMA CRC errors (and apparently healthy disks getting disabled) with new cables, it could be a backplane/controller issue.

13 hours ago, sniggil said: "Not sure which way to go at this point. I thought removing disk11 and rebuilding 7 was the way to go."
You can. Does disk7 still have data, or has it already been emptied?
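Why writing to (or zeroing) an emulated disk reads every other disk in any write mode can be illustrated with single XOR parity. This is a minimal Python sketch under a simplifying assumption of mine: one parity disk, parity computed as a plain byte-wise XOR, and tiny in-memory "disks". Dual parity and Unraid's real I/O path are not modelled, and all function names are hypothetical.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_of(data_disks: List[bytes]) -> bytes:
    """Single parity: XOR of the same block on every data disk."""
    return xor_blocks(data_disks)


def read_emulated(parity: bytes, surviving: List[bytes]) -> bytes:
    """Reconstruct the missing disk's block from parity and EVERY surviving disk."""
    return xor_blocks([parity, *surviving])


def parity_after_write_to_emulated(new_block: bytes, surviving: List[bytes]) -> bytes:
    """New parity when writing to an emulated disk.

    The read/modify/write shortcut (old parity ^ old data ^ new data) needs the
    old data, which for an emulated disk can only be rebuilt by reading every
    surviving disk, so the cost is the same as a reconstruct (turbo) write.
    """
    return xor_blocks([new_block, *surviving])


if __name__ == "__main__":
    disks = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]  # three hypothetical data disks
    parity = parity_of(disks)
    # Pretend disk index 1 is disabled: its contents come from parity plus the others.
    print(read_emulated(parity, [disks[0], disks[2]]) == disks[1])
```

Zeroing the emulated disk is just `parity_after_write_to_emulated` with an all-zero block, and both surviving disks still have to be read for every block written.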
February 20, 2024 · Author
On 1/13/2024 at 4:05 AM, JorgeB said: "If you keep getting UDMA CRC errors (and apparently healthy disks getting disabled) with new cables it could be a backplane/controller issue."
I've rebuilt the server. I got rid of the JBOD box in case there were backplane issues and moved everything internally into a new case with a new controller. Currently I'm getting high CPU usage, BTRFS errors and warnings, and device resets. Could someone take a look and see if I'm missing something obvious?

marlin-diagnostics-20240219-2032.zip
February 20, 2024 · Community Expert
Connection problems with each of the drives in the cache pool, and corruption on disk2.
February 20, 2024 · Author
17 minutes ago, trurl said: "Connection problems with each of the drives in cache pool"
I'll double-check the physical connections.
18 minutes ago, trurl said: "corruption on disk2"
Any suggestions on how to correct this?
February 20, 2024 · Community Expert
Run a scrub on that disk, then look at the syslog for a list of corrupt files; those should be deleted or restored from a backup.
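Pulling the file list out of the syslog after the scrub can be scripted. The Python sketch below assumes btrfs logs one kernel line per bad block ending in `(path: ...)`, as in the example in the code comment; that format is an assumption and may differ between kernel versions, so check a real line from your log first. Paths appear to be relative to the filesystem (subvolume) root, which on an array disk would be under `/mnt/disk2`. The device name in the sample and the function name are hypothetical.

```python
import re
from typing import Iterable, List

# ASSUMED shape of the kernel message btrfs logs for each bad block a scrub finds, e.g.:
#   BTRFS warning (device md2p1): checksum error at logical 123 on dev /dev/md2p1,
#   physical 456, root 5, inode 257, offset 0, length 4096, links 1 (path: dir/file)
# The level word is matched loosely in case it is logged as something other than "warning".
PATH_RE = re.compile(r"BTRFS \w+ \(device [^)]+\): .*\(path: (?P<path>.+)\)\s*$")


def corrupt_files(syslog_lines: Iterable[str]) -> List[str]:
    """Return each reported path once, in the order it first appears."""
    paths = (m.group("path") for m in map(PATH_RE.search, syslog_lines) if m)
    return list(dict.fromkeys(paths))  # de-duplicate while keeping order


if __name__ == "__main__":
    sample = [
        "Feb 20 10:00:00 Tower kernel: BTRFS warning (device md2p1): checksum error at "
        "logical 123 on dev /dev/md2p1, physical 456, root 5, inode 257, offset 0, "
        "length 4096, links 1 (path: media/a (1).mkv)\n",
        "Feb 20 10:00:01 Tower kernel: unrelated line\n",
    ]
    for path in corrupt_files(sample):
        print(path)
```

On Unraid you would pass the lines of `/var/log/syslog` instead of `sample`; the greedy match keeps filenames that themselves contain parentheses intact.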