Unraid Parity disk error at the EXACT moment I plugged in a new 18 TB drive (with system live)?


johmei


I currently have 2 x 12 TB parity drives, 1 x 12 TB data drive and 3 x 4 TB data drives (the 4 TB drives are by far the oldest).  I want to replace both 12 TB parity drives with 18 TB drives and either add the two 12 TB parity drives to the array as data drives, or replace two of the 4 TB drives with the 12 TB drives.
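To compare the two options by capacity, here is a rough sketch in Python. The drive sizes come from the post. The helper name `usable_tb` is made up for illustration, and the check assumes the usual Unraid rule that a parity disk must be at least as large as the largest data disk.

```python
# Sizes in TB, taken from the post above.
current_data = [12, 4, 4, 4]
new_parity = [18, 18]
old_parity = [12, 12]


def usable_tb(data_disks, parity_disks):
    """Return usable capacity in TB (hypothetical helper, not an Unraid API).

    Raises ValueError if any data disk is larger than the smallest parity disk.
    """
    if max(data_disks) > min(parity_disks):
        raise ValueError("parity disks must be at least as large as the largest data disk")
    return sum(data_disks)


# Option A: add both old 12 TB parity drives as extra data drives.
option_a = current_data + old_parity

# Option B: swap two of the 4 TB drives for the old 12 TB parity drives.
option_b = [12, 12, 12, 4]

print(usable_tb(option_a, new_parity))  # 48
print(usable_tb(option_b, new_parity))  # 40
```

Option A gives more space but needs two more free ports and bays; option B retires two of the oldest drives.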

 

I wanted to preclear my 18 TB drives so I plugged one into my system (I have a dock at the top that plugs into the MB, the other drives are all on an "LSI SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)" controller)....the very second I did this, I received the following errors in my notification bar.  So I'm running an extended SMART test on the parity drive now, but I just can't imagine there is any connection to plugging in that drive...is that possible or is it a coincidence?

 

I have the errors below from the notification archive.

 

29-01-2023 20:23    Unraid array errors    Warning [JOHNSNAS] - array has errors    Array has 1 disk with read errors    warning    
Parity disk - ST12000VN0008-2PH103_ZS80483R (sdc) (errors 2048)    
29-01-2023 20:23    Unraid Parity disk error    Alert [JOHNSNAS] - Parity disk in error state (disk dsbl)    ST12000VN0008-2PH103_ZS80483R (sdc)    alert

 

Thanks!

Link to comment

Attach diagnostics to your NEXT post in this thread.

 

Hotswap is not supported well by all hardware. Usually best to avoid it.

 

I suspect attempted hotswap is exactly what disabled the other disk. It will have to be rebuilt.

 

I know this is not what you were trying to do, but in the more usual case of trying to replace a disk, Unraid won't do anything with the replacement until it is assigned, and you can't change assignments with the array started, so might as well shut down.

 

 

 

Link to comment
2 hours ago, trurl said:

Attach diagnostics to your NEXT post in this thread.

 

Hotswap is not supported well by all hardware. Usually best to avoid it.

 

I suspect attempted hotswap is exactly what disabled the other disk. It will have to be rebuilt.

 

I know this is not what you were trying to do, but in the more usual case of trying to replace a disk, Unraid won't do anything with the replacement until it is assigned, and you can't change assignments with the array started, so might as well shut down.

 

 

 

 

Is that general rule about hotswapping true even if the MB bios has settings for hotswap support?  I still would rather not risk it in the future so I'll probably abandon that regardless, but I'm just curious.

 

I suppose if it has to be rebuilt anyway...I might as well have it rebuild onto the new 18 TB drive?  I assume I should replace each parity drive only one at a time to reduce the risk of data loss should something go wrong?

 

Thanks!!

johnsnas-diagnostics-20230130-1159.zip

Link to comment

I have successfully used hotswap (of sorts) with an eSATA enclosure by powering down the enclosure to install or remove disks. The other end of the eSATA cable was plugged into SATA port right next to others being used by my array. This was used with Unassigned Devices to create external backups.

 

I use a USB3 enclosure now since it is more generally useful than eSATA, and it works well with preclear and other things that require transparent access to the disk, including SMART.

 

Many USB enclosures have interfaces that don't work well for our purposes, not passing SMART or even serial number, maybe even giving a different view of the disk so it isn't the standard size.

Link to comment

Since only parity (P) is disabled I would replace that one first.

 

Really probably not much risk to replacing both at once if you are careful with connections, don't write to your server while rebuilding, and keep the original parity disks so they could be used if necessary. And parity contains none of your data anyway.
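A toy illustration of why a lost parity disk costs no data, assuming the first parity disk (P) is a plain byte-wise XOR across the data disks. The second parity disk uses a different calculation that is not shown here, and the disk contents below are made up.

```python
from functools import reduce


def xor_parity(disks):
    """XOR the disks together byte by byte (all disks must be the same length)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*disks))


# Three toy "data disks", four bytes each (made-up contents).
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]

parity = xor_parity(data)

# Losing the parity disk costs no data: it is recomputed from the data disks alone.
rebuilt_parity = xor_parity(data)
assert rebuilt_parity == parity

# Losing one data disk: XOR the surviving data disks with parity to get it back.
recovered = xor_parity([data[0], data[2], parity])
assert recovered == data[1]
```

The same sketch shows what parity is for: while a parity disk is being rebuilt, it cannot help recover a data disk that fails in the meantime.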

Link to comment
1 hour ago, trurl said:

Since only parity (P) is disabled I would replace that one first.

 

Really probably not much risk to replacing both at once if you are careful with connections, don't write to your server while rebuilding, and keep the original parity disks so they could be used if necessary. And parity contains none of your data anyway.

All sounds good, thanks!  I'll still probably replace them one at a time because I'm paranoid, but it's good to know it would be fine either way!  Thanks again!

Link to comment
5 minutes ago, johmei said:

Also, the extended SMART test passed without any issues!  So I'll go ahead and rebuild parity when I get home onto my new 18 TB...should I still refrain from writing to my server while rebuilding even if it's only 1 of the 2 parity drives?

Theoretically you can write to the array, but the writing and parity build both have significantly degraded performance while they are running in parallel so you may prefer not to.

Link to comment
1 hour ago, itimpi said:

Theoretically you can write to the array, but the writing and parity build both have significantly degraded performance while they are running in parallel so you may prefer not to.

Yeah, I'd definitely prefer to just let it build without writing to it in that case.  Thanks!!

 

1 hour ago, JorgeB said:

Note that sometimes the board/controller can support hot plug, but if where the drives are connected doesn't, connecting a new device might create a slight voltage drop that can drop another disk.

Got ya.  I'm gonna play it safe from this point forward!

Link to comment

Hey! This (or something very similar) just happened to me a few days ago!

 

Initially I had:

 

Parity Drive - Seagate 14TB

Data Drive 1 - Western Digital 8TB

Data Drive 2 - Western Digital 8TB

 

I bought 3 x Western Digital 18TB. My idea was to replace the 14TB drive with one Western Digital 18TB drive, then add the 2 remaining Western Digital 18TB drives and the old Seagate 14TB drive to the array. It's not related to this, but my next planned step was to move all data from the 8TB drives to the new free space on the added drives, then replace them with 2 Western Digital 12TB drives I have around, but that's another story for another topic I was planning to create.

 

With all disks in the system, I precleared all three Western Digital 18TB as usual. Then I stopped the array, removed the 14TB from it (not physically from the machine), added one of the Western Digital 18TB drives as the new Parity Drive, then started the array.

 

My heart skipped a beat when I suddenly was notified of this:

[Attached screenshot: Image003un.png]

 

Those two disks were perfect. I checked the SMART logs to see if anything had changed but found no errors there. I even ran short and long SMART tests on both drives (after parity was rebuilt) and no errors were found.
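For anyone who prefers to run the same checks from a terminal, here is a small sketch that builds the `smartctl` command lines (from smartmontools) for short and long self-tests and for reading the results. The device path `/dev/sdc` is only a placeholder, and the helper names are made up for illustration.

```python
import subprocess


def smart_commands(device):
    """Build smartctl command lines for a device such as /dev/sdc (placeholder path)."""
    return {
        "start_short": ["smartctl", "-t", "short", device],
        "start_long": ["smartctl", "-t", "long", device],
        "selftest_log": ["smartctl", "-l", "selftest", device],
        "health": ["smartctl", "-H", device],
    }


def run(command):
    """Run one command and return its text output (needs root and smartmontools)."""
    return subprocess.run(command, capture_output=True, text=True, check=False).stdout


commands = smart_commands("/dev/sdc")
print(" ".join(commands["start_long"]))  # smartctl -t long /dev/sdc
```

The self-tests run in the background on the drive itself, so the self-test log has to be read again once the test has had time to finish.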

 

I'm not sure why this happened. It seems to be an error caused by "hot swapping" devices; the system might have been confused by old/trash/"dirty" SMART data lying around for those devices. I'm including diagnostics just in case it helps.

 

 

boxy-diagnostics-20230130-2308.zip

Link to comment
