Multiple drive failure



Good evening

 

Unraid V6.12.9

 

So I have a 7 x 3TB disk array with 1 x 3TB parity and 1 x 250gb cache.

 

One of my drives failed yesterday (D6). I ran an extended SMART report (attached), and because it was giving a WRITE FPDMA QUEUED error, I thought I'd try reseating the drive and rebuilding onto it: stopping the array, reseating the drive, un-assigning the failed drive, starting the array, stopping the array again, reassigning the "failed" drive and starting the array again.

 

This failed 49% through, so I was in the process of getting a different drive when another drive completely failed (D3). With the server being in the cellar and me having to lift a hatch and crawl to the server, I thought the drive must have just been having a paddy, so I lazily and stupidly rebooted the server. Now I (expectedly) can't start the array as I have 2 failed drives and only 1 parity.

 

Am I correct in saying I need to go to Tools -> New Config -> Preserve current assignments -> All -> Done and then un-assigning the failed disks (D4 + D6) and starting the array again, then after the new drives have precleared fine, adding them into the array?

 

I know I'll have lost all the data that was on D4+D6 but c'est la vie, there's nothing I can do about it now.

 

TIA.

WDC_WD30EFRX-68EUZN0_WD-WMC4N0F2RUHT-20240328-0847.txt

Edited by Piemanpm
Spelling
15 minutes ago, Piemanpm said:

Didn't think they'd be helpful as I'd restarted.

Pre-reboot diagnostics would be better, but these still allow checking SMART for both disks. It does not look great, but most data should still be salvageable. Re-sync parity and see if you get any errors from either disk.


The first post mentions disks 3 and 6 failing, but if disk 4 is missing, parity is no longer valid. You can instead sync parity with the remaining disks, and then see whether you get read errors from disks 3 and 6, or others. All data from disk4 will be gone, and if other disks have read errors during the sync you could lose some more data; based on SMART, that's very possible.
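To illustrate why one parity disk cannot cover two missing data disks, here is a minimal Python sketch. It assumes single parity behaves like a plain byte-wise XOR across the data disks, which is how single parity is commonly described; the disk names and 4-byte contents are made up for illustration and are not taken from the poster's array.

```python
from functools import reduce

def xor_parity(blocks):
    """Byte-wise XOR of equal-length blocks (assumed single-parity model)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Hypothetical 4-byte "disks"; real disks work the same way at every offset.
disks = {
    "disk3": bytes([0x10, 0x20, 0x30, 0x40]),
    "disk4": bytes([0x01, 0x02, 0x03, 0x04]),
    "disk6": bytes([0xAA, 0xBB, 0xCC, 0xDD]),
}
parity = xor_parity(disks.values())

def rebuild(missing, disks, parity):
    """Reconstruct one missing disk from all surviving disks plus parity."""
    survivors = [data for name, data in disks.items() if name != missing]
    return xor_parity(survivors + [parity])

# One disk missing: it can be rebuilt exactly.
one_missing_ok = rebuild("disk4", disks, parity) == disks["disk4"]

# Two disks missing: survivors plus parity only yield disk4 XOR disk6,
# which is not enough to recover either disk on its own.
combined = xor_parity([disks["disk3"], parity])
two_missing_ambiguous = combined == xor_parity([disks["disk4"], disks["disk6"]])

# A fresh parity sync over the remaining disks protects them again,
# but the contents of the dropped disk are no longer represented.
new_parity = xor_parity([disks["disk3"], disks["disk6"]])
```

In this model, syncing parity against the remaining disks is the only way back to a protected array, and any read error on a remaining disk during that sync ends up baked into the new parity.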


You should add attributes 1 and 200 for monitoring on all your WD drives. When you do, you will see several drives with SMART ( 👎 ) warnings on the Dashboard page.

 

And set up Notifications to alert you by email or other agent as soon as a problem is detected.
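For anyone who wants to check attributes 1 and 200 from the command line as well, below is a rough Python sketch that pulls their raw values out of the attribute table printed by `smartctl -A`. The sample text and its numbers are invented for illustration (they are not from the attached report), and the function name is hypothetical; on WD drives attribute 1 is normally Raw_Read_Error_Rate and 200 is Multi_Zone_Error_Rate.

```python
# Invented sample in the usual `smartctl -A` table layout (values are made up).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       37
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       4
"""

def watched_raw_values(smartctl_output, watched=(1, 200)):
    """Return {attribute_id: raw_value} for the watched attribute IDs.

    Assumes the ten-column table layout shown above, with the raw value
    as a plain integer in the last column.
    """
    results = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID; the header row does not.
        if len(fields) >= 10 and fields[0].isdigit():
            attr_id = int(fields[0])
            if attr_id in watched:
                results[attr_id] = int(fields[9])
    return results

values = watched_raw_values(SAMPLE)
# Any non-zero raw value on these attributes is worth a closer look.
flagged = {attr_id: raw for attr_id, raw in values.items() if raw > 0}
```

To use it on a real drive, feed it the text from `smartctl -A /dev/sdX`, where `/dev/sdX` is a placeholder for the actual device. This is only a spot check and does not replace the built-in attribute monitoring and notifications.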

19 minutes ago, JorgeB said:

The first post mentions disks 3 and 6 failing, but if disk 4 is missing, parity is no longer valid. You can instead sync parity with the remaining disks, and then see whether you get read errors from disks 3 and 6, or others. All data from disk4 will be gone, and if other disks have read errors during the sync you could lose some more data; based on SMART, that's very possible.

 

My apologies, I did mean Disk 4. So I leave "Parity is already valid" unticked and then start it up?

 

16 minutes ago, trurl said:

You should add attributes 1 and 200 for monitoring on all your WD drives. When you do, you will see several drives with SMART ( 👎 ) warnings on the Dashboard page.

 

And set up Notifications to alert you by email or other agent as soon as a problem is detected.

I never knew about monitoring 1 + 200. I do have notifications set up; it notified me that disk 6 was in an error state. I'll keep an eye on the others now that I've put 1 and 200 in the SMART attribute notifications for the WD drives.

 

Thanks both for your help on this.

Edited by Piemanpm

If you want to "fill the gap" where disk4 was by changing the assignments do so now, then

3 minutes ago, Piemanpm said:

leave "Parity is already valid" unticked and then start it up?

While parity is rebuilding, on Main - Array Devices, you should see lots of Reads from all assigned data disks, lots of Writes to the parity disk, and zeros in the Errors column.

 

If it looks like there are problems, post new diagnostics.

36 minutes ago, trurl said:

If you want to "fill the gap" where disk4 was by changing the assignments do so now, then

While parity is rebuilding, on Main - Array Devices, you should see lots of Reads from all assigned data disks, lots of Writes to the parity disk, and zeros in the Errors column.

 

If it looks like there are problems, post new diagnostics.

 

Well I'll be replacing disk4 shortly so it won't be an issue.

0 errors in Array Devices so far (6.2% in) but disk 6 (the original one that I was in the process of replacing) raw read error rate is going up intermittently.

Thanks for your help, both of you. Greatly appreciated.
