Failing disk path forward

bigrob8181 · June 26, 2022

Looking for a bit of advice.

Backstory: Last week i replaced my 8TB parity drive with a new 14TB drive. (Old parity drive still good.)

Around the same time I noticed a different drive was beginning to have errors (disk4), but nothing that really worried me. (I've never lost a drive so wasn't very sensitive to the issues)

Yesterday i installed a LSI HBA and moved all drives to it. Installed a second cache SSD, and brought up the array. Began zeroing the old parity dive to add more space to the array (added as disk6).

Yesterday in the late afternoon, the problematic disk (disk4) started spitting out tons of errors. Unraid has not marked it failed yet, but all the tell tell signs are there. (been reading forms since i first encountered the issues)

Reallocated sector count: 102

Current pending sector: 2216

Offline uncorrectable: 266

UDMA CRC error count: 14

Smart short test Errored out.

I just want to take the disk that is zeroed (disk6)(not formatted yet) and replace the failing disk (disk4), but since I've added it to the array, despite not yet being formatted and whatnot, unraid sees it as an additional missing disk if i try to move it. Parity "should be" good at this point, however it checks monthly at the beginning of the month, which is in a few days.

Path forward?

JorgeB · June 26, 2022

If disk6 was never formatted you can do a new config without it (tools -> new config), then check "parity is already valid" and start the array, stop the array, and replace disk4.

bigrob8181 · June 26, 2022

I may have already started down another path by accident, and not sure how to proceed.

I stopped the array, and removed the failing device from the array.

Started the array.

I stopped the array with the hopes of moving the zeroed drive into slot 4. No dice (2 missing drives)

Trying to put things back to normal, added drive 4 back, started the array, and unraid is doing a rebuild on the failing drive (thinking its new for some reason).

What is the path forward now?

JorgeB · June 26, 2022

Wait for the rebuild to finish then do what I posted above. it can get more complicated if the rebuild doesn't finish due to errors, but errors are much more likely on reads vs. writes.

bigrob8181 · June 26, 2022

Ok, well 14 hours to go then.

Currently at:

Reallocated sector count: 290

Current pending sector: 936

Offline uncorrectable: 266

UDMA CRC error count: 14

I do have a new 8TB drive coming in on tuesday that i can just drop in. Just needs to make it until then.

Edited June 26, 2022 by bigrob8181

bigrob8181 · June 27, 2022

Ok, so the rebuild of the failing drive is finished, and at the moment there are no errors in unraid. The smart data is as follows:

Reallocated sector count: 525

Current pending sector: 56

Offline uncorrectable: 266

UDMA CRC error count: 14

With no more errors in unraid, would you suggest replacing the drive or see how things play out? I still have the one drive zeroed and ready to drop in, i also have an additional 8TB coming in the mail on wednesday. In the event i do not replace the drive, i would consider using the drive coming in the mail for a second parity, and the zeroed drive for additional storage space in the array. I am not interested in purchasing an additional drive in the near future to replace the failing drive, but also am not wanting to trash a drive that may still be good.

itimpi · June 27, 2022

I would say that drive could well be on its last legs To use with Unraid you want the pending sectors to be 0, and a stable (small) number of reallocated sectors.

JorgeB · June 27, 2022

Disk is still showing pending sectors, so it's likely still bad, like mentioned errors usually happen on reads, not writes, but if you want to confirm before replacing run a SMART extended test.

trurl · June 27, 2022

You should replace, but since you haven't posted diagnostics I hesitate to make any recommendations since you may have other problems you are unaware of.

On 6/26/2022 at 7:44 AM, bigrob8181 said:

all the tell tell signs are there

You should have been getting notifications from Unraid about this disk. Do you have Notifications setup to alert you by email or other agent as soon as a problem is detected? Do any of your other disks have SMART warnings on the Dashboard page?

bigrob8181 · June 27, 2022

Diagnostics are attached.

I was getting a lot of notifications this last week. I have not gotten any additional notifications nor emails since i accidentally rebuilt the failing drive onto itself.

Disregard, right after i sent this, i got a notification on disk error and the counter is at 4. I suppose that makes it an easy decision to just swap it out, and expand the storage on wednesday. I suppose i will just have to go dual parity in the future. 👍

zeus-diagnostics-20220627-1134.zip

Edited June 27, 2022 by bigrob8181

trurl · June 27, 2022

16 minutes ago, bigrob8181 said:

counter is at 4

What counter?

33 minutes ago, trurl said:

Do any of your other disks have SMART warnings on the Dashboard page?

bigrob8181 · June 27, 2022

No other warnings.

trurl · June 27, 2022

I know you discussed adding a disk but not sure what happened with that. Was disk6 added but not formatted yet?

bigrob8181 · June 27, 2022

1 hour ago, trurl said:

I know you discussed adding a disk but not sure what happened with that. Was disk6 added but not formatted yet?

That's correct. I was adding to increase array size, but with disk 4 failing I decided it would be better suited replacing it.

I have a rack server case being delivered today so I decided whatever I do, it'll have to start after the case migration.

trurl · June 27, 2022

I guess my question wasn't clear. In your latest screenshot, it shows a disk assigned as disk6 but unmountable.

Did you replace disk4? Or did you rebuild to the original disk4?

Either way, did you also add a disk6, and it is unmountable because it hasn't been formatted yet?

trurl · June 27, 2022

3 hours ago, bigrob8181 said:

No other warnings.

I was talking about the Dashboard page but you posted a screenshot of Main - Array Devices. That did answer the question about the "counter" but it also prompted the other questions I had about disk6.

I was still wondering about whether or not you had any SMART warnings on the Dashboard page, so I looked at the SMART reports of each disk in your Diagnostics.

On disks 1, 2, 3, you should be getting warnings for UDMA CRC error, unless you have already acknowledged them.

The disk4 in your diagnostics has the same problems as before so I guess you rebuilt the original.

bigrob8181 · June 28, 2022

On 6/26/2022 at 8:22 AM, JorgeB said:

Wait for the rebuild to finish then do what I posted above. it can get more complicated if the rebuild doesn't finish due to errors, but errors are much more likely on reads vs. writes.

Ok, so i attempted what was instructed above and must have messed something up.

Disk 6 is now in disk 4 spot. Unraid is re-zeroing. It is NOT emulating the failing disk.

Parity is valid (although i am concerned, because the data is not emulated, so across the disks i would think it might mess something up)

No data from the failing disk 4 is on the array, however its still on the disk.

Is parity being marked valid when its clearly not going to be problematic?

I suppose i need to allow the zero to happen again and then use unbalance to move the data back onto the array. Is this the correct way to proceed?

bigrob8181 · June 28, 2022

For the time being, I canceled the disk clear and shut down the server. Figure I need to protect what's there until I get a response.

trurl · June 28, 2022

2 hours ago, bigrob8181 said:

No data from the failing disk 4 is on the array, however its still on the disk.

What do you mean? How do you know it's still on the disk?

2 hours ago, bigrob8181 said:

Disk 6 is now in disk 4 spot. Unraid is re-zeroing. It is NOT emulating the failing disk.

Unraid will not have made any assignment changes itself. And it would only clear a disk added to a new slot. Something missing from your description.

trurl · June 28, 2022

Sorry, after reviewing the thread, I think I understand how there became a disk6 from your post here

https://forums.unraid.net/topic/125328-failing-disk-path-forward/?do=findComment&comment=1142343

What I don't understand is what you did after that. What exactly did you do in this attempt?

2 hours ago, bigrob8181 said:

attempted what was instructed above

trurl · June 28, 2022

I may be able to guess what you did by breaking this down some.

On 6/26/2022 at 7:50 AM, JorgeB said:

If disk6 was never formatted you can do a new config without it (tools -> new config)

New Config with nothing assigned as disk6, but with the original disk4 assigned as disk4, and all other disks assigned just as they were.

On 6/26/2022 at 7:50 AM, JorgeB said:

check "parity is already valid" and start the array

At this point, all disks would be accepted just as they were assigned.

On 6/26/2022 at 7:50 AM, JorgeB said:

stop the array, and replace disk4

Stop the array, assign the new disk as disk4. At this point, starting the array would make it rebuild the emulated disk4 onto the new disk.

What did you do instead of what I explained above?

For example, if you New Config without any disk assigned as disk4 and started the array, then added disk4, then it would begin clearing the added disk.

If that's what happened, then assuming no other writes to the array, it should still be possible to get it to emulate disk4.

bigrob8181 · June 28, 2022

7 hours ago, trurl said:

I may be able to guess what you did by breaking this down some.

New Config with nothing assigned as disk6, but with the original disk4 assigned as disk4, and all other disks assigned just as they were.

At this point, all disks would be accepted just as they were assigned.

Stop the array, assign the new disk as disk4. At this point, starting the array would make it rebuild the emulated disk4 onto the new disk.

What did you do instead of what I explained above?

For example, if you New Config without any disk assigned as disk4 and started the array, then added disk4, then it would begin clearing the added disk.

If that's what happened, then assuming no other writes to the array, it should still be possible to get it to emulate disk4.

I may have done a new config for both removing the zeroed drive AND for moving the zeroed drive to slot 4. I understand now what I should have done.

Thoughts on new config with the failing drive back in the original location start, stop, then replace with the good drive? Any concerns with parity on the other drives? If I need to do an rsync between the 2 drives after to ensure all data is correct that's fine.

At this point I still have the data, just looking for the quickest and most complete path forward.

Sorry for misunderstanding ND causing a mess although not a complete disaster.

bigrob8181 · June 28, 2022

I should add the good drive got to like 5% zeroed again so like 20 to 30 minutes with the array up and no disk 4.

JorgeB · June 28, 2022

49 minutes ago, bigrob8181 said:

Thoughts on new config with the failing drive back in the original location start, stop, then replace with the good drive?

Should be OK, but some filesystem corruption can happen.

bigrob8181 · June 28, 2022

1 hour ago, JorgeB said:

Should be OK, but some filesystem corruption can happen.

Would it be corruption only on the failing drives files or across the array? Since I still have the data I should be able to recover any corruption unless it's in a bad spot of the failing drive by a simple rsync command I would think.

Failing disk path forward

Recommended Posts

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Join the conversation