
How to safely remove bad drive from array



Hi everyone.

I am on Unraid version 6.9.2. Recently one of my drives failed (red cross mark beside it) and it won't turn on at all. How can I safely remove it? I don't have a replacement drive of the same size; I have a bigger drive that I will add later. I'm not sure if I can just replace the bad drive with the new bigger one and have parity rebuild safely.

 

I have checked all the FAQs, but they all advise moving data to other drives, which I can't do since the drive is already bad.

 

Thanks in advance for any suggestion. 

26 minutes ago, munimisu said:

checked all FAQs, but they all advise to move data to other drives

Where do you see that? I am very skeptical that any of our FAQs suggest anything like that.

 

Moving to other drives in the array is NOT the recommendation. In fact, I always specifically recommend NOT doing that.

 

Lots of questions we have can be answered if you Attach Diagnostics to your NEXT post in this thread.

2 hours ago, trurl said:

What do you mean and how do you know?

 

Please see the screenshot. When I hover over the red X, the tooltip says to click to spin up the drive; when I do that, the drive still has the same red X beside it and its contents are not displayed when I check the disk in Explorer.

 

I have not done anything yet that would impact the drive. The array is still up and running in the same situation. The only change I have made is excluding the bad drive (disk 5 in my array) from all shares.

 

Thanks a lot for helping. 

Screenshot 2022-07-27 122146.jpg


That wiki is indeed about removing a disk, but it isn't really about removing a disabled disk. Normally you want to recover the data by rebuilding the disk.

 

Doesn't look like there is anything wrong with the disk itself. Syslog indicates problems communicating with multiple disks, but since you have single parity only one could be disabled.

 

Probably a controller or power problem.

 

Disk5 is disabled, but SMART looks OK except for a ridiculous number of CRC errors (connection problems). An extended test passed, but that was some time ago.
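If you want to check that counter yourself from the Unraid terminal, here is a minimal sketch (assuming the disk currently shows up as /dev/sdX; substitute the real device letter from the Main page):

# Print the SMART attribute table; attribute 199 (UDMA_CRC_Error_Count)
# is the CRC/communication error counter that climbs with cabling problems.
smartctl -A /dev/sdX | grep -iE 'id#|crc'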

 

Emulated disk5 mounts but doesn't seem to have much data on it, if any.

 

Is disk5 supposed to be empty?

 

 


Thank you for checking the logs. 

 

I don't recall what the status of disk5 was before I noticed the issue. It was never set up to be empty. It's showing 27GB of data on it (which is pretty low, since this drive has been in the array since the beginning).

 

Can you please advise what I can do now?

 

 

2 minutes ago, trurl said:

Did you ever reformat the disk?

 

No, I never reformatted the drive after adding it to the array.

 

Here is the result:

root@homeNAS:~# ls -lah /mnt/disk5
total 0
drwxrwxrwx  2 nobody users   6 Jul  7 11:41 ./
drwxr-xr-x 15 root   root  300 Jul  7 11:31 ../

 

12 minutes ago, munimisu said:

Here is the result

So the disk is empty. Did you move the data off of it?

 

7 minutes ago, JorgeB said:

Disk was already disabled at boot

How did you decide that? The syslog goes back 3 weeks.

 

Also, what about these in syslog?

Jul 20 04:53:37 homeNAS kernel: sd 4:0:0:0: attempting task abort!scmd(0x00000000d321d4dd), outstanding for 7226 ms & timeout 7000 ms
Jul 20 04:53:37 homeNAS kernel: sd 4:0:0:0: [sdc] tag#9294 CDB: opcode=0x85 85 06 20 00 d8 00 00 00 00 00 4f 00 c2 00 b0 00
Jul 20 04:53:37 homeNAS kernel: scsi target4:0:0: handle(0x001c), sas_address(0x500c04f2f3186921), phy(33)
Jul 20 04:53:37 homeNAS kernel: scsi target4:0:0: enclosure logical id(0x500c04f2f3186900), slot(0) 
Jul 20 04:53:37 homeNAS kernel: sd 4:0:0:0: device_block, handle(0x001c)
Jul 20 04:53:38 homeNAS kernel: sd 4:0:0:0: task abort: SUCCESS scmd(0x00000000d321d4dd)
Jul 20 04:53:38 homeNAS kernel: sd 4:0:0:0: device_unblock and setting to running, handle(0x001c)
Jul 20 04:53:38 homeNAS emhttpd: read SMART /dev/sdc
Jul 20 04:53:40 homeNAS emhttpd: spinning down /dev/sdk
Jul 20 04:53:43 homeNAS emhttpd: spinning down /dev/sdd
Jul 20 04:54:07 homeNAS kernel: sd 4:0:9:0: attempting task abort!scmd(0x00000000867a2e94), outstanding for 7444 ms & timeout 7000 ms
Jul 20 04:54:07 homeNAS kernel: sd 4:0:9:0: [sdk] tag#9322 CDB: opcode=0x85 85 06 20 00 d8 00 00 00 00 00 4f 00 c2 00 b0 00
Jul 20 04:54:07 homeNAS kernel: scsi target4:0:9: handle(0x001b), sas_address(0x500c04f2f3186920), phy(32)
Jul 20 04:54:07 homeNAS kernel: scsi target4:0:9: enclosure logical id(0x500c04f2f3186900), slot(1) 
Jul 20 04:54:07 homeNAS kernel: sd 4:0:9:0: device_block, handle(0x001b)
Jul 20 04:54:08 homeNAS kernel: sd 4:0:9:0: task abort: SUCCESS scmd(0x00000000867a2e94)
Jul 20 04:54:08 homeNAS kernel: sd 4:0:9:0: device_unblock and setting to running, handle(0x001b)
Jul 20 04:54:08 homeNAS emhttpd: read SMART /dev/sdk
Jul 20 04:54:10 homeNAS emhttpd: spinning down /dev/sdh
Jul 20 04:54:10 homeNAS emhttpd: spinning down /dev/sde
Jul 20 04:54:47 homeNAS kernel: sd 4:0:3:0: attempting task abort!scmd(0x000000003c610d58), outstanding for 7043 ms & timeout 7000 ms
Jul 20 04:54:47 homeNAS kernel: sd 4:0:3:0: [sde] tag#9280 CDB: opcode=0x85 85 06 20 00 d8 00 00 00 00 00 4f 00 c2 00 b0 00
Jul 20 04:54:47 homeNAS kernel: scsi target4:0:3: handle(0x0015), sas_address(0x500c04f2f3186919), phy(25)
Jul 20 04:54:47 homeNAS kernel: scsi target4:0:3: enclosure logical id(0x500c04f2f3186900), slot(5) 
Jul 20 04:54:47 homeNAS kernel: sd 4:0:3:0: task abort: SUCCESS scmd(0x000000003c610d58)
Jul 20 04:54:50 homeNAS emhttpd: read SMART /dev/sde
Jul 20 04:55:07 homeNAS kernel: sd 4:0:6:0: attempting task abort!scmd(0x0000000019c137e4), outstanding for 7348 ms & timeout 7000 ms
Jul 20 04:55:07 homeNAS kernel: sd 4:0:6:0: [sdh] tag#9310 CDB: opcode=0x85 85 06 20 00 d8 00 00 00 00 00 4f 00 c2 00 b0 00
Jul 20 04:55:07 homeNAS kernel: scsi target4:0:6: handle(0x0018), sas_address(0x500c04f2f318691d), phy(29)
Jul 20 04:55:07 homeNAS kernel: scsi target4:0:6: enclosure logical id(0x500c04f2f3186900), slot(7) 
Jul 20 04:55:07 homeNAS kernel: sd 4:0:6:0: task abort: SUCCESS scmd(0x0000000019c137e4)

 

 

6 minutes ago, munimisu said:

I did exclude this disk from all shares. Didn't do anything manually.  

That wouldn't affect any files already on the disk. Was the disk ever shown as unmountable? That would cause Unraid to list it as a disk to be formatted along with any other disks that actually needed formatting, but you would have to agree to format them.

8 minutes ago, trurl said:

That wouldn't affect any files already on the disk. Was the disk ever shown as unmountable? That would cause Unraid to list it as a disk to be formatted along with any other disks that actually needed formatting, but you would have to agree to format them.

I didn't do any formatting recently; the server had been in the same state for many months until this recent disk issue.

 

3 minutes ago, trurl said:

How long ago was this disk installed?

I think it's been around 3 years.

11 minutes ago, munimisu said:

this recent disk issue

As noted, the disk had been disabled for at least 3 weeks, maybe longer.

 

Do you have Notifications set up to alert you by email or other agent as soon as a problem is detected?

 

Do you really want to remove the disk, or do you just want to enable it again? Either way is a rebuild: rebuild disk5, or rebuild parity without it.
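For background on why both options are "a rebuild", here is a toy illustration only (made-up sample bytes, not Unraid's actual code): single parity is a bitwise XOR across the data disks, so the missing disk can be recomputed from parity plus the surviving disks, or parity can be recomputed over the remaining disks if the slot is dropped.

d1=0xA5; d2=0x3C; d5=0x5F                  # sample bytes from three data disks
parity=$(( d1 ^ d2 ^ d5 ))                 # what the parity disk stores for this position
printf 'rebuild disk5 from parity:    0x%02X\n' $(( parity ^ d1 ^ d2 ))   # equals d5
printf 'rebuild parity without disk5: 0x%02X\n' $(( d1 ^ d2 ))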

22 minutes ago, trurl said:

As noted, the disk had been disabled for at least 3 weeks, maybe longer.

 

Do you have Notifications set up to alert you by email or other agent as soon as a problem is detected?

Also worth noting that with single parity and one disk disabled you have been running with no redundancy all that time. 

25 minutes ago, trurl said:

Do you have Notifications set up to alert you by email or other agent as soon as a problem is detected?

 

Unfortunately, no. I periodically check whether there are any issues or updates pending. I will set up notifications.

 

25 minutes ago, trurl said:

Do you really want to remove the disk, or do you just want to enable it again? Either way is a rebuild: rebuild disk5, or rebuild parity without it.

I don't want to remove the disk if there is no issue with it.

Can you advise whether I've got this right:

1. Stop array
2. Unassign disk5
3. Rebuild parity without disk5
4. After the parity build is complete, stop array
5. Assign disk5 again
6. Rebuild parity

 

13 minutes ago, trurl said:

Also worth noting that with single parity and one disk disabled you have been running with no redundancy all that time. 

Noted. I have a disk to add as a second parity; I've just been putting it off. I will do that now and set up notifications as well. Thanks!

5 minutes ago, munimisu said:

1. Stop array
2. Unassign disk5
3. Rebuild parity without disk5
4. After the parity build is complete, stop array
5. Assign disk5 again
6. Rebuild parity

No reason to rebuild parity twice. And you can't rebuild parity without the disk unless you do a New Config, so the plan needs some correction.

 

Do you want to see if there is anything on the physical disk?
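If you do want to look at the physical disk, here is a minimal sketch from the terminal (assuming the drive is reachable again and shows up as /dev/sdX with its data partition at /dev/sdX1; substitute the real device, and keep the mount read-only so nothing on the disk is changed):

mkdir -p /mnt/check                  # temporary mount point, outside the array's own mounts
mount -o ro /dev/sdX1 /mnt/check     # read-only mount of the physical partition
ls -lah /mnt/check                   # check whether any files are actually present
umount /mnt/check                    # unmount when done

(The Unassigned Devices plugin is the usual GUI route for this kind of read-only check.)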

