Removing a drive with no replacement



One of my drives has failed, and I want to remove it from the array. Since the array is 8TB (1+1+2+4) in size with 2.5TB used, the plan is to simply remove the failed drive and not replace it, reducing the size of the array from 8TB to 7TB.

 

The wiki (and other resources) explain what to do to replace a failed drive, but it's not clear if this will work when simply removing a drive, given we want to use parity to restore to existing drives, not a new one. 

 

The guide for shrinking an array seems to imply that data on the existing drive will be lost, and needs to be manually copied back to the array but it's not clear to me what the process is to do that. Particularly these two statements seem contradictory -- I'm reading them incorrectly presumably:

  • method does not keep the drive's data within the array. If the drive to be removed has data you want to stay in the array, you must move it yourself to the other data drives.
  • method should be used if you need to preserve the contents of the data drive you are removing from the array.

 

What's the process to prevent parity being rebuilt, but rather use parity to restore to existing drives?

 

Edited by Nepherim
Added shrinkage...

You must rebuild parity if you remove a drive. There is no way to rebuild the data from one data slot to other disks in the array, only to another disk in the same slot. Parity is only about rebuilding a disk and knows nothing about any of the data.
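To illustrate why parity can only rebuild a disk into its own slot, here is a minimal sketch, assuming single parity computed as a bytewise XOR across the data slots. The disk contents and the helper name xor_blocks are made up for illustration; the point is that parity only sees raw bytes per slot, not files or folders.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data slots; parity sees raw bytes, not files or folders.
disk1 = b"\x10\x20\x30\x40"
disk2 = b"\x0a\x0b\x0c\x0d"   # the slot that later fails
disk3 = b"\xff\x00\xff\x00"

parity = xor_blocks([disk1, disk2, disk3])

# Rebuild: XOR parity with every surviving slot to get the missing slot back.
rebuilt_disk2 = xor_blocks([parity, disk1, disk3])
print(rebuilt_disk2 == disk2)  # True
```

The result is a byte-for-byte image of the missing slot, which is why it has nowhere to go except another disk in that same slot.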

 

Go to Tools - Diagnostics and attach the complete diagnostics zip file to your NEXT post and we can give you other ideas on how to proceed.

25 minutes ago, trurl said:

There is no way to rebuild the data from one data slot to other disks in the array, only to another disk in the same slot. Parity is only about rebuilding a disk and knows nothing about any of the data.

Sounds like the process would be to disable parity (just to prevent a rebuild whilst copying data), remove the failing drive, mount it as unassigned, and try to copy the data from the failing unassigned drive to the array drives. Then restart the array and allow parity to rebuild.

 

Diagnostics are attached.

unraid-diagnostics-20200430-1759.zip


As a bit of additional context, the drive failed midway through a process of backing up some key folders to a drive not in the array.

 

At the moment I'm letting the backup continue with the data on the failed drive being emulated by parity. That way I'll have the data on a non-array drive in case the failed drive is unreadable.


Disk2 is one of the very worst SMART reports I have ever seen. It is pretty old though. Did this happen all at once, or have you just been ignoring this problem? Do you have Notifications set up?

 

1 minute ago, Nepherim said:

As a bit of additional context, the drive failed midway through a process of backing up some key folders to a drive not in the array.

 

At the moment I'm letting the backup continue with the data on the failed drive being emulated by parity. That way I'll have the data on a non-array drive in case the failed drive is unreadable.

This seems like a good plan, and in fact, what I usually recommend instead of trying to shuffle files from the emulated disk to other disks in the array when you are already unprotected.

 

41 minutes ago, Nepherim said:

Sounds like the process would be to disable parity (just to prevent a rebuild whilst copying data), remove the failing drive, mount it as unassigned, and try to copy the data from the failing unassigned drive to the array drives. Then restart the array and allow parity to rebuild.

Once you have the backup, might as well New Config without the disk, reassign disks if you want to fill in that gap left by the missing disk2 (making sure you don't change parity assignment, of course), and go ahead and rebuild parity. Then everything is protected again and then you can copy the data from that backup.

35 minutes ago, trurl said:

Disk2 is one of the very worst SMART reports I have ever seen. It is pretty old though. Did this happen all at once, or have you just been ignoring this problem? Do you have Notifications set up?

The drive was originally being used as a normal drive in a different machine. It did start showing some errors, which prompted me to start using Unraid. At that point I reformatted the drive thinking that might avoid whatever bad parts of the drive there might be. I then used it outside Unraid for a bit, and it "seemed" okay -- although to be honest I'd really have no way of knowing. I then decided it seemed good, so put it in the array. I did see a few notifications, but it's only been in the array for a week or so, and I didn't really have a way of assessing if they were danger notifications or just FYI types. Clearly danger notifications :)

 

tl;dr: Probably been failing for a while without me realizing.

 

41 minutes ago, trurl said:

Once you have the backup, might as well New Config without the disk, reassign disks if you want to fill in that gap left by the missing disk2 (making sure you don't change parity assignment, of course), and go ahead and rebuild parity. Then everything is protected again and then you can copy the data from that backup.

Can the final step be done by mounting the backup drive (or worst case the failed drive) as Unassigned and using rsync or Konqueror to copy the data to the array?

9 minutes ago, Nepherim said:

I did see a few notifications, but it's only been in the array for a week or so, and didn't really have a way of assessing if they were danger notifications or just fyi types. Clearly danger notifications :)

If you don't know whether or not a notification is something to worry about, it is something to worry about and you should ask. That disk should be showing you a SMART warning on the Dashboard, and if you click on the disk in Main and go to its Attributes, you will see several highlighted.

 

You should set up Notifications to alert you immediately by email or other agent when Unraid detects a problem.

 

12 minutes ago, Nepherim said:

Can the final step be done by mounting the backup drive (or worst case the failed drive) as Unassigned and using rsync or Konqueror to copy the data to the array?

Yes. I assume you meant Krusader there. Have you used it with an Unassigned Device before?

39 minutes ago, trurl said:

If you don't know whether or not a notification is something to worry about, it is something to worry about and you should ask. That disk should be showing you a SMART warning on the Dashboard, and if you click on the disk in Main and go to its Attributes, you will see several highlighted.

 

You should set up Notifications to alert you immediately by email or other agent when Unraid detects a problem.

Before the failure, I did see some 'sectors being reallocated' info warnings, and read that as an informational alert saying action had been taken. Is there a way to differentiate a few sectors being reallocated vs a lot of sectors indicating problems? Or are all reallocates cause for the same level of concern? Email alerts make sense, but only if they are user actionable.

 

39 minutes ago, trurl said:

Yes. I assume you meant Krusader there. Have you used it with an Unassigned Device before?

Yes, I meant Krusader. I've not used it with Unassigned Devices, but when I created the docker I did set up an Unassigned folder mapping. Not yet tested that -- will wait for the backup to complete first.

 

Edited by Nepherim
1 hour ago, Nepherim said:

Before the failure, I did see some 'sectors being reallocated' info warnings, and read that as an informational alert saying action had been taken. Is there a way to differentiate a few sectors being reallocated vs a lot of sectors indicating problems? Or are all reallocates cause for the same level of concern? Email alerts make sense, but only if they are user actionable.

If a reallocation event happens once, and never again, it's not a concern. If it repeats again, it's time to pay attention. 3rd event, plan a replacement. If you get several reallocation events back to back, prepare to lose the drive, probably sooner rather than later.

 

The number of sectors is a factor, but the rate of increase is what is most concerning. Some drives reallocate a dozen or so sectors, then stay quiet for years.
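As a rough sketch of that rule of thumb, the function below classifies a history of Reallocated_Sector_Ct raw values. The function name, the idea of sampling the raw value over time, and the reading of "back to back" as a rise in each of the two latest readings are all assumptions made for illustration, not anything Unraid does itself.

```python
def assess_reallocations(raw_counts):
    """Classify a history of Reallocated_Sector_Ct raw values (oldest first)."""
    # An "event" here is any reading where the count rose above the previous one.
    rises = [later > earlier for earlier, later in zip(raw_counts, raw_counts[1:])]
    events = sum(rises)
    # Assumption: "back to back" means the count rose in both of the latest readings.
    back_to_back = len(rises) >= 2 and rises[-1] and rises[-2]

    if back_to_back:
        return "prepare to lose the drive"
    if events >= 3:
        return "plan a replacement"
    if events == 2:
        return "pay attention"
    return "not a concern"

# A drive that reallocated a dozen sectors once and then stayed quiet.
print(assess_reallocations([0, 12, 12, 12, 12]))
# A drive whose count keeps climbing from one reading to the next.
print(assess_reallocations([0, 12, 40, 95]))
```

Note that the sketch looks at how often the count rises rather than at its absolute size, in line with the rate of increase being the main concern.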

13 hours ago, trurl said:

Once you have the backup, might as well New Config without the disk, reassign disks if you want to fill in that gap left by the missing disk2 (making sure you don't change parity assignment, of course), and go ahead and rebuild parity. Then everything is protected again and then you can copy the data from that backup.

The backup drive being used is LUKS encrypted. Is this procedure the right way to mount a LUKS drive as Unassigned?

 

Also, when I get to the point where I want to restore the files from the failed drive back to the array it seems like the only practical approach is to restore from a copy of the failed drive if it's still readable. Any other approach is going to require some means of identifying what partial sub-(sub-sub...) directory file content was on the failed drive. What is the best approach to identifying and then restoring the files from the failed drive? 

16 hours ago, trurl said:

Once you have the backup, might as well New Config without the disk, reassign disks if you want to fill in that gap left by the missing disk2 (making sure you don't change parity assignment, of course), and go ahead and rebuild parity. Then everything is protected again and then you can copy the data from that backup.

Here's where I'm at:

  • Made a backup of key folders to an off-Unraid drive, using parity to emulate the failed disk.
  • Mounted a LUKS USB drive as Unassigned Device, and from terminal did an rsync of /mnt/disk2 (failed drive) to /mnt/disks/<<USB-DRIVE>>.
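Before removing the disk, it may be worth checking that the copy is complete. The following is a minimal sketch, not part of the procedure above: the helper name missing_or_different is hypothetical, it only compares file presence and size (not contents), and the demo uses temporary folders standing in for /mnt/disk2 and the USB mount point.

```python
import tempfile
from pathlib import Path

def missing_or_different(source_root, copy_root):
    """Return relative paths of files absent from the copy or differing in size."""
    source_root, copy_root = Path(source_root), Path(copy_root)
    problems = []
    for src in sorted(source_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_root)
        dst = copy_root / rel
        if not dst.is_file() or dst.stat().st_size != src.stat().st_size:
            problems.append(rel.as_posix())
    return problems

# Demo with temporary folders standing in for the emulated disk and the USB copy.
with tempfile.TemporaryDirectory() as tmp:
    source, copy = Path(tmp, "disk2"), Path(tmp, "usb")
    (source / "Share1").mkdir(parents=True)
    (copy / "Share1").mkdir(parents=True)
    (source / "Share1" / "a.txt").write_text("aaa")
    (source / "Share1" / "b.txt").write_text("bbb")
    (copy / "Share1" / "a.txt").write_text("aaa")   # b.txt was never copied
    demo_problems = missing_or_different(source, copy)

print(demo_problems)  # ['Share1/b.txt']
```

On a real system the two arguments would be the emulated disk path and the mount point of the backup drive; an empty list means every file is present with a matching size.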

 

My next steps as above is to remove the failed drive, and then "New Config without the disk, reassign disks if you want to fill in that gap left by the missing disk2 (making sure you don't change parity assignment, of course), and go ahead and rebuild parity.".

 

The step after that is unclear. What is the process for restoring the files from the failed drive to the array? My confusion is that the failed drive has files from every high level share on the array. Can I simply use Krusader to copy the files from each high level directory on the failed-drive copy over to the appropriate Unraid share, and files will simply merge into the existing sub-directories? Or is it more nuanced than that?

 

One other thought here is I actually have a drive in the array which was recently added, and is not actually being used. Is there a sequence to use this drive somehow? Or is that more messing than is needed?

 

Edited by Nepherim
formatting

When copying from an Unassigned Device, it is up to you and how it serves your purposes whether you copy to disks in the parity array, or you copy to user shares.

 

The user shares are simply the aggregate of all top level folders on cache and array.

 

If you create a user share in the webUI, Unraid creates a top level folder named for the share on cache or array as needed in accordance with the settings for that user share.

 

If you create a top level folder on cache or array disk, that top level folder is automatically a user share with the same name as the folder, and it will have default settings until you change them.
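The aggregate view described above can be pictured with a small sketch. It is only an illustration of the idea, not how Unraid implements user shares: the helper name user_shares is hypothetical, and the temporary folders stand in for mount points such as /mnt/disk1, /mnt/disk3 and /mnt/cache.

```python
import tempfile
from pathlib import Path

def user_shares(disk_roots):
    """Map each top-level folder name to the disks that hold a part of it."""
    shares = {}
    for disk in map(Path, disk_roots):
        for entry in sorted(disk.iterdir()):
            if entry.is_dir():
                shares.setdefault(entry.name, []).append(disk.name)
    return shares

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: which top-level folders exist on which disk.
    layout = {"disk1": ["Share1", "Share2"], "disk3": ["Share1"], "cache": ["Share2"]}
    for disk, folders in layout.items():
        for folder in folders:
            Path(tmp, disk, folder).mkdir(parents=True)
    demo_shares = user_shares(Path(tmp, d) for d in ("disk1", "disk3", "cache"))

print(demo_shares)  # {'Share1': ['disk1', 'disk3'], 'Share2': ['disk1', 'cache']}
```

Each key is a user share, and its list shows which disks contribute a top level folder of that name.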

 

If you copy from an Unassigned Device to a user share, the data will wind up on cache or array disks as determined by the settings for that user share.

 

If you copy from an Unassigned Device to a path within a top level folder on an assigned disk (cache or array), then the data will wind up on that disk, but it will still be part of the user share named for that top level folder.

 

Note that you must not mix user shares and assigned disks when moving / copying files. This is because Linux doesn't understand that the user shares and the disks are the same files, so it can try to overwrite what it is trying to read if the source and destination paths work out that way.

 

But since Unassigned Devices are not part of the user shares, you can move/copy them with user shares or with assigned disks since there is no chance the paths will collide.


Appreciate your patience here, thanks for the help! My question is more oriented around the fact that Unraid may split folders within a share across multiple disks. That is typically transparent to Unraid users as Unraid handles all the splitting and shows just the share, regardless of which disk the data happens to be on.

 

In this case though I have a disk I'm reading from which contains parts of what was once on a share. So for example I have some files from Share1, some from Share2, etc., all on the 'failed' disk. Given there's around 1TB of files, I can't really pick and choose at the low level, so I need to basically grab all files on the unassigned drive under the Share1 directory and copy them to the array share Share1. In theory this simply merges data from the unassigned drive with the data in the array share. There should be no duplicate file names since this disk was originally part of the array. I just want to be sure that copying /failed disk/Share1/dir1/dir1a/* doesn't end up removing whatever existing files or data there might be in /Unraid/Share1/dir1 or even dir1a. Doesn't sound like it'll be an issue, but I'd rather be safe than sorry.


Copy the folders and files from the emulated disk somewhere off the array. Note that the top level folder(s) of that emulated disk are part of user shares with the same name as the folder. Your copy must preserve the same folder structure as was on that emulated disk so you will be able to put them back in the correct user shares. Sounds like you are done with that part.

 

You do a New Config without that disk so you can rebuild parity and get the rest of the array protected again. As soon as you start the array with a New Config, and before parity is rebuilt, the disk is no longer emulated. Any folders and files that were on the emulated disk are no longer in the array and so no longer in any user share. But the other assigned disks still contain their own folders and files, and their top level folders are still part of user shares.

 

2 hours ago, trurl said:

When copying from an Unassigned Device, it is up to you and how it serves your purposes

When copying the data back, you can use Krusader or rsync or whatever you want.

 

2 hours ago, trurl said:

whether you copy to disks in the parity array

If you take that copy you made, and you copy those top level folders back to assigned disk(s), then those top level folders will be merged if those same folders already exist on the assigned disk(s), or else they will be created on the assigned disk(s) if they don't already exist.

 

That is how you would do it if you wanted to put all of it on the assigned disk that isn't being used. That disk would then have those parts of those user shares that were on the emulated disk.
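The merge behaviour can be tried out safely on throwaway folders first. This is a minimal sketch using Python's shutil.copytree with dirs_exist_ok=True (Python 3.8+) rather than Krusader or rsync; the folder names backup and disk4 are stand-ins for the copy of the failed disk and an assigned array disk.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    backup = Path(tmp, "backup")   # stands in for the copy of the failed disk
    disk = Path(tmp, "disk4")      # stands in for an assigned array disk

    # The array disk already holds part of Share1.
    (disk / "Share1" / "dir1").mkdir(parents=True)
    (disk / "Share1" / "dir1" / "existing.txt").write_text("already on the array")

    # The backup holds a different part of the same share.
    (backup / "Share1" / "dir1" / "dir1a").mkdir(parents=True)
    (backup / "Share1" / "dir1" / "dir1a" / "restored.txt").write_text("from the failed disk")

    # dirs_exist_ok=True merges into existing folders instead of raising an error.
    shutil.copytree(backup / "Share1", disk / "Share1", dirs_exist_ok=True)

    merged = sorted(p.relative_to(disk).as_posix() for p in disk.rglob("*") if p.is_file())

print(merged)  # ['Share1/dir1/dir1a/restored.txt', 'Share1/dir1/existing.txt']
```

Existing files are left in place and new ones are added alongside them. A file with the same relative path on both sides would be overwritten by the copy, so the merge is only non-destructive when the names do not collide.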

 

2 hours ago, trurl said:

or you copy to user shares.

If you take that copy you made, and you copy the contents of each of those top level folders to a user share by that same name, then those contents will be put in that user share and Unraid will decide which assigned disk to actually write them to according to the settings for that user share.

 

 

 

On 5/1/2020 at 3:33 PM, trurl said:

If you take that copy you made, and you copy those top level folders back to assigned disk(s), then those top level folders will be merged if those same folders already exist on the assigned disk(s), or else they will be created on the assigned disk(s) if they don't already exist.

 

That is how you would do it if you wanted to put all of it on the assigned disk that isn't being used. That disk would then have those parts of those user shares that were on the emulated disk.

This was the part I wasn't understanding at the start, and is the key. Basically a "share" is represented by a top level folder on any of the disks in the array. It doesn't matter how that top-level folder gets there, either as a copy onto the array through a 'share', or direct to the disk via Krusader or rsync.

 

Everything is now back up and running, with none of the many warnings previously seen. Telegram notifications have been set up so this time I'll actually see them! Thanks trurl for your help and patience.

 

On 4/30/2020 at 6:59 PM, trurl said:

Disk2 is one of the very worst SMART reports I have ever seen. It is pretty old though. Did this happen all at once, or have you just been ignoring this problem? Do you have Notifications set up?

Out of interest what are you looking at which tells you this -- I'd like to know what to look for.

19 minutes ago, Nepherim said:

Out of interest what are you looking at which tells you this -- I'd like to know what to look for.

You should have seen SMART warnings on the Dashboard for that disk, and Unraid would also have been notifying you about those SMART warnings if you had Notifications set up. The SMART attributes monitored by default are configured in Settings - Disk Settings. You can also configure each disk to override those global settings by clicking on the disk to get to its page.

ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  5 Reallocated_Sector_Ct   PO--CK   148   148   140    -    987
197 Current_Pending_Sector  -O--CK   001   001   000    -    65491
198 Offline_Uncorrectable   ----CK   198   198   000    -    931
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0

Pending is so large that I wonder if that number isn't some glitch since it is very close to the maximum 16-bit number.
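A quick check of how close that raw value sits to the 16-bit ceiling, as plain arithmetic on the number in the report above:

```python
UINT16_MAX = 2**16 - 1           # 65535, the largest unsigned 16-bit value
pending_raw = 65491              # Current_Pending_Sector raw value from the report

gap = UINT16_MAX - pending_raw
print(gap)                       # 44
print(f"{pending_raw:#06x}")     # 0xffd3
print(pending_raw - 2**16)       # -45 if the same bits are read as a signed 16-bit value
```

Being only 44 below the maximum is consistent with the suspicion of a glitch, though it does not prove one.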
