EMPTY and Remove a Drive Without Losing Parity



  • 7 months later...

....

If you really want to remove a drive and leave parity undisturbed, this will work:

1. Start array in Maintenance mode.  This ensures no file systems are mounted.

2. Identify which disk you're removing; let's say it's disk3.  Take a screenshot.

3. From the command line type:

dd bs=1M if=/dev/zero of=/dev/md3   <-- 'md3' here corresponds to 'disk3'

4. Go to bed because this will take a long time.

5. When the command completes, Stop array, go to Utils page, click 'New Config' and execute that Utility.

6. Go back to Main, assign Parity, and all devices except the one you just cleared.

7. Click the checkbox "Parity is already valid", and click Start.
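Step 3 can be rehearsed safely on a throwaway file before touching a real /dev/mdX device. A minimal sketch (the /tmp path and 8 MiB size are made up for illustration; on the real device you omit `count=` and let dd stop at the end of the device):

```shell
# Rehearse the zeroing step on a scratch file instead of /dev/md3.
# On a real array, /dev/md3 is the parity-protected device for disk3,
# so zeros written through it also update parity as they go.
truncate -s 8M /tmp/fake-md3    # stand-in for the target device

# count=8 bounds the write for this rehearsal; against a real block
# device dd simply runs until "No space left on device", which is the
# normal, expected way for it to finish.
dd bs=1M count=8 if=/dev/zero of=/tmp/fake-md3 2>/dev/null

# Confirm the target now reads back as nothing but zero bytes:
nonzero=$(tr -d '\0' < /tmp/fake-md3 | wc -c)
echo "non-zero bytes remaining: $nonzero"
rm /tmp/fake-md3
```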

 

Any code changes I make will just refine the above process.  Here are several I can think of:

a) Add ability to mark a disk "offline".  This lets us Start the array normally in step 1 so the server is not down during the process.

b) Add code to put the array in a mode where all writes are "reconstruct writes" vs. "read-modify-writes".  This requires all the drives to be spun up during the clearing process but would probably let step 3 run 3x faster.

c) Add an explicit "Clear the offline disk and remove from array when done" button.
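For anyone curious why reconstruct writes can be faster, the two parity-update strategies for single (XOR) parity can be sketched in a few lines of Python. The byte values here are made up for illustration:

```python
# Single-parity model: the parity byte is the XOR of the data bytes
# at the same offset on every data disk.
disks = [0b1010, 0b0110, 0b0011]   # three data disks, one offset
parity = disks[0] ^ disks[1] ^ disks[2]

new_val = 0b1111                    # value to write to disk 1

# Read-modify-write: read the old data and old parity (2 reads +
# 2 writes on just those two drives), XOR the old value out and
# the new value in.
rmw_parity = parity ^ disks[1] ^ new_val

# Reconstruct write ("turbo write"): read ALL the other data disks
# and recompute parity from scratch -- every drive must be spun up,
# but each drive does a single streaming pass.
rw_parity = disks[0] ^ new_val ^ disks[2]

assert rmw_parity == rw_parity      # both strategies agree
```

Either way the resulting parity is identical; the difference is purely which drives have to be read to compute it.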

...

 

Sorry for resurrecting this thread. I was linked to this thread from another post about how to remove one of my drives, but doing so seems to have lost me half a TB of data, and I don't even know what, grr :(

Is there anything I can do now?

Screenshots are attached for both before and after; obviously you can see that the used space went from 16.7 TB to 16.1 TB. I tried clicking New Config again and mounting the old drive, but that comes up as "unmountable". I'm pretty sure I have to run the parity check again, shouldn't I? Obviously the parity won't be valid.

Here is the console output from the executed command:

root@karie:~# dd bs=1M if=/dev/zero of=/dev/md4
dd: error writing ‘/dev/md4’: No space left on device
953870+0 records in
953869+0 records out
1000204853248 bytes (1.0 TB) copied, 22028.4 s, 45.4 MB/s
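The numbers in that dd summary are self-consistent: dividing the bytes written by the elapsed time reproduces the reported rate, and shows a 1 TB device was zeroed end to end over roughly six hours:

```python
# Figures taken from the dd summary above.
bytes_written = 1000204853248
elapsed_s = 22028.4

rate_mb_s = bytes_written / elapsed_s / 1e6   # decimal MB/s, as dd reports
hours = elapsed_s / 3600                       # ~6.1 hours of clearing

print(f"{bytes_written / 1e12:.1f} TB at {rate_mb_s:.1f} MB/s")
# → 1.0 TB at 45.4 MB/s
```

The final "No space left on device" message is expected: it is just dd reporting that it reached the end of the block device.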


Sorry, can't seem to attach images; here is a link:

 

To where did you move the data from the disk you removed?

haha, nowhere? :(

 

The more I think about it, the more I feel like maybe there is a missing step, haha. I simply followed the quoted instructions above; I assumed the console command ("dd bs=1M if=/dev/zero of=/dev/md3") copied the data from that drive to the rest of the array.

 

I'm guessing there is another step I should have done before all this? Can someone please explain how/what I should have done?

 

Also, is there any way for me to recover the missing data? The old parity might still be valid.

 

PS: I'm reallllly new to this, I'm just getting the hang of unRAID.



The 'dd' command zeroes the drive (thus erasing any contents) and updates parity appropriately. Once a drive has been completely zeroed, its presence in the array is irrelevant as far as parity is concerned.

 

If you wanted to keep the drive's contents then you should have copied them elsewhere before zeroing the drive.

 

Note that the procedure you followed is unofficial. There is no officially supported way to remove a drive while keeping parity valid (and thus the array protected).
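The "irrelevant as far as parity is concerned" point falls straight out of XOR: a disk that is all zeros contributes nothing to the parity calculation, so it can be dropped from the array without changing parity. A toy model with made-up byte values:

```python
from functools import reduce
from operator import xor

# Parity byte = XOR of the data bytes across all data disks.
disks = [0x5A, 0xC3, 0x0F]
parity = reduce(xor, disks)

# Zero one disk; parity is updated as dd writes (old value XORed out).
parity ^= disks[2]
disks[2] = 0x00

# The zeroed disk now contributes nothing: parity over the remaining
# disks is identical, so the drive can be removed with parity intact.
assert parity == reduce(xor, disks)        # with the zeroed disk
assert parity == reduce(xor, disks[:2])    # without it
```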


I can see why you made the mistake: this thread continues the discussion from a previous one, and nowhere in those steps does it mention that you have to copy/move all data off the disk you want to remove.

 

AFAIK it will be impossible to recover any data: the disk was filled with zeros, so nothing can be recovered by normal utilities, and parity was updated during the process, so a rebuild would give the same result.
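The "a rebuild would give the same result" point can also be seen in the XOR model: once parity has been updated for the zeroed disk, reconstructing that disk from parity and the other disks can only ever yield zeros again. A toy sketch with made-up values:

```python
from functools import reduce
from operator import xor

disks = [0x5A, 0xC3, 0x0F]       # made-up data bytes
parity = reduce(xor, disks)

# dd zeroes disk 2 and updates parity along the way:
parity ^= disks[2]
disks[2] = 0x00

# A rebuild reconstructs a missing disk as parity XOR all the other
# disks -- which, after the zeroing, produces exactly the zeros that
# were written:
rebuilt = parity ^ disks[0] ^ disks[1]
print(hex(rebuilt))
# → 0x0
```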

 


... I assumed the console command ("dd bs=1M if=/dev/zero of=/dev/md3") copies the data from that drive to the rest of the array.

Not trying to pile on here, but for future reference, don't do anything at the command line without some idea of what it actually does. Linux can be pretty cryptic, tending to have very powerful commands represented by 2 or 3 letters. I don't know nearly as much as many here.

 

Google is your friend.


What's done is done, but maybe someone should add a bold/red disclaimer at the top of the first post saying that the instructions are incomplete? Because I dug up this thread, more people might see it and assume it's a complete set of instructions. My bad for bringing this to the top :(



I've added a warning to the posts, both Tom's original post and the copy in the first post, in the hope no one else will lose data.


:(

 

This should reinforce my strong opinion that requiring users to use dd is ALWAYS a bad idea

Haha, I agree.

 

Is there any reason why the UI simply doesn't have an "empty drive" button which sets the drive to some read-only mode and offloads all the drive's contents to the other drives in the array? Once emptied, an "update parity and remove" button which essentially runs that dd command then unmounts.


Shrinking the array is not something often requested, so it hasn't been implemented. And there are documented ways to accomplish it without resorting to the command line, one of which I linked you to when you asked about that "advanced" method.

 

I think other things have taken priority and this functionality is probably pretty far down on the wish list. There is not even an official method to back up your data. Limetech is a pretty small company, maybe only a half dozen employees, so they have to decide what's important to spend their resources on.

  • 3 months later...

As someone put it earlier, I'm "kicking the ants' nest one more time"!  But I've developed a script to make drive clearing safe, so hopefully this may be the second-to-last time the nest gets kicked (the last being when LimeTech builds the feature into unRAID!).

 

  clear an array drive script

 

It's designed to be as bulletproof as I could make it, and as far as I know it cannot cause any data loss.  It has kludges in it, because the User Scripts plugin cannot currently interact with the user.  So it requires the user to clear all data off the drive (and I verify that), then put back a single folder named exactly clear-me, a marker that I test for.  The linked post describes it further.
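The marker-folder safety test can be sketched in a few lines of Python. This is a simplified, hypothetical stand-in for the script's actual logic (the function name and semantics here are mine; the real script does more checking):

```python
import os

def safe_to_clear(mount_point: str) -> bool:
    """Approve clearing only if the disk contains exactly one entry:
    an empty folder named 'clear-me'. Anything else on the disk
    blocks the destructive operation."""
    entries = os.listdir(mount_point)
    if entries != ["clear-me"]:
        return False
    marker = os.path.join(mount_point, "clear-me")
    return os.path.isdir(marker) and not os.listdir(marker)
```

The point of the design is that a stray file, a mistyped marker name, or a non-empty marker folder all make the check fail, so the clear can never run against a disk that still holds data.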

 

I've updated the Shrink array wiki page to include a new method for removing a drive by clearing it first, while maintaining parity.  It's the second method listed there.  It is of course based on the work and discussion of this thread.

  • 5 months later...

I am just about to clear a drive as described in the wiki's Shrink array page.

But the writing seems oddly slow to me at 2.1 MB/s; is this normal?

 

No, the write speed should be the same as any other write to that disk according to the write method selected; it's recommended to turn on turbo write to increase the clearing speed.


I did set Tunable (md_write_method) to reconstruct write as mentioned in the wiki, under Settings -> Disk Settings.

 

Edit:

Okay, I stopped the process and started all over again, except this time setting Tunable (md_write_method) to auto. From what I see now, it writes the zeros to disk at approx. 65 MB/s.



If the normal write method is performing better than reconstruct, that would suggest an issue with another drive, since reconstruct reads all drives while normal only uses the disk being written and parity.

 

Post your diagnostics.


Okay, I have not had any speed problems while copying stuff around to reformat several drives to xfs.

Attached you'll find the diagnostics; hopefully nothing serious.

 

There's an intermittent problem with the parity disk, most likely a bad SATA cable/power connector:

 

Feb  1 15:20:46 HTMS kernel: ata5.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen
Feb  1 15:20:46 HTMS kernel: ata5.00: irq_stat 0x08000000, interface fatal error
Feb  1 15:20:46 HTMS kernel: ata5: SError: { UnrecovData Handshk }
Feb  1 15:20:46 HTMS kernel: ata5.00: failed command: WRITE DMA EXT
Feb  1 15:20:46 HTMS kernel: ata5.00: cmd 35/00:00:80:db:c2/00:04:81:00:00/e0 tag 20 dma 524288 out
Feb  1 15:20:46 HTMS kernel:         res 50/00:00:7f:db:c2/00:00:81:00:00/e0 Emask 0x10 (ATA bus error)
Feb  1 15:20:46 HTMS kernel: ata5.00: status: { DRDY }
Feb  1 15:20:46 HTMS kernel: ata5: hard resetting link
Feb  1 15:20:46 HTMS kernel: ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Feb  1 15:20:46 HTMS kernel: ata5.00: configured for UDMA/133
Feb  1 15:20:46 HTMS kernel: ata5: EH complete
Feb  1 16:18:45 HTMS kernel: ata5.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen
Feb  1 16:18:45 HTMS kernel: ata5.00: irq_stat 0x08000000, interface fatal error
Feb  1 16:18:45 HTMS kernel: ata5: SError: { UnrecovData Handshk }
Feb  1 16:18:45 HTMS kernel: ata5.00: failed command: WRITE DMA EXT
Feb  1 16:18:45 HTMS kernel: ata5.00: cmd 35/00:40:20:bd:01/00:05:29:00:00/e0 tag 9 dma 688128 out
Feb  1 16:18:45 HTMS kernel:         res 50/00:00:1f:e5:01/00:00:29:00:00/e0 Emask 0x10 (ATA bus error)
Feb  1 16:18:45 HTMS kernel: ata5.00: status: { DRDY }
Feb  1 16:18:45 HTMS kernel: ata5: hard resetting link
Feb  1 16:18:45 HTMS kernel: ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Feb  1 16:18:45 HTMS kernel: ata5.00: configured for UDMA/133
Feb  1 16:18:45 HTMS kernel: ata5: EH complete
Feb  1 16:38:37 HTMS kernel: ata5.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen
Feb  1 16:38:37 HTMS kernel: ata5.00: irq_stat 0x08000000, interface fatal error
Feb  1 16:38:37 HTMS kernel: ata5: SError: { UnrecovData Handshk }
Feb  1 16:38:37 HTMS kernel: ata5.00: failed command: WRITE DMA EXT
Feb  1 16:38:37 HTMS kernel: ata5.00: cmd 35/00:40:60:21:51/00:05:09:01:00/e0 tag 13 dma 688128 out
Feb  1 16:38:37 HTMS kernel:         res 50/00:00:5f:21:51/00:00:09:01:00/e0 Emask 0x10 (ATA bus error)
Feb  1 16:38:37 HTMS kernel: ata5.00: status: { DRDY }
Feb  1 16:38:37 HTMS kernel: ata5: hard resetting link
Feb  1 16:38:37 HTMS kernel: ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Feb  1 16:38:37 HTMS kernel: ata5.00: configured for UDMA/133
Feb  1 16:38:37 HTMS kernel: ata5: EH complete


Thanks for that hint.

What should I do now? Directly stop everything and try to fix it?

There are no errors shown in the unRAID WebGUI for the parity disk.

 

Also, if the parity disk were causing the low write speed, shouldn't the write speed be bad no matter what write method is used?

 

Btw. the diagnostics were captured while Tunable (md_write_method) was set to auto.

