Rookie Mistake - Removed too many disks at once


Solved by JorgeB

I think I made a huge mistake in replacing 3 drives at once in an array of 6 disks and 1 parity drive. Here are the steps I took:

  1. took a screenshot of the disks in the functioning array
  2. used Unbalance to move data off the 3 disks to be replaced
  3. powered down the array and server
  4. replaced the 3 disks in the array
  5. powered on the server
  6. did a New Config, which wiped out the parity disk
  7. assigned the 3 new disks
  8. started the array; it began rebuilding parity, but the server crashed

 

I wanted to get some advice on next steps. Based on some research on these forums and Reddit, I think my next steps are to unassign the 3 new drives and let the parity rebuild. While parity is rebuilding, prepare the new disks by preclearing them. When the parity rebuild and preclears are complete, add and assign the new drives. Is that right?

Edited by soysauce10: added a missing step
24 minutes ago, soysauce10 said:

I think my next steps are to unassign the 3 new drives and let the parity rebuild. While parity is rebuilding, prepare the new disks by preclearing them. When the parity rebuild and preclears are complete, add and assign the new drives. Is that right?

None of that should matter. What about the original disks? What were your plans for that data?

57 minutes ago, JorgeB said:

Doing a parity sync should not crash the server; please post the diagnostics.

Sorry for not providing them earlier.

 

53 minutes ago, trurl said:

None of that should matter. What about the original disks? What were your plans for that data?

I had used Unbalance to move all the data off those 3 disks and onto the other disks in the array before I swapped in the 3 new drives.

wmlunraid-diagnostics-20220811-1107.zip

  • Solution
Aug 11 11:09:48 WMLUnraid kernel: ata1: COMRESET failed (errno=-16)
Aug 11 11:09:48 WMLUnraid kernel: ata1: reset failed, giving up
Aug 11 11:09:48 WMLUnraid kernel: ata1.00: disabled

 

The SSD cache dropped offline. That's not a reason for the server to crash, but you still need to fix it: check/replace the cables, then enable the syslog server, start the parity sync, and post the syslog if it crashes again.

4 minutes ago, JorgeB said:
Aug 11 11:09:48 WMLUnraid kernel: ata1: COMRESET failed (errno=-16)
Aug 11 11:09:48 WMLUnraid kernel: ata1: reset failed, giving up
Aug 11 11:09:48 WMLUnraid kernel: ata1.00: disabled

 

The SSD cache dropped offline. That's not a reason for the server to crash, but you still need to fix it: check/replace the cables, then enable the syslog server, start the parity sync, and post the syslog if it crashes again.

Interesting. Okay, I'll work on that. Thanks.

35 minutes ago, soysauce10 said:

used Unbalance to move all the data off those 3 disks and onto the other disks in the array before I swapped in the 3 new drives.

It would have been much simpler, and probably faster, to simply replace/rebuild them one at a time; no need to move anything.

4 minutes ago, trurl said:

It would have been much simpler, and probably faster, to simply replace/rebuild them one at a time; no need to move anything.


Do you mean add the three new drives, move the data onto the new ones, and then rebuild? The server is already maxed out on drives, which is why I didn't add the new drives and move the data before removing the old ones.


Nothing like that.

 

Do you know how to rebuild a disk if one of them fails? Exactly the same thing.

 

Replace/rebuild is the normal way to do what you were trying to do. Everyone does it all the time when they want to replace a disk with a larger one, or if they think a drive is too old or beginning to show problems, or whatever.

 

Shut down, replace one disk, boot up, assign new disk to the slot of the replaced disk, start array to rebuild.

 

Repeat as necessary, one disk at a time.

 

Replacing/rebuilding disks is the whole point of parity.
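If it helps to see the idea in code, here is a minimal sketch of a single-parity rebuild in Python (purely illustrative; the "disk" contents are made up and this is not Unraid's actual on-disk format):

# Purely illustrative: simulate tiny "disks" as byte strings.
from functools import reduce

def xor_blocks(blocks):
    # Byte-wise XOR across equally sized blocks.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

data_disks = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(data_disks)  # what the parity sync computes

# "Replace" disk 1: its contents come back from parity plus every other disk.
rebuilt = xor_blocks([parity] + [d for i, d in enumerate(data_disks) if i != 1])
assert rebuilt == data_disks[1]

# With two disks missing at once, the same XOR only yields their combination,
# which is why single parity can rebuild just one disk at a time.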

 

 

Just now, trurl said:

Nothing like that.

 

Do you know how to rebuild a disk if one of them fails? Exactly the same thing.

 

Replace/rebuild is the normal way to do what you were trying to do. Everyone does it all the time when they want to replace a disk with a larger one, or if they think a drive is too old or beginning to show problems, or whatever.

 

Shut down, replace one disk, boot up, assign new disk to the slot of the replaced disk, start array to rebuild.

 

Repeat as necessary, one disk at a time.

 

Replacing/rebuilding disks is the whole point of parity.

 

 

 

Got it. I should have done that. Yeah, I understood where I went wrong before I posted this topic. Thank you.


Perhaps you thought the whole point of parity was as a backup.

 

Parity contains none of your data and is in no way a substitute for backup. Do you have backups of anything important and irreplaceable?

 

Parity can only rebuild a single disk in combination with all the other disks. Dual parity allows 2 simultaneous rebuilds.

 

So, in a sense, you did remove too many at a time, if you were going to rebuild.

 

Parity is just an extra bit that allows a missing bit to be calculated from all the other bits. Very simple calculation, easy to understand.

 

https://wiki.unraid.net/Manual/Overview#Parity-Protected_Array
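For the single-bit description above, a tiny sketch (Python, purely illustrative):

# One parity bit per position: the XOR of the data bits at that position.
data_bits = [1, 0, 1, 1, 0, 1]   # one bit from each data disk
parity_bit = 0
for b in data_bits:
    parity_bit ^= b

# Any single missing bit is recovered by XOR-ing the parity bit with the
# surviving bits; two missing bits at once cannot be separated this way,
# which is where a second, independent parity disk comes in.
missing = 2
recovered = parity_bit
for i, b in enumerate(data_bits):
    if i != missing:
        recovered ^= b
assert recovered == data_bits[missing]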

1 minute ago, trurl said:

Perhaps you thought the whole point of parity was as a backup.

 

Parity contains none of your data and is in no way a substitute for backup. Do you have backups of anything important and irreplaceable?

 

Parity can only rebuild a single disk in combination with all the other disks. Dual parity allows 2 simultaneous rebuilds.

 

So, in a sense, you did remove too many at a time, if you were going to rebuild.

 

Parity is just an extra bit that allows a missing bit to be calculated from all the other bits. Very simple calculation, easy to understand.

 

https://wiki.unraid.net/Manual/Overview#Parity-Protected_Array


I think I sort of ignored the parity part of it and removed the disks after using Unbalance to move the files, because I wasn't worried about the disks failing on me right then and there.


On the server, I'm seeing these disk errors; see the attached picture. A quick search on these forums suggests it's an issue with the Docker image or the cache disk being full. I think the Docker image is at its 40GB maximum, but the cache disk is not full. It does correlate with the cache disk dropping offline, though, like JorgeB said.

20220811_120254.jpg


SSD is on a different port but it dropped again:

 

Aug 11 12:27:15 WMLUnraid kernel: ata2: COMRESET failed (errno=-16)
Aug 11 12:27:15 WMLUnraid kernel: ata2: limiting SATA link speed to 3.0 Gbps
Aug 11 12:27:20 WMLUnraid kernel: ata2: COMRESET failed (errno=-16)
Aug 11 12:27:20 WMLUnraid kernel: ata2: reset failed, giving up
Aug 11 12:27:20 WMLUnraid kernel: ata2.00: disabled

 

6 minutes ago, trurl said:

Can't tell anything about that since the array was not started in those diagnostics so nothing was mounted.

 

Start array and attach new diagnostics to your NEXT post in this thread.


Sorry about that. I started the array, it kicked off a parity sync, and then I exported the logs.

 

 

1 minute ago, JorgeB said:

SSD is on a different port but it dropped again:

 

Aug 11 12:27:15 WMLUnraid kernel: ata2: COMRESET failed (errno=-16)
Aug 11 12:27:15 WMLUnraid kernel: ata2: limiting SATA link speed to 3.0 Gbps
Aug 11 12:27:20 WMLUnraid kernel: ata2: COMRESET failed (errno=-16)
Aug 11 12:27:20 WMLUnraid kernel: ata2: reset failed, giving up
Aug 11 12:27:20 WMLUnraid kernel: ata2.00: disabled

 


I just swapped out the SATA cable to troubleshoot.

Thank you both for your help!

wmlunraid-diagnostics-20220811-1239.zip

1 minute ago, JorgeB said:

Before or after these last diags? It's still dropping.

 

Also the docker image is corrupt and needs to be re-created.

I swapped the cable before exporting the diags. Yeah, I can still see the errors on the server.

Does recreating the image mean losing all settings on the containers? Is there a way to restore?


Before doing anything else, fix the connection problems.

 

Then

 

appdata and system shares have files on the array.

 

Also, domains has files on its designated pool (cache) and on the other pool, maybe this is intentional?

 

And a share named important, apparently related to VMs, is set to cache:no but has files on cache.

 

You have a cache:no share B-----s with files on cache.

 

1 minute ago, soysauce10 said:

Does recreating the image mean losing all settings on the containers? Is there a way to restore?

 

It is easy to restore dockers but we can get into that after things are working well and we clean up those shares.

On 8/11/2022 at 1:05 PM, trurl said:

Before doing anything else, fix the connection problems.

 

Then

 

appdata and system shares have files on the array.

 

Also, domains has files on its designated pool (cache) and on the other pool, maybe this is intentional?

 

And a share named important, apparently related to VMs, is set to cache:no but has files on cache.

 

You have a cache:no share B-----s with files on cache.

 

 

It is easy to restore dockers but we can get into that after things are working well and we clean up those shares.

 

I think the connectivity issues are resolved. I spent the last day or so testing all the RAM and slots and letting parity sync back up.

When I used Unbalance to move data from disk to disk, it left empty folders in appdata and those other shares, which is why it looks like the domains folder is on cache and the other pools. No, it is not intentional.

If you could point me in the right direction for the restore, that would be great. I'll start researching more on what the process is as well.

 

Thanks for the help!


Since you are on 6.10, you should install the Dynamix File Manager plugin; it will let you delete those folders or move them if needed. Just note that nothing can move open files, and dockers will often have open files. Since you will be deleting docker.img, that would be a perfect time to take care of that.

 

https://wiki.unraid.net/Manual/Docker_Management#Re-Create_the_Docker_image_file

 

https://wiki.unraid.net/Manual/Docker_Management#Re-Installing_Docker_Applications

16 hours ago, trurl said:

Since you are on 6.10, you should install the Dynamix File Manager plugin; it will let you delete those folders or move them if needed. Just note that nothing can move open files, and dockers will often have open files. Since you will be deleting docker.img, that would be a perfect time to take care of that.

 

https://wiki.unraid.net/Manual/Docker_Management#Re-Create_the_Docker_image_file

 

https://wiki.unraid.net/Manual/Docker_Management#Re-Installing_Docker_Applications

 

Restored! I appreciate the help!!

