Shrinking Array, with many small disks


Solved by kcossabo

I am thinking of re-starting my server build.

I have 29 disks in a 44TB array, one of them the parity drive:
- 8TB Parity
- 2x 8TB Data
- 3TB Data
- 25 x 1TB Disks (long story)

I have 28.8TB in use, with 80% of it already on another system (still migrating).

I have had a cache drive fail (the one with appdata, and no active backup; I had installed it, just forgot to configure it).
I have also had one of the 1TB drives fail, and it takes 12 hours to rebuild.

I read the article on Shrinking Array 


I am thinking of going to:
- 22TB Parity Drive
- 2x 12TB
- 3x 8TB
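
As a quick sanity check on the two layouts, here is a small Python sketch that totals the data disks. The sizes and the 28.8TB figure come from this post; the variable names are only for illustration, and the parity drives are left out because they do not add usable space.

```python
# Data-disk sizes in TB for the current and the planned layout.
# Parity drives are excluded: they hold no user data.
current_data_tb = [8, 8, 3] + [1] * 25
planned_data_tb = [12, 12, 8, 8, 8]

current_capacity = sum(current_data_tb)
planned_capacity = sum(planned_data_tb)
used_tb = 28.8  # space in use, as quoted in the post

print(f"current: {len(current_data_tb)} data disks, {current_capacity} TB")
print(f"planned: {len(planned_data_tb)} data disks, {planned_capacity} TB")
print(f"free space after migration: {planned_capacity - used_tb:.1f} TB")
```

The planned layout comes out slightly larger than the current one (48TB against 44TB) while dropping from 28 data disks to 5.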

Question:

Since I have a landing place for the data on the server, the right move seems to be: back up, destroy, rebuild, then migrate the data back. But what are your thoughts on the following options? Please add your own as well.

1) Move the data off the server, then rebuild from scratch: 12-24 hours to move the data (probably less) with rsync and SMB mounts on the 10GE network.

2) Use the "Remove Drives Then Rebuild Parity" method. Removing the 25 x 1TB drives safely, with a parity rebuild each time, would take 12+ days.

3) Use the "Clear Drive Then Remove Drive" method, with the parity drive active.

4) Use 'unbalance' to move the data off the drives, as a modified "Clear Drive Then Remove Drive" method.
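
A rough sketch of where the 12+ day figure for option 2 comes from. It assumes one parity rebuild per removed drive at the 12 hours observed so far; the one-at-a-time approach and the function name are assumptions made for illustration.

```python
def serial_rebuild_days(drive_count: int, hours_per_rebuild: float) -> float:
    """Total days if every drive removal is followed by its own parity rebuild."""
    return drive_count * hours_per_rebuild / 24


# 25 x 1TB drives at the observed 12 hours per rebuild.
observed = serial_rebuild_days(25, 12)
print(f"{observed:.1f} days at 12 hours per rebuild")
```

The total scales linearly with the per-rebuild time, so anything that shortens a single rebuild shortens the whole job by the same factor.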

Thoughts?

39 minutes ago, itimpi said:

This seems wrong - I would expect a 1TB drive to take something of the order of 2 hours or less to rebuild.

I was very surprised when I changed the drive and it reported a 12+ hour rebuild estimate, and then woke up this morning to find it still rebuilding.

My hope for the 24x 1TB drives was to have a quick rebuild, so that when a failure occurred I would not be rebuilding 12TB, only 1/12th of that...

This is a 9211-8i HBA controller with an LSI SAS2x36 expander and 24 SAS drives, if that impacts anything.
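
To put numbers on the gap between the expected and the reported rebuild time, here is a back-of-the-envelope sketch. It assumes a decimal 1TB drive and a sustained 150 MB/s, which is a guess for illustration and not a measured figure for these drives.

```python
def rebuild_hours(size_tb: float, mb_per_s: float) -> float:
    """Hours needed to stream size_tb (decimal TB) at a sustained mb_per_s."""
    size_mb = size_tb * 1_000_000
    return size_mb / mb_per_s / 3600


def implied_mb_per_s(size_tb: float, hours: float) -> float:
    """Average throughput implied by finishing size_tb in the given hours."""
    size_mb = size_tb * 1_000_000
    return size_mb / (hours * 3600)


# Assumed healthy throughput of 150 MB/s for a 1TB drive.
print(f"expected: {rebuild_hours(1, 150):.2f} hours")
# Throughput implied by the reported 12-hour estimate.
print(f"implied by 12 hours: {implied_mb_per_s(1, 12):.1f} MB/s")
```

A 12-hour rebuild of 1TB works out to an average in the low 20s of MB/s, far below what a healthy spinning drive normally sustains, which suggests a bottleneck somewhere other than the drive size.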

1 minute ago, JonathanM said:

How are these drives connected? Motherboard model, HBA model, etc.

A 9211-8i HBA controller with an LSI SAS2x36 expander and 24 SAS drives, if that impacts anything. The other drives are SATA off the motherboard.


Diagnostics may show the issue. I suspect either one or more of the drives is having problems reading, and all the retries are slowing things to a crawl, or there is something accessing the array reading and writing while the rebuild is happening. Rebuilding a drive requires reading simultaneously from all the other drives, so any issues with data speed will impact things greatly, as will any reads or writes to the array.

 

Do you have anything else accessing the array, like Docker containers, VMs, or network clients?
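
For readers wondering why a rebuild has to read from all the other drives at once, here is a minimal sketch of single-parity reconstruction using XOR. It is a toy model with three tiny made-up "disks", not Unraid's actual code; the function name and the byte values are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three tiny stand-in data disks (made-up contents).
disks = [b"\x01\x02\x03", b"\x10\x20\x30", b"\xaa\xbb\xcc"]
parity = xor_blocks(disks)

# Simulate losing disk 1: every surviving disk plus parity is needed.
survivors = [d for i, d in enumerate(disks) if i != 1]
rebuilt = xor_blocks(survivors + [parity])

print(rebuilt == disks[1])
```

Because every rebuilt byte depends on the matching byte from each surviving drive, the slowest drive (or anything else competing for the same reads) sets the pace for the whole rebuild.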

9 minutes ago, JonathanM said:

Diagnostics may show the issue. [...] Do you have anything else accessing the array, like Docker containers, VMs, or network clients?


Nothing is accessing the array at this time. I lost the cache drive, so all the containers are dead, and I am not sharing the array out to other devices right now.

I GREATLY APPRECIATE the link to Diagnostics.

Will post what I find. 

  • Solution

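Found the issue. top shows dozens of acpi_pad kernel threads, each using about 95% CPU, with 88.5% of CPU time spent in the kernel (sy):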

 

%Cpu(s):  0.2 us, 88.5 sy,  0.0 ni, 11.0 id,  0.3 wa,  0.0 hi,  0.1 si,  0.0 st
MiB Mem : 128800.0 total, 124426.1 free,   1536.2 used,   2837.7 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used. 125040.5 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                      
 1717 root      -2   0       0      0      0 R  97.0   0.0  16:41.31 acpi_pad/5                                                   
 1735 root      -2   0       0      0      0 R  96.7   0.0  16:42.05 acpi_pad/22                                                  
 1638 root      -2   0       0      0      0 R  95.4   0.0  16:43.78 acpi_pad/1                                                   
 1715 root      -2   0       0      0      0 R  95.4   0.0  16:41.49 acpi_pad/3                                                   
 1718 root      -2   0       0      0      0 R  95.4   0.0  16:40.45 acpi_pad/6                                                   
 1720 root      -2   0       0      0      0 R  95.4   0.0  16:42.80 acpi_pad/7                                                   
 1721 root      -2   0       0      0      0 R  95.4   0.0  16:42.07 acpi_pad/8                                                   
 1727 root      -2   0       0      0      0 R  95.4   0.0  16:41.63 acpi_pad/14                                                  
 1728 root      -2   0       0      0      0 R  95.4   0.0  16:43.09 acpi_pad/15                                                  
 1730 root      -2   0       0      0      0 R  95.4   0.0  16:41.88 acpi_pad/17                                                  
 1731 root      -2   0       0      0      0 R  95.4   0.0  16:41.03 acpi_pad/18                                                  
 1732 root      -2   0       0      0      0 R  95.4   0.0  16:42.11 acpi_pad/19                                                  
 1736 root      -2   0       0      0      0 R  95.4   0.0  16:42.64 acpi_pad/23                                                  
 1739 root      -2   0       0      0      0 R  95.4   0.0  16:42.17 acpi_pad/26                                                  
 1740 root      -2   0       0      0      0 R  95.4   0.0  16:43.18 acpi_pad/27                                                  
 1741 root      -2   0       0      0      0 R  95.4   0.0  16:41.27 acpi_pad/28                                                  
 1742 root      -2   0       0      0      0 R  95.4   0.0  16:41.92 acpi_pad/29                                                  
 1745 root      -2   0       0      0      0 R  95.4   0.0  16:41.97 acpi_pad/32                                                  
 1746 root      -2   0       0      0      0 R  95.4   0.0  16:41.92 acpi_pad/33                                                  
 1747 root      -2   0       0      0      0 R  95.4   0.0  16:42.47 acpi_pad/34                                                  
 1750 root      -2   0       0      0      0 R  95.4   0.0  16:41.95 acpi_pad/37                                                  
 1751 root      -2   0       0      0      0 R  95.4   0.0  16:42.37 acpi_pad/38                                                  
 1753 root      -2   0       0      0      0 R  95.4   0.0  16:41.25 acpi_pad/40                                                  
 1637 root      -2   0       0      0      0 R  95.1   0.0  16:41.82 acpi_pad/0                                                   
 1714 root      -2   0       0      0      0 R  95.1   0.0  16:40.85 acpi_pad/2                                                   
 1716 root      -2   0       0      0      0 R  95.1   0.0  16:41.90 acpi_pad/4                                                   
 1722 root      -2   0       0      0      0 R  95.1   0.0  16:40.76 acpi_pad/9                                                   
 1723 root      -2   0       0      0      0 R  95.1   0.0  16:43.21 acpi_pad/10                                                  
 1724 root      -2   0       0      0      0 R  95.1   0.0  16:41.61 acpi_pad/11                                                  
 1725 root      -2   0       0      0      0 R  95.1   0.0  16:42.07 acpi_pad/12                                                  
 1726 root      -2   0       0      0      0 R  95.1   0.0  16:41.34 acpi_pad/13                                                  
 1729 root      -2   0       0      0      0 R  95.1   0.0  16:42.44 acpi_pad/16                                                  
 1733 root      -2   0       0      0      0 R  95.1   0.0  16:42.86 acpi_pad/20                                                  
 1734 root      -2   0       0      0      0 R  95.1   0.0  16:42.53 acpi_pad/21                                                  
 1737 root      -2   0       0      0      0 R  95.1   0.0  16:42.44 acpi_pad/24                                                  
 1738 root      -2   0       0      0      0 R  95.1   0.0  16:41.72 acpi_pad/25          


I issued

 

modprobe --remove acpi_pad


and the estimated rebuild time is now dropping quickly, unlike last night's. It fell to under 9 hours in less than 8 minutes.
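
In case it helps someone spot the same symptom, here is a small Python sketch that totals the CPU used by threads with a given name in top output laid out like the listing above, plus a helper that checks /proc/modules text for a loaded module. The function names are made up for illustration, and the column positions assume the default 12-column top layout shown in this post.

```python
def cpu_by_command_prefix(top_text: str, prefix: str):
    """Return (row_count, total_cpu_percent) for rows whose COMMAND starts with prefix."""
    count, total = 0, 0.0
    for line in top_text.splitlines():
        fields = line.split()
        # Process rows: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
        if len(fields) >= 12 and fields[0].isdigit() and fields[11].startswith(prefix):
            count += 1
            total += float(fields[8])
    return count, total


def module_loaded(name: str, modules_text: str) -> bool:
    """True if name is the first field of any line in /proc/modules-style text."""
    return any(
        line.split()[0] == name
        for line in modules_text.splitlines()
        if line.strip()
    )


# Three rows copied from the listing above.
sample = """\
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 1717 root      -2   0       0      0      0 R  97.0   0.0  16:41.31 acpi_pad/5
 1735 root      -2   0       0      0      0 R  96.7   0.0  16:42.05 acpi_pad/22
 1638 root      -2   0       0      0      0 R  95.4   0.0  16:43.78 acpi_pad/1
"""
count, total = cpu_by_command_prefix(sample, "acpi_pad")
print(f"{count} acpi_pad threads using {total:.1f}% CPU in total")
```

On a Linux box, `module_loaded("acpi_pad", open("/proc/modules").read())` should return False once the modprobe --remove has taken effect.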
