kcossabo Posted March 21

I am thinking of restarting my server build. I have 29 disks in the array, with one parity drive, in a 44TB array:
- 8TB parity
- 2x 8TB data
- 3TB data
- 25x 1TB data (long story)

I have 28.8TB in use, with 80% of it already on another system (still migrating). I have had a cache drive fail (the one with appdata, and no active backup; I installed it, just forgot to configure it). I have had one of the 1TB drives fail, and it takes 12 hours to rebuild. I read the article on shrinking the array.

I am thinking of going to:
- 22TB parity drive
- 2x 12TB
- 3x 8TB

Question: The right move seems to be to back up, destroy, rebuild, and migrate the data back, since I have a landing place for the data on the server. But what are your thoughts on the following options? Please add your own.

1) Move the data off the server, then rebuild from scratch: 12-24 hours to move the data (probably less) with rsync and SMB mounts on the 10GbE network. A minimal rsync sketch is below.
2) Use the "Remove Drives Then Rebuild Parity" method. Removing 25x 1TB drives safely via parity rebuilds would take 12+ days.
3) Use the "Clear Drive Then Remove Drive" method, with the parity drive active.
4) Use 'unbalance' to move the data off the drives: a modified "Clear Drive Then Remove Drive" method.

Thoughts?
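For option 1, roughly the kind of rsync run I have in mind; a minimal sketch, assuming the new box exposes an SMB share (the host, share, and mount point below are placeholders, not my actual layout):

    # Mount the destination share over SMB (example host/share names)
    mkdir -p /mnt/remote
    mount -t cifs //newserver/backup /mnt/remote -o username=me

    # Copy each data disk, preserving attributes; -P shows progress
    # and lets interrupted transfers resume
    for d in /mnt/disk*; do
        rsync -avhP "$d/" "/mnt/remote/$(basename "$d")/"
    done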
itimpi Posted March 21
44 minutes ago, kcossabo said:
    I have had one of the 1TB Drives fail and it takes 12 hours to rebuild
This seems wrong - I would expect a 1TB drive to take something on the order of 2 hours or less to rebuild. At a typical sustained rebuild speed of around 150 MB/s, 1 TB is roughly 6,700 seconds, just under two hours.
JonathanM Posted March 21
1 hour ago, kcossabo said:
    I have 29 disks in the array, with one parity drive, in a 44TB array
How are these drives connected? Motherboard model, HBA model, etc.
kcossabo Posted March 21 Author
39 minutes ago, itimpi said:
    This seems wrong - I would expect a 1TB drive to take something on the order of 2 hours or less to rebuild.
I was very surprised when I changed the drive and it reported a 12+ hour rebuild estimate, and then to wake up this morning with it still rebuilding. My hope with the 24x 1TB drives was quick rebuilds: when a failure occurred, I would not be rebuilding 12TB, I would be rebuilding 1/12th of that. This is an LSI 9211-8i HBA with an LSI SAS2x36 expander and 24 SAS drives, if that impacts anything.
kcossabo Posted March 21 Author
1 minute ago, JonathanM said:
    How are these drives connected? Motherboard model, HBA model, etc.
An LSI 9211-8i HBA with an LSI SAS2x36 expander and 24 SAS drives, if that impacts anything. The other drives are SATA off the motherboard.
JonathanM Posted March 21
Diagnostics may show the issue. I suspect either one or more of the drives is having problems reading, and all the retries are slowing things to a crawl, or something is accessing the array, reading and writing, while the rebuild is happening. Rebuilding a drive requires reading simultaneously from all the other drives, so any issue with data speed will impact things greatly, as will any reads or writes to the array. Do you have anything else accessing the array, like docker containers, VMs, or network clients?
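A quick way to check whether one slow disk is dragging the rebuild down (assuming the sysstat package is available; run this while the rebuild is active):

    # Per-device throughput and utilization, refreshed every 5 seconds.
    # During a healthy rebuild every array member shows similar MB/s;
    # one drive at ~100% utilization with low MB/s points to a struggling disk.
    iostat -mx 5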
kcossabo Posted March 21 Author
9 minutes ago, JonathanM said:
    Diagnostics may show the issue. I suspect either one or more of the drives is having problems reading, and all the retries are slowing things to a crawl, or something is accessing the array, reading and writing, while the rebuild is happening. Rebuilding a drive requires reading simultaneously from all the other drives, so any issue with data speed will impact things greatly, as will any reads or writes to the array. Do you have anything else accessing the array, like docker containers, VMs, or network clients?
Nothing is accessing the array at this time. I lost the cache drive, so all the containers are dead, and I am not sharing this out to other devices right now. I GREATLY APPRECIATE the link to Diagnostics. Will post what I find.
kcossabo Posted March 21 Author
That diagnostics file is big. Not sure what to look at, but for fun I am swapping in another 1TB drive; the estimate is 21 hours for this 1TB....
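If anyone wants a starting point for digging through a diagnostics zip, something like this (a sketch, assuming the usual layout with the syslog under a logs/ folder inside the archive):

    # Unpack the diagnostics and scan the syslog for obvious trouble
    unzip tower-diagnostics-*.zip -d diag
    grep -iE "error|timeout|reset|offline" diag/*/logs/syslog.txt | less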
kcossabo Posted March 21 Author (Solution)
Found the issue:

    %Cpu(s):  0.2 us, 88.5 sy,  0.0 ni, 11.0 id,  0.3 wa,  0.0 hi,  0.1 si,  0.0 st
    MiB Mem : 128800.0 total, 124426.1 free,   1536.2 used,   2837.7 buff/cache
    MiB Swap:      0.0 total,      0.0 free,      0.0 used. 125040.5 avail Mem

      PID USER  PR  NI VIRT RES SHR S  %CPU %MEM    TIME+ COMMAND
     1717 root  -2   0    0   0   0 R  97.0  0.0 16:41.31 acpi_pad/5
     1735 root  -2   0    0   0   0 R  96.7  0.0 16:42.05 acpi_pad/22
     1638 root  -2   0    0   0   0 R  95.4  0.0 16:43.78 acpi_pad/1
     1715 root  -2   0    0   0   0 R  95.4  0.0 16:41.49 acpi_pad/3
     1718 root  -2   0    0   0   0 R  95.4  0.0 16:40.45 acpi_pad/6
     1720 root  -2   0    0   0   0 R  95.4  0.0 16:42.80 acpi_pad/7
     1721 root  -2   0    0   0   0 R  95.4  0.0 16:42.07 acpi_pad/8
     1727 root  -2   0    0   0   0 R  95.4  0.0 16:41.63 acpi_pad/14
     1728 root  -2   0    0   0   0 R  95.4  0.0 16:43.09 acpi_pad/15
     1730 root  -2   0    0   0   0 R  95.4  0.0 16:41.88 acpi_pad/17
     1731 root  -2   0    0   0   0 R  95.4  0.0 16:41.03 acpi_pad/18
     1732 root  -2   0    0   0   0 R  95.4  0.0 16:42.11 acpi_pad/19
     1736 root  -2   0    0   0   0 R  95.4  0.0 16:42.64 acpi_pad/23
     1739 root  -2   0    0   0   0 R  95.4  0.0 16:42.17 acpi_pad/26
     1740 root  -2   0    0   0   0 R  95.4  0.0 16:43.18 acpi_pad/27
     1741 root  -2   0    0   0   0 R  95.4  0.0 16:41.27 acpi_pad/28
     1742 root  -2   0    0   0   0 R  95.4  0.0 16:41.92 acpi_pad/29
     1745 root  -2   0    0   0   0 R  95.4  0.0 16:41.97 acpi_pad/32
     1746 root  -2   0    0   0   0 R  95.4  0.0 16:41.92 acpi_pad/33
     1747 root  -2   0    0   0   0 R  95.4  0.0 16:42.47 acpi_pad/34
     1750 root  -2   0    0   0   0 R  95.4  0.0 16:41.95 acpi_pad/37
     1751 root  -2   0    0   0   0 R  95.4  0.0 16:42.37 acpi_pad/38
     1753 root  -2   0    0   0   0 R  95.4  0.0 16:41.25 acpi_pad/40
     1637 root  -2   0    0   0   0 R  95.1  0.0 16:41.82 acpi_pad/0
     1714 root  -2   0    0   0   0 R  95.1  0.0 16:40.85 acpi_pad/2
     1716 root  -2   0    0   0   0 R  95.1  0.0 16:41.90 acpi_pad/4
     1722 root  -2   0    0   0   0 R  95.1  0.0 16:40.76 acpi_pad/9
     1723 root  -2   0    0   0   0 R  95.1  0.0 16:43.21 acpi_pad/10
     1724 root  -2   0    0   0   0 R  95.1  0.0 16:41.61 acpi_pad/11
     1725 root  -2   0    0   0   0 R  95.1  0.0 16:42.07 acpi_pad/12
     1726 root  -2   0    0   0   0 R  95.1  0.0 16:41.34 acpi_pad/13
     1729 root  -2   0    0   0   0 R  95.1  0.0 16:42.44 acpi_pad/16
     1733 root  -2   0    0   0   0 R  95.1  0.0 16:42.86 acpi_pad/20
     1734 root  -2   0    0   0   0 R  95.1  0.0 16:42.53 acpi_pad/21
     1737 root  -2   0    0   0   0 R  95.1  0.0 16:42.44 acpi_pad/24
     1738 root  -2   0    0   0   0 R  95.1  0.0 16:41.72 acpi_pad/25

I issued modprobe --remove acpi_pad and the rebuild time estimate is dropping quickly, unlike last night's. I am down to less than 9 hours in less than 8 minutes.
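For anyone who hits the same thing: the one-off fix above, plus a sketch of making it stick across reboots (the startup line assumes the stock Unraid go file at /boot/config/go; verify for your setup):

    # One-off: unload the acpi_pad module that was pegging every core
    modprobe --remove acpi_pad

    # Persist across reboots by unloading it at startup
    # (assumes the stock Unraid go file at /boot/config/go)
    echo "modprobe -r acpi_pad" >> /boot/config/go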