Replacing 2 drives at once on 6.8.3?



Hi All,

I've just received 2 new drives for my Unraid setup, to replace 2 older drives in the array.

I'm currently using 6x 8TB drives (double parity and 6 data) and 2x 6TB drives (2 data).

The 2x 6TB drives are the candidates for replacement/upgrade with 2x 8TB disks.

 

I've been searching around and googling but haven't been able to find much info, apart from a thread about Unraid 5.0.3 that was using single parity, and the poster didn't seem to follow the correct process that I'm familiar with.

 

Anyhow...

 

So my question is: does the current stable 6.8.3 (when using double parity) support replacing 2 disks at once?

Has anyone attempted this, or is it a no-go idea?

Link to comment

I wondered about this also, as a valid and healthy parity seems to be a key requirement.

 

I've had a good parity history, and I run a check at the beginning of every month:

[screenshot: scheduled monthly parity check history]

 

So I had a good parity check about 10 days ago.

 

As far as I can tell, none of my drives have any SMART errors, and they're all recent drives purchased within the past year.

 

Link to comment

Thanks, that's great to know that things should work.

 

As an aside question, I've been wondering why my parity checks run at about 110 MB/s when my drives are capable of over 200 MB/s.

Am I missing something in the calculation, is it a case of the slowest drive setting the pace, or is it something else?

Link to comment
Just now, jonathanm said:

All of them over their full capacity?

Sorry I'm not sure what you mean.

 

The drives have had a couple of TB free each for some time, but the speed has been consistently around 110 MB/s.

Recently they have been filling up a bit more, but the parity speed has been the same.

Link to comment
52 minutes ago, KptnKMan said:

As an aside question, I've been wondering why my parity checks run at about 110 MB/s when my drives are capable of over 200 MB/s.

Am I missing something in the calculation, is it a case of the slowest drive setting the pace, or is it something else?

Generally the speed is good at first and drops over time as the heads move from the outer edge of the platters toward the centre.

Looking at my own example, I start at around 190 MB/s and end up around 90 MB/s, for an average of about 145 MB/s.

Link to comment

Disks are typically somewhat slower on the inner cylinders simply due to data density. The inner tracks hold less data because they are shorter, yet the platter spins at the same RPM as on the longer outer tracks, so less data passes under the head per revolution. And since the outer tracks are used first, the drive gets slower the fuller it gets.

 

And of course, parity operations are more complicated than single disk read/write.
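As a rough illustration of how those two effects combine, here's a back-of-the-envelope sketch. It assumes each drive's sequential speed falls roughly linearly from its outer-track speed to its inner-track speed, that the check runs at the speed of the slowest drive at each position, and it ignores the fact that smaller drives finish early. The 190/90 MB/s figures come from ChatNoir's post above; the 150/75 MB/s figures for the 6TB drives are just assumptions.

    # Rough model: parity check speed at each point is the slowest drive's speed there,
    # and each drive slows linearly from its outer-track speed to its inner-track speed.
    def avg_check_speed(drives, steps=1000):
        """drives: list of (outer_MBps, inner_MBps); returns the average MB/s."""
        total = 0.0
        for i in range(steps):
            frac = i / steps  # 0 = outer edge, 1 = innermost track
            total += min(outer - (outer - inner) * frac for outer, inner in drives)
        return total / steps

    # Assumed speeds: six 8TB drives (~190 -> ~90 MB/s, per the post above)
    # plus two older 6TB drives (guessed at ~150 -> ~75 MB/s).
    drives = [(190, 90)] * 6 + [(150, 75)] * 2
    speed = avg_check_speed(drives)
    hours = 8e6 / speed / 3600  # sweeping 8 TB (~8e6 MB) at that average speed
    print(f"~{speed:.0f} MB/s average, ~{hours:.1f} h")  # -> roughly 113 MB/s, ~20 h

With those (assumed) numbers the model lands at roughly 110-115 MB/s, which is in the same ballpark as the ~110 MB/s reported above, with the older 6TB drives setting the pace the whole way.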

Link to comment

Ok, so I downloaded Diskspeed yesterday and have run it a few times, with mixed results:

[screenshot: Diskspeed benchmark results for all drives]

 

Unfortunately, 2 of my disks are slower, but these are the 2x 6TB disks that I'm replacing... so no surprise there.

What is surprising, however, is that one of the 6TB disks (Disk1/sdl) appears to be bandwidth capped, and the other (Disk2/sdm) has trouble finishing the tests, retrying with Speed Gap errors every time I run it:

[screenshot: Diskspeed Speed Gap retry errors on Disk2]

 

I never knew about this Diskspeed tool before, but it's been a real eye-opener and has definitely confirmed my suspicions about these 6TB drives: they are (1) slowing down operations and (2) proving a bit inconsistent alongside the newer 8TB drives.

If you haven't done so, I'd recommend checking out the Hard Drive Database associated with the Diskspeed tool to see how other drives with the same model number perform. I assume the Diskspeed tool uploads test results to this database.

 

So at this point, it looks like a good idea to get these guys out of there.

Link to comment
15 hours ago, KptnKMan said:

Unfortunately, 2 of my disks are slower, but these are the 2x 6TB disks that I'm replacing... so no surprise there.

What is surprising, however, is that one of the 6TB disks (Disk1/sdl) appears to be bandwidth capped, and the other (Disk2/sdm) has trouble finishing the tests, retrying with Speed Gap errors every time I run it [...]

 

The bandwidth cap means the drive is likely outputting data faster than the system can take it in, and it shows up as a flattish line for a portion of the graph.

 

Question - when you were getting the Speed Gap errors, was the max allowed size increasing? It's supposed to increase on every retry so the test eventually passes, but here it doesn't look like it was doing that.
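For context, the retry behaviour works roughly like this (a simplified sketch only, not the actual Diskspeed code; the function name, gap size, and retry count are made up for illustration): the allowed gap between the fastest and slowest sample grows on every retry, so a jittery drive eventually passes instead of looping forever.

    # Sketch of the retry idea (illustrative only, not the real Diskspeed code).
    def benchmark_with_speed_gap(run_pass, initial_gap_MB=5, max_retries=10):
        """run_pass() performs one benchmark pass and returns throughput samples in MB/s."""
        gap = initial_gap_MB
        for attempt in range(1, max_retries + 1):
            samples = run_pass()
            if max(samples) - min(samples) <= gap:
                return samples, attempt   # consistent enough for the current gap: accept
            gap += initial_gap_MB         # widen the allowed gap before the next retry
        raise RuntimeError("samples never settled, even after widening the gap")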

Link to comment
20 minutes ago, jbartlett said:

 

The bandwidth cap means the drive is likely outputting data faster than the system can take it in, and it shows up as a flattish line for a portion of the graph.

I can't say the line looked very flattish, but I'm not sure if this is an issue with these older drives bursting data or something.

I thought it would have had the opposite issue, with not keeping up with the other 8TB drives.

Guess I was mistaken.

 

20 minutes ago, jbartlett said:

 

Question - when you were getting the Speed Gap errors, was the max allowed size increasing? It's supposed to increase on every retry so the test eventually passes, but here it doesn't look like it was doing that.

Is that the data in the other graph? That one was somewhat linear, like this graph, but it seemed to get stuck retrying at 90%. I'm not sure if that's what you're referring to.

 

 

I feel like I've summoned a genie by accident. 

Link to comment

Hi @jbartlett and thanks for making this tool.

I've gained a lot of insight into how my drives are performing.

However, I'm still not sure about your question regarding the "max allowed size". I think you mean the threshold on measured throughput, but I don't know where I can see or verify it; it didn't seem to be increasing above 45MB. I also tried running the same test with "Disable Speed Gap detection" enabled, and it still would not finish the test.

 

So once the parity rebuild finished, it looks like it was a little faster overall, despite the system being under normal load:

[screenshot: parity check history after the rebuild]

 

Ran the Diskspeed test again (the test finished this time):

[screenshot: Diskspeed results after the upgrade]

 

The results look a lot more pleasing, with all the drives performing quite well together.

Interestingly, there was still a bandwidth cap on Parity1 and (the now new) Disk1:

[screenshot: Diskspeed bandwidth cap notice on Parity1 and Disk1]

 

I'm not sure what to make of this result yet, but it looks like the system is at least behaving normally.

Link to comment
On 1/10/2021 at 11:19 AM, jonathanm said:

As always, the typical disclaimer that RAID or Unraid is NOT backup, it's only redundancy, applies.

 

As long as your backup strategy is working ok for you, then you should be fine to go ahead with a double rebuild.

 

When replacing drives that haven't failed, just keep the old drives untouched until the rebuild completes; at least for the 2 drives being rebuilt, you'll then have a full backup of the data they contain. The issue is what happens if one or more of the other drives, the ones not being rebuilt, fails. In my case I'm not too concerned, as my original DVD/Blu-ray/UHD Blu-rays are my backup, plus the internet for anything else I may not have on disc (or easy access to, like my VHS/Beta/Laserdisc analog captures).

 

I'm about to do the same... upgrade 2 x 10TB with 2 x 16TB. The 2 x 10TB drives are 10-month-old WD units that both have some reallocated sectors, but they still seem OK overall. I plan to run some more diagnostics on them and, if required, RMA them while they're still covered by warranty.
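The extra diagnostics will probably just start with something quick along these lines (a sketch only; the device names are placeholders and smartctl needs to run as root):

    # Quick check of the reallocated sector counts via smartctl (device names are placeholders).
    import subprocess

    for dev in ["/dev/sdb", "/dev/sdc"]:
        out = subprocess.run(["smartctl", "-A", dev],
                             capture_output=True, text=True).stdout
        for line in out.splitlines():
            if "Reallocated_Sector_Ct" in line:
                print(dev, line.split()[-1])   # raw count is the last column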

 

All of my drives are less than two years old now (after replacing many that were 6 - 10 years old). My concern is the overall system performance hit of a dual drive rebuild. I already see read/write performance drop when rebuilding a single drive, enough that Plex may occasionally buffer if there are more than a couple of users active. With two drives rebuilding, that may reduce performance enough to make the system unusable. I mitigate write performance issues by trying to use only my 2TB cache SSD during the rebuild and then letting Mover handle the backlog once the rebuild completes.

 

Alas, the age of my motherboard/CPU/RAM already impacts my day-to-day usage, as they only support PCIe 2.0 speeds for the LSI HBA. I'm still saving money to convert the system (a 10-year-old Supermicro CSE-847) to a DAS connected to a new outboard system as the host. I'm planning to go with either a Ryzen 5950 or maybe stretch the budget and go with a Threadripper.

 

It takes the current setup about 50 hrs to rebuild a single drive onto a new 16TB. I'm just not sure whether a dual rebuild would increase this, or whether it's better to be patient and do one drive at a time. Thoughts?

 

 

 

 

Link to comment
18 hours ago, AgentXXL said:

It takes the current setup about 50 hrs to rebuild a single drive onto a new 16TB. Just not sure if a dual rebuild would increase this or if it's just better to be patient and do one drive at a time. Thoughts?

50 hours? Yikes. I couldn't be happy with that, and it's also the reason I stuck with 8TB drives max until consumer SATA gets faster than 600 MB/s (unlikely, for many reasons). My dual 8TB rebuild ran about 16 hours in the end, but I think it would have been much faster if the system hadn't been in active normal use.
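Just to put those two figures side by side, here's the quick arithmetic (decimal TB and MB, so only a rough comparison):

    # Implied average rebuild speeds for the two figures being compared here.
    for label, tb, hours in [("16TB in 50h", 16, 50), ("8TB in 16h", 8, 16)]:
        mb_per_s = tb * 1e6 / (hours * 3600)   # 1 TB ~ 1e6 MB (decimal units)
        print(f"{label}: ~{mb_per_s:.0f} MB/s average")
    # -> ~89 MB/s vs ~139 MB/s

So your rebuild is averaging somewhere under 90 MB/s across the whole run, which is low for drives that size.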

Anyway, you haven't posted any actual numbers for what you're doing, as I have, so there's no way to draw any real conclusions about what's happening with your system.

 

My backup Unraid server is also an aging system on PCIe 2.0 that I'm planning to decommission soon, but I still get 95.8 MB/s on parity checks:

[screenshot: backup server parity check history]

 

I'm not using 16TB disks in my backup system, of course, but it should be a little faster than 95.8 MB/s once I swap in my currently free 6TB disks.

 

In comparison, I have no idea what you're dealing with that would take that long.

You're using disks twice the size of the ones in my primary Unraid server, but it sounds like more than double the time:

[screenshot: primary server parity check history]

 

You should post some meaningful numbers, like Parity-Check History and Diskspeed tests, if you want opinions.

Link to comment
5 hours ago, trurl said:

Should be about the same since it is all done in parallel, assuming no controller bottlenecks such as port multipliers. I agree 50 hours seems excessive. My 8TB takes a little over 17 hours.

 

I've decided to try an alternate method, a variation on 'Replace multiple smaller disks with a single larger one' - https://wiki.unraid.net/Replacing_Multiple_Data_Drives_with_a_Single_Larger_Drive. I've installed the 2 x 16TB drives (successfully precleared) as UD-mounted drives, formatted each with XFS, and then copied the data over from the respective 10TB drives they're going to replace. That took about 18 hrs for both drives, as both 10TB drives were almost full. I'm now in the process of shutting down and removing the 2 x 10TB drives. Then, when I power up (my array is set to NOT autostart), I'll use the New Config tool to rebuild my parity. Worst case is another 50 hrs for a complete dual parity rebuild.
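In case it helps anyone following along, the copy step can be done with something like this (a sketch only, not necessarily exactly how I ran it; the mount points are placeholders, with the new drive mounted by Unassigned Devices under /mnt/disks):

    # Copy one old array disk onto its UD-mounted replacement (paths are placeholders).
    import subprocess

    src = "/mnt/disk5/"            # old 10TB array disk (trailing slash: copy contents)
    dst = "/mnt/disks/new16tb_1/"  # new 16TB drive mounted via Unassigned Devices
    subprocess.run(["rsync", "-avXH", "--progress", src, dst], check=True)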

 

Once the rebuild is complete I'll try running Diskspeed to see if it can help pinpoint my bottleneck. The array has 27 drives totalling 288TB after this process, including the 2 x parity and 1 x cache SSD, so 24 data disks make up the unRAID pool. This older Supermicro system has some real quirks with the motherboard, and no matter what I try I can't get the LSI HBA to run at even 50% of the PCIe 2.0 bus speed. I am using the 6Gbps-capable single-controller backplanes, but so far the issues I've encountered appear to be between the LSI HBA and the PCIe bus. I suspect that when I convert to DAS using an outboard host with at least PCIe 3.0 capability, my performance will definitely improve.
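Part of why I suspect the HBA/PCIe link is a quick bandwidth budget, using rough assumed figures (an x8 link, ~500 MB/s per PCIe 2.0 lane after encoding, and a ballpark 20% protocol overhead):

    # Rough PCIe bandwidth budget for an x8 PCIe 2.0 HBA feeding the whole array.
    lanes = 8
    per_lane_MBps = 500                    # PCIe 2.0, after 8b/10b encoding
    usable = lanes * per_lane_MBps * 0.8   # assume ~20% protocol overhead
    drives_in_lockstep = 26                # 24 data + 2 parity all read during a check/rebuild
    print(f"~{usable / drives_in_lockstep:.0f} MB/s per drive at best")
    # -> ~123 MB/s per drive, before any expander or backplane limits

And if the HBA is really only negotiating at half that link speed, the per-drive ceiling drops well below what these large drives can stream on their outer tracks.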

 

Link to comment
