Addition of a parity disk cripples transfer speed



Hi folks,

 

I've been trying Unraid for the past month (1 hour remaining on my trial actually), and I'm quite pleased with it so far. Since I'm impatient, have proper backups and had my future parity drive stuck somewhere due to COVID-19, I first set everything up without parity. All was fine, transfer speed within the server was around 150-180 MB/s on average.

 

Last Monday, the parity drive finally arrived. I precleared it (I get that it's not necessary anymore; the point was avoiding premature failure issues), set the disk as the new parity, and let Unraid do its job for the next 17 hours (for an 8TB drive). I then tuned everything according to the wiki, Spaceinvader One's amazing videos, and this very forum.

 

Everything seemed to work fine, except that file transfers between disks or to the shares slowed down a whole lot. They now start around 120 MB/s, and usually stabilize around 40-60 MB/s a few minutes later (using MC or any CLI tool).

 

Time: 0:00.08 ETA 0:07.05 (164.92 MB/s)
Time: 0:00.15 ETA 0:10.05 (114.67 MB/s)
Time: 0:00.24 ETA 0:11.30 (99.59 MB/s)
Time: 0:01.22 ETA 0:13.19 (80.77 MB/s)
Time: 0:07.25 ETA 0:09.21 (70.66 MB/s)
...
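For reference, this is how I time those transfers from the shell (a rough sketch; the paths and file names are just examples):

rsync --progress /mnt/disk1/films/bigfile.mkv /mnt/disk2/films/

pv /mnt/disk1/films/bigfile.mkv > /mnt/disk2/films/bigfile.mkv

(rsync --progress prints the live rate per file; pv gives a running average, though it isn't in stock Unraid — I believe the NerdPack plugin provides it.)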

 

The rig:

Intel i5 3570k (4 cores)

Gigabyte H77N-WIFI

8 GB DDR3

Dell H200 controller (flashed in IT mode, fresh thermal paste and brand new Noctua fan. Heat sink is barely warm)

 

It seems more than capable for my needs, considering the minimum specs in the documentation. I barely use anything except the NAS features. I only run a very small Debian VM, as a secure interface between Unraid and my remote servers. It does nothing 90% of the time, and it was disabled during troubleshooting. I don't run any Docker container, except DiskSpeed during the benchmarks.

 

The disks:

[Screenshot: array disk assignments]

 

Note: still waiting for an extra SATA power cable to plug in my second SSD and set up a RAID 1 cache pool. The mover runs hourly, so the limited capacity is sufficient for my needs.

 

Troubleshooting so far:

 

First I benchmarked the disks, which didn't show anything special as far as I can tell. No controller-induced bottleneck.

 

[Screenshots: DiskSpeed benchmarks, per-disk and per-controller]

 

The 2 old WD Green drives are obviously slower, so I thought that maybe reconstruct write for the parity wasn't optimal, since any write on the array makes all the other disks read. Switching to read/modify/write made no difference, so I went back.
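For the record, I switched modes from Settings > Disk Settings, but it can apparently also be done from the shell (quoting from memory, so double-check the values on your release):

mdcmd status | grep md_write_method
mdcmd set md_write_method 1
mdcmd set md_write_method 0

where 1 should force reconstruct (turbo) write and 0 read/modify/write.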

 

I re-wired everything in the server, checked the controller cooling, still no cookie.

 

The only issue I can see in the syslog is a seemingly random NTP drift, but I doubt it's relevant.

 

In a desperate move, I unassigned the parity drive, which immediately fixed the speed issue. A parity sync is running as I write this, to revert that bold move.

 

I've attached the diagnostics. Sorry the log is so short; I've rebooted the server a few dozen times in the last couple of days. If necessary I'll let it run for a few days and upload it again.

 

Any help would be appreciated; I've no clue what to do next. 🤓

saucisse-diagnostics-20200410-2046.zip

7 hours ago, Majyk Oyster said:

around 40-60 MB/s

After your dramatic title, I was expecting something worse. This is a reasonable speed.

 

7 hours ago, Majyk Oyster said:

The mover runs hourly, so the limited capacity is sufficient for my needs.

I know you aren't complaining about this, but Mover is intended for idle time. You might find it better to run mover less often and not cache some of your user shares. Do you need cache speed for all writes?

 

I cache very little, since most writes to my server are scheduled backups and queued downloads, so I am not waiting on them to complete anyway.

 

Haven't looked at the diagnostics. Are you actually having any problems other than the expected slower parity writes?

 


Thanks for your replies.

 

I expected some overhead, but I had no idea how much, since I have zero experience with software RAID-like solutions. I couldn't find any info to roughly quantify real-life performance. I'm still moving/sorting lots of data (disk to disk) to make my server nice and tidy, which bypasses the cache. That kind of transfer should diminish drastically soon.

 

I'm still somewhat confused: since the CPU, RAM and bandwidth are all far from saturation, what would be the typical limiting factor? Could I do anything to improve things slightly?

 

32 minutes ago, trurl said:

I know you aren't complaining about this, but Mover is intended for idle time. You might find it better to run mover less often and not cache some of your user shares. Do you need cache speed for all writes?

 

I cache very little, since most writes to my server are scheduled backups and queued downloads, so I am not waiting on them to complete anyway.

The only reason I set the mover to hourly is because I used the small 120GB SSDs I had lying around, and they fill up pretty quickly. Ideally I would switch to 480GB-960GB SSDs and use the default nightly mover setting.

 

I never enabled the cache on shares containing huge files (20GB+). Disabling it for all shares used mainly for automated transfers is indeed something to consider, thanks for the input.

 

35 minutes ago, trurl said:

Haven't looked at the diagnostics. Are you actually having any problems other than the expected slower parity writes?

Not that I can tell; the wiki and the forum have helped me with every issue I've had so far.


With reconstruct write enabled you should get write speeds around the max speed of the slowest disk in use at any point. So if you were writing to an empty disk, max speed should be >100MB/s; if you were writing, for example, to one disk with about 2.8/2.9TB used, speed should be around 60MB/s, since it would be limited by the 3TB WD Green. Other than that, you could have one or more disks with slow sector zones. Was the parity check speed always normal, without slow periods? If you have Dynamix stats installed you can check the graph (before rebooting).
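Put differently: parity is just the XOR of every data disk at the same offset (P = D1 ⊕ D2 ⊕ … ⊕ Dn), so with reconstruct write all the data disks are read in lockstep with the parity write, and the write can't go faster than the slowest reader at that position.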

42 minutes ago, johnnie.black said:

With reconstruct write enabled you should get write speeds around the max speed of the slowest disk in use at any point. So if you were writing to an empty disk, max speed should be >100MB/s; if you were writing, for example, to one disk with about 2.8/2.9TB used, speed should be around 60MB/s, since it would be limited by the 3TB WD Green. Other than that, you could have one or more disks with slow sector zones. Was the parity check speed always normal, without slow periods? If you have Dynamix stats installed you can check the graph (before rebooting).

@johnnie.black Not always. Because the kernel does delayed writes to the physical device, if you have a few drives and your allocation mode is set to "Most Free", the files all go to different drives.

 

When this happens, you'll see in the dashboard that the system has actually switched to r/m/w from reconstruct, because reads happen after a file is fully written (updating file system tables and whatnot). If you're transferring a ton of files of varying sizes, the drives all have the same free space, and the allocation method is Most Free, the system will stay in r/m/w upwards of 90% of the time.

 

Drove me nuts sorting that out last night.  

 

A bit of an edge case to report to Tom, but I did come to the firm conclusion that I absolutely hate how he's got it switching back and forth now.


Yes, that's a known issue, or rather two known issues. Most Free should never be used if you want best performance: even with older releases, where it didn't auto-switch to R/M/W, there was still a very noticeable performance impact when parity writes overlapped. That impact is much worse with the new behavior, which I'm against, as I've already posted multiple times. IMHO the auto-switch should only happen if the write mode is set to auto; when I enable turbo write, I want turbo write always.


It's an annoyance more than anything, and unfortunately not a bug but a design choice. I've never actually had any noticeable performance hit from Most Free and read/modify/write mode before; I never would have noticed this behaviour had I not been transferring terabytes of information back and forth between 2 servers. I tried everything to mitigate it, and the solution was to switch to high-water to avoid the overlapping reads at the end of a file. Unless your drives all have the same amount of free space and you're transferring a lot of information, you may never even notice the degradation.

 

But under the circumstances I had, once it switched to read/modify/write it tended to stay there (90%+ of the time), with a very noticeable hit to performance.

1 hour ago, johnnie.black said:

With reconstruct write enabled you should get write speeds around the max speed of the slowest disk in use at any point. So if you were writing to an empty disk, max speed should be >100MB/s; if you were writing, for example, to one disk with about 2.8/2.9TB used, speed should be around 60MB/s, since it would be limited by the 3TB WD Green. Other than that, you could have one or more disks with slow sector zones. Was the parity check speed always normal, without slow periods? If you have Dynamix stats installed you can check the graph (before rebooting).

Thanks for the input. That's closer to what I first imagined.

 

I just installed Dynamix stats, so there's not much data to inspect right now. But I took a closer look at the parity sync (still running). It just went over 50% (4TB) while I was watching, which is the point where the WD Greens are no longer used and only the Reds and the IronWolf are left. The speed instantly went from 70MB/s to 160MB/s. During the first hours, the speed was around 110-120MB/s. The last time I ran a parity sync, it ended with a 120MB/s average speed. So everything looks normal and coherent with the DiskSpeed results so far, regarding parity sync at least.

 

Before:

[Screenshot: parity sync progress]

[Screenshot: disk speeds during parity sync, first half]

 

After:

[Screenshot: disk speeds during parity sync, second half]

 

However, when writing from one Red disk to another, using read/modify/write mode so the Greens don't slow everything down, shouldn't I get better performance than 40-60MB/s? It didn't seem to make any difference when I tried yesterday. I ran quite a bunch of tests with disk-to-disk transfers before removing the parity; I doubt they were all impacted by the WD Greens' end-of-platter speeds.

 

Replacing those WD Greens with more Reds (shucked white labels actually, don't tell WD) is very tempting indeed. I've added 3x8TB drives to the server in the last few weeks (and retired a pile of 1-3TB disks), so my computer budget is kinda burned right now. I might try to replace the Greens with old but functional 7200rpm drives that sit on my desk collecting dust; that could make things better, I guess (unless I find out it's pointless in the meantime).

 

  

1 hour ago, Squid said:

@johnnie.black Not always. Because the kernel does delayed writes to the physical device, if you have a few drives and your allocation mode is set to "Most Free", the files all go to different drives.

After a bit of tinkering, I only use high-water now. Thanks for the input nonetheless!

8 minutes ago, Majyk Oyster said:

However, when writing from one Red disk to another, using read/modify/write mode so the Greens don't slow everything down, shouldn't I get better performance than 40-60MB/s?

Do you mean from one array drive to another? That will always be slow, and the write mode will auto-change from reconstruct to r/m/w.

18 minutes ago, johnnie.black said:

Do you mean from one array drive to another? That will always be slow, and the write mode will auto-change from reconstruct to r/m/w.

Exactly, disk to disk within the array, using MC or another CLI command.

 

I can't find any reference for what to expect performance-wise on average/enthusiast/non-pro hardware, so I have a hard time knowing what's normal and what could be improved. Troubleshooting something that's not broken can be hard sometimes. 🤓

26 minutes ago, Majyk Oyster said:

so I have a hard time knowing what's normal and what could be improved.

Array-to-array copies with Unraid were never fast, mostly because parity is a dedicated drive. With earlier releases they were a little faster with turbo write; still, because one disk has to read and write at the same time, they were never that fast, but you could get 70/80MB/s. Now, since the write mode auto-changes to r/m/w, they can never go above 50/60MB/s, so what you are experiencing is normal, and the speed will be the same with or without turbo write. Turbo write is much faster for any transfer originating outside the array, like from your desktop, from cache, or from any UD device.
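Back-of-the-envelope, assuming disks that stream around 120MB/s: in r/m/w mode the target disk has to read the old data and then write the new data over the same sectors, and the parity disk does the same, so each of them passes over every stripe twice, with a platter rotation between the read and the write. That alone caps a disk-to-disk copy at roughly half the streaming speed, which lines up with the 50/60MB/s ceiling.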


Thanks for the details, much appreciated. Seems like I wasted a bunch of time and a perfectly good parity, then. 😁

 

The WD Greens can definitely max out that 60MB/s "hard limit", so I guess it's pointless to replace them while using parity. I'd still love to know exactly what the limiting factor is in all this, but that may be Unraid's secret sauce.

 

On a side note, I shrunk my VM and docker.img, so now I have under 3.5GB of used space on the cache, and it's also freed of all automated writes. That should let it breathe a little.
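(Quick sanity check from the shell, for anyone following along: df -h /mnt/cache shows the pool's used/free space directly.)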

29 minutes ago, Majyk Oyster said:

I'd still love to know exactly what the limiting factor is in all this, but that may be Unraid's secret sauce.

Did you read the link I gave earlier in the thread? Once you start getting complete disk rotations as part of a write operation, that is very much a limiting factor.

1 minute ago, itimpi said:

Did you read the link I gave earlier in the thread? Once you start getting complete disk rotations as part of a write operation, that is very much a limiting factor.

Absolutely, every word of it. I only used read/modify/write mode for a few tests; my array is configured with turbo mode. Disk rotation waits shouldn't be an issue (?).

 

As I understand things, considering the DiskSpeed results and the way turbo mode works, the slowest write speed would be around 60MB/s, when reading from the end of my 3TB WD Green disk to reconstruct parity, but it could also be much better when writing past the "half" of my parity drive, which means reading only from faster disks. So I don't get why I've never experienced better speeds since I activated parity. I could still be missing something, though.

15 hours ago, Majyk Oyster said:

the way turbo mode works, the slowest write speed would be around 60MB/s, when reading from the end of my 3TB WD Green disk to reconstruct parity, but it could also be much better when writing past the "half" of my parity drive, which means reading only from faster disks.

That's correct, IF the transfer source is outside the array.


Now that the parity is done rebuilding and the array is protected once again, I've checked various things:

 

- Reads from the array over ethernet are stable at 120MB/s (which is pretty nice).

- Writes over ethernet to cached shares on the array are stable at 80MB/s.

- Writes over ethernet to non-cached shares on the array are stable at 80MB/s.

- Disk-to-disk transfers within the array start around 120MB/s and slowly stabilize around 50MB/s.

 

I've double-checked: the files indeed land on /mnt/cache/ or /mnt/diskX/ as they're supposed to. I'm not sure how to simulate the varying speed of reconstruct writes, which depends on the disks being read and the position of the data on the platters.
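The closest I can think of is a crude dd test straight to a data disk, which at least takes the network and cache out of the equation (a sketch only; the path, file name and size are examples, and it leaves a throwaway file behind):

dd if=/dev/zero of=/mnt/disk1/ddtest.bin bs=1M count=4096 oflag=direct status=progress
dd if=/mnt/disk1/ddtest.bin of=/dev/null bs=1M iflag=direct status=progress
rm /mnt/disk1/ddtest.bin

The write still goes through the parity machinery like any array write, so it should show the same ceiling as the disk-to-disk copies, minus the source disk read.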

 

That makes me wonder: are my array disks too fast for my cache (SSD) to make any difference on array writes? The wiki page might be a little outdated on that subject, since the speeds and sizes mentioned are those of 2011 hardware.

 

 

1 hour ago, trurl said:

Is this a move or a copy? Moving, of course, also writes to the source disk to delete the files.

Both seem to be just as fast while the data is written. I guess the extra work needed for a move is done once the transfer is completed and the source file is deleted.

1 hour ago, johnnie.black said:

Is this data disk to data disk, or cache to data? Like I mentioned, data disk to data disk will always be slower.

It's data to data (/mnt/diskA > /mnt/diskB/). I just witnessed such a transfer at 80MB/s, so the aforementioned fluctuation is indeed confirmed.

