Slow write speeds to cache and pool



Hi guys, 

I've been running Unraid for almost a year now. All issues I’ve run into so far, I’ve been able to resolve on my own with the help of google and this forum. But I’m kinda stumped now.


I’m running 5 data disks, a mixture of SATA and SAS drives (all 7200RPM), plus dual parity (both SAS), and a 500GB Samsung Evo 850 SSD as a cache disk, for a total of 8 disks. All of the spindles are on XFS, and the Cache is on BTRFS.

The assigned disks are connected as sdc through sdj.
 

When I first built the server, I had fewer disks and only single parity, but I could get write speeds to the cache of about 130 MB/s or better, and write speeds to the spindle disks around 50 MB/s, as I recall (not exact, so don’t quote me). Not exactly impressive, but I’m not a speed freak either, and I didn’t find those speeds troubling enough to be concerned about. I didn’t go with Unraid for speed; I went with it for ease of use, the ability to use mismatched disks, and being able to easily upgrade/expand the data pool.

At any rate, I’m not overly concerned with performance of the spindle disks, as long as the cache does its job, and is maintaining SOMEWHAT reasonable write speeds. The problem is, it’s not anymore.


I recently noticed (since upgrading to 6.7.0) that my write speeds max out at about 30 MB/s no matter what, including to the cache. I don’t know if this was going on before the upgrade, but I didn’t notice it before then.

To be fair, I did move the drives to a new case, and am now running a different board with dual processors. It’s POSSIBLE that the performance started to suck at that point, since I didn’t do any testing then. I don’t think so, but I’m not ruling it out, because I don’t have hard data for that.

I've tried turning on Reconstruct write, and it had literally zero effect. (And it shouldn't matter for writes to the cache drive anyway). I’ve tried switching out cables on the SSD, and I’ve tried both SAS connection and SATA connection to that drive.

The performance is the same regardless of whether it’s writing to the cache or to a disk in the data pool, and it doesn’t matter if it’s a network transfer or done locally using Krusader in a Docker container, or Unbalance.

I pulled the diagnostics, and in the log file (attached) I’m seeing a lot of references to a write error on disk 0 and disk 29. I’ve looked everywhere I know to look and can’t find anything it would consider a disk 29. I assume disk 0 is one of the parity drives? I Googled the error, and from the posts I’ve found it seems related to a bug where Unraid can’t spin down SAS drives. Not sure if that’s accurate, but it makes sense, since I do have SAS drives in my machine.

I’m sure it’s just something stupid (and probably my fault), but I can’t for the life of me figure out what is going on with it to cause such slow writes.
 

I spent the past few days going through, cleaning things up, making sure only the data that I want on the cache drive is on there, and that there are no files stuck in limbo trying to transfer out of the cache due to insufficient space on a share. I feel like there isn’t anything glaringly wrong at this point, yet the issue persists.


I’m in the process of trying to clear off the smallest disk to try and reformat it from XFS to BTRFS because I’ve read that it can be SLIGHTLY faster, but I already know that it’s not going to make a meaningful difference (cache is already BTRFS anyway).
 

Desperate for any ideas at this point. Thanks in advance for any tips or ideas you guys might have!

[Screenshot attachment: Screen Shot 2019-06-10 at 2.50.03 PM]

syslog.txt

Link to comment

Really? That seems horrendous even for spindle drives to me. But yeah, even when transferring from data to cache, or over network to cache, the speeds are pretty bad.

Here's a Krusader screenshot from a test I just did moving to the cache drive. It peaked at 70 MB/s for literally 2 seconds, before dropping down to 5 MB/s and then crawling back up and staying right around 30-31 MB/s after that.

[Screenshot attachment: Screen Shot 2019-06-10 at 4.38.41 PM]

Link to comment

You are providing too little information for anyone to be able to figure out the issue. Please attach the full diagnostics .zip in your next post. How are the drives connected? To an HBA card or to the MB ports? If connected to the MB, have you made sure it's connected to a SATA3 port and not a SATA2 port? Many MBs have both, old ones in particular. When you say new/different MB with dual CPU, I automatically assume an older server-grade MB/CPU, because that's what most dual-CPU users around here use (what I've seen anyway), unless you spent a lot of $$$.. :P And I have to guess, since there's no info.

 

You should also fix the spamming in your syslog. If it's the "SAS drive spin down bug" causing it, like you said, you should disable spin-down for the parity drives, since they are unable to spin down anyway. And disk 29 is the parity2 drive, FYI. :)

Link to comment

Cache-to-cache is actually significantly better. It bursts up to as much as 400 MB/s for a few seconds, and then seems to level off around 120-150 MB/s.

I fixed the log spamming issue, I think. No change on transfer speeds though.

The board is a Gigabyte GA-7TESM. All SATA connectors on this board are 3.0. It also has an onboard SAS controller, which I've connected with breakout cables. I have a breakout board for it as well, but currently most of the drives are just plugged directly into the SAS ports. It's an older server board, but it's still pretty capable; I don't see any reason why it wouldn't be able to handle reasonable transfer speeds.

Then again, maybe I'm taking the wrong approach. Maybe I should've started by asking what other people are seeing as normal transfer speeds. Maybe I'm expecting too much from Unraid, but I don't think it's all that unreasonable to expect a measly 50 MB/s from, for example, a drive that sees write speed benchmark averages in the low-to-mid hundreds, as reported here: https://hdd.userbenchmark.com/Seagate-Barracuda-720014-1TB/Rating/1849

And like I said, unless I'm crazy, I'm pretty sure I was seeing 50 MB/s average on spindle writes, and a minimum of 130 MB/s to the cache.

Am I crazy? Does Unraid really have that much overhead, even with "turbo write" turned on? I get that there's a lot of overhead for parity, but this still seems slow, especially seeing zero difference when using turbo write.

I've tested switching cables between the SAS controller and the SATA ports with the cache drive to ensure that the issue isn't cabling or the ports.

I've also got an LSI SAS controller card, and even a SATA 3 PCIe card, that I could throw in to test, but that seems a little far-fetched, TBH.

Thanks for clarifying that disk 29 is the second parity drive. It occurred to me that it might be second parity, but I wasn't sure why those drives would be wanting to spin down, since I figured they'd be spun up full time when writes were happening.

Attaching the whole latest diagnostics zip file below.

Thanks for all the responses, guys.

tower-diagnostics-20190611-0238.zip

Link to comment

The reason I asked about the MB was mainly to know whether the drives are connected to SATA2 or SATA3 ports. And you didn't answer @testdasi's question about TRIM on the cache drive. Do you have it regularly scheduled, or has it not been trimmed in a while? I don't have time to look at the diags right now, but let's do some simple performance tests without any network, docker containers and whatnot involved, to rule some things out. Your cache drive should absolutely perform better than what your tests are showing.

 

SSH in or open terminal in the webui, navigate to a cache only share on the cache drive and run this command: dd if=/dev/zero of=file.txt count=5k bs=1024k

This will write a 5GB file to the drive and give you some stats, please post the output here. Run it 3-4 times to get the average and post the result. 
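For example, something like this (appdata is just an example, use whatever cache-only share you have):

cd /mnt/cache/appdata                            # any share that lives only on the cache drive
dd if=/dev/zero of=file.txt count=5k bs=1024k    # writes a 5GB test file and reports the speed
rm file.txt                                      # clean up the test file when you're done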

 

Do the same for the array drives: first directly to a single disk, then to a user share which does not have Use cache set to Yes, and post the results. I don't know if it makes a difference to write to a specific drive versus a user share (it shouldn't), but it's fun to test anyway.

 

Also do the same test to the array with turbo write enabled. It was not clear from your posts but it seemed like you did most of the test on the cache drive, and turbo write only works on the array.
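Something along these lines should cover it (the share name is just an example, substitute your own):

# directly to a single data disk:
cd /mnt/disk1
dd if=/dev/zero of=file.txt count=5k bs=1024k
rm file.txt

# to a user share that bypasses the cache (/mnt/user0 is the array-only view of the user shares):
cd /mnt/user0/Media
dd if=/dev/zero of=file.txt count=5k bs=1024k
rm file.txt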

 

It's been a while since I've done any tests on my system, but I vaguely remember something about the speed you say you had earlier, around 50 MB/s to the array without turbo write. The cache drive I think had around 300-400 MB/s.

Link to comment

Yes, sorry I overlooked the trim question. It's scheduled to trim nightly at midnight.

The testing to the cache was specifically because somebody requested some other screenshots, besides the work going on in unbalance.

I ran that command: 
dd if=/dev/zero of=file.txt count=5k bs=1024k
And returned:
5120+0 records in
5120+0 records out
5368709120 bytes (5.4 GB, 5.0 GiB) copied, 3.33656 s, 2.2 GB/s

Ran it several times in a row and got exactly that same number all but once, and got 1.6 GB/s the one time.

Not sure how to run it in other directories. I tried dd if=/mnt/user0 of=file.txt count=5k bs=1024k to run it against the array without cache, but it says the file doesn't exist, so I'm not sure what path to point it to, since I don't fully understand what that command is doing in the first place. Not sure what I need to do differently, but I don't want to get too creative and corrupt something.

Link to comment

In case it is not clear, the architecture of Unraid means that writing to parity-protected array drives will always be significantly slower than you might expect from raw disk speeds, because each ‘write’ is actually at least 4 I/O operations. You first get a read of the relevant sector on both the parity drive and the data drive. Unraid then calculates the changed sector contents and writes the sector back to the data drive and the parity drive. With dual parity you get a read and write operation on that drive as well. The read operations can run in parallel, as can the writes, but there will always be at least one disk rotation between the reads and the writes.
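A toy illustration of the single-parity part of that update, with made-up byte values, just to show why the old data and old parity must be read before the new parity can be written:

old_data=0x3C; new_data=0xA5; old_parity=0x5F          # pretend contents of one byte on the data and parity disks
new_parity=$(( old_parity ^ old_data ^ new_data ))     # parity is recomputed from what was already there plus the new data
printf 'new parity byte: 0x%02X\n' "$new_parity"       # prints 0xC6; both the data and parity sectors then get written back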

Link to comment
7 hours ago, wrenchmonkey said:

I ran that command: 
dd if=/dev/zero of=file.txt count=5k bs=1024k
And returned:
5120+0 records in
5120+0 records out
5368709120 bytes (5.4 GB, 5.0 GiB) copied, 3.33656 s, 2.2 GB/s

Ran it several times in a row and got exactly that same number all but once, and got 1.6 GB/s the one time

No, that doesn't look right... The speed is way too high. Are you sure you entered a directory on the cache drive in the terminal before running the command? To me, it looks like you may have run the command in a directory which lives in RAM, which will give you that kind of high speed. And by the way, this command does not test raw disk speed, it tests the actual write speed, and you will see both of the parity drives updating if writing to the array.
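A quick way to double-check where you actually are before running the test (the share name is just an example):

cd /mnt/cache/appdata    # or whatever cache-only share you have
pwd                      # confirm the path is what you expect
df -h .                  # the Filesystem column should show your cache device (e.g. /dev/sdX1), not rootfs or tmpfs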

 

I'm not sure how familiar you are with the command line, but you either need to use the cd command and type the path manually, or you can do what I usually do when I'm not sure where I want to go (or don't remember the path): use Midnight Commander (by typing mc and hitting enter) to navigate to the right directory, then quit MC by pressing F10, which will always leave you in the directory you were last in, with the path filled in for you on the command line.

 

So to simplify things, use MC, navigate to the right path, quit MC, then run the above command. And to be able to run the command on a specific disk in the array I think you need to enable disk shares. If you want to know exactly what the command does, this link explains the use case pretty simply: https://skorks.com/2010/03/how-to-quickly-generate-a-large-file-on-the-command-line-with-linux/

 

Edit: About disk shares: if you've not used them before, or are not completely aware of the "user share copy bug", it's best not to enable them. But if you do enable them, be sure to NEVER copy anything from a user share to a disk share or vice versa.
 

Edited by strike
Link to comment

Sorry, I'm relatively familiar with the terminal, but I've never used that particular command. It was late and I overlooked the instruction to cd into the directory I wanted to test; I just kept it in the default directory. Trying again.

(Turbo Write Enabled)
 

In /mnt/user:
Average of 3: 351.6 MB/s

In /mnt/user0:
Average of 3: 98.86 MB/s

In /mnt/cache:
Average of 3: 710 MB/s

In /mnt/disk1:
Average of 3: 102.3 MB/s

In /mnt/disk2:
Average of 3: 99 MB/s

In /mnt/disk3:
Average of 3: 101.33 MB/s

In /mnt/disk4:
Average of 3: 102.33 MB/s

In /mnt/disk5:
Average of 3: 103.6 MB/s

All looking pretty good. Definitely more like what I'd expect to see from these disks.

I will re-run the tests with turbo write disabled and post results. It's kind of a time-consuming process.

Edited by wrenchmonkey
Link to comment

Disabled Turbo Write and re-ran the tests, with the exception of the user share and the cache share, since it would be redundant.

(Turbo Write Disabled)

In /mnt/user0:
Average of 3: 73.46 MB/s

In /mnt/disk1:
Average of 3: 73 MB/s

In /mnt/disk2:
Average of 3: 76.86 MB/s

In /mnt/disk3:
Average of 3: 77.46 MB/s

In /mnt/disk4:
Average of 3: 73.7 MB/s

In /mnt/disk5:
Average of 3: 73.33 MB/s

Link to comment

I ran some tests myself and I got about the same speeds as you. I did, however, see about a 20 MB/s difference between what that test was showing for writes to the array and the speed reported on the dashboard in the webui; I don't know which is more correct. Either way, the speed we're both seeing is about what can be expected in Unraid. So based on your tests your disks are working fine (for writes), and any hardware issues can be ruled out. That leaves us with software, network, any overhead that might be between the two, and Unraid itself.

 

All we did was test the write speed of the disks. Since you were also copying from and to your array in your earlier tests, we also need to test the read performance of your disks to rule out any disk problems. If there's an issue reading from one or more of your disks, it can have a major impact on write speed to the array (using turbo write), since that requires reading from all of the disks. It can also have an impact when copying from your array to the cache or any other disk outside the array, and on parity-check speed as well. You can test the read performance of your disks with the diskspeed docker container.
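If you want a quick and dirty read test from the terminal as well, something like this should work (the file path is just an example, pick any large file that already exists on the disk):

sync
echo 3 > /proc/sys/vm/drop_caches                      # drop the page cache first so you measure the disk, not RAM
dd if=/mnt/disk1/some_large_file.mkv of=/dev/null bs=1M    # read the file back and report the speed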

Edited by strike
Link to comment
  • 2 weeks later...
  • 3 weeks later...

Unraid version 6.6.6

 

I have the same kind of problem. I tested write speeds with the command provided above. I seem to have some problem with how Unraid chooses where to write new data; it seems like Unraid doesn't want to write anything straight to the cache drive. My speed on the cache is ~500 MB/s, which is what I think it should be.

 

When I set a share's Use cache setting to Only or Prefer, my write speed at first peaks up to 500 MB/s, then slowly decreases, and within 30 seconds it's about 30-40 MB/s. That's the same speed I get when writing straight to the array. I tried the reconstruct write option: when it's on, the same fast-then-slow thing happens when writing to the cache; when it's off, it's 30-40 MB/s the whole time.

 

After testing the speeds with the command above, I also think my hardware is absolutely fine. I first thought that updating Unraid would help, but the same kind of problem is going on with 6.7.0, so I will hold off on updating for now.

 

Can someone point me to where I can check whether the TRIM option is enabled on my cache drive?

Link to comment
1 hour ago, JQNE said:

Can someone point me to where I can check whether the TRIM option is enabled on my cache drive

You need the Dynamix SSD Trim plugin installed to get automated trim capability.   When that is installed the relevant settings are under Settings -> Scheduler.

 

You should also try manually running the command

fstrim /mnt/cache

to check that no error is reported as not all controller/SSD combinations support trim.
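Adding the verbose flag will also report how much space was trimmed (the output format below is just an example):

fstrim -v /mnt/cache
# prints something like: /mnt/cache: xxx GiB (xxx bytes) trimmed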

Link to comment
On 7/13/2019 at 5:58 AM, itimpi said:

You need the Dynamix SSD Trim plugin installed to get automated trim capability.   When that is installed the relevant settings are under Settings -> Scheduler.

 

You should also try manually running the command


fstrim /mnt/cache

to check that no error is reported as not all controller/SSD combinations support trim.


Yeah, that's not it either. If it were a TRIM issue, it would show up even during those write speed tests done in the terminal, and it doesn't.

I've been running trim nightly on the drive using the Dynamix Trim plugin since I set the server up originally. Running it manually, even with verbose flag, shows no errors.

Link to comment

Your cache pool is BTRFS, so I assume it's in RAID1. Could you show the output of the commands below?

 

- btrfs fi show /mnt/cache

- btrfs fi usage /mnt/cache

- btrfs device stats /dev/sd[x]1            ( Cache disk device path )
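If you are not sure which /dev/sdX the cache is, something like this should show it (sdk below is just an example):

lsblk -o NAME,SIZE,MODEL,MOUNTPOINT    # look for the device whose partition is mounted at /mnt/cache
btrfs device stats /dev/sdk1           # then substitute that partition here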

 

But if you can, I would like you to try a single-SSD cache pool, or downgrade to 6.6.x, and check whether it makes a difference.

 

A long time ago I tried a single SSD in the cache pool, and also two in RAID0; write speed was bad, but that was expected because that SSD was really bad. Later I also tried NVMe; speed was a lot better, in the ~100-300 MB/s range or higher with multiple writes, but never consistent. Unraid hasn't performed well with it. Now I don't use any SSD, but a 10-disk RAID0 instead, and speed is over 1 GB/s and quite consistent.

 

For your case, please also check (quick terminal checks for the first two are sketched below):

- Whether the SAS disks' write cache is enabled or not (please search the older posts about this)

- Network MTU setting? I always use the standard 1500 on all equipment for best compatibility

- Turbo write enabled, not set to auto
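Something like this (sdg and eth0 are just example names, and sdparm may not be installed by default, so treat it as a sketch):

hdparm -W /dev/sdg                          # shows whether write caching is on for a SATA drive
sdparm --get=WCE /dev/sdg                   # same check for a SAS drive (WCE=1 means write cache enabled)
ip link show eth0 | grep -o 'mtu [0-9]*'    # confirm the interface is running the standard MTU of 1500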

 

 

 

Edited by Benson
Link to comment
