Large Unraid Server getting too slow



I have an unRAID server with thirteen 4TB drives (one of them parity) and nine 2TB drives, totaling 66TB, of which 44.6TB is already in use.  It's running 6.1.9 on an 8-core HP ProLiant ML150 G5 with 16GB of RAM.

 

The data is a mixture of video, music, photos, and computer backups across 45+ shares.  Docker is used to run various services such as SickBeard, SABnzbd, Syncthing, and Plex (and more).

 

I have tested direct dd's to /mnt/disk* right after boot, with Docker not yet running, and I get 100-500MB/s writes and 90-180MB/s reads across all the disks.  Once I start up Docker and Plex, access drops considerably to 30-50MB/s writes and 40-90MB/s reads.  All of that is still acceptable.  But once Plex (or another service) begins to use the system, access speeds both inside and outside the host drop below 2MB/s.
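
For reference, the sort of direct-to-disk test I mean looks roughly like this (the disk number and file name are just examples):

# write straight to one array disk, bypassing the user share layer
dd if=/dev/zero of=/mnt/disk1/ddtest.bin bs=1M count=1024 conv=fdatasync
# read it back with direct I/O so the page cache doesn't inflate the number
dd if=/mnt/disk1/ddtest.bin of=/dev/null bs=1M iflag=direct
rm /mnt/disk1/ddtest.bin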

 

It was not like this 2-3 months ago.  Since then we have added the 2TB drives in batches because the existing 4TB drives were filling up.

 

What information would you need from the host to assist with some kind of investigation?

 

I am still seeing really crappy performance with just Plex running.

 

 

Link to comment

Thank you Squid.  Posted.

 

Additionally, at the time of generating the diags, I got the following from outside the server:

 

wmason@vmurphy /smb/tower2/public $ echo "--- Write ---"; dd if=/dev/zero of=test.txt bs=4096 count=250000; echo "--- Read ---"; dd if=test.txt of=/dev/null bs=4096; rm test.txt
--- Write ---
250000+0 records in
250000+0 records out
1024000000 bytes (1.0 GB) copied, 363.179 s, 2.8 MB/s
--- Read ---
250000+0 records in
250000+0 records out
1024000000 bytes (1.0 GB) copied, 1453.64 s, 704 kB/s

tower2-diagnostics-20160704-1302.zip

Link to comment

I also captured a vmstat to give an idea of what the system is doing:

 

root@tower2:/etc# vmstat 5
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
5  0      0 2140428     48 9273084    0    0   532   641    1   31 12  9 72  8
13  1      0 2117224     48 9302640    0    0    62  4663 4230 2948  2  5 90  2
33  0      0 2083080     48 9336400    0    0     0  6080 5318 3220  2  7 92  0
25  1      0 2053052     48 9363336    0    0   566  3935 3664 2602  1  6 93  0
11  0      0 2029956     48 9388600    0    0    94  4208 3792 2759  2  6 92  0
34  0      0 2001512     48 9414448    0    0     0  7152 4157 2472  1  5 93  1
11  0      0 1970884     48 9450556    0    0     0  5868 5769 3787  2  9 89  0
10  1      0 1942556     48 9481420    0    0     0  7294 4826 2938  1  6 93  0
5  1      0 1900876     48 9522368    0    0   922  4627 5177 4974  1  7 78 13
13  0      0 1843896     48 9579692    0    0   819  7218 8333 5644  1  9 75 15
14  0      0 1810436     48 9611488    0    0     0  4664 4907 3101  4  8 88  0
17  0      0 1778544     48 9637212    0    0     0  9111 3469 2219  4  9 87  0
9  1      0 1747104     48 9673564    0    0     0  5714 5640 3645  3  8 87  1
29  0      0 1706232     48 9715440    0    0     0  6048 5408 3606  2  6 92  0
14  0      0 1678652     48 9743748    0    0     0  4920 4509 2839  1  5 94  0
12  0      0 1617516     48 9804792    0    0     0 12758 10160 6420  2  9 90  0
22  0      0 1578580     48 9837504    0    0     1  5614 5106 3061  2  8 90  0
7  0      0 1552076     48 9867960    0    0     0  4587 4903 2908  2  7 91  0
9  0      0 1520860     48 9900044    0    0     0  4920 4920 2863  1  5 94  0
24  0      0 1496968     48 9921084    0    0     0  8282 3519 2175  2  7 91  0
28  0      0 1440872     48 9982556    0    0     0  8303 9786 6776  2  9 89  0
21  0      0 1390520     48 10030372    0    0     0  7304 7938 4720  2  9 89  0
23  0      0 1350520     48 10068960    0    0     0  5945 5343 4081  3  6 90  0
15  0      0 1348660     48 10068748    0    0     0  6023 3050 2585  2  5 93  0
8  0      0 2246908     48 9182608    0    0     0  3062  563 1118  2  4 94  0
11  0      0 3339440     48 8093028    0    0   819     0  692 1313  1  3 91  4
17  1      0 3335024     48 8093028    0    0     0     1  713 1757  1  3 95  0
11  0      0 3332448     48 8101348    0    0   819     5  743 1719  1  2 87 10
7  0      0 3318888     48 8110316    0    0   922     0  582  942  1  3 88  8
17  0      0 3321708     48 8110316    0    0     0  4688  923 2537  1  3 96  0
10  0      0 3327796     48 8110312    0    0     0     0  540 1607  1  2 97  0
12  0      0 3324976     48 8110312    0    0     0     0  734 2201  1  3 96  0
25  0      0 3321320     48 8110312    0    0     0     7  631 1961  1  2 97  0
9  0      0 3325232     48 8110312    0    0     0     1  811 2664  2  3 95  0
14  0      0 3317536     48 8110316    0    0     0     0  588 1632  3  3 94  0
7  0      0 3320092     48 8110316    0    0     0  6542  998 2491  3  4 93  0
10  0      0 3319120     48 8110316    0    0     0     0  345  944  1  2 97  0
8  0      0 3322328     48 8110344    0    0     0     0  678 1765  1  3 96  0
15  0      0 3321760     48 8110344    0    0     0     0  643 2051  1  2 97  0
8  0      0 3316624     48 8110352    0    0     0     5  738 2447  1  3 96  0
21  0      0 3317504     48 8110352    0    0     0  4927 1216 4573  2  5 92  0
11  1      0 3314004     48 8110352    0    0     0  1404 1202 4129  2  5 93  0
14  0      0 3484412     48 7942640    0    0     0     0 1317 4352  2  5 93  0
20  0      0 3454788     48 7972976    0    0     0     0 2005 5556  2  5 93  0
14  0      0 3429828     48 7996696    0    0     0     0 1236 2809  1  4 95  0
8  0      0 3419976     48 8004936    0    0     0     5 1458 4544  4  5 91  0

Link to comment

A few thoughts -

 

* You have IDE emulation turned on for the onboard SATA controller.  When you next boot, go into the BIOS settings, look for the SATA mode, and change it to a native SATA mode, preferably AHCI, anything but IDE emulation mode.  It should be slightly faster, and a little safer.  Well, that is, it would be if you were actually using those ports!

 

* You have 3 SAS cards (although 1 or 2 might be onboard), it can sometimes pay to make sure their firmware is updated.  If any are onboard, then check for the latest BIOS.

 

* The ps report shows rather heavy CPU usage, especially Plex, but numerous others too.  That may be dragging down the system.  You do have 8 cores though, so *if* they are spreading the load properly, then it shouldn't be a problem.  I'd certainly want to make sure that Plex isn't using CPU 0.
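
For example, something along these lines keeps a container off core 0 (the image name and core range are illustrations, not your actual config):

# pin the Plex container to cores 1-7, leaving core 0 for unRAID's own threads
docker run -d --name plex --cpuset-cpus="1-7" linuxserver/plex

# then confirm which cores the Plex processes are actually landing on
ps -eo pid,psr,comm | grep -i plex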

 

* At the moment, it looks like a recovery is in process (parity check or build, or drive rebuild), which will dramatically drop I/O speeds down.

 

* Your disk tunables are somewhat higher than defaults, which is going to use extra memory.

 

* With 22 drives and an older machine (slower buses), you're pushing a lot of bits through older pipes, and there are going to be bottlenecks.  I'm sure that HP wasn't designed with 22 drives in mind!  The best pipes and ports are usually the 6 onboard SATA ports, none of which are in use.  Taking 2 drives off each SAS chipset could lighten its I/O load and possibly allow faster speeds per drive.  I could be wrong here though; you'll have to experiment.

 

* A Parity Check begins -

Jul  1 15:27:23 tower2 kernel: mdcmd (78): check CORRECT

Jul  1 15:27:23 tower2 kernel: md: recovery thread woken up ...

Jul  1 15:27:23 tower2 kernel: md: recovery thread checking parity...

 

* Mover runs every hour, slowing down the parity check.

 

* At Jul  3 07:11:16, a large number of parity corrections begins.  There's no way to tell for sure, but I suspect they continued from that point on.  Doing rough calculations, I think that's at the 3.3TB point of the parity check.  At that point, there are over 1.3 million writes to the parity drive, so I assume a huge number of parity corrections are happening, which of course results in huge slowdowns.

 

* At Jul  4 02:14:05, a timeout on ata11.00, during the parity check and during a heavy Mover run that took over 26 minutes.

 

* The syslog ends at Jul  4 13:00:23, but the parity check has not completed yet!  I see a few 'drop_caches' near the end, so you were probably doing some speed testing.  But at this time, with the parity check still running, I/O would be terribly slow, and any testing would slow it down even further!
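
For what it's worth, those 'drop_caches' entries are what you normally see when someone flushes the page cache before a speed test, e.g.:

# flush dirty data to disk, then drop the page cache, dentries and inodes
sync; echo 3 > /proc/sys/vm/drop_caches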

 

I don't know for sure, but I believe there's enough evidence above to explain the slowness.

Link to comment

Yeah, our parity rebuild is getting at most 7MB/s and as low as 2MB/s, and is expected to take about 5-7 days to finish.  The reason for all the parity corrections is that I had stopped the array and ran xfs_repair to make sure all the drive filesystems were clean, which invalidates parity.  But I needed the array down because, at the speeds I was getting with it up, checking all the drives would have taken days.

 

I did get some metadata CRC errors that I'm still looking up how to fix.  If you have ideas on that, let me know.

 

I'll make sure Plex stays off CPU 0 specifically; Docker options are a wonderful thing.  I do think Plex is having a hard time because of the slow speeds.  Mind you, my Plex DB is 400MB even after a DB optimize.

 

I'm also leaning toward the idea that we're overloading the FSB here.

 

The mover is running every hour because we have a small SSD as the cache drive.  That's since been corrected: I moved one of the WD Black drives over to be the cache drive (new config, reconfigure), so I can likely put the mover back on a 24-hour schedule.  I also disabled some of the plugins for now.  It's helping a little, but not much.

 

Testing before I restarted the parity check or Plex showed about 40-50MB/s, after Plex about 14-16MB/s, and after the parity check started 7-8MB/s (all over the network to an SMB share with cache enabled).

 

I'm waiting for the parity check to finish and I'll test again, but I may need to investigate a new motherboard if the FSB is maxed out.

 

 

Link to comment

Yeah, our parity rebuild is getting at most 7MB/s and as low as 2MB/s, and is expected to take about 5-7 days to finish.  The reason for all the parity corrections is that I had stopped the array and ran xfs_repair to make sure all the drive filesystems were clean, which invalidates parity.  But I needed the array down because, at the speeds I was getting with it up, checking all the drives would have taken days.

If the array had been put into maintenance mode, then you could have run xfs_repair against the drives (using the /dev/md type devices) without invalidating parity.
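
For anyone reading this later, the maintenance-mode approach looks something like this (the disk number is only an example):

# Stop the array, then start it in Maintenance mode from the webGUI.  For each data disk:
xfs_repair -n /dev/md1    # dry run: report problems without changing anything
xfs_repair /dev/md1       # actual repair; writes go through the md layer, so parity stays in sync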

Link to comment

Yeah, our parity rebuild is getting at most 7MB/s and as low as 2MB/s, and is expected to take about 5-7 days to finish.

 

That's a very low speed.  Like Rob said in his excellent analysis, you're using an old system with slower buses, but that still doesn't explain speeds this low.  And besides the server being slow for days during a parity check, if you upgrade or replace a disk you'll be running an unprotected server for a week.

 

 

 

 

Link to comment

The core system might be older, but none of the drives are running on the onboard SATA controllers.  There are 3x 8-port SAS 6G HBA cards.  The parity drive is a WD SE Enterprise drive, more than enough to handle parity duty.  The main point of the speed problem is that on the same system, before we start Plex, we'll get 45-60MB/sec parity rebuilds, but as soon as Plex starts it drops to 5-7MB/sec.  There is some higher CPU utilization, but nothing maxing out all of the CPU cores.

 

M/B: HP - ProLiant ML150 G5

CPU:2x Intel® Xeon® CPU E5410 @ 2.33GHz

Memory: 16384 MB (max. installable capacity 16 GB)

Network: eth0: 1000Mb/s - Full Duplex

Kernel: Linux 4.1.18-unRAID x86_64

 

Link to comment

I found some specs on that HP, and it apparently is first-generation PCIe.  That means 6G speeds won't be available, but ideally 3G will be, per drive.  The board appears to support 5 PCIe cards with x8 connectors, but only 2 slots have a true x8 bus; the others run at x4.  So an 8-port card in an x8 slot should get 8 lanes, 1 per drive, at roughly 3G speeds.  But any card in a slot wired as x4 will get only half that, so an 8-port card there would only see roughly SATA 1.5Gbps speed per drive.  You will want to make sure your fastest cards are in the 2 x8 slots.  If there are 8 drives on a card in an x4 slot, they might actually be faster on a motherboard SATA port, which is also 1.5Gbps.

I couldn't find info on how many lanes that board actually supports, although you would think it supports more than normal with dual CPUs.  But you need another lane for the onboard networking, more for the graphics, and one or more for miscellaneous items like the USB ports.  There are clear bottlenecks here.  Having fast SAS cards only helps when the rest of the infrastructure can support them and keep up with them, and right now I suspect you're driving Ferraris through a school zone in a residential area.  All that speed is wasted; the data has to come off those drives and cards, then join the common slow paths on the board.
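
Rough back-of-the-envelope numbers, assuming PCIe 1.0 at about 250MB/s usable per lane after 8b/10b encoding overhead:

x8 slot: 8 x 250MB/s = ~2000MB/s, shared by 8 drives = ~250MB/s per drive (plenty)
x4 slot: 4 x 250MB/s = ~1000MB/s, shared by 8 drives = ~125MB/s per drive (tight once a parity check and Plex overlap)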

 

An Enterprise drive doesn't mean much here, it just means you pay more and hope it lasts longer.  It's not faster or more powerful.

 

I haven't used Plex, but I understand it likes to run a heavy I/O scan.  You have plenty of CPU and other resources, but what you don't have is a lot of bus bandwidth.  When you add a high-I/O process to already saturated I/O pipes, there's even more waiting.  What's worse, each source of I/O interrupts the others, forcing head movement back and forth, and because that makes a drive miss its ideal window for the next read, another rotation has to happen, which really slows down the I/O.  The fact that each added heavy I/O process slows the speeds down further just proves that the system is badly I/O bound.  Feel free to correct me if I got anything wrong.

Link to comment

I also don't use Plex, but if you're trying to run a parity check at the same time that Plex scans your media, it's normal for it to get much slower.  You should let Plex finish any scan before doing a parity check.  Also, I think there's a setting for how often Plex scans for new media; set it higher.

 

You can install the Dynamix System Stats plugin, start Plex, check for disk activity and wait for it to stop before starting a parity check.
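
If you prefer watching from a shell instead, /proc/diskstats is available on any Linux install; something like this prints reads and writes completed per drive every 5 seconds:

# columns: device, reads completed, writes completed
while true; do awk '$3 ~ /^sd[a-z]+$/ {print $3, $4, $8}' /proc/diskstats; echo ---; sleep 5; done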

 

I can't find a block diagram for your board, but the two top PCIe x8 slots should be CPU/northbridge slots and the rest are almost certainly southbridge/DMI slots.  You want your two fastest controllers, with 8 disks each, on the x8 slots, leaving only the remaining 6 disks on the DMI; since it's 1st-gen DMI, that's going to be your main I/O bottleneck.

 

With the above config and maybe a little tuning the parity check should start close to or above 100MB/s.

Link to comment
