Strange parity check speeds

June 22, 2013

I am running 5rc10 and have been for some time. For the longest time I had a 2TB 7200rpm parity drive and 18 data drives of various sizes. My parity checks averaged ~8 hours and ~70MB/s. Speeds reported by the GUI during checks would range between 45-90 MB/s pretty much all the way through. About four weeks ago I added a 3TB Seagate 7200rpm drive as my new parity and speeds seemed to get weird. Ultimately the checks took ~12.5 hours at ~60MB/s, but the GUI showed a lot of 20-30MB/s along the way (mostly during the first 50%). I never thought too much about it.

Last week I added a 4TB Seagate 5900rpm drive as my new parity, moved the 3TB to a data drive and started removing some of my slower 500GB drives. I am now down to parity plus 13 data drives for a total of 20.5TB (including parity). During the process of copying data and removing drives, I've run numerous parity checks and my speeds are still weird. Overall I get ~15 hours and 65MB/s, but the strange part is that the GUI shows speeds in the 20-35MB/s range for the first 50%, which takes around 12 hours. Shortly after it reaches 50%, speeds jump to ~130MB/s and then fluctuate between 90-130MB/s for the remainder of the check, which takes ~3 more hours. I have seen this repeated each time I've run a parity check.

What would explain this? The log shows nothing other than the check being initiated. I know there are a lot of variables, but what is the ballpark I should expect in terms of time? I thought removing 5 slow drives (hdparm -t returned ~75MB/s) would speed things up, but it didn't seem to change anything.

June 22, 2013

Well ... this is definitely strange !!

First, as you undoubtedly know, the check speed can never be any faster than the slowest drive CURRENTLY INVOLVED in the check operation. For example, if you had an 80GB drive in the array, it would be "involved" for the first 80GB of the check ... then would no longer be "involved."

That explains why your speed ramps up so dramatically at the 50% point ==> when you get 50% of the way, you're past the 2TB point, so ALL of your previous drives are no longer involved ... only the new 3TB and 4TB drives are included at that point, and I'm sure they're both 1TB/platter drives with excellent areal density, so you see nice, FAST speeds.

However, since your checks used to take 8 hours for 2TB, then assuming you haven't swapped in a SLOWER drive (I'm sure you haven't), you should still get the first 2TB done in about that length of time -- NOT the 12 hours you indicated it's now taking to get to the 2TB point.
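The "slowest drive currently involved" rule can be sketched as a rough model. The function name and the sizes and speeds in the example are illustrative assumptions, not measurements from this array, and the model treats each drive as reading at one constant speed (real drives slow down toward the inner tracks).

```python
def estimate_check_hours(drives):
    """Estimate parity-check time from (size_tb, speed_mb_s) pairs.

    Assumption: the check runs at the speed of the slowest drive that is
    still involved at the current position, and each drive has a single
    constant read speed.
    """
    # Segment boundaries are the distinct drive sizes, smallest first.
    boundaries = sorted({size for size, _ in drives})
    hours = 0.0
    start = 0.0
    for end in boundaries:
        # Drives at least as large as `end` are read throughout this segment.
        active = [speed for size, speed in drives if size >= end]
        slowest = min(active)
        segment_mb = (end - start) * 1_000_000  # 1 TB = 1,000,000 MB (decimal)
        hours += segment_mb / slowest / 3600
        start = end
    return hours


# Hypothetical array: 4 TB parity, one 3 TB and one 2 TB data drive.
example = [(4.0, 140.0), (3.0, 150.0), (2.0, 70.0)]
print(f"{estimate_check_hours(example):.1f} hours")
```

In a model like this, removing small slow drives can only shorten the early segments, which is why a slower first 2TB is unexpected.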

Have you moved any drives to a different controller? Most significantly, are you using a PCI controller card for any of the drives?

Have you looked at the SMART data for each of your drives to confirm that none have pending reallocations?

In any event, post a list of the specific drives you have in the array --- and how they're connected to the system [motherboard port; controller card (with make/model); port extender; etc.] And if ANYTHING has changed since before you started re-configuring the array, make a note of it.

June 22, 2013

What are the hardware specifications? How are the drives connected?

June 22, 2013

Author

Sorry. I should've included my specs in the OP. Here they are:

Supermicro X9SCM-F-O

Intel Xeon E3-1230

3 - IBM M1015 flashed to IT mode (all drives connected to these)

Running as a VM in ESXi with 4GB RAM

1 - Seagate ST4000DM000

1 - Seagate ST3000DM001

2 - WD WD20EARX

1 - WD WD20EADS

2 - Hitachi HDS5C3020ALA632

1 - WD WD10EAVS

1 - WD WD10EADS

1 - Samsung HD103SJ

1 - Hitachi HDT721010SLA360

1 - Samsung HD753LJ

1 - WD WD7500AAKS

Here are the results of a speed test from within unMenu (sdd is my parity drive):

/dev/sdb:
Timing cached reads:   20994 MB in  1.99 seconds = 10524.03 MB/sec
Timing buffered disk reads: 276 MB in  3.01 seconds =  91.56 MB/sec

/dev/sdc:
Timing cached reads:   20974 MB in  1.99 seconds = 10513.71 MB/sec
Timing buffered disk reads: 440 MB in  3.01 seconds = 145.95 MB/sec

/dev/sdd:
Timing cached reads:   21290 MB in  1.99 seconds = 10672.55 MB/sec
Timing buffered disk reads: 454 MB in  3.01 seconds = 150.89 MB/sec

/dev/sde:
Timing cached reads:   21000 MB in  2.00 seconds = 10526.31 MB/sec
Timing buffered disk reads: 270 MB in  3.03 seconds =  89.25 MB/sec

/dev/sdg:
Timing cached reads:   21630 MB in  1.99 seconds = 10843.30 MB/sec
Timing buffered disk reads: 316 MB in  3.01 seconds = 105.05 MB/sec

/dev/sdh:
Timing cached reads:   21402 MB in  1.99 seconds = 10730.46 MB/sec
Timing buffered disk reads: 316 MB in  3.01 seconds = 104.88 MB/sec

/dev/sdi:
Timing cached reads:   21572 MB in  1.99 seconds = 10814.00 MB/sec
Timing buffered disk reads: 374 MB in  3.01 seconds = 124.32 MB/sec

/dev/sdj:
Timing cached reads:   23530 MB in  1.99 seconds = 11798.51 MB/sec
Timing buffered disk reads: 492 MB in  3.01 seconds = 163.55 MB/sec

/dev/sdk:
Timing cached reads:   23304 MB in  2.02 seconds = 11534.92 MB/sec
Timing buffered disk reads: 304 MB in  3.01 seconds = 101.16 MB/sec

/dev/sdn:
Timing cached reads:   23840 MB in  1.99 seconds = 11954.75 MB/sec
Timing buffered disk reads: 320 MB in  3.01 seconds = 106.16 MB/sec

/dev/sdo:
Timing cached reads:   23472 MB in  2.05 seconds = 11453.04 MB/sec
Timing buffered disk reads: 268 MB in  3.00 seconds =  89.23 MB/sec

/dev/sdq:
Timing cached reads:   24188 MB in  1.99 seconds = 12129.78 MB/sec
Timing buffered disk reads: 278 MB in  3.00 seconds =  92.51 MB/sec

/dev/sds:
Timing cached reads:   23116 MB in  2.02 seconds = 11425.23 MB/sec
Timing buffered disk reads: 358 MB in  3.00 seconds = 119.20 MB/sec

/dev/sdt:
Timing cached reads:   23888 MB in  1.99 seconds = 11978.41 MB/sec
Timing buffered disk reads: 290 MB in  3.01 seconds =  96.32 MB/sec
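To compare drives at a glance, output in this format can be parsed and sorted by buffered read speed. This is a minimal sketch: the function name is hypothetical, the sample contains only two of the entries above, and these figures come from roughly three-second reads, so they say little about sustained speed across a whole disk.

```python
import re


def parse_hdparm(text):
    """Return {device: buffered_read_mb_s} from hdparm timing output."""
    results = {}
    device = None
    for line in text.splitlines():
        line = line.strip()
        # A device header looks like "/dev/sdb:".
        header = re.match(r"^(/dev/\w+):$", line)
        if header:
            device = header.group(1)
            continue
        # Only the buffered (uncached) figure reflects the disk itself.
        rate = re.search(r"buffered disk reads:.*=\s*([\d.]+)\s*MB/sec", line)
        if rate and device:
            results[device] = float(rate.group(1))
    return results


sample = """\
/dev/sdb:
Timing cached reads:   20994 MB in  1.99 seconds = 10524.03 MB/sec
Timing buffered disk reads: 276 MB in  3.01 seconds =  91.56 MB/sec

/dev/sdd:
Timing cached reads:   21290 MB in  1.99 seconds = 10672.55 MB/sec
Timing buffered disk reads: 454 MB in  3.01 seconds = 150.89 MB/sec
"""

speeds = parse_hdparm(sample)
# Slowest first, since the slowest active drive bounds the check speed.
for dev, mb_s in sorted(speeds.items(), key=lambda item: item[1]):
    print(f"{dev}: {mb_s:.2f} MB/s")
```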

EDIT: My most recent parity check finished this morning at 17 hours, 17 minutes with an average speed of 64.3 MB/s (according to Simple Features). Also, I ran SMART tests and reports on all my drives a few weeks ago and everything came back just fine. The only drives I have added are the 3TB and 4TB, and I've only removed my slowest drives (all showed ~75 MB/s with hdparm -t).
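As a sanity check on the reported average, the figure is consistent with dividing the 4TB parity size by the elapsed time. The sketch below assumes decimal units (1 TB = 1,000,000 MB) and that the average is computed this way; the function name is hypothetical.

```python
def average_mb_s(size_tb, hours, minutes):
    """Average throughput in MB/s for reading size_tb terabytes (decimal units)."""
    seconds = hours * 3600 + minutes * 60
    return size_tb * 1_000_000 / seconds


# 4 TB parity drive, 17 hours 17 minutes elapsed.
print(round(average_mb_s(4, 17, 17), 1))  # 64.3
```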

June 22, 2013

You may need to re-arrange the drives. How they are wired up could have some effect on it.

Consider the board first.

2x (x8) PCI-E 3.0*** slots,
2x (x4) PCI-E 2.0 in x8 slots

Now consider the cards.

3 - IBM M1015 flashed to IT mode (all drives connected to these)

So one of these cards has to be in the x4 slot.

Where your parity drive is could have an impact.

Make sure the parity drive is on one of the x8 slots.

Make sure the fastest drives are on the x8 slots.

In my experience the EARX and EADS are a little slow

but the WD10EADS would be slower.

I had the older WD10EACS and it ran really slow.

The HD103SJ is a pretty fast drive.

I suppose what I might do is put the slower drives on the x4 slot, not to exceed the 4 drives there if possible.

What could also help is isolating the parity drive on its own card.

If you wanted to go that far. It worked for me and my bus architecture wasn't nearly as good as the X9SCM boards.

I.E. all data drives on the 2 x8 cards and just the parity and cache drive on the x4 card.

It's up to you. These are just ideas of where I would look.

Also tuning the unRAID driver with md_num_stripes and md_sync_window could help.

Many people have achieved higher rates just by changing those numbers alone.

Start there!

June 22, 2013

You have 13 drives ... so you only need 2 of the M1015's

I'd be sure they're both plugged in to the x8 slots, and connect all of the drives to them.

Then retry the parity test.

June 23, 2013

Author

I just recently removed five drives from the array. So far I've only unassigned them, I haven't physically removed them from the server yet. I plan to rearrange them so they are all on the cards in the 8x slots when I remove the unassigned drives.

The reason for my post wasn't so much about how to make them better (though I certainly will try), but more about why the weird change with essentially no hardware changes (other than adding 2 faster drives and removing 5 slower ones). What could cause me to start getting such consistently slow speeds for the first 2TB, and then really good speeds after? I would have thought speeds would only increase.

WeeboTech - thanks for all the tips to potentially speed things up. I will certainly try those.

June 23, 2013

The reasons for the much faster speed after the 2TB point are obvious ... as I noted earlier (and you clearly understand).

But I agree the slower speeds up to that point are VERY strange ... as I indicated before, it certainly shouldn't take longer to get to the 2TB point than it did before !!

One possibility is that the unassigned drives are somehow impacting the interface speeds to those in the array. I can't imagine just how this could happen ... but I'm not sure just what impact the ESXi virtualization may have on this. Try physically disconnecting those drives, and see if that makes a difference.

June 23, 2013

Author

I'll try that. I'm also curious to try rc15a, as I've read some have experienced improved write and parity speeds. Unfortunately, I am waiting for a compatible version of VMware Tools to keep my server protected by UPS.

June 23, 2013

You can download RC15a and add it as an option in your syslinux.cfg and test it out for speed.

FWIW, where you have drives connected and to what bus can matter.

Look at the block diagram in the manual

http://www.supermicro.com/manuals/motherboard/C202_C204/MNL-1270.pdf

2 of the PCIe x8 slots and 1 PCIe x4 slot are direct to the CPU

The other X4 is connected to the Cougar Point controller by an X4 path.

Without knowing what drives are connected to where, and where the two new fastest drives are on your bus configuration, it's hard to estimate.

The fastest drives easily get 150-190MB/s on the outer tracks. Who knows if they are being choked by other drives.
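Whether a slot could choke its drives can be estimated on paper. The sketch below assumes the M1015 negotiates a PCIe 2.0 link (about 500 MB/s per lane after 8b/10b encoding), that one card carries eight drives, and a worst case of every drive reading at 190 MB/s at once. The function name is hypothetical, and real-world throughput is lower than these theoretical link rates.

```python
PCIE2_MB_S_PER_LANE = 500  # PCIe 2.0: 5 GT/s per lane with 8b/10b encoding


def slot_headroom(lanes, drive_speeds_mb_s):
    """Return (slot_bandwidth, total_demand, fits) for one controller card."""
    bandwidth = lanes * PCIE2_MB_S_PER_LANE
    demand = sum(drive_speeds_mb_s)
    return bandwidth, demand, demand <= bandwidth


# Assumed worst case: eight drives on one card, all reading at 190 MB/s.
for lanes in (4, 8):
    bandwidth, demand, fits = slot_headroom(lanes, [190] * 8)
    status = "ok" if fits else "saturated"
    print(f"x{lanes}: {demand} MB/s needed of {bandwidth} MB/s -> {status}")
```

On these assumed numbers even an x4 link has headroom, though the chipset-attached slot shares its uplink with other devices, so the placement advice can still matter in practice.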

The additional buffering provided by expanding the md_num_stripes and md_sync_window could come into play here.

June 23, 2013

Author

That's some good info on the mobo. I didn't realize that one of the 4x slots was different. I'll have to pull my server out and check that while I am removing the unused drives and making sure the existing ones are all connected to cards on the 8x slots.

How would I go about adding RC15a to my syslinux.cfg? Would I have it installed alongside my RC10?

June 23, 2013

Post a syslog. A bad or loose cable could slow things down. There should be some indication in the syslog.

June 23, 2013

Author

Here it is, but it's truncated for some reason. The:

Jun 21 04:40:01 WOPR syslogd 1.4.1: restart.

...is not when I last rebooted or restarted my unRAID vm.

syslog-2013-06-22.txt

June 23, 2013

Look in /var/log/ for the rest of the log. If the log is being rotated there is something going on.

June 23, 2013

Author

"Look in /var/log/ for the rest of the log. If the log is being rotated there is something going on."

That log is the same.

June 23, 2013

"Look in /var/log/ for the rest of the log. If the log is being rotated there is something going on."

"That log is the same."

Look for syslog.1 or syslog.2 in /var/log.
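Collecting the current log together with any rotated copies can be sketched as below. The function name is hypothetical and the numbering scheme (syslog.1 newer than syslog.2) is the usual rotation convention rather than something confirmed for this server. The demo runs on a throwaway directory; on the server you would pass "/var/log".

```python
import tempfile
from pathlib import Path


def find_syslogs(log_dir):
    """Return syslog plus rotated copies (syslog.1, syslog.2, ...), oldest first."""
    paths = [p for p in Path(log_dir).glob("syslog*") if p.is_file()]

    def age(path):
        # "syslog" -> 0, "syslog.1" -> 1, "syslog.2" -> 2 (higher = older).
        suffix = path.name[len("syslog"):].lstrip(".")
        return int(suffix) if suffix.isdigit() else 0

    return sorted(paths, key=age, reverse=True)


# Demo with empty placeholder files standing in for real logs.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("syslog", "syslog.1", "syslog.2"):
        Path(tmp, name).write_text("")
    ordered = [p.name for p in find_syslogs(tmp)]

print(ordered)
```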

June 25, 2013

Author

I opened up my server and made sure that all my unRAID drives are plugged in to two of the M1015's which are both in the 8x slots on the mobo. I then removed the unused drives so that only the 14 I am using in unRAID are actually plugged in and ran another parity check.

This time it was 19 hours 4 minutes at 58.3 MB/s as reported by Simple Features.

So it went even SLOWER this time. I wasn't able to watch this one as closely, but it appeared to run similarly: 25-35 MB/s for the first 50%, and then it got much faster at 90-115 MB/s. One thing that struck me as odd, though, was that when I checked on it a couple of times between 80-90% finished, every drive was spun down except for parity. I thought unRAID was always reading from one of the data drives while doing a parity check. I can't imagine what the problem could be. I've attached a log of the last run.

syslog-2013-06-24.txt

June 25, 2013

Simple Features will cause very slow parity check speeds.

June 25, 2013

"I thought unRAID was always reading from one of the data drives while doing a parity check."

Not if the parity drive is larger than all the other drives. So with a 4TB parity drive, but no 4TB drives, there will be a long stretch at the end where all the parity check is doing is confirming the parity drive values are correct. (the last TB in your case, since you have a 3TB data drive)
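The stretches of a check and the number of data drives read in each can be worked out from the drive sizes alone. This is a small sketch with a hypothetical function name and a simplified, made-up set of data drives (one 3TB, two 2TB, one 1TB) behind a 4TB parity drive.

```python
def drives_read_per_region(parity_tb, data_tbs):
    """List (start_tb, end_tb, data_drives_read) for each stretch of a parity check."""
    regions = []
    start = 0.0
    # Each distinct drive size is a point where some drives drop out.
    for end in sorted(set(data_tbs) | {parity_tb}):
        still_read = sum(1 for size in data_tbs if size >= end)
        regions.append((start, end, still_read))
        start = end
    return regions


# Simplified example: 4 TB parity with data drives of 3, 2, 2 and 1 TB.
for start, end, count in drives_read_per_region(4, [3, 2, 2, 1]):
    print(f"{start:g}-{end:g} TB: parity + {count} data drive(s)")
```

The final region has zero data drives, which matches seeing only the parity drive spun up near the end of the check.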

June 25, 2013

Just for grins, boot to the new "Safe Mode" and run a parity check.

[You'll need to upgrade to RC15a if you didn't do that yet]
