File sharing services become unresponsive at end of parity check


klamath


Howdy,

 

I have been battling this for a while.  Over the holidays I accidentally fubared my super.dat while rolling back from 6.3 to 6.2 (don't computer before coffee).  I was in the process of upgrading parity and a few data disks to new HE8 drives.  After getting everything resettled and working, I started to notice weird link speeds with the HP SAS expander: randomly it would show link speeds of 1.5 on some drives.  After replacing the SAS expander with a new Intel one, all drive speeds now report correctly.  This last week I added another Intel SAS expander and replaced all SAS cables and the 9211-8i card, and my speeds doubled.  So far so good.

When running parity checks I average 81MB/s with 28 drives.  Everything works, but I'm seeing the unraidd process using 100% of CPU0, seemingly all in system wait.  However, when the parity check crosses over to the 8TB drives, NFS and SMB stop responding, and NFS clients report timeouts waiting for the server to respond.  I have been trying to figure out why the 8TBs are causing issues.

 

-All interrupts for 9211 are on CPU0, I changed affinity of that IRQ to allow scheduling on all cores, doesn't seem to help according to /proc/interrupts.

-IRQ issues?  It seems that from lspci -v both my USB controller/9211 and 10gb all share IRQ 10, however /proc/interrupts shows them on different interrupts.

-Same issues seen in 6.2 and 6.4 release.

-nr_requests set at default 128, however increasing to 512 seems to have a speed increase.  The Max queue depth for SAS 2008 is 3200, should those match per port?
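
For anyone who wants to check how a controller's interrupts are actually spread across cores, here is a minimal Python sketch that tallies per-CPU counts from text in the /proc/interrupts format. The function name `irq_counts`, the sample text, and the driver string `mpt2sas` are assumptions for illustration only; use whatever name your HBA shows in the last column.

```python
# Sample text in the /proc/interrupts layout (made-up numbers, for illustration).
SAMPLE = """
           CPU0       CPU1       CPU2       CPU3
 16:     900000          0          0          0   IO-APIC  16-fasteoi   mpt2sas0
 24:       1000       1200        900       1100   PCI-MSI 524288-edge   eth0
"""


def irq_counts(interrupts_text, needle):
    """Return {irq: {cpu: count}} for every row whose text contains `needle`."""
    lines = interrupts_text.strip().splitlines()
    cpus = lines[0].split()  # header row: CPU0 CPU1 ...
    result = {}
    for line in lines[1:]:
        if needle not in line:
            continue
        fields = line.split()
        irq = fields[0].rstrip(":")
        # One counter per CPU follows the IRQ label.
        counts = [int(value) for value in fields[1:1 + len(cpus)]]
        result[irq] = dict(zip(cpus, counts))
    return result


# Hypothetical driver name; substitute the one your system reports.
sas_irqs = irq_counts(SAMPLE, "mpt2sas")
```

On a live system, pass the contents of /proc/interrupts instead of `SAMPLE`, take two readings a few seconds apart, and compare which columns are growing. The counters are cumulative since boot, so counts accumulated before an affinity change stay in the totals.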

 


Drives all check out, no pending sectors or other indicators of an issue, nothing in syslog showing up during these slow downs. 

 

Running out of ideas at this point as to why unraidd is taking up so much CPU during a parity check and if that has anything to do with NFS going unresponsive.  Once parity check is canceled all file sharing services start responding as normal.

 

Any help would be appreciated,

 

Tim

 

orion-diagnostics-20180128-1331.zip


High CPU usage with dual parity on larger arrays is normal, much more so than with single parity. You're using an older CPU with low single-thread performance. I did some tests a while ago, and for the CPU not to be a bottleneck on such a large array you need a single-thread rating of around 2000 PassMark points. Whether that is the reason NFS goes unresponsive I'm not sure, but I suspect it is.

2 minutes ago, klamath said:

Using a Xeon 5560, not sure of the PassMark score but googling shows around 5400

1357 single threaded.

 

2 minutes ago, klamath said:

Is there a reason why everything is responsive when the check is running with 28 drives, but things become unresponsive when checking the 8TB drives?

None that I can think of. With dual parity the CPU usage should remain about the same all the way through, since parity2 is still calculated as if all the disks are there; it's just zeros for the smaller ones.
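
To illustrate the point about zeros, here is a minimal Python sketch of a RAID-6 style P/Q calculation for a single byte offset. It assumes parity2 is a Q syndrome over GF(2^8) with polynomial 0x11D, which this thread does not confirm; `gf_mul2` and `pq_for_offset` are made-up names and this is not Unraid's code.

```python
def gf_mul2(x):
    """Multiply a byte by 2 in GF(2^8), reducing with polynomial 0x11D."""
    x <<= 1
    if x & 0x100:
        x ^= 0x11D
    return x


def pq_for_offset(slot_bytes):
    """Compute (P, Q, steps) for one byte offset across all data slots.

    slot_bytes[i] is the byte from data slot i; a slot whose disk has
    already ended contributes 0. Q is evaluated with Horner's rule, so
    Q = sum of 2^i * slot_bytes[i] in GF(2^8).
    """
    p = 0
    q = 0
    steps = 0
    for byte in reversed(slot_bytes):
        p ^= byte                # P parity: plain XOR
        q = gf_mul2(q) ^ byte    # Q parity: multiply-accumulate
        steps += 1
    return p, q, steps


# Same number of steps whether or not the other slots still hold data.
with_data = pq_for_offset([5, 7, 9, 11])
mostly_zero = pq_for_offset([5, 0, 0, 0])
```

In this sketch the loop does one step per slot for every offset, so the arithmetic cost does not drop when most slots are zero. Only the number of disks actually being read goes down.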


So thinking out loud: if the CPU were the issue because of the number of drives, NFS should stop responding when checking all 28 drives, not just the 5 HE8 drives.  So I'm thinking the tunable testing script might not be factoring the entire run into its recommendation.  Speeds jump to 100+MB/s once the system only checks the HE8 drives.  Think returning the values to default will help?  The only system that looks like it would work is the Dell T130; it fits the price point well and gives good CPU numbers.

 

Tim

  • 2 weeks later...

@klamath @johnnie.black I'm also running into this issue. I've got 22 drives in my array, most are 4-6TB, 1x8TB, and dual 8TBs in parity.

 

Your last message is confusing, did you make a change that made a difference? My NFS file access is basically *useless* when running a parity check. I seem to be capped at 75MB/s throughout the whole check as well, so it's slowwwww. My CPU never goes above 40% either.

 

Also, this is on 6.4.1, so the UI is still responsive during checks, just NFS access is atrocious.

 

Thanks

Edited by Drewster727

I converted my Norco case into a JBOD; I had a Supermicro motherboard with an X5570 CPU inside the Norco case to begin with.  I just bought a Dell T130 with a Xeon 1270 v5 and went from 65MB/s parity check speeds to 100-150MB/s with the default tunables, no modifications to the config at all.  The system did a parity check start to finish without any hiccups on the network with NFS: no client reported any timeouts, and Plex (a major subscriber to NFS) didn't register any issues during the parity check.  @Drewster727 In my case I installed nmon and saw that the unraidd process logged most of its time in System Wait during a parity check. Another interesting difference between the systems is that my SAS card's interrupts are now spread across all cores, whereas the old system had all interrupts assigned to core 0.
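
If you don't have nmon installed, a rough equivalent is to compare two samples of a CPU line from /proc/stat. Below is a minimal Python sketch, assuming "System Wait" corresponds to the iowait column; the function names and the two sample lines are made up for illustration.

```python
# Two made-up samples of the same /proc/stat CPU line, taken some seconds apart.
BEFORE = "cpu0 100 0 50 800 50 0 0 0 0 0"
AFTER = "cpu0 110 0 60 820 210 0 0 0 0 0"


def parse_cpu_line(line):
    """Split a /proc/stat 'cpuN ...' line into (name, list of tick counters)."""
    parts = line.split()
    return parts[0], [int(value) for value in parts[1:]]


def iowait_percent(before, after):
    """Percentage of elapsed CPU ticks spent in iowait between two samples."""
    name_before, ticks_before = parse_cpu_line(before)
    name_after, ticks_after = parse_cpu_line(after)
    if name_before != name_after:
        raise ValueError("samples are for different CPUs")
    # Fields: user nice system idle iowait irq softirq steal [guest guest_nice].
    # Guest time is already counted inside user, so only the first eight are summed.
    deltas = [a - b for b, a in zip(ticks_before[:8], ticks_after[:8])]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total


wait_share = iowait_percent(BEFORE, AFTER)
```

On a real box, read the `cpu0` line from /proc/stat twice during a parity check and pass both lines in. A high value on one core alongside low values on the others would match what nmon showed on the old system.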

 

Hope this helps a little bit!

 

Tim  

5 hours ago, Drewster727 said:

@klamath @johnnie.black I'm also running into this issue. I've got 22 drives in my array, most are 4-6TB, 1x8TB, and dual 8TBs in parity.

 

Your last message is confusing, did you make a change that made a difference? My NFS file access is basically *useless* when running a parity check. I seem to be capped at 75MB/s throughout the whole check as well, so it's slowwwww. My CPU never goes above 40% either.

 

Also, this is on 6.4.1, so the UI is still responsive during checks, just NFS access is atrocious.

 

Thanks

 

We'd need your diagnostics to have any opinion on the problem, but it's likely a CPU that isn't powerful enough.
