[Partially SOLVED] Is there an effort to solve the SAS2LP issue? (Tom Question)


TODDLT


Yes, it is a Celeron 450.

Is there no other way than switching the CPU?

 

I believe if you are seeing 100% CPU usage it’s your only option. It doesn’t have to be an expensive CPU; any dual core close to 2 GHz or above should be enough.

 

The CPU is not limiting I/O operation. I have a Celeron G1840 and it handles it just fine. There is an underlying issue with reads during a parity check.

First off guys, I want to say that I'm so happy there is finally a single thread that seems to be gaining traction with discussing this issue.  Several threads have mentioned it before, but this one seems to be on a good track. 

 

I have an old, slow processor: a single-core Celeron 440 purchased circa 2008. In unRAID 6.0 b14, processor utilization hovers roughly between 45 and 50% during a parity check. In each version of unRAID I've tried after 6.0 b14, CPU utilization gets pegged at 100% during the parity check, and it runs at roughly a third of its normal speed.

 

A used/refurbished dual-core processor that should work in my motherboard could be had for $20 or less, and I was seriously thinking about trying that before seeing this thread. I really need to save up the cash for a complete system upgrade, and I would prefer not to mess around with upgrading the processor in my 7.5-year-old server if I don't have to.

 

Also, I don't know if it makes a difference or not, but I have a SASLP, not a SAS2LP. My motherboard is an Intel BOXDP35DPM. I have no plugins (anymore) or dockers on my system, just base unRAID.

Link to comment

You appear to have the same single-core Celeron issue. Maybe Tom can say what changed in beta 15, but you can easily fix your problem by upgrading to a dual-core CPU. I once used a dual-core Celeron E1200 1.6 GHz and it was enough, at least for 8 disks.

 

If you change the CPU and your SASLP is fully loaded, it’s going to limit your parity check speed to about 75-80 MB/s.
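As a rough sanity check on that figure, here is a small Python sketch of the arithmetic. It assumes the SASLP is an 8-port card on a PCIe 1.x x4 link; neither detail is stated in this thread, so treat `LANES` and `PORTS` as assumptions. The function names are made up for illustration.

```python
# PCIe 1.x runs at 2.5 GT/s per lane with 8b/10b encoding,
# which works out to 250 MB/s per lane in each direction.
PCIE1_MB_PER_LANE = 250

LANES = 4  # assumption: the SASLP sits on a x4 link
PORTS = 8  # assumption: "fully loaded" means 8 disks on the card


def per_disk_ceiling(lanes, ports, mb_per_lane=PCIE1_MB_PER_LANE):
    """Theoretical MB/s per disk when all ports read at the same time."""
    return lanes * mb_per_lane / ports


def implied_aggregate(per_disk_mb, ports):
    """Total MB/s crossing the card for a given per-disk speed."""
    return per_disk_mb * ports


if __name__ == "__main__":
    print("theoretical per-disk ceiling:", per_disk_ceiling(LANES, PORTS), "MB/s")
    for speed in (75, 80):
        print(speed, "MB/s per disk implies", implied_aggregate(speed, PORTS), "MB/s total")
```

Under those assumptions the link tops out at 125 MB/s per disk on paper, and the reported 75-80 MB/s corresponds to 600-640 MB/s moving through the card, so a bus-bound limit in that range is at least plausible once protocol overhead is taken into account.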

 

Link to comment

Johnnie, I think you missed something important in that post.

 

On 6.0 b14 the Celeron 440 was at 45-50% during parity checks, and on later versions it was at 100%, meaning something in software changed (driver, kernel, unRAID code, who knows).

 

This shows the CPU is capable of handling the I/O without issue.

 

Case in point: during boot, the kernel runs several algorithm checks on the processor to determine which method to use for xor calculations.

 

My celeron G1840:

 

Jun 28 09:27:39 Tower kernel: raid6: sse2x1    8511 MB/s
Jun 28 09:27:39 Tower kernel: raid6: sse2x2   11046 MB/s
Jun 28 09:27:39 Tower kernel: raid6: sse2x4   13101 MB/s
Jun 28 09:27:39 Tower kernel: raid6: using algorithm sse2x4 (13101 MB/s)
Jun 28 09:27:39 Tower kernel: raid6: using ssse3x2 recovery algorithm
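For anyone who wants to compare their own boot log, here is a minimal Python sketch that pulls the benchmark figures out of syslog lines like the ones above. The function name `parse_raid6_benchmarks` is made up for illustration; the line format is taken from the log quoted in this post, and other kernels may print it differently.

```python
import re

# Matches benchmark lines such as
#   "... kernel: raid6: sse2x4   13101 MB/s"
# but not the "using algorithm ..." summary lines.
BENCH_RE = re.compile(r"raid6:\s+(\S+)\s+(\d+)\s+MB/s")

SAMPLE = """\
Jun 28 09:27:39 Tower kernel: raid6: sse2x1    8511 MB/s
Jun 28 09:27:39 Tower kernel: raid6: sse2x2   11046 MB/s
Jun 28 09:27:39 Tower kernel: raid6: sse2x4   13101 MB/s
Jun 28 09:27:39 Tower kernel: raid6: using algorithm sse2x4 (13101 MB/s)
Jun 28 09:27:39 Tower kernel: raid6: using ssse3x2 recovery algorithm
"""


def parse_raid6_benchmarks(log_text):
    """Return {algorithm: MB/s} for every raid6 benchmark line found."""
    results = {}
    for line in log_text.splitlines():
        match = BENCH_RE.search(line)
        if match:
            results[match.group(1)] = int(match.group(2))
    return results


if __name__ == "__main__":
    benchmarks = parse_raid6_benchmarks(SAMPLE)
    for name, speed in sorted(benchmarks.items(), key=lambda item: item[1]):
        print(name, speed, "MB/s")
```

These are synthetic in-memory benchmark numbers, not parity check throughput, so they only give an upper bound on what the CPU can do for the calculation itself.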

 

 

Link to comment

Only Tom or someone else from Limetech can say what changed in beta 15 and if it can be changed back, but for V6 or V6.1 final anyone with a single core Celeron will be highly limited during a parity check.

 

Maybe what changed in beta 15 is also causing the slow check with the SAS2LP. Can 6.0 beta 14 still be downloaded from anywhere so I can do a quick test?

Link to comment

Going from memory...

 

The preemptible Linux kernel was introduced then.

That was to fix kernel panic crashes in typical circumstances like parity checks.

Link to comment

Too frequent polling of the smart attributes could be a cause.

Frequent emhttp page refreshes could also be a cause. I disable the page updating frequency.

 

For Tunable (poll_attributes) on the Disk Settings page, I have 1800.

 

Perhaps upload a diagnostic zip archive for someone to review the logs and settings.

Link to comment

I think you are onto something, from beta 15 release notes:

 

* Preemptible kernel.  This should solve the 'RCU Timeout' errors seen by some h/w configurations.  Though additional overhead comes with a preemptible kernel, the general "response" will be much smoother, especially within VM's.  Some users may experience a slight decrease in parity sync/check times, let us know your results.

 

I sent an email to support asking for a link to b14 and b15 so I can try it on my test server, will report back if I get them.

Link to comment

Johnnie, send me a PM with your email. I will post both to my onedrive.

Link to comment

Thanks to Mr-hexen for sending me b14 and b15.

 

The preemptible kernel may be the cause of the single-core Celeron slowdown, but it is not the cause of the SAS2LP slowdown.

 

I did, however, find v6 beta 1, 2 and 3 on the net, and in all three the SAS2LP is back to normal speed, so it was a change between beta 3 and beta 14. I can’t find those versions, so if Mr-Hexen or anyone else can send me a link to the betas in between, so I can test and pinpoint the version where the issue begins, I will be very grateful.
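With beta 3 known good and beta 14 known bad, the search does not need every beta in between: a bisection only needs a handful of them. A minimal Python sketch of that idea follows. The `is_bad` callback is a stand-in for "boot that beta on the test server and run a parity check", and the predicate in the example is made up purely to show the mechanics. It also assumes the slowdown, once introduced, stays in every later beta.

```python
def first_bad(versions, is_bad):
    """Find the first bad version by bisection.

    `versions` must be sorted oldest to newest, with versions[0] already
    known good and versions[-1] already known bad. `is_bad(version)` is
    called only for versions in between.
    """
    lo, hi = 0, len(versions) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid
        else:
            lo = mid
    return versions[hi]


if __name__ == "__main__":
    betas = list(range(3, 15))  # beta 3 (good) through beta 14 (bad)

    def hypothetical_test(beta):
        # Made-up outcome for illustration only: pretend the slowdown
        # shows up from beta 6 onwards.
        print("testing beta", beta)
        return beta >= 6

    print("first bad beta:", first_bad(betas, hypothetical_test))
```

For the ten untested betas between 3 and 14 this needs at most four test boots, which matters when the intermediate versions are hard to find.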

 

tBpD3QJ.jpg UtbxTBB.jpg

Link to comment

I've been a little distracted for a couple days but coming back, I'm happy to see so much discussion.

 

- I have a 2 core processor (see signature configuration).  Not a big one, but a 2 core.

- I reported that during a parity check my CPU was ramping up and down, not running smoothly and was at a much higher usage than I previously remembered seeing during a parity check.  However, check out the two attached graphs which were snapshots with the server at rest.  I had noticed on the dashboard the CPU was ramping up to 60% and back down pretty routinely.  Something is happening in the background.  Also it simultaneously affects network usage.  I'm certain, the only thing that could be accessing the server is my win 7 machine and it's not doing anything intentionally, but there are drives mapped from here to there. 

 

1. Could whatever is running in the background and taxing the CPU affect a parity check by clogging up the CPU? Yes/no?

2. (This will show my lack of knowledge of the inner workings of MB architecture.) Would a PCIe 2.0 x8 card have a higher overhead on the system compared to a PCIe x4 card with the same throughput? Would this overhead hit the CPU? Just a thought; it could be a combined issue.

 

The first two images below are while the server is at rest. The last one is during a parity check.

 

I may have some time tomorrow and will roll back to V5 and see what all this looks like.  If I should check an earlier version of V6 let me know.

2015-09-05_CPU_Capture.JPG

2015-09-05_network_Capture.JPG

during_parity_check.JPG

Link to comment

1. Could whatever is running in the background and taxing the CPU affect a parity check by clogging up the CPU? Yes/no?

 

I remember having background CPU spikes like those when I turned “Scan user shares” to Yes on the Cache Dirs plugin, are you using it?

 

2. (This will show my lack of knowledge of the inner workings of MB architecture.) Would a PCIe 2.0 x8 card have a higher overhead on the system compared to a PCIe x4 card with the same throughput? Would this overhead hit the CPU? Just a thought; it could be a combined issue.

 

With everything running normally, for the same system, higher parity check speed = higher CPU utilization; as long as it is not pinned at 100%, it’s normal.

See the attachment below: you can see the correlation between CPU load and parity check speed as the speed goes down towards the drives’ inner tracks.

 

 

If I should check an earlier version of V6 let me know.

 

You can try beta 14 and 15 with the SASLP (if you still have it) and check if there’s any difference with CPU utilization, I suspect that with the SAS2LP it will be similar with both betas because the card is limiting more than the CPU. PM me if you want them.

sc_2.png

Link to comment

Be careful with the beta 5-8 range, as somewhere in there was a nasty silent ReiserFS filesystem corruption bug. Be sure to check out the announcement threads to see which exact range(s) you want to avoid.

 

unRAID6-beta7/8 POSSIBLE DATA CORRUPTION ISSUE: PLEASE READ.

http://lime-technology.com/forum/index.php?topic=35161.msg327070#msg327070

 

This bug was silent and destructive. People had difficulty converting to XFS or even using the array if metadata was corrupt.

Avoid these versions at all costs.

 

Thanks for the warning, I remember there was that problem with some of the earlier betas but I will be using them only on my test server with no data.

 

I emailed support to ask for betas 4 to 13, while I wait for a response if someone has any of these versions and can send me a link I’ll be very grateful.

 

Link to comment

Update:

 

Found v6 beta6, thanks google!

 

The SAS2LP slowdown is present, so the issue was introduced in beta 4, 5 or 6. Looking at the release notes of these betas, this stands out:

 

 

 

Summary of changes from 6.0-beta5a to 6.0-beta6

-----------------------------------------------

  CONFIG_SCSI_MVSAS_TASKLET: Support for interrupt tasklet (improves mvsas performance)

 

 

Could it be that instead of improving performance this is causing the issue with the SAS2LP, at least for some users?

 

I can’t find beta 5. Is there any way to disable this option to confirm?

 

Link to comment

I don't know if there is a runtime setting to override this.

But that setting is baked into the kernel.
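Since it is a build-time option, one way to see how a given kernel was built is to look at its config file. Below is a minimal Python sketch that checks an option in kernel config text. The helper name `kernel_option_state` and the sample text are made up for illustration; the sample is not from a real unRAID build. Reading `/proc/config.gz` only works if the kernel was built with `CONFIG_IKCONFIG_PROC`, and whether unRAID's kernel has that is an assumption here.

```python
def kernel_option_state(config_text, option):
    """Return the value of `option` ('y', 'm', ...) or None if not set."""
    prefix = option + "="
    not_set = "# " + option + " is not set"
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            return line[len(prefix):]
        if line == not_set:
            return None
    return None


# Illustrative config fragment, not taken from an actual unRAID kernel.
SAMPLE_CONFIG = """\
CONFIG_SCSI_MVSAS=y
# CONFIG_SCSI_MVSAS_DEBUG is not set
CONFIG_SCSI_MVSAS_TASKLET=y
"""

if __name__ == "__main__":
    # On a live system the text could come from, for example:
    #   import gzip
    #   config_text = gzip.open("/proc/config.gz", "rt").read()
    # (only present when the kernel exposes its config there).
    state = kernel_option_state(SAMPLE_CONFIG, "CONFIG_SCSI_MVSAS_TASKLET")
    print("CONFIG_SCSI_MVSAS_TASKLET:", state)
```

This only reports how the kernel was built; changing the option still means rebuilding the kernel with it turned off.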

Link to comment

I'm about to publish a patch release: 6.1.1 - what do you think?  Should I remove this option in this release?

Link to comment

Thanks for your report.  I modified my previous post to use '1b4b' instead - that's the proper vendor-id for that card.

 

Also, in your report notice the

 

Subsystem: Marvell Technology Group Ltd. Device 9480

 

That's what we're looking for.  9480 is the subdevice id of the 'older' cards.  The value 9485 is the 'new' subdevice id.  Maybe those are the ones with this issue?  Let's see...
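To make collecting those reports easier, here is a minimal Python sketch that extracts the subdevice id from a `Subsystem:` line like the one quoted above and labels it using the 9480/9485 distinction from this post. The function names are made up for illustration, and the regular expression assumes the line looks like the quoted example; other lspci output formats may differ.

```python
import re

# Matches lines such as:
#   "Subsystem: Marvell Technology Group Ltd. Device 9480"
SUBSYS_RE = re.compile(r"Subsystem:\s+(.*?)\s+Device\s+([0-9a-fA-F]{4})\b")

# Labels as described in this thread, nothing more.
KNOWN_SUBDEVICES = {"9480": "older card", "9485": "newer card"}


def subsystem_device_id(lspci_text):
    """Return the 4-hex-digit subdevice id from lspci text, or None."""
    match = SUBSYS_RE.search(lspci_text)
    return match.group(2).lower() if match else None


def describe_card(lspci_text):
    """Return a short label for the card based on its subdevice id."""
    device_id = subsystem_device_id(lspci_text)
    if device_id is None:
        return "no Subsystem line found"
    return KNOWN_SUBDEVICES.get(device_id, "unknown subdevice " + device_id)


if __name__ == "__main__":
    report = "Subsystem: Marvell Technology Group Ltd. Device 9480"
    print(subsystem_device_id(report), "->", describe_card(report))
```

Anyone posting a report could run their own `Subsystem:` line through this to see which group their card falls into.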

Now that's interesting!  We recently had a user (3blackdots) with a 9485 that had the Marvell bug (no drives seen), post is here with my bug confirmation after it.  A few posts up, user opentoe had a 9480 where the drives showed up, but parity checks were very slow, summary post is here.

 

This seems like 2 strikes against Marvell now.  Would it be useful to get them involved?  Their reputation is at stake here, going to 'strike out' with unRAID users, if they can't give us a correction/patch or configuration change.

 

Ironic. Without even seeing these posts yet, I started a parity check earlier. I stopped it since it was going terribly slow: 60 MB/sec or less. It should be in the 100+ range on my system. I have two SAS2 cards, with all drives on both cards, and I'm not using any of the mainboard's SATA connections. Right now I'm running that tunable script, so I don't want to touch the server. Both cards are attached to Norco 4224 backplanes. I would even buy a couple of cards from another brand, but if I do that I would want to get compatible, well-performing cards. Even for test purposes I would do it, but I need to find the cards available somewhere online.

 

Link to comment
