TODDLT

[Partially SOLVED] Is there an effort to solve the SAS2LP issue? (Tom Question)

I left this open.  While my speed issue is resolved, others are still unsure about the redball issue during the parity check.

 

I will apologize right off, because I know there are a couple of threads talking about this issue.  I am not trying to turn this into another one, but at least one person suggested a solution is not even on the drawing board.  My intent in asking is to decide whether I should just sell the card on eBay and not worry about it, or hang on for a bit and wait.

 

I actually just got the SAS2LP card to replace my older SASLP.  After swapping, my parity checks went from 75-80 down to 35-40 MB/sec.  This is the case for 6.01 and 6.1.  Tonight I rolled back to V5.0.5 to test and the parity check was averaging 85 MB/sec.  So with all the hardware the same, something clearly differs between the versions.

 

The only other thing I will add is that I took these charts from my V6 setup with the SAS2LP installed, and the CPU / drive speed jumps around; it's not consistent.  It was not like this with the SASLP card installed.  I wish I had kept those graphs, but I can get them in a day or so when I swap the card back.

 

Anyway, I really just want to know if it's on the radar to solve. If so, I'll wait patiently and help if I can.  If not, I don't want to hold on to a $100 card for nothing.

 

 

 

CPU.JPG.bb7499ce3c5e9f095c43901fc0875377.JPG

Storage.JPG.a8167571d63ec762ab6203e98f988c49.JPG


The stability issue does appear to affect only some people (it doesn't affect me), but I believe the parity check speed issue affects everyone with a SAS2LP, although some users much more than others.

 

I tested in 6 very different boards, including two Supermicro server boards, with 4 or 16GB of RAM and CPUs ranging from a 1.6GHz Celeron to a 4GHz i5, and parity check speed was always a lot slower compared to Unraid V5: anything from 25MB/s to a best case of about 90MB/s, most commonly between 40 and 70MB/s.

 

Below is one example from my test server; nothing changes between tests except the Unraid version.

 

2MqDic8.jpg mFyCzFF.jpg

 

 

I understand this is not a high priority issue, but would also like to know if it is being looked at, or I might consider selling my SAS2LP cards.

 

More tests and more users with same problem in this thread:

http://lime-technology.com/forum/index.php?topic=39125.msg386462#msg386462


Have you guys tried using the tunables script? I am one of those users who have 2 SAS2LP cards, and all my drives are on them (none on the MB) and have had no issues. Here is my parity check from yesterday:

 

Last checked on Tue 01 Sep 2015 05:45:14 PM EDT (yesterday), finding 0 errors.

Duration: 17 hours, 45 minutes, 13 seconds. Average speed: 93.9 MB/sec

 

Some users get higher, but I don't think this is an unreasonable speed.

 

There do seem to be a few users with issues with these cards, but there seem to be at least as many (and likely quite a few more) who run these cards without issues. This leads one to believe that the issue may not be with UnRAID per se, but something in various users' configs.


I've written this up as one of 3 separate controller compatibility issues (here), although not complete for this one.  It appears to me to be a completely different problem than the Marvell one, but I'm not an expert, and have done little research.

 

Research is what is needed, as I don't see what Tom can do here.  It looks like a driver issue, perhaps something about the 64 bit driver in the newer kernels with virtualization.  I suspect that other Linux users with similar kernels, 64 bit, are also having similar problems, and sooner or later one of them will discover a setting somewhere, or a patch needed.  Unfortunately, the specifics of the issue here (only during parity checks, 4.1 kernel) may make it hard to find elsewhere.

 

I have to add the obligatory comment here for perspective: a parity check is just a maintenance task, not totally required.  It's hard to see the rationale in replacing a card that works fine in normal operations and is slow only in a particular maintenance task, one that can be scheduled for the off-hours.  Parity check speed does not seem that important, more for bragging rights than anything else.


... This leads one to believe that the issue may not be with UnRAID per se, but something in various user's configs.

 

I don't think it's a configuration issue, as the card works fine in v5, and only has the issue in v6.  Several folks with this issue have confirmed this.    It could, of course, be only specific configurations when coupled with the v6 drivers ... but the fact it works fine with v5 certainly implies it CAN be resolved.

 

Agree, however, that it's a very difficult issue to pin down, since it's not a consistent problem among all SAS2LP users.  I'd certainly avoid the card if I was buying a new 8-port card right now ... but whether you want to sell ones you already have or not is a personal choice.    As Rob noted, it's only slow when you're using all the drives on it at once ... which is basically just during parity checks and rebuilds ... so it's not really a major issue.

 

 


I assume that everyone with the card, with or without the issue, has flashed it with the same firmware?  And same BIOS virtualization settings?

 

It would be good to accumulate a list of the exact models involved, the firmware, the virtualization settings, anything unusual in their setup, and if they have the issue or not.


Have you guys tried using the tunables script? I am one of those users who have 2 SAS2LP cards, and all my drives are on them (none on the MB) and have had no issues. Here is my parity check from yesterday:

 

Last checked on Tue 01 Sep 2015 05:45:14 PM EDT (yesterday), finding 0 errors.

Duration: 17 hours, 45 minutes, 13 seconds. Average speed: 93.9 MB/sec

 

Some users get higher, but I don't think this is an unreasonable speed.

 

There do seem to be a few users with issues with these cards, but there seem to be at least as many (and likely quite a few more) who run these cards without issues. This leads one to believe that the issue may not be with UnRAID per se, but something in various users' configs.

 

Your case is different: at the 3TB mark you lose half your drives, and at the 4TB mark you only have the two 6TB drives left, which will highly inflate your average speed. Based on my tests your starting speed is around 90-100MB/s; feel free to post a screenshot.

 

This is from a test I did on an unlimited server with drives similar to yours, where the parity check starts at around 150MB/s:

 

Duration: 15 hours, 33 minutes, 40 seconds. Average speed: 107.1 MB/sec

 

So you're still limited, although in your case it is not a big difference because of the various drive sizes.
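
The two results quoted above can be sanity-checked with a little arithmetic: average speed multiplied by duration gives the amount of data the check covered. This is only a minimal sketch, and it assumes the reported MB/sec figures use decimal megabytes (10^6 bytes), which the thread does not state; the function names are made up for illustration.

```python
def duration_seconds(hours, minutes, seconds):
    """Convert an h/m/s duration, as shown on the parity check status line, to seconds."""
    return hours * 3600 + minutes * 60 + seconds


def implied_size_tb(avg_mb_per_s, seconds):
    """Data covered by the check in TB, assuming decimal units (1 TB = 10**6 MB)."""
    return avg_mb_per_s * seconds / 1_000_000


# The two results quoted in this thread
slower = implied_size_tb(93.9, duration_seconds(17, 45, 13))
faster = implied_size_tb(107.1, duration_seconds(15, 33, 40))
print(f"{slower:.2f} TB, {faster:.2f} TB")
```

Both work out to about 6 TB, which fits a 6TB parity drive, so the two averages cover the same amount of data. An average alone still hides how the speed changes as the smaller drives drop out of the check.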

 


Fair enough, but also to RobJ's point - this is an event that happens once a month - as a maintenance task, so +/- 10MB/sec difference between friends isn't causing me a lot of sleepless nights. To be fair, if I was around 30-40MB/sec and knew the average was ~90-140MB/sec I would want to better understand the discrepancy, but I find my speed reasonable, so am not overly worried about it.

 

That being said, I am happy to contribute my settings for others to compare against, to help work out where the differences lie and find the root cause. I know one thread indicated that having a mix of MB based SATA and SAS2LP based SATA could give much worse results than having all drives on the SAS2LP cards. Since this has always been my config I can't validate it, but it may be worth trying as a test for those with issues.


 

As Rob noted, it's only slow when you're using all the drives on it at once ... which is basically just during parity checks and rebuilds ... so it's not really a major issue.

 

I have 7 drives on the card and my speed is cut in half from V5 to V6 during parity checks, which now take more than a day instead of running overnight.
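
To put "cut in half" into wall-clock terms, here is a rough sketch. The 4 TB parity size is purely hypothetical (this poster's drive sizes are not given), and the speeds are of the order reported in the opening post (about 85 MB/sec on V5, 35-40 MB/sec on V6).

```python
def check_hours(parity_tb, avg_mb_per_s):
    """Hours to read parity_tb terabytes at a steady average speed (decimal units)."""
    return parity_tb * 1_000_000 / avg_mb_per_s / 3600


PARITY_TB = 4  # hypothetical size, not taken from the thread
for label, speed in (("V5", 85), ("V6", 40)):
    print(f"{label}: {check_hours(PARITY_TB, speed):.1f} h at {speed} MB/sec")
```

Under those assumptions the check goes from roughly 13 hours to roughly 28 hours, which matches the "overnight" versus "more than a day" description.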


So, back to the question I posted earlier: have you tried the tunables script at all?

 

http://lime-technology.com/forum/index.php?topic=29009.0

 

I would highly suggest you use this to refine your parameters and see where it leaves you. Moving to v6 is also moving to 64-bit which can change things. I think dropping 50% is definitely extreme, but with a correctly tuned environment this may be substantially reduced.


I assume that everyone with the card, with or without the issue, has flashed it with the same firmware?  And same BIOS virtualization settings?

 

It would be good to accumulate a list of the exact models involved, the firmware, the virtualization settings, anything unusual in their setup, and if they have the issue or not.

 

PSU: 2 x Supermicro 1200W

MB: Dell 0YXT71

SATA: Supermicro AOC-SAS2LP-MV8 with the latest FW

Backplane: Supermicro SAS2-846EL1

RAM: 8GB

 

Card locks up only during parity check. Long story in this thread.


@Tom:

Would it help to enhance the driver logging and let people post logs with and without the issue?

In the kernel.org bugzilla I found some hints on how to do that. Could you advise on the right method for unRaid?

[...] provide the driver logs by setting the driver logging level to 0x3f8.

Here are the steps to set the mpt2sas driver logging level

a. While loading the driver

        modprobe mpt2sas logging_level=0x3f8

b. If driver is in ramdisk, then in RHEL5/SLES/OEL5 OS, following line has to be added in /etc/modprobe.conf and reboot the system

options mpt2sas logging_level=0x3f8

(Or)

Add below word at the end of kernel module parameters line in /boot/grub/menu.lst or /boot/grub/grub.conf file and reboot the system

mpt2sas.logging_level=0x3f8

c. During driver run time

        echo 0x3f8 > /sys/module/mpt2sas/parameters/logging_level

driver options according to modinfo mpt2sas:

parm:          logging_level: bits for enabling additional logging info (default=0)

parm:          max_sectors:max sectors, range 64 to 32767  default=32767 (ushort)

parm:          missing_delay: device missing delay , io missing delay (array of int)

parm:          max_lun: max lun, default=16895  (int)

parm:          diag_buffer_enable: post diag buffers (TRACE=1/SNAPSHOT=2/EXTENDED=4/default=0) (int)

parm:          prot_mask: host protection capabilities mask, def=7  (int)

parm:          max_queue_depth: max controller queue depth  (int)

parm:          max_sgl_entries: max sg entries  (int)

parm:          msix_disable: disable msix routed interrupts (default=0) (int)

parm:          max_msix_vectors: max msix vectors  (int)

parm:          mpt2sas_fwfault_debug: enable detection of firmware fault and halt firmware - (default=0)

parm:          disable_discovery: disable discovery  (int)

iirc, the exact values can be found in the LSI documentation...
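
The steps quoted above are for the mpt2sas driver, while the lspci output posted in this thread shows the SAS2LP using mvsas. Before borrowing them, it may be worth checking which parameters the loaded driver actually exposes. A minimal sketch that reads the standard sysfs layout /sys/module/<name>/parameters; whether mvsas exposes any logging parameter is not confirmed here, and the function name is illustrative.

```python
from pathlib import Path


def module_parameters(module, sysfs_root="/sys/module"):
    """Return {name: value} for a loaded module's parameters, or {} if none are exposed."""
    param_dir = Path(sysfs_root) / module / "parameters"
    if not param_dir.is_dir():
        return {}
    params = {}
    for entry in sorted(param_dir.iterdir()):
        try:
            params[entry.name] = entry.read_text().strip()
        except OSError:
            # Some parameters are write-only or otherwise unreadable
            params[entry.name] = None
    return params


if __name__ == "__main__":
    for name, value in module_parameters("mvsas").items():
        print(f"{name} = {value}")
```

An empty result means either the module is not loaded or it exposes no parameters through sysfs.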


What might be helpful is to post the output of this command:

 

lspci -vv -d 1b4b:*

 

This will list the details of the Marvell controller installed in your server.

 

There was an odd thing that happened with that card about 2 years ago.  During testing of a batch of about 10 cards, we came across one that would not be recognized by linux.  We set it aside and then encountered another one.  I think we ended up finding 3 or 4 cards that had this problem.  If only one or two cards I might have just RMA'ed them, but decided to look into it and discovered that the PCI subdevice ID was different.  Eventually found a patch as well.  (This patch has since been merged into linux upstream.)

 

Well the cards seemed to work ok with the motherboards we were using so we shipped 'em.

 

I've always wondered about that however.  Why would Marvell change the subdevice ID?  I have no idea.  But maybe we can discover a pattern of which controllers work and which suffer a slowdown.

 

For a completely different theory: besides going from 32-bit to 64-bit kernel, the other big change in unRaid is that we moved from non-preemptible to preemptible kernel.  Maybe this makes a difference?

 

Also, back in 6.0-beta6 we added this:

  CONFIG_SCSI_MVSAS_TASKLET: Support for interrupt tasklet (improves mvsas performance)

 

This is everything I know about this issue.  Needless to say, we cannot debug the mvsas driver code.  If we can find a pattern and/or find other postings on the 'net related to this issue, there might be a chance of a fix.
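
For anyone comparing kernels, the two build options mentioned above can be read out of a kernel config file. This is only a sketch: it takes the config as text, and where that text comes from is left open (some kernels expose it as /proc/config.gz, but whether a given unRAID build does is not confirmed here). The sample excerpt is illustrative and not copied from an unRAID release.

```python
def kernel_options(config_text, names):
    """Map each requested CONFIG_ name to its value ('y', 'm', ...) or None if unset."""
    values = {}
    for line in config_text.splitlines():
        line = line.strip()
        # Lines such as "# CONFIG_FOO is not set" start with '#' and are skipped
        if line.startswith("CONFIG_") and "=" in line:
            key, _, value = line.partition("=")
            values[key] = value
    return {name: values.get(name) for name in names}


WANTED = ("CONFIG_PREEMPT", "CONFIG_PREEMPT_NONE", "CONFIG_SCSI_MVSAS_TASKLET")

# Illustrative excerpt only, not taken from an actual unRAID build
sample = """\
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT=y
CONFIG_SCSI_MVSAS=y
CONFIG_SCSI_MVSAS_TASKLET=y
"""
print(kernel_options(sample, WANTED))
```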

 

 

 


What might be helpful is to post the output of this command:

 

lspci -vv -d 11ab:*

 

This will list the details of the Marvell controller installed in your server.

 

This results in nothing coming back for me.

 

root@CydStorage:~# lspci -vv -d 11ab:*
root@CydStorage:~#


Instead of 11ab, try 1b4b


If that doesn't work you can just type:

 

lspci -vv > /boot/pci.txt

 

This will redirect the output to a file on your flash called 'pci.txt'.  Next you can edit the file and find the 'Marvell' section - that's the only stuff that is relevant.
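
Instead of editing the file by hand, the Marvell sections can be pulled out with a few lines of script. A minimal sketch, assuming Python 3 is available (for example on another machine after copying pci.txt off the flash drive) and that lspci separates devices with blank lines, as it does in the output posted in this thread. The function names are illustrative.

```python
def device_blocks(lspci_text):
    """Split `lspci -vv` output into one text block per device (blank-line separated)."""
    blocks, current = [], []
    for line in lspci_text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            blocks.append("\n".join(current))
            current = []
    if current:
        blocks.append("\n".join(current))
    return blocks


def marvell_blocks(lspci_text):
    """Keep only the device blocks that mention Marvell."""
    return [block for block in device_blocks(lspci_text) if "Marvell" in block]


if __name__ == "__main__":
    # /boot/pci.txt is the file written by the lspci command above
    try:
        with open("/boot/pci.txt") as fh:
            print("\n\n".join(marvell_blocks(fh.read())))
    except OSError:
        print("pci.txt not readable; run the lspci command first")
```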


That seems to have worked better:

 

root@CydStorage:~# lspci -vv -d 1b4b:*
01:00.0 RAID bus controller: Marvell Technology Group Ltd. 88SE9485 SAS/SATA 6Gb/s controller (rev 03)
        Subsystem: Marvell Technology Group Ltd. Device 9480
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 16
        Region 0: Memory at f7340000 (64-bit, non-prefetchable) [size=128K]
        Region 2: Memory at f7300000 (64-bit, non-prefetchable) [size=256K]
        Expansion ROM at f7360000 [disabled] [size=64K]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2- AuxCurrent=375mA PME(D0+,D1+,D2-,D3hot+,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [70] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <1us, L1 <8us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag+ PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x8, ASPM L0s L1, Latency L0 <512ns, L1 <64us
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM L0s Enabled; RCB 64 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR-, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ NonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [140 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=01
                        Status: NegoPending- InProgress-
        Kernel driver in use: mvsas
        Kernel modules: mvsas

02:00.0 RAID bus controller: Marvell Technology Group Ltd. 88SE9485 SAS/SATA 6Gb/s controller (rev 03)
        Subsystem: Marvell Technology Group Ltd. Device 9480
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 17
        Region 0: Memory at f7240000 (64-bit, non-prefetchable) [size=128K]
        Region 2: Memory at f7200000 (64-bit, non-prefetchable) [size=256K]
        Expansion ROM at f7260000 [disabled] [size=64K]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2- AuxCurrent=375mA PME(D0+,D1+,D2-,D3hot+,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [70] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <1us, L1 <8us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag+ PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x8, ASPM L0s L1, Latency L0 <512ns, L1 <64us
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM L0s Enabled; RCB 64 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR-, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr+ BadTLP+ BadDLLP+ Rollover+ Timeout+ NonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [140 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=01
                        Status: NegoPending- InProgress-
        Kernel driver in use: mvsas
        Kernel modules: mvsas

root@CydStorage:~#

 

As a reminder, I am one of the people with these cards and no issues.


Thanks for your report.  I modified my previous post to use '1b4b' instead - that's the proper vendor-id for that card.

 

Also, in your report notice the

 

Subsystem: Marvell Technology Group Ltd. Device 9480

 

That's what we're looking for.  9480 is the subdevice id of the 'older' cards.  The value 9485 is the 'new' subdevice id.  Maybe those are the ones with this issue?  Let's see...
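
To make collecting these IDs less error-prone, the Subsystem line can be extracted from pasted lspci output automatically. A minimal sketch, assuming the output looks like the reports in this thread, where the subsystem is printed as "Device xxxx"; if lspci knows a name for the subsystem it prints the name instead and this pattern will find nothing. The function and pattern names are illustrative.

```python
import re

# A device header line starts with the slot, e.g. "01:00.0 RAID bus controller: ..."
SLOT_RE = re.compile(r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f])\s+(.*)$")
# An indented line such as "Subsystem: Marvell Technology Group Ltd. Device 9480"
SUBSYS_RE = re.compile(r"^\s+Subsystem:.*\bDevice\s+([0-9a-f]{4})\s*$")


def subsystem_ids(lspci_text):
    """Return [(slot, subdevice_id)] for each device with a 'Device xxxx' Subsystem line."""
    results, slot = [], None
    for line in lspci_text.splitlines():
        header = SLOT_RE.match(line)
        if header:
            slot = header.group(1)
            continue
        subsystem = SUBSYS_RE.match(line)
        if subsystem and slot is not None:
            results.append((slot, subsystem.group(1)))
    return results
```

Feeding it the two-card report above would give one (slot, id) pair per controller, which is easy to tabulate against "has the slowdown" or "does not".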

Now that's interesting!  We recently had a user (3blackdots) with a 9485 that had the Marvell bug (no drives seen), post is here with my bug confirmation after it.  A few posts up, user opentoe had a 9480 where the drives showed up, but parity checks were very slow, summary post is here.

 

This seems like 2 strikes against Marvell now.  Would it be useful to get them involved?  Their reputation is at stake here, going to 'strike out' with unRAID users, if they can't give us a correction/patch or configuration change.


I have the speed issue on both my SAS2LP cards and they are also subsystem 9480. I'm posting the complete results in case there's any relevant info.

 

root@Testv6:~# lspci -vv -d 1b4b:*
01:00.0 RAID bus controller: Marvell Technology Group Ltd. 88SE9485 SAS/SATA 6Gb/s controller (rev 03)
        Subsystem: Marvell Technology Group Ltd. Device 9480
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 16
        Region 0: Memory at dfa40000 (64-bit, non-prefetchable) [size=128K]
        Region 2: Memory at dfa00000 (64-bit, non-prefetchable) [size=256K]
        Expansion ROM at dfa60000 [disabled] [size=64K]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2- AuxCurrent=375mA PME(D0+,D1+,D2-,D3hot+,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [70] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <1us, L1 <8us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x8, ASPM L0s L1, Latency L0 <512ns, L1 <64us
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR-, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [140 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=01
                        Status: NegoPending- InProgress-
        Kernel driver in use: mvsas
        Kernel modules: mvsas

 

 

root@Testv6:~# lspci -vv -d 1b4b:*
01:00.0 RAID bus controller: Marvell Technology Group Ltd. 88SE9485 SAS/SATA 6Gb/s controller (rev 03)
        Subsystem: Marvell Technology Group Ltd. Device 9480
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 16
        Region 0: Memory at dfa40000 (64-bit, non-prefetchable) [size=128K]
        Region 2: Memory at dfa00000 (64-bit, non-prefetchable) [size=256K]
        Expansion ROM at dfa60000 [disabled] [size=64K]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2- AuxCurrent=375mA PME(D0+,D1+,D2-,D3hot+,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [70] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <1us, L1 <8us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x8, ASPM L0s L1, Latency L0 <512ns, L1 <64us
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR-, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [140 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=01
                        Status: NegoPending- InProgress-
        Kernel driver in use: mvsas
        Kernel modules: mvsas
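
As a rough sanity check on the LnkSta line above, the negotiated link speed and width can be turned into a bandwidth estimate. The sketch below is a minimal Python illustration, not part of any Unraid tool. The function name `link_bandwidth_mb_s` and the `ENCODING` table are assumptions made for this example, and it ignores packet and protocol overhead, so real throughput will be lower.

```python
import re

# Line-encoding efficiency per PCIe signalling rate (GT/s):
# 8b/10b for 2.5 and 5 GT/s, 128b/130b for 8 GT/s.
ENCODING = {2.5: 8 / 10, 5.0: 8 / 10, 8.0: 128 / 130}


def link_bandwidth_mb_s(lnksta_line):
    """Estimate per-direction link bandwidth in MB/s from an lspci 'LnkSta:' line."""
    m = re.search(r"Speed ([\d.]+)GT/s, Width x(\d+)", lnksta_line)
    if m is None:
        raise ValueError("no 'Speed ...GT/s, Width x...' found in line")
    rate_gt_s, lanes = float(m.group(1)), int(m.group(2))
    # Transfers per second -> payload bits per second across all lanes.
    bits_per_s = rate_gt_s * 1e9 * ENCODING[rate_gt_s] * lanes
    return bits_per_s / 8 / 1e6  # bits -> bytes -> MB


line = "LnkSta: Speed 5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-"
print(round(link_bandwidth_mb_s(line)), "MB/s per direction (before protocol overhead)")
```

For the 5GT/s x8 link shown above this works out to about 4000 MB/s per direction before protocol overhead, far more than the parity check speeds reported in this thread, so the link width by itself is unlikely to explain the slowdown.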

 

 

 

 


Here are the differences between the SAS2LP (88SE9485) cards of bkastner and johnnie.black.  The cards appear to be identical models:

01:00.0 RAID bus controller: Marvell Technology Group Ltd. 88SE9485 SAS/SATA 6Gb/s controller (rev 03)

        Subsystem: Marvell Technology Group Ltd. Device 9480

 

bkastner's card (works fine) - (only showing lines with differences)

                        RlxdOrd- ExtTag+ PhantFunc- AuxPwr- NoSnoop-

                LnkCtl: ASPM L0s Enabled; RCB 64 bytes Disabled- Retrain- CommClk+

                LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-

                CESta:  RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ NonFatalErr+

 

johnnie.black's card (runs very slowly on parity checks) -

                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-

                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+

                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-

                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
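
For anyone who wants to repeat this comparison on their own cards, here is a minimal Python sketch that compares the `+`/`-` flags on two matching lspci lines. The helper names `parse_flags` and `flag_diff` are made up for this example. It only looks at `Name+` / `Name-` tokens, so differences in plain text fields such as the ASPM state or the de-emphasis level still have to be checked by eye.

```python
import re

# Matches tokens such as 'ExtTag+' or 'RxErr-' followed by whitespace or end of line.
FLAG = re.compile(r"([A-Za-z0-9]+)([+-])(?=\s|$)")


def parse_flags(line):
    """Return {flag_name: True/False} for every 'Name+' / 'Name-' token in a line."""
    return {name: sign == "+" for name, sign in FLAG.findall(line)}


def flag_diff(line_a, line_b):
    """Return {flag_name: (value_in_a, value_in_b)} for flags whose state differs."""
    a, b = parse_flags(line_a), parse_flags(line_b)
    return {name: (a[name], b[name]) for name in a if name in b and a[name] != b[name]}


# CESta lines copied from the two cards compared above.
working = "CESta:  RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ NonFatalErr+"
slow = "CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+"

for name, (on_working, on_slow) in flag_diff(working, slow).items():
    print(f"{name}: working card={on_working}, slow card={on_slow}")
```

Run against the two CESta lines, this reports RxErr, BadTLP, BadDLLP and Timeout as set on the working card and clear on the slow one. Keep in mind these are status bits that may simply reflect what has happened since boot.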

 


 

As Rob noted, it's only slow when you're using all the drives on it at once ... which is basically just during parity checks and rebuilds ... so it's not really a major issue.

 

I have 7 drives on the card, and my parity check speed was cut in half going from V5 to V6, so a check now takes more than a day instead of finishing overnight.

 

So, back to the question I posted earlier: have you tried the tunables script at all?

 

http://lime-technology.com/forum/index.php?topic=29009.0

 

I would highly suggest you use this to refine your parameters and see where it leaves you. Moving to v6 also means moving to 64-bit, which can change things. I think dropping 50% is definitely extreme, but with a correctly tuned environment the drop may be substantially reduced.

 

I am running this now.  Thanks.

I won't be able to do anything with the results until tomorrow night.


 

If that doesn't work you can just type:

 

lspci -vv > /boot/pci.txt

 

This will redirect the output to a file on your flash drive called 'pci.txt'.  Next, open the file and find the 'Marvell' section; that's the only part that is relevant.
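
If picking the section out by hand is tedious, the sketch below shows one way to do it in Python. It relies on `lspci -vv` separating devices with a blank line. The helper names `device_blocks` and `blocks_matching` are invented for this example, and the `SAMPLE` text is a shortened stand-in (the Intel entry is made up) for the contents of /boot/pci.txt.

```python
def device_blocks(text):
    """Split `lspci -vv` output into one list of lines per device (blank-line separated)."""
    blocks, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


def blocks_matching(text, keyword="Marvell"):
    """Return the full text of each device block whose first line contains keyword."""
    return ["\n".join(b) for b in device_blocks(text) if keyword in b[0]]


# Shortened stand-in for the file; on a real system you would instead use
# open("/boot/pci.txt").read() after running the lspci command above.
SAMPLE = """\
00:1f.2 SATA controller: Intel Corporation Device 1c02 (rev 05)
\tKernel driver in use: ahci

02:00.0 RAID bus controller: Marvell Technology Group Ltd. 88SE9485 SAS/SATA 6Gb/s controller (rev c3)
\tKernel driver in use: mvsas

03:00.0 SATA controller: Marvell Technology Group Ltd. Device 9215 (rev 11)
\tKernel driver in use: ahci
"""

for block in blocks_matching(SAMPLE):
    print(block)
    print()
```

Matching on the first line of each block keeps the whole device entry together, including the capability lines that do not mention Marvell themselves.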

 

02:00.0 RAID bus controller: Marvell Technology Group Ltd. 88SE9485 SAS/SATA 6Gb/s controller (rev c3)
Subsystem: Marvell Technology Group Ltd. Device 9480
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 18
Region 0: Memory at fe9e0000 (64-bit, non-prefetchable) [size=128K]
Region 2: Memory at fe980000 (64-bit, non-prefetchable) [size=256K]
Expansion ROM at fe9d0000 [disabled] [size=64K]
Capabilities: [40] Power Management version 3
	Flags: PMEClk- DSI- D1+ D2- AuxCurrent=375mA PME(D0+,D1+,D2-,D3hot+,D3cold-)
	Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
	Address: 0000000000000000  Data: 0000
Capabilities: [70] Express (v2) Endpoint, MSI 00
	DevCap:	MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <1us, L1 <8us
		ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
	DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
		RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
		MaxPayload 128 bytes, MaxReadReq 512 bytes
	DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
	LnkCap:	Port #0, Speed 5GT/s, Width x8, ASPM L0s L1, Latency L0 <512ns, L1 <64us
		ClockPM- Surprise- LLActRep- BwNot-
	LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
		ExtSynch+ ClockPM- AutWidDis- BWInt- AutBWInt-
	LnkSta:	Speed 5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
	DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR-, OBFF Not Supported
	DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
	LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
		 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
		 Compliance De-emphasis: -6dB
	LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
		 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [100 v1] Advanced Error Reporting
	UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
	UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
	UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
	CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
	CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
	AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
Capabilities: [140 v1] Virtual Channel
	Caps:	LPEVC=0 RefClk=100ns PATEntryBits=1
	Arb:	Fixed- WRR32- WRR64- WRR128-
	Ctrl:	ArbSelect=Fixed
	Status:	InProgress-
	VC0:	Caps:	PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
		Arb:	Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
		Ctrl:	Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
		Status:	NegoPending- InProgress-
Kernel driver in use: mvsas
Kernel modules: mvsas

03:00.0 SATA controller: Marvell Technology Group Ltd. Device 9215 (rev 11) (prog-if 01 [AHCI 1.0])
Subsystem: Marvell Technology Group Ltd. Device 9215
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 30
Region 0: I/O ports at d800 [size=8]
Region 1: I/O ports at d400 [size=4]
Region 2: I/O ports at d000 [size=8]
Region 3: I/O ports at c800 [size=4]
Region 4: I/O ports at c400 [size=32]
Region 5: Memory at feaff800 (32-bit, non-prefetchable) [size=2K]
Expansion ROM at feae0000 [disabled] [size=64K]
Capabilities: [40] Power Management version 3
	Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold-)
	Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit-
	Address: fee0300c  Data: 41a2
Capabilities: [70] Express (v2) Legacy Endpoint, MSI 00
	DevCap:	MaxPayload 512 bytes, PhantFunc 0, Latency L0s <1us, L1 <8us
		ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
	DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
		RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
		MaxPayload 128 bytes, MaxReadReq 512 bytes
	DevSta:	CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
	LnkCap:	Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 <512ns, L1 <64us
		ClockPM- Surprise- LLActRep- BwNot-
	LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
		ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
	LnkSta:	Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
	DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR-, OBFF Not Supported
	DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
	LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
		 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
		 Compliance De-emphasis: -6dB
	LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
		 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [e0] SATA HBA v0.0 BAR4 Offset=00000004
Capabilities: [100 v1] Advanced Error Reporting
	UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
	UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
	UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
	CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
	CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
	AERCap:	First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
Kernel driver in use: ahci
Kernel modules: ahci

 

I think that captures what you wanted, but in case I missed something, I have also attached the txt file.

pci.txt
