[Partially SOLVED] Is there an effort to solve the SAS2LP issue? (Tom Question)

wgstarks · September 8, 2015

I see that you and bkastner (who appears to be only mildly affected) both have Xeons. I can't go buy one just for testing, but if anyone reading this thread has a Xeon and slow parity check speeds with a SAS2LP, or normal speeds without a Xeon, please post here; maybe we can find some logic to why only some users are affected.

I have the Xeon E3 1230v3 Haswell. CPU utilization cycles between 5% and 15% during parity checks with average speed ~30 MB/s.

JorgeB · September 8, 2015

Well, another theory out the window…

I can’t find anything in common for users with the issue or without it.

flaggart · September 8, 2015

Same issues as everyone else: sync speed is around 40MB/sec to start with and increases once the 2TB drives complete.

01:00.0 RAID bus controller: Marvell Technology Group Ltd. 88SE9485 SAS/SATA 6Gb/s controller (rev c3)
	Subsystem: Marvell Technology Group Ltd. Device 9480
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 18
	Region 0: Memory at ff640000 (64-bit, non-prefetchable) [size=128K]
	Region 2: Memory at ff600000 (64-bit, non-prefetchable) [size=256K]
	Expansion ROM at ff660000 [disabled] [size=64K]
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI- D1+ D2- AuxCurrent=375mA PME(D0+,D1+,D2-,D3hot+,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [70] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <1us, L1 <8us
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 256 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 5GT/s, Width x8, ASPM L0s L1, Latency L0 <512ns, L1 <64us
			ClockPM- Surprise- LLActRep- BwNot-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR-, OBFF Not Supported
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
		LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [100 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
	Capabilities: [140 v1] Virtual Channel
		Caps:	LPEVC=0 RefClk=100ns PATEntryBits=1
		Arb:	Fixed- WRR32- WRR64- WRR128-
		Ctrl:	ArbSelect=Fixed
		Status:	InProgress-
		VC0:	Caps:	PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
			Arb:	Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
			Ctrl:	Enable+ ID=0 ArbSelect=Fixed TC/VC=0
			Status:	NegoPending- InProgress-
	Kernel driver in use: mvsas
	Kernel modules: mvsas

Bungy · September 8, 2015

I have no idea what the problem is, but I just wanted to throw out a question that at least has an answer. How are you guys creating these graphs of CPU/memory/network/etc. usage over time? Am I missing something on the dashboard?

JorgeB · September 8, 2015

System Stats from Dynamix V6 Plugins

https://lime-technology.com/forum/index.php?topic=36543.0

RobJ · September 8, 2015

I'm probably not saying anything new here, but I just wanted to clarify the issues and restate the problem. There appear to be 2 completely different issues at work, both of which can result in much slower parity checks than before on the same hardware.

* The 'maxed out CPU issue' - something unknown causes the CPU to max out at 100%, which results in all major processes (such as parity checks and builds, drive rebuilds, possibly even the Mover) being severely impacted in performance. It can be any disk controller. A higher performance CPU can help, but doesn't necessarily resolve the issue.

* The 'SAS2LP issue' - the disk controller is always a SAS2LP. CPU usage is normal, usually low. Only the parity *check* speed is affected.

The acid-test indicator is whether the CPU numbers are high or low. The SAS2LP appears to have multiple issues with v6 (see my post here).
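For anyone who wants to apply that CPU acid test from the console rather than from a dashboard graph, here is a minimal sketch that computes the busy percentage from two samples of the aggregate `cpu` line in `/proc/stat`. The function names `cpu_busy_pct` and `sample_cpu` are hypothetical, and counting iowait as idle (rather than busy) is an assumption, so the figures may not match the System Stats graphs exactly.

```shell
# Hypothetical helper: percent of CPU time spent busy between two samples of
# the aggregate "cpu" line from /proc/stat. Fields 2-9 are user, nice, system,
# idle, iowait, irq, softirq, steal. Guest time is already included in user,
# so fields 10+ are skipped to avoid double counting. iowait counts as idle.
cpu_busy_pct() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    {
      total = 0
      for (i = 2; i <= 9 && i <= NF; i++) total += $i
      idle = $5 + $6
      if (NR == 1) { total0 = total; idle0 = idle }
      else {
        dt = total - total0
        pct = (dt > 0) ? 100 * (1 - (idle - idle0) / dt) : 0
        printf "%.1f\n", pct
      }
    }'
}

# Hypothetical wrapper: print the busy percentage every 5 seconds while a
# parity check runs; stop it with Ctrl-C.
sample_cpu() {
  prev=$(grep '^cpu ' /proc/stat)
  while sleep 5; do
    cur=$(grep '^cpu ' /proc/stat)
    echo "$(date +%T) busy $(cpu_busy_pct "$prev" "$cur")%"
    prev=$cur
  done
}
```

Sustained readings near 100% would point at the 'maxed out CPU issue'; low readings during a slow check fit the 'SAS2LP issue' as described above.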

I have to wonder if the 'maxed out CPU issue' is at all related to the 'CPU is 100% and emhttp is hanging' issue, which has several threads out there. There's an interesting thread with a potential workaround being tested here.

I agree with Brit on the 2 line items of note (back here). If anyone is interested in poring through the Linux kernel change logs, I have a wiki page that will help; it shows which kernel version was included with each unRAID release.

I would like to express my appreciation to johnnie.black, for the great job he has done with all the testing, and all the time it took!

Kir · September 8, 2015

Rob,

What about the issue when SAS2LP locks up/throws errors during parity check only? Looks like I'm not the only one with the issue.

RobJ · September 8, 2015

I missed that, but it sounds like a special case of the SAS2LP issue, an extreme case!

ntrlsur · September 8, 2015

I went through the change log for kernel 3.15.1. A bunch of changes happened for btrfs and some other lower-level stuff that was over my head; I am a hardware/network guy, not a software guy.

Reviewing the latest posts, what was interesting to me is that EdgarWallace got good results from his parity check, over 100MB/sec. He posted the lspci output for his card, and I noticed some differences between his card and mine and flaggart's. The line that stands out to me:

Edgar's and bkastner's, who don't have problems with their SAS2LPs:

LnkCtl:	ASPM L0s Enabled; RCB 64 bytes Disabled- Retrain- CommClk+
		ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-

Mine, flaggart's and opentoe's, which do have problems:

LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-

RobJ pointed this out earlier in the thread, and for some reason it's sticking in my mind as an issue. I am wondering if Active State Power Management (ASPM) is the problem here. By running

lspci -vv | grep ASPM

I know for sure that it is disabled on my system. When I get home I will check the BIOS to see if I can enable Active State Power Management and run another parity check.
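One thing to watch with `grep ASPM`: it matches two kinds of lines. `LnkCap` lists the ASPM states the card supports (every full dump in this thread shows `ASPM L0s L1` there), while `LnkCtl` shows what is actually enabled. A minimal sketch that prints only the `LnkCtl` setting per device; the function name `aspm_state` is hypothetical, and it should be run as root, since lspci generally cannot read these capability registers as a normal user.

```shell
# Hypothetical helper: read `lspci -vv` output on stdin and print the ASPM
# setting from each device's LnkCtl line (LnkCap only lists supported states).
aspm_state() {
  awk '
    # Device header lines begin with a bus address such as 01:00.0
    /^[0-9a-f]+:[0-9a-f]+\.[0-9a-f] / { dev = $1 }
    # "LnkCtl:" does not match "LnkCtl2:", so only the link control line is used
    /LnkCtl:/ {
      line = $0
      sub(/.*LnkCtl:[ \t]*/, "", line)
      split(line, parts, ";")
      print dev ": " parts[1]
    }'
}

# Usage (as root); quoting the pattern keeps the shell from globbing the "*":
#   lspci -vv -d '1b4b:*' | aspm_state
```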

JorgeB · September 8, 2015

I did change my BIOS PCIe power setting after Rob noticed mine was off, and it didn't make any difference, but there's no harm in trying it too.

I was planning on running more tests before posting, but I did find what looks like a correlation between the total number of reads and the speed of a parity check. I'm not talking about the differing per-disk read numbers, which we all assume are normal, but the total read count for the same array at the end of a parity check, for example:

version   avg speed (MB/s)   total reads
v6b1      125.1              2,612,645
v6b14     69.7               5,249,593
v6.1.1    51.8               7,595,330

I'm going to compare all the betas to confirm whether this holds for all of them. I also plan to test with a different controller, such as a SASLP, to check whether the read numbers also change between versions.

It makes sense that more reads means more I/Os and therefore less speed, but this issue doesn't make much sense, so who knows. I also have no clue why there's such a big variation, or whether anything can be done to change it.
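To put a number on that correlation, here is a quick sketch that computes the Pearson correlation coefficient between average speed and total reads for the three versions in the table above. With only three data points it is a hint, not proof; the helper name `pearson` is made up for illustration.

```shell
# Hypothetical helper: Pearson correlation of columns 2 and 3 read from stdin.
pearson() {
  awk '
    { x[NR] = $2; y[NR] = $3; sx += $2; sy += $3 }
    END {
      mx = sx / NR; my = sy / NR
      for (i = 1; i <= NR; i++) {
        dx = x[i] - mx; dy = y[i] - my
        sxy += dx * dy; sxx += dx * dx; syy += dy * dy
      }
      printf "%.2f\n", sxy / sqrt(sxx * syy)
    }'
}

# version, average speed (MB/s), total reads, copied from the post above
pearson <<'EOF'
v6b1    125.1 2612645
v6b14    69.7 5249593
v6.1.1   51.8 7595330
EOF
# prints -0.97 (strongly negative: more reads, lower speed)
```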

bkastner · September 8, 2015

Rob,

What about the issue when SAS2LP locks up/throws errors during parity check only? Looks like I'm not the only one with the issue.

I missed that, but it sounds like a special case of the SAS2LP issue, an extreme case!

You actually helped me diagnose this specific issue:

http://lime-technology.com/forum/index.php?topic=42666.0

jonp · September 8, 2015

Update: We scrounged up a system that actually has two of these cards. Vendor ID 1b4b Product ID 9485. We are experiencing similar performance degradation and are doing further testing to see what we can do to improve this. For those experiencing this issue, Tom had requested that folks perform some commands and reply here with the information. Please continue to do so as this will help us in narrowing down the root cause of this issue:

What might be helpful is to post the output of this command:
lspci -vv -d 1b4b:*
This will list the details of the Marvell controller installed in your server.
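With many of these dumps coming in, one field worth comparing is whether the PCIe link trained at the card's full capability. The sketch below (hypothetical function name `link_check`) compares the speed and width in `LnkCap` with the negotiated values in `LnkSta`. A link running below capability may simply reflect the slot or chipset, and nothing in this thread establishes that it causes the slowdown; the script only makes the dumps easier to compare.

```shell
# Hypothetical helper: read `lspci -vv` output on stdin and, for each device,
# compare the link speed/width it supports (LnkCap) with what it negotiated (LnkSta).
link_check() {
  awk '
    # Return the token after "key " in s, e.g. key "Width" -> "x8"
    function after(s, key) {
      if (match(s, key " [^ ,]+"))
        return substr(s, RSTART + length(key) + 1, RLENGTH - length(key) - 1)
      return "?"
    }
    /^[0-9a-f]+:[0-9a-f]+\.[0-9a-f] / { dev = $1; order[++n] = dev }
    /LnkCap:/ { cs[dev] = after($0, "Speed"); cw[dev] = after($0, "Width") }
    /LnkSta:/ { ss[dev] = after($0, "Speed"); sw[dev] = after($0, "Width") }
    END {
      for (i = 1; i <= n; i++) {
        d = order[i]
        if (cs[d] == "" || ss[d] == "") { printf "%s: no link info (run as root?)\n", d; continue }
        verdict = (cs[d] == ss[d] && cw[d] == sw[d]) ? "full" : "BELOW capability"
        printf "%s: capable %s %s, running %s %s (%s)\n", d, cs[d], cw[d], ss[d], sw[d], verdict
      }
    }'
}

# Usage (as root): lspci -vv -d '1b4b:*' | link_check
```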

wgstarks · September 8, 2015

Just to be clear, I am experiencing these slowdown issues.

Brunnhilde login: root
Password: 
Linux 4.1.5-unRAID.
Last login: Tue Sep  8 18:51:59 -0400 2015 on /dev/tty1.
root@Brunnhilde:~# lspci -vv -d 1b4b:*
04:00.0 RAID bus controller: Marvell Technology Group Ltd. 88SE9485 SAS/SATA 6Gb/s controller (rev c3)
Subsystem: Marvell Technology Group Ltd. Device 9480
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 16
Region 0: Memory at f7140000 (64-bit, non-prefetchable) [size=128K]
Region 2: Memory at f7100000 (64-bit, non-prefetchable) [size=256K]
Expansion ROM at f7160000 [disabled] [size=64K]
Capabilities: [40] Power Management version 3
	Flags: PMEClk- DSI- D1+ D2- AuxCurrent=375mA PME(D0+,D1+,D2-,D3hot+,D3cold-)
	Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
	Address: 0000000000000000  Data: 0000
Capabilities: [70] Express (v2) Endpoint, MSI 00
	DevCap:	MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <1us, L1 <8us
		ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
	DevCtl:	Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported-
		RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
		MaxPayload 128 bytes, MaxReadReq 512 bytes
	DevSta:	CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
	LnkCap:	Port #0, Speed 5GT/s, Width x8, ASPM L0s L1, Latency L0 <512ns, L1 <64us
		ClockPM- Surprise- LLActRep- BwNot-
	LnkCtl:	ASPM L0s Enabled; RCB 64 bytes Disabled- Retrain- CommClk+
		ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
	LnkSta:	Speed 5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
	DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR-, OBFF Not Supported
	DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
	LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
		 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
		 Compliance De-emphasis: -6dB
	LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
		 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [100 v1] Advanced Error Reporting
	UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
	UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
	UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
	CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
	CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
	AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
Capabilities: [140 v1] Virtual Channel
	Caps:	LPEVC=0 RefClk=100ns PATEntryBits=1
	Arb:	Fixed- WRR32- WRR64- WRR128-
	Ctrl:	ArbSelect=Fixed
	Status:	InProgress-
	VC0:	Caps:	PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
		Arb:	Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
		Ctrl:	Enable+ ID=0 ArbSelect=Fixed TC/VC=01
		Status:	NegoPending- InProgress-
Kernel driver in use: mvsas
Kernel modules: mvsas

root@Brunnhilde:~#

opentoe · September 8, 2015

@opentoe, I experienced the same. Look at the mess here (partially self-created): http://lime-technology.com/forum/index.php?topic=42594.msg407433#msg407433

Anyhow, I don't trust my array anymore. The >200 errors that the parity sync showed, and that were corrected yesterday, made me very nervous, and I really don't know which files were affected by that "correction". My main question: is my backup now being overwritten with corrupted files?

root@Tower:~# lspci -vv -d 1b4b:*
02:00.0 RAID bus controller: Marvell Technology Group Ltd. 88SE9485 SAS/SATA 6Gb/s controller (rev 03)
Subsystem: Marvell Technology Group Ltd. Device 9480
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 17
Region 0: Memory at dfa40000 (64-bit, non-prefetchable) [size=128K]
Region 2: Memory at dfa00000 (64-bit, non-prefetchable) [size=256K]
Expansion ROM at dfa60000 [disabled] [size=64K]
Capabilities: [40] Power Management version 3
	Flags: PMEClk- DSI- D1+ D2- AuxCurrent=375mA PME(D0+,D1+,D2-,D3hot+,D3cold-)
	Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
	Address: 0000000000000000  Data: 0000
Capabilities: [70] Express (v2) Endpoint, MSI 00
	DevCap:	MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <1us, L1 <8us
		ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
	DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
		RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
		MaxPayload 128 bytes, MaxReadReq 512 bytes
	DevSta:	CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
	LnkCap:	Port #0, Speed 5GT/s, Width x8, ASPM L0s L1, Latency L0 <512ns, L1 <64us
		ClockPM- Surprise- LLActRep- BwNot-
	LnkCtl:	ASPM L0s Enabled; RCB 64 bytes Disabled- Retrain- CommClk+
		ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
	LnkSta:	Speed 5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
	DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR-, OBFF Not Supported
	DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
	LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
		 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
		 Compliance De-emphasis: -6dB
	LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
		 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [100 v1] Advanced Error Reporting
	UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
	UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
	UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
	CESta:	RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ NonFatalErr+
	CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
	AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
Capabilities: [140 v1] Virtual Channel
	Caps:	LPEVC=0 RefClk=100ns PATEntryBits=1
	Arb:	Fixed- WRR32- WRR64- WRR128-
	Ctrl:	ArbSelect=Fixed
	Status:	InProgress-
	VC0:	Caps:	PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
		Arb:	Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
		Ctrl:	Enable+ ID=0 ArbSelect=Fixed TC/VC=01
		Status:	NegoPending- InProgress-
Kernel driver in use: mvsas
Kernel modules: mvsas

I'm not affected by the speed drop. The parity check starts at around 150MB/sec, and yesterday's finished at 105MB/sec.

As soon as I started fooling around with parity checks, two drives RED BALLed on me. Thank god one drive was almost empty. Last night I removed the one that was half full, put it in my Windows external dock and ran a sector-by-sector verify; everything checked fine. I put it back in the array and it is now rebuilding. I remember this also happened during my previous parity check, and I did the same thing: put the drive in my Windows dock, ran a sector-by-sector check and put it back in the array, where it is still in use today with no problems. One was a 2TB drive and the other a 1TB drive. I removed the 1TB drive (the second red ball, the one that was empty) and kept only the 2TB drive. The rebuild is at 2% now. I can't explain any of this.

opentoe · September 8, 2015

Glad to hear this. At least the few of us aren't crazy! I'm not going to perform a parity check again until there is some kind of solution or I find those Dell H310 cards. I did post the output of that command, so if anything else is needed let me know. It almost makes me want to just order the H310 cards; I can get them new on eBay shipped from the US. I just have to confirm how to cross-flash them. I know there was a post here by the cross-flash expert, but I couldn't get the files from that malware site. Also, remember this post? It was one of the times I did a parity check and kept getting the same errors. I don't remember which version of unRAID I was using, but I was certainly using the SAS card(s).

http://lime-technology.com/forum/index.php?topic=38359.30

mr-hexen · September 8, 2015

How's the rebuild speed?

ntrlsur · September 8, 2015

root@Zeus:~# lspci -vv -d 1b4b:*
01:00.0 RAID bus controller: Marvell Technology Group Ltd. 88SE9485 SAS/SATA 6Gb/s controller (rev 03)
        Subsystem: Marvell Technology Group Ltd. 88SE9485 SAS/SATA 6Gb/s controller
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 18
        Region 0: Memory at fd240000 (64-bit, non-prefetchable) [size=128K]
        Region 2: Memory at fd200000 (64-bit, non-prefetchable) [size=256K]
        Expansion ROM at fd260000 [disabled] [size=64K]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2- AuxCurrent=375mA PME(D0+,D1+,D2-,D3hot+,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [70] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <1us, L1 <8us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x8, ASPM L0s L1, Latency L0 <512ns, L1 <64us
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM L0s Enabled; RCB 64 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR-, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP+ Rollover- Timeout+ NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [140 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=01
                        Status: NegoPending- InProgress-
        Kernel driver in use: mvsas
        Kernel modules: mvsas

Also having speed issues.

tdallen · September 8, 2015

I upgraded from 6.0.1 to 6.1.1 without issue. The speed issues that have been present since moving from unRAID 5 to unRAID 6b12 are still present, though (unRAID 5 parity check speeds were in the 90s?). Parity checks start out at around 46MB/s and get up to over 100MB/s when they pass the boundary from my three 3TB drives to the single 6TB drive. Total average parity check speed is 64.6MB/s, virtually identical to 6.0.1. I worked through my red ball issues during the beta; no recent recurrences of that, thankfully.
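These numbers show why the overall average stays low even though the check ends above 100MB/s: the average is total size divided by total time, so the slow stretch dominates. A back-of-the-envelope sketch, assuming (not measured) ~46MB/s for the first 3TB, where every drive is read, and ~110MB/s for the last 3TB; `avg_speed` is a made-up helper and decimal units are assumed.

```shell
# Hypothetical helper: overall average speed of a check made of segments.
# stdin: one "size_TB speed_MBps" pair per line; 1 TB = 1,000,000 MB (decimal units).
avg_speed() {
  awk '
    { mb = $1 * 1000000; total_mb += mb; total_s += mb / $2 }
    END { printf "%.1f MB/s over %.1f h\n", total_mb / total_s, total_s / 3600 }'
}

# Assumed segment speeds, loosely based on the post above.
printf '3 46\n3 110\n' | avg_speed
# prints 64.9 MB/s over 25.7 h
```

The first 3TB takes about 18.1 hours and the last 3TB about 7.6, so the result sits far below the simple mean of the two speeds (78MB/s) and close to the reported 64.6MB/s.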

root@Tower:~# lspci -vv -d 1b4b:*
01:00.0 RAID bus controller: Marvell Technology Group Ltd. 88SE9485 SAS/SATA 6Gb/s controller (rev c3)
        Subsystem: Marvell Technology Group Ltd. Device 9480
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 32 bytes
        Interrupt: pin A routed to IRQ 16
        Region 0: Memory at fc7e0000 (64-bit, non-prefetchable) [size=128K]
        Region 2: Memory at fc780000 (64-bit, non-prefetchable) [size=256K]
        Expansion ROM at fc7d0000 [disabled] [size=64K]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2- AuxCurrent=375mA PME(D0+,D1+,D2-,D3hot+,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [70] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <1us, L1 <8us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x8, ASPM L0s L1, Latency L0 <512ns, L1 <64us
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
                        ExtSynch+ ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR-, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [140 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=01
                        Status: NegoPending- InProgress-
        Kernel driver in use: mvsas
        Kernel modules: mvsas

CPU usage is always very low.

opentoe · September 9, 2015

Soon as I started fooling around with doing parity checks I've had two drives RED BALL on me. Thank god one drive was almost empty. Last night I removed the one that was half full, stuck it in my windows external dock and ran a sector by sector and verify check and all checked fine. I stuck it back in the array and it is now re-building. I remember this also happened when I did my previous parity check. Did the same thing. Stuck the drive in my Windows dock, did a sector by sector check and put it back in the array and it is still being used today, no problems. One was 2 TB drive and one was a 1TB drive. I removed the 1TB drive (the 2nd red ball that was empty) and just stuck with the 2TB drive. Doing a rebuild at %2 now. Can't explain any of this.

How's the rebuild speed?

Never thought to check; it was a long day at work. It is 126MB/sec, a lot faster than expected. A rebuild isn't like doing a parity check, is it? Because if it is, why would I get 60MB/sec during a parity check and then double that with a rebuild? I'm afraid to do a parity check again. I don't want to lose any data or have to redo another drive.

ntrlsur · September 9, 2015

I know that it is for sure disabled on my system. When I get home I will check the BIOS to see if I can enable Active State Power Management and run another parity check.

I did change my BIOS PCIe power setting after Rob noticed mine was off, and it didn't make any difference, but there's no harm in trying it too.

I was planning on doing more tests before posting, but I did find what looks like a correlation between the total number of reads and the speed of a parity check. I'm not talking about the differing per-disk read counts that we all assume are normal, but the total read count for the same array at the end of a parity check, for example:

version | avg speed (MB/s) | total reads
v6b1 | 125.1 | 2,612,645
v6b14 | 69.7 | 5,249,593
v6.1.1 | 51.8 | 7,595,330

I'm going to compare all the betas to confirm whether this holds for all of them, and I also plan to test with a different controller, like a SASLP, to check whether the read counts also change between versions.

It makes sense that more reads = more I/Os = less speed, but this issue doesn't make much sense, so who knows. I also have no clue why there's such a big variation or whether anything can be done to change it.

ASPM didn't make a difference on my system. Back to the drawing board, it would seem.
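Whether a BIOS ASPM change actually took effect can also be checked from Linux: in `lspci -vv` output the `LnkCtl:` line reports the ASPM state, and comparing `LnkCap:` with `LnkSta:` shows whether the link trained below its capability. Below is a minimal parsing sketch; the function name is hypothetical, and the regexes assume the field layout seen in the 88SE9485 dumps posted in this thread (other lspci versions may format these lines differently). Values reported inside a VM, such as an ESXi guest, may not reflect the physical slot.

```python
import re

# Excerpt of an `lspci -vv -d 1b4b:*` dump posted in this thread (ESXi guest).
SAMPLE = """\
LnkCap: Port #0, Speed 5GT/s, Width x32, ASPM L0s, Latency L0 <64ns, L1 <1us
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
LnkSta: Speed 5GT/s, Width x32, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
"""


def pcie_link_summary(lspci_text: str) -> dict:
    """Pull link capability, negotiated link status and ASPM state
    out of the `lspci -vv` text for a single device."""
    summary = {}
    # "LnkCap:" / "LnkSta:" / "LnkCtl:" need the colon right after the name,
    # so the LnkSta2 / LnkCtl2 lines are not matched by mistake.
    cap = re.search(r"LnkCap:.*?Speed ([\d.]+GT/s), Width (x\d+)", lspci_text)
    sta = re.search(r"LnkSta:\s*Speed ([\d.]+GT/s), Width (x\d+)", lspci_text)
    ctl = re.search(r"LnkCtl:\s*ASPM ([^;]+);", lspci_text)
    if cap:
        summary["cap_speed"], summary["cap_width"] = cap.groups()
    if sta:
        summary["sta_speed"], summary["sta_width"] = sta.groups()
    if ctl:
        summary["aspm"] = ctl.group(1).strip()
    # Link trained at a lower speed or narrower width than it supports.
    summary["downtrained"] = bool(cap and sta and cap.groups() != sta.groups())
    return summary


print(pcie_link_summary(SAMPLE))
```

On a bare-metal box the text would come from running `lspci -vv -d 1b4b:*` as root, since some capability fields are hidden from unprivileged users.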

hbr245b · September 9, 2015

unRAID as a guest on ESXi:

root@unraid-backup:~# lspci -vv -d 1b4b:*
03:00.0 RAID bus controller: Marvell Technology Group Ltd. 88SE9485 SAS/SATA 6Gb/s controller (rev 03)
        Subsystem: Marvell Technology Group Ltd. Device 9480
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 64, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 18
        Region 0: Memory at fd5c0000 (64-bit, non-prefetchable)
        Region 2: Memory at fd580000 (64-bit, non-prefetchable)
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2- AuxCurrent=375mA PME(D0+,D1+,D2-,D3hot+,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000 Data: 0000
        Capabilities: [70] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x32, ASPM L0s, Latency L0 <64ns, L1 <1us
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s, Width x32, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [100 v1] Advanced Error Reporting
                UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [140 v1] Virtual Channel
                Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb: Fixed- WRR32- WRR64- WRR128-
                Ctrl: ArbSelect=Fixed
                Status: InProgress-
                VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
                        Status: NegoPending- InProgress-
        Kernel driver in use: mvsas
        Kernel modules: mvsas

I don't have a problem with parity check speed; what happens on my system is that one or more disks red-ball during a check.

Fireball3 · September 9, 2015

Almost makes me just want to order the H310 cards. I can get them new on eBay shipped from the US. I just have to confirm how to crossflash. I know there was a post here by the crossflash expert, but I couldn't get the files from that malware site.

I feel flattered being called an "expert", but I'm not; maybe a bit experienced with crossflashing the H310.

I will check the link in this post.

What would be another appropriate free file host where I can upload that package for you?

Bungy · September 9, 2015

I have no idea what the problem is, but I just wanted to throw out a question that at least has an answer. How are you guys creating these graphs of CPU/memory/network/etc. usage over time? Am I missing something on the dashboard?

System Stats from Dynamix V6 Plugins

https://lime-technology.com/forum/index.php?topic=36543.0

Thanks Johnnie!

Bungy · September 9, 2015

It's been a while since I flashed my H310s, but these are the links I saved from that process. I also have the firmware I used. At one point I found a link where somebody had packaged all the commands into separate batch files; depending on your hardware and needs, you would run specific versions of those files. I can't find that link, but I found it helpful.

https://techmattr.wordpress.com/2013/08/30/sas-hba-crossflashing-or-flashing-to-it-mode/

http://forums.overclockers.com.au/showthread.php?t=1045376

https://forums.servethehome.com/index.php?threads/lsi-raid-controller-and-hba-complete-listing-plus-oem-models.599/

opentoe · September 9, 2015

I don't want to turn this thread into another crossflashing one, so I put any files I found on my FTP server, which I posted earlier in this thread. I think they were the same files as on that malware site.

By the way, the drive that red-balled during my parity check rebuilt to 100% and is running fine. I ran a long SMART test on it with no errors.
