wgstarks, posted September 8, 2015

> I see that you and bkastner, who appears to be only a little affected, both have Xeons. I can't go buy one just for testing, but if anyone reading this thread has a Xeon and slow parity check speed with a SAS2LP, or normal speed without a Xeon, please post here; maybe we can find some logic to why only some users are affected.

I have the Xeon E3-1230 v3 (Haswell). CPU utilization cycles between 5% and 15% during parity checks, with an average speed of ~30 MB/s.
JorgeB, posted September 8, 2015

> I have the Xeon E3-1230 v3 (Haswell). CPU utilization cycles between 5% and 15% during parity checks, with an average speed of ~30 MB/s.

Well, another theory out the window… I can't find anything in common among users with the issue, or among those without it.
flaggart, posted September 8, 2015

Same issues as everyone else. Sync speed is around 40MB/sec to start with and increases once the 2TB drives complete.

    01:00.0 RAID bus controller: Marvell Technology Group Ltd. 88SE9485 SAS/SATA 6Gb/s controller (rev c3)
        Subsystem: Marvell Technology Group Ltd. Device 9480
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 18
        Region 0: Memory at ff640000 (64-bit, non-prefetchable) [size=128K]
        Region 2: Memory at ff600000 (64-bit, non-prefetchable) [size=256K]
        Expansion ROM at ff660000 [disabled] [size=64K]
        Capabilities: [40] Power Management version 3
            Flags: PMEClk- DSI- D1+ D2- AuxCurrent=375mA PME(D0+,D1+,D2-,D3hot+,D3cold-)
            Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
            Address: 0000000000000000 Data: 0000
        Capabilities: [70] Express (v2) Endpoint, MSI 00
            DevCap: MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <1us, L1 <8us
                ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
            DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                MaxPayload 256 bytes, MaxReadReq 512 bytes
            DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
            LnkCap: Port #0, Speed 5GT/s, Width x8, ASPM L0s L1, Latency L0 <512ns, L1 <64us
                ClockPM- Surprise- LLActRep- BwNot-
            LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
                ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
            LnkSta: Speed 5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
            DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR-, OBFF Not Supported
            DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
            LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                Compliance De-emphasis: -6dB
            LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [100 v1] Advanced Error Reporting
            UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
            UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
            UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
            CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
            CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
            AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [140 v1] Virtual Channel
            Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
            Arb: Fixed- WRR32- WRR64- WRR128-
            Ctrl: ArbSelect=Fixed
            Status: InProgress-
            VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=0
                Status: NegoPending- InProgress-
        Kernel driver in use: mvsas
        Kernel modules: mvsas
Bungy, posted September 8, 2015

I have no idea what the problem is, but I just wanted to throw out a question that at least has an answer. How are you creating these graphs of CPU/memory/network/etc. usage over time? Am I missing something on the dashboard?
JorgeB, posted September 8, 2015

> How are you creating these graphs of CPU/memory/network/etc. usage over time? Am I missing something on the dashboard?

It's the System Stats plugin from the Dynamix V6 Plugins: https://lime-technology.com/forum/index.php?topic=36543.0
RobJ, posted September 8, 2015

I'm probably not saying anything new here, but I just wanted to clarify the issues and restate the problem. There appear to be two completely different issues at work. Both can make parity checks much slower than before on the same hardware.

* The 'maxed out CPU issue': something unknown causes the CPU to max out at 100%. This severely impacts the performance of all major processes, such as parity checks and builds, drive rebuilds, and possibly even the Mover. It can occur with any disk controller. A higher-performance CPU can help, but it doesn't necessarily resolve the issue.
* The 'SAS2LP issue': the disk controller is a SAS2LP only. CPU usage is normal, usually low. Only the parity *check* speed is affected.

The acid test is whether CPU usage is high or low. The SAS2LP appears to have multiple issues with v6 (see my post here).

I have to wonder whether the 'maxed out CPU issue' is related to the 'CPU is 100% and emhttp is hanging' issue, which has several threads out there. There's an interesting thread with a potential workaround being tested here.

I agree with Brit on the two line items of note (back here). If anyone is interested in poring through the Linux kernel change logs, I have a wiki page that will help. It shows which kernel version was included with each unRAID release.

I would like to express my appreciation to johnnie.black for the great job he has done with all the testing, and for all the time it took!
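RobJ's acid test (high versus low CPU numbers) can also be approximated outside the GUI. Sample the aggregate `cpu` line of Linux's /proc/stat twice and compare idle time to total time. The sketch below is a hypothetical helper, not an unRAID tool. The 90% cut-off and all function names are assumptions chosen for illustration. A low reading only rules out the first issue; it does not prove the SAS2LP is at fault.

```python
import time


def read_cpu_times(stat_text):
    """Return (idle_jiffies, total_jiffies) from the aggregate 'cpu' line of /proc/stat.

    Field order: user nice system idle iowait irq softirq steal [guest guest_nice].
    guest/guest_nice are already included in user/nice, so only the first 8 are summed.
    iowait is counted as idle so that waiting on disks is not mistaken for a busy CPU.
    """
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            values = [int(v) for v in fields[1:9]]
            idle = values[3] + (values[4] if len(values) > 4 else 0)  # idle + iowait
            return idle, sum(values)
    raise ValueError("no aggregate 'cpu' line found")


def cpu_busy_percent(before, after):
    """Percentage of non-idle CPU time between two /proc/stat snapshots."""
    idle0, total0 = read_cpu_times(before)
    idle1, total1 = read_cpu_times(after)
    elapsed = total1 - total0
    if elapsed <= 0:
        return 0.0
    return 100.0 * (1.0 - (idle1 - idle0) / elapsed)


def sample_busy_percent(interval=5.0, path="/proc/stat"):
    """Take two snapshots `interval` seconds apart (Linux only), e.g. during a parity check."""
    with open(path) as f:
        before = f.read()
    time.sleep(interval)
    with open(path) as f:
        after = f.read()
    return cpu_busy_percent(before, after)


def classify_slowdown(busy_percent, has_sas2lp, maxed_threshold=90.0):
    """Rough triage following the two-issue split above; the threshold is an assumption."""
    if busy_percent >= maxed_threshold:
        return "maxed out CPU issue"
    if has_sas2lp:
        return "possible SAS2LP issue"
    return "unclassified"
```

Sampling over several seconds matters: a single instant can catch a spike, while the difference between two snapshots gives the average load over the interval.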
Kir, posted September 8, 2015

Rob, what about the issue where the SAS2LP locks up or throws errors during parity checks only? It looks like I'm not the only one with that issue.
RobJ, posted September 8, 2015

> Rob, what about the issue where the SAS2LP locks up or throws errors during parity checks only? It looks like I'm not the only one with that issue.

I missed that, but it sounds like a special case of the SAS2LP issue, an extreme case!
ntrlsur, posted September 8, 2015

> If anyone is interested in poring through the Linux kernel change logs, I have a wiki page that will help. It shows which kernel version was included with each unRAID release.

I went through the change log for kernel 3.15.1. A bunch of changes happened for btrfs and some other lower-level stuff that was over my head; I am a hardware/network guy, not a software guy.

Reviewing the latest posts, what was interesting to me is that EdgarWallace got good results from his parity check, over 100MB/sec. He posted the lspci output for his card, and I noticed some differences between his card and both mine and flaggart's.

The line that stands out to me: Edgar and bkastner, who don't have problems with their SAS2LPs, have

    LnkCtl: ASPM L0s Enabled; RCB 64 bytes Disabled- Retrain- CommClk+ ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-

while flaggart, opentoe and I, who do have problems, have

    LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+ ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-

RobJ pointed this out earlier in the thread, and for some reason it's sticking in my mind as an issue. I am wondering if Active State Power Management (ASPM) is the problem here. By running

    lspci -vv | grep ASPM

I know for sure that it is disabled on my system. When I get home I will check the BIOS to see if I can enable ASPM, and then run another parity check.
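The LnkCtl comparison above can be repeated mechanically. Instead of eyeballing `grep` output, pull the ASPM state out of each `LnkCtl` line of `lspci -vv`. This is a minimal sketch with assumptions:

* The helper names are hypothetical.
* The output is captured as text, for example from `lspci -vv -d 1b4b:*`, which usually needs root to show the capability registers.
* The regex tolerates both `LnkCtl: ASPM` and the `LnkCtl:ASPM` spacing seen in flaggart's paste, and it skips `LnkCtl2` lines.

```python
import re
import subprocess

# Matches e.g. "LnkCtl: ASPM L0s Enabled; RCB 64 bytes ..." or "LnkCtl:ASPM Disabled; ..."
# "LnkCtl2:" does not match because the literal requires ":" right after "LnkCtl".
LNKCTL_ASPM = re.compile(r"LnkCtl:\s*ASPM ([^;]+);")


def aspm_states(lspci_text):
    """Return the ASPM setting reported on every LnkCtl line, in order."""
    return [m.group(1).strip() for m in LNKCTL_ASPM.finditer(lspci_text)]


def marvell_lspci_output():
    """Run lspci for Marvell (vendor 1b4b) devices; requires pciutils to be installed."""
    result = subprocess.run(
        ["lspci", "-vv", "-d", "1b4b:*"],  # no shell, so "*" is passed to lspci literally
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

On a live system, `aspm_states(marvell_lspci_output())` gives one entry per Marvell function. Pasted dumps from other users can be fed to `aspm_states` directly for a side-by-side comparison.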
JorgeB, posted September 8, 2015

> I know for sure that it is disabled on my system. When I get home I will check the BIOS to see if I can enable ASPM, and then run another parity check.

I did change my BIOS PCIe power setting after Rob noticed mine was off, and it didn't make any difference, but there's no harm in trying it too.

I was planning on running more tests before posting, but I found what looks like a correlation between the total number of reads and the speed of a parity check. I'm not talking about the differing per-disk read numbers, which we all assume are normal. I mean the total read count for the same array at the end of a parity check, for example:

    version   avg speed (MB/s)   total reads
    v6b1      125.1              2,612,645
    v6b14      69.7              5,249,593
    v6.1.1     51.8              7,595,330

I'm going to compare all the betas to confirm whether this holds for every one of them. I also plan to test with a different controller, like a SASLP, to check whether the read numbers also change between versions. It makes sense that more reads = more I/Os = less speed. But this issue doesn't make much sense, so who knows. I also have no clue why there's such a big variation, or whether anything can be done to change it.
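To put a number on the reads-versus-speed observation, here is a Pearson correlation over the three posted data points. This is only a sketch: three points say very little statistically, and a correlation is not evidence that the extra reads cause the slowdown. The read counts are transcribed from the post as plain integers.

```python
import math

# (unRAID version, average parity check speed in MB/s, total reads), as posted.
RESULTS = [
    ("v6b1", 125.1, 2_612_645),
    ("v6b14", 69.7, 5_249_593),
    ("v6.1.1", 51.8, 7_595_330),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


speeds = [speed for _, speed, _ in RESULTS]
reads = [count for _, _, count in RESULTS]
r = pearson(speeds, reads)

# Growth of total reads and drop in speed, v6b1 -> v6.1.1.
read_growth = reads[-1] / reads[0]
speed_ratio = speeds[-1] / speeds[0]
```

With these figures r comes out at about -0.97: speed falls as total reads rise. Between v6b1 and v6.1.1 the read count grows about 2.9x while the speed drops to about 41% of its v6b1 value.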
bkastner, posted September 8, 2015

> I missed that, but it sounds like a special case of the SAS2LP issue, an extreme case!

You actually helped me diagnose this specific issue: http://lime-technology.com/forum/index.php?topic=42666.0
jonp, posted September 8, 2015

Update: We scrounged up a system that actually has two of these cards (vendor ID 1b4b, product ID 9485). We are experiencing similar performance degradation and are doing further testing to see what we can do to improve this.

For those experiencing this issue, Tom requested that folks run some commands and reply here with the information. Please continue to do so, as this will help us narrow down the root cause:

What might be helpful is to post the output of this command:

    lspci -vv -d 1b4b:*

This will list the details of the Marvell controller installed in your server.
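On Linux, the same vendor/product pair (1b4b:9485) can also be located without lspci. Sysfs exposes `vendor` and `device` files for every PCI function under /sys/bus/pci/devices. A minimal sketch; the function names are hypothetical. It only finds the card and does not reproduce the capability details requested above, so the lspci output is still what should be posted.

```python
from pathlib import Path

PCI_ROOT = Path("/sys/bus/pci/devices")


def ids_match(vendor_text, device_text, vendor="1b4b", device="9485"):
    """Compare sysfs ID file contents ('0x1b4b' plus a newline) with bare hex IDs."""
    return (vendor_text.strip().lower() == "0x" + vendor.lower()
            and device_text.strip().lower() == "0x" + device.lower())


def find_pci_functions(vendor="1b4b", device="9485", root=PCI_ROOT):
    """Return PCI addresses such as '0000:01:00.0' whose vendor/device IDs match."""
    root = Path(root)
    if not root.is_dir():
        return []
    matches = []
    for dev in sorted(root.iterdir()):
        try:
            vendor_text = (dev / "vendor").read_text()
            device_text = (dev / "device").read_text()
        except OSError:
            continue  # entry without readable ID files; skip it
        if ids_match(vendor_text, device_text, vendor, device):
            matches.append(dev.name)
    return matches
```

The returned address (for example `0000:04:00.0`) can then be passed to `lspci -vv -s` to limit the output to that one card.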
wgstarks, posted September 8, 2015

> What might be helpful is to post the output of this command: lspci -vv -d 1b4b:* This will list the details of the Marvell controller installed in your server.

Just to be clear, I am experiencing these slowdown issues.

    Brunnhilde login: root
    Password:
    Linux 4.1.5-unRAID.
    Last login: Tue Sep 8 18:51:59 -0400 2015 on /dev/tty1.
    root@Brunnhilde:~# lspci -vv -d 1b4b:*
    04:00.0 RAID bus controller: Marvell Technology Group Ltd. 88SE9485 SAS/SATA 6Gb/s controller (rev c3)
        Subsystem: Marvell Technology Group Ltd. Device 9480
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 16
        Region 0: Memory at f7140000 (64-bit, non-prefetchable) [size=128K]
        Region 2: Memory at f7100000 (64-bit, non-prefetchable) [size=256K]
        Expansion ROM at f7160000 [disabled] [size=64K]
        Capabilities: [40] Power Management version 3
            Flags: PMEClk- DSI- D1+ D2- AuxCurrent=375mA PME(D0+,D1+,D2-,D3hot+,D3cold-)
            Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
            Address: 0000000000000000 Data: 0000
        Capabilities: [70] Express (v2) Endpoint, MSI 00
            DevCap: MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <1us, L1 <8us
                ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
            DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported-
                RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                MaxPayload 128 bytes, MaxReadReq 512 bytes
            DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
            LnkCap: Port #0, Speed 5GT/s, Width x8, ASPM L0s L1, Latency L0 <512ns, L1 <64us
                ClockPM- Surprise- LLActRep- BwNot-
            LnkCtl: ASPM L0s Enabled; RCB 64 bytes Disabled- Retrain- CommClk+
                ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
            LnkSta: Speed 5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
            DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR-, OBFF Not Supported
            DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
            LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                Compliance De-emphasis: -6dB
            LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [100 v1] Advanced Error Reporting
            UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
            UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
            UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
            CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
            CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
            AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [140 v1] Virtual Channel
            Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
            Arb: Fixed- WRR32- WRR64- WRR128-
            Ctrl: ArbSelect=Fixed
            Status: InProgress-
            VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
                Status: NegoPending- InProgress-
        Kernel driver in use: mvsas
        Kernel modules: mvsas
    root@Brunnhilde:~#
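One detail in wgstarks's output: `LnkCap` advertises `Width x8`, but `LnkSta` shows the link trained at `Width x4`. The thread doesn't establish whether that matters for parity speed, but the raw link bandwidth is easy to estimate. PCIe at 2.5 and 5 GT/s uses 8b/10b encoding, so each lane carries 0.25 or 0.5 GB/s per direction. The helper names below are hypothetical, and the estimate ignores protocol overhead beyond the line encoding.

```python
import re

LNKCAP = re.compile(r"LnkCap:\s*Port #\d+, Speed ([\d.]+)GT/s, Width x(\d+)")
# "LnkSta:" needs the colon right after "LnkSta", so "LnkSta2:" lines are not matched.
LNKSTA = re.compile(r"LnkSta:\s*Speed ([\d.]+)GT/s, Width x(\d+)")


def link(text, pattern):
    """Return (GT/s, lanes) from the first match of `pattern`, or None."""
    m = pattern.search(text)
    return (float(m.group(1)), int(m.group(2))) if m else None


def usable_gb_per_s(gt_per_s, lanes):
    """Per-direction bandwidth in GB/s for PCIe 1.x/2.x links (8b/10b encoding).

    Each transfer moves 1 bit per lane; 8 of every 10 bits are data and 8 bits
    make a byte, so GB/s = GT/s * lanes * (8/10) / 8 = GT/s * lanes / 10.
    Not valid for 8 GT/s (PCIe 3.0), which uses 128b/130b encoding.
    """
    return gt_per_s * lanes / 10


def compare_link(text):
    """Return (capable GB/s, negotiated GB/s) for one lspci -vv device dump, or None."""
    cap, sta = link(text, LNKCAP), link(text, LNKSTA)
    if cap is None or sta is None:
        return None
    return usable_gb_per_s(*cap), usable_gb_per_s(*sta)
```

By this estimate, wgstarks's card trains at 2.0 GB/s against a capable 4.0 GB/s. tdallen's card, later in the thread, reports `Speed 2.5GT/s, Width x8`, which works out to the same 2.0 GB/s.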
opentoe, posted September 8, 2015

> @opentoe, I experienced the same. Look at the mess here (partially self-created): http://lime-technology.com/forum/index.php?topic=42594.msg407433#msg407433
>
> Anyhow, I don't trust my array anymore. The parity sync showed >200 errors, which were corrected yesterday. That made me very nervous, and I really don't know which files were affected by that "correction". My main question: is a backup now overwriting good files with corrupted ones?
>
>     root@Tower:~# lspci -vv -d 1b4b:*
>     02:00.0 RAID bus controller: Marvell Technology Group Ltd. 88SE9485 SAS/SATA 6Gb/s controller (rev 03)
>         Subsystem: Marvell Technology Group Ltd. Device 9480
>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Latency: 0, Cache Line Size: 64 bytes
>         Interrupt: pin A routed to IRQ 17
>         Region 0: Memory at dfa40000 (64-bit, non-prefetchable) [size=128K]
>         Region 2: Memory at dfa00000 (64-bit, non-prefetchable) [size=256K]
>         Expansion ROM at dfa60000 [disabled] [size=64K]
>         Capabilities: [40] Power Management version 3
>             Flags: PMEClk- DSI- D1+ D2- AuxCurrent=375mA PME(D0+,D1+,D2-,D3hot+,D3cold-)
>             Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
>         Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
>             Address: 0000000000000000 Data: 0000
>         Capabilities: [70] Express (v2) Endpoint, MSI 00
>             DevCap: MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <1us, L1 <8us
>                 ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
>             DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
>                 RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
>                 MaxPayload 128 bytes, MaxReadReq 512 bytes
>             DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
>             LnkCap: Port #0, Speed 5GT/s, Width x8, ASPM L0s L1, Latency L0 <512ns, L1 <64us
>                 ClockPM- Surprise- LLActRep- BwNot-
>             LnkCtl: ASPM L0s Enabled; RCB 64 bytes Disabled- Retrain- CommClk+
>                 ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>             LnkSta: Speed 5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>             DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR-, OBFF Not Supported
>             DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
>             LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
>                 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
>                 Compliance De-emphasis: -6dB
>             LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
>                 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
>         Capabilities: [100 v1] Advanced Error Reporting
>             UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>             UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>             UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>             CESta: RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ NonFatalErr+
>             CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
>             AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
>         Capabilities: [140 v1] Virtual Channel
>             Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
>             Arb: Fixed- WRR32- WRR64- WRR128-
>             Ctrl: ArbSelect=Fixed
>             Status: InProgress-
>             VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
>                 Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
>                 Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
>                 Status: NegoPending- InProgress-
>         Kernel driver in use: mvsas
>         Kernel modules: mvsas
>
> I'm not affected by the speed drop. My parity check starts at around 150MB/sec and finished yesterday at 105MB/sec.

As soon as I started fooling around with parity checks, I had two drives red-ball on me. Thank god one drive was almost empty. Last night I removed the one that was half full, put it in my Windows external dock, and ran a sector-by-sector scan and verify check; it all checked out fine. I put it back in the array and it is now rebuilding.

I remember this also happened during my previous parity check. I did the same thing: put the drive in my Windows dock, ran a sector-by-sector check, and put it back in the array. It is still in use today with no problems.

One was a 2TB drive and one was a 1TB drive. I removed the 1TB drive (the second red ball, which was empty) and kept just the 2TB drive. The rebuild is at 2% now. I can't explain any of this.
opentoe, posted September 8, 2015

> Update: We scrounged up a system that actually has two of these cards (vendor ID 1b4b, product ID 9485). We are experiencing similar performance degradation and are doing further testing to see what we can do to improve this.

Glad to hear this. At least the few of us aren't crazy! I'm not going to run another parity check until there is some kind of solution, or until I find those Dell H310 cards. I did post the output of that command, so if anything else is needed, let me know.

It almost makes me want to just order the H310 cards; I can get them new on eBay, shipped from the US. I just have to confirm how to cross-flash them. I know there was a post here by the cross-flash expert, but I couldn't get the files from that malware site.

Also, remember this post? It was one of the times I ran a parity check and kept getting the same errors. I don't remember which version of unRAID I was using, but I was definitely using the SAS card(s). http://lime-technology.com/forum/index.php?topic=38359.30
mr-hexen, posted September 8, 2015

> As soon as I started fooling around with parity checks, I had two drives red-ball on me. […] One was a 2TB drive and one was a 1TB drive. I removed the 1TB drive (the second red ball, which was empty) and kept just the 2TB drive. The rebuild is at 2% now. I can't explain any of this.

How's the rebuild speed?
ntrlsur, posted September 8, 2015

> What might be helpful is to post the output of this command: lspci -vv -d 1b4b:* This will list the details of the Marvell controller installed in your server.

    root@Zeus:~# lspci -vv -d 1b4b:*
    01:00.0 RAID bus controller: Marvell Technology Group Ltd. 88SE9485 SAS/SATA 6Gb/s controller (rev 03)
        Subsystem: Marvell Technology Group Ltd. 88SE9485 SAS/SATA 6Gb/s controller
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 18
        Region 0: Memory at fd240000 (64-bit, non-prefetchable) [size=128K]
        Region 2: Memory at fd200000 (64-bit, non-prefetchable) [size=256K]
        Expansion ROM at fd260000 [disabled] [size=64K]
        Capabilities: [40] Power Management version 3
            Flags: PMEClk- DSI- D1+ D2- AuxCurrent=375mA PME(D0+,D1+,D2-,D3hot+,D3cold-)
            Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
            Address: 0000000000000000 Data: 0000
        Capabilities: [70] Express (v2) Endpoint, MSI 00
            DevCap: MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <1us, L1 <8us
                ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
            DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                MaxPayload 128 bytes, MaxReadReq 512 bytes
            DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
            LnkCap: Port #0, Speed 5GT/s, Width x8, ASPM L0s L1, Latency L0 <512ns, L1 <64us
                ClockPM- Surprise- LLActRep- BwNot-
            LnkCtl: ASPM L0s Enabled; RCB 64 bytes Disabled- Retrain- CommClk+
                ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
            LnkSta: Speed 5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
            DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR-, OBFF Not Supported
            DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
            LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                Compliance De-emphasis: -6dB
            LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
                EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [100 v1] Advanced Error Reporting
            UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
            UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
            UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
            CESta: RxErr- BadTLP- BadDLLP+ Rollover- Timeout+ NonFatalErr-
            CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
            AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [140 v1] Virtual Channel
            Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
            Arb: Fixed- WRR32- WRR64- WRR128-
            Ctrl: ArbSelect=Fixed
            Status: InProgress-
            VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
                Status: NegoPending- InProgress-
        Kernel driver in use: mvsas
        Kernel modules: mvsas

I'm also having speed issues.
tdallen, posted September 8, 2015

I upgraded from 6.0.1 to 6.1.1 without issue. The speed issues present since moving from unRAID 5 to unRAID 6b12 are still there, though (unRAID 5 parity check speeds were in the 90s?).

Parity checks start out at around 46MB/s and climb to over 100MB/s once the check passes the 3TB boundary, where my three 3TB drives end and only the single 6TB drive remains. The total average parity check speed is 64.6MB/s, virtually identical to 6.0.1. I worked through my red ball issues during the beta, and thankfully there have been no recent recurrences.

    root@Tower:~# lspci -vv -d 1b4b:*
    01:00.0 RAID bus controller: Marvell Technology Group Ltd. 88SE9485 SAS/SATA 6Gb/s controller (rev c3)
        Subsystem: Marvell Technology Group Ltd. Device 9480
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 32 bytes
        Interrupt: pin A routed to IRQ 16
        Region 0: Memory at fc7e0000 (64-bit, non-prefetchable) [size=128K]
        Region 2: Memory at fc780000 (64-bit, non-prefetchable) [size=256K]
        Expansion ROM at fc7d0000 [disabled] [size=64K]
        Capabilities: [40] Power Management version 3
            Flags: PMEClk- DSI- D1+ D2- AuxCurrent=375mA PME(D0+,D1+,D2-,D3hot+,D3cold-)
            Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
            Address: 0000000000000000 Data: 0000
        Capabilities: [70] Express (v2) Endpoint, MSI 00
            DevCap: MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <1us, L1 <8us
                ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
            DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
                MaxPayload 128 bytes, MaxReadReq 512 bytes
            DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
            LnkCap: Port #0, Speed 5GT/s, Width x8, ASPM L0s L1, Latency L0 <512ns, L1 <64us
                ClockPM- Surprise- LLActRep- BwNot-
            LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
                ExtSynch+ ClockPM- AutWidDis- BWInt- AutBWInt-
            LnkSta: Speed 2.5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
            DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR-, OBFF Not Supported
            DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
            LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                Compliance De-emphasis: -6dB
            LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
                EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [100 v1] Advanced Error Reporting
            UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
            UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
            UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
            CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
            CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
            AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [140 v1] Virtual Channel
            Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
            Arb: Fixed- WRR32- WRR64- WRR128-
            Ctrl: ArbSelect=Fixed
            Status: InProgress-
            VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
                Status: NegoPending- InProgress-
        Kernel driver in use: mvsas
        Kernel modules: mvsas

CPU usage is always very low.
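tdallen's numbers also show why the reported average sits much closer to the slow start than to the fast finish. The average is total size divided by total time, so the slower zone, which takes longer to cover, dominates. The sketch below makes a simplifying assumption: a constant ~46 MB/s across the first 3TB (all drives involved) and ~100 MB/s across the last 3TB (the 6TB drive alone). The function name is hypothetical.

```python
def average_speed(zones):
    """Overall speed for zones given as (size, speed) pairs.

    Size may be in any unit (TB here) and speed is in MB/s; the result is
    total size / total time, i.e. a size-weighted harmonic mean, in MB/s.
    """
    total_size = sum(size for size, _ in zones)
    total_time = sum(size / speed for size, speed in zones)
    return total_size / total_time


# Assumed two-step model: 0-3 TB at ~46 MB/s, 3-6 TB at ~100 MB/s.
estimate = average_speed([(3, 46), (3, 100)])
```

This gives about 63.0 MB/s, in the same range as the reported 64.6MB/s. The real speed curve is not two flat steps, so the figures will not match exactly. The simple midpoint of 46 and 100 (73 MB/s) would overstate the average.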
opentoe, posted September 9, 2015

> How's the rebuild speed?

I never thought to check; it was a long day at work. It is 126MB/sec, a lot faster than expected. A rebuild isn't like doing a parity check, is it? Because if it is, why would I get 60MB/sec during a parity check and then double that during a rebuild? I'm afraid to run a parity check again. I don't want to lose any data or have to redo another drive.
ntrlsur, posted September 9, 2015

> I did change my BIOS PCIe power setting after Rob noticed mine was off, and it didn't make any difference, but there's no harm in trying it too.

Enabling ASPM didn't make a difference on my system. Back to the drawing board, it would seem.
hbr245b Posted September 9, 2015

unRAID as a guest on ESXi:

root@unraid-backup:~# lspci -vv -d 1b4b:*
03:00.0 RAID bus controller: Marvell Technology Group Ltd. 88SE9485 SAS/SATA 6Gb/s controller (rev 03)
    Subsystem: Marvell Technology Group Ltd. Device 9480
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 64, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 18
    Region 0: Memory at fd5c0000 (64-bit, non-prefetchable)
    Region 2: Memory at fd580000 (64-bit, non-prefetchable)
    Capabilities: [40] Power Management version 3
        Flags: PMEClk- DSI- D1+ D2- AuxCurrent=375mA PME(D0+,D1+,D2-,D3hot+,D3cold-)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
        Address: 0000000000000000  Data: 0000
    Capabilities: [70] Express (v2) Endpoint, MSI 00
        DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
        DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
            RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
            MaxPayload 128 bytes, MaxReadReq 128 bytes
        DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
        LnkCap: Port #0, Speed 5GT/s, Width x32, ASPM L0s, Latency L0 <64ns, L1 <1us
            ClockPM- Surprise- LLActRep- BwNot-
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 5GT/s, Width x32, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
        LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
             Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
             Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
             EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
    Capabilities: [100 v1] Advanced Error Reporting
        UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
        CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
        AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
    Capabilities: [140 v1] Virtual Channel
        Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
        Arb: Fixed- WRR32- WRR64- WRR128-
        Ctrl: ArbSelect=Fixed
        Status: InProgress-
        VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
            Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
            Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
            Status: NegoPending- InProgress-
    Kernel driver in use: mvsas
    Kernel modules: mvsas

I don't have a problem with parity check speed; what happens on my system is that one or more disks red-ball during a check.
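Since the thread keeps comparing lspci dumps, a small helper can pull out the two fields most relevant to throughput: LnkCap (what the controller advertises) and LnkSta (what the link actually negotiated). `link_summary` and `link_downgraded` are hypothetical names, and the regex assumes the `lspci -vv` layout shown above; other pciutils versions may format these lines differently. Under a hypervisor such as ESXi the values describe the PCIe topology the guest sees, so they may not match the physical slot.

```python
import re


def link_summary(lspci_text):
    """Map 'LnkCap' (advertised) and 'LnkSta' (negotiated) to (speed, width) tuples.

    Parses `lspci -vv` text; works on both multi-line and flattened output.
    """
    summary = {}
    for key in ("LnkCap", "LnkSta"):
        # "LnkCap:" does not match "LnkCap2:", so the v2 extension lines are skipped.
        match = re.search(key + r":\s*.*?Speed ([\d.]+GT/s), Width (x\d+)", lspci_text)
        if match:
            summary[key] = (match.group(1), match.group(2))
    return summary


def link_downgraded(lspci_text):
    """True if the negotiated speed/width differs from what the device advertises."""
    s = link_summary(lspci_text)
    return "LnkCap" in s and "LnkSta" in s and s["LnkCap"] != s["LnkSta"]


if __name__ == "__main__":
    sample = (
        "LnkCap:\tPort #0, Speed 5GT/s, Width x32, ASPM L0s, Latency L0 <64ns, L1 <1us\n"
        "LnkSta:\tSpeed 5GT/s, Width x32, TrErr- Train- SlotClk- DLActive-\n"
    )
    print(link_summary(sample), "downgraded:", link_downgraded(sample))
```

For the hbr245b output above, both fields read 5GT/s at x32, so `link_downgraded` returns False.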
Fireball3 Posted September 9, 2015

Almost makes me just want to order the H310 cards. I can get them new on eBay, shipped from the US. Just have to confirm how to crossflash. I know there was a post here by the crossflash expert, but I couldn't get the files from that malware site.

I feel flattered being called an "expert", but I'm not; maybe a bit experienced with crossflashing the H310. I will check the link in this post. What would be another appropriate free file host where I can upload that package for you?
Bungy Posted September 9, 2015

I have no idea what the problem is, but I just wanted to throw out a question that at least has an answer. How are you guys creating these graphs of CPU/memory/network/etc. usage over time? Am I missing something on the dashboard?

System Stats from Dynamix V6 Plugins: https://lime-technology.com/forum/index.php?topic=36543.0

Thanks Johnnie!
Bungy Posted September 9, 2015

Almost makes me just want to order the H310 cards. I can get them new on eBay, shipped from the US. Just have to confirm how to crossflash. I know there was a post here by the crossflash expert, but I couldn't get the files from that malware site.

I feel flattered being called an "expert", but I'm not; maybe a bit experienced with crossflashing the H310. I will check the link in this post. What would be another appropriate free file host where I can upload that package for you?

It's been a while since I flashed my H310s, but these are the links that I saved from that process. I also have the firmware files that I used. At one point I found a link where somebody had packaged all the commands into separate batch files; depending on your hardware and needs, you would run specific versions of those files. I can't find that link, but I found it helpful.

https://techmattr.wordpress.com/2013/08/30/sas-hba-crossflashing-or-flashing-to-it-mode/
http://forums.overclockers.com.au/showthread.php?t=1045376
https://forums.servethehome.com/index.php?threads/lsi-raid-controller-and-hba-complete-listing-plus-oem-models.599/
opentoe Posted September 9, 2015

Almost makes me just want to order the H310 cards. I can get them new on eBay, shipped from the US. Just have to confirm how to crossflash. I know there was a post here by the crossflash expert, but I couldn't get the files from that malware site.

I feel flattered being called an "expert", but I'm not; maybe a bit experienced with crossflashing the H310. I will check the link in this post. What would be another appropriate free file host where I can upload that package for you?

I don't want to turn this thread into another crossflashing one, so I put any files I did find on my FTP server, which I posted earlier in this thread. I think they were the same files from that malware site. By the way, the drive that red-balled while I was running a parity check rebuilt 100% and is running fine. I ran a long SMART test on it with no errors.