crowdx42 Posted December 9, 2022
Hi all, I have installed an Intel RES2SV240 SAS expander in my test setup and I am getting really low speeds on parity build. With 16 drives on the expander and a single drive on a motherboard SATA port, speeds are running at 30 MB/s. Without the expander I am getting 150 MB/s with the same drive setup. What I have is a Dell Perc 310 flashed to IT mode, connected to the SAS expander via both ports and using the first two ports on the expander card. The 310 is plugged into the main x16 PCIe slot on the motherboard. The main error I am seeing in the logs is in the screenshot below. Anyone got any ideas on what is going on? I have tried moving through the ports on the SAS card to see whether disconnecting any of the ports and the attached drives makes a difference, and it did not help. The only time I have seen reasonable speeds is when only 8 drives are connected. The drives are all 4 TB WD Green drives, with an additional four 8 TB WD white-label drives (shucked). The drives have all tested fine in the Scrutiny plugin. Thoughts?
JorgeB Posted December 9, 2022
Please post the diagnostics.
crowdx42 Posted December 9, 2022 Author
Here are the diagnostics. I did a little more testing and everything seems OK up until I add disk 11 to the array and try to build the parity drive. At that point, regardless of what drive I use, the speeds drop off a cliff.
tower-diagnostics-20221209-0933.zip
JorgeB Posted December 9, 2022
So it works fine with any 10 disks, and an additional one causes issues? That's quite strange.
JorgeB Posted December 9, 2022
Just to test, try using a single cable from the HBA to the expander and see if you get the same controller errors.
crowdx42 Posted December 9, 2022 Author
Yes, I already tried with a single cable and had the exact same issue, with the speed dropping by half, which is expected. There is a post from back in 2013 about flashing the firmware on these cards, but I would think the card I just got should have a newer firmware regardless. I guess I need to check that out. Below is the link to that thread.
JorgeB Posted December 9, 2022
3 minutes ago, crowdx42 said: Yes, I already tried with a single cable and had the exact same issue, with the speed dropping by half, which is expected.
Yes, that part is expected, but do you also get these:
Dec 9 09:34:28 Tower kernel: sd 3:0:8:0: [sdk] tag#207 CDB: opcode=0x88 88 00 00 00 00 00 00 68 cd 00 00 00 04 00 00 00
Dec 9 09:34:28 Tower kernel: scsi target3:0:8: handle(0x0012), sas_address(0x5001e674645f1ff0), phy(16)
Dec 9 09:34:28 Tower kernel: scsi target3:0:8: enclosure logical id(0x5001e674645f1fff), slot(16)
Dec 9 09:34:28 Tower kernel: sd 3:0:8:0: No reference found at driver, assuming scmd(0x0000000072900076) might have completed
Dec 9 09:34:28 Tower kernel: sd 3:0:8:0: task abort: SUCCESS scmd(0x0000000072900076)
Dec 9 09:34:28 Tower kernel: sd 3:0:8:0: attempting task abort!scmd(0x00000000ad72b41a), outstanding for 30612 ms & timeout 30000 ms
Dec 9 09:34:28 Tower kernel: sd 3:0:8:0: [sdk] tag#197 CDB: opcode=0x88 88 00 00 00 00 00 00 68 bd 00 00 00 04 00 00 00
Dec 9 09:34:28 Tower kernel: scsi target3:0:8: handle(0x0012), sas_address(0x5001e674645f1ff0), phy(16)
Dec 9 09:34:28 Tower kernel: scsi target3:0:8: enclosure logical id(0x5001e674645f1fff), slot(16)
Dec 9 09:34:28 Tower kernel: sd 3:0:8:0: No reference found at driver, assuming scmd(0x00000000ad72b41a) might have completed
Dec 9 09:34:28 Tower kernel: sd 3:0:8:0: task abort: SUCCESS scmd(0x00000000ad72b41a)
Dec 9 09:34:28 Tower kernel: sd 3:0:8:0: attempting task abort!scmd(0x00000000cee9299c), outstanding for 30587 ms & timeout 30000 ms
Dec 9 09:34:28 Tower kernel: sd 3:0:8:0: [sdk] tag#194 CDB: opcode=0x88 88 00 00 00 00 00 00 68 d7 c0 00 00 04 00 00 00
Dec 9 09:34:28 Tower kernel: scsi target3:0:8: handle(0x0012), sas_address(0x5001e674645f1ff0), phy(16)
Dec 9 09:34:28 Tower kernel: scsi target3:0:8: enclosure logical id(0x5001e674645f1fff), slot(16)
Dec 9 09:34:28 Tower kernel: sd 3:0:8:0: No reference found at driver, assuming scmd(0x00000000cee9299c) might have completed
crowdx42 Posted December 9, 2022 Author
So I disconnected one of the ports coming from the 310 to the SAS expander and I have attached the diagnostics.
tower-diagnostics-20221209-1119.zip
JorgeB Posted December 9, 2022
Same errors, so it's not a link issue. My main suspect would be a bad expander.
crowdx42 Posted December 9, 2022 Author
I bought it off eBay and I have contacted the seller, so we will see if they will replace the card. Maybe I will get lucky, who knows.
crowdx42 Posted December 10, 2022 Author
So as a last-ditch effort I updated the firmware; it was not on the latest version. Speeds have increased after the firmware update, but I am still questioning the results. With 10 WD 4 TB drives I am at 150 MB/s, but when I add 2 more WD 4 TB drives the speed drops to 130 MB/s. If I add all the drives, so an additional 4 WD 8 TB drives, the speed drops to 99 MB/s. This is way better than before the firmware upgrade, but it is still way behind what 2 Dell Perc 310 cards are doing with the same drives. Am I expecting too much from the expander card? With this config I am actually not gaining anything: if I were to hook up all drives to the expander I would lose a link from the Dell 310, and so speeds would also drop. Thoughts?
JorgeB Posted December 11, 2022
14 hours ago, crowdx42 said: So speeds have increased after the firmware update
That's good, but are the errors gone?
crowdx42 Posted December 11, 2022 Author
I believe so; I have attached the diagnostics file below.
tower-diagnostics-20221211-0734.zip
JorgeB Posted December 12, 2022
Log is clear so far. As for speeds, you should get around 3GB/s total if using dual link and the correct link speed/width for the HBA. Post the output of:
cat /sys/class/sas_host/host5/device/port-5\:0/sas_port/port-5\:0/num_phys
and
lspci -d 1000: -vv
crowdx42 Posted December 12, 2022 Author
I am getting "No such file or directory" for the first command:
root@Tower:~# cat /sys/class/sas_host/host5/device/port-5\:0/sas_port/port-5\:0/num_phys
cat: '/sys/class/sas_host/host5/device/port-5:0/sas_port/port-5:0/num_phys': No such file or directory
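The path in that command hard-codes `host5`, so it fails on any system where the SAS host has a different number. A minimal sketch of a more forgiving lookup, assuming the same sysfs layout as the command above: it globs over whatever host and port numbers exist and reads each `num_phys` file. The function name `find_num_phys` and its `sysfs_root` parameter are illustrative, not part of any tool.

```python
import glob
import os


def find_num_phys(sysfs_root="/sys/class/sas_host"):
    """Return {path: phy_count} for every SAS port found under any host number.

    The directory pattern mirrors the path used in the thread, with the
    host and port numbers replaced by wildcards.
    """
    pattern = os.path.join(
        sysfs_root, "host*", "device", "port-*", "sas_port", "port-*", "num_phys"
    )
    result = {}
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            result[path] = int(f.read().strip())
    return result


if __name__ == "__main__":
    ports = find_num_phys()
    if not ports:
        print("no SAS ports found")
    for path, count in ports.items():
        print(f"{path}: {count} phys")
```

Each printed count is the number of phys grouped into that port, which is presumably what the original command was meant to check for the HBA-to-expander link.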
crowdx42 Posted December 12, 2022 Author
Below are the results of the second command:
root@Tower:~# lspci -d 1000: -vv
01:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)
    Subsystem: Dell 6Gbps SAS HBA Adapter
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 16
    IOMMU group: 1
    Region 0: I/O ports at e000
    Region 1: Memory at f7140000 (64-bit, non-prefetchable)
    Region 3: Memory at f7100000 (64-bit, non-prefetchable)
    Expansion ROM at f7000000 [disabled]
    Capabilities: [50] Power Management version 3
        Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [68] Express (v2) Endpoint, MSI 00
        DevCap: MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0W
        DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
            RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
            MaxPayload 256 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend+
        LnkCap: Port #0, Speed 5GT/s, Width x8, ASPM L0s, Exit Latency L0s <64ns
            ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
        LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 5GT/s, Width x4 (downgraded)
            TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range BC, TimeoutDis+ NROPrPrP- LTR-
            10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
            EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
            FRS- TPHComp- ExtTPHComp-
            AtomicOpsCap: 32bit- 64bit- 128bitCAS-
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- 10BitTagReq- OBFF Disabled,
            AtomicOpsCtl: ReqEn-
        LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
            Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
            Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
            EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
            Retimer- 2Retimers- CrosslinkRes: unsupported
    Capabilities: [d0] Vital Product Data
pcilib: sysfs_read_vpd: read failed: No such device
        Not readable
    Capabilities: [a8] MSI: Enable- Count=1/1 Maskable- 64bit+
        Address: 0000000000000000 Data: 0000
    Capabilities: [c0] MSI-X: Enable+ Count=15 Masked-
        Vector table: BAR=1 offset=0000e000
        PBA: BAR=1 offset=0000f800
    Capabilities: [100 v1] Advanced Error Reporting
        UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
        CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
        AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
            MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
        HeaderLog: 04000001 00000003 01010000 5f710ee2
    Capabilities: [138 v1] Power Budgeting <?>
    Kernel driver in use: mpt3sas
    Kernel modules: mpt3sas
JorgeB Posted December 12, 2022
1 minute ago, crowdx42 said: LnkSta: Speed 5GT/s, Width x4 (downgraded)
The problem is here. The other command won't work if the host number changed, which can happen after a reboot, but it's likely not needed now anyway.
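Spotting the downgrade means comparing the `LnkCap` line (what the card supports) with the `LnkSta` line (what was actually negotiated). A rough sketch of that comparison, assuming `lspci -vv` text shaped like the output posted in this thread; the helper names `link_report` and `is_downgraded` are made up for illustration, and the regular expressions are only known to fit the lines shown here.

```python
import re

# Device header lines look like "01:00.0 Serial Attached SCSI controller: ..."
# (optionally prefixed with a PCI domain such as "0000:").
DEVICE_RE = re.compile(r"^(?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]\s")
# Matches "LnkCap:" and "LnkSta:" but not "LnkCap2:" / "LnkSta2:".
LINK_RE = re.compile(r"\b(LnkCap|LnkSta):.*?Speed ([\d.]+)GT/s.*?Width x(\d+)")


def link_report(lspci_text):
    """Parse `lspci -vv` text into a list of per-device dicts.

    Each dict has "device" plus, when present, "LnkCap" and "LnkSta"
    as (speed_in_GT_per_s, lane_count) tuples.
    """
    devices = []
    current = None
    for line in lspci_text.splitlines():
        if DEVICE_RE.match(line):
            current = {"device": line.split(" ", 1)[0]}
            devices.append(current)
            continue
        m = LINK_RE.search(line)
        if m and current is not None:
            current[m.group(1)] = (float(m.group(2)), int(m.group(3)))
    return devices


def is_downgraded(dev):
    """True if the negotiated speed or width is below the advertised capability."""
    cap, sta = dev.get("LnkCap"), dev.get("LnkSta")
    if cap is None or sta is None:
        return False
    return sta[0] < cap[0] or sta[1] < cap[1]


if __name__ == "__main__":
    # Three lines copied from the output posted in this thread.
    sample = (
        "01:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS2008\n"
        "\t\tLnkCap:\tPort #0, Speed 5GT/s, Width x8, ASPM L0s\n"
        "\t\tLnkSta:\tSpeed 5GT/s, Width x4 (downgraded)\n"
    )
    for dev in link_report(sample):
        print(dev["device"], dev.get("LnkCap"), dev.get("LnkSta"), is_downgraded(dev))
```

On a live system the text would come from saving the output of `lspci -d 1000: -vv` and passing it to `link_report`.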
crowdx42 Posted December 12, 2022 Author
Does this mean the motherboard is downgrading the PCIe connection with the 310 or the expander?
JorgeB Posted December 12, 2022
The HBA. Checking the board you are using, I see that only the top x16 slot is x16 electrically; the other two are x4 only, so use the top one for the HBA if available.
crowdx42 Posted December 12, 2022 Author
I already have the 310 installed in the x16 slot. This is a test setup and I have another board and CPU I can throw into this server to test with. I will do that and report back my findings. Otherwise, if it is not a board issue, why would the 310 get downgraded to x4 instead of its max x16?
JorgeB Posted December 12, 2022
If it's in the top slot it could be a board/CPU or HBA problem, try the other one.
crowdx42 Posted December 12, 2022 Author
Well, I pulled the SAS expander and powered it with a Molex connector; that had no effect on the x4 downgrade. I also moved the 310 to a different slot and got the same results. Could the CPU be causing the downgrade? It is only a dual-core i3 that is in the test machine.
JorgeB Posted December 12, 2022
1 hour ago, crowdx42 said: Well I pulled the SAS expander and powered it with a molex, that had no effect on the x4 downgrade.
It's not expander related.
1 hour ago, crowdx42 said: I also moved the 310 to a different slot and got the same results
That would be expected since the other slots are x4, so it doesn't help narrow the issue down to the board/CPU or the HBA.
crowdx42 Posted December 12, 2022 Author
Well, as luck would have it, I have a second 310 which I just swapped in. It shows:
LnkSta: Speed 5GT/s, Width x8 TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
So I guess it is the 310 which has an issue. I wonder what could be causing its connection to downgrade? The main server has two 310s also, as my original goal was to have the test server with a similar setup to the main server.
crowdx42 Posted December 12, 2022 Author
So I just ran the same command on the main server and one of the cards on that server is also downgraded to x4. I never noticed this because having two cards in that server made it a non-issue.
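Some back-of-the-envelope arithmetic shows why an x4 link matters with this many drives. PCIe 2.0 runs at 5 GT/s per lane with 8b/10b encoding, which works out to 500 MB/s of raw payload per lane in each direction, so x4 is 2000 MB/s and x8 is 4000 MB/s. The sketch below divides that evenly across the drives behind the HBA. It assumes all drives transfer at the same time during a parity build, and the `efficiency` parameter is a made-up knob for protocol overhead, not a measured value.

```python
# PCIe 2.0: 5 GT/s per lane, 8b/10b encoding (8 payload bits per 10 transferred).
PCIE2_TRANSFERS_PER_S = 5e9


def lane_mb_per_s():
    """Raw payload bandwidth of one PCIe 2.0 lane in MB/s, one direction."""
    return PCIE2_TRANSFERS_PER_S * 8 / 10 / 8 / 1e6


def per_drive_ceiling(width, drives, efficiency=1.0):
    """Upper bound in MB/s per drive when `drives` share a link of `width` lanes.

    `efficiency` is an assumed fudge factor for protocol overhead (1.0 = none).
    """
    return lane_mb_per_s() * width * efficiency / drives


if __name__ == "__main__":
    # Drive counts taken from the tests described in the thread.
    for width in (4, 8):
        for drives in (10, 12, 16):
            ceiling = per_drive_ceiling(width, drives)
            print(f"x{width}, {drives} drives: at most {ceiling:.0f} MB/s per drive")
```

With 16 drives, the raw x4 ceiling is 125 MB/s per drive before any overhead, which sits in the same range as the 99 MB/s reported earlier in the thread; at x8 the ceiling would be 250 MB/s, above what these drives deliver on their own.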