[SOLVED] Degraded performance on only SAS drive in system, correlating to messages in syslog


TD2779


[edit]

SOLVED

 

After speaking with the seller I purchased the card from, we determined that the drive has HP firmware, which appears to be incompatible with the card. I was able to connect 4 SATA drives to the card without issue.

[/edit]

 

I added an LSI controller card and a new SAS drive to my server.  All other drives live on the built-in SATA controller.  I precleared the drive, and it passed without issue.  From what I could tell, the SMART data looked good as well.  This is an LSI card in IT mode, purchased from artofserver on eBay.  The drive is attached to a breakout cable that runs from the card to 4 SAS connectors; only one is in use at the moment.

 

Well I added it to my array to replace a smaller drive I had, and the rebuild went smoothly.  Completed within 10 hours or so. (6TB parity, with 2x6TB drives)

 

The past couple of days I was moving some large files between a machine on my network and my server (in both directions) and noticed the transfers taking longer than I expected.  If I watched the stats, I'd see the transfer plateau at 1Gbps but then periodically drop to almost ZERO, only to pick back up again.  This happens in the middle of transferring a single file in the tens of GB range.  To rule out the other machine or the network, I used the command line on the server to copy some large files from each drive to itself, i.e. /mnt/disk1 to /mnt/disk1.  All drives worked as expected EXCEPT the new drive, which showed the same issue.  Watching the stats, I could see the copy periodically hang before continuing.

 

I never had this problem prior to adding the new drive.  I installed the DriveSpeed docker to benchmark it, and the speed of the 6TB SAS drive matched the results of the other two 6TB SATA drives in my system pretty closely. Other things I've tried: swapped the PCI slot for the controller, used a different SAS connector on the cable, removed non-array drives from the system in case it was a power issue, and used a different power connector.  Unfortunately I don't have another SAS drive or a different cable to try out.

 

I started watching the syslog during transfers and started seeing a large number of the same message EXACTLY when the transfer speeds would stop.

Jan 16 18:49:09 CYBERTRON kernel: sd 1:0:0:0: attempting task abort! scmd(00000000093ec5f7) 

Jan 16 18:49:09 CYBERTRON kernel: sd 1:0:0:0: [sdb] tag#222 CDB: opcode=0x88 88 00 00 00 00 01 05 9d 7b 80 00 00 00 40 00 00 

Jan 16 18:49:09 CYBERTRON kernel: scsi target1:0:0: handle(0x0009), sas_address(0x5000cca23c11ae65), phy(2) 

Jan 16 18:49:09 CYBERTRON kernel: scsi target1:0:0: enclosure logical id(0x50050760438173e8), slot(1) 

Jan 16 18:49:09 CYBERTRON kernel: sd 1:0:0:0: task abort: SUCCESS scmd(00000000093ec5f7)
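
For anyone landing here with the same messages: the CDB line can be decoded by hand. Opcode 0x88 is the SCSI READ(16) command, so the command being aborted was a read. Below is a minimal Python sketch, assuming the standard READ(16) layout (bytes 2-9 hold the starting LBA, bytes 10-13 the transfer length in blocks). The helper name `decode_read16` is made up for illustration and is not part of any tool.

```python
def decode_read16(cdb_hex: str) -> dict:
    """Decode a 16-byte SCSI READ(16) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a READ(16) CDB")
    return {
        "opcode": cdb[0],
        # Bytes 2..9: starting logical block address, big-endian
        "lba": int.from_bytes(cdb[2:10], "big"),
        # Bytes 10..13: number of blocks to read, big-endian
        "blocks": int.from_bytes(cdb[10:14], "big"),
    }


# CDB copied from the syslog line above
info = decode_read16("88 00 00 00 00 01 05 9d 7b 80 00 00 00 40 00 00")
print(hex(info["lba"]), info["blocks"])  # prints: 0x1059d7b80 64
```

So the log shows a 64-block read starting at LBA 0x1059d7b80 that timed out and was aborted. That says nothing about why it timed out, only which request was affected.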

 

My google-fu might be bad, but I don't know enough to understand what these messages are and I've run out of things I can think of to try without spending money.  Help! :)

 

P.S. You can see some good examples near the bottom of the syslog from about 6:15pm to 6:45pm.

cybertron-diagnostics-20210116-1858.zip

Edited by TD2779
13 hours ago, Vr2Io said:

Please run a parity check without correction to see whether the speed drops or not.

 

A few hours into the parity check and the same messages hit the log roughly every 3 minutes.  I don't particularly see a hang, but maybe a slight dip in the storage stats, as shown in the attached screenshot.
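
One way to put a number on "roughly every 3 minutes" is to pull the timestamps of the "attempting task abort" lines out of the syslog and look at the gaps between them. This is only a rough sketch: the function name `abort_gaps` is hypothetical, the regex assumes the line format shown in the first post, and the year has to be supplied because syslog timestamps omit it (2021 is assumed here from the diagnostics file name).

```python
import re
from datetime import datetime

# Matches lines like:
# "Jan 16 18:49:09 CYBERTRON kernel: sd 1:0:0:0: attempting task abort! scmd(...)"
ABORT_RE = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ kernel: .*attempting task abort"
)


def abort_gaps(lines, year=2021):
    """Return the seconds between consecutive 'attempting task abort' messages."""
    times = []
    for line in lines:
        m = ABORT_RE.match(line)
        if m:
            # Collapse the double space syslog uses for single-digit days
            stamp = " ".join(m.group(1).split())
            times.append(datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S"))
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]
```

On the server this could be called as `abort_gaps(open("/var/log/syslog"))`, assuming that is where the syslog lives; gaps clustering around 180 seconds would match what I'm seeing by eye.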

 

I'm not sure how I'd test the power, other than having a completely 2nd PSU running this one drive.  Is that even safe to do?

blip.png

6 hours ago, TD2779 said:

A few hours into the parity check and the same messages hit the log roughly every 3 minutes.  I don't particularly see a hang, but maybe a slight dip in the storage stats, as shown in the attached screenshot.

 

I'm not sure how I'd test the power, other than having a completely 2nd PSU running this one drive.  Is that even safe to do?

 

The slight dip can be ignored. The speed chart is flat and looks normal, so the dips during file transfers should not be related to a controller or disk problem.

 

Please enable the disk's write cache. It is currently disabled, and that will affect write performance.

Read Cache is:        Enabled
Writeback Cache is:   Disabled
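
If you want to check these two settings across several saved SMART reports, a small parser is enough. This is a minimal sketch that assumes the report contains lines in exactly the form quoted above; `cache_settings` is a hypothetical helper name, not part of smartctl.

```python
import re

# Matches lines such as "Writeback Cache is:   Disabled"
CACHE_RE = re.compile(
    r"^[ \t]*(\w[\w ]*?) Cache is:[ \t]+(Enabled|Disabled)[ \t]*$", re.M
)


def cache_settings(report: str) -> dict:
    """Map each '<name> Cache is: <state>' line to True if Enabled, else False."""
    return {name: state == "Enabled" for name, state in CACHE_RE.findall(report)}


# The two lines quoted above
report = """Read Cache is:        Enabled
Writeback Cache is:   Disabled"""
print(cache_settings(report))  # prints: {'Read': True, 'Writeback': False}
```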

 

I have no idea what the error message means, and I don't have a SAS disk myself. Some of the info in its SMART data seems odd, though: it reports a non-medium error count of 6000 and a background scan that is "active". The errors may indicate that the disk is in a busy state.

Non-medium error count:     6000

No Self-tests have been logged

Background scan results log
  Status: scan is active
    Accumulated power on time, hours:minutes 286:49 [17209 minutes]
    Number of background scans performed: 9,  scan progress: 29.70%
    Number of background medium scans performed: 9

 

 

BTW, you can connect a SATA disk to the HBA to check whether it shows anything abnormal.

Edited by Vr2Io
  • JorgeB changed the title to [SOLVED] Degraded performance on only SAS drive in system, correlating to messages in syslog
