
Help w/verifying bad SAS drive


pendo


Hello all,

 

I think I got a bad drive in a recent eBay purchase, but since this is my first time using a SAS drive, I wanted to double-check here for anything I might have missed.  I acquired an ASRock board with a built-in LSI 2308 controller.  I had read that I did not need to flash to IT mode on this board, but I ended up doing that anyway as part of my troubleshooting.  It is now at 20.0.4.0 IT.

 

Here's what I'm seeing in syslog:

 

Sep  8 11:36:35 tower kernel: mpt2sas_cm0: hba_port entry: 00000000b5bc9d14, port: 255 is added to hba_port list
Sep  8 11:36:35 tower kernel: mpt2sas_cm0: host_add: handle(0x0001), sas_addr(0x500138f000007ca0), phys(8)
Sep  8 11:36:35 tower kernel: mpt2sas_cm0: handle(0x9) sas_address(0x5000cca26aa0cc69) port_type(0x1)
Sep  8 11:36:35 tower kernel: mpt2sas_cm0: port enable: SUCCESS
Sep  8 11:36:35 tower kernel: scsi 1:0:0:0: Direct-Access     HGST     HUH721010AL42C0  A38K PQ: 0 ANSI: 6
Sep  8 11:36:35 tower kernel: scsi 1:0:0:0: SSP: handle(0x0009), sas_addr(0x5000cca26aa0cc69), phy(4), device_name(0x5000cca26aa0cc6b)
Sep  8 11:36:35 tower kernel: scsi 1:0:0:0: enclosure logical id (0x500138f000007ca0), slot(4) 
Sep  8 11:36:35 tower kernel: scsi 1:0:0:0: qdepth(254), tagged(1), scsi_level(7), cmd_que(1)
Sep  8 11:36:35 tower kernel: scsi 1:0:0:0: Power-on or device reset occurred
Sep  8 11:36:35 tower kernel: sd 1:0:0:0: Attached scsi generic sg6 type 0
Sep  8 11:36:35 tower kernel: end_device-1:0: add: handle(0x0009), sas_addr(0x5000cca26aa0cc69)
Sep  8 11:36:35 tower kernel: sd 1:0:0:0: [sdg] Unit Not Ready
Sep  8 11:36:35 tower kernel: sd 1:0:0:0: [sdg] Sense Key : 0x4 [current] 
Sep  8 11:36:35 tower kernel: sd 1:0:0:0: [sdg] ASC=0x3e ASCQ=0x3 
Sep  8 11:36:35 tower kernel: sd 1:0:0:0: [sdg] 2441609216 4096-byte logical blocks: (10.0 TB/9.10 TiB)
Sep  8 11:36:35 tower kernel: sd 1:0:0:0: [sdg] Write Protect is off
Sep  8 11:36:35 tower kernel: sd 1:0:0:0: [sdg] Mode Sense: f7 00 10 08
Sep  8 11:36:35 tower kernel: sd 1:0:0:0: [sdg] Write cache: enabled, read cache: enabled, supports DPO and FUA
Sep  8 11:36:35 tower kernel: sd 1:0:0:0: attempting task abort!scmd(0x00000000b7a35679), outstanding for 30022 ms & timeout 30000 ms
Sep  8 11:36:35 tower kernel: sd 1:0:0:0: tag#2370 CDB: opcode=0xa3, sa=0xc a3 0c 01 12 00 00 00 00 02 00 00 00
Sep  8 11:36:35 tower kernel: scsi target1:0:0: handle(0x0009), sas_address(0x5000cca26aa0cc69), phy(4)
Sep  8 11:36:35 tower kernel: scsi target1:0:0: enclosure logical id(0x500138f000007ca0), slot(4) 
Sep  8 11:36:35 tower kernel: sd 1:0:0:0: task abort: SUCCESS scmd(0x00000000b7a35679)
Sep  8 11:36:35 tower kernel: sd 1:0:0:0: [sdg] Unit Not Ready
Sep  8 11:36:35 tower kernel: sd 1:0:0:0: [sdg] Sense Key : 0x4 [current] 
Sep  8 11:36:35 tower kernel: sd 1:0:0:0: [sdg] ASC=0x3e ASCQ=0x3 
Sep  8 11:36:35 tower kernel: sd 1:0:0:0: [sdg] Read Capacity(16) failed: Result: hostbyte=0x00 driverbyte=DRIVER_OK
Sep  8 11:36:35 tower kernel: sd 1:0:0:0: [sdg] Sense Key : 0x4 [current] 
Sep  8 11:36:35 tower kernel: sd 1:0:0:0: [sdg] ASC=0x3e ASCQ=0x3 
Sep  8 11:36:35 tower kernel: sd 1:0:0:0: [sdg] Read Capacity(10) failed: Result: hostbyte=0x00 driverbyte=DRIVER_OK
Sep  8 11:36:35 tower kernel: sd 1:0:0:0: [sdg] Sense Key : 0x4 [current] 
Sep  8 11:36:35 tower kernel: sd 1:0:0:0: [sdg] ASC=0x3e ASCQ=0x3 
Sep  8 11:36:35 tower kernel: sd 1:0:0:0: [sdg] 0 4096-byte logical blocks: (0 B/0 B)
Sep  8 11:36:35 tower kernel: sdg: detected capacity change from 19532873728 to 0
Sep  8 11:36:35 tower kernel: sd 1:0:0:0: [sdg] Unit Not Ready
Sep  8 11:36:35 tower kernel: sd 1:0:0:0: [sdg] Sense Key : 0x4 [current] 
Sep  8 11:36:35 tower kernel: sd 1:0:0:0: [sdg] ASC=0x3e ASCQ=0x3 
Sep  8 11:36:35 tower kernel: sd 1:0:0:0: [sdg] Read Capacity(16) failed: Result: hostbyte=0x00 driverbyte=DRIVER_OK
Sep  8 11:36:35 tower kernel: sd 1:0:0:0: [sdg] Sense Key : 0x4 [current] 
Sep  8 11:36:35 tower kernel: sd 1:0:0:0: [sdg] ASC=0x3e ASCQ=0x3 
Sep  8 11:36:35 tower kernel: sd 1:0:0:0: [sdg] Read Capacity(10) failed: Result: hostbyte=0x00 driverbyte=DRIVER_OK
Sep  8 11:36:35 tower kernel: sd 1:0:0:0: [sdg] Sense Key : 0x4 [current] 
Sep  8 11:36:35 tower kernel: sd 1:0:0:0: [sdg] ASC=0x3e ASCQ=0x3 
Sep  8 11:36:35 tower kernel: sd 1:0:0:0: [sdg] Attached SCSI disk
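
For reference, the Sense Key / ASC / ASCQ triple that repeats in the log above can be looked up in the standard SCSI sense tables. Here is a minimal Python sketch of that lookup. The tables are deliberately abbreviated to a few entries, and `decode_sense` is a hypothetical helper name used only for illustration, not part of any existing tool.

```python
import re

# Abbreviated lookup tables based on the SCSI sense data definitions.
# Only a handful of entries are included; this is not a full decoder.
SENSE_KEYS = {
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
}
ASC_ASCQ = {
    (0x3E, 0x00): "LOGICAL UNIT HAS NOT SELF-CONFIGURED YET",
    (0x3E, 0x01): "LOGICAL UNIT FAILURE",
    (0x3E, 0x02): "TIMEOUT ON LOGICAL UNIT",
    (0x3E, 0x03): "LOGICAL UNIT FAILED SELF-TEST",
}

# Patterns matching the kernel log lines shown in the post.
KEY_RE = re.compile(r"Sense Key : (0x[0-9a-fA-F]+)")
ASC_RE = re.compile(r"ASC=(0x[0-9a-fA-F]+) ASCQ=(0x[0-9a-fA-F]+)")

def decode_sense(lines):
    """Return (sense key name, ASC/ASCQ name) for the first values found."""
    key = asc = None
    for line in lines:
        m = KEY_RE.search(line)
        if m and key is None:
            key = int(m.group(1), 16)
        m = ASC_RE.search(line)
        if m and asc is None:
            asc = (int(m.group(1), 16), int(m.group(2), 16))
    return (SENSE_KEYS.get(key, "unknown"), ASC_ASCQ.get(asc, "unknown"))

# Two lines copied from the syslog excerpt above.
log = [
    "sd 1:0:0:0: [sdg] Sense Key : 0x4 [current] ",
    "sd 1:0:0:0: [sdg] ASC=0x3e ASCQ=0x3 ",
]
print(decode_sense(log))
```

For these lines the lookup gives HARDWARE ERROR with LOGICAL UNIT FAILED SELF-TEST, which would be consistent with a fault reported by the drive itself rather than by the cable or controller.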

 

Manual attempts to format fail:

 

sg_format --format -v /dev/sg6
    HGST      HUH721010AL42C0   A38K   peripheral_type: disk [0x0]
      PROTECT=1
      << supports protection information>>
      Unit serial number:         2TJVEELD
      LU name: 5000cca26aa0cc68
    mode sense(10) cdb: [5a 00 01 00 00 00 00 00 fc 00]
Mode Sense (block descriptor) data, prior to changes:
  Number of blocks=2441609216 [0x91880000]
  Block size=4096 [0x1000]

A FORMAT UNIT will commence in 15 seconds
    ALL data on /dev/sg6 will be DESTROYED
        Press control-C to abort

A FORMAT UNIT will commence in 10 seconds
    ALL data on /dev/sg6 will be DESTROYED
        Press control-C to abort

A FORMAT UNIT will commence in 5 seconds
    ALL data on /dev/sg6 will be DESTROYED
        Press control-C to abort
    Format unit cdb: [04 18 00 00 00 00]
Format unit: transport: Host_status=0x03 [DID_TIME_OUT]

Format unit command: Transport error, driver or interconnect error
FORMAT UNIT failed
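
The `Host_status=0x03 [DID_TIME_OUT]` value in the output above is the Linux SCSI host byte, which is set by the kernel and HBA driver rather than by the drive. A small Python sketch of how those codes map to names follows. The table is abbreviated, and `host_status_name` is a hypothetical helper used only for illustration.

```python
# Linux SCSI host byte values (DID_* codes), abbreviated.
HOST_STATUS = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
    0x05: "DID_ABORT",
    0x07: "DID_ERROR",
    0x08: "DID_RESET",
}

def host_status_name(code):
    """Map a host byte to its DID_* name, or flag it as unknown."""
    return HOST_STATUS.get(code, f"unknown (0x{code:02x})")

# The value reported by sg_format in the output above.
print(host_status_name(0x03))
```

Read this way, the FORMAT UNIT attempt appears to have timed out before the drive returned any status of its own.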

 

I was originally using a mini-SAS to SAS cable with SATA power pigtails.  After learning about the power-disable 'feature' the hard way, I received the above errors.  To rule out the cable, I ordered a new mini-SAS to SAS cable with Molex power connections, but I still get the same result.  I believe I have a bad drive but, being as green as I am, I wanted to double-check with someone.

 

Thanks!


Type | Code | Name | Description
Host Status | [0x4] | BAD_TARGET | This status is returned after the driver aborts commands to a bad target. Typically this status occurs when the target experiences a hardware error, but it can also occur if a command is sent to a bad target ID.
Device Status | [0x18] | RESERVATION CONFLICT | This status is returned when a LUN is in a Reserved status and commands from initiators that did not place that SCSI reservation attempt to issue commands.

 

Based on the sense code, it looks like a bad disk.

7 hours ago, SimonF said:

Based on the sense code, it looks like a bad disk.

Thank you.  This was what I arrived at, too.  I've contacted the seller, who has sold thousands of these drives.  I'm sure it will get sorted in the end.  

 

Sorry for not posting diagnostics.  I thought I had extracted the relevant info from the syslog, but I still should have attached them. Unfortunately, I'm not where I can post diagnostics at the moment, but I will be in a few hours.  If there are any other tests I should run when I get to the machine, let me know.

 

Thanks again! 
