Drive Errors


Kube


Hey guys, I've been having constant drive issues with some new gear I set up.

 

I added an LSI HBA card: Genuine LSI 6Gbps SAS HBA LSI 9211-8i (https://www.ebay.com/itm/Genuine-LSI-6Gbps-SAS-HBA-LSI-9211-8i-9201-8i-P20-IT-Mode-ZFS-FreeNAS-unRAID/163846248833?ssPageName=STRK%3AMEBIDX%3AIT&_trksid=p2057872.m2749.l2649)

 

Plus a Mini SAS 36P SFF-8087 to 4x SFF-8482 breakout cable with SATA power connectors (3 ft / 1 m).

 

I have tried multiple power cables, reseating, complete replacement, and swapping positions.

 

Five 4TB drives that I took from the data center I work at. The drives were known good and pulled from a working environment. They came out of a Cisco C-Series 240 rackmount.

 

On to the issue.

 

SMART will not show any results for any of the new drives.

The drives will work fine for days, then after a few reboots one fails or starts tossing errors.

 

This seems to be the error that always comes back.

 

:10:24 Tower kernel: print_req_error: I/O error, dev sdc, sector 28951400
Apr 11 21:10:24 Tower kernel: Buffer I/O error on dev sdc, logical block 3618925, async page read
Apr 11 21:10:24 Tower kernel: print_req_error: I/O error, dev sdc, sector 28952336
Apr 11 21:10:24 Tower kernel: Buffer I/O error on dev sdc, logical block 3619042, async page read
Apr 11 21:10:24 Tower kernel: print_req_error: I/O error, dev sdc, sector 28953272
Apr 11 21:10:24 Tower kernel: Buffer I/O error on dev sdc, logical block 3619159, async page read
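A quick sanity check on those messages, as a rough sketch only: each "sector" and "logical block" pair looks like the same spot on disk counted in two different units. The sketch assumes 512-byte sectors and 4096-byte buffer blocks (the block size is an assumption, but the logged numbers fit it), and `sector_to_block` is just an illustrative helper name.

```python
# Sector / logical block pairs copied from the kernel messages above.
SECTOR_SIZE = 512        # bytes per sector, matching the 512-byte logical blocks the drives report
PAGE_BLOCK_SIZE = 4096   # assumption: the buffer layer counts in 4 KiB blocks

pairs = [
    (28951400, 3618925),
    (28952336, 3619042),
    (28953272, 3619159),
]


def sector_to_block(sector: int) -> int:
    """Convert a 512-byte sector number to a 4096-byte block number."""
    return sector * SECTOR_SIZE // PAGE_BLOCK_SIZE


for sector, logged_block in pairs:
    # Print the computed block next to the one the kernel logged.
    print(sector, sector_to_block(sector), logged_block)
```

If the computed and logged block numbers agree, each pair of lines is one failed read reported twice, not two separate failures.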

 

Right now I am running benhex preclear on them and it's lighting up the logs with that message.

 

All cables have been reseated; the only cable I have not replaced yet is the SAS breakout cable. The issues only affect the new SAS drives.

 

My existing drives are SATA.

 

Thanks

tower-diagnostics-20200411-2121.zip

Link to comment

I see this for the SMART report. The drives show as passed and the report shows as available, but when I review the report there's not much there, and there is an error at the end.

 

On your point about firmware: I know the Cisco firmware update process included drive firmware, but I'm not sure if it was custom or not. Any way to flash it back to a stock firmware?

 

smartctl 7.1 2019-12-30 r5022 [x86_64-linux-4.19.107-Unraid] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               SEAGATE
Product:              ST4000NM0023
Revision:             0004
Compliance:           SPC-4
LU is fully provisioned
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000c500575f10d3
Serial number:        Z1Z21G7D00009409AY00
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Sun Apr 12 06:56:10 2020 CDT
device becoming ready (wait)
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.

Link to comment

sdf, which appears to be the most problematic, is using a different firmware, which is why I suggest disconnecting that one first.

 

27 minutes ago, Kube said:

Any way to flash it back to a stock firmware?

Sorry, no. You'd need to search for a stock Seagate firmware for that model, which may or may not be flashable.

Link to comment

Yes, in the previous diags the problem started with sdf (device 7:0:4:0) but ended up causing issues with other disks, in this example devices 7:0:1:0 and 7:0:3:0:

 

Apr 11 17:03:34 Tower kernel: sd 7:0:4:0: [sdf] tag#935 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Apr 11 17:03:34 Tower kernel: sd 7:0:4:0: [sdf] tag#935 Sense Key : 0x2 [current] [descriptor]
Apr 11 17:03:34 Tower kernel: sd 7:0:4:0: [sdf] tag#935 ASC=0x4 ASCQ=0x11
Apr 11 17:03:34 Tower kernel: sd 7:0:4:0: [sdf] tag#935 CDB: opcode=0x88 88 00 00 00 00 01 d1 c0 be 00 00 00 00 08 00 00
Apr 11 17:03:34 Tower kernel: print_req_error: I/O error, dev sdf, sector 7814036992
Apr 11 17:03:34 Tower kernel: sd 7:0:4:0: [sdf] tag#935 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Apr 11 17:03:34 Tower kernel: sd 7:0:4:0: [sdf] tag#935 Sense Key : 0x2 [current] [descriptor]
Apr 11 17:03:34 Tower kernel: sd 7:0:4:0: [sdf] tag#935 ASC=0x4 ASCQ=0x11
Apr 11 17:03:34 Tower kernel: sd 7:0:4:0: [sdf] tag#935 CDB: opcode=0x88 88 00 00 00 00 01 d1 c0 be 00 00 00 00 08 00 00
Apr 11 17:03:34 Tower kernel: print_req_error: I/O error, dev sdf, sector 7814036992
Apr 11 17:03:34 Tower kernel: Buffer I/O error on dev sdf, logical block 976754624, async page read
Apr 11 17:03:46 Tower kernel: sd 7:0:1:0: device_block, handle(0x000a)
Apr 11 17:03:47 Tower kernel: sd 7:0:1:0: device_unblock and setting to running, handle(0x000a)
Apr 11 17:03:50 Tower kernel: sd 7:0:4:0: [sdf] tag#1764 UNKNOWN(0x2003) Result: hostbyte=0x0b driverbyte=0x00
Apr 11 17:03:50 Tower kernel: sd 7:0:4:0: [sdf] tag#1764 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 40 e5 00
Apr 11 17:03:50 Tower kernel: mpt2sas_cm0: log_info(0x31120100): originator(PL), code(0x12), sub_code(0x0100)
Apr 11 17:03:50 Tower kernel: sd 7:0:4:0: [sdf] tag#1764 UNKNOWN(0x2003) Result: hostbyte=0x0b driverbyte=0x00
Apr 11 17:03:50 Tower kernel: sd 7:0:4:0: [sdf] tag#1764 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 40 98 00
Apr 11 17:03:50 Tower kernel: mpt2sas_cm0: log_info(0x31120100): originator(PL), code(0x12), sub_code(0x0100)
### [PREVIOUS LINE REPEATED 2 TIMES] ###
Apr 11 17:03:50 Tower kernel: sd 7:0:4:0: device_block, handle(0x000d)
Apr 11 17:03:51 Tower kernel: sd 7:0:4:0: device_unblock and setting to running, handle(0x000d)
Apr 11 17:03:53 Tower kernel: sd 7:0:4:0: Power-on or device reset occurred
Apr 11 17:03:57 Tower kernel: sd 7:0:1:0: device_block, handle(0x000a)
Apr 11 17:03:58 Tower kernel: sd 7:0:1:0: device_unblock and setting to running, handle(0x000a)
Apr 11 17:04:16 Tower kernel: sd 7:0:3:0: device_block, handle(0x000c)
Apr 11 17:04:16 Tower kernel: sd 7:0:4:0: device_block, handle(0x000d)
Apr 11 17:04:18 Tower kernel: sd 7:0:3:0: device_unblock and setting to running, handle(0x000c)
Apr 11 17:04:18 Tower kernel: sd 7:0:4:0: device_unblock and setting to running, handle(0x000d)
Apr 11 17:04:18 Tower kernel: sd 7:0:4:0: Power-on or device reset occurred
Apr 11 17:04:20 Tower kernel: sd 7:0:1:0: device_block, handle(0x000a)
Apr 11 17:04:21 Tower kernel: sd 7:0:1:0: device_unblock and setting to running, handle(0x000a)
Apr 11 17:04:28 Tower kernel: sd 7:0:4:0: device_block, handle(0x000d)
Apr 11 17:04:29 Tower kernel: sd 7:0:4:0: device_unblock and setting to running, handle(0x000d)
Apr 11 17:04:30 Tower kernel: sd 7:0:4:0: Power-on or device reset occurred
Apr 11 17:04:31 Tower kernel: sd 7:0:3:0: device_block, handle(0x000c)
Apr 11 17:04:32 Tower kernel: sd 7:0:3:0: device_unblock and setting to running, handle(0x000c)
Apr 11 17:04:33 Tower kernel: sd 7:0:1:0: device_block, handle(0x000a)
Apr 11 17:04:35 Tower kernel: sd 7:0:1:0: device_unblock and setting to running, handle(0x000a)

 

Link to comment

So I started a pre-clear just to see what it would do... the drive errors seem to have returned. Not sure if these are normal or not.

 

Apr 12 07:32:56 Tower kernel: sd 1:0:3:0: [sde] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
Apr 12 07:32:56 Tower kernel: sd 1:0:3:0: [sde] Write Protect is off
Apr 12 07:32:56 Tower kernel: sd 1:0:3:0: [sde] Mode Sense: db 00 10 08
Apr 12 07:32:56 Tower kernel: sd 1:0:3:0: [sde] Write cache: disabled, read cache: enabled, supports DPO and FUA
Apr 12 07:32:56 Tower kernel: sde: sde1
Apr 12 07:32:56 Tower kernel: sd 1:0:3:0: [sde] Attached SCSI disk
Apr 12 07:33:11 Tower emhttpd: ST4000NM0023_Z1Z23R250000C410736M_35000c50057634463 (sde) 512 7814037168
Apr 12 08:00:23 Tower move: move: file /mnt/cache/appdata/DiskSpeed/Instances/local/hdparm_sde.txt
Apr 12 08:55:45 Tower preclear_disk_Z1Z23R250000C410736M[23921]: Command: /usr/local/emhttp/plugins/preclear.disk/script/preclear_disk.sh --notify 3 --frequency 1 --cycles 1 --no-prompt /dev/sde
Apr 12 08:55:47 Tower preclear_disk_Z1Z23R250000C410736M[23921]: Pre-Read: dd if=/dev/sde of=/dev/null bs=2097152 skip=0 count=4000787030016 conv=notrunc,noerror iflag=nocache,count_bytes,skip_bytes
Apr 12 09:14:11 Tower kernel: sd 1:0:3:0: [sde] tag#2937 UNKNOWN(0x2003) Result: hostbyte=0x0b driverbyte=0x00
Apr 12 09:14:11 Tower kernel: sd 1:0:3:0: [sde] tag#2937 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 40 e5 00
Apr 12 09:14:11 Tower kernel: sd 1:0:3:0: [sde] tag#2937 UNKNOWN(0x2003) Result: hostbyte=0x0b driverbyte=0x00
Apr 12 09:14:11 Tower kernel: sd 1:0:3:0: [sde] tag#2937 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 40 98 00
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2888 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2888 Sense Key : 0x2 [current] [descriptor]
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2888 ASC=0x4 ASCQ=0x11
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2888 CDB: opcode=0x88 88 00 00 00 00 00 16 f4 a3 58 00 00 00 a8 00 00
Apr 12 09:14:13 Tower kernel: print_req_error: I/O error, dev sde, sector 385131352
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2889 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2889 Sense Key : 0x2 [current] [descriptor]
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2889 ASC=0x4 ASCQ=0x11
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2889 CDB: opcode=0x88 88 00 00 00 00 00 16 f4 94 00 00 00 06 00 00 00
Apr 12 09:14:13 Tower kernel: print_req_error: I/O error, dev sde, sector 385127424
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2890 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2890 Sense Key : 0x2 [current] [descriptor]
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2890 ASC=0x4 ASCQ=0x11
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2890 CDB: opcode=0x88 88 00 00 00 00 00 16 f4 9a 00 00 00 04 18 00 00
Apr 12 09:14:13 Tower kernel: print_req_error: I/O error, dev sde, sector 385128960
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2891 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2891 Sense Key : 0x2 [current] [descriptor]
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2891 ASC=0x4 ASCQ=0x11
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2891 CDB: opcode=0x88 88 00 00 00 00 00 16 f4 9e 18 00 00 05 40 00 00
Apr 12 09:14:13 Tower kernel: print_req_error: I/O error, dev sde, sector 385130008
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2886 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2886 Sense Key : 0x2 [current] [descriptor]
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2886 ASC=0x4 ASCQ=0x11
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2886 CDB: opcode=0x88 88 00 00 00 00 00 16 f4 94 00 00 00 00 08 00 00
Apr 12 09:14:13 Tower kernel: print_req_error: I/O error, dev sde, sector 385127424
Apr 12 09:14:13 Tower kernel: Buffer I/O error on dev sde, logical block 48140928, async page read
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2944 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2944 Sense Key : 0x2 [current] [descriptor]
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2944 ASC=0x4 ASCQ=0x11
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2944 CDB: opcode=0x88 88 00 00 00 00 00 29 6b 36 d1 00 00 00 01 00 00
Apr 12 09:14:13 Tower kernel: print_req_error: I/O error, dev sde, sector 694892241
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2945 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2945 Sense Key : 0x2 [current] [descriptor]
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2945 ASC=0x4 ASCQ=0x11
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2945 CDB: opcode=0x88 88 00 00 00 00 00 86 7e dc ba 00 00 00 01 00 00
Apr 12 09:14:13 Tower kernel: print_req_error: I/O error, dev sde, sector 2256460986
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2946 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2946 Sense Key : 0x2 [current] [descriptor]
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2946 ASC=0x4 ASCQ=0x11
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2946 CDB: opcode=0x88 88 00 00 00 00 01 d1 c0 be af 00 00 00 01 00 00
Apr 12 09:14:13 Tower kernel: print_req_error: I/O error, dev sde, sector 7814037167
Apr 12 09:14:13 Tower kernel: print_req_error: I/O error, dev sde, sector 1767169515
Apr 12 09:14:13 Tower kernel: print_req_error: I/O error, dev sde, sector 0
Apr 12 09:14:13 Tower kernel: Buffer I/O error on dev sde, logical block 48140928, async page read
Apr 12 09:14:13 Tower kernel: Buffer I/O error on dev sde, logical block 48141312, async page read
Apr 12 09:14:13 Tower kernel: Buffer I/O error on dev sde, logical block 48141696, async page read
Apr 12 09:14:13 Tower kernel: Buffer I/O error on dev sde, logical block 48142080, async page read
Apr 12 09:14:13 Tower kernel: Buffer I/O error on dev sde, logical block 48142464, async page read
Apr 12 09:14:13 Tower kernel: Buffer I/O error on dev sde, logical block 48142848, async page read
Apr 12 09:14:13 Tower kernel: Buffer I/O error on dev sde, logical block 48143232, async page read
Apr 12 09:14:13 Tower kernel: Buffer I/O error on dev sde, logical block 48143616, async page read
Apr 12 09:14:13 Tower kernel: Buffer I/O error on dev sde, logical block 48144000, async page read
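The CDB lines can be cross-checked against the reported sectors. Below is a minimal sketch, assuming the standard READ(16) layout: opcode 0x88, an 8-byte big-endian LBA in bytes 2 to 9, and a 4-byte transfer length in bytes 10 to 13. `decode_read16` is an illustrative name, not part of any tool.

```python
def decode_read16(cdb_hex: str) -> tuple[int, int]:
    """Return (starting LBA, number of blocks) from a READ(16) CDB hex dump."""
    cdb = bytes.fromhex(cdb_hex)  # fromhex skips the spaces between bytes
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a 16-byte READ(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")
    blocks = int.from_bytes(cdb[10:14], "big")
    return lba, blocks


# CDBs copied from the sde log above (tag#2888 and tag#2946).
first = decode_read16("88 00 00 00 00 00 16 f4 a3 58 00 00 00 a8 00 00")
last = decode_read16("88 00 00 00 00 01 d1 c0 be af 00 00 00 01 00 00")
print(first, last)

# Capacity check: the block count the kernel reported for sde, times 512 bytes,
# should equal the byte count that preclear passed to dd.
TOTAL_BLOCKS = 7814037168
print(TOTAL_BLOCKS * 512)
```

The decoded LBAs match the sectors in the corresponding print_req_error lines, and the second one is the very last block on the disk. The failed reads are scattered from sector 0 to the end of the disk, which looks more like the whole drive being briefly unavailable than one bad region, though that is only an inference from the log.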

tower-diagnostics-20200412-0918.zip

Link to comment

So the pre-clears all finished and were all flagged as successful. On the second read there were none of the errors, which I found kinda odd. Gonna start the process again and see what happens.

 

############################################################################################################################
#                                                                                                                          #
#                                     unRAID Server Preclear of disk 5000c500575f1303                                      #
#                                       Cycle 1 of 1, partition start on sector 64.                                        #
#                                                                                                                          #
#                                                                                                                          #
#   Step 1 of 5 - Pre-read verification:                                                  [10:26:42 @ 106 MB/s] SUCCESS    #
#   Step 2 of 5 - Zeroing the disk:                                                        [7:38:09 @ 145 MB/s] SUCCESS    #
#   Step 3 of 5 - Writing unRAID's Preclear signature:                                                          SUCCESS    #
#   Step 4 of 5 - Verifying unRAID's Preclear signature:                                                        SUCCESS    #
#   Step 5 of 5 - Post-Read verification:                                                  [7:43:37 @ 143 MB/s] SUCCESS    #
#                                                                                                                          #
#                                                                                                                          #
#                                                                                                                          #
#                                                                                                                          #
#                                                                                                                          #
#                                                                                                                          #
#                                                                                                                          #
############################################################################################################################
#                              Cycle elapsed time: 25:48:32 | Total elapsed time: 25:48:32                                 #
############################################################################################################################

--> RESULT: Preclear Finished Successfully!.

 

Link to comment

Here are the drive logs... about halfway through, the errors stopped and never came back.

 

Apr 12 15:30:07 Tower kernel: print_req_error: I/O error, dev sdc, sector 5061987416
Apr 12 15:30:07 Tower kernel: Buffer I/O error on dev sdc, logical block 632748427, async page read
Apr 12 15:30:07 Tower kernel: sd 1:0:1:0: [sdc] tag#1709 UNKNOWN(0x2003) Result: hostbyte=0x0b driverbyte=0x00
Apr 12 15:30:07 Tower kernel: sd 1:0:1:0: [sdc] tag#1709 CDB: opcode=0x88 88 00 00 00 00 01 2d b7 d2 98 00 00 00 08 00 00
Apr 12 15:30:07 Tower kernel: print_req_error: I/O error, dev sdc, sector 5061989016
Apr 12 15:30:07 Tower kernel: Buffer I/O error on dev sdc, logical block 632748627, async page read
Apr 12 15:30:07 Tower kernel: sd 1:0:1:0: [sdc] tag#1713 UNKNOWN(0x2003) Result: hostbyte=0x0b driverbyte=0x00
Apr 12 15:30:07 Tower kernel: sd 1:0:1:0: [sdc] tag#1713 CDB: opcode=0x88 88 00 00 00 00 01 2d b7 d8 d8 00 00 00 08 00 00
Apr 12 15:30:07 Tower kernel: print_req_error: I/O error, dev sdc, sector 5061990616
Apr 12 15:30:07 Tower kernel: Buffer I/O error on dev sdc, logical block 632748827, async page read
Apr 12 15:30:07 Tower kernel: sd 1:0:1:0: [sdc] tag#1718 UNKNOWN(0x2003) Result: hostbyte=0x0b driverbyte=0x00
Apr 12 15:30:07 Tower kernel: sd 1:0:1:0: [sdc] tag#1718 CDB: opcode=0x88 88 00 00 00 00 01 2d b7 df 18 00 00 00 08 00 00
Apr 12 15:30:07 Tower kernel: print_req_error: I/O error, dev sdc, sector 5061992216
Apr 12 15:30:07 Tower kernel: Buffer I/O error on dev sdc, logical block 632749027, async page read
Apr 12 19:19:07 Tower preclear_disk_Z1Z21G7D00009409AY00[21894]: Zeroing: dd if=/dev/zero of=/dev/sdc bs=2097152 seek=2097152 count=4000784932864 conv=notrunc iflag=count_bytes,nocache,fullblock oflag=seek_bytes
Apr 13 02:55:11 Tower preclear_disk_Z1Z21G7D00009409AY00[21894]: Post-Read: cmp /tmp/.preclear/sdc/fifo /dev/zero
Apr 13 02:55:11 Tower preclear_disk_Z1Z21G7D00009409AY00[21894]: Post-Read: dd if=/dev/sdc of=/tmp/.preclear/sdc/fifo count=2096640 skip=512 conv=notrunc iflag=nocache,count_bytes,skip_bytes
Apr 13 02:55:13 Tower preclear_disk_Z1Z21G7D00009409AY00[21894]: Post-Read: cmp /tmp/.preclear/sdc/fifo /dev/zero
Apr 13 02:55:13 Tower preclear_disk_Z1Z21G7D00009409AY00[21894]: Post-Read: dd if=/dev/sdc of=/tmp/.preclear/sdc/fifo bs=2097152 skip=2097152 count=4000784932864 conv=notrunc iflag=nocache,count_bytes,skip_bytes

 

Link to comment
