Kube Posted April 12, 2020

Hey guys, I've been having constant drive issues with some new gear I set up.

I added an LSI HBA card: Genuine LSI 6Gbps SAS HBA LSI 9211-8i (https://www.ebay.com/itm/Genuine-LSI-6Gbps-SAS-HBA-LSI-9211-8i-9201-8i-P20-IT-Mode-ZFS-FreeNAS-unRAID/163846248833?ssPageName=STRK%3AMEBIDX%3AIT&_trksid=p2057872.m2749.l2649)
A Mini SAS 36P SFF-8087 to 4 SFF-8482 Connectors With SATA Power Cable 3FT 1M - US
I have tried multiple power cables, reseating, complete replacement, and swapping positions.
5 x 4TB drives that I took from the datacenter I work at. The drives were known good and pulled from a working environment. They came out of a Cisco C-Series 240 rackmount.

On to the issue. SMART will not show any results for any of the new drives. The drives work fine for days, then after a few reboots one fails or starts throwing errors. This seems to be the error that always comes back:

:10:24 Tower kernel: print_req_error: I/O error, dev sdc, sector 28951400
Apr 11 21:10:24 Tower kernel: Buffer I/O error on dev sdc, logical block 3618925, async page read
Apr 11 21:10:24 Tower kernel: print_req_error: I/O error, dev sdc, sector 28952336
Apr 11 21:10:24 Tower kernel: Buffer I/O error on dev sdc, logical block 3619042, async page read
Apr 11 21:10:24 Tower kernel: print_req_error: I/O error, dev sdc, sector 28953272
Apr 11 21:10:24 Tower kernel: Buffer I/O error on dev sdc, logical block 3619159, async page read

Right now I am running benhex preclear on them and it's lighting up the logs with that message. All cables have been reseated; the only cable I have not replaced yet is the SAS breakout cable. The issues only affect the new SAS drives. My existing drives are SATA.

Thanks

tower-diagnostics-20200411-2121.zip

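Note on reading the log above: each print_req_error sector and the Buffer I/O error logical block that follows it point at the same spot on the disk. A minimal Python sketch of the conversion, assuming 512-byte sectors and a 4096-byte buffer block (an assumption, but one that matches the numbers in the log); the function name is made up for illustration:

```python
SECTOR_SIZE = 512   # bytes per sector as counted by the kernel block layer
BLOCK_SIZE = 4096   # assumed buffer block size (one memory page)


def sector_to_logical_block(sector: int) -> int:
    """Convert a 512-byte sector number to a 4096-byte logical block number."""
    return sector * SECTOR_SIZE // BLOCK_SIZE


# Sector numbers taken from the print_req_error lines in the post above.
for sector in (28951400, 28952336, 28953272):
    print(f"sector {sector} -> logical block {sector_to_logical_block(sector)}")
```

Each result lines up with the logical block reported on the next log line, so every pair of messages describes one failed read rather than two separate problems.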
JorgeB Posted April 12, 2020

There are issues from boot, especially with one of the disks. Try disconnecting sdf (Serial Z1Z8S9J40000C5451WXZ) and post new diags after a boot. These disks can sometimes come with custom firmware and have issues working outside the enclosure they came with.

Kube (Author) Posted April 12, 2020

This is what I see for the SMART report. The drives show as passed and SMART shows as available, but when I review the report there's not much there, and there's an error at the end.

Your point about firmware: I know the Cisco firmware update process included drive firmware, but I'm not sure if it was custom or not. Any way to flash it back to a stock firmware?

smartctl 7.1 2019-12-30 r5022 [x86_64-linux-4.19.107-Unraid] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor: SEAGATE
Product: ST4000NM0023
Revision: 0004
Compliance: SPC-4
LU is fully provisioned
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Logical Unit id: 0x5000c500575f10d3
Serial number: Z1Z21G7D00009409AY00
Device type: disk
Transport protocol: SAS (SPL-3)
Local Time is: Sun Apr 12 06:56:10 2020 CDT
device becoming ready (wait)
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.

JorgeB Posted April 12, 2020

Sdf, which appears to be the most problematic, is using a different firmware, which is why I suggested disconnecting that one first.

27 minutes ago, Kube said: Any way to flash it back to a stock firmware ?

Sorry, no. You'd need to search for a stock Seagate firmware for that model, which may or may not be flashable.

Kube (Author) Posted April 12, 2020

7 minutes ago, johnnie.black said: Sdf which appears the most problematic is using a different firmware, hence why I suggest disconnecting that one first.

The drive has been removed and the server rebooted. How's it look now?

tower-diagnostics-20200412-0733.zip

JorgeB Posted April 12, 2020

All disk-related errors are gone, at least for now, even for the other disks. I would suggest running it like that for a couple of days.

Kube (Author) Posted April 12, 2020

Would it be in the realm of plausible that one drive could affect the controller and, as such, affect the other drives attached to it?

JorgeB Posted April 12, 2020

Yes, in the previous diags the problem started with sdf (device 7:0:4:0) but ended up causing issues with other disks, in this example devices 7:0:1:0 and 7:0:3:0:

Apr 11 17:03:34 Tower kernel: sd 7:0:4:0: [sdf] tag#935 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Apr 11 17:03:34 Tower kernel: sd 7:0:4:0: [sdf] tag#935 Sense Key : 0x2 [current] [descriptor]
Apr 11 17:03:34 Tower kernel: sd 7:0:4:0: [sdf] tag#935 ASC=0x4 ASCQ=0x11
Apr 11 17:03:34 Tower kernel: sd 7:0:4:0: [sdf] tag#935 CDB: opcode=0x88 88 00 00 00 00 01 d1 c0 be 00 00 00 00 08 00 00
Apr 11 17:03:34 Tower kernel: print_req_error: I/O error, dev sdf, sector 7814036992
Apr 11 17:03:34 Tower kernel: sd 7:0:4:0: [sdf] tag#935 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Apr 11 17:03:34 Tower kernel: sd 7:0:4:0: [sdf] tag#935 Sense Key : 0x2 [current] [descriptor]
Apr 11 17:03:34 Tower kernel: sd 7:0:4:0: [sdf] tag#935 ASC=0x4 ASCQ=0x11
Apr 11 17:03:34 Tower kernel: sd 7:0:4:0: [sdf] tag#935 CDB: opcode=0x88 88 00 00 00 00 01 d1 c0 be 00 00 00 00 08 00 00
Apr 11 17:03:34 Tower kernel: print_req_error: I/O error, dev sdf, sector 7814036992
Apr 11 17:03:34 Tower kernel: Buffer I/O error on dev sdf, logical block 976754624, async page read
Apr 11 17:03:46 Tower kernel: sd 7:0:1:0: device_block, handle(0x000a)
Apr 11 17:03:47 Tower kernel: sd 7:0:1:0: device_unblock and setting to running, handle(0x000a)
Apr 11 17:03:50 Tower kernel: sd 7:0:4:0: [sdf] tag#1764 UNKNOWN(0x2003) Result: hostbyte=0x0b driverbyte=0x00
Apr 11 17:03:50 Tower kernel: sd 7:0:4:0: [sdf] tag#1764 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 40 e5 00
Apr 11 17:03:50 Tower kernel: mpt2sas_cm0: log_info(0x31120100): originator(PL), code(0x12), sub_code(0x0100)
Apr 11 17:03:50 Tower kernel: sd 7:0:4:0: [sdf] tag#1764 UNKNOWN(0x2003) Result: hostbyte=0x0b driverbyte=0x00
Apr 11 17:03:50 Tower kernel: sd 7:0:4:0: [sdf] tag#1764 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 40 98 00
Apr 11 17:03:50 Tower kernel: mpt2sas_cm0: log_info(0x31120100): originator(PL), code(0x12), sub_code(0x0100)
### [PREVIOUS LINE REPEATED 2 TIMES] ###
Apr 11 17:03:50 Tower kernel: sd 7:0:4:0: device_block, handle(0x000d)
Apr 11 17:03:51 Tower kernel: sd 7:0:4:0: device_unblock and setting to running, handle(0x000d)
Apr 11 17:03:53 Tower kernel: sd 7:0:4:0: Power-on or device reset occurred
Apr 11 17:03:57 Tower kernel: sd 7:0:1:0: device_block, handle(0x000a)
Apr 11 17:03:58 Tower kernel: sd 7:0:1:0: device_unblock and setting to running, handle(0x000a)
Apr 11 17:04:16 Tower kernel: sd 7:0:3:0: device_block, handle(0x000c)
Apr 11 17:04:16 Tower kernel: sd 7:0:4:0: device_block, handle(0x000d)
Apr 11 17:04:18 Tower kernel: sd 7:0:3:0: device_unblock and setting to running, handle(0x000c)
Apr 11 17:04:18 Tower kernel: sd 7:0:4:0: device_unblock and setting to running, handle(0x000d)
Apr 11 17:04:18 Tower kernel: sd 7:0:4:0: Power-on or device reset occurred
Apr 11 17:04:20 Tower kernel: sd 7:0:1:0: device_block, handle(0x000a)
Apr 11 17:04:21 Tower kernel: sd 7:0:1:0: device_unblock and setting to running, handle(0x000a)
Apr 11 17:04:28 Tower kernel: sd 7:0:4:0: device_block, handle(0x000d)
Apr 11 17:04:29 Tower kernel: sd 7:0:4:0: device_unblock and setting to running, handle(0x000d)
Apr 11 17:04:30 Tower kernel: sd 7:0:4:0: Power-on or device reset occurred
Apr 11 17:04:31 Tower kernel: sd 7:0:3:0: device_block, handle(0x000c)
Apr 11 17:04:32 Tower kernel: sd 7:0:3:0: device_unblock and setting to running, handle(0x000c)
Apr 11 17:04:33 Tower kernel: sd 7:0:1:0: device_block, handle(0x000a)
Apr 11 17:04:35 Tower kernel: sd 7:0:1:0: device_unblock and setting to running, handle(0x000a)

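To see how trouble on one disk spreads to others on the same HBA, it can help to count events per SCSI address (host:channel:target:lun) instead of reading the log line by line. A rough Python sketch, assuming syslog lines in the format shown above; the function name and the event labels are made up for illustration and are not part of any Unraid tool:

```python
import re
from collections import Counter

# Matches kernel lines such as "... kernel: sd 7:0:4:0: <message>"
SCSI_LINE = re.compile(r"kernel: sd (\d+:\d+:\d+:\d+): (.*)")


def tally_events(log_lines):
    """Count blocked devices, resets and failed commands per SCSI address."""
    counts = Counter()
    for line in log_lines:
        match = SCSI_LINE.search(line)
        if not match:
            continue
        address, message = match.groups()
        if message.startswith("device_block"):
            counts[(address, "device_block")] += 1
        elif "Power-on or device reset occurred" in message:
            counts[(address, "reset")] += 1
        elif "Result:" in message:
            counts[(address, "failed_command")] += 1
    return counts


# A few lines copied from the log excerpt above.
sample = [
    "Apr 11 17:03:34 Tower kernel: sd 7:0:4:0: [sdf] tag#935 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08",
    "Apr 11 17:03:46 Tower kernel: sd 7:0:1:0: device_block, handle(0x000a)",
    "Apr 11 17:03:47 Tower kernel: sd 7:0:1:0: device_unblock and setting to running, handle(0x000a)",
    "Apr 11 17:03:53 Tower kernel: sd 7:0:4:0: Power-on or device reset occurred",
    "Apr 11 17:04:16 Tower kernel: sd 7:0:3:0: device_block, handle(0x000c)",
]

for (address, event), count in sorted(tally_events(sample).items()):
    print(address, event, count)
```

Run against a full syslog, a tally like this would show which addresses only ever get blocked and unblocked and which ones also report failed commands and resets.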
Kube (Author) Posted April 12, 2020

So I started a pre-clear just to see what it would do... drive errors seem to have returned... not sure if these are normal or not.

Apr 12 07:32:56 Tower kernel: sd 1:0:3:0: [sde] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
Apr 12 07:32:56 Tower kernel: sd 1:0:3:0: [sde] Write Protect is off
Apr 12 07:32:56 Tower kernel: sd 1:0:3:0: [sde] Mode Sense: db 00 10 08
Apr 12 07:32:56 Tower kernel: sd 1:0:3:0: [sde] Write cache: disabled, read cache: enabled, supports DPO and FUA
Apr 12 07:32:56 Tower kernel: sde: sde1
Apr 12 07:32:56 Tower kernel: sd 1:0:3:0: [sde] Attached SCSI disk
Apr 12 07:33:11 Tower emhttpd: ST4000NM0023_Z1Z23R250000C410736M_35000c50057634463 (sde) 512 7814037168
Apr 12 08:00:23 Tower move: move: file /mnt/cache/appdata/DiskSpeed/Instances/local/hdparm_sde.txt
Apr 12 08:55:45 Tower preclear_disk_Z1Z23R250000C410736M[23921]: Command: /usr/local/emhttp/plugins/preclear.disk/script/preclear_disk.sh --notify 3 --frequency 1 --cycles 1 --no-prompt /dev/sde
Apr 12 08:55:47 Tower preclear_disk_Z1Z23R250000C410736M[23921]: Pre-Read: dd if=/dev/sde of=/dev/null bs=2097152 skip=0 count=4000787030016 conv=notrunc,noerror iflag=nocache,count_bytes,skip_bytes
Apr 12 09:14:11 Tower kernel: sd 1:0:3:0: [sde] tag#2937 UNKNOWN(0x2003) Result: hostbyte=0x0b driverbyte=0x00
Apr 12 09:14:11 Tower kernel: sd 1:0:3:0: [sde] tag#2937 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 40 e5 00
Apr 12 09:14:11 Tower kernel: sd 1:0:3:0: [sde] tag#2937 UNKNOWN(0x2003) Result: hostbyte=0x0b driverbyte=0x00
Apr 12 09:14:11 Tower kernel: sd 1:0:3:0: [sde] tag#2937 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 40 98 00
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2888 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2888 Sense Key : 0x2 [current] [descriptor]
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2888 ASC=0x4 ASCQ=0x11
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2888 CDB: opcode=0x88 88 00 00 00 00 00 16 f4 a3 58 00 00 00 a8 00 00
Apr 12 09:14:13 Tower kernel: print_req_error: I/O error, dev sde, sector 385131352
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2889 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2889 Sense Key : 0x2 [current] [descriptor]
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2889 ASC=0x4 ASCQ=0x11
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2889 CDB: opcode=0x88 88 00 00 00 00 00 16 f4 94 00 00 00 06 00 00 00
Apr 12 09:14:13 Tower kernel: print_req_error: I/O error, dev sde, sector 385127424
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2890 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2890 Sense Key : 0x2 [current] [descriptor]
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2890 ASC=0x4 ASCQ=0x11
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2890 CDB: opcode=0x88 88 00 00 00 00 00 16 f4 9a 00 00 00 04 18 00 00
Apr 12 09:14:13 Tower kernel: print_req_error: I/O error, dev sde, sector 385128960
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2891 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2891 Sense Key : 0x2 [current] [descriptor]
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2891 ASC=0x4 ASCQ=0x11
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2891 CDB: opcode=0x88 88 00 00 00 00 00 16 f4 9e 18 00 00 05 40 00 00
Apr 12 09:14:13 Tower kernel: print_req_error: I/O error, dev sde, sector 385130008
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2886 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2886 Sense Key : 0x2 [current] [descriptor]
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2886 ASC=0x4 ASCQ=0x11
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2886 CDB: opcode=0x88 88 00 00 00 00 00 16 f4 94 00 00 00 00 08 00 00
Apr 12 09:14:13 Tower kernel: print_req_error: I/O error, dev sde, sector 385127424
Apr 12 09:14:13 Tower kernel: Buffer I/O error on dev sde, logical block 48140928, async page read
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2944 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2944 Sense Key : 0x2 [current] [descriptor]
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2944 ASC=0x4 ASCQ=0x11
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2944 CDB: opcode=0x88 88 00 00 00 00 00 29 6b 36 d1 00 00 00 01 00 00
Apr 12 09:14:13 Tower kernel: print_req_error: I/O error, dev sde, sector 694892241
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2945 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2945 Sense Key : 0x2 [current] [descriptor]
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2945 ASC=0x4 ASCQ=0x11
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2945 CDB: opcode=0x88 88 00 00 00 00 00 86 7e dc ba 00 00 00 01 00 00
Apr 12 09:14:13 Tower kernel: print_req_error: I/O error, dev sde, sector 2256460986
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2946 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2946 Sense Key : 0x2 [current] [descriptor]
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2946 ASC=0x4 ASCQ=0x11
Apr 12 09:14:13 Tower kernel: sd 1:0:3:0: [sde] tag#2946 CDB: opcode=0x88 88 00 00 00 00 01 d1 c0 be af 00 00 00 01 00 00
Apr 12 09:14:13 Tower kernel: print_req_error: I/O error, dev sde, sector 7814037167
Apr 12 09:14:13 Tower kernel: print_req_error: I/O error, dev sde, sector 1767169515
Apr 12 09:14:13 Tower kernel: print_req_error: I/O error, dev sde, sector 0
Apr 12 09:14:13 Tower kernel: Buffer I/O error on dev sde, logical block 48140928, async page read
Apr 12 09:14:13 Tower kernel: Buffer I/O error on dev sde, logical block 48141312, async page read
Apr 12 09:14:13 Tower kernel: Buffer I/O error on dev sde, logical block 48141696, async page read
Apr 12 09:14:13 Tower kernel: Buffer I/O error on dev sde, logical block 48142080, async page read
Apr 12 09:14:13 Tower kernel: Buffer I/O error on dev sde, logical block 48142464, async page read
Apr 12 09:14:13 Tower kernel: Buffer I/O error on dev sde, logical block 48142848, async page read
Apr 12 09:14:13 Tower kernel: Buffer I/O error on dev sde, logical block 48143232, async page read
Apr 12 09:14:13 Tower kernel: Buffer I/O error on dev sde, logical block 48143616, async page read
Apr 12 09:14:13 Tower kernel: Buffer I/O error on dev sde, logical block 48144000, async page read

tower-diagnostics-20200412-0918.zip

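For anyone trying to read the raw CDB and sense values in these logs: opcode 0x88 is the SCSI READ(16) command, which carries an 8-byte starting LBA and a 4-byte transfer length, and the sense triple 0x2 / 0x4 / 0x11 is listed in the standard SCSI code tables as "not ready, logical unit not ready, notify (enable spinup) required". The Python sketch below decodes both; the function and table names are invented for illustration, and the table covers only the one code seen in this thread, so treat it as a reading aid and not a diagnosis.

```python
def decode_read16(cdb_hex: str):
    """Return (starting LBA, block count) from a READ(16) CDB hex dump."""
    cdb = bytes.fromhex(cdb_hex)  # spaces between byte pairs are accepted
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a 16-byte READ(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")      # bytes 2-9: logical block address
    blocks = int.from_bytes(cdb[10:14], "big")  # bytes 10-13: transfer length
    return lba, blocks


# (sense key, ASC, ASCQ) -> meaning; only the code that appears in this thread.
SENSE_MEANINGS = {
    (0x2, 0x04, 0x11): "NOT READY: logical unit not ready, "
                       "notify (enable spinup) required",
}

# CDB copied from the tag#2888 line in the log above.
lba, blocks = decode_read16("88 00 00 00 00 00 16 f4 a3 58 00 00 00 a8 00 00")
print(lba, blocks)  # 385131352 168
print(SENSE_MEANINGS[(0x2, 0x04, 0x11)])
```

The decoded LBA matches the sector in the print_req_error line that follows tag#2888, which confirms the failures are ordinary reads being rejected while the drive reports itself as not ready.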
JorgeB Posted April 13, 2020

No, that's not normal. Look for a stock firmware from Seagate; other than that, I don't have other ideas.

Kube (Author) Posted April 13, 2020

So the pre-clears all finished and were all flagged as successful... On the second read there were none of the errors, which I found kind of odd... Going to start the process again and see what happens.

unRAID Server Preclear of disk 5000c500575f1303
Cycle 1 of 1, partition start on sector 64.

Step 1 of 5 - Pre-read verification: [10:26:42 @ 106 MB/s] SUCCESS
Step 2 of 5 - Zeroing the disk: [7:38:09 @ 145 MB/s] SUCCESS
Step 3 of 5 - Writing unRAID's Preclear signature: SUCCESS
Step 4 of 5 - Verifying unRAID's Preclear signature: SUCCESS
Step 5 of 5 - Post-Read verification: [7:43:37 @ 143 MB/s] SUCCESS

Cycle elapsed time: 25:48:32 | Total elapsed time: 25:48:32

--> RESULT: Preclear Finished Successfully!.

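As a quick sanity check on the report, the speeds can be recomputed from the elapsed times. The Python sketch below assumes this disk has the same capacity as the ST4000NM0023 logged earlier in the thread (7814037168 sectors of 512 bytes); that figure comes from a different drive of the same model, so it is an assumption, and the helper names are made up.

```python
# Capacity assumed from the sde kernel line earlier in the thread.
DISK_BYTES = 7814037168 * 512


def seconds(hms: str) -> int:
    """Convert an 'H:MM:SS' duration into seconds."""
    hours, minutes, secs = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + secs


def mb_per_s(hms: str, nbytes: int = DISK_BYTES) -> float:
    """Average throughput in MB/s (1 MB = 1,000,000 bytes)."""
    return nbytes / seconds(hms) / 1e6


for step, elapsed in (("Pre-read", "10:26:42"),
                      ("Zeroing", "7:38:09"),
                      ("Post-read", "7:43:37")):
    print(f"{step}: {mb_per_s(elapsed):.1f} MB/s")
```

Under that assumption the computed averages round down to the 106, 145 and 143 MB/s shown in the report, which is what you would expect if each pass covered the whole disk. The slower pre-read is the pass that overlapped with the logged errors.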
Kube (Author) Posted April 13, 2020

Here are the drive logs... about halfway through, the errors stopped and never came back.

Apr 12 15:30:07 Tower kernel: print_req_error: I/O error, dev sdc, sector 5061987416
Apr 12 15:30:07 Tower kernel: Buffer I/O error on dev sdc, logical block 632748427, async page read
Apr 12 15:30:07 Tower kernel: sd 1:0:1:0: [sdc] tag#1709 UNKNOWN(0x2003) Result: hostbyte=0x0b driverbyte=0x00
Apr 12 15:30:07 Tower kernel: sd 1:0:1:0: [sdc] tag#1709 CDB: opcode=0x88 88 00 00 00 00 01 2d b7 d2 98 00 00 00 08 00 00
Apr 12 15:30:07 Tower kernel: print_req_error: I/O error, dev sdc, sector 5061989016
Apr 12 15:30:07 Tower kernel: Buffer I/O error on dev sdc, logical block 632748627, async page read
Apr 12 15:30:07 Tower kernel: sd 1:0:1:0: [sdc] tag#1713 UNKNOWN(0x2003) Result: hostbyte=0x0b driverbyte=0x00
Apr 12 15:30:07 Tower kernel: sd 1:0:1:0: [sdc] tag#1713 CDB: opcode=0x88 88 00 00 00 00 01 2d b7 d8 d8 00 00 00 08 00 00
Apr 12 15:30:07 Tower kernel: print_req_error: I/O error, dev sdc, sector 5061990616
Apr 12 15:30:07 Tower kernel: Buffer I/O error on dev sdc, logical block 632748827, async page read
Apr 12 15:30:07 Tower kernel: sd 1:0:1:0: [sdc] tag#1718 UNKNOWN(0x2003) Result: hostbyte=0x0b driverbyte=0x00
Apr 12 15:30:07 Tower kernel: sd 1:0:1:0: [sdc] tag#1718 CDB: opcode=0x88 88 00 00 00 00 01 2d b7 df 18 00 00 00 08 00 00
Apr 12 15:30:07 Tower kernel: print_req_error: I/O error, dev sdc, sector 5061992216
Apr 12 15:30:07 Tower kernel: Buffer I/O error on dev sdc, logical block 632749027, async page read
Apr 12 19:19:07 Tower preclear_disk_Z1Z21G7D00009409AY00[21894]: Zeroing: dd if=/dev/zero of=/dev/sdc bs=2097152 seek=2097152 count=4000784932864 conv=notrunc iflag=count_bytes,nocache,fullblock oflag=seek_bytes
Apr 13 02:55:11 Tower preclear_disk_Z1Z21G7D00009409AY00[21894]: Post-Read: cmp /tmp/.preclear/sdc/fifo /dev/zero
Apr 13 02:55:11 Tower preclear_disk_Z1Z21G7D00009409AY00[21894]: Post-Read: dd if=/dev/sdc of=/tmp/.preclear/sdc/fifo count=2096640 skip=512 conv=notrunc iflag=nocache,count_bytes,skip_bytes
Apr 13 02:55:13 Tower preclear_disk_Z1Z21G7D00009409AY00[21894]: Post-Read: cmp /tmp/.preclear/sdc/fifo /dev/zero
Apr 13 02:55:13 Tower preclear_disk_Z1Z21G7D00009409AY00[21894]: Post-Read: dd if=/dev/sdc of=/tmp/.preclear/sdc/fifo bs=2097152 skip=2097152 count=4000784932864 conv=notrunc iflag=nocache,count_bytes,skip_bytes

JorgeB Posted April 14, 2020

That is strange, but as long as it's working...