Strange problems with newly added drive


ssb201


So I just added a new drive to my array and I am getting weird errors. I pre-cleared the drive (admittedly without the pre-read or post-read, since this was a drive I had used previously) and it ran overnight successfully. I added the drive to my array, and it formatted and joined seemingly normally. But when I review the syslog I see:

 

Feb 16 17:35:01 Tower kernel: sd 8:0:6:0: attempting task abort! scmd(00000000f8e68c0b)
Feb 16 17:35:01 Tower kernel: sd 8:0:6:0: [sdj] tag#0 CDB: opcode=0x88 88 00 00 00 00 00 00 61 c1 b0 00 00 00 08 00 00
Feb 16 17:35:01 Tower kernel: scsi target8:0:6: handle(0x0010), sas_address(0x50015b20780b9435), phy(21)
Feb 16 17:35:01 Tower kernel: scsi target8:0:6: enclosure logical id(0x50015b20780b943f), slot(3) 
Feb 16 17:35:02 Tower kernel: sd 8:0:6:0: task abort: SUCCESS scmd(00000000f8e68c0b)
Feb 16 17:35:03 Tower kernel: sd 8:0:6:0: Power-on or device reset occurred
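
Both the H200 syslog above and the later on-board SATA error print the failing command as a CDB with `opcode=0x88`, which is the SCSI READ(16) command. That is consistent with the errors showing up only on reads. As a rough aid for reading those lines, here is a minimal Python sketch that decodes a READ(16) CDB, assuming the standard 16-byte layout (byte 0 opcode, bytes 2-9 big-endian logical block address, bytes 10-13 big-endian transfer length in logical blocks). `decode_read16` is a hypothetical helper, not part of any Unraid or kernel tool.

```python
from __future__ import annotations

# Opcode of the SCSI READ(16) command.
READ_16 = 0x88


def decode_read16(cdb_hex: str) -> tuple[int, int]:
    """Decode a READ(16) CDB given as space-separated hex bytes.

    Assumed layout: byte 0 = opcode, byte 1 = flags, bytes 2-9 = logical
    block address (big-endian), bytes 10-13 = transfer length in logical
    blocks (big-endian), byte 14 = group number, byte 15 = control.
    Returns (lba, transfer_length).
    """
    cdb = bytes.fromhex(cdb_hex)  # whitespace between byte pairs is ignored
    if len(cdb) != 16 or cdb[0] != READ_16:
        raise ValueError("not a 16-byte READ(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")
    transfer_length = int.from_bytes(cdb[10:14], "big")
    return lba, transfer_length


if __name__ == "__main__":
    # CDB bytes copied from the "sd 8:0:6:0: [sdj] tag#0 CDB" syslog line.
    lba, blocks = decode_read16("88 00 00 00 00 00 00 61 c1 b0 00 00 00 08 00 00")
    print(f"READ(16) at LBA {lba}, {blocks} blocks")  # LBA 6406576, 8 blocks
```

Run on the CDB from the on-board SATA error, the same function returns LBA 25600 with a transfer length of 1536 blocks, which lines up with the `sector 25600` in the `print_req_error` message. Assuming 512-byte logical blocks (the drive is listed as 512e), that is a 768 KiB read.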

 

If I review the individual disk log I see repeated errors.

 

But when I review the SMART data, the drive comes back all green. The only thing in the report that looks a bit off is the number of read recovery attempts.

HGST_HUH728080ALE600_2EGM5T4X-20190216-1739.txt


The disk is connected via the 12-port backplane that all the drives are connected to. The controller is a Dell PERC H200 flashed with IT firmware. The UPS is not plugged in at the moment: I had to disconnect it to reset the memory when replacing the batteries and have not plugged it back in yet.

 

I see a million read errors in dmesg, but not a single one in SMART (posted above; not sure why it shows on the direct page but not in the diagnostics). No write errors at all. The only difference I can think of between this and the other disks is that this is an AF 4Kn drive.
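
To put a number on "a million read errors" and see whether they cluster in one region of the disk, the I/O error lines in dmesg could be tallied per device. Below is a minimal Python sketch, assuming the `I/O error, dev sdX, sector N` fragment that appears in the on-board SATA log later in this thread. The prefix in front of that fragment varies between kernel versions, so the regex is an assumption to check against your own log; `summarize_io_errors` and the script name are hypothetical.

```python
import re
import sys

# Matches the "I/O error, dev sdX, sector N" fragment of a kernel block-layer
# error message (assumption: only the prefix before it varies by kernel).
IO_ERROR_RE = re.compile(r"I/O error, dev (\w+), sector (\d+)")


def summarize_io_errors(lines):
    """Return {device: (error_count, lowest_sector, highest_sector)}."""
    stats = {}
    for line in lines:
        match = IO_ERROR_RE.search(line)
        if match is None:
            continue  # not an I/O error line
        dev, sector = match.group(1), int(match.group(2))
        count, low, high = stats.get(dev, (0, sector, sector))
        stats[dev] = (count + 1, min(low, sector), max(high, sector))
    return stats


if __name__ == "__main__":
    # Hypothetical usage: dmesg | python3 tally_io_errors.py
    for dev, (count, low, high) in sorted(summarize_io_errors(sys.stdin).items()):
        print(f"{dev}: {count} I/O errors between sectors {low} and {high}")
```

Comparing the reported sector range against how far the pre-read had progressed could help tell a localized media problem apart from errors that occur wherever the drive is read.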

 

I could try swapping this to a different bay in the server, but I doubt it will make a difference.
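
If the drive does get moved to another bay, one way to check whether the task aborts and resets follow the drive or stay with the slot is to tally them per SCSI target next to the enclosure slot that the SAS HBA driver logs (the `scsi target8:0:6: enclosure logical id(...), slot(3)` line in the syslog excerpt). Below is a minimal Python sketch, assuming the message formats in that excerpt; other drivers or kernel versions may word these lines differently, and `summarize_resets` is a hypothetical helper.

```python
import re
import sys
from collections import defaultdict

# Patterns modeled on the syslog excerpt in this thread (assumption: other
# kernels or HBA drivers may word these messages differently).
ABORT_RE = re.compile(r"sd (\d+:\d+:\d+):\d+: attempting task abort")
RESET_RE = re.compile(r"sd (\d+:\d+:\d+):\d+: Power-on or device reset occurred")
SLOT_RE = re.compile(r"scsi target(\d+:\d+:\d+): enclosure logical id\(\w+\), slot\((\d+)\)")


def summarize_resets(lines):
    """Return {target: {"slot": int or None, "aborts": int, "resets": int}}."""
    stats = defaultdict(lambda: {"slot": None, "aborts": 0, "resets": 0})
    for line in lines:
        if m := SLOT_RE.search(line):
            stats[m.group(1)]["slot"] = int(m.group(2))  # bay reported by the enclosure
        elif m := ABORT_RE.search(line):
            stats[m.group(1)]["aborts"] += 1
        elif m := RESET_RE.search(line):
            stats[m.group(1)]["resets"] += 1
    return dict(stats)


if __name__ == "__main__":
    # Hypothetical usage: python3 tally_resets.py < saved_syslog.txt
    for target, s in sorted(summarize_resets(sys.stdin).items()):
        print(f"target {target} slot {s['slot']}: {s['aborts']} aborts, {s['resets']} resets")
```

Running it on a syslog captured before and after a bay swap would show whether the counts move to the new slot along with the drive.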

Edited by ssb201

After updating the firmware, I no longer had any problems at startup, but my read speeds on the array plummeted to almost nothing. I tried removing the disk (since nothing of interest had been written to it yet) and rerunning preclear on it. As soon as the pre-read started, it was spitting out the same read errors. SMART still shows no issues, and I am baffled why it only happens on reads, never on writes.

 

 

  • 2 weeks later...

New update: the drive shows as Not Installed (missing) from the array. For some reason it is still receiving writes to shares: when I extracted files to a share, they ended up being written to the array. I assume this is due to some weirdness with the union file system. What I do not get is why I was able to go directly to /mnt/disk5 and read and write files without any errors or problems.

 

 


Yeah, I understand that there will be additional work for the drives until I replace it. 

 

I took the drive out and used it with a USB-SATA controller on Windows, and it worked just fine. That leads me to suspect a controller problem, despite the firmware update. I am just puzzled, because I have other 512e drives working fine with the controller, and the drive is explicitly listed in the controller's compatibility document: https://docs.broadcom.com/docs/IT-SAS-Gen2.5CompatibilityList

 

The two Hitachi drives that are working with the controller are:

HDN728080ALE604 - DeskStar - 512e SATA 6Gb/s - Secure Erase (overwrite only)

HUH728080AL4200 - UltraStar - 4kn SAS 12Gb/s - Instant Secure Erase

 

This one is not:

HUH728080ALE600 - UltraStar - 512e SATA 6Gb/s - Instant Secure Erase

 

The DeskStar has the exact same interface (512e SATA 6Gb/s), and the other UltraStar uses an even more advanced interface (SAS 12Gb/s) and supports the same Instant Secure Erase. The only other idea I can come up with is that it has to do with the backplane expander, since this is a 12-port system and this is the first drive pushing it past the halfway count (drive number 7).

 

Any ideas what I could try next? I am wracking my brains on this.


I hooked it up to the on-board SATA controller and saw ATA errors.

[  369.829354] sd 4:0:0:0: [sdc] tag#26 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x06
[  369.829360] sd 4:0:0:0: [sdc] tag#26 CDB: opcode=0x88 88 00 00 00 00 00 00 00 64 00 00 00 06 00 00 00
[  369.829362] print_req_error: I/O error, dev sdc, sector 25600

 

It still seems to work just fine on my Windows machine using a USB-SATA controller. I have given up trying to figure this puzzle out. I am ordering a new drive and will just use the problem child somewhere else.

 

Thanks everyone for the ideas.

