ssb201 Posted February 17, 2019

So I just added a new drive to my array and I am getting weird errors. I pre-cleared the drive (admittedly without the pre-read or post-read, since this was a drive I had used previously) and it ran overnight successfully. I added the drive to my array and it formatted and joined seemingly normally. But if I review the syslog I see:

Feb 16 17:35:01 Tower kernel: sd 8:0:6:0: attempting task abort! scmd(00000000f8e68c0b)
Feb 16 17:35:01 Tower kernel: sd 8:0:6:0: [sdj] tag#0 CDB: opcode=0x88 88 00 00 00 00 00 00 61 c1 b0 00 00 00 08 00 00
Feb 16 17:35:01 Tower kernel: scsi target8:0:6: handle(0x0010), sas_address(0x50015b20780b9435), phy(21)
Feb 16 17:35:01 Tower kernel: scsi target8:0:6: enclosure logical id(0x50015b20780b943f), slot(3)
Feb 16 17:35:02 Tower kernel: sd 8:0:6:0: task abort: SUCCESS scmd(00000000f8e68c0b)
Feb 16 17:35:03 Tower kernel: sd 8:0:6:0: Power-on or device reset occurred

If I review the individual disk log I see repeated errors, yet the SMART data for the drive comes back all green. The only thing that looks a bit off in the report is the number of read recovery attempts.

HGST_HUH728080ALE600_2EGM5T4X-20190216-1739.txt
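The CDB in the second log line can be decoded by hand. Opcode 0x88 is the SCSI READ(16) command; in its 16-byte descriptor, bytes 2-9 hold the starting logical block address and bytes 10-13 the number of blocks to transfer, both big-endian. The helper below (`decode_read16` is a hypothetical name, not part of any tool) is a minimal sketch of that decoding:

```python
def decode_read16(cdb_hex: str) -> dict:
    """Decode a SCSI READ(16) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)  # fromhex skips the spaces between bytes
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a 16-byte READ(16) CDB")
    return {
        "lba": int.from_bytes(cdb[2:10], "big"),     # bytes 2-9: starting LBA
        "blocks": int.from_bytes(cdb[10:14], "big"),  # bytes 10-13: transfer length
    }

# CDB copied from the Feb 16 17:35:01 syslog line above
print(decode_read16("88 00 00 00 00 00 00 61 c1 b0 00 00 00 08 00 00"))
# {'lba': 6406576, 'blocks': 8}
```

So the command the controller aborted was a read of 8 logical blocks starting at LBA 6406576.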
trurl Posted February 17, 2019

Syslog snippets are seldom sufficient; post the complete diagnostics.
ssb201 Posted February 17, 2019

Diagnostics attached: tower-diagnostics-20190216-1915.zip
trurl Posted February 17, 2019

The disk isn't producing a SMART report in those diagnostics. How is it connected?

Also, there are a lot of UPS disconnections in your syslog. Any idea why?
ssb201 Posted February 18, 2019 (edited)

The disk is connected via the 12-port backplane that all the drives are connected to. The controller is a Dell PERC H200 flashed with IT firmware.

The UPS is not plugged in at the moment. I had to disconnect it to reset its memory when replacing the batteries and have not plugged it back in yet.

I see a million read errors in dmesg, but not a single one in SMART (the report posted above; not sure why it shows on the disk's own page but not in the diagnostics). No write errors at all. The only difference I can think of between this and the other disks is that this is an AF 4Kn drive. I could try swapping it to a different bay in the server, but I doubt it will make a difference.

Edited February 18, 2019 by ssb201
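With "a million" errors in dmesg, it can help to count them per device and see which sector range they cover. The sketch below assumes kernel lines in the `print_req_error: I/O error, dev sdX, sector N` form that this system prints (newer kernels word the message differently, so the regex may need adjusting); `tally_io_errors` is a hypothetical helper name:

```python
import re
from collections import defaultdict

# Matches kernel lines such as
# "print_req_error: I/O error, dev sdc, sector 25600"
IO_ERROR_RE = re.compile(r"print_req_error: I/O error, dev (\w+), sector (\d+)")

def tally_io_errors(log_text: str) -> dict:
    """Map each device name to (error count, lowest sector, highest sector)."""
    sectors = defaultdict(list)
    for match in IO_ERROR_RE.finditer(log_text):
        sectors[match.group(1)].append(int(match.group(2)))
    return {dev: (len(s), min(s), max(s)) for dev, s in sectors.items()}

# Line taken from the dmesg output posted later in this thread
sample = "[  369.829362] print_req_error: I/O error, dev sdc, sector 25600"
print(tally_io_errors(sample))  # {'sdc': (1, 25600, 25600)}
```

On a live system the full `dmesg` text would be passed in instead of `sample`; errors clustered on one device but spread over many sectors point away from a single bad spot on the platter.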
JorgeB Posted February 18, 2019

ssb201 said (8 hours earlier): "this is an AF 4Kn drive."

That could be the problem, since you're using an ancient LSI firmware: FWVersion(07.15.08.00). Update to the latest, 20.00.07.00.
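Checking whether a controller is still on the old firmware can be scripted against the `FWVersion(...)` string quoted above. A minimal sketch, assuming that string appears verbatim in the syslog and using 20.00.07.00 as the target; the function names are hypothetical. Versions are compared as integer tuples rather than strings so that, for example, "9.x" would not sort after "20.x":

```python
import re

# Matches the firmware string quoted above, e.g. "FWVersion(07.15.08.00)"
FW_RE = re.compile(r"FWVersion\((\d+(?:\.\d+)*)\)")

def parse_version(text: str) -> tuple:
    """Turn "07.15.08.00" into (7, 15, 8, 0) for numeric comparison."""
    return tuple(int(part) for part in text.split("."))

def firmware_is_outdated(log_text: str, target: str = "20.00.07.00") -> bool:
    match = FW_RE.search(log_text)
    if match is None:
        raise ValueError("no FWVersion(...) entry found")
    return parse_version(match.group(1)) < parse_version(target)

print(firmware_is_outdated("FWVersion(07.15.08.00)"))  # True
```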
ssb201 Posted February 19, 2019

After updating the firmware I no longer had any problems at startup, but my read speeds on the array plummeted to almost nothing. I tried removing the disk (since nothing of interest had been written to it yet) and rerunning preclear on it. As soon as the pre-read started, it was spitting out the same read errors. Still no issues showing in SMART, and I am baffled why it is only on reads, never writes.
ssb201 Posted February 28, 2019

New update: the drive shows as Not Installed (missing) from the array. For some reason it is still receiving writes to shares: when I extracted files to a share, they ended up being written to the array. I assume this is due to some weirdness with the union file system. What I do not get is why I was able to go directly to /mnt/disk5 and read and write files without any errors or problems.
JorgeB Posted February 28, 2019

Unraid emulates any missing disk; that's what parity is for.
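Emulation works because the parity disk stores a value that lets any one missing data disk be recomputed from the others. The toy sketch below assumes simple XOR (even) parity for the first parity disk; the second parity disk in a dual-parity array uses a different calculation that is not modelled here, and the disk contents are made up for illustration:

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Toy array: three data disks holding one 4-byte block each (made-up contents)
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(data)  # what the XOR parity disk would hold

# Disk 1 goes missing: its contents are rebuilt from parity plus all survivors
survivors = [d for i, d in enumerate(data) if i != 1]
emulated = xor_blocks(survivors + [parity])
assert emulated == data[1]
```

This is why reads from /mnt/disk5 still succeeded after the drive dropped out: every read of the missing disk is answered by reading the same block from all the other disks plus parity.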
ssb201 Posted February 28, 2019

Doh. That makes perfect sense. I am still not clear why I am getting read errors without any SMART issues at all. I will have to pull the drive and try it in some other systems.
trurl Posted February 28, 2019

Since you have dual parity and one missing disk, you still have parity protection, but only against one more disk failure. And any read or write to that missing disk will make all the other disks work harder, since they all have to be read to emulate it.
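The extra work on writes can be shown with the same XOR-only toy model (an assumption; a dual-parity array would also have to update its second parity disk, which is not shown). Because nothing can be written to the missing disk itself, the new data exists only in parity, and this sketch recomputes parity by reading every surviving data disk. The helper `write_to_emulated_disk` and the block contents are hypothetical:

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def write_to_emulated_disk(survivor_blocks, new_block):
    """Return (new XOR parity, number of data-disk reads) for a write aimed
    at a missing disk. Parity is recomputed from every surviving data disk,
    so each write costs one read per survivor plus the parity write."""
    reads = len(survivor_blocks)
    new_parity = xor_blocks(list(survivor_blocks) + [new_block])
    return new_parity, reads

survivors = [b"\x01\x02\x03\x04", b"\xaa\xbb\xcc\xdd"]  # made-up contents
new_data = b"\xde\xad\xbe\xef"
parity, reads = write_to_emulated_disk(survivors, new_data)

# Reading the data back again means reading every survivor plus parity.
assert xor_blocks(survivors + [parity]) == new_data
print(reads)  # 2
```

With more data disks the read count grows with the array, which is the "all the other disks work harder" effect described above.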
ssb201 Posted February 28, 2019

Yeah, I understand that there will be additional work for the drives until I replace it. I took the drive out and used it with a USB-SATA controller on Windows and it worked just fine. That leads me to suspect a controller problem, despite the firmware update. I am just puzzled, because I have other 512e drives working fine with the controller, and the drive is explicitly listed in the controller compatibility document: https://docs.broadcom.com/docs/IT-SAS-Gen2.5CompatibilityList

The two Hitachi drives that are working with the controller are:
HDN728080ALE604 - Deskstar - 512e SATA 6Gb/s - Secure Erase (overwrite only)
HUH728080AL4200 - Ultrastar - 4Kn SAS 12Gb/s - Instant Secure Erase

This one is not:
HUH728080ALE600 - Ultrastar - 512e SATA 6Gb/s - Instant Secure Erase

The Deskstar has the exact same interface (512e SATA 6Gb/s), and the other Ultrastar uses an even more advanced interface and supports the same Instant Secure Erase.

The only other idea I could come up with is that it has to do with the backplane expander, since this is a 12-port system and this is the first drive pushing it past the halfway count (drive number 7).

Any ideas what I could try next? I am racking my brains on this.
JorgeB Posted February 28, 2019

Try connecting that disk to an onboard SATA port, even if just to test and see if it makes a difference.
ssb201 Posted March 5, 2019

I hooked it up to the onboard SATA controller and saw ATA errors:

[  369.829354] sd 4:0:0:0: [sdc] tag#26 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x06
[  369.829360] sd 4:0:0:0: [sdc] tag#26 CDB: opcode=0x88 88 00 00 00 00 00 00 00 64 00 00 00 06 00 00 00
[  369.829362] print_req_error: I/O error, dev sdc, sector 25600

It still seems to work just fine on my Windows machine using a USB-SATA controller. I have given up trying to figure this puzzle out. I am ordering a new drive and will just use the problem child somewhere else. Thanks, everyone, for the ideas.
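Since 4K sectors came up earlier as a suspect, the failing request can be mapped onto the drive's physical sectors. This sketch assumes the 512e layout listed for the HUH728080ALE600 earlier in the thread (512-byte logical sectors on 4096-byte physical sectors) and takes the 1536-block length from bytes 10-13 of the READ(16) CDB above (00 00 06 00); the kernel's `sector 25600` is in 512-byte units, and it matches the CDB's LBA bytes (00 00 64 00 = 25600). `to_physical` is a hypothetical helper:

```python
LOGICAL_SECTOR = 512    # kernel sector unit, and logical block size on 512e
PHYSICAL_SECTOR = 4096  # physical sector size assumed for a 512e drive

def to_physical(sector: int, blocks: int) -> dict:
    """Map a request given in 512-byte sectors onto 4 KiB physical sectors."""
    start = sector * LOGICAL_SECTOR
    length = blocks * LOGICAL_SECTOR
    first = start // PHYSICAL_SECTOR
    last = (start + length - 1) // PHYSICAL_SECTOR
    return {
        "byte_offset": start,
        "first_physical": first,
        "physical_span": last - first + 1,
        "aligned": start % PHYSICAL_SECTOR == 0 and length % PHYSICAL_SECTOR == 0,
    }

print(to_physical(25600, 1536))
# {'byte_offset': 13107200, 'first_physical': 3200, 'physical_span': 192, 'aligned': True}
```

Under these assumptions the failing read starts 12.5 MiB into the disk and begins and ends on 4 KiB boundaries, covering 192 physical sectors.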