DocScot

Everything posted by DocScot

  1. I recently purchased an 8TB HGST SAS drive off eBay. I already have 4 of the exact same type installed (HUH728080AL), so I expected everything to be fine. My hardware setup is an NSFW 1.0 build with a Gigabyte GA-7PESH2, and my SFF8482 breakout cable was already fully occupied. I have some 8TB SATA drives in there too, so I just got an adaptor from SFF8482 to SFF8087 and hooked it up.

More out of habit than necessity I ran preclear on the new drive, which at first stalled out and crashed during the pre-read phase. I figured the cable might have some problem (the extra length of the adaptor pushed the SFF8087 connector right up against the case), so I swapped cable and adaptor with a different drive (one that is fastened somewhere else and won't have that problem). A second run of preclear looked a bit better (pre-read worked fine), but it has been horribly slow during the zeroing phase (it pauses often, and when it runs it does so at 1MB/second).

Interestingly, this doesn't seem to be anything that others have reported here, but I found a post on reddit that led me to follow up in a similar manner. Unfortunately, while the OP posted a great method of diagnosing the problem, they never posted a resolution. I looked at the new HGST, confirmed with `sdparm` that its settings were identical to those of the drives I already have, and then followed the same line of inquiry as u/fmillion on reddit.

Unfortunately, `fio` (installed through Nerdtools) doesn't work and throws a segfault (it just says "Illegal exception"), which apparently isn't entirely unheard of and could be fixed (see here: https://github.com/dmacias72/unRAID-NerdPack/issues/14 and https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=898473#5), but that's a story for another time. Thankfully `sg3_utils` works and lets me reproduce and confirm that it does crash when only using `sg_write_buffer`, but not `sg_write_verify`, as posted by u/fmillion on reddit.
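For anyone wanting to repeat the `sdparm` comparison between a known-good drive and the suspect one, here is a minimal Python sketch. It is not what I ran by hand, just one way to automate it. The helper names (`sdparm_dump`, `diff_settings`) are made up for illustration, and it assumes that `sdparm --all` echoes the device path in a header line, which is why lines mentioning the path are dropped before diffing.

```python
import difflib
import subprocess


def sdparm_dump(device):
    """Return the text printed by `sdparm --all <device>` (usually needs root)."""
    result = subprocess.run(
        ["sdparm", "--all", device],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def diff_settings(text_a, dev_a, text_b, dev_b):
    """Return unified-diff lines between two dumps; an empty list means identical.

    Lines that mention the device path are skipped, on the assumption that
    they are header lines which always differ between two drives.
    """
    lines_a = [l for l in text_a.splitlines() if dev_a not in l]
    lines_b = [l for l in text_b.splitlines() if dev_b not in l]
    return list(difflib.unified_diff(
        lines_a, lines_b, fromfile=dev_a, tofile=dev_b, lineterm="",
    ))
```

Usage would be something like `diff_settings(sdparm_dump("/dev/sdb"), "/dev/sdb", sdparm_dump("/dev/sdc"), "/dev/sdc")`, where `/dev/sdc` is a placeholder for whichever known-good drive you compare against.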
Frustratingly, `sg_write_buffer` will allow you to write new firmware (or, as it is called in this situation, microcode; in the `sg3_utils` docs, http://sg.danny.cz/sg/sg3_utils.html, the process is somewhat misleadingly referred to as "downloading firmware"), but there is no software-based process to read the firmware back (unless you read directly from the EPROM, something I'm not particularly inclined to do). So my next step will be to get in touch with the HGST helpdesk to get a proper firmware and then write that.

Perhaps as a further point of documentation and a lead-in to my actual question(s):

- preclear zeros using `dd`, which `sg_write_buffer` simulates. This further suggests that this is why I'm seeing a bunch of errors along these lines every few seconds:

    Feb 6 04:43:37 Tower kernel: print_req_error: critical medium error, dev sdb, sector 125388072
    Feb 6 04:43:37 Tower kernel: Buffer I/O error on dev sdb, logical block 15673509, lost async page write

- apparently the author of `sg3_utils` also wrote a dd variant, `ddpt` (http://sg.danny.cz/sg/ddpt.html), that allows using a verify flag whilst writing. I have considered amending preclear and posting a pull request to catch this case, but I wonder: would the disk run fine once included in the array?
- has anybody experienced a situation like this before and solved it without calling the helpdesk?

While I have ordered a new drive (just in case), I will keep this thread updated as I learn more, and I hope that it serves as documentation and as help for others, should they run into a similar situation.
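As a side note on the two kernel messages quoted above: they appear to describe the same spot on the disk. Below is a small Python sketch that pulls the numbers out of such lines and converts the sector to a block number. The 512-byte sector unit and the 4096-byte block size are assumptions on my part (they happen to match the numbers in my log), and the function names are just for illustration.

```python
import re

# Patterns for the two kinds of kernel message quoted in the post.
MEDIUM_ERR = re.compile(r"critical medium error, dev (\w+), sector (\d+)")
BUFFER_ERR = re.compile(r"Buffer I/O error on dev (\w+), logical block (\d+)")

SECTOR_SIZE = 512   # assumption: sectors are reported in 512-byte units
BLOCK_SIZE = 4096   # assumption: "logical block" means a 4 KiB block


def parse_errors(lines):
    """Return (medium_errors, buffer_errors) as lists of (device, number)."""
    medium, buffered = [], []
    for line in lines:
        m = MEDIUM_ERR.search(line)
        if m:
            medium.append((m.group(1), int(m.group(2))))
            continue
        m = BUFFER_ERR.search(line)
        if m:
            buffered.append((m.group(1), int(m.group(2))))
    return medium, buffered


def sector_to_block(sector):
    """Convert a 512-byte sector number to a 4 KiB block number."""
    return sector * SECTOR_SIZE // BLOCK_SIZE


LOG = [
    "Feb 6 04:43:37 Tower kernel: print_req_error: critical medium error, dev sdb, sector 125388072",
    "Feb 6 04:43:37 Tower kernel: Buffer I/O error on dev sdb, logical block 15673509, lost async page write",
]

medium, buffered = parse_errors(LOG)
for dev, sector in medium:
    # 125388072 * 512 // 4096 == 15673509, the block in the second message
    print(dev, sector, sector_to_block(sector))
```

Feeding a whole syslog through `parse_errors` would also show whether the failing sectors cluster in one region or are spread across the disk.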