DocScot Posted February 7, 2021

I recently purchased an 8TB HGST SAS drive off eBay. I already have four of the exact same type installed (HUH728080AL), so I expected everything to be fine. My hardware setup is an NSFW 1.0 build with a Gigabyte GA-7PESH2, and my SFF8482 breakout cable was already fully occupied. I have some 8TB SATA drives in there too, so I just got an adaptor from SFF8482 to SFF8087 and hooked it up.

More out of habit than necessity I ran preclear on the new drive, which at first stalled and crashed during the pre-read phase. I suspected the cable (the extra length added by the adaptor pushed the SFF8087 connector right up against the case), so I swapped cable and adaptor with those of a different drive that is mounted elsewhere and doesn't have that problem. A second preclear run looked a bit better (pre-read worked fine), but the zeroing phase has been horribly slow: it pauses often, and when it does run it manages about 1MB/second.

Interestingly, nobody seems to have reported anything like this here, but I found a post on reddit that led me to follow up in a similar manner. Unfortunately, while the OP posted a great method for diagnosing the problem, they never posted a fix. I looked at the new HGST, confirmed with `sdparm` that its settings were identical to those of the drives I already have, and then followed the same line of inquiry as u/fmillion on reddit.

Unfortunately, `fio` (installed through Nerdtools) doesn't work and segfaults (it just says "Illegal exception"), which apparently isn't entirely unheard of and could be fixed (see https://github.com/dmacias72/unRAID-NerdPack/issues/14 and https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=898473#5), but that's a story for another time. Thankfully `sg3_utils` works and lets me reproduce and confirm that it does crash when using only `sg_write_buffer`, but not with `sg_write_verify`, as posted by u/fmillion on reddit.
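For anyone who wants to repeat the `sdparm` comparison between a known-good drive and the suspect one, here is a rough Python sketch. It assumes that `sdparm --all <device>` prints unindented page headings ending in a colon, followed by indented `ACRONYM value [...]` lines; that layout is my assumption, so adjust the regular expression if your output looks different. The function names are made up for this post.

```python
import re
import subprocess

# Assumed layout of a field line: leading whitespace, an upper-case acronym,
# then an integer value, e.g. "  WCE   1  [cha: y, def:  1, sav:  1]".
FIELD_RE = re.compile(r"^\s+([A-Z][A-Z0-9_]*)\s+(-?\d+)\b")


def read_mode_pages(device):
    """Return the text printed by `sdparm --all <device>` (usually needs root)."""
    result = subprocess.run(
        ["sdparm", "--all", device],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_fields(text):
    """Map (page heading, field acronym) -> current value as a string."""
    fields = {}
    page = ""
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            fields[(page, match.group(1))] = match.group(2)
        elif line.strip().endswith(":") and not line[:1].isspace():
            # Unindented line ending in ":" is treated as a page heading.
            page = line.strip().rstrip(":")
    return fields


def diff_fields(left, right):
    """Return {key: (left_value, right_value)} for every field that differs."""
    diffs = {}
    for key in sorted(set(left) | set(right)):
        if left.get(key) != right.get(key):
            diffs[key] = (left.get(key), right.get(key))
    return diffs
```

To use it, call something like `diff_fields(parse_fields(read_mode_pages("/dev/sdX")), parse_fields(read_mode_pages("/dev/sdY")))` with your own device names; an empty dict means no parsed field differs between the two drives.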
Frustratingly, `sg_write_buffer` will let you write new firmware (or, as it is called in this context, microcode; the `sg3_utils` docs at http://sg.danny.cz/sg/sg3_utils.html somewhat misleadingly refer to the process as "downloading firmware"), but there is no software-based way to read the firmware back (unless you read directly from the EPROM, which I'm not particularly inclined to do). So my next step will be to get in touch with the HGST helpdesk to get a proper firmware image and then write that.

As a further point of documentation and a lead-in to my actual question(s):

- preclear zeros using `dd`, which `sg_write_buffer` simulates. This further suggests that this is why I'm seeing a bunch of errors along these lines every few seconds:
  Feb 6 04:43:37 Tower kernel: print_req_error: critical medium error, dev sdb, sector 125388072
  Feb 6 04:43:37 Tower kernel: Buffer I/O error on dev sdb, logical block 15673509, lost async page write
- apparently the author of `sg3_utils` also wrote a dd variant, `ddpt` (http://sg.danny.cz/sg/ddpt.html), that supports a verify flag whilst writing. I have considered amending preclear and posting a pull request to catch this case, but I wonder: would the disk run fine once included in the array?
- has anybody experienced a situation like this before and solved it without calling the helpdesk?

While I have ordered a new drive (just in case), I will keep this thread updated as I learn more. I hope it serves as documentation and as help for others, should they run into a similar situation.
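A side note on the two kernel messages quoted above: they appear to describe the same spot on the disk. Sector 125388072 divided by 8 is exactly 15673509, which fits the sector being counted in 512-byte units and the "logical block" in 4096-byte units. Those unit sizes are my assumption based on the numbers, not something the log states. A small Python sketch (names made up for this post) that turns such lines into byte offsets so they can be compared:

```python
import re

# Assumed units: 512-byte sectors in the request error line and
# 4096-byte blocks in the "Buffer I/O error" line.
SECTOR_BYTES = 512
BLOCK_BYTES = 4096

SECTOR_RE = re.compile(r"dev (\w+), sector (\d+)")
BLOCK_RE = re.compile(r"Buffer I/O error on dev (\w+), logical block (\d+)")


def error_offsets(log_lines):
    """Return a list of (device, kind, byte_offset) for recognised error lines."""
    found = []
    for line in log_lines:
        match = BLOCK_RE.search(line)
        if match:
            found.append((match.group(1), "block", int(match.group(2)) * BLOCK_BYTES))
            continue
        match = SECTOR_RE.search(line)
        if match:
            found.append((match.group(1), "sector", int(match.group(2)) * SECTOR_BYTES))
    return found


LOG = [
    "Feb 6 04:43:37 Tower kernel: print_req_error: critical medium error, dev sdb, sector 125388072",
    "Feb 6 04:43:37 Tower kernel: Buffer I/O error on dev sdb, logical block 15673509, lost async page write",
]

for device, kind, offset in error_offsets(LOG):
    print(device, kind, offset)
```

If both lines print the same byte offset, the "lost async page write" is simply the page-cache view of the same failed write request rather than a second, separate error.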