Bait Fish Posted May 1, 2022 (edited)

Good day everyone. My turn at asking for help. Diagnostics attached.

My 5-year-old Seagate 8TB drive (sdi) seems to have failed in the array this morning with 1024 errors. Luckily, I guess, it was holding a minimal amount of data. Before really reading up, I started Unbalance to move what little data it held on it to the other drives. I'm guessing this is for naught, since I'm running emulated. I'll find a replacement drive ASAP and try to get it in there today.

In the meantime, what's your take on the diags? This has been working for months, since last November. I haven't cracked the case open, and I don't suspect a cable or some other cause from me touching anything. It's just been... working, until this morning.

Edit: added info, modified title

homer-diagnostics-20220501-0722.zip

Edited May 1, 2022 by Bait Fish
JorgeB Posted May 2, 2022

Disk appears to be failing; you can run an extended SMART test to confirm.
Bait Fish Posted May 2, 2022 (edited)

Okay, I'm back. This should be my last edit.

I do not think the extended test is completing, and I am not sure the downloaded SMART report will show that. This last time, the third run of the extended test, I sat and watched the progress, and it appeared to stop on its own. I feel like the extended test should run for quite some time on an 8TB disk, not 10 minutes: the drive capabilities section states "Extended self-test routine recommended polling time: 937 minutes" (about 15.6 hours).

Below is what I have observed. I have also attached the last three SMART reports (download button), and further below is the latest data from the Attributes table. Hope this helps diagnose.

While I watch the progress of the SMART extended test (the short test button greys out), the most progress observed is "Self-test in progress, 10% complete". Then, maybe 10 minutes later, I notice it says "Last SMART test result: No self-tests logged on this disk". Refreshing the page shows a new status, "Last SMART test result: Aborted by host" (text colored orange). Further details from the page follow. I did not capture a downloaded report from the first extended test, four runs ago.

SMART self-test history:

Num  Test_Description  Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline  Aborted by host          90%        47049            -
# 2  Extended offline  Aborted by host          90%        47048            -
# 3  Extended offline  Aborted by host          90%        47047            -
# 4  Extended offline  Aborted by host          90%        47047            -
# 5  Short offline     Completed without error  00%        21432            -

SMART error log: No Errors Logged

Attributes before the last test (#197 and #198 were highlighted in gold):

#   Attribute name            Flag    Value  Worst  Threshold  Type      Updated  Failed       Raw value
1   Raw read error rate       0x000f  105    099    006        Pre-fail  Always   Never        7920184
3   Spin up time              0x0003  092    090    000        Pre-fail  Always   Never        0
4   Start stop count          0x0032  100    100    020        Old age   Always   Never        805
5   Reallocated sector count  0x0033  100    100    010        Pre-fail  Always   Never        0
7   Seek error rate           0x000f  076    060    030        Pre-fail  Always   Never        91008406049
9   Power on hours            0x0032  047    047    000        Old age   Always   Never        47048 (5y, 4m, 13d, 8h)
10  Spin retry count          0x0013  100    100    097        Pre-fail  Always   Never        0
12  Power cycle count         0x0032  100    100    020        Old age   Always   Never        293
183 Runtime bad block         0x0032  100    100    000        Old age   Always   Never        0
184 End-to-end error          0x0032  100    100    099        Old age   Always   Never        0
187 Reported uncorrect        0x0032  100    100    000        Old age   Always   Never        0
188 Command timeout           0x0032  100    100    000        Old age   Always   Never        1
189 High fly writes           0x003a  100    100    000        Old age   Always   Never        0
190 Airflow temperature cel   0x0022  069    033    045        Old age   Always   In the past  31 (255 255 36 27 0)
191 G-sense error rate        0x0032  100    100    000        Old age   Always   Never        0
192 Power-off retract count   0x0032  100    100    000        Old age   Always   Never        583
193 Load cycle count          0x0032  085    085    000        Old age   Always   Never        31233
194 Temperature celsius       0x0022  031    067    000        Old age   Always   Never        31 (0 19 0 0 0)
195 Hardware ECC recovered    0x001a  105    099    000        Old age   Always   Never        7920184
197 Current pending sector    0x0012  098    098    000        Old age   Always   Never        776
198 Offline uncorrectable     0x0010  098    098    000        Old age   Offline  Never        776
199 UDMA CRC error count      0x003e  200    200    000        Old age   Always   Never        0
240 Head flying hours         0x0000  100    253    000        Old age   Offline  Never        17295 (178 106 0)
241 Total lbas written        0x0000  100    253    000        Old age   Offline  Never        94608024951
242 Total lbas read           0x0000  100    253    000        Old age   Offline  Never        3333974639875

Attributes after the last test (again, #197 and #198 were highlighted in gold):

#   Attribute name            Flag    Value  Worst  Threshold  Type      Updated  Failed       Raw value
1   Raw read error rate       0x000f  105    099    006        Pre-fail  Always   Never        7920184
3   Spin up time              0x0003  092    090    000        Pre-fail  Always   Never        0
4   Start stop count          0x0032  100    100    020        Old age   Always   Never        807
5   Reallocated sector count  0x0033  100    100    010        Pre-fail  Always   Never        0
7   Seek error rate           0x000f  076    060    030        Pre-fail  Always   Never        91008485358
9   Power on hours            0x0032  047    047    000        Old age   Always   Never        47049 (5y, 4m, 13d, 9h)
10  Spin retry count          0x0013  100    100    097        Pre-fail  Always   Never        0
12  Power cycle count         0x0032  100    100    020        Old age   Always   Never        293
183 Runtime bad block         0x0032  100    100    000        Old age   Always   Never        0
184 End-to-end error          0x0032  100    100    099        Old age   Always   Never        0
187 Reported uncorrect        0x0032  100    100    000        Old age   Always   Never        0
188 Command timeout           0x0032  100    100    000        Old age   Always   Never        1
189 High fly writes           0x003a  100    100    000        Old age   Always   Never        0
190 Airflow temperature cel   0x0022  066    033    045        Old age   Always   In the past  34 (255 255 36 27 0)
191 G-sense error rate        0x0032  100    100    000        Old age   Always   Never        0
192 Power-off retract count   0x0032  100    100    000        Old age   Always   Never        588
193 Load cycle count          0x0032  085    085    000        Old age   Always   Never        31241
194 Temperature celsius       0x0022  034    067    000        Old age   Always   Never        34 (0 19 0 0 0)
195 Hardware ECC recovered    0x001a  105    099    000        Old age   Always   Never        7920184
197 Current pending sector    0x0012  098    098    000        Old age   Always   Never        776
198 Offline uncorrectable     0x0010  098    098    000        Old age   Offline  Never        776
199 UDMA CRC error count      0x003e  200    200    000        Old age   Always   Never        0
240 Head flying hours         0x0000  100    253    000        Old age   Offline  Never        17296 (164 225 0)
241 Total lbas written        0x0000  100    253    000        Old age   Offline  Never        94608024951
242 Total lbas read           0x0000  100    253    000        Old age   Offline  Never        3333974639875

homer-smart-20220502-0943[1008].zip
homer-smart-20220502-0943[0951].zip
homer-smart-20220502-0821.zip

Edited May 2, 2022 by Bait Fish: figuring out what's really going on with these extended tests and posting any info I can.
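The self-test history above can also be checked programmatically. Below is a minimal Python sketch, assuming the row layout shown in the post (`# N <type> offline <status> <remaining>% <hours> <lba>`); `parse_self_test_log` and `extended_test_completed` are hypothetical helpers, not part of smartctl or Unraid. Run against the posted log, it shows every extended run stopping with 90% remaining, while the only completed test is the old short one.

```python
import re

# Hypothetical parser for self-test rows in the layout quoted above, e.g.
# "# 1  Extended offline  Aborted by host  90%  47049  -".
# Assumption: only "offline" rows are recognised; other smartctl variants are not handled.
ROW = re.compile(
    r"#\s*(\d+)\s+(Short|Extended|Conveyance)\s+offline\s+"
    r"(.+?)\s+(\d+)%\s+(\d+)\s+(\S+)"
)


def parse_self_test_log(text):
    """Return one dict per self-test row found in `text`."""
    entries = []
    for m in ROW.finditer(text):
        num, kind, status, remaining, hours, lba = m.groups()
        entries.append({
            "num": int(num),
            "type": kind,
            "status": status,
            "remaining_pct": int(remaining),
            "lifetime_hours": int(hours),
            "first_error_lba": None if lba == "-" else lba,
        })
    return entries


def extended_test_completed(entries):
    """True if any extended test in the log finished."""
    return any(
        e["type"] == "Extended" and e["status"].startswith("Completed")
        for e in entries
    )


# Self-test history exactly as posted (flattened onto one line).
LOG = (
    "# 1 Extended offline Aborted by host 90% 47049 - "
    "# 2 Extended offline Aborted by host 90% 47048 - "
    "# 3 Extended offline Aborted by host 90% 47047 - "
    "# 4 Extended offline Aborted by host 90% 47047 - "
    "# 5 Short offline Completed without error 00% 21432 -"
)

entries = parse_self_test_log(LOG)
for e in entries:
    print(e["num"], e["type"], e["status"], f'{e["remaining_pct"]}% left')
print("extended test completed:", extended_test_completed(entries))

# Recommended polling time (937 minutes) from the drive's capabilities section.
print(f"expected extended test duration: {937 / 60:.1f} h")
```

The regex is deliberately loose about spacing so it accepts both the flattened text copied from the web page and aligned `smartctl` output.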
itimpi Posted May 2, 2022

Anything other than 0 for Pending Sectors is never a good sign, and with the number you have, I would think the drive could fail completely any time now.
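To make that rule of thumb concrete, here is a minimal Python sketch that flags non-zero raw values for a small watch list of attributes. The watch list (5, 187, 197, 198) is an assumption chosen for illustration, not an Unraid or Seagate threshold, and `smart_warnings` is a hypothetical helper. The raw values are copied from the attribute tables posted earlier.

```python
# Hypothetical watch list: attributes where any non-zero raw value is commonly
# treated as a warning sign. A rule of thumb, not an Unraid or Seagate rule.
WATCHED = {
    5: "Reallocated sector count",
    187: "Reported uncorrect",
    197: "Current pending sector",
    198: "Offline uncorrectable",
}


def raw_int(raw):
    """Take the leading integer of a raw value such as '47048 (5y, 4m, 13d, 8h)'."""
    return int(raw.split()[0])


def smart_warnings(raw_values):
    """raw_values maps attribute ID -> RAW_VALUE string as shown in the table."""
    warnings = []
    for attr_id, name in sorted(WATCHED.items()):
        raw = raw_values.get(attr_id)
        if raw is not None and raw_int(raw) > 0:
            warnings.append(f"{attr_id} {name}: {raw_int(raw)}")
    return warnings


# Raw values copied from the "after" attribute table (subset).
sdi = {5: "0", 9: "47049 (5y, 4m, 13d, 9h)", 187: "0", 197: "776", 198: "776"}
for line in smart_warnings(sdi):
    print(line)
```

For this drive only 197 and 198 are flagged, each at 776, while reallocated sectors (5) are still 0: the drive has many sectors it could not read and has not yet remapped.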
trurl Posted May 2, 2022

On 5/1/2022 at 10:50 AM, Bait Fish said:
"started Unbalance to move what little data it held on it to the other drives. I'm guessing this is for naught, since I'm running emulated"

Might even be considered a bad idea, since with a disabled disk and single parity, you have no protection. And of course, you are making all the other disks work much harder due to emulation.

3 hours ago, Bait Fish said:
"# 1 Extended offline Aborted by host 90% 47049 -"

You would probably have to disable spindown on the disk to get it to complete. Moot point, though, because:

36 minutes ago, itimpi said:
"Anything other than 0 for Pending Sectors is never a good sign, and with the number you have I would think the drive could fail completely any time now."
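The single-parity point can be seen in a toy model. Unraid's single parity disk works on the XOR principle: it holds the XOR of the corresponding bits on all data disks, so a disabled disk is emulated by XOR-ing parity with every remaining data disk. The Python sketch below uses three hypothetical three-byte "disks"; it illustrates the principle only and is not Unraid's implementation.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Toy data: one block from each of three hypothetical data disks.
disks = [b"\x10\x20\x30", b"\x0f\x0e\x0d", b"\xaa\xbb\xcc"]
parity = xor_blocks(disks)

# Disk 1 is disabled; its contents are rebuilt from parity and all other disks.
emulated = xor_blocks([parity, disks[0], disks[2]])
print(emulated == disks[1])  # True
```

Every read of the emulated disk touches every other disk in the array, which is the extra work mentioned above. If a second disk dropped out while the first is emulated, one XOR equation would have two unknowns, and neither disk could be recovered.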
Bait Fish Posted May 3, 2022

Thanks for the tips and insights. Making progress now that spindown is disabled. So simple... I'll remember this next time.

Your cautions spurred me to quit the new disk-intensive activity I started today. Now most everything is stopped, and disk activity is at a minimum. The new replacement 8TB drive's preclear is going to finish soon, with 2% of the post-read left. To get the array back to normal sooner, I'll play it safe by adding the new drive first and rebuilding, then testing the failing drive later while it's unassigned.
Bait Fish Posted May 4, 2022

Following up. I did not get a good extended test log via Unraid. Instead, I swapped the suspect drive out and got its replacement going first. Attempts at testing it as an external drive kept failing, but without a log as far as I could tell. Scanning it with SeaTools on Windows ended with a failure in the long test, warning that the drive is... failing.

Thanks again, all of you, for your help. Unraid, and its community, are awesome!

--------------- SeaTools for Windows v1.4.0.7 ---------------
5/24/2019 3:17:39 PM
Model Number: Seagate Backup+ Hub BK
Serial Number: NA8TGN5Z
Firmware Revision: D781
SMART - Started 5/24/2019 3:17:39 PM
SMART - Pass 5/24/2019 3:17:45 PM
Short DST - Started 5/24/2019 3:17:52 PM
Short DST - Pass 5/24/2019 3:18:58 PM
Identify - Started 5/24/2019 3:19:03 PM
Short Generic - Started 5/24/2019 3:25:19 PM
Short Generic - Pass 5/24/2019 3:26:31 PM
Identify - Started 5/3/2022 4:36:31 PM
Short DST - Started 5/3/2022 4:36:46 PM
Short DST - Pass 5/3/2022 4:37:58 PM
Short Generic - Started 5/3/2022 4:38:45 PM
Short Generic - Pass 5/3/2022 4:40:27 PM
Long Generic - Started 5/3/2022 4:41:42 PM
Long Generic - FAIL 5/4/2022 1:18:40 AM
SeaTools Test Code: E896A6D4
trurl Posted May 4, 2022

On 5/1/2022 at 10:50 AM, Bait Fish said:
"Unbalance to move what little data it held on it to the other drives"

A safer approach would be to copy (not move) the data to somewhere other than the array. Then nothing is written, and emulation only has to read all disks.

You must always have backups of anything important and irreplaceable, even if everything is working well.