
Disk Errors Caused Disabled Drive in Array



Good day everyone. My turn at asking for help. Diagnostics attached.

 

My 5-year-old Seagate 8TB drive (sdi) seems to have failed in the array this morning with 1024 errors. Luckily, I guess, it was holding very little data.

 

Before really reading up, I started Unbalance to move what little data it held to the other drives. I'm guessing this is for naught, since the disk is now being emulated. I'll find a replacement drive ASAP and try to get that in there today.

 

In the meantime, what's your take looking at the diags?

 

This has been working for months, since last November. I haven't cracked the case open and do not suspect a cable or some other cause from me touching anything. It's just been. . . working, until this morning.

 

 

Edit: added info, modified title

homer-diagnostics-20220501-0722.zip

Edited by Bait Fish
Link to comment
  • Bait Fish changed the title to Disk Errors Caused Disabled Drive in Array

Okay, I'm back. This should be my last edit.

 

I do not think the extended test is completing, and I am not sure the downloaded SMART report will show that. On this latest run, the third time I ran the extended test, I sat and watched the progress. It appeared to stop on its own. I would expect the extended test to run for quite some time on an 8TB disk, not 10 minutes. The drive capabilities section states "Extended self-test routine recommended polling time: 937 minutes."

 

Below is what I have observed. I have also attached the last three SMART reports (download button). Further below are the latest values from the Attributes table. Hope this helps with the diagnosis.

 

While I watched the progress of the SMART extended test (the short test button greys out), the most progress I observed was "self-test in progress, 10% complete."


Then, maybe 10 minutes later, I noticed it said:

Last SMART test result:
No self-tests logged on this disk

 

Refreshing the page shows a new status,

Last SMART test result:

Aborted by host (text colored orange)

 

Further details from the page follow. I did not capture a downloaded report from the first extended test, four times ago.

 

SMART self-test history:

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Aborted by host               90%     47049         -
# 2  Extended offline    Aborted by host               90%     47048         -
# 3  Extended offline    Aborted by host               90%     47047         -
# 4  Extended offline    Aborted by host               90%     47047         -
# 5  Short offline       Completed without error       00%     21432         -
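
For anyone who would rather check this history from a script than by eye, here is a minimal sketch that parses the rows above and picks out the aborted runs. It assumes the rows keep the column layout shown here (fields separated by two or more spaces); the regular expression and the function name `parse_selftest_log` are illustrative, not part of Unraid or smartmontools.

```python
import re

# Self-test history rows copied from the table above.
SELFTEST_LOG = """\
# 1  Extended offline    Aborted by host               90%     47049         -
# 2  Extended offline    Aborted by host               90%     47048         -
# 3  Extended offline    Aborted by host               90%     47047         -
# 4  Extended offline    Aborted by host               90%     47047         -
# 5  Short offline       Completed without error       00%     21432         -
"""

# Assumption: description and status are separated by runs of 2+ spaces.
ROW = re.compile(
    r"^#\s*(?P<num>\d+)\s+(?P<test>.+?)\s{2,}(?P<status>.+?)\s{2,}"
    r"(?P<remaining>\d+)%\s+(?P<hours>\d+)\s+(?P<lba>\S+)\s*$"
)


def parse_selftest_log(text):
    """Return one dict per self-test row found in the text."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # skip headers and blank lines
        rows.append({
            "num": int(match["num"]),
            "test": match["test"],
            "status": match["status"],
            "remaining": int(match["remaining"]),
            "hours": int(match["hours"]),
            "lba": match["lba"],
        })
    return rows


rows = parse_selftest_log(SELFTEST_LOG)
aborted = [row for row in rows if row["status"].startswith("Aborted")]
for row in aborted:
    print(f"#{row['num']} {row['test']}: {row['status']} "
          f"with {row['remaining']}% remaining at {row['hours']} h")
```

All four extended runs stop with 90% remaining, within about two power-on hours of each other, which matches what I saw on the page.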
 

SMART error log:
No Errors Logged


Attributes [before the last test. Also, highlighted in gold are #197 and #198]

#    ATTRIBUTE NAME    FLAG    VALUE    WORST    THRESHOLD    TYPE    UPDATED    FAILED    RAW VALUE
1    Raw read error rate    0x000f    105    099    006    Pre-fail    Always    Never    7920184
3    Spin up time    0x0003    092    090    000    Pre-fail    Always    Never    0
4    Start stop count    0x0032    100    100    020    Old age    Always    Never    805
5    Reallocated sector count    0x0033    100    100    010    Pre-fail    Always    Never    0
7    Seek error rate    0x000f    076    060    030    Pre-fail    Always    Never    91008406049
9    Power on hours    0x0032    047    047    000    Old age    Always    Never    47048 (5y, 4m, 13d, 8h)
10    Spin retry count    0x0013    100    100    097    Pre-fail    Always    Never    0
12    Power cycle count    0x0032    100    100    020    Old age    Always    Never    293
183    Runtime bad block    0x0032    100    100    000    Old age    Always    Never    0
184    End-to-end error    0x0032    100    100    099    Old age    Always    Never    0
187    Reported uncorrect    0x0032    100    100    000    Old age    Always    Never    0
188    Command timeout    0x0032    100    100    000    Old age    Always    Never    1
189    High fly writes    0x003a    100    100    000    Old age    Always    Never    0
190    Airflow temperature cel    0x0022    069    033    045    Old age    Always    In the past    31 (255 255 36 27 0)
191    G-sense error rate    0x0032    100    100    000    Old age    Always    Never    0
192    Power-off retract count    0x0032    100    100    000    Old age    Always    Never    583
193    Load cycle count    0x0032    085    085    000    Old age    Always    Never    31233
194    Temperature celsius    0x0022    031    067    000    Old age    Always    Never    31 (0 19 0 0 0)
195    Hardware ECC recovered    0x001a    105    099    000    Old age    Always    Never    7920184
197    Current pending sector    0x0012    098    098    000    Old age    Always    Never    776
198    Offline uncorrectable    0x0010    098    098    000    Old age    Offline    Never    776
199    UDMA CRC error count    0x003e    200    200    000    Old age    Always    Never    0
240    Head flying hours    0x0000    100    253    000    Old age    Offline    Never    17295 (178 106 0)
241    Total lbas written    0x0000    100    253    000    Old age    Offline    Never    94608024951
242    Total lbas read    0x0000    100    253    000    Old age    Offline    Never    3333974639875

 

Attributes [after the last test. Again, highlighted in gold are #197 and #198]

#	ATTRIBUTE NAME	FLAG	VALUE	WORST	THRESHOLD	TYPE	UPDATED	FAILED	RAW VALUE
1	Raw read error rate	0x000f	105	099	006	Pre-fail	Always	Never	7920184
3	Spin up time	0x0003	092	090	000	Pre-fail	Always	Never	0
4	Start stop count	0x0032	100	100	020	Old age	Always	Never	807
5	Reallocated sector count	0x0033	100	100	010	Pre-fail	Always	Never	0
7	Seek error rate	0x000f	076	060	030	Pre-fail	Always	Never	91008485358
9	Power on hours	0x0032	047	047	000	Old age	Always	Never	47049 (5y, 4m, 13d, 9h)
10	Spin retry count	0x0013	100	100	097	Pre-fail	Always	Never	0
12	Power cycle count	0x0032	100	100	020	Old age	Always	Never	293
183	Runtime bad block	0x0032	100	100	000	Old age	Always	Never	0
184	End-to-end error	0x0032	100	100	099	Old age	Always	Never	0
187	Reported uncorrect	0x0032	100	100	000	Old age	Always	Never	0
188	Command timeout	0x0032	100	100	000	Old age	Always	Never	1
189	High fly writes	0x003a	100	100	000	Old age	Always	Never	0
190	Airflow temperature cel	0x0022	066	033	045	Old age	Always	In the past	34 (255 255 36 27 0)
191	G-sense error rate	0x0032	100	100	000	Old age	Always	Never	0
192	Power-off retract count	0x0032	100	100	000	Old age	Always	Never	588
193	Load cycle count	0x0032	085	085	000	Old age	Always	Never	31241
194	Temperature celsius	0x0022	034	067	000	Old age	Always	Never	34 (0 19 0 0 0)
195	Hardware ECC recovered	0x001a	105	099	000	Old age	Always	Never	7920184
197	Current pending sector	0x0012	098	098	000	Old age	Always	Never	776
198	Offline uncorrectable	0x0010	098	098	000	Old age	Offline	Never	776
199	UDMA CRC error count	0x003e	200	200	000	Old age	Always	Never	0
240	Head flying hours	0x0000	100	253	000	Old age	Offline	Never	17296 (164 225 0)
241	Total lbas written	0x0000	100	253	000	Old age	Offline	Never	94608024951
242	Total lbas read	0x0000	100	253	000	Old age	Offline	Never	3333974639875
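
To see at a glance what moved between the two snapshots, the raw values can be compared directly. This is only a rough sketch: the dictionaries hold a handful of raw values copied by hand from the two tables above, keyed by attribute ID, and the helper name `changed_attributes` is made up for illustration.

```python
# Raw values copied from the "before" and "after" tables above,
# keyed by SMART attribute ID (only a subset is included here).
before = {4: 805, 5: 0, 9: 47048, 192: 583, 193: 31233,
          197: 776, 198: 776, 199: 0}
after = {4: 807, 5: 0, 9: 47049, 192: 588, 193: 31241,
         197: 776, 198: 776, 199: 0}


def changed_attributes(old, new):
    """Return {attribute_id: (old_raw, new_raw)} for IDs whose raw value differs."""
    shared_ids = sorted(set(old) & set(new))
    return {attr_id: (old[attr_id], new[attr_id])
            for attr_id in shared_ids
            if old[attr_id] != new[attr_id]}


changes = changed_attributes(before, after)
for attr_id, (old_raw, new_raw) in changes.items():
    print(f"attribute {attr_id}: {old_raw} -> {new_raw}")
```

Pending sectors (#197) and offline uncorrectable (#198) stay at 776, and reallocated sectors (#5) stay at 0. The start/stop count (#4) and power-off retract count (#192) both went up during that hour, which might mean the disk was spinning down during the test, though that is only a guess from these numbers.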

 

homer-smart-20220502-0943[1008].zip homer-smart-20220502-0943[0951].zip homer-smart-20220502-0821.zip

Edited by Bait Fish
Figuring out what's really going on with these extended tests and posting any info I can.
Link to comment
On 5/1/2022 at 10:50 AM, Bait Fish said:

started Unbalance to move what little data it held to the other drives. I'm guessing this is for naught, since the disk is now being emulated

It might even be considered a bad idea, since with a disabled disk and single parity, you have no protection. And of course, you are making all the other disks work much harder due to emulation.

 

3 hours ago, Bait Fish said:

# 1  Extended offline    Aborted by host               90%     47049         -

You would probably have to disable spindown on the disk to get it to complete.

 

Moot point though because

36 minutes ago, itimpi said:

Anything other than 0 for Pending Sectors is never a good sign, and with the number you have I would think the drive could fail completely any time now.

 

 

Link to comment

Thanks for the tips and insights. Making progress now that spin down is disabled. So simple. . . I'll remember this next time.

 

Your cautions spurred me to stop the new disk-intensive activity I started today. Now almost everything is stopped and disk activity is at a minimum.

 

The new replacement 8TB drive's preclear is going to finish soon, 2% post-read left. To get the array to normal sooner, I'll play it safe by adding the new drive in first, rebuilding, then testing the failing drive later while unassigned.

 

 

Link to comment

Following up. I did not get a good extended test log via Unraid. Instead, I swapped the suspect drive out and got its replacement going first. Attempts at testing it as an external drive kept failing, but without a log as far as I could tell. Scanning it with SeaTools on Windows ended with a failure in the long test, warning that the drive is... failing. Thanks again, all of you, for your help. Unraid, and its community, are awesome!

 

--------------- SeaTools for Windows v1.4.0.7 ---------------
5/24/2019 3:17:39 PM
Model Number: Seagate Backup+ Hub BK
Serial Number: NA8TGN5Z
Firmware Revision: D781
SMART - Started 5/24/2019 3:17:39 PM
SMART - Pass 5/24/2019 3:17:45 PM
Short DST - Started 5/24/2019 3:17:52 PM
Short DST - Pass 5/24/2019 3:18:58 PM
Identify - Started 5/24/2019 3:19:03 PM
Short Generic - Started 5/24/2019 3:25:19 PM
Short Generic - Pass 5/24/2019 3:26:31 PM
Identify - Started 5/3/2022 4:36:31 PM
Short DST - Started 5/3/2022 4:36:46 PM
Short DST - Pass 5/3/2022 4:37:58 PM
Short Generic - Started 5/3/2022 4:38:45 PM
Short Generic - Pass 5/3/2022 4:40:27 PM
Long Generic - Started 5/3/2022 4:41:42 PM
Long Generic - FAIL 5/4/2022 1:18:40 AM
SeaTools Test Code: E896A6D4
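
For reference, the two Long Generic timestamps in the log give the time the scan ran before it failed. A small sketch of that arithmetic follows. It assumes the log's month/day/year order with a 12-hour clock and an English locale for the AM/PM marker, and it compares the result against the 937-minute polling time quoted earlier in the thread. Note that the SeaTools long test is not necessarily the same routine as the SMART extended self-test, so the comparison is only indicative.

```python
from datetime import datetime, timedelta

# Assumed timestamp layout of the SeaTools log: month/day/year, 12-hour clock.
SEATOOLS_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def parse_stamp(text):
    """Parse one SeaTools log timestamp into a datetime."""
    return datetime.strptime(text, SEATOOLS_FORMAT)


# Timestamps copied from the "Long Generic" lines of the log above.
started = parse_stamp("5/3/2022 4:41:42 PM")
failed = parse_stamp("5/4/2022 1:18:40 AM")

elapsed = failed - started
# "Extended self-test routine recommended polling time: 937 minutes"
recommended = timedelta(minutes=937)

print(f"Long Generic ran for {elapsed} before failing")
print(f"Recommended extended self-test polling time: {recommended}")
```

So the long scan ran for a bit over eight and a half hours before reporting the failure, compared with roughly fifteen and a half hours recommended for a full extended self-test.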

 

Link to comment
On 5/1/2022 at 10:50 AM, Bait Fish said:

Unbalance to move what little data it held on it to the other drives

A safer approach would be to copy (not move) the data to somewhere other than the array. Then nothing is written and emulation only has to read all disks.

 

You must always have backups of anything important and irreplaceable even if everything is working well.

Link to comment
