
One of my HDDs shows an error



I just built this server and just now installed Unraid for the first time.

 

The HDDs came from another machine where they were working great. They were wiped, and a health test/check was run that showed they were in perfect health. They are four 6TB HGST drives, 2 or 3 years old but not worked hard.

 

Unraid is showing one of them with an error.

On the Self Test screen, under Attributes, there is a column of numbers on the left (1-12 and 192-199), and row 198 says "offline uncorrectable". Here are all the values across that row:

 

Flag: 0x0008

Value: 100

Worst: 100

Threshold: 000

Type: Old Age

Updated: offline

Failed: never

Raw Value: 13

 

I have no idea what all that means. Why does it say Old Age? They are a couple of years old but haven't been worked very hard, so I'm not sure what constitutes "old age".

 

I started running the "SMART extended self-test" but it looks like it's going to take a very long time before I have results to share about that.

 

Should I stop the extended self test and do something else instead?

 

There's no data on the drive.

23 minutes ago, johnnie.black said:

Old age is just the type of attribute. While a non-zero value for offline uncorrectable is never good, it doesn't mean the disk is failing. Wait for the extended SMART test: if it passes, the disk is good for now, and as long as that value doesn't increase it should be fine.
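A minimal sketch of what "Old age" means here, assuming the flag letters documented for smartctl's brief attribute output (P pre-failure, O updated online, S speed/performance, R error rate, C event count, K auto-keep). Flag 0x0008 has only the error-rate bit set, so the attribute is typed Old age rather than Pre-fail and is updated offline; the same bits print as "---R--" in smartctl's brief format. The "failed" logic is a simplification assumed for illustration, not smartctl's exact rule.

```python
# Bit masks of the SMART attribute flag word, in the order smartctl prints
# them in its brief "POSRCK" column.
FLAG_BITS = [
    (0x0001, "P"),  # pre-failure attribute
    (0x0002, "O"),  # updated online (during normal operation)
    (0x0004, "S"),  # speed/performance related
    (0x0008, "R"),  # error-rate related
    (0x0010, "C"),  # event count
    (0x0020, "K"),  # auto-keep (self-preserving)
]


def brief_flags(flag: int) -> str:
    """Render a flag word as a brief string, e.g. 0x0008 -> '---R--'."""
    return "".join(letter if flag & mask else "-" for mask, letter in FLAG_BITS)


def describe(flag: int, value: int, worst: int, threshold: int) -> dict:
    """Derive the Type / Updated / Failed columns from the raw fields (simplified)."""
    pre_fail = bool(flag & 0x0001)
    online = bool(flag & 0x0002)
    # Simplification: a threshold of 0 is treated as "can never fail".
    if threshold > 0 and value <= threshold:
        failed = "now"
    elif threshold > 0 and worst <= threshold:
        failed = "in the past"
    else:
        failed = "never"
    return {
        "brief": brief_flags(flag),
        "type": "Pre-fail" if pre_fail else "Old age",
        "updated": "always" if online else "offline",
        "failed": failed,
    }


if __name__ == "__main__":
    # Values reported for attribute 198 in the first post.
    print(describe(flag=0x0008, value=100, worst=100, threshold=0))
```

Note that the raw value (13 here) plays no part in the type or the failed status; those come only from the flag bits and the normalized value/worst/threshold columns.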

 

Thank you for the info.

 

Do you have a rough idea how long the extended SMART test should take on a 6TB drive with no data?

 

Is the "offline uncorrectable" raw value of 13 telling me that it found 13 bad sectors?

Edited by SPOautos
6 minutes ago, SPOautos said:

Do you have a rough idea how long the extended SMART test should take on a 6TB drive with no data?

2 to 3 hours per TB. You can also see the estimated time for that specific drive in the SMART report, in this line:

 

Extended self-test routine
recommended polling time:      (1128) minutes.
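The rule of thumb is simple arithmetic, so here is a small sketch comparing it with the drive's own estimate, assuming the report text is laid out as in the excerpt above. The function names and the regular expression are hypothetical, not part of any Unraid or smartctl API. For a 6TB drive the rule gives 12 to 18 hours, while this drive's 1128 minutes works out to 18.8 hours.

```python
from __future__ import annotations

import re

# Rule of thumb from the reply above: roughly 2 to 3 hours per TB.
HOURS_PER_TB = (2.0, 3.0)

# Matches the two-line polling-time entry as it appears in the excerpt.
POLLING = re.compile(
    r"Extended self-test routine\s+recommended polling time:\s*\(\s*(\d+)\s*\)\s*minutes"
)


def rule_of_thumb_hours(capacity_tb: float) -> tuple[float, float]:
    """(low, high) estimate in hours for an extended self-test."""
    low, high = HOURS_PER_TB
    return capacity_tb * low, capacity_tb * high


def polling_minutes(report: str) -> int | None:
    """The drive's own estimate, if the report contains the polling-time line."""
    match = POLLING.search(report)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    report = "Extended self-test routine\nrecommended polling time:      (1128) minutes.\n"
    minutes = polling_minutes(report)
    print(rule_of_thumb_hours(6))           # (12.0, 18.0)
    print(minutes, round(minutes / 60, 1))  # 1128 18.8
```

When the drive reports its own polling time, that figure is the better guide; the per-TB rule is only useful when the report isn't at hand.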

 

7 minutes ago, SPOautos said:

Is the "offline uncorrectable" raw value of 13 telling me that it found 13 bad sectors?

Possibly, but not necessarily; different firmwares sometimes handle bad sectors differently.

8 hours ago, johnnie.black said:

2 to 3 hours per TB. You can also see the estimated time for that specific drive in the SMART report, in this line:

 


Extended self-test routine
recommended polling time:      (1128) minutes.

 

Possibly, but not necessarily; different firmwares sometimes handle bad sectors differently.

 

The extended check is at 90%, so it's pretty close to finished. Even if it clears this disk, should I use it as a data disk or as my parity disk? I'm thinking parity, because if it goes bad and I lose it, I replace the disk, the parity is rebuilt, and my data doesn't risk corruption or something.

 

But I'm FAR from an IT person... what do you think?

11 hours ago, johnnie.black said:

2 to 3 hours per TB. You can also see the estimated time for that specific drive in the SMART report, in this line:

 


Extended self-test routine
recommended polling time:      (1128) minutes.

 

Possibly, but not necessarily; different firmwares sometimes handle bad sectors differently.

 

The finished extended test is error free. Yet the dashboard still lists that drive with an orange thumbs-down saying errors. Do I need to reboot the server, or should I do something else? Any other type of test?

 

I'm not sure what I should do from here.

Edited by SPOautos
3 hours ago, ChatNoir said:

It depends on what the origin of the thumbs-down is.

You should post your diagnostics so the guys that know SMART can provide suggestions on what to do next.

 

I just installed Unraid for the first time yesterday, so please excuse the ignorance lol. Which diagnostics, and how do I get them from Unraid to here? I don't have any plugins or Dockers set up yet, and not even an array (I was waiting on this before I start labeling drives).

 

I saw that there is an option to download the results of my extended test, but I don't have email or anything like that set up in Unraid to get it out.

 

1 minute ago, ChatNoir said:

Yes! :)

I see that your disk sdc does indeed have:

198 Offline_Uncorrectable   ---R--   100   100   000    -    13

 

Let's see what the expert can make of that, I am learning as you are.

 

Yes, that's what prompted the original post. But when I ran the extended SMART test, it found zero errors. I guess the errors were in the past? I really have no idea.

34 minutes ago, hawihoney said:

The orange mark comes from the setting shown below. The global SMART settings under Settings/Disk settings let you define which SMART attributes trigger this orange mark, and 198 Offline_Uncorrectable is one of them:

 

 

[Screenshot: global SMART attribute settings under Settings/Disk settings]
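A hypothetical model of how an attribute-based warning like the orange thumbs-down can behave: a monitored attribute raises a warning when its raw value rises above the last acknowledged value, and acknowledging stores the current values as the new baseline. This is not Unraid's actual code; the set of monitored IDs is an assumption, since the thread only confirms that 198 is among them.

```python
# Hypothetical model, not Unraid's implementation. Only attribute 198 is
# confirmed by the thread; the other IDs are assumed for illustration.
MONITORED = {5, 187, 188, 197, 198}


def warnings(raw: dict[int, int], acknowledged: dict[int, int]) -> list[int]:
    """Monitored attribute IDs whose raw value is above the acknowledged baseline."""
    return sorted(
        attr
        for attr in MONITORED
        if raw.get(attr, 0) > acknowledged.get(attr, 0)
    )


def acknowledge(raw: dict[int, int]) -> dict[int, int]:
    """Store the current raw values of the monitored attributes as the new baseline."""
    return {attr: value for attr, value in raw.items() if attr in MONITORED}


if __name__ == "__main__":
    current = {5: 0, 197: 0, 198: 13}
    print(warnings(current, {}))                     # [198]: non-zero, never acknowledged
    baseline = acknowledge(current)
    print(warnings(current, baseline))               # []: acknowledged, unchanged
    print(warnings({**current, 198: 14}, baseline))  # [198]: the count went up
```

Under this model a passing self-test does not clear the warning by itself; only acknowledging the current values does, and a later increase brings it back.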

 

Ahhh... something tells me I'll be learning something new every day for a looong time. Thank you! :)

 

Well, hopefully someone FAR more knowledgeable than me can look at this and tell me whether it's reasonably safe to use this HDD.

Link to comment

The SMART test completed successfully, so the disk is good for now, but it failed two previous ones:

 

# 1  Extended offline    Completed without error       00%     21384         -
# 2  Extended offline    Completed: read failure       90%     21288         9432616
# 3  Short offline       Completed: read failure       10%     21127         9432616

 

For now, acknowledge the SMART attributes, but if there are any more read errors in the near future, it's best to replace it.
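A sketch that pulls the read failures out of a self-test log laid out like the one quoted above. The regular expression is an assumption fitted to these three lines, not a general smartctl parser. Both failures point at the same LBA, 9432616, and the newest entry, at 21384 lifetime hours, completed without error; the log alone does not say why the later test passed.

```python
import re

# Fitted to the layout of the self-test log lines quoted in this thread:
# number, test type, status, % remaining, lifetime hours, LBA of first error.
LOG_LINE = re.compile(
    r"#\s*(?P<num>\d+)\s+(?P<test>.+?)\s{2,}(?P<status>.+?)\s{2,}"
    r"(?P<remaining>\d+)%\s+(?P<hours>\d+)\s+(?P<lba>\S+)"
)


def read_failures(log_text: str) -> list[dict]:
    """Return the entries whose status mentions a read failure."""
    failures = []
    for line in log_text.splitlines():
        match = LOG_LINE.search(line)
        if match and "read failure" in match.group("status"):
            failures.append({
                "num": int(match.group("num")),
                "test": match.group("test"),
                "remaining_pct": int(match.group("remaining")),
                "hours": int(match.group("hours")),
                "lba": int(match.group("lba")),
            })
    return failures


LOG = """\
# 1  Extended offline    Completed without error       00%     21384         -
# 2  Extended offline    Completed: read failure       90%     21288         9432616
# 3  Short offline       Completed: read failure       10%     21127         9432616
"""

if __name__ == "__main__":
    failures = read_failures(LOG)
    print([(f["num"], f["hours"], f["lba"]) for f in failures])
    # [(2, 21288, 9432616), (3, 21127, 9432616)]
    print({f["lba"] for f in failures})  # {9432616}
```

Running the same check on later logs is one way to watch for the "any more read errors" condition mentioned above.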

18 minutes ago, johnnie.black said:

The SMART test completed successfully, so the disk is good for now, but it failed two previous ones:

 


# 1  Extended offline    Completed without error       00%     21384         -
# 2  Extended offline    Completed: read failure       90%     21288         9432616
# 3  Short offline       Completed: read failure       10%     21127         9432616

 

For now, acknowledge the SMART attributes, but if there are any more read errors in the near future, it's best to replace it.

 

Could that be because something happened, like the computer shutting off during testing?

 

Is it odd that it didn't pass at one point and now it does?

 

Do you think I should make this one my parity disk as opposed to a data disk? That way, if it gets flaky, I just replace it and it recreates the parity data, so it doesn't risk corrupting my actual data.

2 minutes ago, SPOautos said:

Do you think I should make this one my parity disk as opposed to a data disk? That way, if it gets flaky, I just replace it and it recreates the parity data, so it doesn't risk corrupting my actual data.

I would say it should NOT be your parity disk. If a disk fails, Unraid relies on being able to reliably read ALL the other drives plus parity to recreate the contents of the failed drive. Therefore, if you put your unreliable disk in as parity, you have no confidence that you can recover a failed array disk with its contents uncorrupted.
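A toy illustration of that point, assuming simple XOR single parity: rebuilding a failed disk reads parity plus every other data disk, so one bad byte from any of those reads, parity included, produces a wrong rebuild without any error being raised by the calculation itself.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks (a toy single-parity calculation)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three toy "data disks", two bytes each.
data = [b"\x10\x20", b"\x0f\x01", b"\xaa\x55"]
parity = xor_blocks(data)

# Data disk 1 fails: rebuild it from parity plus every other data disk.
rebuilt = xor_blocks([parity, data[0], data[2]])
assert rebuilt == data[1]

# If the parity disk returns one wrong bit, the rebuild is wrong too,
# and nothing in the XOR itself can detect it.
bad_parity = bytes([parity[0] ^ 0x01]) + parity[1:]
corrupted = xor_blocks([bad_parity, data[0], data[2]])
assert corrupted != data[1]
print(rebuilt.hex(), corrupted.hex())  # 0f01 0e01
```

The parity disk is read in full during every rebuild, so it needs to be at least as trustworthy as any data disk.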

1 hour ago, itimpi said:

I would say it should NOT be your parity disk. If a disk fails, Unraid relies on being able to reliably read ALL the other drives plus parity to recreate the contents of the failed drive. Therefore, if you put your unreliable disk in as parity, you have no confidence that you can recover a failed array disk with its contents uncorrupted.

 

Okay, thinking about it, I have four 6TB drives total. I can use 3 for data and 1 for parity. I only have about 8TB of data right now, so one data drive would sit empty, and it could be this one that's had errors before. OR does Unraid automatically spread the data out across all three data drives and use them all?

Edited by SPOautos
Link to comment

The way that Unraid spreads data across drives depends on the settings you have set up for specific shares.

 

If you do not need all the drives at the moment, I would recommend that you do not even add this one to the array, but keep it available for when you actually need it. That could be as an additional drive if you need the space, or as a replacement for a failed drive. Each additional drive is a potential additional point of failure, so why not avoid even that small chance at this point?
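How data spreads depends on each share's allocation method. Below is a simplified sketch of two such methods, named after Unraid's Fill-up and Most-free options; it assumes equal-size files and ignores split level, minimum free space, and the High-water method, so treat it as an illustration rather than Unraid's behavior. Placing about 8TB as sixteen 0.5TB files on three empty 6TB disks, fill-up leaves the third disk untouched, while most-free spreads data over all three.

```python
from typing import Callable

# A policy picks a disk index given each disk's free space and the file size.
Policy = Callable[[list[float], float], int]


def fill_up(free_tb: list[float], size_tb: float) -> int:
    """Simplified fill-up: use the lowest-numbered disk that still fits the file."""
    for index, free in enumerate(free_tb):
        if free >= size_tb:
            return index
    raise RuntimeError("no disk has enough free space")


def most_free(free_tb: list[float], size_tb: float) -> int:
    """Simplified most-free: use the disk with the most free space (ties go to the lowest number)."""
    index = max(range(len(free_tb)), key=lambda k: free_tb[k])
    if free_tb[index] < size_tb:
        raise RuntimeError("no disk has enough free space")
    return index


def place(files_tb: list[float], disks_tb: list[float], policy: Policy) -> list[float]:
    """Place files one at a time and return the free space left on each disk."""
    free = list(disks_tb)
    for size in files_tb:
        free[policy(free, size)] -= size
    return free


if __name__ == "__main__":
    files = [0.5] * 16       # about 8TB of data
    disks = [6.0, 6.0, 6.0]  # three empty 6TB data disks
    print("fill-up:  ", place(files, disks, fill_up))    # [0.0, 4.0, 6.0]
    print("most-free:", place(files, disks, most_free))  # [3.0, 3.5, 3.5]
```

With a fill-up style policy an extra data disk can sit idle, which supports keeping the questionable drive out of the array until the space is actually needed.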

Link to comment

I'm just a user, no SMART specialist. I can tell you what I would do in your situation. But it's up to you.

 

If I were running single parity, I would take that drive out of the array. I would RMA it if possible.

 

If I were running double parity, I would keep that drive but keep an eye on it.

 

As long as there are no increasing failures/errors and the other disks are in perfect shape, I would go that way in a double-parity system.

 

I've had disks with all kinds of errors. When errors appeared once and didn't increase, there was a chance the disk would run for a long time. On the other hand, I've had lots of disks die really fast.

 

So it's a lottery. There's no guarantee. Take backups.

 
