Errors on Disk - Time to replace?


I'd replace the drive. I've got a drive that's had "one" SMART error for years. You've got three (best I can tell) and in a small amount of time.

 

I would bet money that, if you replace it now, you won't lose a single byte of data. After it's replaced, you can test it further and decide whether you want it back in your array. :)

 



Data pro weighing in.

 

Your drive is reporting uncorrectable sectors. Twice at one address, once at another address. It hasn't marked the second address as a permanent-fail, thus only tagging 1 in the attributes column.

 

This is a big pre-warning state. Some drives may limp like this for years, but if this data or reliable performance are at all important, back it up and swap the drive.

 

Are you using Parity? If you're not, I'd like to note that this case was an early-warning case, and a Parity drive would've saved you even if you hadn't noticed.

Also an important question because it somewhat determines how you proceed, as does your experience with things like ssh and Linux in general.

1 hour ago, codefaux said:

Your drive is reporting uncorrectable sectors. Twice at one address, once at another address. It hasn't marked the second address as a permanent-fail, thus only tagging 1 in the attributes column.

 

This is a big pre-warning state. Some drives may limp like this for years, but if this data or reliable performance are at all important, back it up and swap the drive.

 

Are you using Parity? If you're not, I'd like to note that this case was an early-warning case, and a Parity drive would've saved you even if you hadn't noticed.

Also an important question because it somewhat determines how you proceed, as does your experience with things like ssh and Linux in general.

Thanks for the reply,

 

I don't know if I've read this correctly, but it appears that the 1st and 2nd SMART errors happened at the same time, "29801 hours", on the same block, "3894526032".

The third error happened some time later, at "48162 hours", on block "3166961520".

So from what I understand, "18361 hours" (about 765 days, just over 2 years) passed between the second and third errors.
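
The arithmetic can be double-checked with a quick sketch. The two hour counts are the ones quoted from the SMART error log; the function name and the 365.25 days-per-year figure are just illustrative choices.

```python
# Power-on hours quoted from the SMART error log in this post.
HOURS_AT_SECOND_ERROR = 29801
HOURS_AT_THIRD_ERROR = 48162


def gap_between_errors(earlier_hours, later_hours):
    """Return the gap between two errors as (hours, days, years)."""
    hours = later_hours - earlier_hours
    days = hours / 24
    years = days / 365.25
    return hours, days, years


hours, days, years = gap_between_errors(HOURS_AT_SECOND_ERROR, HOURS_AT_THIRD_ERROR)
print(f"{hours} hours = {days:.0f} days = {years:.2f} years of power-on time")
```

Keep in mind that these are power-on hours, so if the drive was ever switched off, the calendar gap is longer than this.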

 

I'm assuming I can move any critical data onto the other drives in my array and leave this drive just for media content that can be easily retrieved again if the drive ever decides to totally fail.

 

I currently have 4 drives, with one used for parity.

 

With regard to Linux and ssh experience, I'm confident I'll manage.

 

Can you see any problem with this thought pattern?

 

3 minutes ago, madgino said:

I'm assuming I can move any critical data onto the other drives in my array and leave this drive just for media content that can be easily retrieved again if the drive ever decides to totally fail.

You can, but remember that Unraid requires all other drives to be read correctly for a rebuild, i.e., if another disk fails and you need to rebuild it, the rebuilt disk may end up with some data corruption.
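
A minimal sketch of why that is, assuming single parity computed as a plain XOR across the data disks. The disk contents are made-up bytes, and a failed read is modeled here as one wrong byte coming back from a surviving disk.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data disks (one small block each) plus single parity.
disk1 = b"\x11\x22\x33\x44"
disk2 = b"\xaa\xbb\xcc\xdd"
disk3 = b"\x01\x02\x03\x04"
parity = xor_blocks([disk1, disk2, disk3])

# disk2 dies: it is rebuilt from parity plus every surviving data disk.
rebuilt_ok = xor_blocks([parity, disk1, disk3])

# Same rebuild, but disk3 hands back one wrong byte during the read.
disk3_bad = b"\x01\xff\x03\x04"
rebuilt_bad = xor_blocks([parity, disk1, disk3_bad])

print("clean rebuild matches original:", rebuilt_ok == disk2)
print("rebuild with a bad read matches original:", rebuilt_bad == disk2)
```

Only the byte position that was misread ends up wrong on the rebuilt disk, but nothing in the parity math can tell you which one it was.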


 

4 minutes ago, madgino said:

So from what I understand, "18361 hours" (about 765 days, just over 2 years) passed between the second and third errors.

 

 

I hadn't thought to inspect the timestamp on the errors, but that does line up.

 

 

4 minutes ago, madgino said:

I'm assuming I can move any critical data onto the other drives in my array and leave this drive just for media content that can be easily retrieved again if the drive ever decides to totally fail.

 

It seems you understand the data safety implications, but do consider that a failing disk will eventually begin to stall I/O heavily while it strains to pull bytes. It's safe and justifiable to leave it in, on the condition that you keep an eye on the numbers; if you start having I/O issues, look there first.

 

6 minutes ago, madgino said:

I currently have 4 drives, with one used for parity.

If you're using read-modify-write for Parity, a failing disk won't stall I/O unless it is specifically involved in the I/O. Turbo Mode, aka reconstruct-write, reads the matching block from every other data disk for each block written, and can cause I/O stalls and/or Parity integrity issues if one of those reads fails. However, read-modify-write has its own drawbacks - each write has to read the old data and old parity first, then wait a full platter revolution to write them back, so Parity will slow down overall write I/O.
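
To make the difference concrete, here is a rough sketch of the two ways to update parity, assuming single XOR parity. The function names and the byte values are made up for illustration; this is not Unraid's actual code.

```python
def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity_read_modify_write(old_parity, old_data, new_data):
    """Update parity by reading only the target disk and the parity disk."""
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)


def parity_reconstruct_write(new_data, other_disks):
    """Recompute parity from the new data plus a read of every other data disk."""
    parity = new_data
    for block in other_disks:
        parity = xor_bytes(parity, block)
    return parity


# Hypothetical two-byte blocks on three data disks.
disk1, disk2, disk3 = b"\x10\x20", b"\x0a\x0b", b"\xf0\x0f"
old_parity = xor_bytes(xor_bytes(disk1, disk2), disk3)
new_disk1 = b"\x99\x88"

rmw = parity_read_modify_write(old_parity, disk1, new_disk1)
rcw = parity_reconstruct_write(new_disk1, [disk2, disk3])
print("both modes agree on the new parity:", rmw == rcw)
```

Both modes land on the same parity. The difference is which disks get read along the way: read-modify-write never touches disk2 or disk3, while reconstruct-write depends on reading both correctly.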

 

2 minutes ago, JorgeB said:

You can, but remember that Unraid requires all other drives to be read correctly for a rebuild, i.e., if another disk fails and you need to rebuild it, the rebuilt disk may end up with some data corruption.

This is a very important fact to consider.

 

8 minutes ago, madgino said:

In regards to linux and ssh experience, I'm confident I'll manage.

 

Okay - the short of it is, first you need to learn how unRAID maps your disk names. My less than ideal configuration results in some mangling, so each shows something such as:

 

1AMCC_ZA1GXCX5000000000000

 

They all end in a block of twelve zeroes. They all start with 1AMCC_ -- which leaves the unique part in this case to be ZA1GXCX5. Using the SMART Identity information and a keen eye, that's the first part of my Parity drive's serial number. So, however I'm keeping track, I note the model, the serial, what it shows up as, and that it's the Parity drive.
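
That reverse-engineering can be sketched roughly like this. The prefix and the twelve-zero padding are specific to my controller, so treat them as parameters you'd have to work out for your own setup; the function names are made up.

```python
def unique_part(identifier, prefix="1AMCC_", padding="0" * 12):
    """Strip the shared prefix and the trailing zero padding from an identifier."""
    core = identifier
    if core.startswith(prefix):
        core = core[len(prefix):]
    if core.endswith(padding):
        core = core[:-len(padding)]
    return core


def matches_serial(identifier, serial):
    """True if the identifier's unique part is the start of a SMART serial number."""
    part = unique_part(identifier)
    return bool(part) and serial.startswith(part)


# The identifier shown above; compare the result against each disk's SMART serial.
print(unique_part("1AMCC_ZA1GXCX5000000000000"))
```

Identifiers that don't follow the prefix/padding pattern come back unchanged, so an odd one out is easy to spot.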

 

When you make The Swap, plug everything in and boot it. If you're lucky and both controllers choose to identify drives the same, you're done. It'll probably just start. If not, it should come up complaining about missing disks, with dropdowns. You'll have to again use naming convention reverse-engineering to figure out which is which.

 

This is where ssh and smartctl come in handy if naming conventions have changed -- if you're working with direct disks, you can query the disk directly, such as smartctl -i /dev/sdb or similar. -i prints identity information; -a prints everything, including self-test logs, error logs, attributes, and capabilities.
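
If you'd rather script it, a rough sketch of pulling the model and serial out of smartctl -i looks like this. It assumes the "Key:   value" layout smartctl prints for directly attached ATA disks; SAS and NVMe devices label their fields differently, and the sample values below are made up.

```python
import subprocess


def parse_identity(text):
    """Turn 'Key:   value' lines from smartctl -i output into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            info[key.strip()] = value.strip()
    return info


def read_identity(device):
    """Run smartctl -i on a device (needs root and smartmontools installed)."""
    result = subprocess.run(
        ["smartctl", "-i", device],
        capture_output=True,
        text=True,
        check=False,  # smartctl uses its exit status as a bitmask, so don't raise on it
    )
    return parse_identity(result.stdout)


# Illustrative sample in the shape smartctl -i prints; not from a real drive.
SAMPLE = """\
Device Model:     EXAMPLE MODEL 1234
Serial Number:    EXAMPLE0SERIAL
User Capacity:    8,001,563,222,016 bytes [8.00 TB]
"""

info = parse_identity(SAMPLE)
print(info["Device Model"], info["Serial Number"])
```

On the real box you'd call read_identity("/dev/sdb") for each device and compare the serials against your notes.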

 

If your controller is a dumb RAID controller like mine, it's gonna be a headache. First resort, see if you can cross-flash it to IT mode. That's fancy speak for "don't be a RAID controller, just expose disks directly". If not, you'll need extra parameters for smartctl to access the disks the dumb way like I do, and I'm not gonna go there unless we need to.

 

 

Once you know how to identify your disks, just match them in the drop-downs. I'm reasonably sure that the order doesn't matter, except that Parity disks must stay in Parity-land, Cache disks must stay in Cache-land, and Disks are Disks in any order. I could be wrong though, and it doesn't hurt to be exact.

 

 

Questions?
