
Parity drive dies?


NewDisplayName


Today I woke up and found two errors in the log:

 

unRAID Parity disk SMART health [197]: 28-07-2018 01:55
Warning [UNRAID-SERVER] - current pending sector is 8
ST8000AS0002-1NA17Z_Z84168C5 (sdd)

 

unRAID Parity disk SMART health [198]: 28-07-2018 01:55
Warning [UNRAID-SERVER] - offline uncorrectable is 8
ST8000AS0002-1NA17Z_Z84168C5 (sdd)

 

Since that's my parity drive, which is pretty new (not a year old), I'm a bit worried. What should I do?


 

unraid-server-diagnostics-20180728-0706.zip

Link to comment

Time to consider whether you should replace the drive. If you have configured regular parity checks, then this sector should have been read successfully many times before it failed. The 8 uncorrectable logical 512-byte sectors in the SMART log correspond to a single 4 kB physical sector on the disk. There is of course a chance that the sector error isn't a real disk error but was caused by vibration while the 4 kB block of data was being written. Some reviewers seem to indicate that the drive has rotational vibration sensors, but I haven't seen Seagate explicitly make that claim.
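
As a rough check of that arithmetic, here is a minimal sketch that converts a SMART count of logical sectors into the number of physical sectors it could represent. The 512-byte and 4096-byte sizes are assumptions for a typical 512e drive, and the function name is made up for illustration.

```python
import math

LOGICAL_SECTOR_BYTES = 512    # assumed logical sector size (512e drive)
PHYSICAL_SECTOR_BYTES = 4096  # assumed physical sector size

def physical_sectors_affected(logical_count: int) -> int:
    """Minimum number of physical sectors that can hold `logical_count` bad logical sectors."""
    per_physical = PHYSICAL_SECTOR_BYTES // LOGICAL_SECTOR_BYTES  # 8 logical per physical
    return math.ceil(logical_count / per_physical)

print(physical_sectors_affected(8))  # prints 1
```

This is the minimum: 8 bad logical sectors could in principle straddle two physical sectors, but a count of exactly 8 is most simply explained by one.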

 

If you go for replacing the drive, this would be a good time to figure out what you think the warranty is worth: whether you want to send in your current parity drive and then wait an unknown amount of time to get back a used drive of unknown state before you can rebuild parity, or whether you will pick up a brand new drive so you can rebuild immediately.


By the way - I'm curious why you selected a Shingled Magnetic Recording (SMR) drive for parity, since such drives aren't as good as conventional drives at handling large amounts of writes in a deterministic way.

 

Edit: Unless you decide to directly replace the drive, you should definitely consider doing a long SMART test of the drive.

Link to comment
23 minutes ago, nuhll said:

What about replacing that parity drive with another hdd and putting it in the array?


That's the route if the goal is just to avoid using an SMR drive as parity.

 

But when it comes to reliability, every single drive counts. If you need to rebuild a lost drive, then every sector of every drive will matter (up to the size of the drive to emulate/rebuild). So you need to decide if you think this was a single sector failure or if you think the drive will end up with more problematic sectors. One failed sector isn't enough evidence in either direction.
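
To illustrate why every sector counts during a rebuild, here is a toy sketch of single-parity reconstruction. It assumes plain XOR parity over equal-sized disks and uses made-up 4-byte "disks"; it is not a model of the actual Unraid implementation.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data disks, 4 bytes each, plus their XOR parity.
data_disks = [b"\x11\x22\x33\x44", b"\xa0\xb0\xc0\xd0", b"\x0f\x0e\x0d\x0c"]
parity = xor_blocks(data_disks)

# Disk 1 is lost: rebuild it from parity plus the surviving disks.
rebuilt = xor_blocks([parity, data_disks[0], data_disks[2]])
assert rebuilt == data_disks[1]

# Now suppose the first byte of parity reads back wrong during the rebuild.
bad_parity = bytes([parity[0] ^ 0xFF]) + parity[1:]
corrupt = xor_blocks([bad_parity, data_disks[0], data_disks[2]])

# Only the position covered by the bad parity byte is rebuilt incorrectly.
assert corrupt[0] != data_disks[1][0]
assert corrupt[1:] == data_disks[1][1:]
```

The same holds if the bad sector is on one of the surviving data disks instead of parity: the rebuilt disk is wrong at exactly that position.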

 

Anyway - if you don't replace the drive immediately, you should still start an extended SMART test so the rest of the sectors on the drive also get tested.

Link to comment

The extended test takes quite a while, since it reads the whole surface.


If you read the SMART data, you'll notice this text:

Extended self-test routine
recommended polling time: 	 ( 940) minutes.

That gives you an indication about expected time for the extended test - assuming a well-working drive and no external accesses.
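
For convenience, a small sketch that pulls that number out of the SMART text and converts it to hours. The excerpt is the one quoted above; the function name is hypothetical, and the pattern assumes the report is laid out exactly like this excerpt.

```python
import re

SMART_EXCERPT = """Extended self-test routine
recommended polling time: \t ( 940) minutes.
"""

def extended_test_minutes(smart_text: str):
    """Return the extended self-test polling time in minutes, or None if not found."""
    match = re.search(
        r"Extended self-test routine\s+recommended polling time:\s*\(\s*(\d+)\)\s*minutes",
        smart_text,
    )
    return int(match.group(1)) if match else None

minutes = extended_test_minutes(SMART_EXCERPT)
hours, rest = divmod(minutes, 60)
print(f"{hours} h {rest} min")  # prints: 15 h 40 min
```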

Link to comment
6 hours ago, pwm said:

The extended test takes quite a while, since it reads the whole surface.


If you read the SMART data, you'll notice this text:


Extended self-test routine
recommended polling time: 	 ( 940) minutes.

That gives you an indication about expected time for the extended test - assuming a well-working drive and no external accesses.

Ah, okay, that explains why it takes so long. It's at 60%; I'll post when it's finished.

 

But since it's less than 6 months old, I guess I can send it back.

Link to comment

It's finished, and everything is back to normal?

28-07-2018 22:36 unRAID Parity disk SMART message [198] Notice [UNRAID-SERVER] - offline uncorrectable returned to normal value ST8000AS0002-1NA17Z_Z84168C5 (sdd) normal
28-07-2018 22:36 unRAID Parity disk SMART message [197] Notice [UNRAID-SERVER] - current pending sector returned to normal value ST8000AS0002-1NA17Z_Z84168C5 (sdd) normal

 

Here is the SMART report.

unraid-server-smart-20180729-0203.zip

Link to comment

Either the server happened to write new data to the problematic sector, or during the SMART test the drive managed to read out the problematic content so that it could rewrite it to the same sector.

 

As I mentioned earlier, it isn't always possible to know if an uncorrectable error really is a physically damaged sector or the result of some disturbance (vibration, power glitch, ...) during the write. Writes are done blind - it isn't until the drive later tries to read the data back that it knows whether the write went OK. That is why NAS and server drives have vibration sensors: they can abort and retry the write if they sense any vibration.

 

It is always very hard to know if a disk really is bad or has suffered a failed write, when it shows just a single bad sector (in this case one problematic physical sector but described as 8 logical sectors).

 

Most probably you can continue to use this drive. But you should at least consider if you want a shingled drive as parity drive given the fact that a shingled drive can't rewrite a single sector in a track - it has to perform a big rewrite. So it uses a smaller non-shingled area to cache writes and at a later time performs the full rewrite.
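
To give a feel for why a small update turns into a big rewrite, here is a toy model only. It assumes tracks are grouped into bands where each track partially overlaps the next, so changing one track forces every later track in that band to be rewritten. The band size is a made-up number, not a Seagate figure, and the function name is hypothetical.

```python
def tracks_rewritten(band_tracks: int, modified_track: int) -> int:
    """Tracks that must be rewritten when one track in a shingled band changes.

    Toy model: writing track i clobbers track i+1, which must then be
    rewritten too, and so on until the end of the band.
    """
    if not 0 <= modified_track < band_tracks:
        raise ValueError("modified_track is outside the band")
    return band_tracks - modified_track

BAND_TRACKS = 20  # hypothetical band size, chosen only for illustration

worst = tracks_rewritten(BAND_TRACKS, 0)                # first track changed: whole band
best = tracks_rewritten(BAND_TRACKS, BAND_TRACKS - 1)   # last track changed: just itself
print(worst, best)  # prints: 20 1
```

A parity drive sees a write for every write to any data drive, so it is the drive most likely to hit the expensive case often.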

Link to comment
14 minutes ago, nuhll said:

I guess I will send it back and get a fresh one anyway.

 

This isn't a dead-on-arrival drive. So don't expect to get a brand new one.


That it could repair the sector indicates the error was caused by a failed write and not a disk failure.

Link to comment

Archived

This topic is now archived and is closed to further replies.
