
Notice [UNRAID] - reported uncorrect returned to normal value


tkenn1s


I just installed a couple of 12 TB drives [purchased during Amazon's Prime Day] and precleared them. They've been sitting in the NAS, unassigned, for a week or so. Recently, we had a power outage and my server shut down [it's connected to a UPS, so things shut down cleanly]. When the server came back online, I received a notification for one of the drives: "Notice [UNRAID] - reported uncorrect returned to normal value ST12000VN0007-2GS116_ZJV1JZA2 (sde)". I've attached a full SMART report, but it doesn't look like there are any errors [assuming the "error" attributes are 48 bits]. Is this something I can safely ignore? Or should I RMA the drive before I assign it to the array?

 

unraid-smart-20180804-1837.zip

Link to comment
1 hour ago, tkenn1s said:

I think that's what has me confused. The drive went "back" to normal; meaning that it was [or, at least unraid thought it was] in an abnormal state prior.

 

You can have sectors that give ECC (Error Correcting Code) errors, but where the ECC still allows the data to be corrected. The disk can then rewrite the content to restore the sector to a good state.

 

You can also have sectors that have too many errors for ECC to be able to correct the data. These would then end up being flagged as offline uncorrectable. Such sectors can possibly also be fixed by rewriting the data from a known good source, such as unRAID using all the other disks together with the parity computation to recreate the original data. Copying a new file over a sector that was flagged as offline uncorrectable may also clear that error.

 

So there are ways that some of the error counters in the SMART data can decrease.

Link to comment

It doesn't seem that either of these should apply here, though. First, the drive isn't part of the array; it's not even formatted. So there shouldn't be any data being written to it. Second, and more importantly, at least from my naive reading of the SMART report, there weren't any ECC errors. Attribute 195 [Hardware ECC Recovered] shows that there were zero ECC recovered reads. Additionally, attributes 1 and 7 also show zero read/seek errors.

 

I guess this was just some anomaly with unraid and is nothing to worry about?

Link to comment

Reported uncorrect is attribute 187. So it's this value that unRAID considers to have returned to zero from some previous value.

I leave it to the unRAID developers to decide whether unRAID has goofed or whether your machine has actually seen a non-zero value for attribute 187.

 

3 minutes ago, tkenn1s said:

Attribute 195 [Hardware ECC Recovered] shows that there were zero ECC recovered reads.

 

Attribute 195 says:

195 Hardware_ECC_Recovered  -O-RC-   009   009   000    -    43047327

I'm not sure of the exact meaning of the raw value for this disk. 43047327 in decimal is 290 D99F in hexadecimal.

It is most probably a logarithmic rolling average of how many bits have been repaired using ECC.

 

But whatever Seagate encodes in the raw number, they give a quite low value of 9 in both the Value and Worst columns.

And lower values are worse. But in this case, Seagate has never specified an alarm level (the threshold is 000).

 

What we can say is that ECC has most definitely been used to recover bits. No hard disk is so perfect that it doesn't need to make use of ECC to correct bit errors.

Link to comment

From what I could find about Seagate's SMART reporting, there was this site -- http://sgros.blogspot.com/2013/01/seagate-disk-smart-values.html. The TL;DR is that Seagate encodes the error attributes as a 48-bit value, with the upper 16 bits representing the actual number of errors and the lower 32 bits representing the number of reads/seeks.

 

So, decimal 43047327 for attribute 195 is 0x0000 0290 D99F, meaning 0 ECC recovered errors in about 43 million reads. The drive is new and has only run preclear, so the relatively low number of reads would be about right. But I would imagine those lower 32 bits will roll over at some point.
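
To make that split concrete, here is a minimal sketch in Python. It assumes the 48-bit layout described on the linked blog (upper 16 bits = error count, lower 32 bits = operation count), which Seagate does not officially document; the function name `split_seagate_raw` is made up for illustration.

```python
def split_seagate_raw(raw: int) -> tuple[int, int]:
    """Split a raw SMART value into (errors, operations).

    ASSUMPTION: the raw value is a 48-bit field whose upper 16 bits
    count errors and whose lower 32 bits count reads/seeks, as the
    linked blog post describes. This is not an official Seagate spec.
    """
    raw &= (1 << 48) - 1            # keep only the low 48 bits
    errors = raw >> 32              # upper 16 bits
    operations = raw & 0xFFFFFFFF   # lower 32 bits
    return errors, operations


raw_195 = 43047327                  # raw value of attribute 195 from the report
errors, operations = split_seagate_raw(raw_195)
print(f"0x{raw_195:012X}")          # 0x00000290D99F
print(errors, operations)           # 0 43047327
```

Under that assumption, any non-zero result in the first field would be the thing to worry about, not the size of the raw number as a whole.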

 

In any case, I'm not convinced the drive has actually shown any read/seek errors thus far.

 

Link to comment

Did you add or remove any device prior to the last shutdown?

 

System notifications currently keep track of unassigned disks by their device identifier (sdX). What might have happened, and it has happened to me several times, is something like this example:

 

You have a 4-disk array (sdb, sdc, sdd and sde) and two unassigned devices (sdf and sdg). Now say sdf has reported uncorrected errors (or any other monitored attribute); you'll get notified of that. But before or after shutting down the server you remove that disk (sdf). At the next power-on, sdg will become sdf, and you'll get a new system notification that reported uncorrected errors for sdf returned to normal, i.e. 0.
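
The scenario can be sketched in a few lines of Python. This is purely illustrative and is not unRAID's actual notification code: the function `returned_to_normal`, the dictionary fields and the serial numbers are all hypothetical. It only shows why remembering the previous attribute value per device name produces a spurious "returned to normal" notice, while remembering it per serial number would not.

```python
def returned_to_normal(previous, current, key):
    """Return names of disks whose attribute 187 dropped from >0 to 0.

    `key` selects how a disk is matched to its previous state:
    "name" (sdX, which can change between boots) or "serial".
    Hypothetical sketch, not unRAID's real implementation.
    """
    prev = {disk[key]: disk["attr187"] for disk in previous}
    notices = []
    for disk in current:
        old = prev.get(disk[key])
        if old is not None and old > 0 and disk["attr187"] == 0:
            notices.append(disk["name"])
    return notices


# Before shutdown: sdf has reported uncorrectable errors, sdg is healthy.
before = [
    {"name": "sdf", "serial": "SERIAL_A", "attr187": 8},
    {"name": "sdg", "serial": "SERIAL_B", "attr187": 0},
]
# The failing disk is removed; the healthy disk is now enumerated as sdf.
after = [
    {"name": "sdf", "serial": "SERIAL_B", "attr187": 0},
]

print(returned_to_normal(before, after, "name"))    # ['sdf']
print(returned_to_normal(before, after, "serial"))  # []
```

Matched by name, the healthy disk inherits the removed disk's history and triggers the notice; matched by serial, nothing is reported.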

 

 

Link to comment

Archived

This topic is now archived and is closed to further replies.
