jakeandchase

Did my disk die? "Device is disabled. Contents emulated."


I received an email alert this afternoon stating the following:

Event: unRAID array errors
Subject: Warning [S-CHASSIS] - array has errors
Description: Array has 1 disk with read errors
Importance: warning

Disk 1 - ST3000VN000-1H4167_Z3101PN6 (sdb) (errors 15)

One of my disks now reads Device is disabled. Contents emulated and is marked as faulty (with a red cross) in the array status.

 

Should I attempt to rebuild the array onto the disk? What's the best way of doing so?

 

I am running a basic two disk RAID 1 (mirror) array.

 

See attached for my server diagnostics zip.

 

Please let me know if there is any other info I can provide.

 

Thanks

s-chassis-diagnostics-20170522-2158.zip


The disk has a lot of raw read errors, so it seems this disk is dying or already dead.

Also the seek error rate is extremely high.

Power-on hours: 21,068 - also a lot.

I would change the disk as soon as possible.

 

EDIT: The second disk also has lots of errors and will fail soon, I guess...

Edited by Zonediver

1 minute ago, Zonediver said:

Check/change your SATA cable first and see what happens.

 

No problem.

 

I will check that tomorrow. I have an HP N54L MicroServer, so I'm not 100% sure how it's all wired inside.

 

Would moving the drive to a different bay achieve the same result?

 

 

16 hours ago, johnnie.black said:

Move it to a different slot and rebuild to the same disk; if it fails again in the near future, replace it.

Should I remove the troubled disk from the array first?

 

How do I unassign a disk?

1 hour ago, jakeandchase said:

How do I unassign a disk?

 

Shut down the server, swap the disk to a different slot, power back up, unassign the disk, start the array, stop the array, reassign the disk, then start the array to begin the rebuild.

On 5/22/2017 at 8:16 AM, Zonediver said:

The disk has a lot of raw read errors, so it seems this disk is dying or already dead.

Also the seek error rate is extremely high.

Power-on hours: 21,068 - also a lot.

I would change the disk as soon as possible.

 

EDIT: The second disk also has lots of errors and will fail soon, I guess...

 

The Raw Read Error Rates are fine. The large raw value means nothing; it is likely binary data that we humans have no way to interpret. The "value" and "worst" are at safe levels.

 

Similarly, the Seek Error Rates are also fine. The "worst" has dipped to 60, but with a "thresh" of 30, it is far from dipping below the manufacturer's failure threshold. And although the normalized values are lower than I might expect for drives of this age, the fact that both drives have very similar values leads me to believe this is normal for this drive model. (But I'd be monitoring it going forward.)

 

The attributes we look at closely are Reallocated Sectors and Pending Sectors. You also want to verify that none of the attributes are shown as failing now or "in the past". None of these is a concern here.

 

The only thing that worries me about these drives is that they are 3TB Seagates, and those have a bad track record.


Raw read errors can't be fine...

One of my WD Reds has had lots of them for two weeks. I checked it outside the array and found that the transfer rate is about 1.2 MB/s at that position.

The disk has 11 block errors, so that's not fine and not normal.

 

23-Mai-2017_11-24.png

23-Mai-2017_11-35_Benchmark.png

23-Mai-2017_11-46_Health.png

Edited by Zonediver


@Zonediver

 

The SMART data is one dimension of drive health, and it has its limitations. Just because the SMART report looks OK doesn't mean the drive is healthy. If you saw something that led you to investigate further (which may or may not have been what the data was telling you), and in the process you identified some slow sectors that you believe are bad enough to warrant replacing the drive, there may have been some serendipity involved. But it is good that you found a drive issue.

 

The "raw" values that appear on the SMART report are, with a few notable exceptions, not something a person can understand without technical specs the manufacturers don't make available. They are likely bit masks (the first 3 bits mean this, the next 2 bits mean that, etc.) that vary by manufacturer and even by model. Turning this concatenation of bits into a decimal number is meaningless. These raw values are translated onto a normalized scale, where normally 100 is "good" and the threshold for going bad is defined. SMART tracks both the current normalized value and the worst the normalized value has ever gotten. Some attributes have raw values that are usable - things like reallocated sectors and current pending sectors - which are absolute counts and have been used in that consistent way on every drive I have looked at. These are some of the most useful attributes to track.
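To make the value/worst/thresh comparison concrete, here's a minimal Python sketch that checks the normalized columns of a SMART attribute table. The column layout assumed here follows the standard `smartctl -A` output from smartmontools, and the sample values are invented for illustration, not taken from the diagnostics in this thread:

```python
import re

# Invented sample rows in smartctl -A column order:
# ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
SAMPLE = """\
  1 Raw_Read_Error_Rate     0x000f   117   099   006    Pre-fail  Always       -       91742336
  7 Seek_Error_Rate         0x000f   073   060   030    Pre-fail  Always       -       91742336
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""

def failing_attributes(table: str) -> list[str]:
    """Return names of attributes whose WORST normalized value has
    ever reached the manufacturer's failure threshold (THRESH)."""
    failing = []
    for line in table.splitlines():
        # Capture NAME, skip FLAG and VALUE, capture WORST and THRESH.
        m = re.match(r"\s*\d+\s+(\S+)\s+\S+\s+\d+\s+(\d+)\s+(\d+)", line)
        if m and int(m.group(2)) <= int(m.group(3)):
            failing.append(m.group(1))
    return failing

# Seek_Error_Rate: worst 060 vs thresh 030 -- dipped, but still safe.
print(failing_attributes(SAMPLE))  # []
```

The check uses WORST rather than VALUE because, as described above, WORST records the lowest the normalized value has ever gotten, so it also catches an attribute that failed "in the past" and recovered.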

 

Most important in the SMART attributes is not the absolute value, but the comparison over time. For example, a drive with 1,000 reallocated sectors that are rock solid and never increase over time is probably better than a drive with 20 reallocated sectors where that number grows with every parity check.

 

I do not have a particularly warm place in my heart for WD or Seagate drives (although I have been pretty happy with the 8TB Seagate Archives given their price and my experience with them to date). I believe the HGSTs are the highest quality, and if cost were no object, I'd be buying those.


Yes, that might be possible. This WD Red is my first failing Red - and I use plenty of this type.

Over the last 17 years I have seen dying drives from IBM and - of course - Seagate, but only one WD.

All my retired WD Greens are still working well, but this Red failure tells me that WD's quality is going down, so yes, if I had the money, I would prefer HGST too.

Edited by Zonediver

1 hour ago, seagate_surfer said:

Hi, we are sorry to hear that you're experiencing issues with your Seagate drive.

 

Hi surfer, please check your PM.

On 5/23/2017 at 1:01 PM, bjp999 said:

 

The Raw Read Error Rates are fine. The large raw value means nothing; it is likely binary data that we humans have no way to interpret. The "value" and "worst" are at safe levels.

 

Similarly, the Seek Error Rates are also fine. The "worst" has dipped to 60, but with a "thresh" of 30, it is far from dipping below the manufacturer's failure threshold. And although the normalized values are lower than I might expect for drives of this age, the fact that both drives have very similar values leads me to believe this is normal for this drive model. (But I'd be monitoring it going forward.)

 

The attributes we look at closely are Reallocated Sectors and Pending Sectors. You also want to verify that none of the attributes are shown as failing now or "in the past". None of these is a concern here.

 

The only thing that worries me about these drives is that they are 3TB Seagates, and those have a bad track record.

 

 

While not specifically replying to bjp999, I'm using the quote to redirect the focus to the Seagate error values, since the thread took a couple of turns.  The seek error rate and the raw read error rate are 48-bit values.  Convert the reported value to hexadecimal: the upper 16 bits are the number of errors and the lower 32 bits are the total number of seeks.  So for the ST3000VN000-1H4167 drive, the seek error rate of 91742336 converted to hex is 0x00000577E080.  Upper 16: 0x0000 = 0, i.e. zero seek errors over 0x0577E080 = 91,742,336 seeks.  The high fly write count is high, however, on both Seagate drives.  The advice from johnnie.black is good; alternatively, replace the drive, run preclear on this one, and see whether the high fly write count increases.  As bjp999 mentioned, reallocated sectors and pending sectors are the typically watched values, but I also look at high fly writes on Seagate drives.
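For anyone who wants to reproduce that arithmetic, a short Python sketch of the decoding described above. (The 16-bit/32-bit split is the commonly cited interpretation of Seagate's encoding, not something documented in an official spec, so treat it as a rule of thumb.)

```python
def decode_seagate_rate(raw: int) -> tuple[int, int]:
    """Split a 48-bit Seagate Seek_Error_Rate / Raw_Read_Error_Rate
    raw value into (error count, total operation count)."""
    errors = (raw >> 32) & 0xFFFF       # upper 16 bits: actual errors
    operations = raw & 0xFFFFFFFF       # lower 32 bits: total seeks/reads
    return errors, operations

# The ST3000VN000 value from the diagnostics above:
raw = 91742336                          # == 0x0000_0577_E080
errors, seeks = decode_seagate_rate(raw)
print(f"{errors} errors over {seeks:,} seeks")  # 0 errors over 91,742,336 seeks
```

A huge decimal raw value can therefore be perfectly healthy: the operation count in the low bits dominates the number while the error count in the high bits stays zero.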

2 hours ago, unevent said:

While not specifically replying to bjp999, I'm using the quote to redirect the focus to the Seagate error values, since the thread took a couple of turns.  The seek error rate and the raw read error rate are 48-bit values.  Convert the reported value to hexadecimal: the upper 16 bits are the number of errors and the lower 32 bits are the total number of seeks.  So for the ST3000VN000-1H4167 drive, the seek error rate of 91742336 converted to hex is 0x00000577E080.  Upper 16: 0x0000 = 0, i.e. zero seek errors over 0x0577E080 = 91,742,336 seeks.  The high fly write count is high, however, on both Seagate drives.  The advice from johnnie.black is good; alternatively, replace the drive, run preclear on this one, and see whether the high fly write count increases.  As bjp999 mentioned, reallocated sectors and pending sectors are the typically watched values, but I also look at high fly writes on Seagate drives.

 

Thanks for the insights on those attributes. Is that pretty generic across manufacturers, or strictly Seagate? I always see some high fly writes on Seagates and have never found any correlation with drive failures. What do you look for with that attribute?

27 minutes ago, bjp999 said:

 

Thanks for the insights on those attributes. Is that pretty generic across manufacturers, or strictly Seagate? I always see some high fly writes on Seagates and have never found any correlation with drive failures. What do you look for with that attribute?

 

Not sure how many other manufacturers use encoded values; I only know of Seagate.  From the Wikipedia article on high fly writes, there is a head flying-height sensor "which detects when a recording head is flying outside its normal operating range. If an unsafe fly height condition is encountered, the write process is stopped, and the information is rewritten or reallocated to a safe region of the hard drive".  Maybe a handful or so seems normal to me, but when the number is as high (50+) as the two SMART reports here show, I begin to wonder whether there are issues with the mechanics of the drive and how many weak writes have been performed.  If it were mine, running 1-2 full preclear cycles and seeing whether (and by how much) the number increases would determine whether the drive gets shelved or continues in use.

 

 

