mckenna654 Posted May 22, 2017

I received an email alert this afternoon stating the following:

Event: unRAID array errors
Subject: Warning [S-CHASSIS] - array has errors
Description: Array has 1 disk with read errors
Importance: warning
Disk 1 - ST3000VN000-1H4167_Z3101PN6 (sdb) (errors 15)

One of my disks now reads "Device is disabled. Contents emulated" and is marked as faulty (with a red cross) in the array status. Should I attempt to rebuild the array onto the disk? What's the best way of doing so? I am running a basic two disk RAID 1 (mirror) array. See attached for my server diagnostics zip. Please let me know if there is any other info I can provide. Thanks

s-chassis-diagnostics-20170522-2158.zip
Zonediver Posted May 22, 2017 (edited)

The disk has a lot of Raw Read Errors, so it seems this disk is dying or already dead. Also, the Seek Error Rate is extremely high. Power On Hours: 21,068 - also a lot. I would change the disk as soon as possible.

EDIT: The second disk also has lots of errors and will fail soon, I guess...

Edited May 22, 2017 by Zonediver
mckenna654 Posted May 22, 2017 Author

1 minute ago, Zonediver said:
Check/change your SATA cable first and see what happens.

No problem. I will check that tomorrow. I have an HP N54L MicroServer, so I'm not 100% sure how it's all wired inside. Would moving the drive to a different bay achieve the same result?
JorgeB Posted May 22, 2017

Move it to a different slot and rebuild to the same disk. If it fails again in the near future, replace it.
mckenna654 Posted May 23, 2017 Author

16 hours ago, johnnie.black said:
Move it to a different slot and rebuild to the same disk. If it fails again in the near future, replace it.

Should I remove the troubled disk from the array first? How do I unassign a disk?
JorgeB Posted May 23, 2017

1 hour ago, jakeandchase said:
How do I unassign a disk?

Shut down the server, swap the disk to a different slot, power back up, unassign the disk, start the array, stop the array, reassign the disk, then start the array to begin the rebuild.
SSD Posted May 23, 2017

On 5/22/2017 at 8:16 AM, Zonediver said:
The disk has a lot of Raw Read Errors, so it seems this disk is dying or already dead. Also, the Seek Error Rate is extremely high. Power On Hours: 21,068 - also a lot. I would change the disk as soon as possible. EDIT: The second disk also has lots of errors and will fail soon, I guess...

The Raw Read Error Rates are fine. The large raw value means nothing. It is likely binary data that we humans have no way to understand. The "value" and "worst" are at safe levels.

Similarly, the Seek Error Rates are also fine. The "worst" has dipped to 60, but with a "thresh" of 30, it is far from dipping below the manufacturer's failure threshold. And although the normalized values are lower than I might expect for drives of this age, the fact that both drives have very similar values leads me to believe that this is normal for this model of drive. (But I'd be monitoring this going forward.)

The attributes we look closely at are the Reallocated Sectors and Pending Sectors. You also want to verify that none of the attributes are showing as failing now or "in the past". None of these is a concern here.

The only thing that worries me about these drives is that they are 3TB Seagates, and they have a bad track record.
Zonediver Posted May 23, 2017 (edited)

Raw Read Errors can't be fine... One of my WD Reds has had lots of them for the past two weeks. I checked it outside the array and found that the transfer rate is about 1.2 MB/s at this position. The disk has 11 block errors, so that's not fine and not normal.

Edited May 23, 2017 by Zonediver
SSD Posted May 23, 2017

@Zonediver The SMART data is one dimension of drive health. It has its limitations. Just because the SMART report looks OK, it doesn't mean the drive is healthy. If you saw something that led you to do more investigation (which may or may not have been what the data was telling you), and you identified some slow sectors in the process that you believe are bad enough to look to replace the drive, there may have been some serendipity involved. But it is good if you found a drive issue.

The "raw" values that appear on the SMART report are, with a few notable exceptions, not something a person can understand without technical specs that the manufacturers do not make available. They are likely bit masks (first 3 bits mean this, next 2 bits mean that, etc.) that vary by manufacturer and even by model. Turning this concatenation of bits into a decimal number is meaningless.

These raw values are translated into a normalized scale, where normally 100 is "good" and the threshold for going bad is defined. SMART tracks both the current normalized value and the worst the normalized value has ever gotten.

Some attributes have raw values that are usable - things like reallocated sectors and current pending sectors - which are absolute counts and have been used in that consistent way for every drive I have looked at. These are some of the most useful attributes to track.

Most important in the SMART attributes is not the absolute value, but a comparison over time. For example, having a drive with 1000 reallocated sectors that are rock solid and never increase with time is probably better than having a drive with 20 reallocated sectors where that number is growing with every parity check.

I do not have a particularly warm place in my heart for WD or Seagate drives (although I have been pretty happy with the 8TB Seagate Archives given their price and my experience with them to date). I believe that the HGSTs are the highest quality, and if cost were no object, I'd be buying those.
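The two checks described in this post (normalized value against threshold, and raw counts compared over time) can be sketched roughly as follows. This is only an illustration: the function names are made up, the current "value" of 70 is a hypothetical reading (the thread only gives "worst" 60 and "thresh" 30), and the sector counts are invented sample data, not taken from the attached diagnostics.

```python
def below_threshold(value, worst, thresh):
    """Return True if the normalized value has reached the failure
    threshold, either now (value) or at some point in the past (worst)."""
    return value <= thresh or worst <= thresh


def is_growing(counts):
    """Return True if an absolute raw count (for example reallocated
    sectors) increased between any two consecutive readings."""
    return any(later > earlier for earlier, later in zip(counts, counts[1:]))


# "worst" and "thresh" are the Seek Error Rate figures quoted in the thread;
# the current value of 70 is a hypothetical placeholder.
print(below_threshold(value=70, worst=60, thresh=30))  # False

# Invented readings taken after successive parity checks.
stable_drive = [1000, 1000, 1000, 1000]
growing_drive = [12, 15, 17, 20]
print(is_growing(stable_drive))   # False
print(is_growing(growing_drive))  # True
```

With these sample numbers, the drive with 1000 stable reallocated sectors passes the trend check, while the drive with only 20 but rising does not, which matches the comparison made above.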
Zonediver Posted May 23, 2017 (edited)

Yes, that might be possible. This WD Red is my first failing Red, and I use plenty of this type. Over the last 17 years I have seen dying drives from IBM and, of course, Seagate, but only "one" WD. All my retired WD Greens are still working well, but this Red failure tells me that WD's quality is going down, so yes, if I had the money, I would prefer HGST too.

Edited May 23, 2017 by Zonediver
mckenna654 Posted May 23, 2017 Author

I have changed the slot and rebuilt the array with no issues. If it fails again I will post back. Thanks
seagate_surfer Posted May 31, 2017

Hi, we are sorry to hear that you're experiencing issues with your Seagate drive. Just in case you encounter further issues with one of our drives, you can always contact our Customer Support or look into any warranty information here. Please feel free to reach out if you have any questions!
limetech Posted May 31, 2017

1 hour ago, seagate_surfer said:
Hi, we are sorry to hear that you're experiencing issues with your Seagate drive

Hi surfer, please check your PM.
unevent Posted May 31, 2017

On 5/23/2017 at 1:01 PM, bjp999 said:
The Raw Read Error Rates are fine. The large raw value means nothing. It is likely binary data that we humans have no way to understand. The "value" and "worst" are at safe levels. Similarly, the Seek Error Rates are also fine. The "worst" has dipped to 60, but with a "thresh" of 30, it is far from dipping below the manufacturer's failure threshold. And although the normalized values are lower than I might expect for drives of this age, the fact that both drives have very similar values leads me to believe that this is normal for this model of drive. (But I'd be monitoring this going forward.) The attributes we look closely at are the Reallocated Sectors and Pending Sectors. You also want to verify that none of the attributes are showing as failing now or "in the past". None of these is a concern here. The only thing that worries me about these drives is that they are 3TB Seagates, and they have a bad track record.

I'm not specifically replying to bjp999, but rather using the post to redirect the focus to the Seagate error values, since the thread took a couple of turns.

The seek error rate and the raw read error rate are 48-bit values. Convert the reported value to hexadecimal: the upper 16 bits are the number of errors and the lower 32 bits are the total number of seeks. So for the ST3000VN000-1H4167 drive, a seek error rate of 91742336 converted to hex is 0x00000577E080. The upper 16 bits, 0x0000 = 0, mean zero seek errors over 0x0577E080 = 91,742,336 seeks.

The high fly write count is high, however, on both Seagate drives. The advice from johnnie.black is good; alternatively, replace the drive, run preclear on this one, and see whether the high fly write count increases or not. As bjp999 mentioned, reallocated sectors and pending sectors are the typically watched values, but I also look at high fly writes on Seagate drives.
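The arithmetic in this post can be checked with a few lines of Python. This is a minimal sketch that assumes the 48-bit layout exactly as described above (upper 16 bits = errors, lower 32 bits = operations); the function name is made up for illustration, and the layout is the poster's description for these Seagate drives, not something guaranteed for other models or manufacturers.

```python
def decode_seagate_raw(raw):
    """Split a 48-bit raw SMART value into (error_count, operation_count),
    assuming the upper 16 bits count errors and the lower 32 bits count
    operations (seeks or reads)."""
    if not 0 <= raw < (1 << 48):
        raise ValueError("raw value does not fit in 48 bits")
    errors = raw >> 32              # upper 16 bits
    operations = raw & 0xFFFFFFFF   # lower 32 bits
    return errors, operations


raw = 91742336  # Seek Error Rate raw value quoted in the thread
errors, seeks = decode_seagate_raw(raw)
print(f"0x{raw:012X}: {errors} errors over {seeks:,} seeks")
# prints: 0x00000577E080: 0 errors over 91,742,336 seeks
```

A raw value only starts to look alarming under this reading once it exceeds 2^32 = 4,294,967,296, because that is where the upper 16 bits become non-zero.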
SSD Posted May 31, 2017

2 hours ago, unevent said:
I'm not specifically replying to bjp999, but rather using the post to redirect the focus to the Seagate error values, since the thread took a couple of turns. The seek error rate and the raw read error rate are 48-bit values. Convert the reported value to hexadecimal: the upper 16 bits are the number of errors and the lower 32 bits are the total number of seeks. So for the ST3000VN000-1H4167 drive, a seek error rate of 91742336 converted to hex is 0x00000577E080. The upper 16 bits, 0x0000 = 0, mean zero seek errors over 0x0577E080 = 91,742,336 seeks. The high fly write count is high, however, on both Seagate drives. The advice from johnnie.black is good; alternatively, replace the drive, run preclear on this one, and see whether the high fly write count increases or not. As bjp999 mentioned, reallocated sectors and pending sectors are the typically watched values, but I also look at high fly writes on Seagate drives.

Thanks for the insights on those attributes. Is that pretty generic across manufacturers, or strictly Seagate? I always see some high fly writes on Seagates and have never seen any correlation with drive failures. What do you look for with that attribute?
unevent Posted May 31, 2017

27 minutes ago, bjp999 said:
Thanks for the insights on those attributes. Is that pretty generic across manufacturers, or strictly Seagate? I always see some high fly writes on Seagates and have never seen any correlation with drive failures. What do you look for with that attribute?

Not sure how many other manufacturers use encoded values; I only know of Seagate. From the Wikipedia article for high fly writes, there is a head flying-height sensor 'which detects when a recording head is flying outside its normal operating range. If an unsafe fly height condition is encountered, the write process is stopped, and the information is rewritten or reallocated to a safe region of the hard drive'.

A handful or so seems normal to me, but when the number is as high (50+) as what the two SMART reports are saying, I begin to wonder whether there are issues with the mechanics of the drive and how many weak writes have been performed. If it were mine, I would run 1-2 full preclear cycles; whether the number increases, and by how much, would determine whether the drive gets shelved or stays in use.