pras1011 (December 15, 2010): HD Tune shows a SMART warning on my Samsung HD204UI (F4): the Calibration Retry Count attribute is flagged, with a value of 2. But Samsung's own diagnostic tool says the drive is fine. Which one should I believe?
Joe L. (December 15, 2010): Get a SMART report on that drive:

smartctl -d ata -a /dev/sdX

where sdX is the device for your disk. Then look at the normalized value for the calibration retry attribute. If it is nearing the failure threshold for that attribute, RMA the drive. If not, stop worrying and use the drive.

Joe L.
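To make "look at the normalized value" concrete, here is a minimal Python sketch (not part of smartctl or the preclear script) that splits one row of the attribute table printed by smartctl -a into its columns. It assumes the usual column order ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE; the names SmartAttribute, parse_attribute_row, and headroom are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    attr_id: int
    name: str
    value: int        # normalized current value (what SMART compares to THRESH)
    worst: int        # worst normalized value recorded
    thresh: int       # failure threshold for the normalized value
    when_failed: str  # "-" unless the attribute has failed
    raw: str          # raw value; its meaning is largely vendor-specific


def parse_attribute_row(line: str) -> SmartAttribute:
    """Split one row of smartctl's attribute table into its columns.

    Assumed column order: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH
    TYPE UPDATED WHEN_FAILED RAW_VALUE.
    """
    fields = line.split()
    return SmartAttribute(
        attr_id=int(fields[0]),
        name=fields[1],
        value=int(fields[3]),
        worst=int(fields[4]),
        thresh=int(fields[5]),
        when_failed=fields[8],
        # The raw column can contain spaces (e.g. extra min/max details).
        raw=" ".join(fields[9:]),
    )


def headroom(attr: SmartAttribute) -> int:
    """How far the normalized value still sits above the failure threshold."""
    return attr.value - attr.thresh


if __name__ == "__main__":
    row = "11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0"
    attr = parse_attribute_row(row)
    print(attr.name, attr.value, attr.thresh, headroom(attr))
```

A large headroom means the attribute is nowhere near failing; the raw value is deliberately left out of that calculation.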
pras1011 (December 15, 2010): HD Tune gives you all of that information anyway. The threshold is 100 and the error count is 2.
Joe L. (December 15, 2010): What is the NORMALIZED value for that attribute? It sounds as if you are telling me the RAW value is 2.

Joe L.
pras1011 (December 15, 2010): I am not sure what you mean by normalized value, but it reads:

Attribute                Current  Worst  Threshold  Data  Status
Calibration Retry Count  252      252    0          2     Warning

Joe, can you check my preclear logs? The first HDD in the log passed without any issues. Which HDD would you RMA? Thanks.

(attachment: preclear_info.txt)
Joe L. (December 15, 2010): I will find out shortly.

As for your preclear logs: I see absolutely nothing wrong with any of those disks. No RMA is in order for any of them.
pras1011 (December 15, 2010): The problem HDD has this error, but I am not sure what you mean by normalized value:

Attribute                Current  Worst  Threshold  Data  Status
Calibration Retry Count  252      252    0          2     Warning
Joe L. (December 15, 2010): The normalized values are the first two in the list you gave. The "current" normalized value is 252. The worst-ever normalized value for that attribute is 252. The failure threshold for that attribute is 0. If the current value goes BELOW the failure threshold, the disk fails that SMART test and the attribute is reported as FAILING_NOW. The "data" column in your list is a "raw" value that, in most cases, has meaning only to the manufacturer.

Also note that there is no standard among disks. The disk shown below has 100 as its normalized value for calibration retries once it has a few hours of use, and 253 as it leaves the factory. The only common rule is that if the current normalized value is above the failure threshold, the SMART report considers the drive good for that attribute.

Here is a sample of smartctl output (as I suggested you get) so you can see what a failing attribute looks like. The drive shown below has a normalized value of 84 for reallocated sectors and a failure threshold of 140. It is FAILING_NOW.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   199   199   051    Pre-fail  Always   -           38319
  3 Spin_Up_Time            0x0027   040   040   021    Pre-fail  Always   -           15000
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always   -           257
  5 Reallocated_Sector_Ct   0x0033   084   084   140    Pre-fail  Always   FAILING_NOW 927
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always   -           0
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always   -           4019
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always   -           0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always   -           0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always   -           7
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always   -           2
193 Load_Cycle_Count        0x0032   198   198   000    Old_age   Always   -           6210
194 Temperature_Celsius     0x0022   122   102   000    Old_age   Always   -           30
196 Reallocated_Event_Count 0x0032   001   001   000    Old_age   Always   -           550
197 Current_Pending_Sector  0x0032   199   196   000    Old_age   Always   -           338
198 Offline_Uncorrectable   0x0030   200   198   000    Old_age   Offline  -           103
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always   -           0
200 Multi_Zone_Error_Rate   0x0008   179   151   000    Old_age   Offline  -           4355
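The rule in the post above can be sketched in a few lines of Python. This is a minimal illustration, not smartctl's actual code: it follows the thread's wording (fail once the normalized value is below the threshold), ignores raw values, and uses three rows copied from the sample report. The function names is_failing_now and failing_attributes are hypothetical, and smartctl's own boundary check may differ (for example, it may also treat a value equal to the threshold as failing).

```python
def is_failing_now(value: int, thresh: int) -> bool:
    # Rule as stated in this thread: the attribute fails once the normalized
    # current value drops BELOW its threshold. smartctl's own boundary check
    # may differ (e.g. treating value == thresh as failing); not verified here.
    return value < thresh


def failing_attributes(rows: list[str]) -> list[str]:
    """Return the names of attributes whose normalized value is below THRESH.

    Each row is assumed to follow smartctl's column order:
    ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    failing = []
    for row in rows:
        fields = row.split()
        name, value, thresh = fields[1], int(fields[3]), int(fields[5])
        # The raw value (last column) is intentionally ignored.
        if is_failing_now(value, thresh):
            failing.append(name)
    return failing


rows = [
    "5 Reallocated_Sector_Ct 0x0033 084 084 140 Pre-fail Always FAILING_NOW 927",
    "11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0",
    "197 Current_Pending_Sector 0x0032 199 196 000 Old_age Always - 338",
]
print(failing_attributes(rows))  # ['Reallocated_Sector_Ct']
```

Note that Current_Pending_Sector is not flagged even though its raw value is 338: only the normalized value is compared with the threshold.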
pras1011 (December 15, 2010): OK, thanks. I understand SMART now! So if the current value of 253 starts dropping towards the threshold value of 0, then there is a problem. I get it. Thanks, Joe!
pras1011 (December 15, 2010): I am preclearing the "bad" HDD again. Would you RMA it?
Joe L. (December 15, 2010): I somehow don't think you understand SMART yet. I did not see any reason to RMA any drive. Did I miss something that you are seeing? If so, post the line you are concerned about so I don't have to read your mind.

Joe L.
pras1011 (December 18, 2010): Nope, I do understand. However, I have noticed something odd. I precleared 5 HDDs at the same time and preclear reported errors on all of them. I then precleared the 2 worst of those 5, one at a time, and got no errors at all! Maybe preclear can't handle more than one HDD at a time?
BRiT (December 18, 2010): Preclear can handle multiple simultaneous clears. I too think you do not understand SMART. What specifically in those SMART reports makes you think the drives are faulty?
pras1011 (December 18, 2010): I need to clarify. I do understand SMART. Let's go beyond that now. Once preclear had finished on all 5 HDDs (using Alt+F1, Alt+F2, Alt+F3, etc.), it reported errors and information. I then precleared the two HDDs with the most errors and information again, one at a time, and this time there were no errors or information at all. Nothing. Just a clear screen. This leads me to believe that either something is wrong with preclear when handling multiple disks, or something is wrong with my motherboard.
Joe L. (December 18, 2010): The pre-clear process shows you the DIFFERENCES between a SMART report taken at the beginning of the clearing process and one taken at the end. If there are no differences, there is no output. If there are any differences, you'll get output. The output does not indicate an error, just a difference.

There are a handful of lines that will always be different; those are filtered out of the output of the "diff" command. For example, I expect the power-on hours to be different, so you'll never see them in the "diff" output. Furthermore, a drive could be FAILING_NOW in both the beginning and the ending SMART report, and because that did not change, it would not show in the "diff" output.

Basically, do not use the "diff" output alone to determine whether a disk is failing. Use it in combination with the full SMART report. For each disk there are two separate SMART reports in your /tmp directory, and they are also logged in your syslog. You can decide if your disks are incrementing the SMART parameters... that has absolutely nothing to do with the pre-clear processing.

All that said... if you have poor-quality cabling and get cross-talk and induced noise because the cables are tightly bundled together, or a noisy power supply, or disks whose vibration makes it harder for adjacent disks to read, then yes, pre-clearing multiple disks at the same time may uncover a hardware issue with your server. It may not be a single disk... it may only surface when they are all active together.

It is your hardware. You get to "defend" it. Just don't go returning a disk because of a single "read" error. All disks have read errors... some report them, some do not... they just retry and re-read the sector. You'll have to experiment and learn how your server performs.
You'll just need to be aware that all your disks will be active when performing an initial parity calculation or a parity check. Pre-clear can handle multiple disks being cleared at the same time... but can your hardware?

It is a reporting tool. You can analyze the output and decide on your own (now that you know how to interpret the results).

A "disk calibration" error might be caused by the disk temperature changing so drastically during the pre-clear process that the platters changed physical size enough for the heads to have to re-calibrate.

You have to analyze your own situation. If a disk is failing and you suspect it only acts up when multiple disks are spinning, re-test it.

Joe L.
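Joe's point that an empty diff does not mean a healthy disk can be shown with a short Python sketch. This is not the preclear script (which is a shell script with its own filter list); it is a re-sketch using difflib.ndiff, and the EXPECTED_TO_CHANGE list, the smart_diff name, and the "after" value of 4031 hours are hypothetical.

```python
import difflib

# Hypothetical filter list: attributes assumed to change on every clear.
# The real preclear script keeps its own list; this is only an illustration.
EXPECTED_TO_CHANGE = ("Power_On_Hours", "Temperature_Celsius")


def smart_diff(before: str, after: str) -> list[str]:
    """Return the lines that changed between two SMART reports.

    Lines mentioning an attribute in EXPECTED_TO_CHANGE are dropped, so an
    empty result means "no unexpected differences", not "healthy disk".
    """
    changes = []
    for line in difflib.ndiff(before.splitlines(), after.splitlines()):
        # ndiff prefixes: "- " removed, "+ " added, "  " unchanged, "? " hints.
        if not line.startswith(("- ", "+ ")):
            continue
        if any(name in line for name in EXPECTED_TO_CHANGE):
            continue
        changes.append(line)
    return changes


before = (
    "5 Reallocated_Sector_Ct 0x0033 084 084 140 Pre-fail Always FAILING_NOW 927\n"
    "9 Power_On_Hours 0x0032 095 095 000 Old_age Always - 4019"
)
after = (
    "5 Reallocated_Sector_Ct 0x0033 084 084 140 Pre-fail Always FAILING_NOW 927\n"
    "9 Power_On_Hours 0x0032 095 095 000 Old_age Always - 4031"
)
print(smart_diff(before, after))  # [] even though attribute 5 is FAILING_NOW
```

The FAILING_NOW line is identical in both reports, so it never appears in the diff; only reading the full report reveals it.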
BRiT (December 18, 2010): No, we cannot move beyond that. You do not understand preclear and SMART. What "errors" did preclear report? It typically reports differences between the SMART reports taken before and after. It is perfectly natural for there to be differences that are NOT errors. A blank report means there are no differences. A full report only means there were differences; it does NOT mean there are errors. For example, the raw, nominal (normalized), threshold, maximum, and minimum values can all change without indicating an error.
pras1011 (December 18, 2010): Thanks, Joe. I have checked and re-checked the HDD with the calibration retry count issue, and it seems fine. Only time will tell!
pras1011 (January 1, 2011): As I don't know much about SMART, can you check these pre-preclear results?

(attachment: before_preclear_2.txt)
Joe L. (January 1, 2011): The file you attached is not the pre-clear results. Those are the initial SMART reports taken at the start of the pre-clearing process. The pre-clear process takes another SMART report at its end and then shows you the differences between the initial report and the post-clear report. You would need to post the final results for anyone to know how it did.

Joe L.
pras1011 (January 1, 2011): Hi Joe, I was aware of that, but one of the HDDs looked unusual in the initial SMART check before preclear.
Joe L. (January 1, 2011): Which drive was "unusual"? I saw one reallocated sector on one disk... I guess I did not look closely enough. What did you see?

Joe L.
pras1011 (January 1, 2011): The one at the end, JK1130YAH8NG5T. It has extra errors on it. I was just wondering what these extra bits meant.
pras1011 (January 2, 2011): I have finished the preclear. They all have the same preclear result. I think this is because I pressed Ctrl+C to cancel a previous preclear.

(attachment: preclear_BKKT_W31T_NETT_NG5T.txt)
pras1011 (January 4, 2011): Does anyone know what this means?

Offline data collection status: (0x80) Offline data collection activity was never started.
Offline data collection status: (0x84) Offline data collection activity was suspended by an interrupting command from host.
Joe L. (January 4, 2011): Offline data collection is typically a requested "short" or "long" SMART test, although I've seen disks perform tests on their own when they are otherwise idle. From what your output is saying, you've never requested either a long or a short test of the drive. The "offline" collection is aborted when the disk is spun down (the interrupting command is the spin-down command).
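For reference, the two hex codes can be decoded by hand. Below is a minimal Python sketch, assuming the ATA SMART layout as I understand it: bit 7 of the status byte only says that automatic offline data collection is enabled, and the low seven bits give the activity state. On that reading, 0x80 and 0x84 differ only in the state (0x00 never started, 0x04 suspended by the host), which fits Joe's explanation. STATUS_TEXT and decode_offline_status are hypothetical names, only a few states are decoded, and the wording mirrors smartctl's output only loosely. A test can actually be requested with smartctl's -t short or -t long option.

```python
# Low 7 bits of the offline data collection status byte, as I understand
# the ATA SMART layout. Other values are not decoded by this sketch.
STATUS_TEXT = {
    0x00: "Offline data collection activity was never started.",
    0x02: "Offline data collection activity was completed without error.",
    0x04: "Offline data collection activity was suspended by an interrupting command from host.",
    0x05: "Offline data collection activity was aborted by an interrupting command from host.",
    0x06: "Offline data collection activity was aborted by the device with a fatal error.",
}


def decode_offline_status(status: int) -> tuple[str, bool]:
    """Split the status byte into (activity description, auto-offline enabled)."""
    auto_offline_enabled = bool(status & 0x80)  # bit 7
    activity = STATUS_TEXT.get(status & 0x7F, "Status not decoded by this sketch.")
    return activity, auto_offline_enabled


for status in (0x80, 0x84):
    activity, auto = decode_offline_status(status)
    print(f"0x{status:02X}: {activity} (auto offline collection enabled: {auto})")
```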