Guzzi Posted August 12, 2009

Hi, I ran another four preclears on disks formerly used in another Windows RAID-5. The script reports some differences between the pre- and post-clear SMART values. Could you have a look at the Seek_Error_Rate lines and comment on whether they are anything to worry about? Thanks, Guzzi

===========================================================================
= unRAID server Pre-Clear disk /dev/sdb
=   cycle 1 of 1
= Disk Pre-Clear-Read completed                                 DONE
= Step 1 of 10 - Copying zeros to first 2048k bytes             DONE
= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE
= Step 3 of 10 - Disk is now cleared from MBR onward.           DONE
= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4       DONE
= Step 5 of 10 - Clearing MBR code area                         DONE
= Step 6 of 10 - Setting MBR signature bytes                    DONE
= Step 7 of 10 - Setting partition 1 to precleared state        DONE
= Step 8 of 10 - Notifying kernel we changed the partitioning   DONE
= Step 9 of 10 - Creating the /dev/disk/by* entries             DONE
= Step 10 of 10 - Testing if the clear has been successful.     DONE
= Disk Post-Clear-Read completed                                DONE
Elapsed Time:  21:41:19
============================================================================
==
== Disk /dev/sdb has been successfully precleared
==
============================================================================
S.M.A.R.T. error count differences detected after pre-clear
note, some 'raw' values may change, but not be an indication of a problem
58c58
<   7 Seek_Error_Rate         0x000e   100   253   051    Old_age   Always       -       0
---
>   7 Seek_Error_Rate         0x000e   200   200   051    Old_age   Always       -       0
63c63
< 193 Load_Cycle_Count        0x0032   176   176   000    Old_age   Always       -       72598
---
> 193 Load_Cycle_Count        0x0032   176   176   000    Old_age   Always       -       72599

===========================================================================
= unRAID server Pre-Clear disk /dev/sdc
=   cycle 1 of 1
= Disk Pre-Clear-Read completed                                 DONE
= Step 1 of 10 - Copying zeros to first 2048k bytes             DONE
= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE
= Step 3 of 10 - Disk is now cleared from MBR onward.           DONE
= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4       DONE
= Step 5 of 10 - Clearing MBR code area                         DONE
= Step 6 of 10 - Setting MBR signature bytes                    DONE
= Step 7 of 10 - Setting partition 1 to precleared state        DONE
= Step 8 of 10 - Notifying kernel we changed the partitioning   DONE
= Step 9 of 10 - Creating the /dev/disk/by* entries             DONE
= Step 10 of 10 - Testing if the clear has been successful.     DONE
= Disk Post-Clear-Read completed                                DONE
Elapsed Time:  23:20:10
============================================================================
==
== Disk /dev/sdc has been successfully precleared
==
============================================================================
S.M.A.R.T. error count differences detected after pre-clear
note, some 'raw' values may change, but not be an indication of a problem
58c58
<   7 Seek_Error_Rate         0x000e   100   253   051    Old_age   Always       -       0
---
>   7 Seek_Error_Rate         0x000e   200   200   051    Old_age   Always       -       0
63c63
< 193 Load_Cycle_Count        0x0032   176   176   000    Old_age   Always       -       73100
---
> 193 Load_Cycle_Count        0x0032   176   176   000    Old_age   Always       -       73101

===========================================================================
= unRAID server Pre-Clear disk /dev/sdd
=   cycle 1 of 1
= Disk Pre-Clear-Read completed                                 DONE
= Step 1 of 10 - Copying zeros to first 2048k bytes             DONE
= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE
= Step 3 of 10 - Disk is now cleared from MBR onward.           DONE
= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4       DONE
= Step 5 of 10 - Clearing MBR code area                         DONE
= Step 6 of 10 - Setting MBR signature bytes                    DONE
= Step 7 of 10 - Setting partition 1 to precleared state        DONE
= Step 8 of 10 - Notifying kernel we changed the partitioning   DONE
= Step 9 of 10 - Creating the /dev/disk/by* entries             DONE
= Step 10 of 10 - Testing if the clear has been successful.     DONE
= Disk Post-Clear-Read completed                                DONE
Elapsed Time:  26:25:24
============================================================================
==
== Disk /dev/sdd has been successfully precleared
==
============================================================================
S.M.A.R.T. error count differences detected after pre-clear
note, some 'raw' values may change, but not be an indication of a problem
58c58
<   7 Seek_Error_Rate         0x000e   100   253   051    Old_age   Always       -       0
---
>   7 Seek_Error_Rate         0x000e   200   200   051    Old_age   Always       -       0
63c63
< 193 Load_Cycle_Count        0x0032   173   173   000    Old_age   Always       -       81301
---
> 193 Load_Cycle_Count        0x0032   173   173   000    Old_age   Always       -       81306

===========================================================================
= unRAID server Pre-Clear disk /dev/sde
=   cycle 1 of 1
= Disk Pre-Clear-Read completed                                 DONE
= Step 1 of 10 - Copying zeros to first 2048k bytes             DONE
= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE
= Step 3 of 10 - Disk is now cleared from MBR onward.           DONE
= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4       DONE
= Step 5 of 10 - Clearing MBR code area                         DONE
= Step 6 of 10 - Setting MBR signature bytes                    DONE
= Step 7 of 10 - Setting partition 1 to precleared state        DONE
= Step 8 of 10 - Notifying kernel we changed the partitioning   DONE
= Step 9 of 10 - Creating the /dev/disk/by* entries             DONE
= Step 10 of 10 - Testing if the clear has been successful.     DONE
= Disk Post-Clear-Read completed                                DONE
Elapsed Time:  25:00:20
============================================================================
==
== Disk /dev/sde has been successfully precleared
==
============================================================================
S.M.A.R.T. error count differences detected after pre-clear
note, some 'raw' values may change, but not be an indication of a problem
19,20c19,20
< Offline data collection status:  (0x82) Offline data collection activity
<                                         was completed without error.
---
> Offline data collection status:  (0x84) Offline data collection activity
>                                         was suspended by an interrupting command from host.
63c63
< 193 Load_Cycle_Count        0x0032   176   176   000    Old_age   Always       -       72723
---
> 193 Load_Cycle_Count        0x0032   176   176   000    Old_age   Always       -       72724
============================================================================
Joe L. Posted August 12, 2009

For that particular model of disk, the Load_Cycle_Count raw value seems to be incremented by one each time the disk heads are taken out of the "parked" position. Of the three columns of values, the first is the current value, the second is the worst value ever encountered (the lowest), and the third is the failure threshold. For Load_Cycle_Count, the disk will be considered "failed" when the current value reaches zero; at that point the firmware will consider the drive "worn out."

(I'm guessing here, but the math almost looks like it might work something like this.) Let's theorize that for this disk the current value starts at 200 when the drive is brand new, and that every 3000 load cycles decrement the current value by 1. With those assumptions, the numbers come out reasonably close. If I were a manufacturer, I'd use powers of 2 and divide by 1024, but you get the idea; each manufacturer has its own internal algorithm.

Using that math, you have over 500,000 more head-load cycles before the overall wear and tear on the drive is expected to be an issue. Of course, this is entirely a prediction by the manufacturer. Will the drive be "defective" at that point? No, not necessarily, but it will have been subjected to some wear.

The same can be said of the Seek_Error_Rate: the worst value encountered so far is 200, and the threshold to fail is 51. I'd say it is doing just fine.

Joe L.
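The back-of-the-envelope math above can be sketched in a few lines of shell. Note that the 3000-cycles-per-point figure is purely the guess from this post, not anything published by the manufacturer:

```shell
#!/bin/sh
# Rough wear estimate for Load_Cycle_Count, using the guessed rate above.
# The first two inputs come from the SMART report; per_point is speculation.
current=176      # normalized VALUE from the /dev/sdb report
thresh=0         # THRESH at which the firmware calls the attribute failed
per_point=3000   # guessed head-load cycles consumed per point of VALUE
remaining=$(( (current - thresh) * per_point ))
echo "estimated head-load cycles before the attribute fails: $remaining"
```

For the /dev/sdb drive above this works out to a bit over half a million cycles, in line with the estimate in the post.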
Guzzi Posted August 12, 2009

Joe, thanks for your answer. To be honest, I do not fully understand those values, though I understand your theory... Is there some sort of standard set of values to look at when checking drive health? The idea of checking the drives is good, but interpreting the values is difficult (at least for me ;-)). Is it correct that sector reallocation is the thing to focus on?
Joe L. Posted August 12, 2009

> Joe, thanks for your answer. To be honest, I do not fully understand those values... Is there some sort of standard set of values to look at when checking drive health?

The three columns of values are all internal to the drive. Each manufacturer has its own "normal" values and thresholds for failure. The "raw" value is likewise known only to the manufacturer. All we can go by is trends, and, of course, if a value meets its threshold, the drive will then fail the SMART test (but may still be perfectly normal; it has just reached an end-of-life wear limit the manufacturer thought appropriate).

> The idea of checking the drives is good, but interpreting the values is difficult (at least for me ;-)). Is it correct that sector reallocation is the thing to focus on?

Interpreting is difficult for everybody. For the most part, re-allocated sectors is the one thing we know we can focus on, but even there, according to Seagate, modern drives have thousands of spare sectors. If 6 are re-allocated when you first exercise the drive, in my mind it is fine, unless the number of re-allocated sectors increases more and more each time you use the disk. Then it is replacement time. Look here for a good summary of how to interpret what you are seeing: http://en.wikipedia.org/wiki/S.M.A.R.T.

Joe L.
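Since the raw Reallocated_Sector_Ct is the number worth trending, one way to pull it out of a SMART report is a one-line awk filter. This is just a sketch: the sample line below is copied from the report format shown in this thread, and on a live system you would pipe `smartctl -d ata -a /dev/sdX` into the same filter instead:

```shell
#!/bin/sh
# Extract the raw Reallocated_Sector_Ct from a SMART attribute line.
# In smartctl's table layout, field 2 is the name and field 10 is RAW_VALUE.
line='  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0'
count=$(printf '%s\n' "$line" | awk '$2 == "Reallocated_Sector_Ct" { print $10 }')
echo "reallocated sectors: $count"
```

Appending the date and the count to a small log file on the flash drive every month or so makes a growing trend obvious at a glance.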
Guzzi Posted August 13, 2009

Thanks for the info. I did some reading; lots of details. Hmmm, I'm not sure I wasn't happier before I started thinking about my HDs at all: just installing, and being surprised if something fails ;-). Just kidding. I like the concept of the preclear script very much; reading and writing the whole HD once before using it in production IS a help in discovering problems in advance. At least I found two hard disks behaving strangely, and I will have a closer look at them after doing my migration to the healthy drives.
Joe L. Posted August 13, 2009

> just installing and being surprised, if something fails

Don't you mean... just installing and being surprised when something fails? :(

On MS-Windows, these disk issues are not visible. We happily go along until the OS will no longer boot, or we cannot open the critical document we need. Now, I'm sure many of those issues are bugs in programs, but some might not be; some are disk sectors that become unreadable.

Remember, there are only two types of disk drives. No, not IDE and SATA. The two disk drive types are:

1. Those disks that have already failed.
2. Those disks that have not YET failed, but will; it's just a matter of time.

Joe L.
Guzzi Posted August 13, 2009

> Don't you mean... just installing and being surprised when something fails :( [...]

Yes, you're absolutely right, but you noticed my smiley too, didn't you... It IS a positive thing to get this extended information, and I appreciate it. As you may have seen in my last posts, at least two drives from my former Windows RAID-5 are not behaving well, and I am more than happy to identify them and throw them out of my box. It's just that I didn't expect all the extra trouble: my initial plan was simply to move the drives from Windows to unRAID, move the data, and be finished ;-)
Joe L. Posted August 13, 2009

> It's just that I didn't expect all the extra trouble: my initial plan was simply to move the drives from Windows to unRAID, move the data, and be finished ;-)

Well... sorry about causing you extra "trouble," but I figure you would rather find the issues that can be uncovered before you move your files. The cost of a few new drives is small compared to the amount of time and effort needed otherwise. I hope your data transfer goes smoothly once you have a set of disks to move it to. From what you've said, your RAID-5 array would have had to deal with the defects on those two old disks at some point, and it might not have been as easy then to swap in a new, larger drive.

Joe L.
heffe2001 Posted August 14, 2009

Ran preclear on a new WD 750 Green drive, and this is the before & after:

Before:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   253   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   199   199   021    Pre-fail  Always       -       5050
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       8
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       0
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       6
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       4
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       8
194 Temperature_Celsius     0x0022   126   119   000    Old_age   Always       -       24
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

After:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   199   199   021    Pre-fail  Always       -       5050
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       8
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       21
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       6
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       4
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       8
194 Temperature_Celsius     0x0022   113   111   000    Old_age   Always       -       37
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

The only one that concerns me is Raw_Read_Error_Rate: it looks like the VALUE doubled, but the WORST went down, although the RAW_VALUE is still 0? Anybody see any reason I shouldn't put this drive in my unRAID box?
Joe L. Posted August 14, 2009

> Ran preclear on a new WD 750 Green drive, and this is the before & after: [...] The only one that concerns me is Raw_Read_Error_Rate: it looks like the VALUE doubled, but the WORST went down, although the RAW_VALUE is still 0? Anybody see any reason I shouldn't put this drive in my unRAID box?

From what I've read, the VALUE is the current value, the WORST is frequently initialized at the factory with a starting value of 253, and the THRESH is the value at which (when reached) that parameter will be considered failed. So for your drive, the VALUE would need to go down to 51 for the Raw_Read_Error_Rate parameter to be an issue. Since you have just put the drive into service, you should see the Raw_Read_Error_Rate VALUE stay pretty stable over time. If at some point in the future you find it getting closer to the THRESH value, that would indicate some kind of problem getting worse, and at that point you can replace the drive proactively.

All that has happened in the pre-clear process is that the VALUE and WORST have changed from the initialized factory values to values reflecting how your drive is actually performing. As for the "raw" value, it may never change from 0 for that drive; the internal method used to calculate most parameters is known only to the drive manufacturer.

I see no reason why you should not add the drive to your array. Looks pretty good to me.

Joe L.
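Following that reading of the columns, a small filter can flag any attribute whose current VALUE has drifted to within some margin of its THRESH. This is only a sketch, not part of the preclear script: the margin of 10 is an arbitrary choice, and the demo row is a made-up attribute line in smartctl's table format (a real report would be piped in instead):

```shell
#!/bin/sh
# Print attributes whose normalized VALUE is within 10 points of THRESH.
# In smartctl's attribute table: $2=name, $4=VALUE, $6=THRESH.
near_thresh() {
  awk '$4 ~ /^[0-9]+$/ && $6 ~ /^[0-9]+$/ && ($4 - $6) <= 10 {
         print $2 " VALUE=" $4 " THRESH=" $6
       }'
}
# Hypothetical row: a drive whose read-error VALUE has sunk close to THRESH.
warn=$(printf '%s\n' \
  '  1 Raw_Read_Error_Rate     0x002f   055   053   051    Pre-fail  Always       -       12' \
  | near_thresh)
echo "$warn"
```

Healthy rows like the ones in the report above (VALUE 200, THRESH 51) produce no output, so anything the filter prints deserves a closer look.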
Guzzi Posted August 15, 2009

> Well... sorry about causing you extra "trouble," but I figure you would rather find the issues that can be uncovered before you move your files. [...] Joe L.

I appreciate the help and the capabilities of your tools; I wasn't complaining, just reporting back my experience. Please don't misunderstand me: I am happy to discover the problems in advance instead of having the trouble later, and yes, you're completely right that the price of a new disk is nothing compared to trouble with a machine and the data on it. That's why I quickly replaced the failing drives with new ones...
Joe L. Posted August 15, 2009

> I appreciate the help and the capabilities of your tools; I wasn't complaining, just reporting back my experience. [...]

I did not misunderstand; I just wanted to save you, and anyone else reading this thread, from problems you might otherwise avoid. I've spent many hours loading my array, and I'm sure every unRAID owner's experience is much the same. I appreciate feedback, good and bad. Most important, I learn from everyone's experience; there is just no way for me to duplicate everyone's hardware and the errors they encounter. If the script needs improvement, I'm the first to admit it.

I saw the smileys in your previous post and understood their meaning. I know your plans for a quick migration of data were put aside when the old disks you intended to use did not test well, but with new replacements in place it should go much better. I hope that by now you are starting your data migration.

Joe L.
Tom2000 Posted August 21, 2009

Hi, I think this should be the right thread to post my question. I just purchased two 1.5 TB Samsung SATA drives and tried to use the preclear.sh script to prepare them. What I normally do is connect the drive to the external SATA port on the system and run the preclear.sh script one disk at a time. Unfortunately, both drives returned the same unsuccessful result, shown below.

===========================================================================
= unRAID server Pre-Clear disk /dev/sdk
=   cycle 1 of 1
= Disk Pre-Clear-Read completed                                 DONE
= Step 1 of 10 - Copying zeros to first 2048k bytes             DONE
= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE
= Step 3 of 10 - Disk is now cleared from MBR onward.           DONE
= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4       DONE
= Step 5 of 10 - Clearing MBR code area                         DONE
= Step 6 of 10 - Setting MBR signature bytes                    DONE
= Step 7 of 10 - Setting partition 1 to precleared state        DONE
= Step 8 of 10 - Notifying kernel we changed the partitioning   DONE
= Step 9 of 10 - Testing if the clear has been successful.      DONE
= Step 10 of 10 - Testing if the clear has been successful.     DONE
= Elapsed Time:  7:56:59
============================================================================
==
== SORRY: Disk /dev/sdk MBR could NOT be precleared
==
============================================================================
0+0 records in
0+0 records out
0 bytes (0 B) copied, 2.8617e-05 s, 0.0 kB/s
0000000

The only difference is that when I ran the first HD, the syslog filled up with roughly 600MB of messages like the ones below. I then deleted the syslog and used the touch command to create a new syslog. When I ran preclear.sh on my second drive, the syslog remained at a size of 0.

Aug 20 03:37:23 Tower kernel: end_request: I/O error, dev sdk, sector 32563752
Aug 20 03:37:23 Tower kernel: sd 9:0:0:0: [sdk] Result: hostbyte=0x04 driverbyte=0x00
Aug 20 03:37:23 Tower kernel: end_request: I/O error, dev sdk, sector 32579816
Aug 20 03:37:23 Tower kernel: sd 9:0:0:0: [sdk] Result: hostbyte=0x04 driverbyte=0x00
Aug 20 03:37:23 Tower kernel: end_request: I/O error, dev sdk, sector 32595880
Aug 20 03:37:23 Tower kernel: sd 9:0:0:0: [sdk] Result: hostbyte=0x04 driverbyte=0x00
Aug 20 03:37:23 Tower kernel: end_request: I/O error, dev sdk, sector 32611944

With a message such as "SORRY: Disk /dev/sdk MBR could NOT be precleared", does it imply that both of my HDs are defective? I'd appreciate it if anyone can chime in on what I should do next.

--Tom
Joe L. Posted August 21, 2009

That message indicates that the cleared "signature" expected in the first 512 bytes of the disk was not found when the disk was read back after being written. Your error messages seem to indicate that reading the drive is failing. I'd stop the array, power down, and check the cabling. Odds are either a cable is not seated properly or one of the cables to the drive is defective.

You can use the command

preclear_disk.sh -t /dev/sdk

to test whether the pre-clear was successful. (With all the errors, it might not have been.) It will run in a few seconds and let you know if the disk is cleared.

It also sounds as if you are hot-plugging the external disks. DO NOT: the SATA drives may support it, but unRAID does NOT, and you could cause yourself all kinds of grief. (I apologize if you had both plugged in at the same time, but it sounds as if you had one disk connected, and then the other.)

The syslog filling with disk errors is not a good sign. I'd run

smartctl -d ata -a /dev/sdk

on the drive to see what the full SMART report says.

Joe L.
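If you want to look at that first 512-byte sector by hand, dd plus od will dump it read-only. The demo below runs against a scratch file standing in for the device; on the server you would point `if=` at /dev/sdk instead (the command only reads, it writes nothing to the disk):

```shell
#!/bin/sh
# Dump the first sector in hex. A fully zeroed sector shows one row of
# zeros followed by '*' (od collapses repeated lines).
img=$(mktemp)                                    # stand-in for /dev/sdk
dd if=/dev/zero of="$img" bs=512 count=1 2>/dev/null
first=$(dd if="$img" bs=512 count=1 2>/dev/null | od -A d -t x1 | head -n 1)
echo "$first"
rm -f "$img"
```

On a properly precleared disk, the dump would show the specific byte pattern the script writes there; on a disk where the write failed, you would see whatever stale data was left behind.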
Tom2000 Posted August 21, 2009

Hi Joe, thanks for the analysis. Please see below for the smartctl command output; it seems fine to me. I did hot-plug the external SATA cable. Thanks for pointing that out, since I did not know I wasn't supposed to do that.

I am using PuTTY to connect to the unRAID server. Whenever I execute the command "preclear_disk.sh -t /dev/sdk", the session just terminates right away. I think I will go ahead, stop the array, restart the server, and then run preclear_disk.sh again.

Thanks, --Tom

------------------------------------------------------------------------
root@Tower:~# smartctl -d ata -a /dev/sdk
smartctl version 5.38 [i486-slackware-linux-gnu] Copyright © 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     SAMSUNG HD154UI
Serial Number:    S1Y6J1KS744099
Firmware Version: 1AG01118
User Capacity:    1,500,301,910,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 3b
Local Time is:    Fri Aug 21 12:44:04 2009
Local time zone must be set--see zic m
==> WARNING: May need -F samsung or -F samsung2 enabled; see manual for details.
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (19393) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (  34) minutes.
SCT capabilities:              (0x003f) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   071   071   011    Pre-fail  Always       -       9640
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       4
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   253   253   051    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0025   100   100   015    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       10
 10 Spin_Retry_Count        0x0033   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0012   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       4
 13 Read_Soft_Error_Rate    0x000e   100   100   000    Old_age   Always       -       0
183 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
184 Unknown_Attribute       0x0033   100   100   000    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   075   075   000    Old_age   Always       -       25 (Lifetime Min/Max 25/26)
194 Temperature_Celsius     0x0022   075   075   000    Old_age   Always       -       25 (Lifetime Min/Max 25/27)
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       223195953
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   253   253   000    Old_age   Always       -       5
200 Multi_Zone_Error_Rate   0x000a   253   253   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   100   100   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
Joe L. Posted August 21, 2009

> I think I will go ahead, stop the array, restart the server, and then run preclear_disk.sh again.

It sounds as if the out-of-memory kernel process is killing processes on your server. Deleting the syslog does not free the space it uses if a process still has an open file descriptor writing to it. The blocks are freed only after there are no more references to the file, and an open file descriptor is a reference. Some programs actually take advantage of this behavior: they create a temp file, open it for reading and writing, then delete it. Until the file descriptors are closed, the temp file is still readable and writable by that program, and the space is automatically freed when the program exits.

To stop the old syslog process and restart it, type the following:

/etc/rc.d/rc.syslog restart

It should free up the space, and you should then see the new syslog file you created start to be used.
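That behavior is easy to demonstrate with a throwaway file: delete it while a descriptor is still open and the contents stay reachable (and the blocks stay allocated) until the descriptor is closed. The /proc path used to read it back is Linux-specific:

```shell
#!/bin/sh
# Show that rm does not free a file still held open by a process.
tmp=$(mktemp)
exec 3<>"$tmp"                  # open the file read/write on descriptor 3
echo "still allocated" >&3
rm "$tmp"                       # the name is gone, the data is not
recovered=$(head -c 15 /proc/self/fd/3)   # Linux: reopen via the open fd
exec 3>&-                       # closing the fd finally frees the blocks
echo "$recovered"
```

This is exactly why the deleted 600MB syslog kept its space until the syslog daemon was restarted.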
Joe L. Posted August 21, 2009

As far as the hot-plug causing harm: look through this thread. After a hot-plug, and a reboot when it did not work as expected, the user accidentally started a "parity check" with a drive that was not mounted. It ran for a minute or two before he stopped it. It read zeros from the un-mounted drive and changed parity accordingly. Later, when a replacement drive was installed, those zeros were written to it instead of the normal file-system structures. Basically, he had wiped his data from both parity and the drive. That hot-plug initiated actions that resulted in one of the few cases I know of where unRAID lost data.

All that said, stop your array, reboot, and you'll probably be fine. Oh yeah... don't hot-plug; always stop the array and power down first.

Joe L.
Tom2000 Posted August 21, 2009

Hi Joe,

Thanks again for the explanations. Those are good notes that I will keep for maintaining my unRAID server. I am still waiting for the preclear_disk.sh script to finish and will update later on.

--Tom
Tom2000 Posted August 22, 2009

Hi Joe,

The preclear_disk.sh script finally completed one cycle and below is the result. I suppose it is OK, right?

Thanks,
--Tom

===========================================================================
=                unRAID server Pre-Clear disk /dev/sdj
=               cycle 1 of 1
= Disk Pre-Clear-Read completed                                 DONE
= Step 1 of 10 - Copying zeros to first 2048k bytes             DONE
= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE
= Step 3 of 10 - Disk is now cleared from MBR onward.           DONE
= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4       DONE
= Step 5 of 10 - Clearing MBR code area                         DONE
= Step 6 of 10 - Setting MBR signature bytes                    DONE
= Step 7 of 10 - Setting partition 1 to precleared state        DONE
= Step 8 of 10 - Notifying kernel we changed the partitioning   DONE
= Step 9 of 10 - Creating the /dev/disk/by* entries             DONE
= Step 10 of 10 - Testing if the clear has been successful.     DONE
= Disk Post-Clear-Read completed                                DONE
Elapsed Time:  14:36:38
============================================================================
==
== Disk /dev/sdj has been successfully precleared
==
============================================================================
S.M.A.R.T. error count differences detected after pre-clear
note, some 'raw' values may change, but not be an indication of a problem
71c71
< 190 Airflow_Temperature_Cel 0x0022   075   075   000    Old_age   Always       -       25 (Lifetime Min/Max 25/26)
---
> 190 Airflow_Temperature_Cel 0x0022   072   072   000    Old_age   Always       -       28 (Lifetime Min/Max 25/28)
77c77
< 200 Multi_Zone_Error_Rate   0x000a   253   253   000    Old_age   Always       -       0
---
> 200 Multi_Zone_Error_Rate   0x000a   100   100   000    Old_age   Always       -       0
============================================================================
Joe L. Posted August 22, 2009

> The preclear_disk.sh script finally completed one cycle and below is the result. I suppose it is OK, right?

(full preclear report for /dev/sdj quoted above)

After 14+ hours your disk temperature went from 25C to 28C. I'd say that in itself is not too serious, but it does say you have some serious fans.

The S.M.A.R.T. wiki here indicates that attribute 200 is:

200  C8  Write Error Rate / Multi-Zone Error Rate: the total number of errors when writing a sector.

You started with the default initialized value of 253, and after a full 14+ hour pre-clear cycle it has a normalized value of 100. The failure threshold is 0. You are nowhere close to the failure threshold, so unless it changes over time, you are fine there too.

Joe L.
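The "normalized value vs. failure threshold" comparison Joe is doing by eye can be checked mechanically. A small sketch that compares the VALUE column against THRESH for one attribute line; the sample line is copied from the report above, but on a live system you would feed it lines from `smartctl -A /dev/sdX` instead:

```shell
# Columns in smartctl -A output:
#   ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
line='200 Multi_Zone_Error_Rate   0x000a   100   100   000    Old_age   Always       -       0'
set -- $line                # word-split the line into positional parameters
id=$1 value=$4 thresh=$6
if [ "$value" -gt "$thresh" ]; then
  echo "attribute $id: normalized value $value is still above failure threshold $thresh"
else
  echo "attribute $id: value $value has reached threshold $thresh - drive fails this attribute"
fi
```

Remember it is the normalized VALUE sinking to THRESH that signals failure, not the RAW_VALUE changing.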
jbuszkie Posted August 23, 2009 Author

I just ran 2 disks, single cycle. One disk was fine; the other, not so much. Do you agree that this might be an RMA candidate? I'm running a second cycle to be sure.

S.M.A.R.T. error count differences detected after pre-clear
note, some 'raw' values may change, but not be an indication of a problem
57c57
< 1 Raw_Read_Error_Rate       0x000f   100   100   051    Pre-fail  Always       -       0
---
> 1 Raw_Read_Error_Rate       0x000f   099   099   051    Pre-fail  Always       -       5005
66c66
< 13 Read_Soft_Error_Rate     0x000e   100   100   000    Old_age   Always       -       0
---
> 13 Read_Soft_Error_Rate     0x000e   099   099   000    Old_age   Always       -       4648
69c69
< 187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
---
> 187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       4952
71c71
< 190 Airflow_Temperature_Cel 0x0022   070   070   000    Old_age   Always       -       30 (Lifetime Min/Max 30/30)
---
> 190 Airflow_Temperature_Cel 0x0022   068   067   000    Old_age   Always       -       32 (Lifetime Min/Max 30/33)
74c74
< 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
---
> 197 Current_Pending_Sector  0x0012   092   092   000    Old_age   Always       -       331
78c78
< 201 Soft_Read_Error_Rate    0x000a   253   253   000    Old_age   Always       -       0
---
> 201 Soft_Read_Error_Rate    0x000a   097   097   000    Old_age   Always       -       228
============================================================================
Joe L. Posted August 23, 2009

(quoting the diff above; the key line highlighted)

> 197 Current_Pending_Sector  0x0012   092   092   000    Old_age   Always       -       331   <-- This looks like a good candidate for an RMA to me.

I see an RMA in your future. Not sure why the 331 sectors were not re-allocated already, unless the failures were in the post-read and no subsequent write has happened to those sectors since then.

Joe L.
stoner Posted August 23, 2009

Hi all, first post, so do excuse me if my questions have been asked before...

I am currently trying out the free version (before I put my money down), running it on an old HP P4 1.9 GHz server. I have 3 new Seagate Barracuda ES 750 GB hdds which I am preclearing. So far the SMART reports look fine, although my temps are a bit high as I live awfully near the Equator (ambient temp is 30+... bah).

As it's an old server, I am running the 3 hdds off a PCI SATA card. When preclearing a single drive, I am getting about 30-35 MB/s. Same for 2 drives, and down to 15 MB/s for all 3 drives. In terms of hours, that's 13 for 1-2 drives and 23 for 3 drives. Based on what I have read in the threads, I should be taking about 10 hours for a single drive alone.

Not sure if the line below that says "Write cache: disabled" is normal:

[sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA

I have attached the syslog when preclearing all 3 drives below. /dev/hda is a CF card in an IDE adaptor, so you can ignore those errors.

The other thing I am concerned about is that one of the hdds (/dev/sdc) makes a squeaking sound when starting up. No more weird sounds after that. I am told that this is normal for Seagate hdds and nothing to worry about. Still a bit worried, as the other 2 do not make the same sound starting up. So should I be concerned?

Thanks for everyone's attention. If this is the wrong thread, do let me know and I will start a new one instead.
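For what it's worth, those elapsed times follow directly from drive size and sustained speed: a preclear cycle makes roughly three full passes over the disk (pre-read, zeroing write, post-read). A back-of-the-envelope sketch using the 35 MB/s figure above:

```shell
disk_bytes=$((750 * 1000 * 1000 * 1000))   # 750 GB drive
speed=$((35 * 1000 * 1000))                # ~35 MB/s sustained
passes=3                                   # pre-read + zero write + post-read
seconds=$(( disk_bytes / speed * passes ))
echo "estimated cycle time: $(( seconds / 3600 )) hours"
# -> estimated cycle time: 17 hours
```

The real number usually lands a bit lower, since the two read passes tend to run faster than the write pass.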
RobJ Posted August 24, 2009

> Not sure if the line below that says "Write cache: disabled" is normal...

That is not normal; both read and write caching are usually enabled. I don't have any ideas as to what to make of it, though. You are using a board based on a VIA chipset, which is usually problematic. I can't say it won't work, but I have seen little success with VIA-based boards here.

I know you said the 3 drives are "Seagate Barracuda ES 750Gb" drives, but they don't look like anything I have ever seen before. Neither the Linux kernel nor smartctl 5.38 was able to identify the manufacturer. They are identified by SMART as (with different serial numbers):

Device Model: GB0750C4414
Serial Number: 5QD51Y9T
Firmware Version: HPG4

Perhaps someone with experience with the ES series of drives can help here.

> So should I be concerned?

Squeaking sounds are definitely not normal either; I don't think I have ever heard a hard drive squeak. Are you positive that the squeaks are coming from the drive, and not a fan?
stoner Posted August 24, 2009

> That is not normal; both read and write caching are usually enabled...

Sounds bad. Guess I have to find out more on this. Read speeds are 70+ MB/s using hdparm -tT, so no problems there. It's the write speed that is pathetic. Although once unRAID is up, write speeds will probably be limited more by the network speed.

> I know you said the 3 drives are "Seagate Barracuda ES 750Gb" drives...

These are actually OEM Seagate drives for HP, so you wouldn't find the model number on Seagate's website.

> Are you positive that the squeaks are coming from the drive, and not a fan?

Yup, pretty sure, as I put my ear to the hdd when pressing the power-on button. Everything else appears to be okay, so I am a bit reluctant to exchange it for a new one which may or may not be better (if the squeaking sound is not a major problem).
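On the write-cache question: hdparm can both report and set the drive's write-cache flag (`hdparm -W /dev/sdX` to query, `hdparm -W1 /dev/sdX` to enable). A sketch that parses the query output; the sample text is hard-coded here so nothing touches a real device:

```shell
# Typical `hdparm -W /dev/sda` output looks like:
#   /dev/sda:
#    write-caching =  0 (off)
sample=' write-caching =  0 (off)'
state=$(echo "$sample" | awk '{print $3}')   # third field is the 0/1 flag
if [ "$state" = "0" ]; then
  echo "write cache disabled; try: hdparm -W1 /dev/sda  (then re-check with hdparm -W)"
else
  echo "write cache already enabled"
fi
```

Whether -W1 takes effect at all may depend on the controller passing the command through, and on some setups the flag does not survive a power cycle, so it may need to be re-applied at boot.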