RobJ, posted March 17, 2009

Tom had to deal with this quite a while back; it's somewhere in the Release Notes. We used to get occasional posts about temps not showing for certain drives, and we would try to work out why SMART was not enabled on that drive. Tom decided he might as well always enable it first, and I don't think we have seen problems like that since.

It baffles me why there is even an option to have SMART disabled on any drive. You don't have to use the SMART data. What advantage could there possibly be to having SMART disabled? (I'm not referring to unRAID, just drives in general.) And I find it incomprehensible that a product like Drobo (possibly the closest competitor to unRAID) would have SMART turned off on the drives that GoChris pulled! Does that make any sense at all? Can you imagine choosing to run something like unRAID without SMART data?
Tom2000, posted March 26, 2009

Hi, I am running unRAID Pro version 4.4 and ran into a problem tonight. I just purchased two 1TB hard disks: a Western Digital WD10EADS and a Samsung Spinpoint F1 HD103UJ. I used the preclear_disk.sh script to clear the WD drive yesterday; it ran smoothly and finished two rounds today. Then, before I ran preclear_disk.sh on the Samsung, I had the problem of two disks showing as unformatted. I searched the forum and installed the powerdown scripts. After rebooting the server, I started preclear_disk.sh on the Samsung and ran into the following problems:

1. The preclear_disk.sh script complained that some libraries needed by smartctl could not be found, so I re-installed the cxxlibs-6.0.8-i486-4.tgz package. It seems to me that I need to re-install the package whenever I reboot the server. Is that right?

2. Tons of errors are showing up in the syslog, and they have actually made the unRAID system non-functional: I can no longer copy or delete files. I think that is due to the continuous errors. Here are the messages, which repeat like crazy in the syslog:

Mar 26 03:20:56 Tower kernel: sd 7:0:0:0: [sdg] Result: hostbyte=0x04 driverbyte=0x00
Mar 26 03:20:56 Tower kernel: end_request: I/O error, dev sdg, sector 1953520064
Mar 26 03:20:56 Tower kernel: sd 7:0:0:0: [sdg] Result: hostbyte=0x04 driverbyte=0x00
Mar 26 03:20:56 Tower kernel: end_request: I/O error, dev sdg, sector 1953520064

I put the new disk in an enclosure and connected it to the external SATA port on the machine, because I want to clear the disk before actually installing it in the system. I thought it might be a problem with the physical drive, so I connected the drive to my Windows XP laptop via USB and partitioned and formatted it. That worked OK, and I also copied a few files to the disk.

I deleted a lot of the error messages from the syslog to make it smaller. The syslog is attached to this post. Please take a look at the log and let me know how I can fix the problem. Your help is very much appreciated.

--Tom
Joe L., posted March 26, 2009

> It seems to me that I need to re-install the package whenever I reboot the server. Is that right?

Correct... you need to re-install it each time you reboot. This is fixed in the 4.5-beta3 release (the missing library is no longer missing).

> Here are the messages, which repeat like crazy in the syslog:
> Mar 26 03:20:56 Tower kernel: sd 7:0:0:0: [sdg] Result: hostbyte=0x04 driverbyte=0x00
> Mar 26 03:20:56 Tower kernel: end_request: I/O error, dev sdg, sector 1953520064

Looks like communication with the drive stopped at some point, since the kernel complained about each sector in turn as it tried to access it.

> I thought it might be a problem with the physical drive, so I connected the drive to my Windows XP laptop via USB and partitioned and formatted it. That worked OK, and I also copied a few files to the disk.

Partitioning writes to the first sector only. It tells you very little about the true health of the drive (other than that it can read and write the first 512 bytes). Formatting a disk writes to only a small handful of its sectors. It is very possible for formatting to succeed while other sectors not involved in formatting still have problems being read or written.

It sounds a lot like you had a bad connection to the drive when it was attached to the unRAID array... either a bad cable, a loose connection, or a bad drive tray connection. Odds are the drive is OK.

Yes, when you have 1TB of bytes on a disk, trying to log a read or write failure for every sector will quickly fill the syslog and use up all memory.

You should run a smartctl report on the drive (or run it through another preclear_disk cycle, as it does a pre- and post-clear smartctl report on the drive).

Joe L.
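Joe's point that partitioning and formatting touch only a few sectors is why a full-surface read matters. For scale, the 1,000,204,886,016-byte HD103UJ in Tom's smartctl report holds 1,953,525,168 sectors of 512 bytes, and a format writes to only a handful of them. The sketch below is a hypothetical, read-only Python check (not part of preclear_disk.sh, which also writes zeros and compares SMART reports): it reads every byte of a file or block device in 1 MiB chunks and records the offset of any chunk that fails. The names `sector_count` and `scan_readable` are made up for this example; reading a raw device such as /dev/sdg needs root, and with a bad cable each failing read may take a long time rather than fail fast.

```python
import os

SECTOR_SIZE = 512  # assumed logical sector size for drives of this era


def sector_count(capacity_bytes, sector_size=SECTOR_SIZE):
    """Number of whole logical sectors in a drive of the given capacity."""
    return capacity_bytes // sector_size


def scan_readable(path, block_size=1 << 20):
    """Read `path` end to end in block_size chunks, without writing anything.

    Returns (total_bytes, bad_offsets), where bad_offsets lists the starting
    byte offset of every chunk that raised an I/O error or came back short.
    """
    bad_offsets = []
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end reports the size of regular files and block devices alike.
        total = os.lseek(fd, 0, os.SEEK_END)
        offset = 0
        while offset < total:
            length = min(block_size, total - offset)
            try:
                data = os.pread(fd, length, offset)
                if len(data) != length:
                    bad_offsets.append(offset)
            except OSError:
                bad_offsets.append(offset)
            offset += length
    finally:
        os.close(fd)
    return total, bad_offsets


if __name__ == "__main__":
    # HD103UJ capacity taken from Tom's smartctl report.
    print(sector_count(1_000_204_886_016))
```

An empty `bad_offsets` list only says every chunk could be read once; it says nothing about whether the drive can be written, which is one reason the preclear script also writes the whole surface.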
Tom2000, posted March 26, 2009

Hi Joe,

Thanks for your reply. I re-installed the drive in the enclosure, but it seems to behave the same. Since I have only one SATA enclosure and cable, I might just install the drive in the system and run preclear_disk.sh again.

I ran smartctl -H against it and it failed:

--------------------
root@Tower:/boot/packages# smartctl -H /dev/sdg
smartctl version 5.38 [i486-slackware-linux-gnu] Copyright © 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Short INQUIRY response, skip product id
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
--------------------

I then ran "smartctl --all", and here is the result:

--------------------
root@Tower:/boot/packages# smartctl --all /dev/sdg
smartctl version 5.38 [i486-slackware-linux-gnu] Copyright © 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     SAMSUNG HD103UJ
Serial Number:    S13PJ9DS302065
Firmware Version: 1AA01113
User Capacity:    1,000,204,886,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 3b
Local Time is:    Thu Mar 26 12:26:15 2009 Local time zone must be set--see zic m

==> WARNING: May need -F samsung or -F samsung2 enabled; see manual for details.

SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity was never started. Auto Offline Data Collection: Disabled.
Self-test execution status: (   0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: (11788) seconds.
Offline data collection capabilities: (0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported. General Purpose Logging supported.
Short self-test routine recommended polling time: (   2) minutes.
Extended self-test routine recommended polling time: ( 197) minutes.
Conveyance self-test routine recommended polling time: (  21) minutes.
SCT capabilities: (0x003f) SCT Status supported. SCT Feature Control supported. SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   253   253   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   078   078   011    Pre-fail  Always       -       7590
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       11
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   253   253   051    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0025   100   100   015    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       0
 10 Spin_Retry_Count        0x0033   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0012   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       11
 13 Read_Soft_Error_Rate    0x000e   253   253   000    Old_age   Always       -       0
183 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
184 Unknown_Attribute       0x0033   100   100   000    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   078   077   000    Old_age   Always       -       22 (Lifetime Min/Max 22/22)
194 Temperature_Celsius     0x0022   078   077   000    Old_age   Always       -       22 (Lifetime Min/Max 22/22)
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       405
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   099   099   000    Old_age   Always       -       3
200 Multi_Zone_Error_Rate   0x000a   100   100   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   253   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

root@Tower:/boot/packages#
--------------------

Thanks,
--Tom
RobJ, posted March 28, 2009

> It sounds a lot like you had a bad connection to the drive when it was attached to the unRAID array... either a bad cable, or a loose connection, or a bad drive tray connection.

Completely agree. Plus, the UDMA_CRC_Error_Count increased from 1 to 3, which also points to a cable or other interface issue. Most of those syslog errors occurred after the drive was disabled at 03:20:40, which is like 'pulling the plug'. That is generally fatal, and you can ignore all errors that occur after it. I would not bother with any further testing until you can replace that SATA cable, or find something loose in the power cabling or connectors. The drive itself looks fine.
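RobJ's reasoning separates interface problems (CRC errors on the SATA link) from media problems (bad or pending sectors). The Python sketch below applies that split to a smartctl attribute table like the one Tom posted. It is only an illustration: which attributes count as "interface" versus "media" is this example's assumption, the regular expression assumes the smartctl 5.38 column layout (ID, name, flag, six columns, then RAW_VALUE), and raw values are vendor-specific, so a nonzero count is a reason to look closer, not a verdict.

```python
import re

# ID, name, flag, then VALUE WORST THRESH TYPE UPDATED WHEN_FAILED (6 columns), then RAW_VALUE.
ATTR_LINE = re.compile(r"^\s*\d+\s+(\S+)\s+0x[0-9a-fA-F]+(?:\s+\S+){6}\s+(\d+)")

# Assumed grouping, following the cable-versus-drive reasoning in the thread.
INTERFACE_ATTRS = ("UDMA_CRC_Error_Count",)
MEDIA_ATTRS = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def parse_raw_values(report):
    """Map attribute name -> leading integer of RAW_VALUE for each table row."""
    values = {}
    for line in report.splitlines():
        match = ATTR_LINE.match(line)
        if match:
            values[match.group(1)] = int(match.group(2))
    return values


def triage(values):
    """Return the nonzero interface and media counters, grouped separately."""
    return {
        "interface": {n: values[n] for n in INTERFACE_ATTRS if values.get(n, 0) > 0},
        "media": {n: values[n] for n in MEDIA_ATTRS if values.get(n, 0) > 0},
    }


# Rows copied from Tom's report.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0022   078   077   000    Old_age   Always       -       22 (Lifetime Min/Max 22/22)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   099   099   000    Old_age   Always       -       3
"""

if __name__ == "__main__":
    print(triage(parse_raw_values(SAMPLE)))
```

On Tom's rows this flags only the CRC counter and no media counters, which matches RobJ's "the drive itself looks fine" reading.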
oconnellc, posted March 29, 2009

Any ideas why the preclear script wouldn't run? I got a new WD 'Green' drive and tried to run preclear; after typing 'Yes', I got the preclear screen, but it was frozen and nothing happened. So I decided to just go ahead and add the drive to the array to see what would happen. unRAID added the drive, but I had to wait about 4 hours while unRAID cleared the disk. No problems there.

After that, I removed it from the array and tried to run preclear again. It ran just fine until the last step (reading the disk for the final time), where it froze 88% of the way through. So I stopped it and tried to run preclear again. This time it froze and wouldn't run. So I added it back to the array and waited another 4 hours while unRAID cleared it again. This surprised me, as I figured unRAID should see the disk as already cleared. After waiting the 4 hours, I tried to run preclear again. Again, no luck.

So I decided to run 'smartctl --test=long /dev/sdb', and now I have to wait 255 minutes. Any ideas what is going on?
Joe L., posted March 31, 2009

> Any ideas why the preclear script wouldn't run? [...] Any ideas what is going on?

If the preclear script is failing to complete, it indicates some issue with reading or writing the drive. I would first suspect the SATA cable; I'd replace it. You might see errors in the syslog corresponding to the times the freezes occur.

Another thing to check: make sure the voltage for the system memory is set properly in your BIOS. Many motherboards do not set it properly, and memory often needs very specific timings, so check your memory's specifications and the BIOS settings for it too. All kinds of strange errors will occur when system memory is unable to store the correct values.

Joe L.
oconnellc, posted March 31, 2009

OK, thanks for the tip. It is a new drive, so I went into the guts of the thing, unplugged the SATA cable and plugged it back in, making sure I pushed it firmly into both the drive and the mobo. Then, for grins, I decided to try again, and preclear is now running. I'm 15GB into reading a 1TB drive, so I should know more by morning. If I'm reading this right, either the cable is bad or I didn't have it plugged in 'well' the first time.

For my own edification, I would appreciate it if anyone could answer a question or two. If either of these is correct (bad cable or badly connected), why did everything other than preclear seem to work? The drive got exported and I could add it to the array. unRAID would clear it (something that took several hours and resulted in many, many writes to the drive). I was also able to complete most of a preclear cycle on the disk the first time around. Shouldn't that preclear cycle have failed at the same place (at the beginning, instead of several hours in)? Is this a function of quantity, not quality? Could I have read 1 byte from the drive every day forever, but crashed and burned as soon as I tried to read a 3GB video file?

Thanks again for your help and the cool utility.

Chris
oconnellc, posted April 8, 2009

So, I bought a new SATA cable, but just for grins decided to give the old one one last try by plugging it into a different port on the mobo. I set preclear up to run for 4 cycles, and it worked. Go figure. However, I got a message when the last cycle finished, and I don't know how to interpret it. Does this make sense to anyone?

===========================================================================
=                unRAID server Pre-Clear disk /dev/sdc
=                       cycle 4 of 4
= Disk Pre-Clear-Read completed                                       DONE
= Step 1 of 10 - Copying zeros to first 2048k bytes                   DONE
= Step 2 of 10 - Copying zeros to remainder of disk to clear it       DONE
= Step 3 of 10 - Disk is now cleared from MBR onward.                 DONE
= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4             DONE
= Step 5 of 10 - Clearing MBR code area                               DONE
= Step 6 of 10 - Setting MBR signature bytes                          DONE
= Step 7 of 10 - Setting partition 1 to precleared state              DONE
= Step 8 of 10 - Notifying kernel we changed the partitioning         DONE
= Step 9 of 10 - Creating the /dev/disk/by* entries                   DONE
= Step 10 of 10 - Testing if the clear has been successful.           DONE
= Disk Post-Clear-Read completed                                      DONE
Elapsed Time:  44:59:41
============================================================================
==
== Disk /dev/sdc has been successfully precleared
==
============================================================================
S.M.A.R.T. error count differences detected after pre-clear
note, some 'raw' values may change, but not be an indication of a problem
19,20c19,20
< Offline data collection status:  (0x82) Offline data collection activity
<                                         was completed without error.
---
> Offline data collection status:  (0x84) Offline data collection activity
>                                         was suspended by an interrupting command from host.
============================================================================

Thanks,
Chris
Joe L., posted April 8, 2009

> However, I got a message when the last cycle finished, and I don't know how to interpret it. Does this make sense to anyone?

Basically...
The process took 45 hours for 4 pre-read/clear/post-read cycles, during which you kept the disk *very* busy. It did not increase any error count between the SMART report taken before the first cycle and the one taken after the last. If you look in your syslog, you will find both SMART reports in their entirety. You will see that the power-on time and temperature changed between the two reports, but otherwise the reports are nearly identical.

Did you by chance have a "Long" or "Short" status test queued up when you started the preclear process? (As far as I know, the "Offline data collection activity" refers to those two activities.)

In any case, it looks like a nicely working disk. Your new SATA cable is working well.

Joe L.
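Joe's reading of the result boils down to comparing the report taken before the first cycle with the one taken after the last, while ignoring fields expected to drift, such as power-on time and temperature. A minimal Python sketch of that idea, limited to the attribute table, is below. It is not the script's own code (the output Chris posted is a line-by-line diff of the whole report); the parsing regex, the ignored-attribute set, and the before/after rows (made-up values) are assumptions for this example.

```python
import re

# ID, name, flag, six more columns, then the leading integer of RAW_VALUE.
ATTR_LINE = re.compile(r"^\s*\d+\s+(\S+)\s+0x[0-9a-fA-F]+(?:\s+\S+){6}\s+(\d+)")

# Attributes assumed to change during a long burn-in without indicating trouble.
EXPECTED_TO_DRIFT = {"Temperature_Celsius", "Airflow_Temperature_Cel", "Power_On_Hours"}


def raw_values(report):
    """Map attribute name -> leading integer of its RAW_VALUE column."""
    out = {}
    for line in report.splitlines():
        match = ATTR_LINE.match(line)
        if match:
            out[match.group(1)] = int(match.group(2))
    return out


def smart_changes(before_report, after_report, ignore=EXPECTED_TO_DRIFT):
    """Return {name: (before, after)} for every non-ignored attribute that changed."""
    before, after = raw_values(before_report), raw_values(after_report)
    changes = {}
    for name in sorted(set(before) | set(after)):
        if name in ignore:
            continue
        if before.get(name) != after.get(name):
            changes[name] = (before.get(name), after.get(name))
    return changes


# Made-up rows for illustration only.
BEFORE = """\
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   078   077   000    Old_age   Always       -       22
199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       1
"""
AFTER = """\
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       45
194 Temperature_Celsius     0x0022   070   069   000    Old_age   Always       -       30
199 UDMA_CRC_Error_Count    0x003e   099   099   000    Old_age   Always       -       3
"""

if __name__ == "__main__":
    print(smart_changes(BEFORE, AFTER))
```

With these sample rows only the CRC counter is reported, because the temperature and power-on changes are filtered out as expected drift.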
oconnellc, posted April 8, 2009

> Did you by chance have a "Long" or "Short" status test queued up when you started the preclear process? [...] In any case, it looks like a nicely working disk. Your new SATA cable is working well.

That is nice to hear. I did set up at least one SMART status report. I don't remember exactly when, but I'm guessing that could be what you are referring to.

Also, just to make sure (I'm anal and paranoid, a combination that bothers my wife to no end): I'm actually still using the old cable. I just pulled it out of one port on the mobo and plugged it into a different one. I'm wondering if the first port on the mobo could be bad?

Thanks again for your help. How is it that free support on a board like this is better than paid support for so many products?

Chris
bman, posted April 24, 2009

Maybe it's just because I'm not a Linux expert, but is there a way to easily set up a batch preclear.sh run on multiple drives? I know this is not normally needed, but since the script is useful for pre-screening drives for failure, it would be nice to run it on all the new drives I just received, to ensure a smooth server setup when I get back to work after the weekend. Just a thought.
Joe L., posted April 26, 2009

> Maybe it's just because I'm not a Linux expert, but is there a way to easily set up a batch preclear.sh run on multiple drives? [...]

There are several ways to do this:

1. Use multiple "telnet" sessions to log onto unRAID. Run one preclear_disk.sh script in each session. (This is what I usually do.)

2. Log into the system console using multiple "consoles" (Control-Alt-F1 through Control-Alt-F6 switch between the six available system consoles). Run one preclear_disk.sh per console, and switch between them as needed to review their progress.

3. Install and run "screen", a program designed to let you have as many virtual "screens" as you want and switch between them with a hot-key sequence. It is described in this post: http://lime-technology.com/forum/index.php?topic=2817.msg24825#msg24825

Once you start it by typing "screen", you can start a preclear_disk.sh, then type "Control-A c" to get a new virtual screen, start another preclear_disk.sh, type "Control-A c" again to get another virtual screen, start a third preclear_disk.sh, and so on. At any time you can type "Control-A n" or "Control-A p" to switch to the next or previous virtual screen to track their progress. Type "Control-A ?" to get a list of the commands available for managing the screen consoles.

A brief tutorial on how to use screen is here: http://www.rackaid.com/resources/linux-tutorials/general-tutorials/using-screen/

You can even detach from screen, which lets you close the telnet session and re-attach later. To detach, type "Control-A d". Then, at a later time, type "screen -r" to re-attach.
Another good article on "screen" can be found here: http://www.linuxjournal.com/article/6340

It can do a lot more. You can "name" the screen sessions, and list the sessions with Control-A " (Control-A followed by a double quote).

Joe L.
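Joe's third option can also be scripted. The Python sketch below builds (and, only if asked, runs) one `screen -d -m -S NAME ...` command per disk, which starts a detached, named screen session; those are standard screen options, but wrapping them this way is this example's own idea, not something from the thread. The script path ./preclear_disk.sh and the session names preclear_sdX are assumptions. Because preclear_disk.sh wipes the disk it is given and asks you to type "Yes" first, the sketch defaults to a dry run that only prints the commands; after a real launch you still re-attach to each session (for example `screen -r preclear_sdc`) to answer the prompt, then detach with Control-A d.

```python
import shlex
import subprocess


def preclear_commands(devices, script="./preclear_disk.sh"):
    """Build one detached, named screen session command per device.

    `script` is an assumed path; point it at wherever preclear_disk.sh lives.
    """
    commands = []
    for dev in devices:
        session = "preclear_" + dev.rsplit("/", 1)[-1]  # e.g. /dev/sdc -> preclear_sdc
        commands.append(["screen", "-d", "-m", "-S", session, script, dev])
    return commands


def launch(commands, dry_run=True):
    """Print each command; only execute it when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical device names: double-check them, the preclear wipes the disk.
    launch(preclear_commands(["/dev/sdc", "/dev/sdd"]))
```

Keeping the dry run as the default means a typo in a device name shows up on screen before anything touches a disk.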
bman, posted April 27, 2009

> There are several ways to do this: 1. Use multiple "telnet" sessions to log onto unRAID. Run one preclear_disk.sh script in each session. (This is what I usually do.)

Ahh, why didn't I think of that? Still a little overwhelmed, I guess. Thanks for the tips.

Byron
Rea1ity56, posted April 29, 2009

> 2. Log into the system console using multiple "consoles" (Control-Alt-F1 through Control-Alt-F6 switch between the six available system consoles). [...]
This script is great. I just received two 1.5TB Seagate drives. I ran 2 cycles on one drive in about 24 hours and am running one more for peace of mind. I didn't know about multiple consoles until reading this, lol, so now I have the second drive running 3 cycles. You guys have been great. This thread really cleared up some things I didn't understand about the SMART report.
JimmyJoe, posted May 3, 2009

Fantastic thread. Many thanks for the excellent script and great information. This has been very helpful for me and a great burn-in tool for my new unRAID server and drives! Thanks again.
BryantD, posted May 4, 2009

How do you stop the pre-clear? When I rebooted my system I forgot to re-install the SMART tools, so it won't have the beginning and end comparisons.
Joe L., posted May 4, 2009

> How do you stop the pre-clear? When I rebooted my system I forgot to re-install the SMART tools, so it won't have the beginning and end comparisons.

Type Control-C: hold the Control key down and press the letter "C".
BryantD, posted May 4, 2009

> Type Control-C: hold the Control key down and press the letter "C".

Thanks. One day I think I'll mess with my Go script...
BryantD, posted May 5, 2009

Well, I ran the preclear_disk.sh script on my new 1.5TB Seagate. It took 12:26.47 to complete successfully. Awesome program. None of the SMART changes were of the important variety, so I feel good about that. Thanks Joe L. (and all the other people who do so much here); this forum is second to none.
jbuszkie, posted May 14, 2009

I am running this on my new 1.5TB Maxtor Green. It did one full pass that seemed to work. On the second pass, it didn't finish. I looked in /tmp for the SMART logs, but they appear to have been deleted. Where should I look to see what happened? I remember seeing something about not being able to do something with the MBR. My PuTTY session got killed when I rebooted. I'm going to try again and see if it was just some sort of fluke.

My syslog file is 500MB! With a whole ton of these:

May 14 04:48:23 Tower kernel: end_request: I/O error, dev sdd, sector 2930137216
May 14 04:48:23 Tower kernel: sd 6:0:0:0: [sdd] Result: hostbyte=0x04 driverbyte=0x00

And I see this as well:

May 14 04:48:22 Tower kernel: sd 6:0:0:0: [sdd] Result: hostbyte=0x04 driverbyte=0x00
May 14 04:48:22 Tower kernel: end_request: I/O error, dev sdd, sector 2930043648
May 14 04:48:22 Tower kernel: __ratelimit: 78016 callbacks suppressed
May 14 04:48:22 Tower kernel: Buffer I/O error on device sdd, logical block 366255456
May 14 04:48:22 Tower kernel: lost page write due to I/O error on sdd
May 14 04:48:22 Tower kernel: Buffer I/O error on device sdd, logical block 366255457
May 14 04:48:22 Tower kernel: lost page write due to I/O error on sdd
May 14 04:48:22 Tower kernel: Buffer I/O error on device sdd, logical block 366255458
May 14 04:48:22 Tower kernel: lost page write due to I/O error on sdd
May 14 04:48:22 Tower kernel: Buffer I/O error on device sdd, logical block 366255459
May 14 04:48:22 Tower kernel: lost page write due to I/O error on sdd
May 14 04:48:22 Tower kernel: Buffer I/O error on device sdd, logical block 366255460
May 14 04:48:22 Tower kernel: lost page write due to I/O error on sdd
May 14 04:48:22 Tower kernel: Buffer I/O error on device sdd, logical block 366255461
May 14 04:48:22 Tower kernel: lost page write due to I/O error on sdd
May 14 04:48:22 Tower kernel: Buffer I/O error on device sdd, logical block 366255462
May 14 04:48:22 Tower kernel: lost page write due to I/O error on sdd
May 14 04:48:22 Tower kernel: Buffer I/O error on device sdd, logical block 366255463
May 14 04:48:22 Tower kernel: lost page write due to I/O error on sdd
May 14 04:48:22 Tower kernel: Buffer I/O error on device sdd, logical block 366255464
May 14 04:48:22 Tower kernel: lost page write due to I/O error on sdd
May 14 04:48:22 Tower kernel: Buffer I/O error on device sdd, logical block 366255465
May 14 04:48:22 Tower kernel: lost page write due to I/O error on sdd
May 14 04:48:22 Tower kernel: sd 6:0:0:0: [sdd] Result: hostbyte=0x04 driverbyte=0x00
May 14 04:48:22 Tower kernel: end_request: I/O error, dev sdd, sector 2930044672

Is there anything I should look for?

Jim
RobJ, posted May 14, 2009

These are just follow-ups to the original, real error. Locate the first error sequences involving sdd or sd 6:0:0:0. Also determine which drive sdd is: whether it is your new Maxtor Green, or a different drive that has decided to fail now.
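RobJ's advice amounts to: find the first error line for the device, and treat the flood that follows as repetition. A Python sketch of that triage is below. The keyword list, the defaults sdd and 6:0:0:0 (taken from Jim's log), and collapsing repeats by replacing every digit run with N are this example's own choices, not anything unRAID does. Because it iterates over lines lazily, it can be pointed at an open 500MB syslog without loading the whole file into memory.

```python
import re
from collections import Counter

# Assumed keywords that mark a kernel line as an error report.
ERROR_WORDS = re.compile(r"error|hostbyte|exception|timeout|reset", re.IGNORECASE)


def summarize_device_errors(lines, dev="sdd", scsi_addr="6:0:0:0"):
    """Return (first_error_line, Counter of collapsed error messages) for one device."""
    mentions = re.compile(r"\b(?:%s\d*|%s)\b" % (re.escape(dev), re.escape(scsi_addr)))
    first = None
    counts = Counter()
    for line in lines:
        if not (mentions.search(line) and ERROR_WORDS.search(line)):
            continue
        if first is None:
            first = line.rstrip("\n")
        # Drop the "May 14 04:48:22 Tower kernel:" prefix, then mask numbers so repeats collapse.
        message = line.split("kernel:", 1)[-1]
        counts[re.sub(r"\d+", "N", message).strip()] += 1
    return first, counts


if __name__ == "__main__":
    sample = [
        "May 14 04:48:22 Tower kernel: sd 6:0:0:0: [sdd] Result: hostbyte=0x04 driverbyte=0x00",
        "May 14 04:48:22 Tower kernel: end_request: I/O error, dev sdd, sector 2930043648",
        "May 14 04:48:22 Tower kernel: __ratelimit: 78016 callbacks suppressed",
        "May 14 04:48:23 Tower kernel: end_request: I/O error, dev sdd, sector 2930137216",
    ]
    first, counts = summarize_device_errors(sample)
    print(first)
    for message, n in counts.most_common():
        print(n, message)
```

On the server you would pass an open file instead of a list, for example `with open("/var/log/syslog") as f: summarize_device_errors(f)` (assuming the usual syslog location; check yours). The first line returned is where to start reading the surrounding context in the full log.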
Joe L. Posted May 14, 2009

Assuming that /dev/sdd is your new disk... it looks like it stopped responding. It might be a loose cable, either power or data... It is very easy for some SATA cables to come loose. If it is not loose, then odds are the disk died an early death.

Can you do a
hdparm -I /dev/sdd
or
smartctl -a -d ata /dev/sdd
and get anything back at all?

If the disk did die an early death... sorry, but the script did exactly as designed... it helped identify an early failure. Be happy it failed before you added it to your array... It takes a lot more time to replace it once it has data on it.

Joe L.
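Whether smartctl "gets anything back" can also be read from its exit status, which the smartctl(8) man page documents as a bitmask. A small Python sketch that decodes it; the message strings are paraphrased from the man page, and the helper names are hypothetical:

```python
# Exit-status bits of smartctl, paraphrased from the smartctl(8) man page.
SMARTCTL_EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed, or device did not return an IDENTIFY DEVICE structure",
    2: "a SMART or other ATA command to the disk failed, or a SMART checksum error",
    3: "SMART status check returned DISK FAILING",
    4: "prefail attributes found at or below threshold",
    5: "SMART status OK, but some attributes were at or below threshold in the past",
    6: "device error log contains records of errors",
    7: "device self-test log contains records of errors",
}


def decode_smartctl_status(code):
    """Return the meanings of every bit set in a smartctl exit status."""
    return [msg for bit, msg in sorted(SMARTCTL_EXIT_BITS.items()) if code & (1 << bit)]


def drive_answered(code):
    """True unless bit 0 or bit 1 is set, i.e. smartctl got data back from the drive."""
    return (code & 0b11) == 0
```

In use, pass the return code of `subprocess.run(["smartctl", "-a", "-d", "ata", "/dev/sdd"]).returncode` to `decode_smartctl_status`. A result containing the device-open message fits the "stopped responding" diagnosis; an empty list means smartctl reached the drive and none of the documented problem bits were set.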
jbuszkie Posted May 14, 2009

After the reboot it seems to be running happily. I'm at 98% of the pre-read...

OK... change that... I guess it isn't happy... I'm getting more of those errors during the zeroing. Here is a snippet of the log. The snippet starts close to the end of the pre-read and captures the start of the zeroing. I'm in the middle of a power cycle (remotely, so I may not get it back). I'll have to look to see if the very first pass of this test behaved well... I'll post the SMART results when the computer reboots.
jbuszkie Posted May 14, 2009

Here is my hdparm info and SMART test info. Interestingly, after the power cycle my disk 2 was "missing". I power cycled again and it came back? Now I just have to see if disk 2 is on the same controller as my new disk.

/dev/sdd:

ATA device, with non-removable media
    Model Number:       WDC WD15EADS-00H7B0
    Serial Number:      WD-WCAUP0018631
    Firmware Revision:  05.00K05
    Transport:          Serial, SATA 1.0a, SATA II Extensions, SATA Rev 2.5
Standards:
    Supported: 8 7 6 5
    Likely used: 8
Configuration:
    Logical         max     current
    cylinders       16383   16383
    heads           16      16
    sectors/track   63      63
    --
    CHS current addressable sectors:    16514064
    LBA    user addressable sectors:   268435455
    LBA48  user addressable sectors:  2930277168
    device size with M = 1024*1024:      1430799 MBytes
    device size with M = 1000*1000:      1500301 MBytes (1500 GB)
Capabilities:
    LBA, IORDY(can be disabled)
    Queue depth: 32
    Standby timer values: spec'd by Standard, with device specific minimum
    R/W multiple sector transfer: Max = 16  Current = 16
    Recommended acoustic management value: 128, current value: 254
    DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5 udma6
         Cycle time: min=120ns recommended=120ns
    PIO: pio0 pio1 pio2 pio3 pio4
         Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
    Enabled Supported:
       *    SMART feature set
            Security Mode feature set
       *    Power Management feature set
       *    Write cache
       *    Look-ahead
       *    Host Protected Area feature set
       *    WRITE_BUFFER command
       *    READ_BUFFER command
       *    NOP cmd
       *    DOWNLOAD_MICROCODE
            Power-Up In Standby feature set
       *    SET_FEATURES required to spinup after power up
            SET_MAX security extension
            Automatic Acoustic Management feature set
       *    48-bit Address feature set
       *    Device Configuration Overlay feature set
       *    Mandatory FLUSH_CACHE
       *    FLUSH_CACHE_EXT
       *    SMART error logging
       *    SMART self-test
       *    General Purpose Logging feature set
       *    64-bit World wide name
       *    {READ,WRITE}_DMA_EXT_GPL commands
       *    Segmented DOWNLOAD_MICROCODE
       *    SATA-I signaling speed (1.5Gb/s)
       *    SATA-II signaling speed (3.0Gb/s)
       *    Native Command Queueing (NCQ)
       *    Host-initiated interface power management
       *    Phy event counters
            DMA Setup Auto-Activate optimization
       *    Software settings preservation
       *    SMART Command Transport (SCT) feature set
       *    SCT Long Sector Access (AC1)
       *    SCT LBA Segment Access (AC2)
       *    SCT Error Recovery Control (AC3)
       *    SCT Features Control (AC4)
       *    SCT Data Tables (AC5)
            unknown 206[12] (vendor specific)
            unknown 206[13] (vendor specific)
Security:
    Master password revision code = 65534
        supported
    not enabled
    not locked
    not frozen
    not expired: security count
        supported: enhanced erase
    412min for SECURITY ERASE UNIT. 412min for ENHANCED SECURITY ERASE UNIT.
Logical Unit WWN Device Identifier: 50014ee2ad0035dd
    NAA         : 5
    IEEE OUI    : 14ee
    Unique ID   : 2ad0035dd
Checksum: correct

root@Tower:~# smartctl -a -d ata /dev/sdd
smartctl version 5.38 [i486-slackware-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD15EADS-00H7B0
Serial Number:    WD-WCAUP0018631
Firmware Version: 05.00K05
User Capacity:    1,500,301,910,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Thu May 14 13:33:22 2009 GMT+5
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (40500) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303f) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   139   139   051    Pre-fail  Always   -           14844
  3 Spin_Up_Time            0x0027   100   253   021    Pre-fail  Always   -           0
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always   -           9
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always   -           0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always   -           0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always   -           0
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always   -           0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always   -           0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always   -           7
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always   -           1
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always   -           9
194 Temperature_Celsius     0x0022   127   121   000    Old_age   Always   -           25
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always   -           0
197 Current_Pending_Sector  0x0032   195   195   000    Old_age   Always   -           1311
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline  -           0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always   -           0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline  -           0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Now, do I have to be concerned about the "Current_Pending_Sector" number? It seems like that should be 0 for a new, good drive. Could a bad controller have any effect on that number?
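Current_Pending_Sector counts sectors the drive could not read reliably and is holding for reallocation; the count drops when such a sector is later written successfully or remapped (a remap then shows up in Reallocated_Sector_Ct). A minimal Python sketch that pulls raw values out of a `smartctl -a` attribute table and reports the media-related counters that are nonzero. Which attributes to watch is an assumption (a common convention), not something stated in this thread, and the function names are hypothetical:

```python
# Hypothetical watch list: media-health counters that are normally 0 on a new drive.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def parse_attributes(text):
    """Return {ATTRIBUTE_NAME: raw value} from the attribute rows of `smartctl -a` output."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows: numeric ID, name, hex FLAG, ..., RAW_VALUE as the 10th column.
        if len(fields) >= 10 and fields[0].isdigit() and fields[2].startswith("0x"):
            # RAW_VALUE may carry trailing text such as "(Min/Max 20/45)";
            # only its first token is used, and only if it is a plain integer.
            raw = fields[9]
            if raw.isdigit():
                attrs[fields[1]] = int(raw)
    return attrs


def nonzero_watched(attrs, watched=WATCHED):
    """Return the watched attributes whose raw value is above zero."""
    return {name: attrs[name] for name in watched if attrs.get(name, 0) > 0}
```

For the output posted above, only Current_Pending_Sector (raw value 1311) would be reported; Reallocated_Sector_Ct and Offline_Uncorrectable are both 0, as is UDMA_CRC_Error_Count.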