January 10, 2013

Hi there, I started preclearing my 3TB drive yesterday evening. This morning the script started to write the zeros. Half an hour ago, I noticed strange sounds from the server: the HDD spins up, then makes a strange sound (like the head being parked or something), then spins up again... every 3-5 seconds. I looked at the syslog, and there were thousands of messages like these:

Jan 10 19:49:58 Tower kernel: lost page write due to I/O error on sdb
Jan 10 19:49:58 Tower kernel: Buffer I/O error on device sdb, logical block 505131004

I have attached the full syslog so you can see what happened before. In unRAID under MyMain, the device is now listed as sdc (it was sdb before) and as RAW, and nothing is going on where it used to say "Zeroing 65%...". I can still see the script working via telnet, but it only copies 2 MB of zeros per refresh (every 5 seconds), interrupted by periods where it writes nothing... (I had 60 MB/s before.)

Could there be a problem with the power cable? Can I remove it and plug in another one, and will the script continue from the point where it left off?

SYSLOG: http://pastebin.com/aE2KKyM4
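For anyone wanting to see how many of these messages each device produced, here is a minimal sketch in Python. It assumes the syslog lines look exactly like the two quoted above; the function name `count_io_errors` and the sample list are made up for illustration and are not part of unRAID or the preclear script.

```python
import re
from collections import Counter

# Matches the two kernel messages quoted in the post and captures the device name.
IO_ERROR = re.compile(
    r"kernel: (?:lost page write due to I/O error on|Buffer I/O error on device) (\w+)"
)

def count_io_errors(lines):
    """Count kernel I/O error messages per device (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# The two sample lines from the post; a real run would read the full syslog file.
sample = [
    "Jan 10 19:49:58 Tower kernel: lost page write due to I/O error on sdb",
    "Jan 10 19:49:58 Tower kernel: Buffer I/O error on device sdb, logical block 505131004",
]
print(count_io_errors(sample))
```

A count that is concentrated on a single device, as here, points at that drive or its cabling rather than at the controller as a whole, although that is only a rough heuristic.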
January 10, 2013

Not likely to be a power cable, unless the connection was intermittent. The CRC errors are more likely due to noise pickup, a poorly seated SATA cable, or a poorly shielded one (or cables tie-wrapped together to look neat, which MAXIMIZES noise pickup from one to another and MAXIMIZES possible CRC errors).

It could easily be a power supply unable to keep up with the number of disks attached, if that results in noisy power supply voltages.

The preclear will not remember where it left off. It will start from the beginning.

It could easily be that the disk died an early death, if the power supply is not the issue. The only way to know more is to get a SMART report.

Joe L.
January 10, 2013 Author

smartctl version 5.38 [i486-slackware-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     ST3000DM001-1CH166
Serial Number:    Z1F1LF77
Firmware Version: CC24
User Capacity:    3,000,592,982,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Thu Jan 10 22:17:04 2013 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity was never started. Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: ( 89) seconds.
Offline data collection capabilities: (0x73) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. No Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported. General Purpose Logging supported.
Short self-test routine recommended polling time: ( 1) minutes.
Extended self-test routine recommended polling time: ( 255) minutes.
Conveyance self-test routine recommended polling time: ( 2) minutes.
SCT capabilities: (0x3085) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   117   100   006    Pre-fail  Always       -       162710504
  3 Spin_Up_Time            0x0003   098   095   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       217
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   100   253   030    Pre-fail  Always       -       106384
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       26
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       21
183 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
184 Unknown_Attribute       0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   068   062   045    Old_age   Always       -       32 (Lifetime Min/Max 32/36)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       211
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       257
194 Temperature_Celsius     0x0022   032   040   000    Old_age   Always       -       32 (0 16 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       3702261809171
241 Unknown_Attribute       0x0000   100   253   000    Old_age   Offline      -       4037965720
242 Unknown_Attribute       0x0000   100   253   000    Old_age   Offline      -       6010719253

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Clicked on "sm" link in MyMain. This looks okay to me.
January 10, 2013

The power-off retract count in the SMART report indicates a power issue, unless you've power cycled the disk 211 times in the past 20 hours it has been running.
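The comparison being made here can be sketched in a few lines of Python. This is only an illustration under assumptions: it expects attribute rows in the ten-column layout smartctl printed above, the helper names `parse_raw_values` and `excess_power_off_retracts` are invented for this example, and treating the difference between the two counters as "unexpected power losses" is a rule of thumb, not something the drive vendor documents.

```python
def parse_raw_values(report_lines):
    """Map attribute name -> raw value for rows of a smartctl attribute table.

    Assumes the ten-column layout shown in the report above:
    ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    values = {}
    for line in report_lines:
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least ten columns.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            values[fields[1]] = int(fields[9])
    return values

def excess_power_off_retracts(values):
    """Power-off retracts beyond the number of normal power cycles (heuristic)."""
    return values.get("Power-Off_Retract_Count", 0) - values.get("Power_Cycle_Count", 0)

# Two rows copied from the SMART report in this thread.
rows = [
    " 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       21",
    "192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       211",
]
print(excess_power_off_retracts(parse_raw_values(rows)))
```

With the values from this report, the retract count (211) exceeds the power cycle count (21) by 190, which is the mismatch that suggests the drive keeps losing power while the system stays up.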
January 10, 2013 Author

No, I have definitely not power cycled the disk 211 times. I put it in and started preclearing, maybe after one reboot. So you're saying that the disk loses power and gains it back; that was the sound I was hearing, I suppose... But why is the preclear script continuing? You said it stops if the power is disconnected.
January 10, 2013

Modern power supplies can have over-current protection on all of their voltage buses. When the over-current condition is detected, the bus is basically turned off. However, it will come back up within a very short period of time. (Think of a circuit breaker in your power panel. You are standing next to the panel, and every time a breaker trips out, you reset it.)

The 12 volt buses are often an issue with unRAID, as many people have not selected power supplies with a single 12 volt rail. Most power supplies have multiple 12 volt rails (it is cheaper to build them that way). While they often advertise the 12 volt supply as 30A, it is often split into two or more 'rails'. As an example, a (hypothetical) 30A power supply might actually have two 12 volt rails of 15 amperes each. If the load on one rail reaches 15.01 amperes, the over-current protection will instantly kick in and shut that rail down. This will be true even if the other rail has absolutely no load on it! (Oh, and the other voltage rails are completely unaffected by what is happening on that 12 volt rail. They continue to operate normally.)
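The per-rail behaviour described above can be illustrated with a small Python sketch. The 15 A limit and the 15.01 A load come from the hypothetical example in the post; the function name `tripped_rails` and the way the load is split across devices are assumptions made purely for illustration, not measurements from any real power supply.

```python
def tripped_rails(rail_limit_amps, loads_by_rail):
    """Return the indexes of rails whose total load exceeds the per-rail limit.

    Each rail is checked on its own: spare capacity on one rail does not
    help another rail that is over its limit.
    """
    return [
        index
        for index, loads in enumerate(loads_by_rail)
        if sum(loads) > rail_limit_amps
    ]

# Hypothetical "30A" supply built as two 15 A rails.
# Rail 0 carries 15.01 A in total, rail 1 carries nothing.
print(tripped_rails(15.0, [[10.0, 5.01], []]))

# The same total load spread evenly stays under the limit on both rails.
print(tripped_rails(15.0, [[7.5], [7.505]]))
```

The first call reports rail 0 as tripped even though the supply as a whole is only at about half of its advertised 30 A, which is the point of the example: what matters is the load on each rail, not the combined rating.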
January 10, 2013 Author

I tend to believe that this is not the problem. I have 3 disks installed, two IDE and one SATA, and no DVD drives or anything like that...
January 10, 2013

The script has no way to know if the disk has lost power, other than that the read/write commands to it fail. It will not just "stop" on its own, but if you stop it now (Control-C), it will start back at the beginning of the process when you re-invoke it. (Since it is failing, you might as well abort it.)

Joe L.
January 10, 2013

With only three drives on the 12V bus, I agree with you. However, I would now check the power connector (or change to another one) that is connected to the Seagate drive that you are clearing. Something is causing those 'power offs'... And a bad connection or bad crimp would be the next place I would be looking.

One final note: it might be the power supply that is going bad. It sounds like this is an older computer, and these things don't last forever.
January 10, 2013 Author

The strange thing is, it was still writing zeros to the disk, even after the power losses. Slowly, but it was. Never mind, I stopped it, changed the cable, and will start again soon. I hope it works. Maybe I should really invest in a new power supply; I just saw that the current one only has 250W, which is a bit too weak if I want to run 4 or more drives, I suppose?

Thank you guys again for your continuous, professional and especially warm support! You truly are gold for this community! :-)