Joe L. Posted July 6, 2011

"the unmenu shows not successful, here is the report and screenshot"

That is the final SMART report. It still shows no error. Where is the preclear report itself? (I have no idea what myMain does, so I can offer no help with the screenshot.)

What version of preclear_disk.sh did you use? To find out, type:
preclear_disk.sh -v

So far, you've shown me nothing that says the disk failed the pre-clear. A SMART report does not indicate anything about the pre-clear process. The preclear_reports folder should have had three files: the initial SMART report, the final SMART report, and the preclear results. Post the contents of the results file, not the SMART reports.

Joe L.
SSD Posted July 6, 2011

myMain echoes the status returned from preclear. My guess is there was a post-read verification error, but I will have to look at the actual preclear report to be sure.
MrLondon Posted July 6, 2011

I have now put all 3 files into the zip file: preclear_finish__S2HGJ1AZ801624_2011-07-05.zip
Joe L. Posted July 6, 2011

The report says that the post-read detected the disk did not have all zeros when read. This could be caused by almost anything: bad memory, a bad disk, a bad disk controller, a bad motherboard chipset (early Nforce had this), or a bad power supply.

These types of errors are exactly why the test for zeros was added to the preclear post-read. They cause hair loss (because you will pull your hair out trying to find elusive parity errors if the drive is added to the array).

About the only hardware you can eliminate is the "mouse". It is unlikely to be the cause of the errors.

I see another preclear_disk.sh run in your future.

Joe L.
MrLondon Posted July 6, 2011

I cleared 4 drives at the same time and 3 completed successfully. All 4 drives are connected to the same 8-port controller (AOC-SASLP-MV8), so it's unlikely to be the controller. The same goes for the power supply and chipset. This is an i3 processor with an Intel chipset. I will run a single preclear on the drive again; let's see if that completes.
Joe L. Posted July 6, 2011

I'm happy it might be as simple as that. As I said, in the past similar drives that "randomly" returned inconsistent values (other than zeros) when subsequently read would drive their array owners insane. Once the disk is in the array, the only symptom would be random parity errors when parity is checked, and there would be absolutely no way, other than by process of elimination, to figure out which hardware was faulty.

Joe L.
SSD Posted July 6, 2011

As I had thought, the post-read verify failed. What preclear does is first read the entire disk, then zero the entire disk, and then read the entire disk again, verifying that it is full of zeros. What happened is that the last step found some location(s) where the data was not zero. It could literally have been 1 bit in 20 trillion; that's all it takes.

As Joe L. says, there are numerous things that can cause this. I'd recommend running an overnight memory test as a starting point. If you have bad or misconfigured memory, it can cause problems during the write or the post-read phase. It could also be a bad or loose data cable or some incompatibility. These can be hard to find, but not always. Do the memory test, and based on the results we can suggest additional tests to try to narrow it down.
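For anyone curious what the final "verify it is full of zeros" step amounts to, here is a minimal sketch. It is not taken from preclear_disk.sh; the function name is_all_zeros is made up for illustration, and it assumes GNU cmp (for the -n byte limit) and a regular file whose size wc -c can report. A real block device would need its size from a tool such as blockdev instead.

```shell
#!/bin/sh
# Hypothetical helper (NOT from preclear_disk.sh): succeed only if every
# byte of the given regular file is zero.
is_all_zeros() {
    target=$1
    [ -r "$target" ] || return 2
    # Size in bytes, with any padding spaces from wc removed.
    size=$(wc -c < "$target" | tr -d ' ')
    # Compare exactly $size bytes against the endless stream of zeros.
    # Assumes GNU cmp, which accepts -n LIMIT.
    cmp -s -n "$size" "$target" /dev/zero
}
```

A single non-zero byte anywhere in the file makes the function return non-zero, which is the same "all it takes is one bit" behaviour described above.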
hurricanehrndz Posted July 6, 2011

I don't understand why preclear is reporting a smartctl problem. When I run smartctl manually it seems to work fine.

root@192:/boot# preclear_disk.sh /dev/sdc
Pre-Clear unRAID Disk /dev/sdc
################################################################## 1.11
smartctl may not be able to run on /dev/sdc with the -d ata option.
however this should not affect the clearing of a disk.
smartctl exit status = 4
smartctl 5.39.1 2010-01-28 r3054 [i486-slackware-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Seagate DB35.3 Series
Device Model:     ST3160215SCE
Serial Number:    5RX1MFTP
Firmware Version: 3.ACF
User Capacity:    160,041,885,696 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Wed Jun 29 13:55:16 2011 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

Error SMART Status command failed
Please get assistance from http://smartmontools.sourceforge.net/
Register values returned from SMART Status command are:
ST =0x40
ERR=0x00
NS =0x00
SC =0xa0
CL =0x9e
CH =0xa1
SEL=0x40
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.$
Do you wish to continue?
(Answer Yes to continue. Capital 'Y', lower case 'es'):

root@192:/boot# smartctl -d ata -a /dev/sdc
smartctl 5.39.1 2010-01-28 r3054 [i486-slackware-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Seagate DB35.3 Series
Device Model:     ST3160215SCE
Serial Number:    5RX1MFTP
Firmware Version: 3.ACF
User Capacity:    160,041,885,696 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Wed Jun 29 13:55:48 2011 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status: (0x82) Offline data collection activity was completed without error. Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: (15556) seconds.
Offline data collection capabilities: (0x5b)
  SMART execute Offline immediate.
  Auto Offline data collection on/off support.
  Suspend Offline collection upon new command.
  Offline surface scan supported.
  Self-test supported.
  No Conveyance Self-test supported.
  Selective Self-test supported.
SMART capabilities: (0x0003)
  Saves SMART data before entering power-saving mode.
  Supports SMART auto save timer.
Error logging capability: (0x01)
  Error logging supported.
  General Purpose Logging supported.
Short self-test routine recommended polling time: ( 1) minutes.
Extended self-test routine recommended polling time: ( 54) minutes.
SCT capabilities: (0x0031)
  SCT Status supported.
  SCT Feature Control supported.
  SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   115   072   006    Pre-fail  Always       -       92652310
  3 Spin_Up_Time            0x0003   098   097   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       79
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   087   060   030    Pre-fail  Always       -       498343952
  9 Power_On_Hours          0x0032   086   086   000    Old_age   Always       -       12374
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       79
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   064   044   045    Old_age   Always   In_the_past 36 (Lifetime Min/Max 35/36)
194 Temperature_Celsius     0x0022   036   056   000    Old_age   Always       -       36 (0 11 0 0)
195 Hardware_ECC_Recovered  0x001a   115   064   000    Old_age   Always       -       92652310
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 Data_Address_Mark_Errs  0x0032   100   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

root@192:/boot#

Why am I seeing this problem?

"When you run the command, are you checking its exit status? The preclear script is, and your smartctl is exiting with a non-zero exit status. Try the 1.12 version of the preclear_disk.sh script attached to this post: http://lime-technology.com/forum/index.php?topic=4068.msg128289#msg128289 It is version 1.12. I had originally posted it there to allow those with 3TB drives to give it a try, but it should also fix your issue (as long as your disk will report with just smartctl -a /dev/sdX)."

I would really appreciate some help because I am still receiving the error with version 1.12. The disk being tested is now connected to a SASLP card, yet I receive normal output when I run the command smartctl -a /dev/sdb. Furthermore, I wrote a small bash script that looks as follows: http://pastebin.com/42nSRWXZ It exits with a code of 0. Yet when I modified your preclear script and inserted an echo line between lines 1456 and 1457 that said "echo $smartstat", I got an exit code of 4. Can you please advise me as to what exactly is occurring and how I may resolve the issue?
Joe L. Posted July 6, 2011

Have you tried the "-D" option to preclear_disk.sh?
hurricanehrndz Posted July 6, 2011

Yes. Same result: exit code 4, when it should be zero.
Joe L. Posted July 6, 2011

I've no idea. The best I can offer is to suggest you invoke it as
sh -xv preclear_disk.sh /dev/sdX
and see what it is doing. I certainly cannot fix your exit status if your smart command is exiting abnormally.

Joe L.
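As background on the "exit code 4" being discussed: the smartctl man page documents its exit status as a bit mask, and bit 2 (value 4) means that some SMART or other ATA command to the disk failed, or that a SMART data structure had a checksum error. That is consistent with the "SMART Status command failed" message in the output posted earlier. The sketch below decodes such a status; the function name decode_smartctl_status is made up for illustration, and the bit descriptions are paraphrased from the man page rather than quoted.

```shell
#!/bin/sh
# Decode a smartctl exit status (a bit mask) into readable lines.
# Hypothetical helper; meanings paraphrased from smartctl(8).
decode_smartctl_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no error bits set"
        return 0
    fi
    bit=0
    for meaning in \
        "command line did not parse" \
        "device open failed, or device did not return an IDENTIFY DEVICE structure" \
        "a SMART or other ATA command failed, or a SMART data structure had a checksum error" \
        "SMART status check returned DISK FAILING" \
        "prefail attributes found at or below threshold" \
        "attributes were at or below threshold at some time in the past" \
        "device error log contains records of errors" \
        "device self-test log contains records of errors"
    do
        # Test each bit of the status in turn.
        if [ $(( status & (1 << bit) )) -ne 0 ]; then
            echo "bit $bit: $meaning"
        fi
        bit=$(( bit + 1 ))
    done
}
```

Calling decode_smartctl_status 4 prints only the bit 2 line, so a status of 4 does not by itself mean the disk is failing, only that one command did not complete as expected.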
hurricanehrndz Posted July 7, 2011

Thanks, Joe L. I just thought I'd ask, since it only occurs within the script, not from the command line or from within my test script, which is practically identical to the offending lines in the pre-clear script. Anyhow, I'll do some more tests and run the command as you suggested.

Edit: Did more tests. I added a 5-second sleep at line 1450 and that seems to resolve the issue.
Joe L. Posted July 7, 2011

Yes, but it sounds like the logic is now looking at the exit status of the added "sleep" command, and it (sleep) is always successful at sleeping. Happy it works for you, but it does not solve the issue. It will not affect the preclear regardless, as it is just used to get the disk temperature.
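The concern here is a general shell pitfall: $? always holds the status of the most recently executed command, so any command placed between the one of interest and the line that reads $? replaces the value. The sketch below illustrates that with a stand-in function; fake_smartctl and the two capture functions are invented for this example and are not the code in preclear_disk.sh. Only the variable name smartstat is borrowed from the thread.

```shell
#!/bin/sh
# Stand-in for a command that exits with status 4 (hypothetical).
fake_smartctl() { return 4; }

# Status saved on the very next line: the sleep afterwards cannot change it.
capture_immediately() {
    fake_smartctl
    smartstat=$?
    sleep 0
    echo "$smartstat"
}

# Another command runs first, so $? now reports the status of sleep.
capture_too_late() {
    fake_smartctl
    sleep 0
    smartstat=$?
    echo "$smartstat"
}
```

capture_immediately prints 4 and capture_too_late prints 0. A sleep placed before the command being checked, rather than between the command and the read of $?, has no such effect on the captured value.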
hurricanehrndz Posted July 7, 2011

I don't believe so, since sleep is run before line 1451, which is
echo $clearscreen$goto_top${bold}Pre-Clear unRAID Disk $theDisk$norm
So the exit code being stored in $smartstat should be that of smartctl. If this is not the case, please forgive my statement.

On another note, I noticed that when I did run pre-clear all the reports looked valid, so it is only this line that causes issues for people and gives them a confusing error that may not be an error at all, as in my particular case.

PS: Here is the pre-clear report from this disk using v1.11, from before I thought of inserting the 5-second delay: http://pastebin.com/JJBE52gp Pre-clear issued the error at the start of preclear ("smartctl may not be able to run on..." and so on), yet as you can see the report clearly shows that smartctl was able to run.

Thanks again, Joe L., for putting up with my pestering. I love your script, though, and like to contribute whenever possible.
Joe L. Posted July 7, 2011

All I can say is I'm happy it helped, but adding the "sleep 5" where you did should have had no effect on the subsequent smartctl exit status, yet you say it does. The only way that could happen is if smartctl is not properly initializing one of the variables it uses internally and is somehow inheriting non-zero memory contents from a prior command.

Perhaps it is some side effect of running the hdparm command on the disk, just a few lines above, to get its size, and the disk took some time to recover from that operation. I really don't know.

It certainly does no harm to add the sleep where you did. Initially, I thought you were adding it between the invocation of smartctl and the evaluation of its exit status.

Joe L.
jamesbt Posted July 9, 2011

Hi. I am a noob to unRAID and pre-clear. I mainly wanted to use unRAID and pre-clear to clear the drive and verify that it's "good to go" for other uses. So I run unRAID from a USB stick and do one drive at a time on a pretty decent machine with 5 gigs of RAM (memtested and passed). I ran pre-clear from the console, without remoting in.

I did one 2TB drive and it all worked well. It took a little over 24 hours, which seemed normal, and it seemed easy to do.

Now I am trying to pre-clear a different drive, a 2TB WD20EADS (32MB cache), and the Pre-Read and Post-Read steps take a very, very long time. The first time I attempted it, it took a few days and seemed to freeze on Post-Read, so I rebooted and tried again. Now it has been about 18 hours and Pre-Read is only 1% done. The bytes read and elapsed time don't update often.

So, should I forget about this drive then? Is it fubared now?
Joe L. Posted July 9, 2011

We don't know; you did not attach a system log.
jamesbt Posted July 9, 2011

I didn't know how to get to it at first, but I figured it out and restarted pre-clear. Here is a log after a few minutes of the pre-read step: syslog02.txt
Joe L. Posted July 9, 2011

You probably already saw this, but there are tons of unreadable sectors on that disk. Get a SMART report to see the full statistics:
smartctl -a /dev/sda
Look for sectors pending re-allocation.

Jan 2 16:33:13 Tower kernel: res 51/40:00:f0:c1:9a/40:00:ad:00:00/e0 Emask 0x9 (media error)
Jan 2 16:33:13 Tower kernel: ata3.00: status: { DRDY ERR }
Jan 2 16:33:13 Tower kernel: ata3.00: error: { UNC }
Jan 2 16:33:18 Tower kernel: ata3.00: configured for UDMA/133
Jan 2 16:33:18 Tower kernel: ata3: EH complete
Jan 2 16:33:20 Tower kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jan 2 16:33:20 Tower kernel: ata3.00: BMDMA stat 0x4
Jan 2 16:33:20 Tower kernel: ata3.00: failed command: READ DMA EXT
Jan 2 16:33:20 Tower kernel: ata3.00: cmd 25/00:08:f0:c1:9a/00:00:ad:00:00/e0 tag 0 dma 4096 in
Jan 2 16:33:20 Tower kernel: res 51/40:00:f0:c1:9a/40:00:ad:00:00/e0 Emask 0x9 (media error)
Jan 2 16:33:20 Tower kernel: ata3.00: status: { DRDY ERR }
Jan 2 16:33:20 Tower kernel: ata3.00: error: { UNC }
Jan 2 16:33:23 Tower kernel: ata3.00: configured for UDMA/133
Jan 2 16:33:23 Tower kernel: sd 4:0:0:0: [sda] Unhandled sense code
Jan 2 16:33:23 Tower kernel: sd 4:0:0:0: [sda] Result: hostbyte=0x00 driverbyte=0x08
Jan 2 16:33:23 Tower kernel: sd 4:0:0:0: [sda] Sense Key : 0x3 [current] [descriptor]
Jan 2 16:33:23 Tower kernel: Descriptor sense data with sense descriptors (in hex):
Jan 2 16:33:23 Tower kernel: 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Jan 2 16:33:23 Tower kernel: ad 9a c1 f0
Jan 2 16:33:23 Tower kernel: sd 4:0:0:0: [sda] ASC=0x11 ASCQ=0x4
Jan 2 16:33:23 Tower kernel: sd 4:0:0:0: [sda] CDB: cdb[0]=0x28: 28 00 ad 9a c1 f0 00 00 08 00
Jan 2 16:33:23 Tower kernel: end_request: I/O error, dev sda, sector 2912600560
Jan 2 16:33:23 Tower kernel: Buffer I/O error on device sda, logical block 364075070
Jan 2 16:33:23 Tower kernel: ata3: EH complete
Jan 2 16:33:36 Tower kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jan 2 16:33:36 Tower kernel: ata3.00: BMDMA stat 0x5
Jan 2 16:33:36 Tower kernel: ata3.00: failed command: READ DMA
Jan 2 16:33:36 Tower kernel: ata3.00: cmd c8/00:00:c8:56:41/00:00:00:00:00/e0 tag 0 dma 131072 in
Jan 2 16:33:36 Tower kernel: res 51/40:4f:77:57:41/40:00:ad:00:00/e0 Emask 0x9 (media error)
Jan 2 16:33:36 Tower kernel: ata3.00: status: { DRDY ERR }
Jan 2 16:33:36 Tower kernel: ata3.00: error: { UNC }
Jan 2 16:33:40 Tower kernel: ata3.00: configured for UDMA/133
Jan 2 16:33:40 Tower kernel: ata3: EH complete
Jan 2 16:33:43 Tower kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jan 2 16:33:43 Tower kernel: ata3.00: BMDMA stat 0x5
Jan 2 16:33:43 Tower kernel: ata3.00: failed command: READ DMA
Jan 2 16:33:43 Tower kernel: ata3.00: cmd c8/00:00:c8:56:41/00:00:00:00:00/e0 tag 0 dma 131072 in
Jan 2 16:33:43 Tower kernel: res 51/40:4f:77:57:41/40:00:ad:00:00/e0 Emask 0x9 (media error)
Jan 2 16:33:43 Tower kernel: ata3.00: status: { DRDY ERR }
Jan 2 16:33:43 Tower kernel: ata3.00: error: { UNC }
Jan 2 16:33:47 Tower kernel: ata3.00: configured for UDMA/133
Jan 2 16:33:47 Tower kernel: ata3: EH complete
Jan 2 16:33:49 Tower kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jan 2 16:33:49 Tower kernel: ata3.00: BMDMA stat 0x5
Jan 2 16:33:49 Tower kernel: ata3.00: failed command: READ DMA
Jan 2 16:33:49 Tower kernel: ata3.00: cmd c8/00:00:c8:56:41/00:00:00:00:00/e0 tag 0 dma 131072 in
Jan 2 16:33:49 Tower kernel: res 51/40:4f:77:57:41/40:00:ad:00:00/e0 Emask 0x9 (media error)
Jan 2 16:33:49 Tower kernel: ata3.00: status: { DRDY ERR }
Jan 2 16:33:49 Tower kernel: ata3.00: error: { UNC }
Jan 2 16:33:52 Tower kernel: ata3.00: configured for UDMA/133
Jan 2 16:33:52 Tower kernel: ata3: EH complete
Jan 2 16:33:55 Tower kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jan 2 16:33:55 Tower kernel: ata3.00: BMDMA stat 0x5
Jan 2 16:33:55 Tower kernel: ata3.00: failed command: READ DMA
Jan 2 16:33:55 Tower kernel: ata3.00: cmd c8/00:00:c8:56:41/00:00:00:00:00/e0 tag 0 dma 131072 in
Jan 2 16:33:55 Tower kernel: res 51/40:4f:77:57:41/40:00:ad:00:00/e0 Emask 0x9 (media error)
Jan 2 16:33:55 Tower kernel: ata3.00: status: { DRDY ERR }
Jan 2 16:33:55 Tower kernel: ata3.00: error: { UNC }
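A syslog like the one excerpted above can be summarised rather than read line by line. The following is a rough sketch only: the two function names are invented here, and the patterns are based solely on the message formats visible in this excerpt ("end_request: I/O error, dev ..., sector ..." and "Emask 0x9 (media error)"), so other kernel versions may word these lines differently.

```shell
#!/bin/sh
# Both helpers read syslog text on stdin. Names are hypothetical.

# Print "device sector count" for each distinct sector the kernel
# reported as an I/O error.
failing_sectors() {
    sed -n 's/.*end_request: I\/O error, dev \([a-z]*\), sector \([0-9]*\).*/\1 \2/p' |
        sort | uniq -c | awk '{ print $2, $3, $1 }'
}

# Count result lines flagged as media errors.
count_media_errors() {
    grep -c 'Emask 0x9 (media error)'
}
```

Assuming the attached file is saved locally, usage would be along the lines of failing_sectors < syslog02.txt. Repeated retries of the same sector show up as a higher count on one line instead of pages of near-identical messages.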
jamesbt Posted July 10, 2011

Thanks for the help. To be honest, I'm not sure how to read the SMART report or what it means. It is attached here: smartlog.txt
heffneil Posted July 10, 2011

I am running preclear and I used screen. I reconnected and it looks frozen. The problem is that I don't know if it truly is locked up or if this is a problem with the interface. When I run ps -al I see the following:

0 S 0  3150  2665  0 80 0 -  880 wait   pts/1 00:00:26 preclear_disk.s
1 S 0 24628  3150  0 80 0 -  880 pipe_w pts/1 00:00:00 preclear_disk.s
1 S 0 24634 24628  0 80 0 -  880 wait   pts/1 00:00:00 preclear_disk.s
0 D 0 24635 24634 50 80 0 - 2457 -      pts/1 03:37:41 dd
0 R 0 24636 24634 24 80 0 -  434 pipe_w pts/1 01:44:07 sed
0 S 0 24637 24634  0 80 0 -  524 pipe_w pts/1 00:00:00 awk
4 S 0 24668 24654  0 80 0 -  627 pause  pts/0 00:00:00 screen
4 R 0 24691 24670  0 80 0 -  519 -      pts/2 00:00:00 ps

I'm not sure if I screwed this up or if this drive is suspect. It was lying around, and I thought it had been a problem in previous installs, so I wanted to run it through its paces before I considered using it again. Any suggestions would be greatly appreciated.

Thanks,
Neil
Joe L. Posted July 10, 2011

  5 Reallocated_Sector_Ct   0x0033   170   170   140    Pre-fail  Always       -       236
197 Current_Pending_Sector  0x0032   197   196   000    Old_age   Always       -       1029

There are 1029 sectors pending re-allocation and 236 that have already been re-allocated. (Those pending are waiting for a subsequent "write" so the disk can know what should be in the sector it relocates.)

That disk has failed. Do not trust it with your data. RMA it.

Joe L.
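For readers who are, like the poster, unsure where to look in a SMART report: the two numbers quoted above are the last column (RAW_VALUE) of attributes 5 and 197. A small sketch for pulling them out of a saved report follows. The function name smart_raw_value is made up, and it assumes the ten-column attribute table layout shown in this thread; it is a convenience for reading the report, not a pass/fail test.

```shell
#!/bin/sh
# Print the RAW_VALUE (10th column) of one SMART attribute, selected by
# its ID number, from smartctl attribute-table text on stdin.
# Hypothetical helper. The NF check skips short lines that merely start
# with the same number, such as rows of the selective self-test table.
smart_raw_value() {
    awk -v id="$1" '$1 == id && NF >= 10 { print $10 }'
}
```

With the attached report saved locally, smart_raw_value 197 < smartlog.txt would print the pending-sector count (1029 in this case) and smart_raw_value 5 < smartlog.txt the reallocated count (236).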
ricsouthcott Posted July 11, 2011

I am a noob to unRAID. I have been trying to preclear some 2TB Western Digital Green drives, but they keep stopping with a segmentation fault. The syslog is attached: syslog.txt
Joe L. Posted July 11, 2011

Looks to me like you are either running out of memory, or have faulty memory, or have memory whose voltage, timing, or clock speed is not set properly in the BIOS. (Some BIOSes get it right automatically, some do not.)

I suggest a memory test first.

Jul 11 10:59:51 Tower kernel: preclear_disk.s[2005]: segfault at 0 ip 0804e595 sp bf9e20d0 error 6 in bash[8048000+a0000]
Jul 11 11:28:14 Tower kernel: preclear_disk.s[9047]: segfault at 7e5f810 ip 07e5f810 sp bf8a8dfc error 14 in bash[8048000+a0000]
Jul 11 11:28:14 Tower kernel: preclear_disk.s[9046]: segfault at 7e5f810 ip 07e5f810 sp bf8a8dcc error 14 in bash[8048000+a0000]
Jul 11 11:28:14 Tower kernel: preclear_disk.s[9052]: segfault at 7e5f810 ip 07e5f810 sp bf8a8fdc error 14 in bash[8048000+a0000]
Jul 11 11:44:40 Tower kernel: preclear_disk.s[11567]: segfault at 7e5f810 ip 07e5f810 sp bf8a8dfc error 14 in bash[8048000+a0000]
Jul 11 11:44:40 Tower kernel: preclear_disk.s[11566]: segfault at 7e5f810 ip 07e5f810 sp bf8a8dcc error 14 in bash[8048000+a0000]
Jul 11 11:44:40 Tower kernel: preclear_disk.s[11572]: segfault at 7e5f810 ip 07e5f810 sp bf8a8fdc error 14 in bash[8048000+a0000]
Jul 11 11:46:16 Tower kernel: swap_free: Bad swap file entry 00002000
Jul 11 11:46:16 Tower kernel: BUG: Bad page map in process preclear_disk.s pte:4000000000000 pmd:126bc9067
Jul 11 11:46:16 Tower kernel: addr:b78a1000 vm_flags:08000075 anon_vma:(null) mapping:f7086f78 index:144
Jul 11 11:46:16 Tower kernel: vma->vm_ops->fault: filemap_fault+0x0/0x305
Jul 11 11:46:16 Tower kernel: vma->vm_file->f_op->mmap: generic_file_mmap+0x0/0x3f
Jul 11 11:46:16 Tower kernel: Pid: 11805, comm: preclear_disk.s Not tainted 2.6.32.9-unRAID #8
Jul 11 11:46:16 Tower kernel: Call Trace:
Jul 11 11:46:16 Tower kernel: [<c1057e18>] print_bad_pte+0x182/0x194
Jul 11 11:46:16 Tower kernel: [<c1058cb5>] unmap_vmas+0x42f/0x64c
Jul 11 11:46:16 Tower kernel: [<c105c802>] exit_mmap+0x8a/0x102
Jul 11 11:46:16 Tower kernel: [<c10227ac>] mmput+0x28/0x96
Jul 11 11:46:16 Tower kernel: [<c1025b53>] exit_mm+0xd3/0xdb
Jul 11 11:46:16 Tower kernel: [<c1026bbf>] do_exit+0x152/0x508
Jul 11 11:46:16 Tower kernel: [<c106d659>] ? fput+0x17/0x19
Jul 11 11:46:16 Tower kernel: [<c1026fdc>] do_group_exit+0x67/0x8d
Jul 11 11:46:16 Tower kernel: [<c1027011>] sys_exit_group+0xf/0x13
Jul 11 11:46:16 Tower kernel: [<c1002935>] syscall_call+0x7/0xb
Jul 11 11:46:16 Tower kernel: Disabling lock debugging due to kernel taint
Jul 11 12:04:33 Tower kernel: preclear_disk.s[14549]: segfault at 29e5dd8e ip 0808a400 sp bf8a6180 error 4 in bash[8048000+a0000]
Jul 11 12:08:56 Tower kernel: preclear_disk.s[15210]: segfault at 803db25 ip 0803db25 sp bf8a9040 error 14 in bash[8048000+a0000]
ricsouthcott Posted July 11, 2011

I am using a Corsair 4GB (2x2GB) DDR3 1333MHz/PC3-10666 XMS3 DHX memory kit and a Gigabyte GA-880GA-UD3H AM3 motherboard. The settings are default.