March 20, 2011

So I finally pieced together the hardware for my unRAID build.

MB: Biostar TA785G3HD
Memory: Mushkin 2x2GB PC3-10666
CPU: AMD Athlon II x3 435
IO: 1x Supermicro AOC-SASLP-MV8
Cages: 3x iStarUSA BPU-350SATA-BLUE 5 in 3
Cables: 3ware CBL-SFF8087OCF-05M
HD: 2x Hitachi 5k3000 2TB, 2x Hitachi 7k2000 2TB, 2x Seagate 500GB ST3500320AS
Case: Thermaltake v6
PSU: Corsair CX600
USB: 4GB Lexar Firefly Jumpdrive

All the drives listed above are in the computer. I only have the free version of unRAID for now. It boots up fine. I did not add any drives to the array. All the 2TB drives are brand new; only the 500GB drives are used, and they had a RAID partition from Windows XP which I did not remove.

I started preclear on 4 drives last night:

sdb - 500GB Seagate
sdc - 2TB 7k2000 Hitachi
sdd - 2TB 5k3000 Hitachi
sde - 2TB 5k3000 Hitachi

When I checked this morning, the preclear screens for sdb and sdd were showing Tower kernel messages:

Message from syslogd@Tower at Sun Mar 20 08:36:10 2011 ...
Tower kernel: 44: xb ts)0 000SCd:2:s]D b]x:8080b 803nrev070ex id1]<s:l_eion fo0
Message from syslogd@Tower at Sun Mar 20 08:39:10 2011 ...
Tower kernel: b<mf
Message from syslogd@Tower at Sun Mar 20 08:39:21 2011 ...
Tower kernel: 43valed[e a etoRevs0::. s efl[3!5ascea tuntiateA ae 40tSI ady }:02mvsailed[ta1: ne translation for staoC /CS b00
Message from syslogd@Tower at Sun Mar 20 09:09:54 2011 ...
Tower kernel: /000
Message from syslogd@Tower at Sun Mar 20 09:10:05 2011 ...
Tower kernel: 43va<>sassot4}:c]
Message from syslogd@Tower at Sun Mar 20 09:10:20 2011 ...
Tower kernel: 43> etur:tKdm!ld_exru 3ain f eale>ta1rTx> tansabiau satixAD vf!
Message from syslogd@Tower at Sun Mar 20 09:10:21 2011 ...
Tower kernel: evor763<ma0020. retuor s/ASCid : status=0x00 { }
Message from syslogd@Tower at Sun Mar 20 09:10:24 2011 ...
Tower kernel: :t0cfail llsk retno anslation for status: 0x00
Message from syslogd@Tower at Sun Mar 20 09:10:49 2011 ...
Tower kernel: :t0cfail llsk retno anslation for status: 0x00
Message from syslogd@Tower at Sun Mar 20 09:10:49 2011 ...
Tower kernel: /0 tat2esaiS<u=0x000_snt
Message from syslogd@Tower at Sun Mar 20 09:11:00 2011 ...
Tower kernel: 43va<>sassk 2<te0l<eecutturtaxba1st=4{rea <3>1<s:l_ekerd-1 ssst/rx/ SSA/A sec0:0 -13 trn f 0x40
Message from syslogd@Tower at Sun Mar 20 09:11:11 2011 ...
Tower kernel: r <tu]!l_ettkerd-24t:oee nost: ar C0
Message from syslogd@Tower at Sun Mar 20 09:11:36 2011 ...
Tower kernel: e_qs Orrd b, -132
Message from syslogd@Tower at Sun Mar 20 09:11:49 2011 ...
Tower kernel: :t0sbyte0x0:0: [scurescriptor]
Message from syslogd@Tower at Sun Mar 20 09:12:06 2011 ...
Tower kernel: /0 tatudy 0_snd ttus=<3>0eesu<:ut: ho]< <
Message from syslogd@Tower at Sun Mar 20 09:12:12 2011 ...
Tower kernel: : CHSector 0
Message from syslogd@Tower at Sun Mar 20 09:12:14 2011 ...
Tower kernel: 5o0ACQ /00/0000 e/iSsector 0
Message from syslogd@Tower at Sun Mar 20 09:12:15 2011 ...
Tower kernel: 5o0ACQ 0/00/0000 e/iSsector 0

I haven't played with Linux for the past 15 years, so I can't tell whether this output means something or is just gibberish/corrupted. I'm viewing this from a PuTTY telnet session. Can anyone make sense of this ?

As I was posting this, the preclears for sdc and sde showed the following.

sdc preclear result:

================================================================== 1.9
= unRAID server Pre-Clear disk /dev/sdc
= cycle 1 of 1, partition start on sector 63
= Disk Pre-Clear-Read completed DONE
= Step 1 of 10 - Copying zeros to first 2048k bytes DONE
= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE
= Step 3 of 10 - Disk is now cleared from MBR onward. DONE
= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4 DONE
= Step 5 of 10 - Clearing MBR code area DONE
= Step 6 of 10 - Setting MBR signature bytes DONE
= Step 7 of 10 - Setting partition 1 to precleared state DONE
= Step 8 of 10 - Notifying kernel we changed the partitioning DONE
= Step 9 of 10 - Creating the /dev/disk/by* entries DONE
= Step 10 of 10 - Verifying if the MBR is cleared. DONE
= Elapsed Time: 8:46:32
========================================================================1.9
==
== SORRY: Disk /dev/sdc MBR could NOT be precleared
==
== out4= 00000
== out5= 00000
============================================================================
dd: reading `/dev/sdc': Input/output error
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0.000361532 s, 0.0 kB/s
0000000

sde preclear result:

================================================================== 1.9
= unRAID server Pre-Clear disk /dev/sde
= cycle 1 of 1, partition start on sector 63
= Disk Pre-Clear-Read completed DONE
= Step 1 of 10 - Copying zeros to first 2048k bytes DONE
= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE
= Step 3 of 10 - Disk is now cleared from MBR onward. DONE
= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4 DONE
= Step 5 of 10 - Clearing MBR code area DONE
= Step 6 of 10 - Setting MBR signature bytes DONE
= Step 7 of 10 - Setting partition 1 to precleared state DONE
= Step 8 of 10 - Notifying kernel we changed the partitioning DONE
= Step 9 of 10 - Creating the /dev/disk/by* entries DONE
= Step 10 of 10 - Verifying if the MBR is cleared. DONE
= Elapsed Time: 8:54:47
========================================================================1.9
==
== SORRY: Disk /dev/sde MBR could NOT be precleared
==
== out4= 00000
== out5= 00000
============================================================================
dd: reading `/dev/sde': Input/output error
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0.000572331 s, 0.0 kB/s
0000000

I'm going to double check all the connections, reseat all the drives and then run a memtest, but any additional suggestions ?

I've also attached the beginning and ending of the syslogs... there seem to be a lot of errors, and I'm not sure what they mean...

syslog-beginsnip1.txt
syslog-beginsnip2.txt
syslog-endsnip.txt
March 20, 2011

Run memtest at least overnight (a full day if you can), then get back to us.
March 20, 2011

COPY THE CONTENTS OF THE /tmp DIRECTORY TO YOUR FLASH BEFORE YOU REBOOT!

From the telnet prompt:

mkdir /boot/preclear_tmp_3-20-11
cp /tmp/* /boot/preclear_tmp_3-20-11/

There may be clues in there.
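The two commands above can be wrapped in a small helper so the copy is repeatable before every reboot. This is only a sketch: the function name `save_diagnostics` and the dated directory name are made up for illustration, and it assumes, as the post does, that `/tmp` is lost on reboot while `/boot` is the flash drive.

```shell
# Sketch: copy the files in a RAM-backed directory to persistent storage
# before a reboot wipes them. Function name and layout are illustrative.

save_diagnostics() {
    local src="$1"        # directory to save, e.g. /tmp
    local dest_root="$2"  # persistent location, e.g. /boot (the flash drive)
    local dest="$dest_root/preclear_tmp_$(date +%Y-%m-%d)"

    mkdir -p "$dest" || return 1
    # Copy regular files only; subdirectories are skipped.
    find "$src" -maxdepth 1 -type f -exec cp {} "$dest"/ \;
    # Print where the copies went.
    echo "$dest"
}

# Usage on the server (paths assumed from the post above):
#   save_diagnostics /tmp /boot
```

The files of interest here would be the `/tmp/smart_start_*` and `/tmp/smart_finish_*` reports that the preclear script writes.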
March 21, 2011 (Author)

I didn't read your message in time, so I wasn't able to copy the contents of /tmp over...

I ran memtest for 6+ hours and it seemed OK; I will run it for 3-5 days when I leave on business tomorrow. In the meantime, I also changed the BIOS SATA setting to AHCI mode (it was on auto before). I don't think that should have made a difference, since I only had 1 drive connected to the MB and the rest were on the Supermicro AOC-SASLP.

One thing is weird though: when I checked the SMART results for my Seagate 7200.11 500GB (ST3500320AS) drive, it said the firmware needed to be updated, so I checked and updated to the latest SD1A firmware. Now unRAID, via unMenu, can't read anything via SMART off the drives. When I run smartctl on the command line, it gives me the following:

=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 7200.11 family
Device Model: ST3500320AS
Serial Number: 9QM0YQ9L
Firmware Version: SD1A
User Capacity: 500,107,862,016 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 4
Local Time is: Sun Mar 20 21:21:42 2011 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Error SMART Status command failed

I'm running a preclear on only these drives right now and it seems to be running... any idea why the SMART info isn't readable anymore after the new FW upgrade ?
March 21, 2011

That does look like the gibberish you get when you have a memory problem. I don't know why the SMART info is failing.

Peter
March 21, 2011 (Author)

Looks like both of the Seagates precleared, and I CAN read the SMART values:

================================================================== 1.9
= unRAID server Pre-Clear disk /dev/sdb
= cycle 1 of 1, partition start on sector 63
= Disk Pre-Clear-Read completed DONE
= Step 1 of 10 - Copying zeros to first 2048k bytes DONE
= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE
= Step 3 of 10 - Disk is now cleared from MBR onward. DONE
= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4 DONE
= Step 5 of 10 - Clearing MBR code area DONE
= Step 6 of 10 - Setting MBR signature bytes DONE
= Step 7 of 10 - Setting partition 1 to precleared state DONE
= Step 8 of 10 - Notifying kernel we changed the partitioning DONE
= Step 9 of 10 - Creating the /dev/disk/by* entries DONE
= Step 10 of 10 - Verifying if the MBR is cleared. DONE
= Disk Post-Clear-Read completed DONE
Disk Temperature: 30C, Elapsed Time: 7:08:06
========================================================================1.9
== ST3500320AS 9QM0YQ4C
== Disk /dev/sdb has been successfully precleared
== with a starting sector of 63
============================================================================
** Changed attributes in files: /tmp/smart_start_sdb /tmp/smart_finish_sdb

ATTRIBUTE                 NEW_VAL  OLD_VAL  FAILURE_THRESHOLD  STATUS       RAW_VALUE
Raw_Read_Error_Rate     = 119      117      6                  ok           203556375
Spin_Retry_Count        = 100      100      97                 near_thresh  35
End-to-End_Error        = 100      100      99                 near_thresh  0
Airflow_Temperature_Cel = 70       67       45                 In_the_past  30
Temperature_Celsius     = 30       33       0                  ok           30
Hardware_ECC_Recovered  = 52       49       0                  ok           203556375

No SMART attributes are FAILING_NOW

0 sectors were pending re-allocation before the start of the preclear.
0 sectors were pending re-allocation after pre-read in cycle 1 of 1.
0 sectors were pending re-allocation after zero of disk in cycle 1 of 1.
0 sectors are pending re-allocation at the end of the preclear, the number of sectors pending re-allocation did not change.
0 sectors had been re-allocated before the start of the preclear.
0 sectors are re-allocated at the end of the preclear, the number of sectors re-allocated did not change.

================================================================== 1.9
= unRAID server Pre-Clear disk /dev/sdf
= cycle 1 of 1, partition start on sector 63
= Disk Pre-Clear-Read completed DONE
= Step 1 of 10 - Copying zeros to first 2048k bytes DONE
= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE
= Step 3 of 10 - Disk is now cleared from MBR onward. DONE
= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4 DONE
= Step 5 of 10 - Clearing MBR code area DONE
= Step 6 of 10 - Setting MBR signature bytes DONE
= Step 7 of 10 - Setting partition 1 to precleared state DONE
= Step 8 of 10 - Notifying kernel we changed the partitioning DONE
= Step 9 of 10 - Creating the /dev/disk/by* entries DONE
= Step 10 of 10 - Verifying if the MBR is cleared. DONE
= Disk Post-Clear-Read completed DONE
Disk Temperature: 30C, Elapsed Time: 7:07:46
========================================================================1.9
== ST3500320AS 9QM0YQ9L
== Disk /dev/sdf has been successfully precleared
== with a starting sector of 63
============================================================================
** Changed attributes in files: /tmp/smart_start_sdf /tmp/smart_finish_sdf

ATTRIBUTE                 NEW_VAL  OLD_VAL  FAILURE_THRESHOLD  STATUS       RAW_VALUE
Raw_Read_Error_Rate     = 118      117      6                  ok           187160641
Spin_Retry_Count        = 100      100      97                 near_thresh  9
End-to-End_Error        = 100      100      99                 near_thresh  0
Airflow_Temperature_Cel = 70       68       45                 near_thresh  30
Temperature_Celsius     = 30       32       0                  ok           30
Hardware_ECC_Recovered  = 51       47       0                  ok           187160641

No SMART attributes are FAILING_NOW

0 sectors were pending re-allocation before the start of the preclear.
0 sectors were pending re-allocation after pre-read in cycle 1 of 1.
0 sectors were pending re-allocation after zero of disk in cycle 1 of 1.
0 sectors are pending re-allocation at the end of the preclear, the number of sectors pending re-allocation did not change.
0 sectors had been re-allocated before the start of the preclear.
0 sectors are re-allocated at the end of the preclear, the number of sectors re-allocated did not change.

root@Tower:/boot# smartctl --all /dev/sdb |more
smartctl 5.39.1 2010-01-28 r3054 [i486-slackware-linux-gnu] (local build)
Copyright © 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 7200.11 family
Device Model: ST3500320AS
Serial Number: 9QM0YQ4C
Firmware Version: SD1A
User Capacity: 500,107,862,016 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 4
Local Time is: Mon Mar 21 07:51:41 2011 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status: (0x82) Offline data collection activity was completed without error. Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: ( 625) seconds.
Offline data collection capabilities: (0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported. General Purpose Logging supported.
Short self-test routine recommended polling time: ( 1) minutes.
Extended self-test routine recommended polling time: ( 113) minutes.
Conveyance self-test routine recommended polling time: ( 2) minutes.
SCT capabilities: (0x103b) SCT Status supported. SCT Feature Control supported. SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID#  ATTRIBUTE_NAME           FLAG    VALUE  WORST  THRESH  TYPE      UPDATED  WHEN_FAILED  RAW_VALUE
1    Raw_Read_Error_Rate      0x000f  119    099    006     Pre-fail  Always   -            203556375
3    Spin_Up_Time             0x0003  095    094    000     Pre-fail  Always   -            0
4    Start_Stop_Count         0x0032  100    100    020     Old_age   Always   -            114
5    Reallocated_Sector_Ct    0x0033  100    100    036     Pre-fail  Always   -            0
7    Seek_Error_Rate          0x000f  073    060    030     Pre-fail  Always   -            8633999777
9    Power_On_Hours           0x0032  079    079    000     Old_age   Always   -            19255
10   Spin_Retry_Count         0x0013  100    099    097     Pre-fail  Always   -            35
12   Power_Cycle_Count        0x0032  100    100    020     Old_age   Always   -            116
184  End-to-End_Error         0x0032  100    100    099     Old_age   Always   -            0
187  Reported_Uncorrect       0x0032  100    100    000     Old_age   Always   -            0
188  Command_Timeout          0x0032  100    099    000     Old_age   Always   -            4295032833
189  High_Fly_Writes          0x003a  100    100    000     Old_age   Always   -            0
190  Airflow_Temperature_Cel  0x0022  070    043    045     Old_age   Always   In_the_past  30 (0 201 34 29)
194  Temperature_Celsius      0x0022  030    057    000     Old_age   Always   -            30 (0 20 0 0)
195  Hardware_ECC_Recovered   0x001a  046    019    000     Old_age   Always   -            203556375
197  Current_Pending_Sector   0x0012  100    100    000     Old_age   Always   -            0
198  Offline_Uncorrectable    0x0010  100    100    000     Old_age   Offline  -            0
199  UDMA_CRC_Error_Count     0x003e  200    200    000     Old_age   Always   -            0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
1     0        0        Not_testing
2     0        0        Not_testing
3     0        0        Not_testing
4     0        0        Not_testing
5     0        0        Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

smartctl 5.39.1 2010-01-28 r3054 [i486-slackware-linux-gnu] (local build)
Copyright © 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 7200.11 family
Device Model: ST3500320AS
Serial Number: 9QM0YQ9L
Firmware Version: SD1A
User Capacity: 500,107,862,016 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 4
Local Time is: Mon Mar 21 07:54:10 2011 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x82) Offline data collection activity was completed without error. Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: ( 634) seconds.
Offline data collection capabilities: (0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported. General Purpose Logging supported.
Short self-test routine recommended polling time: ( 1) minutes.
Extended self-test routine recommended polling time: ( 113) minutes.
Conveyance self-test routine recommended polling time: ( 2) minutes.
SCT capabilities: (0x103b) SCT Status supported. SCT Feature Control supported. SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID#  ATTRIBUTE_NAME           FLAG    VALUE  WORST  THRESH  TYPE      UPDATED  WHEN_FAILED  RAW_VALUE
1    Raw_Read_Error_Rate      0x000f  118    099    006     Pre-fail  Always   -            187160641
3    Spin_Up_Time             0x0003  094    094    000     Pre-fail  Always   -            0
4    Start_Stop_Count         0x0032  100    100    020     Old_age   Always   -            120
5    Reallocated_Sector_Ct    0x0033  100    100    036     Pre-fail  Always   -            0
7    Seek_Error_Rate          0x000f  075    060    030     Pre-fail  Always   -            4334013080
9    Power_On_Hours           0x0032  079    079    000     Old_age   Always   -            19254
10   Spin_Retry_Count         0x0013  100    100    097     Pre-fail  Always   -            9
12   Power_Cycle_Count        0x0032  100    100    020     Old_age   Always   -            121
184  End-to-End_Error         0x0032  100    100    099     Old_age   Always   -            0
187  Reported_Uncorrect       0x0032  100    100    000     Old_age   Always   -            0
188  Command_Timeout          0x0032  100    100    000     Old_age   Always   -            0
189  High_Fly_Writes          0x003a  100    100    000     Old_age   Always   -            0
190  Airflow_Temperature_Cel  0x0022  070    046    045     Old_age   Always   -            30 (Lifetime Min/Max 29/34)
194  Temperature_Celsius      0x0022  030    054    000     Old_age   Always   -            30 (0 21 0 0)
195  Hardware_ECC_Recovered   0x001a  045    013    000     Old_age   Always   -            187160641
197  Current_Pending_Sector   0x0012  100    100    000     Old_age   Always   -            0
198  Offline_Uncorrectable    0x0010  100    100    000     Old_age   Offline  -            0
199  UDMA_CRC_Error_Count     0x003e  200    200    000     Old_age   Always   -            0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
1     0        0        Not_testing
2     0        0        Not_testing
3     0        0        Not_testing
4     0        0        Not_testing
5     0        0        Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Weird that I can read the SMART values now... I wonder if it's because, after flashing the new FW, I never accessed the drive, so perhaps there was no SMART value stored ?

I'm going to be gone for 4 days, so I'm going to let memtest run until I get back...
March 25, 2011 (Author)

OK, I'm back to square 1.

I'm back from my business trip; I ran memtest for ~5 days with no errors. I rebooted and noticed that ata3, one of my new 5k3000 Hitachi drives, wasn't being recognized by the Supermicro card. I pulled it out and reseated it in the cage, and upon reboot it was detected. I checked the syslog: no errors.

I then started 4 preclears, one in each telnet session. This morning, I noticed that the telnet session for ata3 (/dev/sdd) had:

Message from syslogd@Tower at Fri Mar 25 08:01:07 2011 ...
Tower kernel: t eekCo:ls0xmlete }
Message from syslogd@Tower at Fri Mar 25 08:01:54 2011 ...
Tower kernel: 5<mf
Message from syslogd@Tower at Fri Mar 25 08:02:02 2011 ...
Tower kernel: 00cs0000: mvsas d-24t:oeerstnorx
Message from syslogd@Tower at Fri Mar 25 08:02:14 2011 ...
Tower kernel: t eekp._to ee }>ekfCyrent fe Sess 00er 0 porsn0a <osl0xC /CS b004t:tu00 DSkme }m.0!ldecea tn:1
Message from syslogd@Tower at Fri Mar 25 08:02:23 2011 ...
Tower kernel: 5<mf
Message from syslogd@Tower at Fri Mar 25 08:02:32 2011 ...
Tower kernel: s0:0[dA=0S=0 85 0ror, 2:0032
Message from syslogd@Tower at Fri Mar 25 08:02:51 2011 ...
Tower kernel: 5<mf

The syslog file is very large:

-rw-r--r-- 1 root root 2147483647 Mar 25 06:48 syslog

I'm wondering if I'm running out of memory ? I have 4 gigs of memory and a 4 gig USB key. The output of "free" shows:

root@Tower:/var/log# free
                    total     used     free  shared  buffers   cached
Mem:              4115652  3124968   990684       0   817684  2247940
-/+ buffers/cache:           59344  4056308
Swap:                   0        0        0

The preclears for sdc, sde and sdg are still running. I also see the weird messages there, but they get overwritten when the preclear screen refreshes.

Again, similar to before, as I was posting this, the preclears for sdc and sde just stopped after 8 hours:

================================================================== 1.9
= unRAID server Pre-Clear disk /dev/sdc
= cycle 1 of 1, partition start on sector 63
= Disk Pre-Clear-Read completed DONE
= Step 1 of 10 - Copying zeros to first 2048k bytes DONE
= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE
= Step 3 of 10 - Disk is now cleared from MBR onward. DONE
= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4 DONE
= Step 5 of 10 - Clearing MBR code area DONE
= Step 6 of 10 - Setting MBR signature bytes DONE
= Step 7 of 10 - Setting partition 1 to precleared state DONE
= Step 8 of 10 - Notifying kernel we changed the partitioning DONE
= Step 9 of 10 - Creating the /dev/disk/by* entries DONE
= Step 10 of 10 - Verifying if the MBR is cleared. DONE
= Elapsed Time: 8:25:17
========================================================================1.9
==
== SORRY: Disk /dev/sdc MBR could NOT be precleared
==
== out4= 00000
== out5= 00000
============================================================================
dd: reading `/dev/sdc': Input/output error
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0.000674543 s, 0.0 kB/s
0000000
root@Tower:/boot#

================================================================== 1.9
= unRAID server Pre-Clear disk /dev/sde
= cycle 1 of 1, partition start on sector 63
= Disk Pre-Clear-Read completed DONE
= Step 1 of 10 - Copying zeros to first 2048k bytes DONE
= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE
= Step 3 of 10 - Disk is now cleared from MBR onward. DONE
= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4 DONE
= Step 5 of 10 - Clearing MBR code area DONE
= Step 6 of 10 - Setting MBR signature bytes DONE
= Step 7 of 10 - Setting partition 1 to precleared state DONE
= Step 8 of 10 - Notifying kernel we changed the partitioning DONE
= Step 9 of 10 - Creating the /dev/disk/by* entries DONE
= Step 10 of 10 - Verifying if the MBR is cleared. DONE
= Elapsed Time: 8:27:40
========================================================================1.9
==
== SORRY: Disk /dev/sde MBR could NOT be precleared
==
== out4= 00000
== out5= 00000
============================================================================
dd: reading `/dev/sde': Input/output error
0+0 records in
0+0 records out
0 bytes (0 B) copied0000000 , 0.000242271 s, 0.0 kB/s

Attached is the end of the syslog. Suggestions ?

When I try to restart the preclear in the telnet window I get:

root@Tower:/boot# ./preclear_disk.sh /dev/sdc
BLKRRPART: Input/output error
Sorry: Device /dev/sdc is not responding to an fdisk -l /dev/sdc command.
You might try power-cycling it to see if it will start responding.
root@Tower:/boot# ./preclear_disk.sh /dev/sdd
BLKRRPART: Input/output error
Sorry: Device /dev/sdd is not responding to an fdisk -l /dev/sdd command.
You might try power-cycling it to see if it will start responding.
root@Tower:/boot# ./preclear_disk.sh /dev/sde
BLKRRPART: Input/output error
Sorry: Device /dev/sde is not responding to an fdisk -l /dev/sde command.
You might try power-cycling it to see if it will start responding.

syslog-2011-03-25-end.txt
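For readers following along, a quick way to see which devices still answer at all is a one-sector read, the same kind of read that fails with "Input/output error" in the dd output above. The sketch below is an illustration only: `responds` and `check_devices` are made-up names, and this is not claimed to be what preclear_disk.sh does internally.

```shell
# Sketch: report which devices still answer a one-sector read.
# Reads only; nothing is written to the device.

responds() {
    # Succeeds if the first 512-byte sector of "$1" can be read.
    dd if="$1" of=/dev/null bs=512 count=1 2>/dev/null
}

check_devices() {
    local dev
    for dev in "$@"; do
        if responds "$dev"; then
            echo "$dev: ok"
        else
            echo "$dev: NOT responding"
        fi
    done
}

# Usage on the server (device names taken from the post above):
#   check_devices /dev/sdc /dev/sdd /dev/sde
```

An "ok" here is weak evidence, since a sector already held in the kernel's cache could be returned without touching the disk. A "NOT responding" result is the more informative one.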
March 25, 2011

"The syslog file is very large: -rw-r--r-- 1 root root 2147483647 Mar 25 06:48 syslog I'm wondering if I'm running out of memory ?"

Yes, the syslog is very large, and since /var/log lives in RAM on unRAID, I expect it will eventually grow to consume all memory. At that point, the OS will start killing inactive processes to make even more room available. Sooner or later, the machine will crash.

I would try, if you can, to zip the syslog, copy it via the flash share onto your workstation, and post it here for analysis. I expect you have some cabling problem to your drives, perhaps a bad or badly seated breakout cable.

If some of your preclears appear to be running normally, you might try letting them run to completion. But the ones that are reporting errors will need to be redone anyway, and you need to stop them before they crash your server (ctrl-C will stop a preclear).

If it were me, I'd stop all of the preclears, post my syslog, and wait for further instruction. I'd probably want to shut down the server and do a cold boot rather than a warm boot, but you might want to wait until people get a chance to look at the syslog, because there might be requests for information that would be lost if you reboot.
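A 2 GB syslog is awkward to zip and post whole, so one option is to save just its beginning and end to the flash drive, much as the attachments earlier in the thread do. The following is a minimal sketch under that assumption. The function name `snip_log`, the output file names, and the 500-line default are all illustrative choices.

```shell
# Sketch: save the first and last lines of a very large log as two small
# files that can be attached to a forum post. Names are illustrative.

snip_log() {
    local log="$1"           # e.g. /var/log/syslog
    local dest_dir="$2"      # e.g. /boot (the flash drive)
    local lines="${3:-500}"  # lines to keep from each end

    [ -r "$log" ] || { echo "cannot read $log" >&2; return 1; }
    mkdir -p "$dest_dir" || return 1

    head -n "$lines" "$log" > "$dest_dir/syslog-begin.txt"
    tail -n "$lines" "$log" > "$dest_dir/syslog-end.txt"

    # Print the original size in bytes so readers know how much was left out.
    wc -c < "$log" | tr -d ' '
}

# Usage on the server (paths assumed from the posts above):
#   snip_log /var/log/syslog /boot 500
```

The beginning usually shows how the drives and controllers were detected at boot, and the end shows the errors that are repeating, which is often enough for a first look.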
March 28, 2011 (Author)

I think I might have solved the issues that I posted (although I have some other sporadic errors showing up in syslog).

I suspect the corruption above was due to the log explosion problem overwriting memory... I've since created a tmpfs drive to hold the syslog files, so a log explosion can't crash my server.

The preclearing issues seem to be related to one brand new Hitachi 5k3000 drive. This drive would sporadically show up or not, and even with all other drives removed and only this drive connected it had problems, regardless of the SATA port I used (Supermicro or onboard MB). I'm going to RMA this drive. With this drive removed, everything preclears fine.

As to the SMART values not showing up on the Seagate drives, it's kind of odd, but I think that after I flashed the new firmware, the default is for SMART to be disabled. I can enable it:

/boot/smartctl -d ata --smart=on --saveauto=on /dev/sde

It works fine, but doesn't stay on after a cold reboot. I don't know why, but when connected to the MB SATA ports, SMART is enabled by default; I guess the MB automatically issues that command. But when connected to the Supermicro and the SIL3132 card, it is disabled. I'll open another post to figure out how to enable SMART on reboot.

So I guess this thread can now be deemed SOLVED. Now to my other smaller issues...
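One possible way to make the SMART setting survive a reboot is to reissue the enable command for each affected drive at startup. The sketch below assumes that approach. It uses smartctl's documented long options `--smart=on` and `--saveauto=on` (short forms `-s on` and `-S on`), while the function name, the `DRY_RUN` switch, and the idea of calling it from unRAID's startup script (assumed here to be /boot/config/go) are illustrative and not confirmed by the thread.

```shell
# Sketch: re-enable SMART on a list of drives, e.g. from a boot-time script.
# SMARTCTL can point at a specific binary such as /boot/smartctl.
SMARTCTL="${SMARTCTL:-smartctl}"

enable_smart() {
    local dev
    for dev in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Dry run: only print the command that would be executed.
            echo "$SMARTCTL -d ata --smart=on --saveauto=on $dev"
        elif [ -b "$dev" ]; then
            "$SMARTCTL" -d ata --smart=on --saveauto=on "$dev" \
                || echo "warning: could not enable SMART on $dev" >&2
        else
            echo "skipping $dev: not a block device" >&2
        fi
    done
}

# Usage (device names are examples; they can change between boots):
#   DRY_RUN=1 enable_smart /dev/sde /dev/sdf
#   enable_smart /dev/sde /dev/sdf
```

Because /dev/sdX names are not stable across reboots, a real startup script would probably be better off addressing drives through /dev/disk/by-id, which the preclear output above shows being created.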