Two Pre-clears Failed, what's going on?



Hi guys, I need some expert opinion on what's going on. I started a preclear on 4 drives and 2 of the 4 preclears ended early. Below is a snippet of the syslog, and the full syslog is at http://www.sobon.ca/Unraid/syslog-2012-10-25.txt

 

 

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: ========================================================================1.13 (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: == invoked as: ./preclear_disk.sh -A /dev/sdl (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: == ST2000DL003-9VT166 5YD5VCJP (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: == Disk /dev/sdl has been successfully precleared (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: == with a starting sector of 64 (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: == Ran 1 cycle (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: == (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: == Using :Read block size = Bytes (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: == Last Cycle's Pre Read Time : 3:41:00 (150 MB/s) (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: == Last Cycle's Zeroing time : 0:00:27 ( MB/s) (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: == Last Cycle's Post Read Time : ( MB/s) (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: == Last Cycle's Total Time : (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: == (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: == Total Elapsed Time 3:41:28 (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: == (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: == Disk Start Temperature: 28C (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: == (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: == Current (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: == (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: ============================================================================ (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: ** Changed attributes in files: /tmp/smart_start_sdl /tmp/smart_finish_sdl (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: ATTRIBUTE NEW_VAL OLD_VAL FAILURE_THRESHOLD STATUS RAW_VALUE (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: Raw_Read_Error_Rate = 111 ok (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: Spin_Up_Time = 90 ok (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: Start_Stop_Count = 100 ok (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: Reallocated_Sector_Ct = 100 ok (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: Seek_Error_Rate = 68 ok (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: Power_On_Hours = 94 ok (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: Spin_Retry_Count = 100 ok (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: Power_Cycle_Count = 100 ok (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: Runtime_Bad_Block = 100 ok (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: End-to-End_Error = 100 ok (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: Reported_Uncorrect = 100 ok (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: Command_Timeout = 100 ok (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: High_Fly_Writes = 100 ok (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: Airflow_Temperature_Cel = 72 ok (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: G-Sense_Error_Rate = 100 ok (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: Power-Off_Retract_Count = 100 ok (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: Load_Cycle_Count = 100 ok (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: Temperature_Celsius = 28 ok (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: Hardware_ECC_Recovered = 29 ok (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: Current_Pending_Sector = 100 ok (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: Offline_Uncorrectable = 100 ok (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: UDMA_CRC_Error_Count = 200 ok (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: Head_Flying_Hours = 100 ok (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: Total_LBAs_Written = 100 ok (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: Total_LBAs_Read = 100 ok (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: No SMART attributes are FAILING_NOW (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: 0 sectors were pending re-allocation before the start of the preclear. (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: a change of 0 in the number of sectors pending re-allocation. (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: 0 sectors had been re-allocated before the start of the preclear. (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: a change of 0 in the number of sectors re-allocated. (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: SMART overall-health status = (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: ============================================================================ (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: ============================================================================ (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: == (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: == S.M.A.R.T Initial Report for /dev/sdl (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: == (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: Disk: /dev/sdl (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: smartctl 5.40 2010-10-16 r3189 [i486-slackware-linux-gnu] (local build) (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: Copyright © 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: === START OF INFORMATION SECTION === (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: Device Model: ST2000DL003-9VT166 (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: Serial Number: 5YD5VCJP (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: Firmware Version: CC3C (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: User Capacity: 2,000,398,934,016 bytes (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: Device is: Not in smartctl database [for details use: -P showall] (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: ATA Version is: 8 (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: ATA Standard is: ATA-8-ACS revision 4 (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: Local Time is: Thu Oct 25 14:00:45 2012 EDT (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: SMART support is: Available - device has SMART capability. (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: SMART support is: Enabled (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: === START OF READ SMART DATA SECTION === (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: SMART overall-health self-assessment test result: PASSED (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: General SMART Values: (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: Offline data collection status: (0x82)^IOffline data collection activity (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: ^I^I^I^I^Iwas completed without error. (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: ^I^I^I^I^IAuto Offline Data Collection: Enabled. (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: Self-test execution status: ( 0)^IThe previous self-test routine completed (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: ^I^I^I^I^Iwithout error or no self-test has ever (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: ^I^I^I^I^Ibeen run. (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: Total time to complete Offline (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: data collection: ^I^I ( 612) seconds. (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: Offline data collection (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: capabilities: ^I^I^I (0x7b) SMART execute Offline immediate. (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: ^I^I^I^I^IAuto Offline data collection on/off support. (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: ^I^I^I^I^ISuspend Offline collection upon new (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: ^I^I^I^I^Icommand. (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: ^I^I^I^I^IOffline surface scan supported. (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: ^I^I^I^I^ISelf-test supported. (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: ^I^I^I^I^IConveyance Self-test supported. (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: ^I^I^I^I^ISelective Self-test supported. (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: SMART capabilities: (0x0003)^ISaves SMART data before entering (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: ^I^I^I^I^Ipower-saving mode. (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: ^I^I^I^I^ISupports SMART auto save timer. (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: Error logging capability: (0x01)^IError logging supported. (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: ^I^I^I^I^IGeneral Purpose Logging supported. (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: Short self-test routine (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: recommended polling time: ^I ( 1) minutes. (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: Extended self-test routine (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: recommended polling time: ^I ( 255) minutes. (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: Conveyance self-test routine (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: recommended polling time: ^I ( 2) minutes. (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: SCT capabilities: ^I (0x30b7)^ISCT Status supported. (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: ^I^I^I^I^ISCT Feature Control supported. (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: ^I^I^I^I^ISCT Data Table supported. (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: SMART Attributes Data Structure revision number: 10 (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: Vendor Specific SMART Attributes with Thresholds: (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: 1 Raw_Read_Error_Rate 0x000f 111 099 006 Pre-fail Always - 30108296 (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: 3 Spin_Up_Time 0x0003 090 085 000 Pre-fail Always - 0 (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: 4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 29 (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: 5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0 (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: 7 Seek_Error_Rate 0x000f 068 060 030 Pre-fail Always - 7009913 (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: 9 Power_On_Hours 0x0032 094 094 000 Old_age Always - 5279 (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: 10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0 (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: 12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 21 (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: 183 Runtime_Bad_Block 0x0032 100 100 000 Old_age Always - 0 (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: 184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0 (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: 187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0 (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: 188 Command_Timeout 0x0032 100 100 000 Old_age Always - 0 (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: 189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0 (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: 190 Airflow_Temperature_Cel 0x0022 072 056 045 Old_age Always - 28 (Min/Max 24/30) (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: 191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 0 (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: 192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 19 (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: 193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 29 (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: 194 Temperature_Celsius 0x0022 028 044 000 Old_age Always - 28 (0 20 0 0) (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: 195 Hardware_ECC_Recovered 0x001a 029 011 000 Old_age Always - 30108296 (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0 (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: 198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0 (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: 199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0 (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: 240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 77227806953798 (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: 241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 2724079113 (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: 242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 3759955724 (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: SMART Error Log Version: 1 (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: No Errors Logged (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: SMART Self-test log structure revision number 1 (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: No self-tests have been logged. [To run self-tests, use: smartctl -t] (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: SMART Selective self-test log data structure revision number 1 (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: 1 0 0 Not_testing (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: 2 0 0 Not_testing (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: 3 0 0 Not_testing (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: 4 0 0 Not_testing (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: 5 0 0 Not_testing (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: Selective self-test flags (0x0): (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: After scanning selected spans, do NOT read-scan remainder of disk. (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: If Selective self-test is pending on power-up, resume after 0 minute delay. (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: == (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: ============================================================================ (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: (Misc)

Oct 25 17:41:46 Tower last message repeated 2 times

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: ============================================================================ (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: == (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: == S.M.A.R.T Final Report for /dev/sdl (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: == (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: Disk: /dev/sdl (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: smartctl 5.40 2010-10-16 r3189 [i486-slackware-linux-gnu] (local build) (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: Copyright © 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net (Misc)

Oct 25 17:41:46 Tower preclear_disk-diff[1903]: (Misc)

 

Background: it's a virtualized unRAID 5.0-rc8a running on ESXi 4.1 with 3 x AOC-SASLP-MV8 on PCI passthrough. The system ran stable until one of my drives died, which led me to a full rebuild, as more than one drive seems affected.

Link to comment

The syslog "diff" stuff you posted is very difficult to wade through...

 

I cannot determine why you think the preclear failed from it.

 

To help you, please zip and attach the reports for those two disks from the /boot/preclear_reports folder.

 

For each disk there is a SMART report from before the preclear, a SMART report from after, and a summary report.

For me to spend ANY time, I need to see the three reports.
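
A minimal sketch of gathering those three reports for one disk into a single attachment. It assumes the report file names contain something that identifies the disk, such as the serial number (an assumption, check the folder first), and it uses tar rather than zip on the assumption that tar is available on the server. The function name bundle_reports and the output path are hypothetical.

```shell
#!/bin/bash
# Bundle every file in a report folder whose name contains a given string
# (for example a drive serial number) into one archive for attaching.
# ASSUMPTION: the report file names contain that string; adjust if they do not.
bundle_reports() {
    local report_dir="$1"   # e.g. /boot/preclear_reports
    local pattern="$2"      # e.g. the drive serial, 5YD5VCJP
    local out="$3"          # archive to create (hypothetical path)
    local files=()
    local f

    for f in "$report_dir"/*"$pattern"*; do
        if [ -f "$f" ]; then
            files+=("$(basename "$f")")
        fi
    done

    if [ "${#files[@]}" -eq 0 ]; then
        echo "no reports matching '$pattern' in $report_dir" >&2
        return 1
    fi

    # -C stores the files without their absolute path.
    tar -czf "$out" -C "$report_dir" "${files[@]}"
    echo "bundled ${#files[@]} file(s) into $out"
}
```

For example, `bundle_reports /boot/preclear_reports 5YD5VCJP /boot/sdl_reports.tar.gz` would collect the reports for the ST2000DL003 shown in the log, if its serial appears in the file names.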

 

From your syslog, it appears that /dev/sdl has stopped responding to the disk controller.  No SMART report is being produced.  (It could be a bad power connection, a bad SATA connection, a drive that has failed, a disk controller port that has failed, a bad cable to the disk, or a power supply inadequate to power all the disks.)
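
One visible sign of this in the pasted log is the blank "SMART overall-health status =" line in the preclear summary. The sketch below checks a syslog file for that blank field. The function name smart_status_field is hypothetical, and treating a blank field as "the drive did not answer" is an interpretation, not something the preclear script documents.

```shell
#!/bin/bash
# Print "empty" if the preclear summary in a syslog has a blank
# "SMART overall-health status =" field, "present" if it carries a value,
# and "missing" if no such line is found.
smart_status_field() {
    local syslog="$1"
    local line value

    line=$(grep -m1 'SMART overall-health status =' "$syslog") || { echo missing; return; }
    value=${line#*SMART overall-health status =}

    # Drop a trailing "(Misc)" tag, as seen in the pasted log, then trim spaces.
    value=$(printf '%s' "$value" | sed -e 's/(Misc)[[:space:]]*$//' \
                                       -e 's/^[[:space:]]*//' \
                                       -e 's/[[:space:]]*$//')

    if [ -z "$value" ]; then
        echo empty
    else
        echo present
    fi
}
```

Running `smart_status_field /var/log/syslog` on the log quoted above would print `empty`, which fits a drive that has dropped off the controller.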

 

Joe L.

Link to comment

To add to this, those drives now appear to have just disappeared, as if someone yanked them:

 

Drives%20missing.jpg

Or, as I already said, they lost power, or the disk controller died, or both disks died.  Or both SATA cables pulled loose.  It happens.

Look up "bathtub curve" on Google.  Your drives might be perfect examples, although it is unlikely for two to die at nearly the same time unless they are from the same defective manufacturing batch.  It is far more likely for one to die and lock up a disk controller chipset it shares with another disk.

 

Or, you show 13 or 14 disks... Is your power supply capable?  (What exact make/model?)  Do the new disks share a power splitter or a backplane?

 

Joe L.

Link to comment

It's got me stumped. Yes, they share the same backplane, but there are 3 other disks unaffected. The only thing I can think of is the controller card, but it doesn't make any sense why only those two disks have gone.

 

The power supply should be more than adequate: it's a Supermicro triple-redundant 750 W unit, and it has pulled its weight with a lot more drives previously.

 

I'm going to do a cold boot of the ESXi host and look over the cabling etc. It's one of those days I want to pack up my toys and go home... lol

Link to comment

Just an update: I took a look inside the box and everything checked out. The only thing I decided to change was to move the card with the failed drives down a slot. Since both slots are 4x it doesn't really matter, but I remember having a "Disabling IRQ #18" issue back in the day which would grind things to a snail's pace, and I believe it was related to that particular slot.

The only workaround I found was to add "noirqdebug" to syslinux.cfg, but I don't want to explore that today.
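
A small sketch for checking whether that IRQ problem has come back before reaching for the workaround. It counts the kernel's "Disabling IRQ #N" messages in a syslog file and checks whether a kernel command line already carries noirqdebug. The function names are hypothetical.

```shell
#!/bin/bash
# Count kernel "Disabling IRQ #N" messages per IRQ number in a syslog file.
# Output: one line per IRQ, "<irq> <count>".
count_disabled_irqs() {
    local syslog="$1"
    grep -o 'Disabling IRQ #[0-9]*' "$syslog" | sort | uniq -c | awk '{print $4, $1}'
}

# Report whether a kernel command line string already contains noirqdebug.
has_noirqdebug() {
    case " $1 " in
        *" noirqdebug "*) echo yes ;;
        *)                echo no ;;
    esac
}
```

Typical use would be `count_disabled_irqs /var/log/syslog` followed by `has_noirqdebug "$(cat /proc/cmdline)"`. No output from the first command means the kernel has not disabled any IRQ since the log started.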

 

Powered up, and the drives that appeared powered off are now seen again and the preclears are chugging away.

preclears.jpg

Link to comment

On the 25th I said:

Your drives might be perfect examples, although it is unlikely for two to die at nearly the same time unless they are from the same defective manufacturing batch.  It is far more likely for one to die and lock up a disk controller chipset it shares with another disk.

 

Joe L.

Link to comment

Thx Joe!

 

I'm not out of the woods yet. Now that I'm back on track, I've started getting errors in the syslog:

Oct 28 14:17:01 Tower kernel: Mem-Info:

Oct 28 14:17:01 Tower kernel: DMA per-cpu:

Oct 28 14:17:01 Tower kernel: CPU    0: hi:    0, btch:  1 usd:  0

Oct 28 14:17:01 Tower kernel: CPU    1: hi:    0, btch:  1 usd:  0

Oct 28 14:17:01 Tower kernel: CPU    2: hi:    0, btch:  1 usd:  0

Oct 28 14:17:01 Tower kernel: CPU    3: hi:    0, btch:  1 usd:  0

Oct 28 14:17:01 Tower kernel: CPU    4: hi:    0, btch:  1 usd:  0

Oct 28 14:17:01 Tower kernel: CPU    5: hi:    0, btch:  1 usd:  0

Oct 28 14:17:01 Tower kernel: CPU    6: hi:    0, btch:  1 usd:  0

Oct 28 14:17:01 Tower kernel: CPU    7: hi:    0, btch:  1 usd:  0

Oct 28 14:17:01 Tower kernel: Normal per-cpu:

Oct 28 14:17:01 Tower kernel: CPU    0: hi:  186, btch:  31 usd: 178

Oct 28 14:17:01 Tower kernel: CPU    1: hi:  186, btch:  31 usd: 104

Oct 28 14:17:01 Tower kernel: CPU    2: hi:  186, btch:  31 usd:  27

Oct 28 14:17:01 Tower kernel: CPU    3: hi:  186, btch:  31 usd:  22

Oct 28 14:17:01 Tower kernel: CPU    4: hi:  186, btch:  31 usd: 163

Oct 28 14:17:01 Tower kernel: CPU    5: hi:  186, btch:  31 usd:  41

Oct 28 14:17:01 Tower kernel: CPU    6: hi:  186, btch:  31 usd: 166

Oct 28 14:17:01 Tower kernel: CPU    7: hi:  186, btch:  31 usd:  60

Oct 28 14:17:01 Tower kernel: HighMem per-cpu:

Oct 28 14:17:01 Tower kernel: CPU    0: hi:  186, btch:  31 usd: 183

Oct 28 14:17:01 Tower kernel: CPU    1: hi:  186, btch:  31 usd: 182

Oct 28 14:17:01 Tower kernel: CPU    2: hi:  186, btch:  31 usd: 178

Oct 28 14:17:01 Tower kernel: CPU    3: hi:  186, btch:  31 usd: 165

Oct 28 14:17:01 Tower kernel: CPU    4: hi:  186, btch:  31 usd:  52

Oct 28 14:17:01 Tower kernel: CPU    5: hi:  186, btch:  31 usd: 171

Oct 28 14:17:01 Tower kernel: CPU    6: hi:  186, btch:  31 usd:  50

Oct 28 14:17:01 Tower kernel: CPU    7: hi:  186, btch:  31 usd:  20

Oct 28 14:17:01 Tower kernel: active_anon:8967 inactive_anon:34 isolated_anon:0

Oct 28 14:17:01 Tower kernel:  active_file:41721 inactive_file:636147 isolated_file:0

Oct 28 14:17:01 Tower kernel:  unevictable:44399 dirty:0 writeback:0 unstable:0

Oct 28 14:17:01 Tower kernel:  free:5398 slab_reclaimable:10233 slab_unreclaimable:6374

Oct 28 14:17:01 Tower kernel:  mapped:2245 shmem:52 pagetables:299 bounce:0

Oct 28 14:17:01 Tower kernel: DMA free:3504kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:680kB inactive_file:5988kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15780kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:4260kB slab_unreclaimable:1372kB kernel_stack:80kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no

Oct 28 14:17:01 Tower kernel: lowmem_reserve[]: 0 869 3032 3032

Oct 28 14:17:01 Tower kernel: Normal free:6356kB min:3736kB low:4668kB high:5604kB active_anon:2080kB inactive_anon:4kB active_file:133404kB inactive_file:581028kB unevictable:120kB isolated(anon):0kB isolated(file):128kB present:890008kB mlocked:0kB dirty:0kB writeback:0kB mapped:64kB shmem:8kB slab_reclaimable:36672kB slab_unreclaimable:24124kB kernel_stack:1152kB pagetables:32kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no

Oct 28 14:17:01 Tower kernel: lowmem_reserve[]: 0 0 17303 17303

Oct 28 14:17:01 Tower kernel: HighMem free:11732kB min:512kB low:2836kB high:5160kB active_anon:33788kB inactive_anon:132kB active_file:32800kB inactive_file:1957432kB unevictable:177476kB isolated(anon):0kB isolated(file):0kB present:2214820kB mlocked:0kB dirty:0kB writeback:0kB mapped:8916kB shmem:200kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:1164kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no

Oct 28 14:17:01 Tower kernel: lowmem_reserve[]: 0 0 0 0

Oct 28 14:17:01 Tower kernel: DMA: 112*4kB 2*8kB 0*16kB 1*32kB 1*64kB 5*128kB 3*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 3504kB

Oct 28 14:17:01 Tower kernel: Normal: 1573*4kB 3*8kB 8*16kB 1*32kB 0*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 6604kB

Oct 28 14:17:01 Tower kernel: HighMem: 2217*4kB 40*8kB 11*16kB 14*32kB 10*64kB 8*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 11732kB

Oct 28 14:17:01 Tower kernel: 722321 total pagecache pages

Oct 28 14:17:01 Tower kernel: 0 pages in swap cache

Oct 28 14:17:01 Tower kernel: Swap cache stats: add 0, delete 0, find 0/0

Oct 28 14:17:01 Tower kernel: Free swap  = 0kB

Oct 28 14:17:01 Tower kernel: Total swap = 0kB

Oct 28 14:17:01 Tower kernel: 786416 pages RAM

Oct 28 14:17:01 Tower kernel: 558082 pages HighMem

Oct 28 14:17:01 Tower kernel: 7748 pages reserved

Oct 28 14:17:01 Tower kernel: 394509 pages shared

Oct 28 14:17:01 Tower kernel: 388105 pages non-shared

Oct 28 14:17:02 Tower kernel: swapper/0: page allocation failure: order:2, mode:0x4020

Oct 28 14:17:02 Tower kernel: Pid: 0, comm: swapper/0 Not tainted 3.4.11-unRAID #1 (Errors)

Oct 28 14:17:02 Tower kernel: Call Trace: (Errors)

Oct 28 14:17:02 Tower kernel:  [<c1062eb6>] warn_alloc_failed+0xbd/0xcf (Errors)

Oct 28 14:17:02 Tower kernel:  [<c106372e>] __alloc_pages_nodemask+0x47c/0x4a5 (Errors)

Oct 28 14:17:02 Tower kernel:  [<c10637ad>] __get_free_pages+0xf/0x21 (Errors)

Oct 28 14:17:02 Tower kernel:  [<c10821c8>] __kmalloc+0x2c/0xf0 (Errors)

Oct 28 14:17:02 Tower kernel:  [<c12a0072>] pskb_expand_head+0xc1/0x224 (Errors)

Oct 28 14:17:02 Tower kernel:  [<c12a054b>] __pskb_pull_tail+0x41/0x21e (Errors)

Oct 28 14:17:02 Tower kernel:  [<c12a4421>] ? netif_skb_features+0x84/0x8e (Errors)

Oct 28 14:17:02 Tower kernel:  [<c12a788f>] dev_hard_start_xmit+0x213/0x31b (Errors)

Oct 28 14:17:02 Tower kernel:  [<c12c34f7>] ? ip_finish_output+0x220/0x258 (Errors)

Oct 28 14:17:02 Tower kernel:  [<c12b6184>] sch_direct_xmit+0x54/0x13f (Errors)

Oct 28 14:17:02 Tower kernel:  [<c12a7a95>] dev_queue_xmit+0xfe/0x286 (Errors)

Oct 28 14:17:02 Tower kernel:  [<c12c34f7>] ip_finish_output+0x220/0x258 (Errors)

Oct 28 14:17:02 Tower kernel:  [<c12c35c2>] ip_output+0x93/0x9a (Errors)

Oct 28 14:17:02 Tower kernel:  [<c12c2a5a>] ip_local_out+0x1b/0x1e (Errors)

Oct 28 14:17:02 Tower kernel:  [<c12c2f3e>] ip_queue_xmit+0x2ad/0x2ee (Errors)

Oct 28 14:17:02 Tower kernel:  [<c129e519>] ? __skb_clone+0x22/0xbf (Errors)

Oct 28 14:17:02 Tower kernel:  [<c12d27b6>] tcp_transmit_skb+0x4d3/0x508 (Errors)

Oct 28 14:17:02 Tower kernel:  [<c12d4993>] tcp_write_xmit+0x2f2/0x3ec (Errors)

Oct 28 14:17:02 Tower kernel:  [<c12d4ad1>] __tcp_push_pending_frames+0x18/0x6f (Errors)

Oct 28 14:17:02 Tower kernel:  [<c12d1481>] tcp_rcv_established+0xfa/0x575 (Errors)

Oct 28 14:17:02 Tower kernel:  [<c12d6978>] tcp_v4_do_rcv+0x47/0x13a (Errors)

Oct 28 14:17:02 Tower kernel:  [<c12d6e4e>] tcp_v4_rcv+0x3e3/0x665 (Errors)

Oct 28 14:17:02 Tower kernel:  [<c12bf240>] ip_local_deliver_finish+0xba/0x192 (Errors)

Oct 28 14:17:02 Tower kernel:  [<c12bf379>] ip_local_deliver+0x61/0x66 (Errors)

Oct 28 14:17:02 Tower kernel:  [<c12bef01>] ip_rcv_finish+0x23d/0x253 (Errors)

Oct 28 14:17:02 Tower kernel:  [<c12bf150>] ip_rcv+0x239/0x26f (Errors)

Oct 28 14:17:02 Tower kernel:  [<c12a5189>] __netif_receive_skb+0x223/0x259 (Errors)

Oct 28 14:17:02 Tower kernel:  [<c12a6767>] netif_receive_skb+0x66/0x6c (Errors)

Oct 28 14:17:02 Tower kernel:  [<c12a682b>] napi_skb_finish+0x1e/0x34 (Errors)

Oct 28 14:17:02 Tower kernel:  [<c12a6c98>] napi_gro_receive+0xe7/0xef (Errors)

Oct 28 14:17:02 Tower kernel:  [<f84d3c39>] e1000_receive_skb+0x46/0x4c [e1000] (Errors)

Oct 28 14:17:02 Tower kernel:  [<f84d43ae>] e1000_clean_rx_irq+0x2b7/0x357 [e1000] (Errors)

Oct 28 14:17:02 Tower kernel:  [<f84d7061>] e1000_clean+0x3c/0x198 [e1000] (Errors)

Oct 28 14:17:02 Tower kernel:  [<c12a6d6e>] net_rx_action+0x59/0x12c (Errors)

Oct 28 14:17:02 Tower kernel:  [<c1027056>] __do_softirq+0x6b/0xe5 (Errors)

Oct 28 14:17:02 Tower kernel:  [<c1026feb>] ? irq_enter+0x41/0x41 (Errors)

Oct 28 14:17:02 Tower kernel:  <IRQ>  [<c1026e9f>] ? irq_exit+0x32/0x58

Oct 28 14:17:02 Tower kernel:  [<c1003506>] ? do_IRQ+0x7c/0x90 (Errors)

Oct 28 14:17:02 Tower kernel:  [<c13208e9>] ? common_interrupt+0x29/0x30 (Errors)

Oct 28 14:17:02 Tower kernel:  [<c100820f>] ? default_idle+0x1c/0x2c (Errors)

Oct 28 14:17:02 Tower kernel:  [<c10083e8>] ? cpu_idle+0x4b/0x65 (Errors)

Oct 28 14:17:02 Tower kernel:  [<c130f8f0>] ? rest_init+0x58/0x5a (Errors)

Oct 28 14:17:02 Tower kernel:  [<c14547a3>] ? start_kernel+0x286/0x28b (Errors)

Oct 28 14:17:02 Tower kernel:  [<c14540a0>] ? i386_start_kernel+0xa0/0xa7 (Errors)

Oct 28 14:17:02 Tower kernel: Mem-Info:

Oct 28 14:17:02 Tower kernel: DMA per-cpu:

Oct 28 14:17:02 Tower kernel: CPU    0: hi:    0, btch:  1 usd:  0

Oct 28 14:17:02 Tower kernel: CPU    1: hi:    0, btch:  1 usd:  0

Oct 28 14:17:02 Tower kernel: CPU    2: hi:    0, btch:  1 usd:  0

Oct 28 14:17:02 Tower kernel: CPU    3: hi:    0, btch:  1 usd:  0

Oct 28 14:17:02 Tower kernel: CPU    4: hi:    0, btch:  1 usd:  0

Oct 28 14:17:02 Tower kernel: CPU    5: hi:    0, btch:  1 usd:  0

Oct 28 14:17:02 Tower kernel: CPU    6: hi:    0, btch:  1 usd:  0

Oct 28 14:17:02 Tower kernel: CPU    7: hi:    0, btch:  1 usd:  0

Oct 28 14:17:02 Tower kernel: Normal per-cpu:

Oct 28 14:17:02 Tower kernel: CPU    0: hi:  186, btch:  31 usd:  33

Oct 28 14:17:02 Tower kernel: CPU    1: hi:  186, btch:  31 usd:  36

Oct 28 14:17:02 Tower kernel: CPU    2: hi:  186, btch:  31 usd: 155

Oct 28 14:17:02 Tower kernel: CPU    3: hi:  186, btch:  31 usd:  0

Oct 28 14:17:02 Tower kernel: CPU    4: hi:  186, btch:  31 usd:  0

Oct 28 14:17:02 Tower kernel: CPU    5: hi:  186, btch:  31 usd:  41

Oct 28 14:17:02 Tower kernel: CPU    6: hi:  186, btch:  31 usd:  0

Oct 28 14:17:02 Tower kernel: CPU    7: hi:  186, btch:  31 usd:  26

Oct 28 14:17:02 Tower kernel: HighMem per-cpu:

Oct 28 14:17:02 Tower kernel: CPU    0: hi:  186, btch:  31 usd:  23

Oct 28 14:17:02 Tower kernel: CPU    1: hi:  186, btch:  31 usd:  0

Oct 28 14:17:02 Tower kernel: CPU    2: hi:  186, btch:  31 usd:  31

Oct 28 14:17:02 Tower kernel: CPU    3: hi:  186, btch:  31 usd:  0

Oct 28 14:17:02 Tower kernel: CPU    4: hi:  186, btch:  31 usd:  0

Oct 28 14:17:02 Tower kernel: CPU    5: hi:  186, btch:  31 usd:  0

Oct 28 14:17:02 Tower kernel: CPU    6: hi:  186, btch:  31 usd:  0

Oct 28 14:17:02 Tower kernel: CPU    7: hi:  186, btch:  31 usd:  0

Oct 28 14:17:02 Tower kernel: active_anon:9065 inactive_anon:34 isolated_anon:0

Oct 28 14:17:02 Tower kernel:  active_file:41721 inactive_file:636525 isolated_file:0

Oct 28 14:17:02 Tower kernel:  unevictable:44399 dirty:0 writeback:0 unstable:0

Oct 28 14:17:02 Tower kernel:  free:6277 slab_reclaimable:10233 slab_unreclaimable:6374

Oct 28 14:17:02 Tower kernel:  mapped:2245 shmem:52 pagetables:372 bounce:0

Oct 28 14:17:02 Tower kernel: DMA free:3504kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:680kB inactive_file:5988kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15780kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:4260kB slab_unreclaimable:1372kB kernel_stack:80kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no

Oct 28 14:17:02 Tower kernel: lowmem_reserve[]: 0 869 3032 3032

Oct 28 14:17:02 Tower kernel: Normal free:5336kB min:3736kB low:4668kB high:5604kB active_anon:2080kB inactive_anon:4kB active_file:133404kB inactive_file:584108kB unevictable:120kB isolated(anon):0kB isolated(file):0kB present:890008kB mlocked:0kB dirty:0kB writeback:0kB mapped:64kB shmem:8kB slab_reclaimable:36672kB slab_unreclaimable:24124kB kernel_stack:1152kB pagetables:32kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no

Oct 28 14:17:02 Tower kernel: lowmem_reserve[]: 0 0 17303 17303

Oct 28 14:17:02 Tower kernel: HighMem free:16268kB min:512kB low:2836kB high:5160kB active_anon:34180kB inactive_anon:132kB active_file:32800kB inactive_file:1956004kB unevictable:177476kB isolated(anon):0kB isolated(file):0kB present:2214820kB mlocked:0kB dirty:0kB writeback:0kB mapped:8916kB shmem:200kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:1456kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no

Oct 28 14:17:02 Tower kernel: lowmem_reserve[]: 0 0 0 0

Oct 28 14:17:02 Tower kernel: DMA: 112*4kB 2*8kB 0*16kB 1*32kB 1*64kB 5*128kB 3*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 3504kB

Oct 28 14:17:02 Tower kernel: Normal: 1298*4kB 2*8kB 8*16kB 1*32kB 0*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 5496kB

Oct 28 14:17:02 Tower kernel: HighMem: 3346*4kB 108*8kB 11*16kB 14*32kB 10*64kB 8*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 16792kB

Oct 28 14:17:02 Tower kernel: 722704 total pagecache pages

Oct 28 14:17:02 Tower kernel: 0 pages in swap cache

Oct 28 14:17:02 Tower kernel: Swap cache stats: add 0, delete 0, find 0/0

Oct 28 14:17:02 Tower kernel: Free swap  = 0kB

Oct 28 14:17:02 Tower kernel: Total swap = 0kB

Oct 28 14:17:02 Tower kernel: 786416 pages RAM

Oct 28 14:17:02 Tower kernel: 558082 pages HighMem

Oct 28 14:17:02 Tower kernel: 7748 pages reserved

Oct 28 14:17:02 Tower kernel: 396288 pages shared

Oct 28 14:17:02 Tower kernel: 387973 pages non-shared

Link to comment

Is there a way to address those issues? It has 3 GB assigned and cache dirs is not running.

Typically, it is "low" memory that runs out.

Run fewer processes.

Add more memory.

Tune kernel parameters.

Add a swap file.

 

Type

free -l

to see memory status.
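
The Mem-Info dump pasted earlier in the thread already carries per-zone figures. On a 32-bit kernel like this one, "low" memory is the DMA and Normal zones. The sketch below pulls each zone's free amount and its min/low/high watermarks out of a syslog file so they can be compared at a glance. The function name zone_watermarks is hypothetical.

```shell
#!/bin/bash
# For each memory-zone line of a kernel Mem-Info dump in a syslog file, print:
#   <zone> <free_kB> <min_kB> <low_kB> <high_kB>
zone_watermarks() {
    local syslog="$1"
    sed -n 's/.*kernel: \([A-Za-z0-9]*\) free:\([0-9]*\)kB min:\([0-9]*\)kB low:\([0-9]*\)kB high:\([0-9]*\)kB.*/\1 \2 \3 \4 \5/p' "$syslog"
}
```

For the second dump above, this would print `Normal 5336 3736 4668 5604` for the Normal zone, so free low memory was sitting between the low and high watermarks when the allocation failed.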

 

If it only occurs when preclearing drives, use the -r, -w, and -b options to the preclear script to limit its memory usage.

preclear_disk.sh -r 65536 -w 65536 -b 2048 /dev/sdX
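
When several drives are being precleared, a dry-run helper can print the memory-limited command for each one so it can be reviewed before anything is run. This is only a sketch: the function name preclear_commands is hypothetical, the option values are simply the ones quoted above, and it deliberately prints rather than executes, because a preclear wipes the disk.

```shell
#!/bin/bash
# Print (do not run) the memory-limited preclear command for each device.
# Only whole-disk names such as /dev/sdl are accepted; anything else
# (for example a partition like /dev/sdl1) is skipped with a warning.
preclear_commands() {
    local dev
    for dev in "$@"; do
        case "$dev" in
            /dev/sd[a-z]|/dev/sd[a-z][a-z])
                echo "preclear_disk.sh -r 65536 -w 65536 -b 2048 $dev"
                ;;
            *)
                echo "skipping '$dev': not a whole-disk /dev/sdX name" >&2
                ;;
        esac
    done
}
```

For example, `preclear_commands /dev/sdl` prints the command for the drive from the log. Double-check the device letter against the unRAID device list first, since letters can change after a reboot or a card move.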

 
