
jbuszkie

Members
  • Posts

    696
  • Joined

  • Last visited

Everything posted by jbuszkie

  1. Joe, I just got a new unRAID motherboard and CPU, and I'm currently testing it with two new Samsung 1.5T drives. I'm preclearing both and I'm getting nowhere near the speeds you are. If I recall, yours was a PCI-based system.. Mine is a new PCI-e based system. I only have the two drives attached. Syslog says they are running at 3.0Gb/s... But they are both going at a rate of about 25% every 4 hours for the pre-read. Even when I did just one drive I was getting 2GB/min ~ 34MB/s. I would expect a lot better than that! Right now I'm getting about 25.6MB/s. Am I missing something? In the log I see:
     Aug 21 23:17:43 Tower2 preclear_disk-start[14626]: smartctl version 5.38 [i486-slackware-linux-gnu] Copyright (C) 2002-8 Bruce Allen
     Aug 21 23:17:43 Tower2 preclear_disk-start[14626]: Home page is http://smartmontools.sourceforge.net/
     Aug 21 23:17:43 Tower2 preclear_disk-start[14626]:
     Aug 21 23:17:43 Tower2 preclear_disk-start[14626]: === START OF INFORMATION SECTION ===
     Aug 21 23:17:43 Tower2 preclear_disk-start[14626]: Device Model: SAMSUNG HD154UI
     Aug 21 23:17:43 Tower2 preclear_disk-start[14626]: Serial Number: S1Y6J1KS743788
     Aug 21 23:17:43 Tower2 preclear_disk-start[14626]: Firmware Version: 1AG01118
     Aug 21 23:17:43 Tower2 preclear_disk-start[14626]: User Capacity: 1,500,301,910,016 bytes
     Aug 21 23:17:43 Tower2 preclear_disk-start[14626]: Device is: In smartctl database [for details use: -P show]
     Aug 21 23:17:43 Tower2 preclear_disk-start[14626]: ATA Version is: 8
     Aug 21 23:17:43 Tower2 preclear_disk-start[14626]: ATA Standard is: ATA-8-ACS revision 3b
     Aug 21 23:17:43 Tower2 preclear_disk-start[14626]: Local Time is: Fri Aug 21 23:17:43 2009 EDT
     Aug 21 23:17:43 Tower2 preclear_disk-start[14626]:
     Aug 21 23:17:43 Tower2 preclear_disk-start[14626]: ==> WARNING: May need -F samsung or -F samsung2 enabled; see manual for details.
     Aug 21 23:17:43 Tower2 preclear_disk-start[14626]:
     Aug 21 23:17:43 Tower2 preclear_disk-start[14626]: SMART support is: Available - device has SMART capability.
     Aug 21 23:17:43 Tower2 preclear_disk-start[14626]: SMART support is: Enabled
     Aug 21 23:17:43 Tower2 preclear_disk-start[14626]:
     What is this -F samsung? I'm running in AHCI mode, set in the BIOS. Anything else I'm missing?
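     For anyone wanting to cross-check the speeds quoted in this post, here is a rough arithmetic sketch. It only uses numbers from the post (the 1,500,301,910,016-byte capacity, 25% per 4 hours, 2GB per minute); treating "2GB" as 2048 MB and "MB" as 10**6 bytes for the percentage figure are my assumptions, and the function name is just for illustration.

```python
# Rough throughput arithmetic for the pre-read figures quoted in the post.
DISK_BYTES = 1_500_301_910_016  # "User Capacity" reported by smartctl in the post


def mb_per_s(bytes_done: float, seconds: float) -> float:
    """Throughput in decimal megabytes per second (assumes 1 MB = 10**6 bytes)."""
    return bytes_done / seconds / 1e6


# "25% every 4 hours" of the whole disk
quarter_rate = mb_per_s(0.25 * DISK_BYTES, 4 * 3600)

# "2GB/min", assuming the poster counted 2 GB as 2048 MB
two_gb_per_min = 2048 / 60

# Time for one full read pass at the 25%-per-4-hours rate
hours_per_pass = DISK_BYTES / (quarter_rate * 1e6) / 3600

print(f"25% per 4 h  ~ {quarter_rate:.1f} MB/s")
print(f"2 GB per min ~ {two_gb_per_min:.1f} MB/s")
print(f"full pass    ~ {hours_per_pass:.0f} hours")
```

     Both results land close to the numbers in the post (roughly 26 MB/s and 34 MB/s), so the reported rates are at least self-consistent. The sketch says nothing about why the rate is low.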
  2. Each cycle takes just about 12 hours. I'm in no immediate rush, so I just kicked off another cycle. Maybe an interesting addition to the script would be to save the SMART data after every cycle so we can see when the events happened. When I ran the first 3 cycles I couldn't tell whether the events happened in the 1st, 2nd, or 3rd cycle.. Jim
  3. Done! I started a new thread that can be devoted to just questions about the results of the script. Hopefully all the gurus will monitor that thread too! Thanks again, Joe, for a great script! Results discussion thread can be found here
  4. In an effort to keep the Preclear script thread focused on questions about the script itself, I've started another thread here to discuss the results. The preclear thread is peppered with both result questions and questions about the script, and is now 15 pages long! So I'm thinking that a separate thread was warranted. So I'll start it off... If it stays at 5, in my opinion, no problem. If it increases over time, then you might want to use the RMA process. Odds are good it will stabilize. I have one 250Gig drive that has had 100 relocated sectors since the first time I ran smartctl on it. That number has never changed on that disk. I'd say, download the new version of preclear_disk.sh and run another set of test cycles and see if it shows an increase in re-allocated sectors. (the new version stress-tests the drive more. The old one had a bug that prevented the random cylinders from being read in addition to the linear read that was properly occurring) If the number stays at 5, fine, if not another test cycle might be in order. At that point you have all the evidence you need if an RMA is warranted. You might want to start a thread with your preclear experience. It will allow the questions about the output to all be in one spot. Joe L. Ok.. I ran one more full cycle with the new version of the script and I got no reallocated sector changes. Should I run once more or do you think I'm good now and can put the disk into service? So... first 3 cycles - 5 reallocated sectors. 4th cycle - no more reallocated sectors. Jim
  5. After running 3 iterations on my new 1TB green disk I had
     < 5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
     ---
     > 5 Reallocated_Sector_Ct 0x0033 199 199 140 Pre-fail Always - 5
     64c64
     < 196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
     ---
     > 196 Reallocated_Event_Count 0x0032 199 199 000 Old_age Always - 1
     Are 5 reallocated sectors anything to worry about? I was hoping for 0! This was still running on the old version of the script.. Maybe I should try the new version.. (I started my test the morning before Joe posted the new version!) I did start a cycle again on a different controller (one cycle this time - and still the old script). Another thought... Should we start a new thread for preclear disk result questions and keep this thread for questions/comments about the functionality of preclear? Jim
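     As an illustration of how to read a diff like the one in this post, here is a minimal sketch that pulls the before/after raw values out of the `<` and `>` lines. It assumes the usual smartctl attribute row layout (ID, name, flag, value, worst, threshold, type, updated, when-failed, raw value) and that the raw value is a plain integer; the function name and the sample text are mine, built from the lines quoted in the post.

```python
# Sample diff text assembled from the two attribute changes quoted in the post.
SAMPLE_DIFF = """\
< 5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
---
> 5 Reallocated_Sector_Ct 0x0033 199 199 140 Pre-fail Always - 5
64c64
< 196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
---
> 196 Reallocated_Event_Count 0x0032 199 199 000 Old_age Always - 1
"""


def raw_value_changes(diff_text):
    """Return {attribute_name: (raw_before, raw_after)} for changed raw values."""
    before, after = {}, {}
    for line in diff_text.splitlines():
        fields = line.split()
        # Expect: marker, ID, name, flag, value, worst, thresh, type,
        # updated, when_failed, raw  -> at least 11 whitespace-separated fields.
        if len(fields) < 11 or fields[0] not in ("<", ">"):
            continue
        if not (fields[1].isdigit() and fields[10].isdigit()):
            continue  # skip rows that are not simple attribute rows
        target = before if fields[0] == "<" else after
        target[fields[2]] = int(fields[10])
    return {
        name: (before[name], after[name])
        for name in before
        if name in after and before[name] != after[name]
    }


changes = raw_value_changes(SAMPLE_DIFF)
for name, (old, new) in changes.items():
    print(f"{name}: raw {old} -> {new}")
```

     This only reports what changed between the two snapshots. Whether a given change matters is a judgment call for the thread, not something the sketch decides.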
  6. If you do use the mail programs listed in the posts above (from unraid_notify and its mail offshoot) you will have to use the -m [email protected] command line parameter. It will not default to the e-mail address in the unraid_notify.config file. Maybe we can change the mail script to handle "root" as a recipient someday.. I'm still hoping that brianbone will split the package into separate mail and unraid_notify packages! Jim
  7. The thirst for adventure is outweighed by my thirst for more disk space! My cache drive is filling up because there is no room left on the array!
  8. Whew! 56 hours of testing later, my replacement 1.5T disk passed 3 cycles! Thanks again for a great script! It helped catch the first disk being bad!!
     ===========================================================================
     = unRAID server Pre-Clear disk /dev/sdd
     = cycle 3 of 3
     = Disk Pre-Clear-Read completed DONE
     = Step 1 of 10 - Copying zeros to first 2048k bytes DONE
     = Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE
     = Step 3 of 10 - Disk is now cleared from MBR onward. DONE
     = Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4 DONE
     = Step 5 of 10 - Clearing MBR code area DONE
     = Step 6 of 10 - Setting MBR signature bytes DONE
     = Step 7 of 10 - Setting partition 1 to precleared state DONE
     = Step 8 of 10 - Notifying kernel we changed the partitioning DONE
     = Step 9 of 10 - Creating the /dev/disk/by* entries DONE
     = Step 10 of 10 - Testing if the clear has been successful. DONE
     = Disk Post-Clear-Read completed DONE
     Elapsed Time: 56:18:58
     ============================================================================
     ==
     == Disk /dev/sdd has been successfully precleared
     ==
     ============================================================================
     S.M.A.R.T. error count differences detected after pre-clear
     note, some 'raw' values may change, but not be an indication of a problem
     19,20c19,20
     < Offline data collection status: (0x80) Offline data collection activity
     < was never started.
     ---
     > Offline data collection status: (0x84) Offline data collection activity
     > was suspended by an interrupting command from host.
     54c54
     < 1 Raw_Read_Error_Rate 0x002f 100 253 051 Pre-fail Always - 0
     ---
     > 1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
     ============================================================================
  9. So you are saying you are also known as google.com? Awww.. come on! Where's your thirst for adventure! Think of it as a challenge! I dare you to try! Just kidding! Yeah.. Adding something to the display_progress() function might be easy enough.. I'll tinker once I have some time.. but my guess is that I'll be done with the parity upgrade and forget about this until the next time! Oh.. and it is well commented.. good job!
  10. Nice! Let me rephrase.. how hard would it be for you to modify this script to e-mail updates as the test is progressing :-) I've already got unraid notify working.. Maybe I will dig into that.. someday.. For now I'm having to connect back into the session every couple of hours.. :-) hmm.. redirect the output to tee and send the contents to bashmail.. hhmm... all with variables to disable it... Jim
  11. Dumb question here.. I got my replacement drive and it's on its second pass. (18 hours for the first pass!) Since this is my new parity drive, do I really gain anything by running this script? (other than testing the drive) It's going to have to write all the parity anyway when I assign it, correct? Oh.. And how hard would it be to modify this script to e-mail me updates as the test is progressing? Jim
  12. Yeah... It's WD not Maxtor like I said in my original post.. My replacement drive should arrive today! Jim
  13. Oh... and btw... Thanks for the excellent script!!! It probably saved me a lot of headaches!
  14. Any tests you want me to try before I RMA the drive? It happily went through the pre-read and the zeroing out without reporting an error. I imagine it would have shown some problem when it did the smart diff, right?
  15. Here is my hdparm info and smart test info. Interestingly.. After the power cycle my disk 2 was "missing" I power cycled again and it came back? Now I just have to see if disk 2 is on the same controller as my new disk.. /dev/sdd: ATA device, with non-removable media Model Number: WDC WD15EADS-00H7B0 Serial Number: WD-WCAUP0018631 Firmware Revision: 05.00K05 Transport: Serial, SATA 1.0a, SATA II Extensions, SATA Rev 2.5 Standards: Supported: 8 7 6 5 Likely used: 8 Configuration: Logical max current cylinders 16383 16383 heads 16 16 sectors/track 63 63 -- CHS current addressable sectors: 16514064 LBA user addressable sectors: 268435455 LBA48 user addressable sectors: 2930277168 device size with M = 1024*1024: 1430799 MBytes device size with M = 1000*1000: 1500301 MBytes (1500 GB) Capabilities: LBA, IORDY(can be disabled) Queue depth: 32 Standby timer values: spec'd by Standard, with device specific minimum R/W multiple sector transfer: Max = 16 Current = 16 Recommended acoustic management value: 128, current value: 254 DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5 udma6 Cycle time: min=120ns recommended=120ns PIO: pio0 pio1 pio2 pio3 pio4 Cycle time: no flow control=120ns IORDY flow control=120ns Commands/features: Enabled Supported: * SMART feature set Security Mode feature set * Power Management feature set * Write cache * Look-ahead * Host Protected Area feature set * WRITE_BUFFER command * READ_BUFFER command * NOP cmd * DOWNLOAD_MICROCODE Power-Up In Standby feature set * SET_FEATURES required to spinup after power up SET_MAX security extension Automatic Acoustic Management feature set * 48-bit Address feature set * Device Configuration Overlay feature set * Mandatory FLUSH_CACHE * FLUSH_CACHE_EXT * SMART error logging * SMART self-test * General Purpose Logging feature set * 64-bit World wide name * {READ,WRITE}_DMA_EXT_GPL commands * Segmented DOWNLOAD_MICROCODE * SATA-I signaling speed (1.5Gb/s) * SATA-II signaling speed (3.0Gb/s) * 
Native Command Queueing (NCQ) * Host-initiated interface power management * Phy event counters DMA Setup Auto-Activate optimization * Software settings preservation * SMART Command Transport (SCT) feature set * SCT Long Sector Access (AC1) * SCT LBA Segment Access (AC2) * SCT Error Recovery Control (AC3) * SCT Features Control (AC4) * SCT Data Tables (AC5) unknown 206[12] (vendor specific) unknown 206[13] (vendor specific) Security: Master password revision code = 65534 supported not enabled not locked not frozen not expired: security count supported: enhanced erase 412min for SECURITY ERASE UNIT. 412min for ENHANCED SECURITY ERASE UNIT. Logical Unit WWN Device Identifier: 50014ee2ad0035dd NAA : 5 IEEE OUI : 14ee Unique ID : 2ad0035dd Checksum: correct root@Tower:~# root@Tower:~# root@Tower:~# smartctl -a -d ata /dev/sdd smartctl version 5.38 [i486-slackware-linux-gnu] Copyright (C) 2002-8 Bruce Allen Home page is http://smartmontools.sourceforge.net/ === START OF INFORMATION SECTION === Device Model: WDC WD15EADS-00H7B0 Serial Number: WD-WCAUP0018631 Firmware Version: 05.00K05 User Capacity: 1,500,301,910,016 bytes Device is: Not in smartctl database [for details use: -P showall] ATA Version is: 8 ATA Standard is: Exact ATA specification draft version not indicated Local Time is: Thu May 14 13:33:22 2009 GMT+5 SMART support is: Available - device has SMART capability. SMART support is: Enabled === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED General SMART Values: Offline data collection status: (0x84) Offline data collection activity was suspended by an interrupting command from host. Auto Offline Data Collection: Enabled. Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run. Total time to complete Offline data collection: (40500) seconds. Offline data collection capabilities: (0x7b) SMART execute Offline immediate. 
Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported. SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer. Error logging capability: (0x01) Error logging supported. General Purpose Logging supported. Short self-test routine recommended polling time: ( 2) minutes. Extended self-test routine recommended polling time: ( 255) minutes. Conveyance self-test routine recommended polling time: ( 5) minutes. SCT capabilities: (0x303f) SCT Status supported. SCT Feature Control supported. SCT Data Table supported. SMART Attributes Data Structure revision number: 16 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE 1 Raw_Read_Error_Rate 0x002f 139 139 051 Pre-fail Always - 14844 3 Spin_Up_Time 0x0027 100 253 021 Pre-fail Always - 0 4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 9 5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0 7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0 9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 0 10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0 11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0 12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 7 192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 1 193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 9 194 Temperature_Celsius 0x0022 127 121 000 Old_age Always - 25 196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0 197 Current_Pending_Sector 0x0032 195 195 000 Old_age Always - 1311 198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0 199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0 200 Multi_Zone_Error_Rate 0x0008 100 253 000 Old_age Offline - 0 SMART Error Log Version: 1 No Errors Logged SMART 
Self-test log structure revision number 1 No self-tests have been logged. [To run self-tests, use: smartctl -t] SMART Selective self-test log data structure revision number 1 SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS 1 0 0 Not_testing 2 0 0 Not_testing 3 0 0 Not_testing 4 0 0 Not_testing 5 0 0 Not_testing Selective self-test flags (0x0): After scanning selected spans, do NOT read-scan remainder of disk. If Selective self-test is pending on power-up, resume after 0 minute delay. Now do I have to be concerned about the "Current_Pending_Sector " Number? Seems like that should be 0 for a new good drive.. Could a bad controller have any effect on that number?
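     To make the question above easier to check on other drives, here is a small sketch that scans a smartctl attribute table for non-zero raw values on a handful of attributes. The watch list (IDs 5, 196, 197, 198) is my own assumption of a commonly watched subset, not an official list, and the sample rows are copied from the table in this post. The function name is hypothetical.

```python
# Assumption: a commonly watched subset of attributes, not an official list.
WATCHED_IDS = {5, 196, 197, 198}

# Rows copied from the attribute table in the post, one attribute per line.
SAMPLE_TABLE = """\
1 Raw_Read_Error_Rate 0x002f 139 139 051 Pre-fail Always - 14844
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 195 195 000 Old_age Always - 1311
198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
"""


def nonzero_watched(table_text):
    """Return {attribute_name: raw_value} for watched attributes with raw > 0."""
    flagged = {}
    for line in table_text.splitlines():
        fields = line.split()
        # Expect: ID, name, flag, value, worst, thresh, type, updated,
        # when_failed, raw  -> at least 10 whitespace-separated fields.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id, name, raw = int(fields[0]), fields[1], fields[9]
        if attr_id in WATCHED_IDS and raw.isdigit() and int(raw) > 0:
            flagged[name] = int(raw)
    return flagged


flagged = nonzero_watched(SAMPLE_TABLE)
print(flagged)
```

     On the rows from this post, only Current_Pending_Sector (raw 1311) is flagged. The sketch does not say what caused it, controller or drive.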
  16. After the reboot it seems to be happily running. I'm at 98% of the pre-read... Ok.. Change that... I guess it isn't happy... I'm getting more of those errors on the zeroing.. Here is a snippet of the log. The snippet starts close to the end of the pre-read and captures the start of the zeroing.. I'm in the middle of a power cycle (remotely, so I may not get it back). I'll have to look to see if the very first pass of this test behaved well... I'll post the SMART results when the computer reboots..
  17. I am running this on my new 1.5TB Maxtor Green. It did one full pass that seemed to work. On the second pass, it didn't finish. I looked in /tmp for the SMART logs, but they appear to have been deleted. Where should I look to see what happened? I remember seeing something about not being able to do something with the MBR. My putty session got killed when I rebooted. I'm going to try again and see if it was just some sort of fluke. My syslog file is 500Meg! With a whole ton of these:
     May 14 04:48:23 Tower kernel: end_request: I/O error, dev sdd, sector 2930137216
     May 14 04:48:23 Tower kernel: sd 6:0:0:0: [sdd] Result: hostbyte=0x04 driverbyte=0x00
     And I see this
     May 14 04:48:22 Tower kernel: sd 6:0:0:0: [sdd] Result: hostbyte=0x04 driverbyte=0x00
     May 14 04:48:22 Tower kernel: end_request: I/O error, dev sdd, sector 2930043648
     May 14 04:48:22 Tower kernel: __ratelimit: 78016 callbacks suppressed
     May 14 04:48:22 Tower kernel: Buffer I/O error on device sdd, logical block 366255456
     May 14 04:48:22 Tower kernel: lost page write due to I/O error on sdd
     May 14 04:48:22 Tower kernel: Buffer I/O error on device sdd, logical block 366255457
     May 14 04:48:22 Tower kernel: lost page write due to I/O error on sdd
     May 14 04:48:22 Tower kernel: Buffer I/O error on device sdd, logical block 366255458
     May 14 04:48:22 Tower kernel: lost page write due to I/O error on sdd
     May 14 04:48:22 Tower kernel: Buffer I/O error on device sdd, logical block 366255459
     May 14 04:48:22 Tower kernel: lost page write due to I/O error on sdd
     May 14 04:48:22 Tower kernel: Buffer I/O error on device sdd, logical block 366255460
     May 14 04:48:22 Tower kernel: lost page write due to I/O error on sdd
     May 14 04:48:22 Tower kernel: Buffer I/O error on device sdd, logical block 366255461
     May 14 04:48:22 Tower kernel: lost page write due to I/O error on sdd
     May 14 04:48:22 Tower kernel: Buffer I/O error on device sdd, logical block 366255462
     May 14 04:48:22 Tower kernel: lost page write due to I/O error on sdd
     May 14 04:48:22 Tower kernel: Buffer I/O error on device sdd, logical block 366255463
     May 14 04:48:22 Tower kernel: lost page write due to I/O error on sdd
     May 14 04:48:22 Tower kernel: Buffer I/O error on device sdd, logical block 366255464
     May 14 04:48:22 Tower kernel: lost page write due to I/O error on sdd
     May 14 04:48:22 Tower kernel: Buffer I/O error on device sdd, logical block 366255465
     May 14 04:48:22 Tower kernel: lost page write due to I/O error on sdd
     May 14 04:48:22 Tower kernel: sd 6:0:0:0: [sdd] Result: hostbyte=0x04 driverbyte=0x00
     May 14 04:48:22 Tower kernel: end_request: I/O error, dev sdd, sector 2930044672
     as well. Is there anything I should look for? Jim
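     With a 500Meg syslog it can help to boil the errors down before reading them. Below is a rough sketch that extracts the failing sector and logical block numbers from lines like the ones in this post. The regular expressions are written against the exact message wording quoted above; the factor of 8 between logical blocks and sectors is an assumption (4096-byte buffer blocks over 512-byte sectors) that happens to match the numbers in the log, and the helper names are mine.

```python
import re

# Patterns written against the kernel messages quoted in the post.
SECTOR_RE = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")
BLOCK_RE = re.compile(r"Buffer I/O error on device (\w+), logical block (\d+)")

# Assumption: 4096-byte buffer blocks on top of 512-byte sectors.
SECTORS_PER_BLOCK = 8

# A few lines copied from the log in the post.
SAMPLE_LOG = """\
May 14 04:48:22 Tower kernel: end_request: I/O error, dev sdd, sector 2930043648
May 14 04:48:22 Tower kernel: Buffer I/O error on device sdd, logical block 366255456
May 14 04:48:22 Tower kernel: lost page write due to I/O error on sdd
May 14 04:48:22 Tower kernel: Buffer I/O error on device sdd, logical block 366255457
"""


def summarize(log_text):
    """Return (failed_sectors, failed_blocks) found in the log text, in order."""
    sectors = [int(m.group(2)) for m in SECTOR_RE.finditer(log_text)]
    blocks = [int(m.group(2)) for m in BLOCK_RE.finditer(log_text)]
    return sectors, blocks


def block_to_sector(block):
    """Convert a logical block number to its first sector (under the assumption above)."""
    return block * SECTORS_PER_BLOCK


sectors, blocks = summarize(SAMPLE_LOG)
print("failed sectors:", sectors)
print("first block as sector:", block_to_sector(blocks[0]))
```

     Under that assumption, logical block 366255456 starts at sector 2930043648, the same sector named in the end_request line just before it, so the two messages appear to describe the same failed write.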