Preclear.sh results - Questions about your results? Post them here.



It shows there was probably nothing wrong with the drive at all.

 

If the drive was at fault (with unreadable sectors), you would have had sectors pending re-allocation, or already re-allocated, at the START of the preclear.

 

Without a syslog showing the specific errors, nobody can even tell why the read errors occurred.

Regardless, nobody can say whether the drive is safe to re-use.  The preclear report itself looks perfectly fine.
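A quick way to check those two counters yourself, before and after a preclear, is to pull them out of a saved SMART report. A minimal sketch; the report text here is just a sample standing in for real smartctl output:

```shell
# Save a SMART report and extract the two attributes that matter most here.
# The sample below stands in for real output from:
#   smartctl -a /dev/sdX > /tmp/smart_sdX.txt
cat > /tmp/smart_sample.txt <<'EOF'
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
EOF

# Field 10 of each attribute line is the raw count.
pending=$(awk '/Current_Pending_Sector/ {print $10}' /tmp/smart_sample.txt)
realloc=$(awk '/Reallocated_Sector_Ct/ {print $10}' /tmp/smart_sample.txt)
echo "pending=$pending reallocated=$realloc"
```

Non-zero raw values in either column at the start of a preclear would point at a genuinely failing drive.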

Link to comment

So I had a 2TB disk redball on me; I reseated the drive in the 5-in-3 cage and the cable on the back. I was able to rebuild using the same drive, but after about 30 minutes in the array post-rebuild it redballed again. Anyway, I have the drive replaced, but before I trash the old one I wanted to confirm that it's no longer usable for unRAID. I am still in the process of going through a preclear on the drive, but the syslog kicked this out about 30 minutes after the preclear started. The same error came up before the drive redballed previously. Any input before I retire this drive?

Feb 12 18:54:20 SERVER kernel: sd 17:0:0:0: [sdp]
Feb 12 18:54:20 SERVER kernel: Result: hostbyte=0x00 driverbyte=0x08
Feb 12 18:54:20 SERVER kernel: sd 17:0:0:0: [sdp]
Feb 12 18:54:20 SERVER kernel: Sense Key : 0x3 [current] [descriptor]
Feb 12 18:54:20 SERVER kernel: Descriptor sense data with sense descriptors (in hex):
Feb 12 18:54:20 SERVER kernel: 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Feb 12 18:54:20 SERVER kernel: 07 d3 d5 50
Feb 12 18:54:20 SERVER kernel: sd 17:0:0:0: [sdp]
Feb 12 18:54:20 SERVER kernel: ASC=0x11 ASCQ=0x4
Feb 12 18:54:20 SERVER kernel: sd 17:0:0:0: [sdp] CDB:
Feb 12 18:54:20 SERVER kernel: cdb[0]=0x28: 28 00 07 d3 d5 50 00 00 08 00
Feb 12 18:54:20 SERVER kernel: end_request: I/O error, dev sdp, sector 131323216
Feb 12 18:54:20 SERVER kernel: Buffer I/O error on device sdp, logical block 16415402
Feb 12 18:54:20 SERVER kernel: ata16: EH complete
Feb 12 18:54:23 SERVER kernel: ata16: failed to read log page 10h (errno=-5)
Feb 12 18:54:23 SERVER kernel: ata16.00: exception Emask 0x1 SAct 0x1 SErr 0x0 action 0x6
Feb 12 18:54:23 SERVER kernel: ata16.00: edma_err_cause=00000084 pp_flags=00000003, dev error, EDMA self-disable
Feb 12 18:54:23 SERVER kernel: ata16.00: failed command: READ FPDMA QUEUED
Feb 12 18:54:23 SERVER kernel: ata16.00: cmd 60/08:00:50:d5:d3/00:00:07:00:00/40 tag 0 ncq 4096 in
Feb 12 18:54:23 SERVER kernel: res 41/40:04:50:d5:d3/40:00:07:00:00/40 Emask 0x9 (media error)
Feb 12 18:54:23 SERVER kernel: ata16.00: status: { DRDY ERR }
Feb 12 18:54:23 SERVER kernel: ata16.00: error: { UNC }
Feb 12 18:54:23 SERVER kernel: ata16: hard resetting link
Feb 12 18:54:24 SERVER kernel: ata16: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Feb 12 18:54:24 SERVER kernel: ata16.00: configured for UDMA/133
Feb 12 18:54:24 SERVER kernel: sd 17:0:0:0: [sdp] Unhandled sense code
Feb 12 18:54:24 SERVER kernel: sd 17:0:0:0: [sdp]
Feb 12 18:54:24 SERVER kernel: Result: hostbyte=0x00 driverbyte=0x08
Feb 12 18:54:24 SERVER kernel: sd 17:0:0:0: [sdp]
Feb 12 18:54:24 SERVER kernel: Sense Key : 0x3 [current] [descriptor]
Feb 12 18:54:24 SERVER kernel: Descriptor sense data with sense descriptors (in hex):
Feb 12 18:54:24 SERVER kernel: 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Feb 12 18:54:24 SERVER kernel: 07 d3 d5 50
Feb 12 18:54:24 SERVER kernel: sd 17:0:0:0: [sdp]
Feb 12 18:54:24 SERVER kernel: ASC=0x11 ASCQ=0x4
Feb 12 18:54:24 SERVER kernel: sd 17:0:0:0: [sdp] CDB:
Feb 12 18:54:24 SERVER kernel: cdb[0]=0x28: 28 00 07 d3 d5 50 00 00 08 00
Feb 12 18:54:24 SERVER kernel: end_request: I/O error, dev sdp, sector 131323216
Feb 12 18:54:24 SERVER kernel: Buffer I/O error on device sdp, logical block 16415402
Feb 12 18:54:24 SERVER kernel: ata16: EH complete
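For what it's worth, the failing location can be read straight out of those kernel lines: bytes 2-5 of the READ(10) CDB (28 00 07 d3 d5 50 ...) are the LBA in hex, and the same value appears in the sense data. A quick conversion confirms it matches the numbers the kernel printed:

```shell
# Bytes 2-5 of the CDB "28 00 07 d3 d5 50 00 00 08 00" are the hex LBA.
lba=$(printf '%d' 0x07d3d550)
echo "sector: $lba"                 # 131323216, matching the end_request line

# The buffer layer works in 4096-byte blocks here, i.e. 8 x 512-byte sectors.
echo "logical block: $((lba / 8))"  # 16415402, matching the Buffer I/O error line
```

So both messages refer to the same single unreadable spot on the disk.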

 

 

Link to comment

Hi guys. This is my first time building an unRAID server and I just wanted to make sure that I'm correct in thinking that this preclear report is A-OK. The only things that have me worried are the Spin_Retry_Count and the End-to-End_Error values because they seem to be near the thresholds.

 

Thanks for any advice.

 

preclear_start

Disk: /dev/sdb
smartctl 6.2 2013-07-26 r3841 [i686-linux-3.9.11p-unRAID] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.14 (AF)
Device Model:     ST3000DM001-1CH166
Serial Number:    W1F3J6S6
LU WWN Device Id: 5 000c50 06a5ae556
Firmware Version: CC27
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Feb 14 03:30:24 2014 GMT

==> WARNING: A firmware update for this drive may be available,
see the following Seagate web pages:
http://knowledge.seagate.com/articles/en_US/FAQ/207931en
http://knowledge.seagate.com/articles/en_US/FAQ/223651en

SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
				was never started.
				Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
				without error or no self-test has ever 
				been run.
Total time to complete Offline 
data collection: 		(  105) seconds.
Offline data collection
capabilities: 			 (0x73) SMART execute Offline immediate.
				Auto Offline data collection on/off support.
				Suspend Offline collection upon new
				command.
				No Offline surface scan supported.
				Self-test supported.
				Conveyance Self-test supported.
				Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
				power-saving mode.
				Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
				General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   1) minutes.
Extended self-test routine
recommended polling time: 	 ( 327) minutes.
Conveyance self-test routine
recommended polling time: 	 (   2) minutes.
SCT capabilities: 	       (0x3085)	SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   118   099   006    Pre-fail  Always       -       194593040
  3 Spin_Up_Time            0x0003   095   095   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       12
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   100   253   030    Pre-fail  Always       -       419986
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       44
10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       12
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0 0 0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   071   067   045    Old_age   Always       -       29 (Min/Max 17/33)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       10
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       45
194 Temperature_Celsius     0x0022   029   040   000    Old_age   Always       -       29 (0 16 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       41h+39m+16.926s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       17581599648
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       27037241817

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

 

preclear_finish

Disk: /dev/sdb
smartctl 6.2 2013-07-26 r3841 [i686-linux-3.9.11p-unRAID] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.14 (AF)
Device Model:     ST3000DM001-1CH166
Serial Number:    W1F3J6S6
LU WWN Device Id: 5 000c50 06a5ae556
Firmware Version: CC27
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Feb 15 19:05:38 2014 GMT

==> WARNING: A firmware update for this drive may be available,
see the following Seagate web pages:
http://knowledge.seagate.com/articles/en_US/FAQ/207931en
http://knowledge.seagate.com/articles/en_US/FAQ/223651en

SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
				was never started.
				Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
				without error or no self-test has ever 
				been run.
Total time to complete Offline 
data collection: 		(  105) seconds.
Offline data collection
capabilities: 			 (0x73) SMART execute Offline immediate.
				Auto Offline data collection on/off support.
				Suspend Offline collection upon new
				command.
				No Offline surface scan supported.
				Self-test supported.
				Conveyance Self-test supported.
				Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
				power-saving mode.
				Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
				General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   1) minutes.
Extended self-test routine
recommended polling time: 	 ( 327) minutes.
Conveyance self-test routine
recommended polling time: 	 (   2) minutes.
SCT capabilities: 	       (0x3085)	SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   117   099   006    Pre-fail  Always       -       125433192
  3 Spin_Up_Time            0x0003   095   095   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       12
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   100   253   030    Pre-fail  Always       -       792073
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       84
10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       12
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0 0 0
189 High_Fly_Writes         0x003a   099   099   000    Old_age   Always       -       1
190 Airflow_Temperature_Cel 0x0022   070   067   045    Old_age   Always       -       30 (Min/Max 17/33)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       10
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       46
194 Temperature_Celsius     0x0022   030   040   000    Old_age   Always       -       30 (0 16 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       81h+09m+32.314s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       29302666080
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       56755436299

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

 

preclear_rpt

========================================================================1.14
== invoked as: ./preclear_disk.sh -A -M 3 -c 2 /dev/sdb
== ST3000DM001-1CH166   W1F3J6S6
== Disk /dev/sdb has been successfully precleared
== with a starting sector of 1 
== Ran 2 cycles
==
== Using :Read block size = 8388608 Bytes
== Last Cycle's Pre Read Time  : 5:49:09 (143 MB/s)
== Last Cycle's Zeroing time   : 5:08:45 (161 MB/s)
== Last Cycle's Post Read Time : 11:44:01 (71 MB/s)
== Last Cycle's Total Time     : 16:53:45
==
== Total Elapsed Time 39:35:14
==
== Disk Start Temperature: 29C
==
== Current Disk Temperature: 30C, 
==
============================================================================
** Changed attributes in files: /tmp/smart_start_sdb  /tmp/smart_finish_sdb
                ATTRIBUTE   NEW_VAL OLD_VAL FAILURE_THRESHOLD STATUS      RAW_VALUE
      Raw_Read_Error_Rate =   117     118            6        ok          125433192
         Spin_Retry_Count =   100     100           97        near_thresh 0
         End-to-End_Error =   100     100           99        near_thresh 0
          High_Fly_Writes =    99     100            0        ok          1
  Airflow_Temperature_Cel =    70      71           45        near_thresh 30
      Temperature_Celsius =    30      29            0        ok          30
No SMART attributes are FAILING_NOW

0 sectors were pending re-allocation before the start of the preclear.
0 sectors were pending re-allocation after pre-read in cycle 1 of 2.
0 sectors were pending re-allocation after zero of disk in cycle 1 of 2.
0 sectors were pending re-allocation after post-read in cycle 1 of 2.
0 sectors were pending re-allocation after zero of disk in cycle 2 of 2.
0 sectors are pending re-allocation at the end of the preclear,
    the number of sectors pending re-allocation did not change.
0 sectors had been re-allocated before the start of the preclear.
0 sectors are re-allocated at the end of the preclear,
    the number of sectors re-allocated did not change. 
============================================================================

Link to comment

So I had a 2TB disk redball on me; I reseated the drive in the 5-in-3 cage and the cable on the back. I was able to rebuild using the same drive, but after about 30 minutes in the array post-rebuild it redballed again. Anyway, I have the drive replaced, but before I trash the old one I wanted to confirm that it's no longer usable for unRAID. I am still in the process of going through a preclear on the drive, but the syslog kicked this out about 30 minutes after the preclear started. The same error came up before the drive redballed previously. Any input before I retire this drive?

 

It's hard to conclude too much from just a short syslog excerpt.  It's best to attach the entire syslog, zipped, plus a SMART report for the drive.  There is evidence of a bad sector, plus some other failure, but I'd rather not draw any conclusions without seeing the very first error reported, plus the SMART info.

Link to comment

Hi guys. This is my first time building an unRAID server and I just wanted to make sure that I'm correct in thinking that this preclear report is A-OK. The only things that have me worried are the Spin_Retry_Count and the End-to-End_Error values because they seem to be near the thresholds.

 

The important number for those is the VALUE, which for both is 100, as in 100% perfect; it can't be any more perfect than that.  It's not that they are close to the threshold, but that the manufacturer has factory-set the thresholds to start that close to 100, who knows why.  Your SMART reports for that drive appear to be perfect, nothing to worry about.
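For the curious, the report's near_thresh status is just a distance check between the normalized VALUE and the manufacturer's THRESH, not a health verdict. A rough illustration of that logic in shell (the 25-point margin is my guess for illustration, not the actual preclear_disk.sh source):

```shell
# Classify a SMART attribute by how close its normalized VALUE sits to the
# failure THRESH. The 25-point "near" margin is illustrative only.
status() {
  value=$1; thresh=$2
  if [ "$value" -le "$thresh" ]; then
    echo "FAILING_NOW"
  elif [ $((value - thresh)) -le 25 ]; then
    echo "near_thresh"
  else
    echo "ok"
  fi
}

status 100 97   # Spin_Retry_Count: prints near_thresh, yet the value is perfect
status 100 99   # End-to-End_Error: prints near_thresh, same story
status 118 6    # Raw_Read_Error_Rate: prints ok, plenty of headroom
```

That is why a perfectly healthy attribute can still be flagged near_thresh: the flag measures how high the factory set the threshold, not how degraded the drive is.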

Link to comment

Hi guys. This is my first time building an unRAID server and I just wanted to make sure that I'm correct in thinking that this preclear report is A-OK. The only things that have me worried are the Spin_Retry_Count and the End-to-End_Error values because they seem to be near the thresholds.

 

The important number for those is the VALUE, which for both is 100, as in 100% perfect; it can't be any more perfect than that.  It's not that they are close to the threshold, but that the manufacturer has factory-set the thresholds to start that close to 100, who knows why.  Your SMART reports for that drive appear to be perfect, nothing to worry about.

 

That's good to hear, thanks for the help!

Link to comment
Quote from: nacat78 on February 12, 2014, 05:13:48 PM

 

    So I had a 2TB disk redball on me; I reseated the drive in the 5-in-3 cage and the cable on the back. I was able to rebuild using the same drive, but after about 30 minutes in the array post-rebuild it redballed again. Anyway, I have the drive replaced, but before I trash the old one I wanted to confirm that it's no longer usable for unRAID. I am still in the process of going through a preclear on the drive, but the syslog kicked this out about 30 minutes after the preclear started. The same error came up before the drive redballed previously. Any input before I retire this drive?

 

 

It's hard to conclude too much from just a short syslog excerpt.  It's best to attach the entire syslog, zipped, plus a SMART report for the drive.  There is evidence of a bad sector, plus some other failure, but I'd rather not draw any conclusions without seeing the very first error reported, plus the SMART info.

 

No worries. I put the drive in two other systems on different cards, and it starts to work and then redballs again no matter where it is, so it's definitely the drive. Thanks for the feedback; it's too bad the drive is out of warranty, though. Anyway, thanks again.

Link to comment

Hi guys. This is my first time building an unRAID server and I just wanted to make sure that I'm correct in thinking that this preclear report is A-OK. The only things that have me worried are the Spin_Retry_Count and the End-to-End_Error values because they seem to be near the thresholds.

 

Thanks for any advice.

Your disk is perfectly fine.

 

Some initial values are purposely set close to their associated failure threshold by the manufacturer.

Example: Even a few spin-up failures (each subsequently requiring a retry) would indicate a mechanical issue with the drive. Therefore, the failure threshold is set very close to the starting value for that parameter on that disk.

Link to comment

Hi guys. This is my first time building an unRAID server and I just wanted to make sure that I'm correct in thinking that this preclear report is A-OK. The only things that have me worried are the Spin_Retry_Count and the End-to-End_Error values because they seem to be near the thresholds.

 

Thanks for any advice.

Your disk is perfectly fine.

 

Some initial values are purposely set close to their associated failure threshold by the manufacturer.

Example: Even a few spin-up failures (each subsequently requiring a retry) would indicate a mechanical issue with the drive. Therefore, the failure threshold is set very close to the starting value for that parameter on that disk.

 

That makes sense. Got the drive in my array now, all working fine. Thanks.

Link to comment

My cache drive recently expired so I went out and bought a new one. While pre-clearing the drive, the system spit out a bunch of stuff I can't interpret (I'm not particularly Linux-savvy) and then the system became non-responsive through telnet and the web browser. The system has been doing this non-responsive-at-random crap for a while, but I attributed it to the cache drive being messed up.

 

Anyways, this is what it spit out during the preclear; thoughts?

 

http://pastebin.com/A4UEjje5

Link to comment

Hi, I just added my 4th drive to the array and completed the 3rd round of preclear. Can you please let me know if everything is fine with this disk? I am attaching the preclear finish report. I'm also having one more issue: this disk is not showing in the device list, and hence I'm not able to add it to the array. I had the trial version and recently purchased the Plus revision. These are the few references I see in the syslog for the new HDD (sdd):

 

Feb 18 08:35:47 Manitower kernel: scsi 4:0:0:0: Direct-Access     ATA      WDC WD20EFRX-68E 80.0 PQ: 0 ANSI: 5 (Drive related)
Feb 18 08:35:47 Manitower kernel: sd 4:0:0:0: [sdd] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB) (Drive related)
Feb 18 08:35:47 Manitower kernel: sd 4:0:0:0: [sdd] 4096-byte physical blocks (Drive related)
Feb 18 08:35:47 Manitower kernel: sd 4:0:0:0: [sdd] Write Protect is off (Drive related)
Feb 18 08:35:47 Manitower kernel: sd 4:0:0:0: [sdd] Mode Sense: 00 3a 00 00 (Drive related)
Feb 18 08:35:47 Manitower kernel: sd 4:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA (Drive related)
Feb 18 08:35:47 Manitower kernel:  sdd: sdd1 (Drive related)
Feb 18 08:35:47 Manitower kernel:  sdc: sdc1 (Drive related)
Feb 18 08:35:47 Manitower kernel: sd 2:0:0:0: [sdc] Attached SCSI disk (Drive related)
Feb 18 08:35:47 Manitower kernel: sd 4:0:0:0: [sdd] Attached SCSI disk (Drive related)

 

[EDIT] Attaching the syslog

preclear_finish_WD-WCC4M1568296_2014-02-18.txt

syslog-2014-02-18.zip

Link to comment

Background: A few months ago, I moved my UnRaid system from an older GIGABYTE GA-EP45-UD3R motherboard w/ a Q6600 processor and 4GB ram to an ASRock z77 Extreme4-M w/ Intel Celeron G1610 and 4GB ram, to allow me to expand beyond the 16-drive limit I had at the time. 8 drives were using the SATA ports on the Gigabyte motherboard and the other 8 were on an AOC-SAS2LP-MV8 card. On the ASRock, I'm using 2x AOC-SAS2LP-MV8 cards (1 half-full currently) plus 3 motherboard SATA II ports for 15 drives (14 plus the 1 I'm preclearing; I have 3 more awaiting preclear). UnRaid itself works perfectly. I have no issues with multiple reads/writes going on at the same time (often streaming full Bluray rips). Parity checks happen on a monthly schedule and are running around 10 hours currently. I'm not running anything additional except UnMenu for a couple things (like the parity check).

 

In the past (Gigabyte board, onboard + SAS2LP card ports used), I was able to run preclears without any issue at all. I could run a single drive multiple times using -c and I could also create additional screens and do 2, maybe 3 drives at once. I've been running UnRaid ver 5.0-rc16c without any changes at all since it was running on the Gigabyte board. I just moved the stick to the new board.

 

Problem:

With the new board setup, it seems I am only able to run a preclear without the "-c" option; otherwise my load average rises to around 50, the web GUI becomes unresponsive, the "Clean Shutdown" command-line script doesn't work (though to be fair, I haven't tried it under normal circumstances yet either), and I end up having to power the server down hard by holding the power button. When it boots, I do the 10-hour parity check and everything is good... until I decide to try preclear again with -c :).

 

Here is the preclear command I'm using:

root@Tower:/boot# preclear_disk.sh -r 65536 -w 65536 -b 2000 -A -M 4 -c 3 /dev/sdk

 

For some reason, this seems to cause the load to skyrocket after a while. When it happened last time, I was 12 hours in and 75% through Zeroing the disk on cycle 1 of 3. I make sure the parity check is NOT running while doing a preclear.

 

However, when I use the following command to clear the disk only once, it has never failed (except when the disk itself actually fails).

 

root@Tower:/boot# preclear_disk.sh -r 65536 -w 65536 -b 2000 -A -M 4 /dev/sdk

 

I ran this command once 2 days ago; the preclear worked great and unRAID remained fully responsive. Then yesterday I added the "-c 3" as mentioned above, and after 12 hours the load (5-min avg?) was 49.89 and all I could do was telnet in. This has happened with 3 different drives thus far, so I don't think the drive, port/card, or drive cage is at fault.

 

So my questions are two fold:

  • Does anyone know, off the top of their head, why I'm having these issues? My current CPU isn't as strong as the Q6600 was, but I don't use my server for anything other than UnRaid, so I shouldn't need much. Also, the CPU usage seems fine when the problem is happening.
  • What logs would be helpful in troubleshooting this, and how should I save them once I telnet in? I'm guessing I copy the relevant logs to a folder on /boot, restart, run the parity check, and then go grab them from /boot?

 

 

Thanks!

Link to comment

Here is the preclear command I'm using:

root@Tower:/boot# preclear_disk.sh -r 65536 -w 65536 -b 2000 -A -M 4 -c 3 /dev/sdk

 

For some reason, this seems to cause the load to skyrocket after a while. When it happened last time, I was 12 hours in and 75% through Zeroing the disk on cycle 1 of 3.

 

No good ideas here.  Just to eliminate a possibility, can you try re-running with adjusted email options? The -M 4 option runs some code not used by anything else. Try -M 3 and lower, and perhaps with no email at all.  Are you receiving the emails correctly, both with and without the -c option?

 

Most likely, Joe L will have to help you, when he has time.

Link to comment

Here is the preclear command I'm using:

root@Tower:/boot# preclear_disk.sh -r 65536 -w 65536 -b 2000 -A -M 4 -c 3 /dev/sdk

 

For some reason, this seems to cause the load to skyrocket after a while. When it happened last time, I was 12 hours in and 75% through Zeroing the disk on cycle 1 of 3.

 

No good ideas here.  Just to eliminate a possibility, can you try re-running with adjusted email options? The -M 4 option runs some code not used by anything else. Try -M 3 and lower, and perhaps with no email at all.  Are you receiving the emails correctly, both with and without the -c option?

 

Most likely, Joe L will have to help you, when he has time.

 

Thanks for your response. I do receive the emails until a certain point, and then they just stop coming. It had been more than a few hours between emails during a preclear, so I telneted in to investigate and noticed things were a bit slow and the load average was ~50. Then of course I tried the GUI, but it was dead as well. I think if I had caught it earlier, I could probably have shut down cleanly via the GUI. I'll have to look into monitoring the load somehow. Shutting down hard like that, and the ensuing parity check, sucks  :o.
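For the load-monitoring part, a simple loop over /proc/loadavg works; writing the log to the flash drive means the history survives a hard power-off. A rough sketch (log path, interval, and sample count are arbitrary; on the server you'd loop forever in a screen session and point LOG at /boot):

```shell
# Sample the 1/5/15-minute load averages at a fixed interval and append them
# to a log. Bounded to 3 samples here for illustration; in practice you'd
# loop forever (while true) and use LOG=/boot/loadavg.log so it survives
# a hard reboot.
LOG=/tmp/loadavg.log
: > "$LOG"   # start fresh for this demo
for i in 1 2 3; do
  printf '%s %s\n' "$(date '+%F %T')" "$(cut -d' ' -f1-3 /proc/loadavg)" >> "$LOG"
  sleep 1
done
wc -l < "$LOG"
```

After a lockup you can then see exactly when the load started climbing relative to the preclear's progress emails.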

 

My parity check just finished, so I'll start the following preclear command tonight and check on it in the morning. I took your suggestion of simplifying the command and also removed the -b, -w, and -r options, since I'm doing just this one disk. I'll report back in a day or two unless it exhibits problems sooner. My experience has been that it became unresponsive before the first cycle was complete (towards the end, though), so if we make it to the second cycle, that'd be an improvement.

 

root@Tower:/boot# preclear_disk.sh -l
====================================1.13
Disks not assigned to the unRAID array 
  (potential candidates for clearing) 
========================================
     /dev/sdn = ata-Hitachi_HDS722020ALA330_JK1101B8GN2BBZ
root@Tower:/boot# preclear_disk.sh -A -c 3 /dev/sdn


Pre-Clear unRAID Disk /dev/sdn
################################################################## 1.13
Model Family:     Hitachi Deskstar 7K2000
Device Model:     Hitachi HDS722020ALA330
Serial Number:    JK1101B8GN2BBZ
Firmware Version: JKAOA3EA
User Capacity:    2,000,398,934,016 bytes

Disk /dev/sdn: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdn1              64  3907029167  1953514552    0  Empty
Partition 1 does not end on cylinder boundary.
########################################################################
invoked as  ./preclear_disk.sh -A -c 3 /dev/sdn
########################################################################
(-A option elected, partition will start on sector 64)
Are you absolutely sure you want to clear this drive?
(Answer Yes to continue. Capital 'Y', lower case 'es'): Yes

 

Edit: So far so good. Nearly complete with cycle 2. Since the minimal command seems to work for me, I'll leave it like this until I need to do another preclear, then add the -M option back slowly, starting with the lower levels as suggested. Thanks!

 

Edit #2: About 75% through the third of 3 cycles, the video I happened to be watching began to skip. I telneted into UnRaid and saw the load had spiked to the 4-5 range. The GUI came up, but a bit slowly. I was able to stop the array properly; after the load dropped to ~1.8/2.0 (5-10 min), I started the array again (with the preclear still running) and everything was fine from there on out (I watched another 1-2 hrs of video after re-starting the array). Once the preclear completed, the load declined to its normal ~0.04/0.05 range.

 

Does preclear require a stronger CPU for some reason? Should I expect to be able to run preclear without issue on my hardware? My machine is an ASRock z77 Extreme4-M with a Celeron G1610, 4GB ddr3 memory. The majority of my drives (including the one I have been preclearing) are in drive cages connected to 2x Supermicro SAS2LP-MV8 controllers. I do also have a few slots in the cages plugged directly into the motherboard SATA ports. No idea if it would make a difference if I used one of those.

 

Thanks

Link to comment

unRAID v6 Beta 3, latest preclear script.

(1) 3 TB hard drive installed (motherboard SATA port) for testing; no other hard drives installed, so the array is not started

 

Low memory utilization until writing zeros

 

At Step 2 of 10 - Copying Zeros to remainder of disk to clear it (97% Done)

 

Output of 'free -l' while at the step above (via screen):

            total       used       free     shared    buffers     cached
Mem:       8170244    7931528     238716          0    7207300     426780
Low:       8170244    7931528     238716
High:            0          0          0
-/+ buffers/cache:     297448    7872796
Swap:            0          0          0
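Most of that "used" memory is just page cache from preclear's streaming reads and writes; the kernel drops it on demand, so it isn't really consumed. The headroom that matters is the "-/+ buffers/cache" row, which is simply free + buffers + cached. Checking that arithmetic against the numbers in the output above:

```shell
# Values in KiB, taken from the `free -l` output in the post above
mem_free=238716
buffers=7207300
cached=426780

# Memory actually available to applications once cache is reclaimed;
# this should match the "free" column of the "-/+ buffers/cache" row
available=$((mem_free + buffers + cached))
echo "${available} KiB actually available"   # 7872796, matching the output above
```

So the high "used" figure during the zeroing step is expected behavior, not a leak.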

Link to comment

Does preclear require a stronger CPU for some reason? Should I expect to be able to run preclear without issue on my hardware? My machine is an ASRock z77 Extreme4-M with a Celeron G1610, 4GB ddr3 memory. The majority of my drives (including the one I have been preclearing) are in drive cages connected to 2x Supermicro SAS2LP-MV8 controllers. I do also have a few slots in the cages plugged directly into the motherboard SATA ports. No idea if it would make a difference if I used one of those.

 

I've moved the drive to a bay plugged directly to the motherboard (rather than the SAS2LP-MV8). So far so good. Cycle 2 of 3 and no problems at all thus far. It's even emailing me every step of the way like it should.

 

I also found a couple of other threads where people reported difficulty preclearing on the SuperMicro card(s). One user said the problem went away when he plugged the drive directly into the motherboard. It's looking like that's the key here. I wonder why it doesn't have any issues during normal use or parity checks but preclear causes problems. I guess preclear may be stressing the card harder than a parity check/build does?

 

Edit: Just finished the 2nd of 3 preclear cycles with my original command. It seems to be working just fine plugged into the motherboard. My system even ran its monthly parity check last night and everything's still going strong. It seems like preclear and the SAS2LP-MV8 just don't mix.
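If you want to double-check which controller a given drive actually sits behind before and after moving it, the by-path symlinks that Linux creates encode the controller's PCI address in the name. This is a generic Linux check, not a preclear feature; the BYPATH variable is overridable purely for illustration:

```shell
# List whole-disk devices with the controller path they hang off.
# Drives on the SAS2LP show that card's PCI address in the link name;
# motherboard SATA ports show the chipset controller's address instead.
BYPATH=${BYPATH:-/dev/disk/by-path}
if [ -d "$BYPATH" ]; then
  ls -l "$BYPATH" | grep -v -- '-part'   # drop per-partition entries
else
  echo "no $BYPATH on this system"
fi
```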

Link to comment

Originally (years ago) I purchased many 2 TB drives and precleared them all. Slowly, as I needed them, I added them to the unRAID server, and unRAID always cleared them again; it never picked up the signature set by preclear. I had no idea why at the time and didn't have any more drives to figure it out. A long time passed between preclearing them and adding them in.

 

Well, I just picked up (2) 3 TB drives. I just finished preclearing them and added them to my unRAID 5.0.4 production server, and unRAID is clearing them once again (preclear v1.14); it did not pick up the signature. The only thing I can think of is this: originally, when I first started with unRAID, I would preclear drives on the same server with the array up and running. Stopping the array and adding the newly precleared drive(s) worked (signature detected). With all those other 2 TB drives I purchased to have on hand, and now these latest (2) 3 TB drives,

I have been preclearing them from another server. Once the preclear finished I would shut that server down, then shut down the production server and move the drives to it. Powering up the production server and adding the disks to the array, the signature is not read.

 

So I don't know how this all works, but it almost seems like running a preclear on a different server than the one where the drives will be added to the array doesn't work... My understanding is that the signature written to disk is not hardware specific, so I'm lost as to why it is not working. The desktop I preclear from is an old AMD desktop board (motherboard SATA ports); the production server is an Intel server board (SAS controllers) with a Xeon processor.

 

 

Link to comment

Originally (years ago) I purchased many 2 TB drives and precleared them all. Slowly, as I needed them, I added them to the unRAID server, and unRAID always cleared them again; it never picked up the signature set by preclear. I had no idea why at the time and didn't have any more drives to figure it out. A long time passed between preclearing them and adding them in.

 

Well, I just picked up (2) 3 TB drives. I just finished preclearing them and added them to my unRAID 5.0.4 production server, and unRAID is clearing them once again (preclear v1.14); it did not pick up the signature. The only thing I can think of is this: originally, when I first started with unRAID, I would preclear drives on the same server with the array up and running. Stopping the array and adding the newly precleared drive(s) worked (signature detected). With all those other 2 TB drives I purchased to have on hand, and now these latest (2) 3 TB drives,

I have been preclearing them from another server. Once the preclear finished I would shut that server down, then shut down the production server and move the drives to it. Powering up the production server and adding the disks to the array, the signature is not read.

 

So I don't know how this all works, but it almost seems like running a preclear on a different server than the one where the drives will be added to the array doesn't work... My understanding is that the signature written to disk is not hardware specific, so I'm lost as to why it is not working. The desktop I preclear from is an old AMD desktop board (motherboard SATA ports); the production server is an Intel server board (SAS controllers) with a Xeon processor.

Assigning a drive to a server, even just briefly, will change the preclear signature so that it will no longer be recognized if you un-assign the drive and re-assign it at a later time to the same or a different server.

 

You can use

preclear_disk.sh -t /dev/sdX

to test if a disk has a current/correct preclear signature.
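The signature lives in the drive's first 512-byte sector (the MBR), which is why it travels with the disk between machines. If -t says it's gone, you can eyeball that sector yourself with standard tools; this is read-only and changes nothing, and /dev/sdX is a placeholder for your actual device:

```shell
# Dump the first sector (MBR) of the disk in hex. The partition table
# (and the preclear signature) live here; this command only reads.
DISK=${DISK:-/dev/sdX}
dd if="$DISK" bs=512 count=1 2>/dev/null | hexdump -C
```

On a freshly precleared disk you'd expect the sector to be almost entirely zeros apart from the partition-table area near the end.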

Link to comment
