Joe L. Posted February 7, 2014
It shows there was probably nothing wrong with the drive at all. If the drive was at fault (with unreadable sectors), you would have had sectors pending re-allocation, or already re-allocated, at the START of the preclear. Without a syslog showing the specific errors, nobody can even tell why the read errors occurred. Regardless, nobody can guess if the drive is safe to re-use. The preclear report looks perfectly fine.
nacat78 Posted February 13, 2014
So I had a 2TB disk redball on me. I reseated the drive in the 5-in-3 cage and the cable on the back. I was able to rebuild using the same drive, but after about 30 minutes on the array after the rebuild it redballed again. Anyway, I have the drive replaced, but before I trash the old one, I wanted to confirm that it's no longer usable for unRAID. I am still in the process of going through a preclear on the drive, but syslog kicked this out about 30 minutes after the preclear started. The same error came up before the drive redballed before. Any input before I retire this drive?
Feb 12 18:54:20 SERVER kernel: sd 17:0:0:0: [sdp]
Feb 12 18:54:20 SERVER kernel: Result: hostbyte=0x00 driverbyte=0x08
Feb 12 18:54:20 SERVER kernel: sd 17:0:0:0: [sdp]
Feb 12 18:54:20 SERVER kernel: Sense Key : 0x3 [current] [descriptor]
Feb 12 18:54:20 SERVER kernel: Descriptor sense data with sense descriptors (in hex):
Feb 12 18:54:20 SERVER kernel: 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Feb 12 18:54:20 SERVER kernel: 07 d3 d5 50
Feb 12 18:54:20 SERVER kernel: sd 17:0:0:0: [sdp]
Feb 12 18:54:20 SERVER kernel: ASC=0x11 ASCQ=0x4
Feb 12 18:54:20 SERVER kernel: sd 17:0:0:0: [sdp] CDB:
Feb 12 18:54:20 SERVER kernel: cdb[0]=0x28: 28 00 07 d3 d5 50 00 00 08 00
Feb 12 18:54:20 SERVER kernel: end_request: I/O error, dev sdp, sector 131323216
Feb 12 18:54:20 SERVER kernel: Buffer I/O error on device sdp, logical block 16415402
Feb 12 18:54:20 SERVER kernel: ata16: EH complete
Feb 12 18:54:23 SERVER kernel: ata16: failed to read log page 10h (errno=-5)
Feb 12 18:54:23 SERVER kernel: ata16.00: exception Emask 0x1 SAct 0x1 SErr 0x0 action 0x6
Feb 12 18:54:23 SERVER kernel: ata16.00: edma_err_cause=00000084 pp_flags=00000003, dev error, EDMA self-disable
Feb 12 18:54:23 SERVER kernel: ata16.00: failed command: READ FPDMA QUEUED
Feb 12 18:54:23 SERVER kernel: ata16.00: cmd 60/08:00:50:d5:d3/00:00:07:00:00/40 tag 0 ncq 4096 in
Feb 12 18:54:23 SERVER kernel: res 41/40:04:50:d5:d3/40:00:07:00:00/40 Emask 0x9 (media error)
Feb 12 18:54:23 SERVER kernel: ata16.00: status: { DRDY ERR }
Feb 12 18:54:23 SERVER kernel: ata16.00: error: { UNC }
Feb 12 18:54:23 SERVER kernel: ata16: hard resetting link
Feb 12 18:54:24 SERVER kernel: ata16: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Feb 12 18:54:24 SERVER kernel: ata16.00: configured for UDMA/133
Feb 12 18:54:24 SERVER kernel: sd 17:0:0:0: [sdp] Unhandled sense code
Feb 12 18:54:24 SERVER kernel: sd 17:0:0:0: [sdp]
Feb 12 18:54:24 SERVER kernel: Result: hostbyte=0x00 driverbyte=0x08
Feb 12 18:54:24 SERVER kernel: sd 17:0:0:0: [sdp]
Feb 12 18:54:24 SERVER kernel: Sense Key : 0x3 [current] [descriptor]
Feb 12 18:54:24 SERVER kernel: Descriptor sense data with sense descriptors (in hex):
Feb 12 18:54:24 SERVER kernel: 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Feb 12 18:54:24 SERVER kernel: 07 d3 d5 50
Feb 12 18:54:24 SERVER kernel: sd 17:0:0:0: [sdp]
Feb 12 18:54:24 SERVER kernel: ASC=0x11 ASCQ=0x4
Feb 12 18:54:24 SERVER kernel: sd 17:0:0:0: [sdp] CDB:
Feb 12 18:54:24 SERVER kernel: cdb[0]=0x28: 28 00 07 d3 d5 50 00 00 08 00
Feb 12 18:54:24 SERVER kernel: end_request: I/O error, dev sdp, sector 131323216
Feb 12 18:54:24 SERVER kernel: Buffer I/O error on device sdp, logical block 16415402
Feb 12 18:54:24 SERVER kernel: ata16: EH complete
Moussa Posted February 16, 2014
Hi guys. This is my first time building an unRAID server and I just wanted to make sure that I'm correct in thinking that this preclear report is A-OK. The only things that have me worried are the Spin_Retry_Count and the End-to-End_Error values, because they seem to be near the thresholds. Thanks for any advice.

preclear_start
Disk: /dev/sdb
smartctl 6.2 2013-07-26 r3841 [i686-linux-3.9.11p-unRAID] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 7200.14 (AF)
Device Model: ST3000DM001-1CH166
Serial Number: W1F3J6S6
LU WWN Device Id: 5 000c50 06a5ae556
Firmware Version: CC27
User Capacity: 3,000,592,982,016 bytes [3.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 7200 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Fri Feb 14 03:30:24 2014 GMT
==> WARNING: A firmware update for this drive may be available, see the following Seagate web pages:
http://knowledge.seagate.com/articles/en_US/FAQ/207931en
http://knowledge.seagate.com/articles/en_US/FAQ/223651en
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity was never started. Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: ( 105) seconds.
Offline data collection capabilities: (0x73) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. No Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported. General Purpose Logging supported.
Short self-test routine recommended polling time: ( 1) minutes.
Extended self-test routine recommended polling time: ( 327) minutes.
Conveyance self-test routine recommended polling time: ( 2) minutes.
SCT capabilities: (0x3085) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   118   099   006    Pre-fail  Always       -       194593040
  3 Spin_Up_Time            0x0003   095   095   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       12
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   100   253   030    Pre-fail  Always       -       419986
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       44
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       12
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0 0 0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   071   067   045    Old_age   Always       -       29 (Min/Max 17/33)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       10
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       45
194 Temperature_Celsius     0x0022   029   040   000    Old_age   Always       -       29 (0 16 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       41h+39m+16.926s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       17581599648
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       27037241817

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
   1       0       0 Not_testing
   2       0       0 Not_testing
   3       0       0 Not_testing
   4       0       0 Not_testing
   5       0       0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

preclear_finish
Disk: /dev/sdb
smartctl 6.2 2013-07-26 r3841 [i686-linux-3.9.11p-unRAID] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 7200.14 (AF)
Device Model: ST3000DM001-1CH166
Serial Number: W1F3J6S6
LU WWN Device Id: 5 000c50 06a5ae556
Firmware Version: CC27
User Capacity: 3,000,592,982,016 bytes [3.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 7200 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sat Feb 15 19:05:38 2014 GMT
==> WARNING: A firmware update for this drive may be available, see the following Seagate web pages:
http://knowledge.seagate.com/articles/en_US/FAQ/207931en
http://knowledge.seagate.com/articles/en_US/FAQ/223651en
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity was never started. Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: ( 105) seconds.
Offline data collection capabilities: (0x73) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. No Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported. General Purpose Logging supported.
Short self-test routine recommended polling time: ( 1) minutes.
Extended self-test routine recommended polling time: ( 327) minutes.
Conveyance self-test routine recommended polling time: ( 2) minutes.
SCT capabilities: (0x3085) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   117   099   006    Pre-fail  Always       -       125433192
  3 Spin_Up_Time            0x0003   095   095   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       12
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   100   253   030    Pre-fail  Always       -       792073
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       84
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       12
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0 0 0
189 High_Fly_Writes         0x003a   099   099   000    Old_age   Always       -       1
190 Airflow_Temperature_Cel 0x0022   070   067   045    Old_age   Always       -       30 (Min/Max 17/33)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       10
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       46
194 Temperature_Celsius     0x0022   030   040   000    Old_age   Always       -       30 (0 16 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       81h+09m+32.314s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       29302666080
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       56755436299

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
   1       0       0 Not_testing
   2       0       0 Not_testing
   3       0       0 Not_testing
   4       0       0 Not_testing
   5       0       0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

preclear_rpt
========================================================================1.14
== invoked as: ./preclear_disk.sh -A -M 3 -c 2 /dev/sdb
== ST3000DM001-1CH166 W1F3J6S6
== Disk /dev/sdb has been successfully precleared
== with a starting sector of 1
== Ran 2 cycles
==
== Using :Read block size = 8388608 Bytes
== Last Cycle's Pre Read Time : 5:49:09 (143 MB/s)
== Last Cycle's Zeroing time : 5:08:45 (161 MB/s)
== Last Cycle's Post Read Time : 11:44:01 (71 MB/s)
== Last Cycle's Total Time : 16:53:45
==
== Total Elapsed Time 39:35:14
==
== Disk Start Temperature: 29C
==
== Current Disk Temperature: 30C,
==
============================================================================
** Changed attributes in files: /tmp/smart_start_sdb /tmp/smart_finish_sdb
ATTRIBUTE                 NEW_VAL OLD_VAL FAILURE_THRESHOLD STATUS      RAW_VALUE
Raw_Read_Error_Rate     = 117     118     6                 ok          125433192
Spin_Retry_Count        = 100     100     97                near_thresh 0
End-to-End_Error        = 100     100     99                near_thresh 0
High_Fly_Writes         = 99      100     0                 ok          1
Airflow_Temperature_Cel = 70      71      45                near_thresh 30
Temperature_Celsius     = 30      29      0                 ok          30
No SMART attributes are FAILING_NOW

0 sectors were pending re-allocation before the start of the preclear.
0 sectors were pending re-allocation after pre-read in cycle 1 of 2.
0 sectors were pending re-allocation after zero of disk in cycle 1 of 2.
0 sectors were pending re-allocation after post-read in cycle 1 of 2.
0 sectors were pending re-allocation after zero of disk in cycle 2 of 2.
0 sectors are pending re-allocation at the end of the preclear,
the number of sectors pending re-allocation did not change.
0 sectors had been re-allocated before the start of the preclear.
0 sectors are re-allocated at the end of the preclear,
the number of sectors re-allocated did not change.
============================================================================
RobJ Posted February 16, 2014
Quote from: nacat78
So I had a 2TB disk redball on me. I reseated the drive in the 5-in-3 cage and the cable on the back. I was able to rebuild using the same drive, but after about 30 minutes on the array after the rebuild it redballed again. Anyway, I have the drive replaced, but before I trash the old one, I wanted to confirm that it's no longer usable for unRAID. I am still in the process of going through a preclear on the drive, but syslog kicked this out about 30 minutes after the preclear started. The same error came up before the drive redballed before. Any input before I retire this drive?
It's hard to conclude too much from just a short syslog excerpt. It's best to attach the entire syslog, zipped, plus a SMART report for the drive. There is evidence of a bad sector, plus some other failure, but I'd rather not draw any conclusions without seeing the very first error reported, plus the SMART info.
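For anyone reading the excerpt nacat78 posted, here is a minimal sketch of how the failing sector number can be recovered from the logged command block. It assumes the `cdb[0]=0x28` line is a standard 10-byte SCSI READ(10) command, where bytes 2 to 5 hold the big-endian LBA and bytes 7 to 8 the block count. The function name `read10_decode` is made up for this illustration and is not part of any unRAID tool.

```shell
# Decode the LBA and block count from a READ(10) CDB given as ten hex bytes.
read10_decode() {
    # Split the hex string into positional parameters $1..$10.
    set -- $1
    if [ "$#" -ne 10 ] || [ "$1" != "28" ]; then
        echo "not a READ(10) command block" >&2
        return 1
    fi
    # CDB bytes 2-5 (here $3..$6) are the LBA, bytes 7-8 ($8 and $9) the count.
    lba=$(( 0x$3$4$5$6 ))
    count=$(( 0x$8$9 ))
    echo "$lba $count"
}

# The command block from the syslog excerpt:
read10_decode "28 00 07 d3 d5 50 00 00 08 00"   # prints: 131323216 8
```

The decoded LBA matches the `sector 131323216` in the `end_request` line, and dividing by the 8 sectors in a 4096-byte buffer gives the `logical block 16415402` reported right after it. Both bursts in the excerpt point at the same sector, which fits the "bad sector" reading, though it says nothing about the other failure RobJ mentions.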
RobJ Posted February 16, 2014
Quote from: Moussa
Hi guys. This is my first time building an unRAID server and I just wanted to make sure that I'm correct in thinking that this preclear report is A-OK. The only things that have me worried are the Spin_Retry_Count and the End-to-End_Error values, because they seem to be near the thresholds.
The important number for those is the VALUE, which for both is 100, as in 100% perfect; it can't be any more perfect than that. It's not that they are close to the threshold, but that the manufacturer has factory-set the thresholds so close to 100, who knows why. Your SMART reports for that drive appear to be perfect, nothing to worry about.
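To make the VALUE versus THRESH comparison concrete, here is a small sketch. It relies on the rule documented for smartctl that an attribute counts as failed once its normalized VALUE is less than or equal to THRESH. The helper names `to_dec` and `smart_status` are invented for this example, and the sketch does not try to reproduce whatever rule preclear uses to print `near_thresh`.

```shell
# Strip leading zeros so "097" is treated as decimal 97, not as octal.
to_dec() {
    echo "$1" | sed 's/^0*\([0-9]\)/\1/'
}

# Report whether a SMART attribute is failing: VALUE at or below THRESH.
smart_status() {
    name=$1
    value=$(to_dec "$2")
    thresh=$(to_dec "$3")
    if [ "$value" -le "$thresh" ]; then
        echo "$name: FAILING (value $value, threshold $thresh)"
    else
        echo "$name: ok (value $value, threshold $thresh, headroom $((value - thresh)))"
    fi
}

# VALUE and THRESH columns taken from the preclear_finish report above.
smart_status Spin_Retry_Count 100 097
smart_status End-to-End_Error 100 099
smart_status Raw_Read_Error_Rate 117 006
```

Both attributes Moussa asked about still sit at their starting VALUE of 100. The small headroom only reflects where the manufacturer placed the threshold.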
Moussa Posted February 16, 2014
Quote from: Moussa
Hi guys. This is my first time building an unRAID server and I just wanted to make sure that I'm correct in thinking that this preclear report is A-OK. The only things that have me worried are the Spin_Retry_Count and the End-to-End_Error values, because they seem to be near the thresholds.
Quote from: RobJ
The important number for those is the VALUE, which for both is 100, as in 100% perfect; it can't be any more perfect than that. It's not that they are close to the threshold, but that the manufacturer has factory-set the thresholds so close to 100, who knows why. Your SMART reports for that drive appear to be perfect, nothing to worry about.
That's good to hear, thanks for the help!
nacat78 Posted February 16, 2014
Quote from: nacat78 on February 12, 2014, 05:13:48 PM
So I had a 2TB disk redball on me. I reseated the drive in the 5-in-3 cage and the cable on the back. I was able to rebuild using the same drive, but after about 30 minutes on the array after the rebuild it redballed again. Anyway, I have the drive replaced, but before I trash the old one, I wanted to confirm that it's no longer usable for unRAID. I am still in the process of going through a preclear on the drive, but syslog kicked this out about 30 minutes after the preclear started. The same error came up before the drive redballed before. Any input before I retire this drive?
Quote from: RobJ
It's hard to conclude too much from just a short syslog excerpt. It's best to attach the entire syslog, zipped, plus a SMART report for the drive. There is evidence of a bad sector, plus some other failure, but I'd rather not draw any conclusions without seeing the very first error reported, plus the SMART info.
No worries. I put the drive in two other systems on different cards; it starts to work, then it redballs again no matter where it is. Definitely the drive. Thanks for the feedback. It's too bad the drive is out of warranty, though. Anyway, thanks again.
Joe L. Posted February 16, 2014
Quote from: Moussa
Hi guys. This is my first time building an unRAID server and I just wanted to make sure that I'm correct in thinking that this preclear report is A-OK. The only things that have me worried are the Spin_Retry_Count and the End-to-End_Error values, because they seem to be near the thresholds. Thanks for any advice.
Your disk is perfectly fine. Some initial values are purposely set close to their associated failure threshold by the manufacturer. Example: even a few spin-up failures (each requiring a retry) would indicate a mechanical issue with the drive. Therefore, the failure threshold is very close to the starting value for that parameter on that disk.
Moussa Posted February 16, 2014
Quote from: Moussa
Hi guys. This is my first time building an unRAID server and I just wanted to make sure that I'm correct in thinking that this preclear report is A-OK. The only things that have me worried are the Spin_Retry_Count and the End-to-End_Error values, because they seem to be near the thresholds. Thanks for any advice.
Quote from: Joe L.
Your disk is perfectly fine. Some initial values are purposely set close to their associated failure threshold by the manufacturer. Example: even a few spin-up failures (each requiring a retry) would indicate a mechanical issue with the drive. Therefore, the failure threshold is very close to the starting value for that parameter on that disk.
That makes sense. Got the drive in my array now, all working fine. Thanks.
Croaker Posted February 16, 2014
My cache drive recently expired so I went out and bought a new one. While pre-clearing the drive, the system spit out a bunch of stuff I can't interpret (I'm not particularly Linux-savvy) and then the system became non-responsive through telnet and the web browser. The system has been doing this non-responsive-at-random crap for a while, but I attributed it to the cache drive being messed up. Anyways, this is what it spit out during the preclear; thoughts? http://pastebin.com/A4UEjje5
Kode Posted February 16, 2014
Is the drive connected directly to the motherboard, or through something like an AOC-SASLP-MV8 disk controller?
Croaker Posted February 16, 2014
All drives are connected directly to the board.
manny Posted February 18, 2014
Hi, I just added my 4th drive to the array and completed the 3rd round of preclear. Can you please let me know if everything is fine with this disk? I am attaching the preclear finish report. I also have one more issue: this disk is not showing in the device list, so I am not able to add it to the array. I had the trial version and recently purchased the Plus version. These are the few references I see in syslog for the new HDD (sdd):
Feb 18 08:35:47 Manitower kernel: scsi 4:0:0:0: Direct-Access ATA WDC WD20EFRX-68E 80.0 PQ: 0 ANSI: 5 (Drive related)
Feb 18 08:35:47 Manitower kernel: sd 4:0:0:0: [sdd] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB) (Drive related)
Feb 18 08:35:47 Manitower kernel: sd 4:0:0:0: [sdd] 4096-byte physical blocks (Drive related)
Feb 18 08:35:47 Manitower kernel: sd 4:0:0:0: [sdd] Write Protect is off (Drive related)
Feb 18 08:35:47 Manitower kernel: sd 4:0:0:0: [sdd] Mode Sense: 00 3a 00 00 (Drive related)
Feb 18 08:35:47 Manitower kernel: sd 4:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA (Drive related)
Feb 18 08:35:47 Manitower kernel: sdd: sdd1 (Drive related)
Feb 18 08:35:47 Manitower kernel: sdc: sdc1 (Drive related)
Feb 18 08:35:47 Manitower kernel: sd 2:0:0:0: [sdc] Attached SCSI disk (Drive related)
Feb 18 08:35:47 Manitower kernel: sd 4:0:0:0: [sdd] Attached SCSI disk (Drive related)
[EDIT] Attaching the syslog.
Attachments: preclear_finish_WD-WCC4M1568296_2014-02-18.txt, syslog-2014-02-18.zip
manny Posted February 18, 2014
This problem is resolved. For some reason Simple Feature was causing issues; I uninstalled it and now I am able to add the disk to the array. Please check the preclear report and let me know if everything is fine. I also upgraded to 5.0.5 and Dynamix.
BrianAz Posted February 24, 2014
Background: A few months ago, I moved my UnRaid system from an older GIGABYTE GA-EP45-UD3R motherboard w/ a Q6600 processor and 4GB RAM to an ASRock z77 Extreme4-M w/ an Intel Celeron G1610 and 4GB RAM, to allow me to expand beyond the 16-drive limit I had at the time. 8 drives were using the SATA ports on the Gigabyte motherboard and the other 8 were on an AOC-SAS2LP-MV8 card. On the ASRock, I'm using 2x AOC-SAS2LP-MV8 cards (1 half-full currently) plus 3 motherboard SATA II ports for 15 drives (14 plus the 1 I'm preclearing; I have 3 more awaiting preclear).
UnRaid itself works perfectly. I have no issues with multiple reads/writes going on at the same time (often streaming full Blu-ray rips). Parity checks happen on a monthly schedule and are running around 10 hours currently. I'm not running anything additional except UnMenu for a couple of things (like the parity check).
In the past (Gigabyte board, onboard + SAS2LP card ports used), I was able to run preclears without any issue at all. I could run a single drive multiple times using -c, and I could also create additional screens and do 2, maybe 3 drives at once. I've been running UnRaid ver 5.0-rc16c without any changes at all since it was running on the Gigabyte board. I just moved the stick to the new board.
Problem: With the new board setup, it seems I am only able to run a preclear without the "-c" option. Otherwise my load avg rises to like 50, the web GUI becomes unresponsive, the "Clean Shutdown" command-line script doesn't work (though to be fair, I haven't tried it under normal circumstances yet either), and I end up having to power the server down hard by holding the power button. When it boots, I do the 10-hour parity check and everything is good... until I decide to try preclear again with -c.
Here is the preclear command I'm using:
root@Tower:/boot# preclear_disk.sh -r 65536 -w 65536 -b 2000 -A -M 4 -c 3 /dev/sdk
For some reason, this seems to cause the load to skyrocket after a while. When it happened last time, I was 12 hours in and 75% through zeroing the disk on cycle 1 of 3. I make sure the parity check is NOT running while doing a preclear.
However, when I use the following command to clear the disk only once, it has never failed (except when the disk itself actually fails):
root@Tower:/boot# preclear_disk.sh -r 65536 -w 65536 -b 2000 -A -M 4 /dev/sdk
I ran this command once 2 days ago; the preclear worked great and UnRaid remained fully responsive. Then yesterday I added the "-c 3" as mentioned above, and after 12 hours the load (5 min avg?) was 49.89 and all I could do was telnet in. This has happened with 3 different drives thus far, so I don't think it's the drive, port/card or drive cage at fault.
So my questions are twofold:
Does anyone know why I'm having these issues off the top of their head? I don't think my CPU is as solid as before with the Q6600, but in my mind, I don't use my server for anything other than UnRaid, so I shouldn't need it. Also, the CPU usage seems fine when the problem is happening.
What logs would be helpful in troubleshooting this? How should I save them once I telnet in? I'm guessing I copy relevant logs to a folder on /boot, restart, run the parity check and then go grab them from /boot?
Thanks!
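On the question of saving logs before a hard power-off, here is a minimal sketch along the lines BrianAz guesses at. It assumes the live syslog is at `/var/log/syslog` and is lost on reboot, and that `/boot` is the flash drive. The `/boot/logs` folder and the function name `save_syslog` are hypothetical choices for this example.

```shell
# Copy the current syslog to a timestamped file so it survives a reboot.
# Usage: save_syslog [source_file] [destination_dir]
save_syslog() {
    src=${1:-/var/log/syslog}        # assumed location of the live syslog
    dest_dir=${2:-/boot/logs}        # hypothetical folder on the flash drive
    mkdir -p "$dest_dir" || return 1
    # The timestamp avoids characters that FAT file names do not allow.
    target="$dest_dir/syslog-$(date +%Y%m%d-%H%M%S).txt"
    cp "$src" "$target" && echo "$target"
}

# From a telnet session, with the defaults above:
#   save_syslog
```

Running it from telnet while the server is still reachable, and before holding the power button, should leave a copy that can be zipped and attached to a post afterwards.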
RobJ Posted February 24, 2014
Quote from: manny
Please check the preclear report and let me know if everything is fine.
Drive couldn't be more perfect (but you probably knew that, right?)
RobJ Posted February 24, 2014
Quote from: BrianAz
Here is the preclear command I'm using:
root@Tower:/boot# preclear_disk.sh -r 65536 -w 65536 -b 2000 -A -M 4 -c 3 /dev/sdk
For some reason, this seems to cause the load to skyrocket after a while. When it happened last time, I was 12 hours in and 75% through zeroing the disk on cycle 1 of 3.
No good ideas here. Just to eliminate a possibility, can you try re-running with adjusted email options, as the -M 4 option runs some code not used by anything else? Try -M 3 and lower, and perhaps no email at all. Are you receiving the emails correctly, both with and without the -c option? Most likely, Joe L. will have to help you, when he has time.
BrianAz Posted February 25, 2014
Quote from: BrianAz
Here is the preclear command I'm using:
root@Tower:/boot# preclear_disk.sh -r 65536 -w 65536 -b 2000 -A -M 4 -c 3 /dev/sdk
For some reason, this seems to cause the load to skyrocket after a while. When it happened last time, I was 12 hours in and 75% through zeroing the disk on cycle 1 of 3.
Quote from: RobJ
No good ideas here. Just to eliminate a possibility, can you try re-running with adjusted email options, as the -M 4 option runs some code not used by anything else? Try -M 3 and lower, and perhaps no email at all. Are you receiving the emails correctly, both with and without the -c option? Most likely, Joe L. will have to help you, when he has time.
Thanks for your response. I do receive the emails until a certain point and then they just stop coming. It had been more than a few hours between emails during a preclear, so I telneted in to investigate and noticed things were a bit slow and the load avg was ~50. Then of course I tried the GUI, but it was dead as well. I think if I caught it earlier, I could probably shut down cleanly via the GUI. I'll have to look into monitoring the load somehow. Shutting down hard like that and the ensuing parity check sucks.
My parity check just finished, so I'll start the following preclear command tonight and check on it in the morning. I took your suggestion of simplifying the command and also removed the -b, -w and -r options, since I'm doing just this one disk. I'll report back in a day or two unless it exhibits problems sooner. My experience has been that it became unresponsive before the first cycle was complete (towards the end, though), so if we move on to the second cycle, that'd be an improvement.
root@Tower:/boot# preclear_disk.sh -l
====================================1.13
Disks not assigned to the unRAID array
(potential candidates for clearing)
========================================
/dev/sdn = ata-Hitachi_HDS722020ALA330_JK1101B8GN2BBZ
root@Tower:/boot# preclear_disk.sh -A -c 3 /dev/sdn
Pre-Clear unRAID Disk /dev/sdn
################################################################## 1.13
Model Family: Hitachi Deskstar 7K2000
Device Model: Hitachi HDS722020ALA330
Serial Number: JK1101B8GN2BBZ
Firmware Version: JKAOA3EA
User Capacity: 2,000,398,934,016 bytes
Disk /dev/sdn: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
Device Boot Start End Blocks Id System
/dev/sdn1 64 3907029167 1953514552 0 Empty
Partition 1 does not end on cylinder boundary.
########################################################################
invoked as ./preclear_disk.sh -A -c 3 /dev/sdn
########################################################################
(-A option elected, partition will start on sector 64)
Are you absolutely sure you want to clear this drive?
(Answer Yes to continue. Capital 'Y', lower case 'es'): Yes
Edit: So far so good. Nearly complete with cycle 2. Since the minimal command seems to work for me, I'll leave this until I need to do another preclear and add the -M option back slowly, starting with the lower levels as suggested. Thanks!
Edit #2: About 75% through the third of 3 cycles, the video I happened to be watching began to skip. I telneted into UnRaid and saw the load had spiked to the 4-5 range. The GUI came up, but a bit slowly. I was able to stop the array properly, and after the load dropped down to ~1.8/2.0 (5-10 min) I started the array again (with the preclear still running). Everything was fine from there on out (I watched another 1-2 hrs of videos after re-starting the array). Once the preclear completed, the load declined to its normal ~.04/.05 range.
Does preclear require a stronger CPU for some reason? Should I expect to be able to run preclear without issue on my hardware? My machine is an ASRock z77 Extreme4-M with a Celeron G1610 and 4GB DDR3 memory. The majority of my drives (including the one I have been preclearing) are in drive cages connected to 2x Supermicro SAS2LP-MV8 controllers. I do also have a few slots in the cages plugged directly into the motherboard SATA ports. No idea if it would make a difference if I used one of those. Thanks
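For "monitoring the load somehow", one possible sketch is a small check against `/proc/loadavg`, which on Linux begins with the 1-, 5- and 15-minute load averages. The function name `load_exceeds`, the limit of 4 and the log path in the usage comment are illustrative assumptions, not anything preclear or unRAID provides.

```shell
# Succeed (exit 0) when the 5-minute load average is above the given limit.
# Usage: load_exceeds LIMIT [loadavg_file]
load_exceeds() {
    limit=$1
    file=${2:-/proc/loadavg}
    # The first three fields are the 1-, 5- and 15-minute averages.
    read -r one five fifteen rest < "$file"
    # awk does the decimal comparison that shell integer tests cannot.
    awk -v l="$five" -v m="$limit" 'BEGIN { exit !(l > m) }'
}

# Possible use in a second screen session while a preclear runs:
#   while sleep 60; do
#       load_exceeds 4 && echo "$(date) high load: $(cat /proc/loadavg)" >> /boot/load.log
#   done
```

A limit of 4 is only a guess based on the 4-5 range described in the post. The point is to get a timestamped record before the GUI stops responding.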
madburg Posted February 27, 2014
unRAID v6 Beta 3, latest preclear script. One 3TB hard drive installed (motherboard SATA) to test; no other hard drives installed, so the array is not started. Memory utilization is low until writing zeros.
At Step 2 of 10 - Copying Zeros to remainder of disk to clear it (97% Done)
Output of 'free -l' while at the step above (via screen):
             total       used       free     shared    buffers     cached
Mem:       8170244    7931528     238716          0    7207300     426780
Low:       8170244    7931528     238716
High:            0          0          0
-/+ buffers/cache:     297448    7872796
Swap:            0          0          0
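As a reading aid for the `free -l` figures, here is a short sketch of the arithmetic behind the `-/+ buffers/cache` line, using the numbers from the post (all in KiB). The function name `buffers_cache_line` is invented for this example.

```shell
# Recompute the "-/+ buffers/cache" line from the Mem: line of free.
# Usage: buffers_cache_line USED FREE BUFFERS CACHED
buffers_cache_line() {
    used=$1; free=$2; buffers=$3; cached=$4
    # Memory used once buffers and cache are excluded.
    used_wo_cache=$(( used - buffers - cached ))
    # Memory that is free or held as buffers and cache.
    free_with_cache=$(( free + buffers + cached ))
    echo "$used_wo_cache $free_with_cache"
}

# Values from the Mem: line above.
buffers_cache_line 7931528 238716 7207300 426780   # prints: 297448 7872796
```

So of the roughly 7.9 million KiB shown as used, about 7.2 million are buffers, and only about 297 thousand KiB are used outside buffers and cache. As far as I understand, the kernel can normally reclaim buffer memory when something else needs it, so the high "used" figure during zeroing may be less alarming than it looks. That is an interpretation, not something established in this thread.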
BrianAz Posted March 1, 2014
I've moved the drive to a bay plugged directly into the motherboard (rather than the SAS2LP-MV8). So far so good: cycle 2 of 3 and no problems at all thus far. It's even emailing me every step of the way like it should.

I also found a couple of other threads where people report difficulty preclearing on the SuperMicro card(s). One poster said it resolved when he plugged the drive directly into the motherboard. It's looking like that's the key here. I wonder why the card doesn't have any issues during normal use or parity checks but preclear causes problems. I guess maybe preclear is stressing the card harder than a parity check/build does?

Edit: Just finished the 2nd of 3 preclear cycles with my original command. Seems to be working just fine plugged into the motherboard. My system even did its monthly parity check last night and everything's still going strong. Seems like preclear and SAS2LP-MV8 just don't mix.
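If someone wants to compare controllers the way described in this post, it may help to log the load average while a preclear runs instead of noticing it when video starts skipping. A rough sketch, assuming bash and awk are available on the server; the helper name load_exceeds and the 3.0 threshold are arbitrary choices for illustration, not anything from the preclear script:

```shell
#!/bin/bash
# Succeeds (exit status 0) when the 1-minute load average is above a threshold.
# $1 = one line in /proc/loadavg format, $2 = threshold (hypothetical helper)
load_exceeds() {
    local one_min=${1%% *}    # first field of /proc/loadavg is the 1-minute average
    # awk does the floating-point comparison; exit 0 if load > threshold
    awk -v l="$one_min" -v t="$2" 'BEGIN { exit !(l > t) }'
}

# Hypothetical usage while a preclear is running (left commented out):
# while sleep 60; do
#     la=$(cat /proc/loadavg)
#     load_exceeds "$la" 3.0 && echo "$(date) load high: $la"
# done
```

Running the loop once with the drive on the SAS2LP-MV8 and once on a motherboard port would give timestamps to line up against the preclear step and the syslog.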
madburg Posted March 2, 2014
Originally (years ago) I purchased many 2 TB drives and precleared them all. As I needed them, I slowly added them to the unRAID server. unRAID always cleared them and never picked up the signature set by preclear. I had no idea why at the time and didn't have any more drives to figure it out. A long time passed between pre-clearing them and adding them in.

Well, I just picked up (2) 3TB drives. I just finished pre-clearing them and added them to my unRAID 5.0.4 production server. unRAID is clearing them (v1.14) once again; it did not pick up the signature.

So the only thing I can think of is this. Originally, when I first started with unRAID, I would preclear drives on the same server with the array up and running. Stopping the array and adding the newly pre-cleared drive(s) worked (signature detected). Once I purchased all those other 2 TB drives to have on hand, and now these latest (2) 3 TB drives, I have been pre-clearing them from another server. Once the preclear finished I would shut that server down, then shut down the production server, move the drives to the production server, power up the production server, and add the disks to the array. And the signature is not read.

So I do know how this all works, but it almost seems like running a preclear on a different server than the one where the drives will be added to the array is not working... My understanding is that the signature written to disk is not hardware specific, so I'm lost as to why it is not working. The desktop I preclear from is an old AMD desktop MB (SATA MB ports); the prod server is an Intel server MB (SAS controllers) with a Xeon proc.
Joe L. Posted March 3, 2014
Assigning a drive to a server, even just briefly, will change the preclear signature so that it will no longer be recognized if you un-assign the drive and re-assign it later to the same or a different server.

You can use
preclear_disk.sh -t /dev/sdX
to test if a disk has a current/correct preclear signature.
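When several drives have been sitting on a shelf, the -t check mentioned in this reply can be run against each of them before assigning anything to the array. A small sketch under some assumptions: the wrapper name check_signatures and the PRECLEAR variable are invented here, the script is assumed to be reachable as preclear_disk.sh (adjust the path if it lives in /boot), and since the exit status of -t is not documented in this thread the wrapper only prints whatever the script reports:

```shell
#!/bin/bash
# Path to the preclear script; override PRECLEAR if it is not on PATH.
# (PRECLEAR and check_signatures are hypothetical names for this sketch.)
PRECLEAR="${PRECLEAR:-preclear_disk.sh}"

# Run the signature test (-t) against every device given as an argument
# and print the script's own output under a header for each device.
check_signatures() {
    local dev
    for dev in "$@"; do
        echo "=== $dev ==="
        "$PRECLEAR" -t "$dev"
    done
}

# Hypothetical usage (device names are placeholders):
# check_signatures /dev/sdX /dev/sdY
```

Do this before the drives are assigned to any slot, because per the reply above even a brief assignment changes the signature.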
vjmcdonnell Posted March 3, 2014
I had an issue with the preclear not working last time. Has it worked this time? Thanks
syslog-2014-03-03.txt
Joe L. Posted March 3, 2014
Looks good this time.
vjmcdonnell Posted March 4, 2014
Many thanks, Joe.