RobJ Posted February 13, 2013

These increases in Current Pending Sectors *after* a Preclear are not a good sign! I would follow Joe's advice in this post.
JustinChase Posted February 13, 2013

Okay, thanks. Since I don't really know much about the inner workings of hard drives, or of preclear, I'm still a bit confused about why the sectors are still pending. If preclear identified them as needing to be re-allocated, why weren't they re-allocated?

Also, the post you linked says it might be a bad PSU. These 2 drives bring me up to 10 drives total in this system, which uses a 550W power supply (XFX Core Edition PRO550W, 80 PLUS Bronze certified, Active PFC). I'm not positive that's enough power for 10 drives (nor am I sure how to calculate it).
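One rough way to sanity-check a PSU against a drive count is to budget the 12V rail against worst-case spin-up draw. The per-device currents below are assumed typical values, not measurements of this hardware; check your drives' datasheets for real figures:

```shell
# Back-of-envelope 12V budget sketch. All per-device currents are assumed
# typical values for 3.5" drives and 120mm fans, not measured figures.
DRIVES=10
SPINUP_A=2          # a 3.5" drive can pull roughly 2A @12V while spinning up
FANS_A=2            # eight 120mm fans at ~0.25A each, rounded up
NEEDED=$(( DRIVES * SPINUP_A + FANS_A ))
RAIL_A=44           # the XFX PRO550W's single 12V rail rating
echo "worst-case draw ~${NEEDED}A of ${RAIL_A}A available"
```

By this estimate, 10 drives spinning up together stay well under a 44A rail; voltage drop across splitters and adaptors is the more likely culprit than raw wattage.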
Joe L. Posted February 13, 2013

It is not "preclear" that identified the sectors; it is the drive's SMART firmware, during the post-read phase. They are now pending re-allocation when next written. Had they been identified in the pre-read phase, they would have been re-allocated when written with zeros in the "write" phase.

Sectors pending re-allocation after a preclear are not a great sign. It indicates the drive should be cleared once more, and if the sectors are still not re-allocated, an RMA is as likely an outcome as any.

There is also the possibility that the power supply cannot keep up with the drive's demands during the "writing" phase, in which case a replacement drive could easily behave the same way.

Your power supply is a single-rail supply rated at 44 amps. It should be plenty powerful. However, if you have lots of splitters between it and the drives, you might have poor voltages at the drives.

Joe L.
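If you want to watch attribute 197 yourself between preclear runs, the count is the last column of the smartctl attribute line. A sketch (a canned report line stands in here for real `smartctl -A /dev/sdX` output):

```shell
# Extract the Current_Pending_Sector raw value from a SMART attribute line.
# The canned line mimics `smartctl -A` output; on a live system you would
# pipe the real command's output in instead.
line='197 Current_Pending_Sector 0x0012 100 099 000 Old_age Always - 6'
pending=$(echo "$line" | awk '/Current_Pending_Sector/ {print $NF}')
echo "sectors pending re-allocation: $pending"
```

A falling pending count across runs means the drive is re-writing (or re-allocating) the weak sectors; a stable or rising count is the bad sign Joe describes.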
axeman Posted February 16, 2013

Can anyone shed some light on what's happening here?

Background: unRAID 5-rc; I have the SimpleFeatures plugin installed; the array is not started. I am using the latest version of the script. These are those damn Western Digital EARX drives (I got the retail version from Best Buy). I did not use the wdidle3 utility on these; they are connected to a BR10i and passed through to the unRAID VM.

I started a preclear of 4 drives at once; two are what you see here, and the other two are Samsung SpinPoint 1TB drives (which seem to be progressing as expected). On my production unRAID I've precleared 3 drives at a time (WD EARS, with idle set to max); those were direct-attached to the mobo and given to unRAID via the RDM method. I also have a parity check going on my production unRAID VM, which is moving along as expected.

Am I experiencing a dual failure here? Should I wait till the other two are finished, and try to run these alone?
Joe L. Posted February 16, 2013

Quoting axeman: "Am I experiencing a dual failure here? Should I wait till the other two are finished, and try to run these alone?"

You are probably experiencing resource contention of some kind. Each is probably waiting on some resource the other has. Since you did not attach a syslog, I can assume you've already looked there for clues and found nothing.

Joe L.
JustinChase Posted February 16, 2013

Quoting Joe L.: "Sectors pending re-allocation after a preclear are not a great sign. ... if you have lots of splitters in between it and the drives, you might have poor voltages at the drives."

I don't have 'lots of splitters', but I do have a couple of drives connected to a new-style SATA power connector through an old-style (Molex) power adaptor. I honestly can't remember whether this drive is connected with such an adaptor, and will have to take the server apart to find out for sure. The server has 8 120mm fans connected to the power supply (5 on one power connector and 3 on another), if that matters. The server runs very cool, so I'm sure I can disconnect at least 3 of the fans without issue.

I stopped SABnzbd while I ran the preclear on only one drive, so preclear should have been the only thing running on this server all night. Below are the results, which show sectors still needing re-allocation. This drive is a few years old and had served as my cache drive for the last couple of years. It is out of warranty, so there is no RMA available.
So, do I throw the drive away even though it hasn't actually failed? Or preclear again? Or put it into the array, knowing it's likely to fail in the near future, and get a replacement ordered and precleared so it's ready the day it does fail on me???

== invoked as: ./preclear_disk.sh /dev/sdk
==  SAMSUNG HD103UJ    S13PJDWS337885
== Disk /dev/sdk has been successfully precleared
== with a starting sector of 64
== Ran 1 cycle
==
== Using :Read block size = 8225280 Bytes
== Last Cycle's Pre Read Time  : 3:13:07 (86 MB/s)
== Last Cycle's Zeroing time   : 2:58:54 (93 MB/s)
== Last Cycle's Post Read Time : 7:15:07 (38 MB/s)
== Last Cycle's Total Time     : 13:28:07
==
== Total Elapsed Time 13:28:07
==
== Disk Start Temperature: 26C
==
== Current Disk Temperature: 25C,
==
============================================================================
** Changed attributes in files: /tmp/smart_start_sdk  /tmp/smart_finish_sdk
                ATTRIBUTE   NEW_VAL OLD_VAL FAILURE_THRESHOLD STATUS RAW_VALUE
  Airflow_Temperature_Cel =    75      74          0          ok     25
      Temperature_Celsius =    76      74          0          ok     24
No SMART attributes are FAILING_NOW

[b]10 sectors were pending re-allocation before the start of the preclear.[/b]
11 sectors were pending re-allocation after pre-read in cycle 1 of 1.
0 sectors were pending re-allocation after zero of disk in cycle 1 of 1.
[b]9 sectors are pending re-allocation at the end of the preclear, a change of -1 in the number of sectors pending re-allocation.[/b]
0 sectors had been re-allocated before the start of the preclear.
0 sectors are re-allocated at the end of the preclear, the number of sectors re-allocated did not change.
Joe L. Posted February 16, 2013

I'd run a non-destructive read/write badblocks cycle on it. This will take many, many hours... It will read and then re-write every sector:

badblocks -c 1024 -b 65536 -o /boot/badblocks_out.txt -svn /dev/sdk

If anything will get it to settle down, that will.

If you are absolutely certain of the device name of the disk AND IT IS NOT ASSIGNED TO YOUR ARRAY NOR HOLDS ANY DATA YOU WISH TO KEEP, you can run the 4-pass badblocks write test on the disk. It will erase everything on the disk, including the preclear signature. This is an even longer test (probably 80 hours or more on a 2TB drive). You will need to leave the telnet session open for the duration (or run it under "screen", or on the system console):

badblocks -c 1024 -b 65536 -o /boot/badblocks_out.txt -svw /dev/sdk

Be absolutely certain you have the correct device name. There is no "are you sure" prompt to prevent you from erasing the wrong drive.

Joe L.
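Since badblocks has no confirmation prompt, one possible safeguard (a sketch only — the serial here is the drive from this thread, and on a live system you would read it with `smartctl -i /dev/sdk` rather than hard-coding it) is to compare serial numbers before the destructive run:

```shell
# Guard sketch: refuse the destructive test unless the drive's reported
# serial matches the one you intend to erase. EXPECTED/ACTUAL are canned
# here; normally ACTUAL would come from something like:
#   smartctl -i /dev/sdk | awk '/Serial Number/ {print $3}'
EXPECTED="S13PJDWS337885"
ACTUAL="S13PJDWS337885"
if [ "$ACTUAL" = "$EXPECTED" ]; then
    echo "serial matches; OK to run badblocks -svw"
else
    echo "serial mismatch; ABORTING" >&2
fi
```

Device letters (sdk, sdd, ...) can shuffle between reboots; the serial number never does, which is why it makes a safer key than the device name alone.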
JustinChase Posted February 16, 2013

Thanks Joe. Which would you run? You gave 2 commands, so I'm not sure which is most likely to give me a usable drive, or to determine that it's definitely not worth putting to use. Since it sounds like it's gonna take a couple of days, plus another preclear cycle, I'd really only want to do it once, so which one do you recommend?

badblocks -c 1024 -b 65536 -o /boot/badblocks_out.txt -svn /dev/sdk

or

badblocks -c 1024 -b 65536 -o /boot/badblocks_out.txt -svw /dev/sdk
Joe L. Posted February 16, 2013

Quoting JustinChase: "Which would you run? ... which one do you recommend?"

The "svn" (non-destructive) test will not destroy the preclear signature and will take less time (I think).

The "svw" test (write four patterns, ending with all zeros) will take longer, but is a more thorough test. It will need a subsequent preclear if you intend to add the disk as an additional drive in the array. If you intend to use it as a replacement for a failed/failing drive, the preclear signature is not necessary. If you have the time, this is the one I would perform.

Depending on your needs, if you do wish to have a preclear signature, you can skip the pre-read phase to save a bit of time, since badblocks will have just read all the sectors. To run only the writing of zeros and the post-read-verify phase (skipping the pre-read):

preclear_disk.sh -W -A /dev/sdk
JustinChase Posted February 16, 2013

Thanks again Joe! I really appreciate all your help, and the fact that you are usually so very quick to respond to help requests; it means a lot!

Since it will take longer to order a new drive than to run any tests, I'm going to run this one now...

badblocks -c 1024 -b 65536 -o /boot/badblocks_out.txt -svw /dev/sdk

then I'll run this when it finishes (unless the drive explodes because of the first one):

preclear_disk.sh -W -A /dev/sdk
axeman Posted February 16, 2013

Quoting Joe L.: "You are probably experiencing resource contention of some kind. ... Since you did not attach a syslog, I can assume you've already looked there for clues and found nothing."

Thanks - no, I didn't look - actually didn't even think to attach the log, thinking that since the array isn't on, what could the log show? Sorry, that was nooby of me.

Anyhoo - after a little while it took off again, and is in the 90MB/s range - I guess, knowing it was an EARX, I panicked too quickly. Chugging along now. The parity check also finished on my other VM. Glad the machine didn't croak with all these disks chugging at the same time. *whew*
RobJ Posted February 17, 2013

Quoting axeman: "After a little while it took off again, and is in the 90MB/s range ... the parity check also finished on my other VM."

OK, are you saying you had a parity check running in another VM on the SAME physical machine? That would be major resource contention! UnRAID, especially during a parity check/build, is I/O bound, meaning it will make maximum use of the available I/O bandwidth, and everything else has to wait its turn.
A VM is great for sharing unused resources: multiple VMs can use idle CPU time and unused RAM, but NOT any unused I/O, because there isn't any! Running 2 VMs will not double your available I/O capability! So of course it sped up to close to normal once the parity check finished.
axeman Posted February 17, 2013

Yea - it was going at the same time. As for the I/O bandwidth, I wasn't expecting it to be an issue, since the test array was running off a BR10i card and the other is running off the mobo headers. Either way - lesson learned. Like I said, I figured the WD drive was defective until proved otherwise.
axeman Posted February 17, 2013 Share Posted February 17, 2013 okay - so the issue is back - and i am trying to get a log, whats the best way to do that? Simple features seems to freeze on opening it. I ran this command: cp /var/log/syslog /boot/syslog-2008-04-10.txt on the console, via putty and ended up with the attached.. no carriage returns, etc. I have four pre-clears running at the same time; only one of them is runnign at full speed (about 130MB/s the others are all sub 5MB/s). is there a better way to get a syslog? I had to zip it, it was too large syslog-2013-02-17.zip Quote Link to comment
axeman Posted February 17, 2013 Share Posted February 17, 2013 okay - got it - better formatted syslog-2013-02-17.zip Quote Link to comment
Joe L. Posted February 17, 2013

Feb 17 10:26:25 unraid5 kernel: read_file: error 2 opening /boot/config/super.dat
Feb 17 10:26:25 unraid5 kernel: md: could not read superblock from /boot/config/super.dat

Your flash drive is not readable (and probably not writable). Run scandisk/checkdisk on it on your Windows PC to fix it.

Joe L.
JustinChase Posted February 17, 2013

26 hours later, it's done. It says "Pass completed. 9 bad blocks found". It doesn't say whether it 'fixed' the bad blocks, nor how I can fix them myself. The log it created is useless to me; it says...

14973952
14974005
14968832
14969495
14973568
14974006
10838016
10838939
14970760

...which didn't have line breaks in the actual log, but I now suppose that is the list of the 9 bad blocks.

Is there some way to mark them as bad and continue using the rest of the disk? I don't want to continue with the preclear until I know if there's anything I can do about these bad blocks, or whether it's even necessary.
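For what those numbers mean: badblocks reports block indexes at the block size given with `-b` (65536 bytes in the command used here), so converting one to a 512-byte-sector LBA — the unit SMART logs report — is a multiply by 128. A sketch using the first entry from the log above:

```shell
# Convert a badblocks block index (at -b 65536) into a 512-byte-sector LBA.
BAD_BLOCK=14973952          # first entry from the badblocks output above
BLOCK_SIZE=65536            # the -b value used on the command line
LBA=$(( BAD_BLOCK * BLOCK_SIZE / 512 ))
echo "badblocks block $BAD_BLOCK starts at LBA $LBA"
```

Note that at -b 65536 each reported "bad block" actually covers 128 sectors, so 9 bad blocks does not necessarily mean only 9 bad sectors.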
axeman Posted February 17, 2013

Quoting Joe L.: "Your flash drive is not readable (and probably not writable). Run scandisk/checkdisk on it on your Windows PC to fix it."

I thought that error was related to not having the array started. If I start the shutdown from the webUI, will it cancel the preclears? Or should I wait for them to finish?
Joe L. Posted February 18, 2013

"26 hours later, it's done. It says 'Pass completed. 9 bad blocks found'."

Good: you originally had 9 blocks marked for re-allocation in your prior SMART report. Now get a new SMART report and see what the current statistics show.

"It doesn't say if it 'fixed' the bad blocks, nor how I can fix them myself."

That is what will be shown on the SMART report.

"The log it created is useless to me... it didn't have line breaks."

It does have linefeeds, but not carriage returns; MS-DOS uses both. You can read the file easily if you use an editor that recognizes UNIX/Linux line endings (many seem to like Notepad2).

"Is there some way to mark them as bad, and continue with using the rest of the disk?"

The SMART firmware on the disk should have already done just that.

"I don't want to continue with the preclear until I know if there's anything I can do about these bad blocks, or if it's necessary."

Get a new SMART report. It will take just a few seconds and let you know what has happened on the disk. With any luck you'll see 9 sectors re-allocated, and none pending re-allocation:

smartctl -a /dev/sdk

Joe L.
Joe L. Posted February 18, 2013

Quoting axeman: "I thought that error was related to not having the array started."

Unless you've never started the array, the error is unexpected. The file is created the first time you start the array.
JustinChase Posted February 18, 2013

Quoting Joe L.: "Get a new smart report. ... With any luck you'll see 9 sectors re-allocated, and none pending re-allocation."

I don't see any mention of re-allocating sectors, so I'm going to preclear again while I'm at work today. Thanks again for all your help with this...

smartctl 5.40 2010-10-16 r3189 [i486-slackware-linux-gnu] (local build)
Copyright © 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     SAMSUNG SpinPoint F1 DT series
Device Model:     SAMSUNG HD103UJ
Serial Number:    S13PJDWS337885
Firmware Version: 1AA01113
User Capacity:    1,000,204,886,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 3b
Local Time is:    Mon Feb 18 08:06:58 2013 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 121) The previous self-test completed having the read element of the test failed.
Total time to complete Offline data collection: (11566) seconds.
Offline data collection capabilities: (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine recommended polling time:      (   2) minutes.
Extended self-test routine recommended polling time:   ( 194) minutes.
Conveyance self-test routine recommended polling time: (  21) minutes.
SCT capabilities:              (0x003f) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   099   081   051    Pre-fail  Always       -       326
  3 Spin_Up_Time            0x0007   083   083   011    Pre-fail  Always       -       5960
  4 Start_Stop_Count        0x0032   098   098   000    Old_age   Always       -       1658
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   100   100   051    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0025   100   100   015    Pre-fail  Offline      -       9598
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       25623
 10 Spin_Retry_Count        0x0033   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0012   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       139
 13 Read_Soft_Error_Rate    0x000e   099   081   000    Old_age   Always       -       322
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0033   100   100   000    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       5311
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   075   054   000    Old_age   Always       -       25 (Min/Max 24/28)
194 Temperature_Celsius     0x0022   075   054   000    Old_age   Always       -       25 (Min/Max 23/31)
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       105263424
196 Reallocated_Event_Count 0x0032   096   096   000    Old_age   Always       -       150
197 Current_Pending_Sector  0x0012   100   099   000    Old_age   Always       -       6
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       1
199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x000a   100   100   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   099   098   000    Old_age   Always       -       8

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                    Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%       25457        1915825454
# 2  Short offline       Completed: read failure       20%       25457        1915971623
# 3  Extended offline    Completed: read failure       90%       24432        1806590219
# 4  Extended offline    Completed: read failure       90%       24416        1916093552
# 5  Short offline       Completed: read failure       20%       24415        1806590219
# 6  Extended offline    Completed: read failure       90%       24379        1806590219
# 7  Extended offline    Completed: read failure       90%       24345        1806590219
# 8  Short offline       Completed: read failure       20%       24345        1916119474
# 9  Extended offline    Completed: read failure       90%       23791        1858603687
#10  Short offline       Completed: read failure       20%       23790        1858603687
#11  Extended offline    Completed: read failure       90%       20232        1916099552
#12  Short offline       Completed: read failure       20%       20150        1916001625
#13  Short offline       Completed: read failure       20%       18923        1916033547
#14  Extended offline    Completed: read failure       90%       18923        1916055629
#15  Extended offline    Aborted by host               90%       18917        -
#16  Short offline       Completed: read failure       20%       17515        1916088511

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
Joe L. Posted February 18, 2013

We'll see what the disk looks like after the preclear.

It shows 0 sectors re-allocated. (That is good: it indicates the disk has been able to write successfully to the original sectors.)

It shows 6 sectors pending re-allocation. (This is bad: it indicates they were identified in the most recent pass of the badblocks program.)

It shows 150 re-allocation events, which again indicates a constant trickle of sectors that are unreadable but can be re-written in place. (The original "writes" to those blocks were marginal.)

Now, this can all be explained by either a defective disk drive, OR a drive that is sensitive to power-supply noise or low voltages (as when fed by a marginal supply, connected through a number of high-resistance connectors/splitters, or sharing a power-supply rail with a lot of other drives). In other words, if you can, try a different power connection.

Did I ask you yet: what specific make/model power supply are you using? And what mix of disks are you powering?

Joe L.
JustinChase Posted February 18, 2013

Okay. I'm hoping the new preclear will be finished when I get home from work today, but probably not until late this evening (depending on how much time skipping the pre-read saves me).

You did ask about the power supply (model in my signature), and responded that it seemed good enough: "Your power supply is a single rail supply rated at 44 Amps. It should be plenty powerful. However, if you have lots of splitters in between it and the drives, you might have poor voltages at the drives." As I mentioned then, I don't have lots of splitters, but I do have a couple of drives connected through an old-style power connector adaptor to a new-style SATA power connector, and the server has 8 120mm fans on the power supply (5 on one power connector and 3 on another).

I will disconnect the extra/unnecessary fans tonight, and review exactly how this drive is connected to power. If I can, I'll connect it directly to a SATA power connector, but that will just force me to connect another drive to the adaptor instead (assuming this drive is currently connected that way).

If the preclear still shows pending re-allocations, and switching the power around doesn't resolve the situation, does that just mean this drive isn't worth using in unRAID? If so, would it be reasonable/okay to use it in an external case as a long-term backup drive? Or is it just a paperweight at that point?

Maybe I'll upgrade to a new power supply with modular connections, so I can connect/use more SATA connectors and skip the IDE-type adaptors.
axeman Posted February 18, 2013

Quoting Joe L.: "Unless you've never started the array, the error is unexpected. The file is created the first time you start the array."

Thanks Joe - this array has never been started, and has nothing on it, data-wise.

The two WD green drives are still going, on pass 2 of 3. The Samsungs finished, with what appears to be some red flags. Also, it looks like the slowdown is gone again. It seems the pre-read is the step where this happens when there's another disk being written to. I thought the BR10i could handle that type of bandwidth, but perhaps it maxes out at about 150MB/s?

Can I trouble you to look at these two logs and let me know what I should be freaked out about? Samsung_sdd.txt Samsung_sde.txt
Joe L. Posted February 18, 2013

Quoting axeman: "Can I trouble you to look at these two logs and let me know what I should be freaked out about?"

Nothing too bad, other than the second drive looks like it has been bounced around a bit at some point in its past:

G-Sense_Error_Rate 0x0022 100 100 000 Old_age Always - 3