Joe L. Posted April 30, 2010

Quote (Kode): I did something daft: I highlighted the disk temperature and pressed Ctrl+C. That's the danger of SSH-ing into a Linux box from a Windows laptop; I went into Windows mode. Ctrl+C obviously quit the process, so I had to start it again for sda. Anyway, the temps are looking better:
Disk 1: Disk Temperature: 31C, Elapsed Time: 2:15:45 (just before Ctrl+C)
Disk 2: Disk Temperature: 30C, Elapsed Time: 2:16:38
Disk 3: Disk Temperature: 31C, Elapsed Time: 2:15:34

Those are far more normal looking... You'll just test them about 2 1/4 hours longer... (and you've heat-tested them too :-))

Many years ago (roughly 1973/4) I was involved in heat-testing a telephone company electronic switching system. We purposely shut down the air-conditioning to the room the "computer" was in by blocking the air vents. The computer was all discrete components: individual transistor and diode logic. Its design pre-dated ICs, motherboards, and hard disks.

The "computer" was built in 7-foot-tall racks of equipment in a room about 100 feet square. 16k of RAM was 4 feet wide and 7 feet tall and gave off over 2000 watts of heat. (So much for being green.) Each word was 47 bits wide: two 20-bit half-words plus 7 bits of Hamming code and parity. The equivalent today would be error-correcting ECC RAM. Not too bad for 36 years ago. We had 32 banks of memory, so the memory alone gave off 64,000 watts of heat. Ouch... and lots of other equipment in the room gave off heat too.

When the air-conditioning was shut off, the temperature rose about a degree a minute until it leveled off at nearly 130 degrees near the top of the equipment racks. Our job was to troubleshoot and fix the temperature-sensitive components as the test progressed. I think we kept it at the elevated temperature for about 24 hours. It was done before we brought the system on-line.
It was necessary because in a power failure the system was expected to keep running on the central-office batteries, but the air-conditioning would not be running. We wanted to be certain it would work in any extended outage.

Some years later, in 1977, when the entire New York area had a power outage, it was put through its paces. It stayed running even as the batteries in the basement of the phone-company building slowly discharged and the emergency turbine generators on the roof ran out of fuel. Fortunately, power came back online... My old system was hobbling along... the combination of high temperatures and low voltage caused some outages, but we must have done well with our initial temperature tests, as it did not stop running.

You've just temperature tested your hard disks... Hopefully it will be the last time. It is why many of us have temperature-initiated warnings in place as add-ons.

Joe L.
Kode Posted April 30, 2010 (Author)

Quote (Joe L.): It is why many of us have temperature-initiated warnings in place as add-ons.

Just looked in the package manager in unMENU. Is that unraid-status-mail and ssmtp?
Joe L. Posted April 30, 2010

Yes, that is what I use. I just updated the status e-mail so that it no longer attempts to read the temperature of the flash drive, so you might want to use the "Check for Updates" button to get the newest version.

Joe L.
Kode Posted May 2, 2010 (Author)

Is the output of preclear saved anywhere? I wanted to check the output on the finished drives, but I think the cable I had running from downstairs (where the server is) to upstairs (where the router is) was dodgy, so I moved the server upstairs.

I want to test for the nForce 4 data corruption issues, but I don't want to transfer the whole 3TB I have to do it, so what's going to be the best way? Assign the parity drive and both data drives, press Start to bring the array online and do a parity sync, copy several multi-GB files using TeraCopy to check they are being copied properly, then run a couple of parity checks? Or assign just the data drives, then assign parity when the copies are done?

If/when I'm happy with the situation, would I just delete the files I've put on there, unassign parity, copy the data over, stop the array, then assign parity and start the array again?
Joe L. Posted May 2, 2010

I would test for the corruption with the parity disk and as many data disks as possible assigned; otherwise, the test is not a valid test. Therefore, unless you absolutely need to perform the transfer in the shortest time possible,
Kode Posted May 2, 2010 (Author)

OK, I'll assign all the drives then (1 parity, 2 data). The only reason I asked was that, reading the NVIDIA forums, it seems the corruption occurred when people were copying gigabyte-sized files.

When I eventually transfer my 3TB of files, what would be the best way to do it? Try to find 1.5TB of files and move those to the first disk share, then the remainder to the second, or set up user shares?
Joe L. Posted May 2, 2010

True, but if it is caused by noise on the motherboard, I did not want you to be fooled into thinking that writing to a single disk works (and that is what occurs when you do not have a parity disk assigned).

As far as how to organize your data, all I can say is I'd set up two or three high-level folders on one disk. For myself I used:
Movies
Music
Pictures
data

I then moved my files to directories under those. Perhaps something like those will work for you. If you create those same directories on each of your disk shares, they will be merged when you enable user shares, and you'll see all the movies as a single share.

I almost never use user shares when writing files to the array. I usually have them configured as read-only. I leave the disk shares as hidden-writable. That way they don't show in my network media players, but I can use them by entering their full path in Windows Explorer.

Joe L.
Kode Posted May 2, 2010 (Author)

Great advice, thank you. There are still 270 minutes left of the parity sync, so I probably won't try writing anything to the array until tomorrow, considering it's nearly 9pm.

You never replied as to whether the preclear results were saved anywhere, but to be honest I don't think there was anything wrong with the disks. The one I accidentally cancelled after 2 hours had slightly different results from the other two, but nothing too bad I don't think; there were no reallocated sectors that I saw.

Thanks for all the help.
Joe L. Posted May 2, 2010

The pre-clear results are not saved between boots. They are written to the syslog, and the individual "smart" reports are in /tmp, but both are wiped away every time you reboot.

Just because you are currently doing an initial parity calc, there is absolutely no reason you cannot start loading data. Parity is maintained... It will slow the parity calc a tiny bit, since the disk heads will have to seek between the two operations, but the time it takes to complete probably won't matter if you are copying the files to the server overnight while you sleep.

Joe L.
Kode Posted May 2, 2010 (Author)

I copied a few things using TeraCopy: a 4GB ISO file, a couple of CD ISO files, a season and a half of a TV series, and a few other things, some to disk 1, some to disk 2. All the items' CRCs check out. However, refreshing the main page of the unRAID web interface to see how long the parity sync had left, I noticed this. Is this normal?

        Model / Serial No.              Temperature  Size           Free           Reads      Writes     Errors
parity  SAMSUNG_HD153WI_S1UVJ1LZ302796  28°C         1,465,138,552  -              116        2,654,246  0
disk1   SAMSUNG_HD153WI_S1UVJ1LZ302798  28°C         1,465,138,552  1,458,180,612  2,494,655  52,312     3,707
disk2   SAMSUNG_HD153WI_S1UVJ1LZ302800  27°C         1,465,138,552  1,451,676,524  2,490,429  90,809     0

In particular, the 3,707 errors on disk1.
Joe L. Posted May 2, 2010

Those are "read" errors. You should get a smart report on that drive. It might be failing, or it might just be a bad or loose cable. If nothing else, post a syslog.

If you install unMENU, it will make both of those tasks much easier.
http://code.google.com/p/unraid-unmenu/
It is described here: http://lime-technology.com/forum/index.php?topic=2595.0

It has a disk management page that can run smart reports on the disk drives at the click of a button, and a system log page that lets you easily view and/or download a system log for attachment.

Joe L.
Kode Posted May 2, 2010 (Author)

I will run smart after the parity sync has finished; I will attach a syslog though. The syslog was too big to attach, so I have uploaded it here: http://www.lockstockmods.net/unraid/syslog-2010-05-02.txt
Joe L. Posted May 2, 2010

No need to wait to run the smart report. You can do it at any time.

The errors are all like this:
May 2 21:39:01 Tower kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
May 2 21:39:01 Tower kernel: ata3.00: BMDMA stat 0x24
May 2 21:39:01 Tower kernel: ata3.00: failed command: READ DMA EXT
May 2 21:39:01 Tower kernel: ata3.00: cmd 25/00:00:c7:fd:1b/00:04:68:00:00/e0 tag 0 dma 524288 in
May 2 21:39:01 Tower kernel: res 51/40:88:3f:00:1c/40:01:68:00:00/e0 Emask 0x9 (media error)
May 2 21:39:01 Tower kernel: ata3.00: status: { DRDY ERR }
May 2 21:39:01 Tower kernel: ata3.00: error: { UNC }
May 2 21:39:01 Tower kernel: ata3.00: configured for UDMA/133
May 2 21:39:01 Tower kernel: ata3: EH complete

Media errors are usually indications of unreadable sectors on the disk. You'll probably see sectors pending re-allocation, and sectors already re-allocated, in the smart report.

The command-line command would be:
smartctl -a -d ata /dev/sda

I see from your syslog you've already installed unMENU. Just run the smart status report for disk1 from the disk-management page.

Joe L.
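The kernel lines quoted above follow a regular pattern, so a saved copy of the syslog can be tallied per ATA device to see whether a single port accounts for all the errors. What follows is only a minimal Python 3 sketch, assuming the log has been downloaded to a machine that has Python (stock unRAID may not); the name `count_ata_errors` is made up for illustration, and the tally says nothing about which ataN port maps to which unRAID disk slot.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   May 2 21:39:01 Tower kernel: ata3.00: error: { UNC }
# The "status: { DRDY ERR }" lines are deliberately not matched.
ERROR_LINE = re.compile(r"kernel: (ata\d+\.\d+): error: \{ ([^}]*)\}")


def count_ata_errors(lines):
    """Count 'error: { ... }' kernel lines per (ATA device, error flag)."""
    counts = Counter()
    for line in lines:
        match = ERROR_LINE.search(line)
        if match:
            device = match.group(1)
            # A single line can carry several flags, e.g. "{ UNC ABRT }".
            for flag in match.group(2).split():
                counts[(device, flag)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "May 2 21:39:01 Tower kernel: ata3.00: failed command: READ DMA EXT",
        "May 2 21:39:01 Tower kernel: ata3.00: status: { DRDY ERR }",
        "May 2 21:39:01 Tower kernel: ata3.00: error: { UNC }",
    ]
    for (device, flag), n in sorted(count_ata_errors(sample).items()):
        print(device, flag, n)
```

To run it against a real log, pass an open file instead of the sample list, for example `count_ata_errors(open("syslog-2010-05-02.txt"))`.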
Kode Posted May 2, 2010 (Author)

A short smart returned this:

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       1640
  2 Throughput_Performance  0x0026   252   252   000    Old_age   Always       -       0
  3 Spin_Up_Time            0x0023   074   060   025    Pre-fail  Always       -       8028
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       8
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   252   252   051    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0024   252   252   015    Old_age   Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       51
 10 Spin_Retry_Count        0x0032   252   252   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   252   252   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       8
191 G-Sense_Error_Rate      0x0022   100   100   000    Old_age   Always       -       4
192 Power-Off_Retract_Count 0x0022   252   252   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0002   064   045   000    Old_age   Always       -       29 (Lifetime Min/Max 18/55)
195 Hardware_ECC_Recovered  0x003a   100   100   000    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       164
198 Offline_Uncorrectable   0x0030   252   252   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0036   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       5
223 Load_Retry_Count        0x0032   252   252   000    Old_age   Always       -       0
225 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       77

SMART Error Log Version: 1
No Errors Logged

Should I run a long one as well?
Joe L. Posted May 3, 2010

Quote (Kode): A short smart returned this: [SMART attribute table from the previous post] Should I run a long one as well?

That is not the output of the "short" test. But it is the section I was interested in. There are 164 sectors pending re-allocation. They will not be re-allocated until they are written to, since the disk has no way to know what they should contain.
The section of the smart report dealing with long (extended) and short tests is below the section with the parameters you posted. You'll need to disable any spin-down timer to run the long test, as it takes 3 to 5 hours on a large disk. The spin-down would cause it to abort.

Both the "long" and "short" tests are just "requests" to initiate the test. The results are visible in a subsequent "status" report when you request one after the required interval.

A "short" test will attempt to read a small number of sectors. It typically takes less than 5 minutes.
A "long" test will attempt to read all the sectors on the disk. It typically takes many hours.

Either type of test can be run at any time (as long as you don't spin down the disk... since it is really hard for the test to continue with the disk not spinning).

If there is nothing important on the disk, and you don't mind deleting the files on it, you can un-assign the disk from the array and then run the preclear_disk.sh script on it. It will completely exercise the disk: pre-reading it, writing it with all zeros, then post-reading it. It also compares smart reports taken before and after, to let you see if it found more sectors being re-allocated.

I'd not use that disk unless those are the only errors and they do not continue to increase over time. Somehow, I doubt they'll be the only ones, since you've just begun to use the disk.

Joe L.
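For anyone who wants to pull the pending and re-allocated counts out of a saved smart report rather than reading the table by eye, here is a minimal Python 3 sketch. It assumes the attribute rows look like the ones posted in this thread (ID, name, flag, VALUE, WORST, THRESH, type, updated, when-failed, raw); `parse_smart_attributes` is a hypothetical helper name, not part of smartmontools, and other drives may print raw values in formats this does not handle.

```python
import re

# One row of the "Vendor Specific SMART Attributes" table, e.g.
# 197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always   -   164
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def parse_smart_attributes(report):
    """Return {attribute_name: {"id", "value", "worst", "thresh", "raw"}}.

    Only the leading integer of RAW_VALUE is kept, so a temperature row such
    as "29 (Lifetime Min/Max 18/55)" yields raw == 29.
    """
    attributes = {}
    for line in report.splitlines():
        match = ROW.match(line)
        if match:
            attr_id, name, value, worst, thresh, raw = match.groups()
            attributes[name] = {
                "id": int(attr_id),
                "value": int(value),
                "worst": int(worst),
                "thresh": int(thresh),
                "raw": int(raw),
            }
    return attributes


if __name__ == "__main__":
    report = """\
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       164
"""
    attrs = parse_smart_attributes(report)
    print("pending:", attrs["Current_Pending_Sector"]["raw"])
    print("re-allocated:", attrs["Reallocated_Sector_Ct"]["raw"])
```

Running the same function on reports saved on different days makes it easy to see whether the pending count keeps growing.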
Kode Posted May 3, 2010 (Author)

Thanks Joe. I have just taken the disk out of the array and am performing a preclear on it now. Will see what it comes up with.
Kode Posted May 4, 2010 (Author)

S.M.A.R.T. error count differences detected after pre-clear
note, some 'raw' values may change, but not be an indication of a problem
54c54
< 1 Raw_Read_Error_Rate 0x002f 100 100 051 Pre-fail Always - 2179
---
> 1 Raw_Read_Error_Rate 0x002f 100 100 051 Pre-fail Always - 5286
64c64
< 191 G-Sense_Error_Rate 0x0022 100 100 000 Old_age Always - 4
---
> 191 G-Sense_Error_Rate 0x0022 100 100 000 Old_age Always - 5
68c68
< 197 Current_Pending_Sector 0x0032 099 099 000 Old_age Always - 250
---
> 197 Current_Pending_Sector 0x0032 100 099 000 Old_age Always - 1
71c71
< 200 Multi_Zone_Error_Rate 0x002a 100 100 000 Old_age Always - 5
---
> 200 Multi_Zone_Error_Rate 0x002a 100 100 000 Old_age Always - 12
============================================================================
Joe L. Posted May 4, 2010

It appears that the sectors that were pending re-allocation were able to be re-written in their existing locations, since the pending count went down but the re-allocated count did not go up. Probably a good sign.

There is still one sector pending re-allocation. That probably showed itself in the post-read.

Reading into what is happening, my best guess is that the disk is having some difficulty writing to the platters. When re-writing to the same sector, it succeeds.

If you have time, run through another pre-clear cycle. If not, just keep an eye on it once you put it into service.

Joe L.
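The reasoning above (pending count down, re-allocated count not up) can be written out as a small check over the pre-clear diff. This is only a Python 3 sketch under the assumption that the report uses the "<" / ">" row layout shown in the post above; `raw_changes` and `describe` are hypothetical names, and the wording of the returned messages is mine, not the pre-clear script's.

```python
def raw_changes(diff_text):
    """Map attribute name -> (raw before, raw after) from '<' and '>' rows."""
    before, after = {}, {}
    for line in diff_text.splitlines():
        fields = line.split()
        # Attribute rows look like:
        # < 197 Current_Pending_Sector 0x0032 099 099 000 Old_age Always - 250
        if (len(fields) >= 11 and fields[0] in ("<", ">")
                and fields[1].isdigit() and fields[10].isdigit()):
            target = before if fields[0] == "<" else after
            target[fields[2]] = int(fields[10])
    return {name: (before[name], after[name]) for name in before if name in after}


def describe(changes):
    """Summarise what happened to pending sectors during the pre-clear."""
    pending = changes.get("Current_Pending_Sector")
    # Attributes that did not change are absent from the diff entirely.
    realloc = changes.get("Reallocated_Sector_Ct")
    if realloc is not None and realloc[1] > realloc[0]:
        return "sectors were re-allocated"
    if pending is None:
        return "pending count unchanged"
    if pending[1] < pending[0]:
        return "pending sectors cleared without re-allocation"
    return "pending count did not drop"


if __name__ == "__main__":
    diff = """\
68c68
< 197 Current_Pending_Sector 0x0032 099 099 000 Old_age Always - 250
---
> 197 Current_Pending_Sector 0x0032 100 099 000 Old_age Always - 1
"""
    changes = raw_changes(diff)
    print(changes)
    print(describe(changes))
```

The absence of Reallocated_Sector_Ct from the diff is what the sketch treats as "did not go up", which matches how the conclusion was drawn in the reply above.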
Kode Posted May 4, 2010 (Author)

I'm in no rush. I have attached the full syslog as well, and I will do another preclear. Are the raw read error rates not a cause for concern? They went from 2179 to 5286.

Attachment: syslog-2010-05-04.txt
Kode Posted May 5, 2010 (Author)

OK, I have completed another preclear. The results seem a little better, though Raw_Read_Error_Rate has gone up again, but not by much this time.

S.M.A.R.T. error count differences detected after pre-clear
note, some 'raw' values may change, but not be an indication of a problem
54c54
< 1 Raw_Read_Error_Rate 0x002f 100 100 051 Pre-fail Always - 5286
---
> 1 Raw_Read_Error_Rate 0x002f 100 100 051 Pre-fail Always - 5485
71c71
< 200 Multi_Zone_Error_Rate 0x002a 100 100 000 Old_age Always - 12
---
> 200 Multi_Zone_Error_Rate 0x002a 100 100 000 Old_age Always - 15

Attachment: syslog-2010-05-05.txt
Joe L. Posted May 5, 2010

Quote (Kode): Are the raw read error rates not a cause for concern? They went from 2179 to 5286.

If they were a concern, you'd see the "normalized" column for that parameter change. It has not changed at all. The raw values only have meaning to the manufacturer.

Joe L.
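The "normalized" comparison can be spelled out. As I understand the smartctl documentation, an attribute counts as failed when its normalized VALUE is less than or equal to THRESH, and as having failed in the past when WORST is; the raw number plays no part. A minimal Python 3 sketch of that rule follows; the function name `when_failed` is made up, and the return strings simply mirror what the WHEN_FAILED column shows.

```python
def when_failed(value, worst, thresh):
    """Apply the normalized-value rule to one SMART attribute row.

    value, worst, thresh are the VALUE, WORST and THRESH columns as integers.
    The RAW_VALUE column is intentionally not an input.
    """
    if value <= thresh:
        return "FAILING_NOW"
    if worst <= thresh:
        return "In_the_past"
    return "-"


if __name__ == "__main__":
    # Raw_Read_Error_Rate from this thread: VALUE 100, WORST 100, THRESH 051.
    # The raw count rose from 2179 to 5286, but these three columns did not move.
    print(when_failed(100, 100, 51))
```

So long as VALUE and WORST stay well above THRESH, the row keeps showing "-" no matter how the raw figure moves.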
Kode Posted May 6, 2010 (Author)

I did a parity check last night. sda is still showing read errors, 2845 this time, which is down on the 8000-odd last time. I clicked on the short smart test in unRAID, then after 2 minutes clicked the smart status report, and it came up with:

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       6104
  2 Throughput_Performance  0x0026   252   252   000    Old_age   Always       -       0
  3 Spin_Up_Time            0x0023   074   060   025    Pre-fail  Always       -       8028
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       10
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   252   252   051    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0024   252   252   015    Old_age   Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       119
 10 Spin_Retry_Count        0x0032   252   252   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   252   252   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       8
191 G-Sense_Error_Rate      0x0022   100   100   000    Old_age   Always       -       5
192 Power-Off_Retract_Count 0x0022   252   252   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0002   064   045   000    Old_age   Always       -       20 (Lifetime Min/Max 15/55)
195 Hardware_ECC_Recovered  0x003a   100   100   000    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   099   000    Old_age   Always       -       41
198 Offline_Uncorrectable   0x0030   252   252   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0036   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       15
223 Load_Retry_Count        0x0032   252   252   000    Old_age   Always       -       0
225 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       125

Which seems to indicate there are 41 sectors pending. Is this disk dying? Should I try returning the disk? If so, what would I say?
Joe L. Posted May 6, 2010

Quote (Kode): [SMART attribute table from the previous post] Which seems to indicate there are 41 sectors pending. Is this disk dying? Should I try returning the disk? If so, what would I say?
All you need say is that it is randomly unable to re-read what it has previously written to the platters.

You are correct, there are now 41 more sectors pending re-allocation. The most interesting point, though, is that so far none of the sectors have actually been re-allocated.

As I said before, it appears that the sectors that were pending re-allocation are able to be re-written in their existing locations. We can deduce this since the pending count is going down, but the re-allocated count did not go up.

Reading into what is happening, my best guess is that the disk is having some difficulty writing to the platters. When re-writing to the same sector, it succeeds.

One thing you might try is a different POWER connection to the drive. It is possible that the voltage at the drive is not sufficient, or not noise-free enough, for it to properly write to the disk. Put it on a different "rail" on the power supply, or get rid of any splitters in line.

Joe L.
Kode Posted May 6, 2010 (Author)

Hi Joe, from the PSU there are 2 cables with just SATA connectors on them (3 on each cable, IIRC). The drive in question is connected to the middle one. I'd have thought that if there was an issue it would affect the end one as well, or at least occasionally affect the other ones, but in every single test the only drive that has been affected is the middle one. I will double-check the connections though. The PSU is a 700W FSP EPSILON FX700GLN, so the drives shouldn't be overloading it.

Thanks for all your help and advice Joe, it's been a massive help. I might contact Scan (who I bought the HDs from) and tell them what's going on, as I'm not sure how long I have to return them, so it's probably better if I have at least contacted them.
Joe L. Posted May 6, 2010

It may just be more sensitive to voltage fluctuations than the others, or it might just have a defect making it harder for it to read what it has written.

Either way, you've learned this before putting the disk in use for your data, so you are way ahead in the long run. Just think how many disks are in service where we mistakenly blame Microsoft for data corruption instead of the actual hardware.

If the disk is new, and a change to a different power connector does not fix it, then RMA it. You do not want a disk you cannot reliably read.

Joe L.