MSI K8N Diamond - Am I going to be flogging a dead horse?



I'm such an idiot: I highlighted the disk temperature and pressed Ctrl+C. That's the danger of SSH-ing into a Linux box from a Windows laptop. I went into Windows mode, and Ctrl+C obviously quit the process, so I had to start it again for sda. Anyway, the temps are looking better.

 

Disk 1: Disk Temperature: 31C, Elapsed Time:  2:15:45 (Just before ctrl + c)

Disk 2: Disk Temperature: 30C, Elapsed Time:  2:16:38

Disk 3: Disk Temperature: 31C, Elapsed Time:  2:15:34

Those are far more normal looking...

 

You'll just test them about 2 1/4 hours longer...  (and you've heat-tested them too :-))

 

Many years ago (roughly 1973/4) I was involved in heat-testing a telephone company electronic switching system.  We purposely shut down the air-conditioning to the room the "computer" was in by blocking the air vents.   The computer was all discrete components: individual transistor and diode logic.  Its design pre-dated ICs, motherboards, and hard disks.   The "computer" was built in 7-foot-tall racks of equipment in a room about 100 feet square.   16k of RAM was 4 feet wide and 7 feet tall and gave off over 2000 watts of heat.  (So much for being green.)  It was 47 bits wide: two 20-bit half-words, with 7 bits of Hamming code and parity.  The equivalent today would be ECC RAM.  Not too bad for 36 years ago.   We had 32 banks of memory, so it alone gave off 64,000 watts of heat.   Ouch... and lots of other equipment in the room gave off heat too.

 

When the air-conditioning was shut off the temperature rose about a degree a minute until it leveled off at nearly 130 degrees near the top of the equipment racks.   Our job was to trouble-shoot and fix the temperature sensitive components as the test progressed.  I think we kept it at the elevated temperature for about 24 hours.  It was done before we brought the system on-line.    It was necessary since in a power failure it was expected to keep running on the central-office batteries but the air-conditioning would not be running.   We wanted to be certain it would work in any extended outage.

 

Some years later, in 1977, when the entire New York area had a power outage it was put through its paces.   It stayed running, even as the batteries in the basement of the phone-company building slowly were discharged, and the emergency turbine generators on the roof ran out of fuel.   Fortunately, power came back online...   My old system was hobbling along... the combination of high temperatures and low voltage caused some outages, but we must have done well with our initial temperature tests... as it did not stop running.

 

You've just temperature tested your hard-disks... Hopefully it will be the last time.  It is why many of us have temperature initiated warnings in place as add-ons.
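
A temperature warning of that kind comes down to reading each disk's temperature and comparing it against a limit. Here is a minimal sketch in Python, assuming status lines in the "Disk Temperature: 31C" form posted above. The function name `hot_disks` and the 40C default limit are made up for illustration and are not taken from any particular add-on.

```python
import re
from typing import List, Tuple

# Matches the "Disk Temperature: 31C" fragment of a status line.
TEMP_RE = re.compile(r"Disk Temperature:\s*(\d+)C")


def hot_disks(status_lines: List[str], limit_c: int = 40) -> List[Tuple[str, int]]:
    """Return (line, temperature) for every status line at or above limit_c.

    limit_c is a hypothetical threshold; choose one that suits your drives.
    """
    hot = []
    for line in status_lines:
        match = TEMP_RE.search(line)
        if match and int(match.group(1)) >= limit_c:
            hot.append((line.strip(), int(match.group(1))))
    return hot


if __name__ == "__main__":
    sample = [
        "Disk 1: Disk Temperature: 31C, Elapsed Time:  2:15:45",
        "Disk 2: Disk Temperature: 30C, Elapsed Time:  2:16:38",
    ]
    for line, temp in hot_disks(sample, limit_c=31):
        print(f"WARNING ({temp}C): {line}")
```

A real add-on would send an e-mail instead of printing, but the comparison is the same.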

 

Joe L.

 

Link to comment

I just looked in the package manager in unMENU. Is that unraid-status-mail and ssmtp?

Link to comment

Yes, that is what I use.  I just updated the status e-mail to not attempt to read the temperature of the flash drive, so you might want to use the "Check for Updates" button to get the newest version.

 

Joe L.

Link to comment

Is the output of preclear saved anywhere?  I wanted to check the output on the finished drives, but I think the cable I had running from downstairs (where the server is) to upstairs (where the router is) was dodgy, so I moved the server upstairs.

I want to test for the nForce 4 data corruption issues, but I don't want to transfer the whole 3TB I have to do it, so what's going to be the best way?

Assign the parity drive and both data drives, press Start to bring the array online and do a parity sync, copy several multi-GB files using TeraCopy to check they are being copied properly, then run a couple of parity checks?

Or assign just the data drives, then assign the parity when the copies are done?

If/when I'm happy with the situation, would I just delete the files I've put on there, unassign parity, copy the data over, stop the array, then assign parity and start the array again?

Link to comment

I would test for the corruption with the parity disk and as many data disks as possible assigned; otherwise, the test is not a valid one.

 

Therefore, unless you absolutely need to perform the transfer in the shortest time possible,

Link to comment

OK, I'll assign all the drives then (1 parity, 2 data). The only reason I asked was that, reading the NVIDIA forums, it seems the corruption occurred when people were copying gigabyte-sized files.

When I eventually transfer my 3TB of files, what would be the best way to do this?  Find 1.5TB of files and move those to the first disk share, then the remainder to the second, or set up user shares?

Link to comment

True, but if it is caused by noise on the motherboard, I did not want you to be fooled into thinking everything works because writing to a single disk works (and that is what occurs when you do not have a parity disk assigned).

 

As far as how to organize your data, all I can say is I'd set up two or three high-level folders on one disk.  For myself I used:

Movies
Music
Pictures
data

 

I then moved my files to directories under those.   Perhaps something like those will work for you.  If you create those same directories on each of your disk shares they will be merged when you enable user-shares and you'll see all the movies as a single share.
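
To picture the merging described above, here is a toy model in Python. It only illustrates the idea that same-named top-level folders on different disks show up as one share. The function name `merge_user_shares` and the sample file names are invented for the example, and this is not how unRAID implements user shares internally.

```python
from typing import Dict, List


def merge_user_shares(disks: Dict[str, Dict[str, List[str]]]) -> Dict[str, List[str]]:
    """Combine same-named top-level folders from every disk into one listing per share."""
    shares: Dict[str, List[str]] = {}
    for disk_name in sorted(disks):
        for folder, files in disks[disk_name].items():
            shares.setdefault(folder, []).extend(files)
    # Sort each listing so the merged view does not depend on disk order.
    return {name: sorted(files) for name, files in shares.items()}


if __name__ == "__main__":
    # Hypothetical contents of two disk shares.
    disks = {
        "disk1": {"Movies": ["alien.iso"], "Music": ["album1"]},
        "disk2": {"Movies": ["brazil.iso"], "Pictures": ["holiday"]},
    }
    for share, files in merge_user_shares(disks).items():
        print(share, files)
```

Here "Movies" exists on both disks, so the merged view lists both films under a single "Movies" share.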

 

I almost never use user-shares when writing files to the array.  I usually have them configured as read-only.   I leave the disk-shares as hidden-writable.  That way they don't show in my network media players but I can use them by entering their full path in windows-explorer.

 

Joe L.

 

Link to comment

Great advice, thank you :)

 

I still have 270 minutes left on the parity sync, so I probably won't try writing anything to the array until tomorrow, considering it's nearly 9pm.

You never said whether the preclear results were saved anywhere, but to be honest I don't think there was anything wrong with the disks. The one I accidentally cancelled after 2 hours had slightly different results from the other two, but nothing too bad I think; there were no reallocated sectors that I saw.

 

Thanks for all the help

Link to comment

The pre-clear results are not saved between boots.

 

The pre-clear results are saved to the syslog.  The individual "smart" reports are in /tmp but it too is wiped away every time you reboot.

 

Just because you are currently doing an initial parity calc there is absolutely no reason you cannot start loading data.  Parity is maintained... It will slow the parity calc a tiny bit, since the disk heads will have to seek between the two operations, but the time it takes to complete probably won't matter if you are copying the files to the server overnight while you sleep.

 

Joe L.

Link to comment

I copied a few things using TeraCopy: a 4GB ISO file, a couple of CD ISO files, a season and a half of a TV series, and a few other things, some to disk 1 and some to disk 2. All the items pass their CRC checks.  However, when I refreshed the main page of the unRAID web interface to see how long the parity sync had left, I noticed this. Is this normal?

 

         Model / Serial No.               Temperature  Size           Free           Reads      Writes     Errors
parity   SAMSUNG_HD153WI_S1UVJ1LZ302796   28°C         1,465,138,552  -              116        2,654,246  0
disk1    SAMSUNG_HD153WI_S1UVJ1LZ302798   28°C         1,465,138,552  1,458,180,612  2,494,655  52,312     3,707
disk2    SAMSUNG_HD153WI_S1UVJ1LZ302800   27°C         1,465,138,552  1,451,676,524  2,490,429  90,809     0

 

In particular the 3,707 errors on disk1

Link to comment

Those are "read" errors.  You should get a smart report on that drive.  It might be failing, or it might just be a bad or loose cable.

 

If nothing else, post a syslog.

 

If you install unMENU, it will make both of those tasks much easier.  http://code.google.com/p/unraid-unmenu/

It is described here: http://lime-technology.com/forum/index.php?topic=2595.0

It has a disk management page that can run smart reports on the disk drives by just clicking on a button, and a system log page that will allow you to easily view and/or download a system log for attachment.

 

Joe L.

Link to comment

I will run SMART after the parity sync has finished; I will attach a syslog though.

The syslog was too big, so I have uploaded it here: http://www.lockstockmods.net/unraid/syslog-2010-05-02.txt

No need to wait to run the smart report.  You can do it at any time. 

 

The errors are all like this:

May  2 21:39:01 Tower kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
May  2 21:39:01 Tower kernel: ata3.00: BMDMA stat 0x24
May  2 21:39:01 Tower kernel: ata3.00: failed command: READ DMA EXT
May  2 21:39:01 Tower kernel: ata3.00: cmd 25/00:00:c7:fd:1b/00:04:68:00:00/e0 tag 0 dma 524288 in
May  2 21:39:01 Tower kernel:          res 51/40:88:3f:00:1c/40:01:68:00:00/e0 Emask 0x9 (media error)
May  2 21:39:01 Tower kernel: ata3.00: status: { DRDY ERR }
May  2 21:39:01 Tower kernel: ata3.00: error: { UNC }
May  2 21:39:01 Tower kernel: ata3.00: configured for UDMA/133
May  2 21:39:01 Tower kernel: ata3: EH complete

 

Media errors are usually indications of unreadable sectors on the disk.  You'll probably see sectors pending re-allocation, and sectors already re-allocated in the smart report.
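
If the syslog is long, counting these by hand gets tedious. The sketch below, in Python, tallies "(media error)" results per ATA device, assuming the log lines look like the excerpt above. The function name `count_media_errors` is made up for the example. Because the "res ... (media error)" line does not name the device itself, the sketch attributes it to the most recently seen ataN.NN line, which could misattribute errors if messages from several devices are interleaved.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Matches kernel lines that name a device, e.g. "kernel: ata3.00: failed command: ..."
DEVICE_RE = re.compile(r"kernel: (ata\d+\.\d+):")


def count_media_errors(syslog_lines: Iterable[str]) -> Counter:
    """Count '(media error)' result lines, keyed by the last ATA device seen."""
    counts: Counter = Counter()
    current: Optional[str] = None
    for line in syslog_lines:
        match = DEVICE_RE.search(line)
        if match:
            current = match.group(1)
        elif "(media error)" in line and current is not None:
            counts[current] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (path is whatever syslog copy you have): python count_errors.py syslog.txt
    with open(sys.argv[1], errors="replace") as handle:
        for device, total in count_media_errors(handle).items():
            print(device, total)
```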

 

The command line command would be:

smartctl -a -d ata /dev/sda

 

I see from your syslog you've already installed unMENU.  Just run the smart status report for disk1 from the disk-management page.

 

Joe L.

Link to comment

A short smart returned this:

 

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       1640
  2 Throughput_Performance  0x0026   252   252   000    Old_age   Always       -       0
  3 Spin_Up_Time            0x0023   074   060   025    Pre-fail  Always       -       8028
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       8
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   252   252   051    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0024   252   252   015    Old_age   Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       51
 10 Spin_Retry_Count        0x0032   252   252   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   252   252   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       8
191 G-Sense_Error_Rate      0x0022   100   100   000    Old_age   Always       -       4
192 Power-Off_Retract_Count 0x0022   252   252   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0002   064   045   000    Old_age   Always       -       29 (Lifetime Min/Max 18/55)
195 Hardware_ECC_Recovered  0x003a   100   100   000    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       164
198 Offline_Uncorrectable   0x0030   252   252   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0036   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       5
223 Load_Retry_Count        0x0032   252   252   000    Old_age   Always       -       0
225 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       77

SMART Error Log Version: 1
No Errors Logged

 

Should I run a long one as well?

Link to comment

That is not the output of the "short" test.  But, it is the section I was interested in.  There are 164 sectors pending re-allocation.

 

They will not be re-allocated until they are written to, since the disk has no way to know what they should contain.
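
To pull those two numbers out of a report without reading the whole table, something like the following Python sketch would do. It assumes the attribute rows are laid out as in the smartctl output posted above (ID, name, flag, value, worst, thresh, type, updated, when-failed, raw value). The function name `parse_raw_values` is made up for the example.

```python
from typing import Dict


def parse_raw_values(smart_report: str) -> Dict[str, int]:
    """Map each attribute name to the leading integer of its RAW_VALUE column."""
    raw: Dict[str, int] = {}
    for line in smart_report.splitlines():
        fields = line.split()
        # Attribute rows: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            raw[fields[1]] = int(fields[9])
    return raw


if __name__ == "__main__":
    import sys

    # Usage: smartctl -a -d ata /dev/sda | python parse_smart.py
    values = parse_raw_values(sys.stdin.read())
    print("pending:    ", values.get("Current_Pending_Sector"))
    print("reallocated:", values.get("Reallocated_Sector_Ct"))
```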

 

The section of the smart report dealing with long (extended) and short tests is below the section with the parameters you posted.

 

You'll need to disable any spin-down timer to run the long test, as it takes 3 to 5 hours on a large disk.  The spin-down would cause it to abort.  Both the "long" and "short" tests are just requests to initiate the test.  The results are visible in a subsequent "status" report when you request one after the required interval.

 

A "short" test will attempt to read a small number of sectors.  It typically takes less than 5 minutes.  A "long" test will attempt to read all the sectors on the disk.  It typically takes many hours.  Either type of test can be run at any time (as long as you don't spin down the disk... since it is really hard for the test to continue with the disk not spinning)
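
From the command line, the request-then-report pattern looks roughly like the Python sketch below. It uses smartctl's `-t short` and `-t long` options to request a test and `-a` to print the report afterwards; the `-d ata` option and `/dev/sda` simply mirror the command quoted earlier and may need changing for your controller and disk. The helper names `smartctl_command` and `run` are made up for the example.

```python
import subprocess
from typing import List


def smartctl_command(action: str, device: str) -> List[str]:
    """Build a smartctl command line for 'short', 'long', or 'report'."""
    options = {
        "short": ["-t", "short"],  # request a short self-test
        "long": ["-t", "long"],    # request an extended (long) self-test
        "report": ["-a"],          # print the full report, including self-test results
    }
    if action not in options:
        raise ValueError(f"unknown action: {action}")
    return ["smartctl", *options[action], "-d", "ata", device]


def run(action: str, device: str) -> str:
    """Run the command and return its output (needs smartctl installed and root access)."""
    result = subprocess.run(
        smartctl_command(action, device), capture_output=True, text=True, check=False
    )
    return result.stdout


if __name__ == "__main__":
    # Request the short test now; ask for the report a few minutes later.
    print(" ".join(smartctl_command("short", "/dev/sda")))
    print(" ".join(smartctl_command("report", "/dev/sda")))
```

The test request returns straight away; the outcome only shows up when you ask for the report again after the test has had time to finish.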

 

If there is nothing important on the disk, and you don't mind deleting the files on it, you can un-assign the disk from the array and then run the preclear_disk.sh script on it.  It will completely exercise the disk pre-reading it, writing it with all zeros, then post-reading it.  It also does a pre and post smart report compare to let you see if it found more sectors being re-allocated.

 

I'd not use that disk unless those are the only errors and they do not continue to increase over time.  Somehow, I doubt they'll be the only ones, since you've just begun to use the disk.

 

Joe L.

 

Link to comment

S.M.A.R.T. error count differences detected after pre-clear
note, some 'raw' values may change, but not be an indication of a problem
54c54
<   1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       2179
---
>   1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       5286
64c64
< 191 G-Sense_Error_Rate      0x0022   100   100   000    Old_age   Always       -       4
---
> 191 G-Sense_Error_Rate      0x0022   100   100   000    Old_age   Always       -       5
68c68
< 197 Current_Pending_Sector  0x0032   099   099   000    Old_age   Always       -       250
---
> 197 Current_Pending_Sector  0x0032   100   099   000    Old_age   Always       -       1
71c71
< 200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       5
---
> 200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       12
============================================================================

 

Link to comment

It appears as if the sectors that were pending re-allocation were able to be re-written in their existing locations, since the pending count went down but the re-allocated count did not go up.  Probably a good sign.
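
That reasoning can be written down as a small rule of thumb. The Python sketch below compares the raw Current_Pending_Sector and Reallocated_Sector_Ct counts before and after a pre-clear. The function name `interpret_change` and the wording of the verdicts are made up for the example, and it is a rough reading of two counters, not a diagnosis.

```python
from typing import Dict


def interpret_change(before: Dict[str, int], after: Dict[str, int]) -> str:
    """Describe what a change in pending and re-allocated sector counts suggests."""
    pending_delta = after["Current_Pending_Sector"] - before["Current_Pending_Sector"]
    realloc_delta = after["Reallocated_Sector_Ct"] - before["Reallocated_Sector_Ct"]
    if realloc_delta > 0:
        return "sectors were re-allocated to spare locations"
    if pending_delta < 0:
        return "pending sectors were re-written in their existing locations"
    if pending_delta > 0:
        return "more sectors are now pending re-allocation"
    return "no change"


if __name__ == "__main__":
    # Raw counts from the pre-clear comparison above; Reallocated_Sector_Ct stayed at 0.
    before = {"Current_Pending_Sector": 250, "Reallocated_Sector_Ct": 0}
    after = {"Current_Pending_Sector": 1, "Reallocated_Sector_Ct": 0}
    print(interpret_change(before, after))
```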

 

There is still one sector pending re-allocation.    That probably showed itself in the post-read.

 

Reading into what is happening, my best guess is that the disk is having some difficulty writing to the platters.  When re-writing to the same sector, it succeeds.

 

If you have time, run through another pre-clear cycle.  If not, just keep an eye on it once you put it into service.

 

Joe L.

Link to comment

OK, I have completed another preclear. The results seem a little better, though Raw_Read_Error_Rate has gone up again, but not by much this time.

 

S.M.A.R.T. error count differences detected after pre-clear
note, some 'raw' values may change, but not be an indication of a problem
54c54
<   1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       5286
---
>   1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       5485
71c71
< 200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       12
---
> 200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       15

syslog-2010-05-05.txt

Link to comment

I'm in no rush, and I have attached the full syslog as well. I will do another preclear. Are the raw read error rates not a cause for concern? They went from 2179 to 5286.

If they were of concern, you'd see the "normalized" column for that parameter change.  It has not changed at all.

 

The Raw values only have meaning to the manufacturer.
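
Put another way, the number to watch is the normalized VALUE compared with THRESH, not the raw count. Below is a minimal Python sketch of that comparison, assuming rows laid out like the smartctl tables in this thread and assuming the usual convention that an attribute is failing once VALUE drops to or below a non-zero THRESH. The function name `attribute_failing` is made up for the example.

```python
def attribute_failing(row: str) -> bool:
    """Return True when the normalized VALUE is at or below a non-zero THRESH.

    Expects one attribute row: ID# NAME FLAG VALUE WORST THRESH TYPE ...
    """
    fields = row.split()
    value, thresh = int(fields[3]), int(fields[5])
    # A threshold of 000 means the attribute is informational only.
    return thresh > 0 and value <= thresh


if __name__ == "__main__":
    row = "  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       5286"
    # The raw count rose to 5286, but VALUE is still 100 against a threshold of 051.
    print(attribute_failing(row))
```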

 

Joe L.

Link to comment

I did a parity check last night and sda is still showing read errors: 2845 this time, which is down on the 8000-odd last time. I clicked on the short SMART test in unRAID, then after 2 minutes clicked the SMART status report, and it came up with:

 

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       6104
  2 Throughput_Performance  0x0026   252   252   000    Old_age   Always       -       0
  3 Spin_Up_Time            0x0023   074   060   025    Pre-fail  Always       -       8028
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       10
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   252   252   051    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0024   252   252   015    Old_age   Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       119
 10 Spin_Retry_Count        0x0032   252   252   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   252   252   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       8
191 G-Sense_Error_Rate      0x0022   100   100   000    Old_age   Always       -       5
192 Power-Off_Retract_Count 0x0022   252   252   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0002   064   045   000    Old_age   Always       -       20 (Lifetime Min/Max 15/55)
195 Hardware_ECC_Recovered  0x003a   100   100   000    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   099   000    Old_age   Always       -       41
198 Offline_Uncorrectable   0x0030   252   252   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0036   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       15
223 Load_Retry_Count        0x0032   252   252   000    Old_age   Always       -       0
225 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       125

 

Which seems to indicate there are 41 sectors pending.

 

Is this disk dying?

 

Should I try returning the disk?  If so, what would I say?

Link to comment

All you need say is it is randomly unable to re-read what it has previously written to the platters.

 

You are correct, there are now 41 more sectors pending re-allocation.

 

The most interesting point though, is so far none of the sectors have actually been re-allocated.  As I said before, it appears as if the sectors that were pending re-allocation are able to be re-written in their existing locations. We can deduce this since the pending count is going down, but the re-allocated count did not go up.

 

Reading into what is happening, my best guess is that the disk is having some difficulty writing to the platters.  When re-writing to the same sector, it succeeds.

 

One thing you might try is a different POWER connection to the drive.  It is possible that the voltage on the drive is not sufficient or noise-free enough for it to properly write to the disk.  Put it on a different "rail" on the power supply, or get rid of any splitters in line.

 

Joe L.

Link to comment

Hi Joe, from the PSU there are 2 cables with just SATA connectors on them (3 on each cable, IIRC). The drive in question is connected to the middle one. I'd have thought that if there was a power issue, it would affect the end one as well, or at least occasionally affect the others, but in every single test the only drive that has been affected is the middle one.  I will double-check the connections though. The PSU is a 700W FSP EPSILON FX700GLN, so the drives shouldn't be overloading it.  Thanks for all your help and advice Joe, it's been a massive help.  I might contact Scan (who I bought the HDs from) and tell them what's going on, as I'm not sure how long I have to return them, so it would probably be better if I have at least contacted them.

Link to comment

It may just be more sensitive to voltage fluctuations than the others, or, it might just have a defect making it harder for it to read what it has written.  Either way, you've learned this before putting the disk in use for your data, so you are way ahead in the long run.

 

Just think how many disks are in service where we mistakenly blame Microsoft for data corruption instead of the actual hardware.  :D

 

If the disk is new, and a change to a different power connector does not do it, then RMA it.  You do not want a disk you cannot reliably read.

 

Joe L.

Link to comment
