Preclear.sh results - Questions about your results? Post them here.


Well I am by no means an expert but there are lots of I/O errors in there and it looks to me like at some point unRAID had to reset the link to the drive because of this.  Here are some of the lines from the report ...

 

Oct 29 02:47:09 Titan kernel: ata5.00: failed command: CHECK POWER MODE
Oct 29 02:47:09 Titan kernel: ata5.00: cmd e5/00:00:00:00:00/00:00:00:00:00/40 tag 0
Oct 29 02:47:09 Titan kernel:          res 40/00:00:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
Oct 29 02:47:09 Titan kernel: ata5.00: status: { DRDY }
Oct 29 02:47:09 Titan kernel: ata5: hard resetting link

 

Maybe you have a bad power connection and/or SATA cable.  Check the connections and cables.  I'd recommend swapping them for a known-good power connector and SATA cable, perhaps ones currently connected to another drive that is working fine.  Then try the preclear again.
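
For anyone who wants to scan a whole syslog for this pattern rather than eyeballing it, here is a minimal Python sketch. It only counts the three message types quoted above (a failed command, an `Emask 0x4 (timeout)` result, and a hard link reset). The regular expressions, the category labels, and the `summarize_ata_events` name are my own assumptions and are not part of preclear or unRAID.

```python
import re
from collections import Counter

# Patterns for the libata messages quoted above.
# The category names on the left are illustrative labels, not kernel terms.
PATTERNS = {
    "failed_command": re.compile(r"kernel: (ata\d+)(?:\.\d+)?: failed command: (.+)$"),
    "timeout": re.compile(r"Emask 0x4 \(timeout\)"),
    "hard_reset": re.compile(r"kernel: (ata\d+): hard resetting link"),
}


def summarize_ata_events(lines):
    """Count how many syslog lines match each pattern."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


sample = [
    "Oct 29 02:47:09 Titan kernel: ata5.00: failed command: CHECK POWER MODE",
    "Oct 29 02:47:09 Titan kernel: ata5.00: cmd e5/00:00:00:00:00/00:00:00:00:00/40 tag 0",
    "Oct 29 02:47:09 Titan kernel:          res 40/00:00:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)",
    "Oct 29 02:47:09 Titan kernel: ata5.00: status: { DRDY }",
    "Oct 29 02:47:09 Titan kernel: ata5: hard resetting link",
]

summary = summarize_ata_events(sample)
print(dict(summary))
```

A high count of timeouts and link resets on one port is a hint to look at cabling and power first, but it does not by itself say whether the drive or the connection is at fault.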

Link to comment

I checked the cables just now, even moved it to a different SATA port on the motherboard but unraid still throws errors about this disk when I reboot.

 

I pulled the disk out of the array and put it on my external SATA dock on my Windows 7 machine and tried to check it using CrystalDiskInfo but it didn't even see the disk.  The disk isn't showing up in the BIOS of my Windows PC at boot time either.

 

So all signs point to a bad disk.  Time to start the RMA process with Newegg.

 

Thanks for your help!

 

Seven

Link to comment

Looks like preclear failed on a brand new disk I purchased to keep on hand as a spare drive.  I've attached all of the logfiles and output I could find....  I did not find a smart_finish report in the /tmp directory. 

 

This is a new WD20EARS drive from Newegg and I did install a jumper over pins 7/8 before attaching to my server.

 

Any thoughts or suggestions would be appreciated.  Thanks!

Either that or the SATA or POWER cable to it came loose.

 

If it is the drive, it is good that it failed before you started using it for your data.  FAR easier to RMA before you put it in the array.

 

Joe L.

Link to comment

Seven......

 

I experienced the exact same issue with a brand new WD20EARS with the jumper on. Here is the thread where I asked if someone could help me out: http://lime-technology.com/forum/index.php?topic=8506.0

 

All the errors started showing up about 60% into the post-read phase. The process just hung there and would not proceed, so I stopped it.

 

I really cannot pinpoint what the problem was, because no one chimed in. But from looking at the syslog, it might have been some kind of connection issue, be it data or power.

Anyway, I restarted the preclear and it ran to completion, and the drive was OK as far as I know how to interpret the results.

 

Good luck....

Link to comment

Hey guys,

 

Just completed my first ever unRAID build, specs are in my sig. Quite pleased at the ease of use, and speed at getting this up and running.

 

I ran the preclear script on my 3 WD10EARS (jumpered) drives by opening 3 shells and running the script for each drive on a different shell, before adding them to the array and building parity for the first time. I noticed my third drive (sdc) logged some errors and I shrugged them off at the time as a couple of simple read errors, but now I'm wondering if I should have paid more attention before adding it to the array and letting it rebuild parity (sda).

 

Just wondering if anyone can put my mind at ease? :)

 

I've attached my full syslog as a ZIP in case it's needed or if anyone is just curious.

 

== Disk /dev/sdc has been successfully precleared

==

== Ran 1 preclear-disk cycle

==

== Using :Read block size = 8225280 Bytes

== Last Cycle's Pre Read Time  : 7:01:54 (59 MB/s)

== Last Cycle's Zeroing time  : 7:27:25 (55 MB/s)

== Last Cycle's Post Read Time : 16:58:26 (24 MB/s)

== Last Cycle's Total Time    : 31:28:54

==

== Total Elapsed Time 31:28:54

==

== Disk Start Temperature: 30C

==

== Current Disk Temperature: 35C,

==

============================================================================

S.M.A.R.T. error count differences detected after pre-clear

note, some 'raw' values may change, but not be an indication of a problem

19,20c19,20

< Offline data collection status:  (0x80)^IOffline data collection activity

< ^I^I^I^I^Iwas never started.

---

> Offline data collection status:  (0x84)^IOffline data collection activity

> ^I^I^I^I^Iwas suspended by an interrupting command from host.

54c54

<  1 Raw_Read_Error_Rate    0x002f  200  200  051    Pre-fail  Always      -      0

---

>  1 Raw_Read_Error_Rate    0x002f  200  200  051    Pre-fail  Always      -      2

58c58

<  7 Seek_Error_Rate        0x002e  100  253  000    Old_age  Always      -      0

---

>  7 Seek_Error_Rate        0x002e  200  200  000    Old_age  Always      -      0

63c63

< 193 Load_Cycle_Count        0x0032  200  200  000    Old_age  Always      -      29

---

> 193 Load_Cycle_Count        0x0032  200  200  000    Old_age  Always      -      31

============================================================================

syslog.zip

Link to comment

All drives have raw read errors... Some report them some do not.  The actual number reported is meaningful only to the manufacturer.  Notice that the "Normalized" value of 200 is unchanged and nowhere near the failure threshold of 51.

 

Your drive looks fine.  The other changes were the values changing from the factory initialized 253 value to the starting "normalized" value of 200.

 

Joe L.
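
To make the value-versus-threshold comparison concrete, here is a small Python sketch that splits one attribute row from the report into its columns and shows how far the normalized value sits above the threshold. The helper names are hypothetical. The failure rule used here (normalized value at or below a non-zero threshold) is my understanding of how smartctl flags a failed attribute, so treat it as an assumption rather than the definitive rule.

```python
from typing import NamedTuple


class SmartAttribute(NamedTuple):
    attr_id: int
    name: str
    value: int    # normalized value
    worst: int    # worst normalized value seen
    thresh: int   # failure threshold
    raw: str      # raw value, meaningful mainly to the manufacturer


def parse_attribute_row(row):
    """Parse one row of the attribute table, with or without a diff marker."""
    # Drop a leading "<" or ">" and surrounding spaces, then split on whitespace.
    fields = row.lstrip("<> ").split()
    # Columns: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW...
    return SmartAttribute(
        attr_id=int(fields[0]),
        name=fields[1],
        value=int(fields[3]),
        worst=int(fields[4]),
        thresh=int(fields[5]),
        raw=" ".join(fields[9:]),
    )


def is_failing(attr):
    """Assumed rule: a non-zero threshold has been reached or crossed."""
    return attr.thresh > 0 and attr.value <= attr.thresh


row = ">   1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       2"
attr = parse_attribute_row(row)
print(attr.name, "margin above threshold:", attr.value - attr.thresh)
print("failing:", is_failing(attr))
```

For the row in question the raw count went from 0 to 2, but the normalized value is still 200 against a threshold of 51, which is the point made above.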

Link to comment

Disk Temperature: 28C, Elapsed Time:  18:11:33

============================================================================

==

== Disk /dev/sdc has been successfully precleared

==

============================================================================

S.M.A.R.T. error count differences detected after pre-clear

note, some 'raw' values may change, but not be an indication of a problem

71c71

< 190 Airflow_Temperature_Cel 0x0022   076   070   000    Old_age   Always       -       24 (Lifetime Min/Max 24/24)

---

> 190 Airflow_Temperature_Cel 0x0022   072   070   000    Old_age   Always       -       28 (Lifetime Min/Max 24/30)

78c78

< 201 Soft_Read_Error_Rate    0x000a   253   253   000    Old_age   Always       -       0

---

> 201 Soft_Read_Error_Rate    0x000a   100   100   000    Old_age   Always       -       0

============================================================================

 

 

This is the output for one of my drives, and I am not sure what I am looking for. I have read some of the preclear thread, but it's very involved.

Is there any guide to what comes out of this script and what to look for?

I hope it's not somewhere obvious, because I did look.

 

 

thanks

 

 

 

 

 

Hi Joe,

 

The preclear_disk.sh script finally completed one cycle and below is the result. I suppose it is OK, right?

 

Thanks,

--Tom

 

 

 

===========================================================================

=                unRAID server Pre-Clear disk /dev/sdj

=                      cycle 1 of 1

= Disk Pre-Clear-Read completed                                DONE

= Step 1 of 10 - Copying zeros to first 2048k bytes            DONE

= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE

= Step 3 of 10 - Disk is now cleared from MBR onward.          DONE

= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4      DONE

= Step 5 of 10 - Clearing MBR code area                        DONE

= Step 6 of 10 - Setting MBR signature bytes                    DONE

= Step 7 of 10 - Setting partition 1 to precleared state        DONE

= Step 8 of 10 - Notifying kernel we changed the partitioning  DONE

= Step 9 of 10 - Creating the /dev/disk/by* entries            DONE

= Step 10 of 10 - Testing if the clear has been successful.    DONE

= Disk Post-Clear-Read completed                                DONE

Elapsed Time:  14:36:38

============================================================================

==

== Disk /dev/sdj has been successfully precleared

==

============================================================================

S.M.A.R.T. error count differences detected after pre-clear

note, some 'raw' values may change, but not be an indication of a problem

71c71

< 190 Airflow_Temperature_Cel 0x0022  075  075  000    Old_age  Always      -      25 (Lifetime Min/Max 25/26)

---

> 190 Airflow_Temperature_Cel 0x0022  072  072  000    Old_age  Always      -      28 (Lifetime Min/Max 25/28)

77c77

< 200 Multi_Zone_Error_Rate  0x000a  253  253  000    Old_age  Always      -      0

---

> 200 Multi_Zone_Error_Rate  0x000a  100  100  000    Old_age  Always      -      0

============================================================================

 

After 14+ hours your disk temperature went from 25C to 28C.    I'd say that in itself is not too serious  ;)  but it does say you have some serious fans. ;)

 

The S.M.A.R.T. wiki here indicates that attribute 200 is

200 C8 Write Error Rate / Multi-Zone Error Rate

The total number of errors when writing a sector.

You started with the default initialized value of 253, and after a full 14+ hour pre-clear cycle, it has a normalized value of 100.  The failure threshold is 0.  You are nowhere close to the failure threshold value, so unless it changes over time, you are fine there too.

 

Joe L.
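
As a rough guide to reading the "differences detected" section, the Python sketch below pairs each `<` row (before the preclear) with its `>` row (after) by attribute ID, so the normalized value, threshold, and raw value can be compared side by side. The function name and the dictionary layout are my own choices, and the parsing assumes the whitespace-separated smartctl column order seen in the reports in this thread.

```python
def parse_smart_diff(text):
    """Pair '<' (before) and '>' (after) attribute rows by attribute ID."""
    before, after = {}, {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, "71c71" position markers, and "---" separators.
        if not line or line[0] not in "<>":
            continue
        fields = line[1:].split()
        # Skip non-attribute rows such as the offline data collection status.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        target = before if line[0] == "<" else after
        target[int(fields[0])] = {
            "name": fields[1],
            "value": int(fields[3]),
            "thresh": int(fields[5]),
            "raw": fields[9],
        }
    return {i: (before[i], after[i]) for i in before if i in after}


report = """71c71
< 190 Airflow_Temperature_Cel 0x0022  075  075  000    Old_age  Always      -      25 (Lifetime Min/Max 25/26)
---
> 190 Airflow_Temperature_Cel 0x0022  072  072  000    Old_age  Always      -      28 (Lifetime Min/Max 25/28)
77c77
< 200 Multi_Zone_Error_Rate  0x000a  253  253  000    Old_age  Always      -      0
---
> 200 Multi_Zone_Error_Rate  0x000a  100  100  000    Old_age  Always      -      0
"""

changes = parse_smart_diff(report)
for attr_id, (old, new) in sorted(changes.items()):
    print(attr_id, old["name"], "value", old["value"], "->", new["value"],
          "raw", old["raw"], "->", new["raw"], "thresh", new["thresh"])
```

With the report above, attribute 200 shows the normalized value moving from the initial 253 to 100 while the raw count stays at 0, which matches the explanation given.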

 

 

This answered my question...

Sorry if I wasted anyone's time... :-\

Link to comment

Greetings,

Last week I started to get errors on one drive in the array. There were no errors in the parity check, but the drive itself was showing errors - 67 and then 98. I replaced the drive with a new one and the data was rebuilt successfully. I put the questionable drive in a test unRaid server and did a preclear. These are the results:

============================================================================

1 Raw_Read_Error_Rate    0x002f  200  200  051    Pre-fail  Always      -      42

 

1 Raw_Read_Error_Rate    0x002f  200  200  051    Pre-fail  Always      -      78

 

7 Seek_Error_Rate        0x002e  200  200  000    Old_age  Always      -      0

 

7 Seek_Error_Rate        0x002e  100  253  000    Old_age  Always      -      0

 

193 Load_Cycle_Count        0x0032  199  199  000    Old_age  Always      -    5673

 

193 Load_Cycle_Count        0x0032  199  199  000    Old_age  Always      -    5674

 

197 Current_Pending_Sector  0x0032  200  197  000    Old_age  Always      -      10

 

197 Current_Pending_Sector  0x0032  200  197  000    Old_age  Always      -      1

============================================================================

Do I have any reason to be concerned?

Cheers

Link to comment

Probably... I would never expect to see a "Current Pending Sector" in the post-clear smart report.  That's because they should all have been identified in the pre-read phase, and re-allocated in the writing of zeros.  The post-zeroing phase should therefore not have detected any additional un-readable sectors, yet it appears it has.

 

I'd run it through another cycle or two, and see if that last sector pending re-allocation gets re-allocated and no others show themselves.

 

Sorry to say, but a continual trickle of un-readable sectors is not an indication of a healthy drive.  If you only have a few, and the number does not increase as you continue to use the drive, then you should be OK.  Otherwise, you are asking for constant random read errors.  All un-readable sectors should re-allocate themselves when you preclear a drive, as it writes to every sector on the disk.

 

 

Link to comment

Thanks for the feedback, Joe. It's a relatively new WD 1.5TB EARS (jumpered). Sounds like I should start an RMA. I certainly wouldn't feel comfortable putting it back in the array.

OK, your choice, but they will not consider it failed... Most disks have several thousand spare sectors, and you've apparently used NONE of them.

 

Look closer... you had 10 pending re-allocation at the start, and most, if not all were NOT re-allocated, but instead were  re-written in place in their original sectors.  (You did not show any output in the Reallocated Sector counter, so it must not have changed)

 

To me that indicates a different class of problem, one where the writing to the sector was poorly done and when re-written it worked.  That could easily be a vibration issue, or a power supply issue, with noise on the power supply leading to a poor quality written sector.  Of course it could be poor electronics in the drive itself too, but it points less towards defective magnetic surface on the platters.
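
The reasoning above can be written down as a small decision rule. The Python sketch below compares the raw Current_Pending_Sector and Reallocated_Sector_Ct counts before and after a preclear cycle and labels the outcome. The function name and the wording of the labels are hypothetical. The reallocated counts in the example are assumed to be unchanged at 0, since the original post did not show that attribute.

```python
def interpret_pending_change(pending_before, pending_after,
                             realloc_before, realloc_after):
    """Label what happened to pending sectors across one preclear cycle."""
    cleared = pending_before - pending_after
    reallocated = realloc_after - realloc_before
    if cleared < 0:
        # More sectors are waiting for re-allocation than before.
        return "new unreadable sectors appeared"
    if cleared == 0:
        return "no change in pending sectors"
    if reallocated <= 0:
        # Pending count dropped but no spare sectors were consumed.
        return "rewritten in place"
    if reallocated >= cleared:
        return "reallocated to spares"
    return "partly reallocated, partly rewritten in place"


# Pending went from 10 to 1; reallocated count assumed unchanged (not shown in the post).
print(interpret_pending_change(10, 1, 0, 0))
```

A "rewritten in place" result is what points away from a damaged platter surface and toward something that disturbed the original write, such as vibration or power noise.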

Link to comment

I'm having problems preclearing a new disk I got. It starts out fine on step one, but after about 10 minutes it slows to a crawl, from ~100 MB/s down to ~4 MB/s. At that point it also only updates the screen every 5 minutes or so, when I believe it should update every 10 seconds.  I stopped and restarted the process a couple of times and rebooted the server, but it still does the same thing.  This isn't how it should go, is it?  Syslog attached; let me know if there is another log somewhere that will help.

 

Edit: It's been an hour now and only 3% into the pre read.

 

Edit part deux:  Now it's back up to 100 MB/s. Am I just being too impatient?

 

Getting errors now, updated syslog here http://pastebin.ca/1983708

Link to comment

You are getting tons of "media errors"  (Un-readable sectors)

If you get a smart report on the drive you'll see them as sectors pending re-allocation... ( or you can wait, the final preclear report will show them too)

 

Joe L.

Link to comment

I'll just wait, not in a huge hurry to add it and it's the weekend anyway.  Kind of interesting to see what happens with bad hard drives so I can recognize them in the future.  I'll RMA it on Monday. 

 

This is the third hard drive I have acquired for my server, but the first one I have used preclear on.  It's a good thing I did, intended to use this one as a parity drive.

Link to comment

You would not have been happy having that drive in your server. 

 

Yes, most drives preclear just fine... and then there are those, like yours, that show their true colors. 

 

Just think, 99.99% of the people installing drives have no idea at all if they are readable, and most will only find out when their programs and/or data are subsequently unreadable.    Most will blame it on Microsoft...  ;D

 

Joe L.

Link to comment

Indeed, just glad I decided to give it a try.

 

The preclear finally moved on to the writing stage and started putting out a different error. This error kept repeating in the syslog until the file was over a GB in size, and the write was only 7% done, so I had to stop it; I didn't want it to crash the server.

 

Obviously I can't post it all, but it's basically just this repeating over and over, appended to the previous syslog:

 

Nov  6 18:17:39 Tower kernel: Descriptor sense data with sense descriptors (in hex):
Nov  6 18:17:39 Tower kernel:         72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00 
Nov  6 18:17:39 Tower kernel:         05 ef 2f 68 
Nov  6 18:17:39 Tower kernel: sd 1:0:0:0: [sda] ASC=0x0 ASCQ=0x0
Nov  6 18:17:39 Tower kernel: sd 1:0:0:0: [sda] CDB: cdb[0]=0x2a: 2a 00 05 ef 2f 68 00 04 00 00
Nov  6 18:17:39 Tower kernel: end_request: I/O error, dev sda, sector 99561320
Nov  6 18:17:39 Tower kernel: ata1: EH complete
Nov  6 18:17:39 Tower kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Nov  6 18:17:39 Tower kernel: ata1.00: irq_stat 0x40000001
Nov  6 18:17:39 Tower kernel: ata1.00: failed command: WRITE DMA EXT
Nov  6 18:17:39 Tower kernel: ata1.00: cmd 35/00:00:68:33:ef/00:04:05:00:00/e0 tag 0 dma 524288 out
Nov  6 18:17:39 Tower kernel:          res 61/04:00:68:33:ef/4d:04:05:00:00/e0 Emask 0x1 (device error)
Nov  6 18:17:39 Tower kernel: ata1.00: status: { DRDY DF ERR }
Nov  6 18:17:39 Tower kernel: ata1.00: error: { ABRT }
Nov  6 18:17:39 Tower kernel: ata1.00: failed to enable AA(error_mask=0x1)
Nov  6 18:17:39 Tower kernel: ata1.00: failed to enable AA(error_mask=0x1)
Nov  6 18:17:39 Tower kernel: ata1.00: configured for UDMA/133 (device error ignored)
Nov  6 18:17:39 Tower kernel: ata1: EH complete
Nov  6 18:17:39 Tower kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Nov  6 18:17:39 Tower kernel: ata1.00: irq_stat 0x40000001
Nov  6 18:17:39 Tower kernel: ata1.00: failed command: WRITE DMA EXT
Nov  6 18:17:39 Tower kernel: ata1.00: cmd 35/00:00:68:33:ef/00:04:05:00:00/e0 tag 0 dma 524288 out
Nov  6 18:17:39 Tower kernel:          res 61/04:00:68:33:ef/4d:04:05:00:00/e0 Emask 0x1 (device error)
Nov  6 18:17:39 Tower kernel: ata1.00: status: { DRDY DF ERR }
Nov  6 18:17:39 Tower kernel: ata1.00: error: { ABRT }
Nov  6 18:17:39 Tower kernel: ata1.00: failed to enable AA(error_mask=0x1)
Nov  6 18:17:39 Tower kernel: ata1.00: failed to enable AA(error_mask=0x1)
Nov  6 18:17:39 Tower kernel: ata1.00: configured for UDMA/133 (device error ignored)
Nov  6 18:17:39 Tower kernel: ata1: EH complete
Nov  6 18:17:39 Tower kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Nov  6 18:17:39 Tower kernel: ata1.00: irq_stat 0x40000001
Nov  6 18:17:39 Tower kernel: ata1.00: failed command: WRITE DMA EXT
Nov  6 18:17:39 Tower kernel: ata1.00: cmd 35/00:00:68:33:ef/00:04:05:00:00/e0 tag 0 dma 524288 out
Nov  6 18:17:39 Tower kernel:          res 61/04:00:68:33:ef/4d:04:05:00:00/e0 Emask 0x1 (device error)
Nov  6 18:17:39 Tower kernel: ata1.00: status: { DRDY DF ERR }
Nov  6 18:17:39 Tower kernel: ata1.00: error: { ABRT }
Nov  6 18:17:39 Tower kernel: ata1.00: failed to enable AA(error_mask=0x1)
Nov  6 18:17:39 Tower kernel: ata1.00: failed to enable AA(error_mask=0x1)
Nov  6 18:17:39 Tower kernel: ata1.00: configured for UDMA/133 (device error ignored)
Nov  6 18:17:39 Tower kernel: ata1: EH complete
Nov  6 18:17:39 Tower kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Nov  6 18:17:39 Tower kernel: ata1.00: irq_stat 0x40000001
Nov  6 18:17:39 Tower kernel: ata1.00: failed command: WRITE DMA EXT
Nov  6 18:17:39 Tower kernel: ata1.00: cmd 35/00:00:68:33:ef/00:04:05:00:00/e0 tag 0 dma 524288 out
Nov  6 18:17:39 Tower kernel:          res 61/04:00:68:33:ef/4d:04:05:00:00/e0 Emask 0x1 (device error)
Nov  6 18:17:39 Tower kernel: ata1.00: status: { DRDY DF ERR }
Nov  6 18:17:39 Tower kernel: ata1.00: error: { ABRT }
Nov  6 18:17:39 Tower kernel: ata1.00: failed to enable AA(error_mask=0x1)
Nov  6 18:17:39 Tower kernel: ata1.00: failed to enable AA(error_mask=0x1)
Nov  6 18:17:39 Tower kernel: ata1.00: configured for UDMA/133 (device error ignored)
Nov  6 18:17:39 Tower kernel: ata1: EH complete
Nov  6 18:17:39 Tower kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Nov  6 18:17:39 Tower kernel: ata1.00: irq_stat 0x40000001
Nov  6 18:17:39 Tower kernel: ata1.00: failed command: WRITE DMA EXT
Nov  6 18:17:39 Tower kernel: ata1.00: cmd 35/00:00:68:33:ef/00:04:05:00:00/e0 tag 0 dma 524288 out
Nov  6 18:17:39 Tower kernel:          res 61/04:00:68:33:ef/4d:04:05:00:00/e0 Emask 0x1 (device error)
Nov  6 18:17:39 Tower kernel: ata1.00: status: { DRDY DF ERR }
Nov  6 18:17:39 Tower kernel: ata1.00: error: { ABRT }
Nov  6 18:17:39 Tower kernel: ata1.00: failed to enable AA(error_mask=0x1)
Nov  6 18:17:39 Tower kernel: ata1.00: failed to enable AA(error_mask=0x1)
Nov  6 18:17:39 Tower kernel: ata1.00: configured for UDMA/133 (device error ignored)
Nov  6 18:17:39 Tower kernel: ata1: EH complete
Nov  6 18:17:39 Tower kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Nov  6 18:17:39 Tower kernel: ata1.00: irq_stat 0x40000001
Nov  6 18:17:39 Tower kernel: ata1.00: failed command: WRITE DMA EXT
Nov  6 18:17:39 Tower kernel: ata1.00: cmd 35/00:00:68:33:ef/00:04:05:00:00/e0 tag 0 dma 524288 out
Nov  6 18:17:39 Tower kernel:          res 61/04:00:68:33:ef/4d:04:05:00:00/e0 Emask 0x1 (device error)
Nov  6 18:17:39 Tower kernel: ata1.00: status: { DRDY DF ERR }
Nov  6 18:17:39 Tower kernel: ata1.00: error: { ABRT }
Nov  6 18:17:39 Tower kernel: ata1.00: failed to enable AA(error_mask=0x1)
Nov  6 18:17:39 Tower kernel: ata1.00: failed to enable AA(error_mask=0x1)
Nov  6 18:17:39 Tower kernel: ata1.00: configured for UDMA/133 (device error ignored)
Nov  6 18:17:39 Tower kernel: sd 1:0:0:0: [sda] Result: hostbyte=0x00 driverbyte=0x08
Nov  6 18:17:39 Tower kernel: sd 1:0:0:0: [sda] Sense Key : 0xb [current] [descriptor

 

However, smart test for the drive shows up as fine with no errors, no sectors pending or reallocated.  There are a bunch of other fault codes though.  It is attached. 

smart.txt
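
When a syslog balloons like this, it can help to collapse it into distinct messages and counts before posting. The Python sketch below strips the timestamp and host prefix from each kernel line and counts what is left. The prefix pattern is an assumption based on the line format in the excerpt above, and `count_messages` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches a prefix like "Nov  6 18:17:39 Tower kernel: " (assumed format).
PREFIX = re.compile(r"^\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}\s+\S+\s+kernel:\s*")


def count_messages(lines):
    """Count identical kernel messages after removing the timestamp prefix."""
    counts = Counter()
    for line in lines:
        counts[PREFIX.sub("", line).strip()] += 1
    return counts


lines = [
    "Nov  6 18:17:39 Tower kernel: ata1.00: failed command: WRITE DMA EXT",
    "Nov  6 18:17:39 Tower kernel: ata1.00: status: { DRDY DF ERR }",
    "Nov  6 18:17:39 Tower kernel: ata1.00: error: { ABRT }",
    "Nov  6 18:17:39 Tower kernel: ata1: EH complete",
    "Nov  6 18:17:39 Tower kernel: ata1.00: failed command: WRITE DMA EXT",
    "Nov  6 18:17:39 Tower kernel: ata1.00: status: { DRDY DF ERR }",
    "Nov  6 18:17:39 Tower kernel: ata1.00: error: { ABRT }",
    "Nov  6 18:17:39 Tower kernel: ata1: EH complete",
]

counts = count_messages(lines)
for message, n in counts.most_common():
    print(n, message)
```

Run against the full file (for example, iterating over an open syslog), this gives a short summary that is much easier to attach than a gigabyte of repeats.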

Link to comment

Just wanted to point out that you cannot conclude that a drive is failing just by looking at the syslog. Only reallocated sectors or failed attributes in a SMART report will tell you that the drive itself is the problem. It is MUCH more common for syslog errors to be traced back to a cabling / backplane issue.

 

So look at your SMART report and confirm the reallocated sectors are increasing. Otherwise you may have cabling issues in addition to a suspect disk.

Link to comment

I ran preclear 2 more times - on the first, the current pending sectors dropped by one, but then this is what I got on the second:

 

============================================================================

S.M.A.R.T. error count differences detected after pre-clear

note, some 'raw' values may change, but not be an indication of a problem

 

54c54

1 Raw_Read_Error_Rate    0x002f  200  200  051    Pre-fail  Always      -      96

---

1 Raw_Read_Error_Rate    0x002f  200  200  051    Pre-fail  Always      -      132

63c63

193 Load_Cycle_Count        0x0032  199  199  000    Old_age  Always      -      5682

---

193 Load_Cycle_Count        0x0032  199  199  000    Old_age  Always      -      5683

65c65

197 Current_Pending_Sector  0x0032  200  197  000    Old_age  Always      -      9

---

197 Current_Pending_Sector  0x0032  200  197  000    Old_age  Always      -      46

============================================================================

 

It seems to be getting worse. Do I just keep running preclear until the drive fails? - and then RMA it?

Cheers

Link to comment

I would not keep running.  RMA the disk.

Link to comment

Since the sectors seem to be re-written to their existing locations (I see no change in the re-allocated sector count, only "pending sector") I would not be so quick to point the finger at the disk drive.   It could be the drive, but it could just as easily be the power supply.

 

Basically, the drive seems to be able to re-write the sectors in their existing locations.  The question is what made the "writing" of the sector un-reliable the first time.  Was it the disk itself?  Vibration?  Noise on the power supply?  Almost impossible to tell from an outsider's point of view.

 

Since there are continual sectors pending re-allocation I'd just start an RMA stating that fact.  You'll probably never get the drive to fail a smart test, at least not in the next weeks/months.

 

Joe L.

Link to comment

The PSUs in both computers that this HDD has been connected to seem to be sound. In the main unRaid unit (using a Corsair TX650W PSU) there are 11 other HDDs that seem to be free of the issues that this one has. In terms of vibration, when it was part of the main array it was secured using silicone grommets. As you suggest, I will RMA it, but include a note mentioning the continual sectors pending re-allocation.

Thanks again for the assistance.

Cheers

Link to comment
