Some errors during preclear


I'm currently pre-clearing a new drive, and don't quite understand the errors I'm getting.

 

Here's an example:

Aug 25 06:53:26 WorldGobbler kernel: ata15.00: cmd 60/08:00:00:ff:94/00:00:dc:00:00/40 tag 0 ncq 4096 in
Aug 25 06:53:26 WorldGobbler kernel:          res 41/40:00:00:ff:94/38:00:dc:00:00/40 Emask 0x409 (media error) <F>
Aug 25 06:53:26 WorldGobbler kernel: ata15.00: status: { DRDY ERR }
Aug 25 06:53:26 WorldGobbler kernel: ata15.00: error: { UNC }
Aug 25 06:53:26 WorldGobbler kernel: ata15.00: configured for UDMA/133
Aug 25 06:53:26 WorldGobbler kernel: ata15: EH complete
Aug 25 06:53:30 WorldGobbler kernel: ata15.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
Aug 25 06:53:30 WorldGobbler kernel: ata15.00: irq_stat 0x40000008
Aug 25 06:53:30 WorldGobbler kernel: ata15.00: failed command: READ FPDMA QUEUED
Aug 25 06:53:30 WorldGobbler kernel: ata15.00: cmd 60/08:00:00:ff:94/00:00:dc:00:00/40 tag 0 ncq 4096 in
Aug 25 06:53:30 WorldGobbler kernel:          res 41/40:00:00:ff:94/38:00:dc:00:00/40 Emask 0x409 (media error) <F>
Aug 25 06:53:30 WorldGobbler kernel: ata15.00: status: { DRDY ERR }
Aug 25 06:53:30 WorldGobbler kernel: ata15.00: error: { UNC }
Aug 25 06:53:30 WorldGobbler kernel: ata15.00: configured for UDMA/133
Aug 25 06:53:30 WorldGobbler kernel: ata15: EH complete
Aug 25 06:53:33 WorldGobbler kernel: ata15.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
Aug 25 06:53:33 WorldGobbler kernel: ata15.00: irq_stat 0x40000008
Aug 25 06:53:33 WorldGobbler kernel: ata15.00: failed command: READ FPDMA QUEUED
Aug 25 06:53:33 WorldGobbler kernel: ata15.00: cmd 60/08:00:00:ff:94/00:00:dc:00:00/40 tag 0 ncq 4096 in
Aug 25 06:53:33 WorldGobbler kernel:          res 41/40:00:00:ff:94/38:00:dc:00:00/40 Emask 0x409 (media error) <F>
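
For anyone trying to read the cmd/res lines in the log above, here is a minimal Python sketch that pulls the logical block address out of them. It assumes the usual libata field order (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device); the function name and the dictionary keys are my own, not part of any tool.

```python
import re

# Assumed libata layout of a cmd/res line:
#   XX/f:n:lbal:lbam:lbah/hf:hn:hob_lbal:hob_lbam:hob_lbah/dev
# For a "cmd" line XX is the command and f the feature register;
# for a "res" line XX is the status and f the error register.
TASKFILE_RE = re.compile(
    r"(?:cmd|res) ([0-9a-f]{2})"
    r"/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
    r"/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
    r"/([0-9a-f]{2})"
)


def decode_taskfile(line):
    """Return the registers and the 48-bit LBA found in a libata cmd/res line."""
    m = TASKFILE_RE.search(line)
    if m is None:
        raise ValueError("no cmd/res taskfile found in line")
    feature, nsect, lbal, lbam, lbah = (int(b, 16) for b in m.group(2).split(":"))
    _hf, _hn, hob_lbal, hob_lbam, hob_lbah = (int(b, 16) for b in m.group(3).split(":"))
    # LBA48 is assembled from the three "current" and three "previous" (hob) bytes.
    lba = (
        (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
        | (lbah << 16) | (lbam << 8) | lbal
    )
    return {
        "command_or_status": int(m.group(1), 16),
        "feature_or_error": feature,
        "lba": lba,
        "device": int(m.group(4), 16),
    }


if __name__ == "__main__":
    sample = "ata15.00: cmd 60/08:00:00:ff:94/00:00:dc:00:00/40 tag 0 ncq 4096 in"
    print(decode_taskfile(sample))
```

Under that assumption, every cmd and res line in the excerpt decodes to the same LBA, 0xDC94FF00 (3700752128), so the drive keeps failing on one spot rather than at scattered locations. The feature byte 0x08 is the NCQ sector count: 8 sectors of 512 bytes, which matches the "ncq 4096 in" in the log.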

 

My previous preclears have been problem free so I'm at a loss as to what's going on. Full syslog is attached.

 

Are these errors something to be concerned about?

 

The drive in question is mounted in an icydock 5-in-3 and connected to the motherboard SATA controller.

syslog-2011-08-25.txt

Link to comment

OK the preclear finished, but the results look bad. Preclear report is attached.

What concerns me most is this:

6 sectors were pending re-allocation before the start of the preclear.
6 sectors were pending re-allocation after pre-read in cycle 1 of 2.
0 sectors were pending re-allocation after zero of disk in cycle 1 of 2.
1 sector was pending re-allocation after post-read in cycle 1 of 2.
65535 sectors were pending re-allocation after zero of disk in cycle 2 of 2.
65535 sectors are pending re-allocation at the end of the preclear,
  a change of 65529 in the number of sectors pending re-allocation.
0 sectors had been re-allocated before the start of the preclear.
0 sectors are re-allocated at the end of the preclear,
  the number of sectors re-allocated did not change.
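
One thing that stands out in the numbers above: 65535 is 0xFFFF, the largest value a 16-bit field can hold, and the reported change of 65529 is just 65535 minus the starting 6. A count like that may be a saturated or invalid raw value rather than a literal tally of bad sectors, though that is only my guess. The following Python sketch pulls the pending counts out of report lines like these and flags that value; the regular expression and function name are mine and only fit the wording shown in this post.

```python
import re

# Matches only the "pending re-allocation" lines as worded in the report above.
PENDING_RE = re.compile(
    r"^(\d+) sectors? (?:was|were|are) pending re-allocation (.+?)[.,]?$"
)
SATURATED_16BIT = 0xFFFF  # 65535; treating this as suspicious is an assumption


def pending_counts(report_text):
    """Return (stage, count, looks_saturated) for each pending-sector line."""
    results = []
    for line in report_text.splitlines():
        m = PENDING_RE.match(line.strip())
        if m:
            count = int(m.group(1))
            results.append((m.group(2), count, count == SATURATED_16BIT))
    return results


if __name__ == "__main__":
    sample = (
        "6 sectors were pending re-allocation before the start of the preclear.\n"
        "65535 sectors are pending re-allocation at the end of the preclear,\n"
    )
    for stage, count, flagged in pending_counts(sample):
        print(stage, count, "(possibly saturated)" if flagged else "")
```

Either way, a pending count that climbs after the disk has been zeroed is a bad sign, whatever the exact number is.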

preclear_sdj.txt

Link to comment

One last question. Out of a batch of 4 drives, I've had 1 drive pass 2 preclear cycles with no pending reallocation, one is currently preclearing, one is the failure mentioned above, and I assume the last one is failing. Zeroing executes fine, but when it begins reading, the syslog fills up with the errors below. This persists until the syslog is 1.2GB in size and the oom-killer starts killing processes. Basically, I can't preclear this drive. Based on the errors, I'm guessing that this drive has to be returned for replacement. Am I wrong?

 

More importantly, we are aware of the abysmal failure rate of the WD20EARS; with that knowledge, are people avoiding it, or are they relying on the fact that if a drive passes preclear it must make the cut? I know we preclear to weed out these problems, and I thank Joe L. for his work on the preclear script, which is saving me from further headaches.

 

Aug 30 06:35:26 WorldGobbler kernel: end_request: I/O error, dev sdb, sector 64 (Errors)
Aug 30 06:35:26 WorldGobbler kernel: Buffer I/O error on device sdb1, logical block 0 (Errors)
Aug 30 06:35:26 WorldGobbler kernel: Buffer I/O error on device sdb1, logical block 1 (Errors)
Aug 30 06:35:26 WorldGobbler kernel: Buffer I/O error on device sdb1, logical block 2 (Errors)
Aug 30 06:35:26 WorldGobbler kernel: Buffer I/O error on device sdb1, logical block 3 (Errors)
Aug 30 06:35:26 WorldGobbler kernel: ata3: translated ATA stat/err 0x41/04 to SCSI SK/ASC/ASCQ 0xb/00/00 (Drive related)
Aug 30 06:35:26 WorldGobbler kernel: ata3.00: device reported invalid CHS sector 0 (Drive related)
Aug 30 06:35:26 WorldGobbler kernel: ata3: status=0x41 { DriveReady Error } (Errors)
Aug 30 06:35:26 WorldGobbler kernel: ata3: error=0x04 { DriveStatusError } (Errors)
Aug 30 06:35:26 WorldGobbler kernel: ata3: translated ATA stat/err 0x41/04 to SCSI SK/ASC/ASCQ 0xb/00/00 (Drive related)
Aug 30 06:35:26 WorldGobbler kernel: ata3.00: device reported invalid CHS sector 0 (Drive related)
Aug 30 06:35:26 WorldGobbler kernel: ata3: status=0x41 { DriveReady Error } (Errors)
Aug 30 06:35:26 WorldGobbler kernel: ata3: error=0x04 { DriveStatusError } (Errors)
Aug 30 06:35:26 WorldGobbler kernel: ata3: translated ATA stat/err 0x41/04 to SCSI SK/ASC/ASCQ 0xb/00/00 (Drive related)
Aug 30 06:35:26 WorldGobbler kernel: ata3.00: device reported invalid CHS sector 0 (Drive related)
Aug 30 06:35:26 WorldGobbler kernel: ata3: status=0x41 { DriveReady Error } (Errors)
Aug 30 06:35:26 WorldGobbler kernel: ata3: error=0x04 { DriveStatusError } (Errors)
Aug 30 06:35:26 WorldGobbler kernel: ata3: translated ATA stat/err 0x41/04 to SCSI SK/ASC/ASCQ 0xb/00/00 (Drive related)
Aug 30 06:35:26 WorldGobbler kernel: ata3.00: device reported invalid CHS sector 0 (Drive related)
Aug 30 06:35:26 WorldGobbler kernel: ata3: status=0x41 { DriveReady Error } (Errors)
Aug 30 06:35:26 WorldGobbler kernel: ata3: error=0x04 { DriveStatusError } (Errors)
Aug 30 06:35:26 WorldGobbler kernel: ata3: translated ATA stat/err 0x41/04 to SCSI SK/ASC/ASCQ 0xb/00/00 (Drive related)
Aug 30 06:35:26 WorldGobbler kernel: ata3.00: device reported invalid CHS sector 0 (Drive related)
Aug 30 06:35:26 WorldGobbler kernel: ata3: status=0x41 { DriveReady Error } (Errors)
Aug 30 06:35:26 WorldGobbler kernel: ata3: error=0x04 { DriveStatusError } (Errors)
Aug 30 06:35:26 WorldGobbler kernel: ata3: translated ATA stat/err 0x41/04 to SCSI SK/ASC/ASCQ 0xb/00/00 (Drive related)
Aug 30 06:35:26 WorldGobbler kernel: ata3.00: device reported invalid CHS sector 0 (Drive related)
Aug 30 06:35:26 WorldGobbler kernel: ata3: status=0x41 { DriveReady Error } (Errors)
Aug 30 06:35:26 WorldGobbler kernel: ata3: error=0x04 { DriveStatusError } (Errors)
Aug 30 06:35:26 WorldGobbler kernel: sd 3:0:0:0: [sdb] Result: hostbyte=0x00 driverbyte=0x08 (System)
Aug 30 06:35:26 WorldGobbler kernel: sd 3:0:0:0: [sdb] Sense Key : 0xb [current] [descriptor] (Drive related)
Aug 30 06:35:26 WorldGobbler kernel: Descriptor sense data with sense descriptors (in hex):
Aug 30 06:35:26 WorldGobbler kernel:         72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00 
Aug 30 06:35:26 WorldGobbler kernel:         00 00 00 00 
Aug 30 06:35:26 WorldGobbler kernel: sd 3:0:0:0: [sdb] ASC=0x0 ASCQ=0x0 (Drive related)
Aug 30 06:35:26 WorldGobbler kernel: sd 3:0:0:0: [sdb] CDB: cdb[0]=0x28: 28 00 00 00 00 40 00 00 08 00 (Drive related)
Aug 30 06:35:26 WorldGobbler kernel: end_request: I/O error, dev sdb, sector 64 (Errors)
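
The CDB line in the log above can be cross-checked against the "sector 64" message. Opcode 0x28 is the SCSI READ(10) command, which carries a 4-byte big-endian LBA in bytes 2 to 5 and a 2-byte transfer length in bytes 7 and 8. Here is a minimal Python sketch of that decoding (the function name is my own):

```python
def decode_read10(cdb_hex):
    """Decode a READ(10) CDB given as space-separated hex bytes.

    Returns (lba, block_count).
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2..5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7..8: transfer length
    return lba, blocks


if __name__ == "__main__":
    print(decode_read10("28 00 00 00 00 40 00 00 08 00"))
```

For the CDB in the log this gives LBA 64 and a length of 8 blocks, which agrees with the end_request line. If the partition starts at sector 64 (an assumption, since the partition table isn't shown here), that would also explain why the first failing buffer is logical block 0 of sdb1. Sense key 0xb is "aborted command" and the ATA error bit 0x04 is ABRT, so the drive is rejecting the read outright rather than reporting an unreadable sector the way the UNC errors on the other drive did.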

Link to comment

D'oh. I wasn't fully awake last night when I posted this (as evidenced by my atrocious grammar and spelling). The errors are occurring about an hour into the preclear read. The drive that passed is in the same icydock cage, on the same SFF-8087 cable, attached (obviously) to the same SASLP-MV8.

 

I suppose it would be a good idea to move the drive to a different slot before RMA'ing it. The fact that it reads for over an hour before dumping gigabytes of messages into the syslog is what led me to assume that this drive is no good.

 

 

Link to comment


Or, you could equally theorize:

"It took an hour of the drive vibration before the connection on the cable or drive tray connector loosened enough to cause the disk to stop responding."

 

It definitely warrants testing in another slot.

Link to comment


I suppose I shouldn't have left out the fact that this is the 3rd attempt I have made to preclear this drive. It took a couple of attempts at preclearing to catch the errors before the system became unstable and unusable. The behavior has been the same all 3 times (in that it appears to run fine for more than an hour before going south). If the cable had vibrated loose, wouldn't the next attempt fail right away instead of running fine for an hour first? I suppose it could just as easily be a thermal problem...

 

Anywho, I have spare breakout cables on the way, and will remove that drive from the drive cage as soon as my currently preclearing drive is done. I'll try to preclear it in my eSATA dock; that way I can rule out power, the SATA cable, and the backplane as possible culprits.

 

Link to comment


That does change the analysis a tiny bit. As you said, it could be heat related, or related to the specific conditions on the disk partway through the set of platters.

 

I would try a long "smart" test. Can it pass an internal read-only test? (Make sure you disable disk spin-down for this, since spinning down the disk will cause the "long" test to abort.) The test will probably take 4 to 6 hours, depending on the speed and size of the disk. Once you submit the "long" test request, you'll need to wait until it is complete before getting the final results in a smartctl report.
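
As a rough illustration of the sequence described above, here is a small Python sketch that builds the smartctl invocations involved. The smartctl options themselves (-t long, -l selftest, -a) are standard; the helper names and the /dev/sdX placeholder are only for illustration, and the script prints the commands instead of running them unless you call run_all yourself.

```python
import subprocess


def long_test_commands(device):
    """Return the smartctl argv lists for a long (extended) self-test."""
    return [
        ["smartctl", "-t", "long", device],      # start the extended self-test
        ["smartctl", "-l", "selftest", device],  # show the self-test log (progress/result)
        ["smartctl", "-a", device],              # full SMART report once the test is done
    ]


def run_all(device):
    """Run each command in turn; needs root and smartmontools installed."""
    for argv in long_test_commands(device):
        subprocess.run(argv, check=False)


if __name__ == "__main__":
    # /dev/sdX is a placeholder; substitute the real device node.
    for argv in long_test_commands("/dev/sdX"):
        print(" ".join(argv))
```

The first command returns immediately because the drive runs the test internally, so the second and third only show final results after the hours-long wait mentioned above.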

 

Joe L.

 

Link to comment


I had one (might actually have been two) case where I started a preclear, checked it was going fine, went to bed and in the morning found my unRAID server had crashed out. I suspected the error log had consumed enough memory to cause something to crash and burn. In this case the drive (a WD20EARS) responded to a short SMART test with a good report, but it would never complete a long SMART test. So I put it in my desktop PC and ran the WD diagnostic tool on it, and it failed their test with a "too many bad blocks" error. This is why I now run a few cycles of the WD tool's tests on new drives before putting them into my unRAID box and starting preclear.

 

Regards,

 

Stephen

Link to comment

Thanks, it hadn't yet occurred to me to try the WD tools. I'll do that as soon as I can shut down my array and pull the drives.

 

**edit**

Joe, I tried to disable spindown so I could run a smart test, but I got the following:

 spin-up: setting power-up in standby to 0 (off)
HDIO_DRIVE_CMD(powerup_in_standby) failed: Input/output error

 

I'll try again when I can remove the drive from the backplane.
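
A side note on the output quoted above: "setting power-up in standby" is the message hdparm prints for its lowercase -s option (the power-up-in-standby feature), which is a different setting from the idle spin-down timer. The drive's own standby timeout is set with uppercase -S, where 0 turns the timer off. Assuming the goal was to stop the drive from spinning itself down (unRAID's own spin-down setting is separate and is not covered here), a minimal Python sketch of the intended command looks like this; the helper name is made up.

```python
def spindown_timeout_command(device, timeout=0):
    """Build the hdparm argv that sets the drive's standby (spin-down) timeout.

    hdparm -S takes an encoded value from 0 to 255; 0 disables the timer.
    """
    if not 0 <= timeout <= 255:
        raise ValueError("hdparm -S expects a value between 0 and 255")
    return ["hdparm", "-S", str(timeout), device]


if __name__ == "__main__":
    # /dev/sdX is a placeholder; substitute the real device node.
    print(" ".join(spindown_timeout_command("/dev/sdX")))
```

Given the I/O error, the drive may well reject this command too, but at least it targets the right setting.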

 

Note to self: preclear disks in the dock, even if I have multiple open slots in a drive cage...

Link to comment

Well I think I beat the drive into the ground... I switched it to my eSATA dock and still get drive not ready errors when I try to disable spindown. Short smart tests are failing with read errors, so I think this drive is toast. If there's anything else I can try let me know, otherwise these are getting RMA'd tomorrow.

 

Thanks for all the help.

Link to comment

Archived

This topic is now archived and is closed to further replies.
