HDD Errors - advice please?



Dear Experts,

 

Please see below an extract from the log file for a specific HDD that keeps reporting the same errors over and over. Is the drive faulty, or should I be investigating something else?

 

Sep 18 18:27:09 GOOGOLPLEX kernel: ata1.00: exception Emask 0x50 SAct 0x0 SErr 0x280900 action 0x6 frozen (Errors)
Sep 18 18:27:09 GOOGOLPLEX kernel: ata1.00: irq_stat 0x08000000, interface fatal error (Errors)
Sep 18 18:27:09 GOOGOLPLEX kernel: ata1: SError: { UnrecovData HostInt 10B8B BadCRC } (Errors)
Sep 18 18:27:09 GOOGOLPLEX kernel: ata1.00: failed command: IDENTIFY DEVICE (Minor Issues)
Sep 18 18:27:09 GOOGOLPLEX kernel: ata1.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in (Drive related)
Sep 18 18:27:09 GOOGOLPLEX kernel:          res 50/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x50 (ATA bus error) (Errors)
Sep 18 18:27:09 GOOGOLPLEX kernel: ata1.00: status: { DRDY } (Drive related)
Sep 18 18:27:09 GOOGOLPLEX kernel: ata1: hard resetting link (Minor Issues)
Sep 18 18:27:10 GOOGOLPLEX kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310) (Drive related)
Sep 18 18:27:10 GOOGOLPLEX kernel: ata1.00: configured for UDMA/33 (Drive related)
Sep 18 18:27:10 GOOGOLPLEX kernel: ata1: EH complete (Drive related)
Sep 18 18:31:54 GOOGOLPLEX kernel: ata1.00: exception Emask 0x50 SAct 0x0 SErr 0x280900 action 0x6 frozen (Errors)
Sep 18 18:31:54 GOOGOLPLEX kernel: ata1.00: irq_stat 0x08000000, interface fatal error (Errors)
Sep 18 18:31:54 GOOGOLPLEX kernel: ata1: SError: { UnrecovData HostInt 10B8B BadCRC } (Errors)
Sep 18 18:31:54 GOOGOLPLEX kernel: ata1.00: failed command: IDENTIFY DEVICE (Minor Issues)
Sep 18 18:31:54 GOOGOLPLEX kernel: ata1.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in (Drive related)
Sep 18 18:31:54 GOOGOLPLEX kernel:          res 50/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x50 (ATA bus error) (Errors)
Sep 18 18:31:54 GOOGOLPLEX kernel: ata1.00: status: { DRDY } (Drive related)
Sep 18 18:31:54 GOOGOLPLEX kernel: ata1: hard resetting link (Minor Issues)
Sep 18 18:31:55 GOOGOLPLEX kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310) (Drive related)
Sep 18 18:31:55 GOOGOLPLEX kernel: ata1.00: configured for UDMA/33 (Drive related)
Sep 18 18:31:55 GOOGOLPLEX kernel: ata1: EH complete (Drive related)
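For anyone decoding the same log: the SErr value is a bitmask of the SATA SError register, and libata also prints SStatus and SControl as hex values. The Python sketch below is only an illustration, not kernel code. The bit labels are assumed to match the ones the Linux libata error handler prints, and the helper names (decode_serror, decode_scr_fields, SPEED_GEN) are hypothetical.

```python
# Bit positions in the SATA SError register, mapped to the short labels the
# Linux libata error handler prints in its "SError: { ... }" line (assumed).
SERROR_BITS = {
    0: "RecovData",    # recovered data integrity error
    1: "RecovComm",    # recovered communications error
    8: "UnrecovData",  # unrecovered data integrity error
    9: "Persist",      # persistent communication or data integrity error
    10: "Proto",       # protocol error
    11: "HostInt",     # internal error in the host controller
    16: "PHYRdyChg",   # PhyRdy signal changed state
    17: "PHYInt",      # PHY internal error
    18: "CommWake",    # COMWAKE signal detected
    19: "10B8B",       # 10b-to-8b decode error
    20: "Dispar",      # running disparity error
    21: "BadCRC",      # CRC error in a received frame
    22: "Handshk",     # handshake error
    23: "LinkSeq",     # link sequence error
    24: "TrStaTrns",   # transport state transition error
    25: "UnrecFIS",    # unrecognized FIS type
    26: "DevExch",     # device exchanged (hot-plug event)
}

# SPD field values: interface speed generation (0 = none / no restriction).
SPEED_GEN = {1: "Gen1 1.5 Gbps", 2: "Gen2 3.0 Gbps", 3: "Gen3 6.0 Gbps"}


def decode_serror(value):
    """Return the labels for every SError bit set in value, lowest bit first."""
    return [name for bit, name in sorted(SERROR_BITS.items()) if value & (1 << bit)]


def decode_scr_fields(value):
    """Split an SStatus or SControl value into its DET, SPD and IPM nibbles.

    Both registers use bits 3:0 for DET, 7:4 for SPD and 11:8 for IPM,
    although the meaning of each value differs between the two registers.
    """
    return {
        "DET": value & 0xF,
        "SPD": (value >> 4) & 0xF,
        "IPM": (value >> 8) & 0xF,
    }


# Values from the log above; libata prints SStatus/SControl in hex.
print(decode_serror(0x280900))   # ['UnrecovData', 'HostInt', '10B8B', 'BadCRC']
print(decode_scr_fields(0x113))  # {'DET': 3, 'SPD': 1, 'IPM': 1}
print(decode_scr_fields(0x310))  # {'DET': 0, 'SPD': 1, 'IPM': 3}
print(SPEED_GEN[decode_scr_fields(0x113)["SPD"]])  # Gen1 1.5 Gbps
```

Decoding 0x280900 reproduces the four flags in the SError line. BadCRC and 10B8B are errors on the link itself, which usually points at cabling, connectors or a backplane rather than the disk surface. SPD = 1 in both registers means the link came up at Gen1 (1.5 Gbps) and SControl is capping it there, which is likely libata slowing the link down after repeated errors (the UDMA/33 mode suggests the same).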

 

SMART Report:

ATA Error Count: 171 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 171 occurred at disk power-on lifetime: 1859 hours (77 days + 11 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 00 00 00 00

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ec 00 01 00 00 00 00 08      00:04:43.993  IDENTIFY DEVICE
  b0 da 00 00 4f c2 00 08      00:04:43.992  SMART RETURN STATUS
  b0 d1 01 01 4f c2 00 08      00:04:43.992  SMART READ ATTRIBUTE THRESHOLDS [OBS-4]
  b0 d0 01 00 4f c2 00 08      00:04:43.992  SMART READ DATA
  ec 00 01 00 00 00 00 08      00:04:43.992  IDENTIFY DEVICE

Error 170 occurred at disk power-on lifetime: 1859 hours (77 days + 11 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 00 4f c2 00  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  b0 d1 01 01 4f c2 00 08      00:04:43.965  SMART READ ATTRIBUTE THRESHOLDS [OBS-4]
  b0 d0 01 00 4f c2 00 08      00:04:43.965  SMART READ DATA
  ec 00 01 00 00 00 00 08      00:04:43.965  IDENTIFY DEVICE
  e5 00 00 00 00 00 00 08      00:04:43.965  CHECK POWER MODE
  ec 00 01 00 00 00 00 08      00:04:43.965  IDENTIFY DEVICE

Error 169 occurred at disk power-on lifetime: 1859 hours (77 days + 11 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 00 4f c2 00  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  b0 d1 01 01 4f c2 00 08      00:04:43.932  SMART READ ATTRIBUTE THRESHOLDS [OBS-4]
  b0 d0 01 00 4f c2 00 08      00:04:43.931  SMART READ DATA
  ec 00 01 00 00 00 00 08      00:04:43.931  IDENTIFY DEVICE
  e5 00 00 00 00 00 00 08      00:04:43.931  CHECK POWER MODE
  ec 00 01 00 00 00 00 08      00:04:43.931  IDENTIFY DEVICE

Error 168 occurred at disk power-on lifetime: 1859 hours (77 days + 11 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 00 4f c2 00  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  b0 d1 01 01 4f c2 00 08      00:04:43.872  SMART READ ATTRIBUTE THRESHOLDS [OBS-4]
  b0 d0 01 00 4f c2 00 08      00:04:43.871  SMART READ DATA
  ec 00 01 00 00 00 00 08      00:04:43.871  IDENTIFY DEVICE
  e5 00 00 00 00 00 00 08      00:04:43.871  CHECK POWER MODE
  ec 00 01 00 00 00 00 08      00:04:43.871  IDENTIFY DEVICE

Error 167 occurred at disk power-on lifetime: 1859 hours (77 days + 11 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 00 00 00 00

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ec 00 01 00 00 00 00 08      00:04:43.871  IDENTIFY DEVICE
  e5 00 00 00 00 00 00 08      00:04:43.871  CHECK POWER MODE
  ec 00 01 00 00 00 00 08      00:04:43.871  IDENTIFY DEVICE
  b0 d1 01 01 4f c2 00 08      00:04:43.869  SMART READ ATTRIBUTE THRESHOLDS [OBS-4]
  b0 d0 01 00 4f c2 00 08      00:04:43.868  SMART READ DATA
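The ER and ST columns in the SMART error log are the raw ATA Error and Status register bytes. A minimal sketch that decodes them follows. The bit names follow common ATA naming and are assumptions here (several bits are obsolete or command-specific in newer ATA revisions), and decode_register is a hypothetical helper.

```python
# ATA Error register bit names, bit 0 first. Several of these are obsolete or
# only meaningful for particular commands in newer ATA revisions (assumed names).
ERROR_BITS = ["AMNF", "NM", "ABRT", "MCR", "IDNF", "MC", "UNC", "ICRC"]

# ATA Status register bit names, bit 0 first (IDX, CORR and DSC are obsolete
# in newer ATA revisions).
STATUS_BITS = ["ERR", "IDX", "CORR", "DRQ", "DSC", "DF", "DRDY", "BSY"]


def decode_register(value, names):
    """Return the names of the bits set in an 8-bit register, highest bit first."""
    return [names[bit] for bit in range(7, -1, -1) if value & (1 << bit)]


# ER and ST columns from the SMART error log above (hex bytes).
print(decode_register(0x84, ERROR_BITS))   # ['ICRC', 'ABRT']
print(decode_register(0x51, STATUS_BITS))  # ['DRDY', 'DSC', 'ERR']
```

Under these assumptions, ER 0x84 means the drive aborted the command and flagged an interface CRC error, which again points at the data path rather than the media. The 4f c2 values in the CL/CH columns are the fixed signature that SMART commands carry, not a sector address.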

 

Many thanks!

  • 2 weeks later...

Interestingly, I have resolved the issue.

 

It transpires there is a problem with the backplane in my 5-drive caddy. It causes SATA link errors, which then seem to mess up the drives, or at least make unRAID think they are faulty. With the backplane taken out of the equation and the drive connected directly to the M1015, all is well.

 

Very strange!
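One way to confirm a fix like this is to count, per port, the libata SError lines that mention BadCRC before and after bypassing the backplane. A rough sketch follows, assuming the kernel prints SError lines in the format shown in the original post; count_link_errors and SERROR_RE are hypothetical names.

```python
import re
from collections import Counter

# Matches libata lines such as
#   "... kernel: ata1: SError: { UnrecovData HostInt 10B8B BadCRC }"
# and captures the port name (ata1) and the flag list.
SERROR_RE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: SError: \{([^}]*)\}")


def count_link_errors(lines, flag="BadCRC"):
    """Count SError lines containing the given flag, per libata port."""
    counts = Counter()
    for line in lines:
        match = SERROR_RE.search(line)
        if match and flag in match.group(2).split():
            counts[match.group(1)] += 1
    return counts


sample = [
    "Sep 18 18:27:09 GOOGOLPLEX kernel: ata1: SError: { UnrecovData HostInt 10B8B BadCRC }",
    "Sep 18 18:27:09 GOOGOLPLEX kernel: ata1: hard resetting link",
    "Sep 18 18:31:54 GOOGOLPLEX kernel: ata1: SError: { UnrecovData HostInt 10B8B BadCRC }",
]
print(count_link_errors(sample))  # Counter({'ata1': 2})
```

On a live system the function can be given an open log file (for example /var/log/syslog, an assumed path) instead of the sample list. A count that stops growing once the backplane is out of the path supports the diagnosis.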