
Drive Taking a Dump? - General Diagnosis



Thank you for viewing the post.  I know it looks long, but I've tried to organize the information so that it's a little easier on the eyes.

 

 

"Priority" Diagnosis/Real Concern(s)

 

** Parity Drive having issues according to:

 

System Log Excerpts:

Jan 23 23:06:06 Tower kernel: ata3.00: ATA-8: TOSHIBA DT01ACA300,            43NPEWxxS, MX6OABB0, max UDMA/133 (Drive related)
Jan 23 23:06:06 Tower kernel: ata3.00: 5860533168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA (Drive related)
Jan 23 23:06:06 Tower kernel: ata3.00: configured for UDMA/133 (Drive related)

 

Feb 12 22:57:44 Tower kernel: ata3.00: exception Emask 0x50 SAct 0x0 SErr 0x400800 action 0x6 frozen (Errors)
Feb 12 22:57:44 Tower kernel: ata3.00: irq_stat 0x08000000, interface fatal error (Errors)
Feb 12 22:57:44 Tower kernel: ata3: SError: { HostInt Handshk } (Errors)
Feb 12 22:57:44 Tower kernel: ata3.00: failed command: WRITE DMA EXT (Minor Issues)
Feb 12 22:57:44 Tower kernel: ata3.00: cmd 35/00:00:88:f3:ad/00:04:a7:00:00/e0 tag 0 dma 524288 out (Drive related)
Feb 12 22:57:44 Tower kernel:          res 50/00:00:87:f3:ad/00:00:a7:00:00/e7 Emask 0x50 (ATA bus error) (Errors)
Feb 12 22:57:44 Tower kernel: ata3.00: status: { DRDY } (Drive related)
Feb 12 22:57:44 Tower kernel: ata3: hard resetting link (Minor Issues)
Feb 12 22:57:44 Tower kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300) (Drive related)
Feb 12 22:57:44 Tower kernel: ata3.00: configured for UDMA/133 (Drive related)
Feb 12 22:57:44 Tower kernel: ata3: EH complete (Drive related)

 

Feb 26 21:51:10 Tower kernel: ata3.00: exception Emask 0x50 SAct 0x0 SErr 0x400800 action 0x6 frozen (Errors)
Feb 26 21:51:10 Tower kernel: ata3.00: irq_stat 0x08000000, interface fatal error (Errors)
Feb 26 21:51:10 Tower kernel: ata3: SError: { HostInt Handshk } (Errors)
Feb 26 21:51:10 Tower kernel: ata3.00: failed command: WRITE DMA EXT (Minor Issues)
Feb 26 21:51:10 Tower kernel: ata3.00: cmd 35/00:00:c0:2a:55/00:04:b9:00:00/e0 tag 0 dma 524288 out (Drive related)
Feb 26 21:51:10 Tower kernel:          res 50/00:00:bf:2a:55/00:00:b9:00:00/e9 Emask 0x50 (ATA bus error) (Errors)
Feb 26 21:51:10 Tower kernel: ata3.00: status: { DRDY } (Drive related)
Feb 26 21:51:10 Tower kernel: ata3: hard resetting link (Minor Issues)
Feb 26 21:51:11 Tower kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300) (Drive related)
Feb 26 21:51:11 Tower kernel: ata3.00: configured for UDMA/133 (Drive related)
Feb 26 21:51:11 Tower kernel: ata3: EH complete (Drive related)

 

Feb 27 11:12:04 Tower kernel: ata3.00: exception Emask 0x50 SAct 0x0 SErr 0x400800 action 0x6 frozen (Errors)
Feb 27 11:12:04 Tower kernel: ata3.00: irq_stat 0x08000000, interface fatal error (Errors)
Feb 27 11:12:04 Tower kernel: ata3: SError: { HostInt Handshk } (Errors)
Feb 27 11:12:04 Tower kernel: ata3.00: failed command: WRITE DMA EXT (Minor Issues)
Feb 27 11:12:04 Tower kernel: ata3.00: cmd 35/00:00:38:6e:31/00:04:c0:00:00/e0 tag 0 dma 524288 out (Drive related)
Feb 27 11:12:04 Tower kernel:          res 50/00:00:37:6e:31/00:00:c0:00:00/e0 Emask 0x50 (ATA bus error) (Errors)
Feb 27 11:12:04 Tower kernel: ata3.00: status: { DRDY } (Drive related)
Feb 27 11:12:04 Tower kernel: ata3: hard resetting link (Minor Issues)
Feb 27 11:12:04 Tower kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300) (Drive related)
Feb 27 11:12:04 Tower kernel: ata3.00: configured for UDMA/133 (Drive related)
Feb 27 11:12:04 Tower kernel: ata3: EH complete (Drive related)
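
To see how often each port throws these exceptions, the "exception Emask" lines in the syslog can be tallied per port and day. This is only a rough sketch in Python, assuming the syslog line format shown in the excerpts above; the function name count_ata_exceptions and the sample list are made up for illustration, and the viewer annotations such as "(Errors)" are left off the sample lines.

```python
import re
from collections import Counter

# Matches libata exception lines such as:
#   "Feb 12 22:57:44 Tower kernel: ata3.00: exception Emask 0x50 ..."
# Captures the date (month and day) and the port name (e.g. "ata3").
EXCEPTION_RE = re.compile(
    r"^(?P<date>\w{3}\s+\d+) \S+ \S+ kernel: "
    r"(?P<port>ata\d+)(?:\.\d+)?: exception Emask"
)

def count_ata_exceptions(lines):
    """Count libata exception events per (port, date) pair."""
    counts = Counter()
    for line in lines:
        match = EXCEPTION_RE.match(line)
        if match:
            counts[(match.group("port"), match.group("date"))] += 1
    return counts

# Sample lines copied from the excerpts above (illustration only).
sample = [
    "Feb 12 22:57:44 Tower kernel: ata3.00: exception Emask 0x50 SAct 0x0 SErr 0x400800 action 0x6 frozen",
    "Feb 12 22:57:44 Tower kernel: ata3: hard resetting link",
    "Feb 26 21:51:10 Tower kernel: ata3.00: exception Emask 0x50 SAct 0x0 SErr 0x400800 action 0x6 frozen",
    "Feb 27 11:12:04 Tower kernel: ata3.00: exception Emask 0x50 SAct 0x0 SErr 0x400800 action 0x6 frozen",
]

for (port, date), n in sorted(count_ata_exceptions(sample).items()):
    print(port, date, n)
```

To run it against a real log, pass an open file instead of the sample list, for example count_ata_exceptions(open("syslog02-28-2014.txt")).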

 

 

** SMART Report (Short) for Parity Drive:

 

***************************************************************************
** Statistics for /dev/sdc TOSHIBA_DT01ACA300_43NPEWxxS - PARITY         **
***************************************************************************
smartctl -a -d ata /dev/sdc
smartctl 5.40 2010-10-16 r3189 [i486-slackware-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     TOSHIBA DT01ACA300
Serial Number:    43NPEWxxS
Firmware Version: MX6OABB0
User Capacity:    3,000,592,982,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Fri Feb 28 11:35:43 2014 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x85)	Offline data collection activity
				was aborted by an interrupting command from host.
				Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
				without error or no self-test has ever 
				been run.
Total time to complete Offline 
data collection: 		 (22222) seconds.
Offline data collection
capabilities: 			 (0x5b) SMART execute Offline immediate.
				Auto Offline data collection on/off support.
				Suspend Offline collection upon new
				command.
				Offline surface scan supported.
				Self-test supported.
				No Conveyance Self-test supported.
				Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
				power-saving mode.
				Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
				General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   1) minutes.
Extended self-test routine
recommended polling time: 	 ( 255) minutes.
SCT capabilities: 	       (0x003d)	SCT Status supported.
				SCT Error Recovery Control supported.
				SCT Feature Control supported.
				SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   139   139   054    Pre-fail  Offline      -       71
  3 Spin_Up_Time            0x0007   135   135   024    Pre-fail  Always       -       423 (Average 425)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       120
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   124   124   020    Pre-fail  Offline      -       33
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       3980
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       19
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       120
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       120
194 Temperature_Celsius     0x0002   214   214   000    Old_age   Always       -       28 (Min/Max 15/35)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       15

SMART Error Log Version: 1
ATA Error Count: 15 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 15 occurred at disk power-on lifetime: 232 hours (9 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 60 e0 92 de 0a  Error: ICRC, ABRT 96 sectors at LBA = 0x0ade92e0 = 182358752

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  35 00 00 40 92 de e0 08   3d+03:55:14.424  WRITE DMA EXT
  25 00 00 40 a6 de e0 08   3d+03:55:14.420  READ DMA EXT
  25 00 00 40 a2 de e0 08   3d+03:55:14.416  READ DMA EXT
  25 00 00 40 9e de e0 08   3d+03:55:14.411  READ DMA EXT
  25 00 00 40 9a de e0 08   3d+03:55:14.405  READ DMA EXT

Error 14 occurred at disk power-on lifetime: 232 hours (9 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 40 80 f9 5f 02  Error: ICRC, ABRT 64 sectors at LBA = 0x025ff980 = 39844224

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  35 00 00 c0 f6 5f e0 08   3d+03:17:38.705  WRITE DMA EXT
  35 00 00 c0 f2 5f e0 08   3d+03:17:38.703  WRITE DMA EXT
  25 00 80 48 08 60 e0 08   3d+03:17:38.701  READ DMA EXT
  25 00 00 48 04 60 e0 08   3d+03:17:38.696  READ DMA EXT
  25 00 00 48 00 60 e0 08   3d+03:17:38.693  READ DMA EXT

Error 13 occurred at disk power-on lifetime: 231 hours (9 days + 15 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 40 f0 5e 2d 0a  Error: ICRC, ABRT 64 sectors at LBA = 0x0a2d5ef0 = 170745584

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  35 00 00 30 5d 2d e0 08   3d+02:44:20.508  WRITE DMA EXT
  35 00 00 30 59 2d e0 08   3d+02:44:20.505  WRITE DMA EXT
  35 00 00 30 55 2d e0 08   3d+02:44:20.503  WRITE DMA EXT
  35 00 00 30 51 2d e0 08   3d+02:44:20.501  WRITE DMA EXT
  25 00 00 30 65 2d e0 08   3d+02:44:20.498  READ DMA EXT

Error 12 occurred at disk power-on lifetime: 231 hours (9 days + 15 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 c0 b0 14 54 04  Error: ICRC, ABRT 192 sectors at LBA = 0x045414b0 = 72619184

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  35 00 00 70 13 54 e0 08   3d+02:20:37.926  WRITE DMA EXT
  35 00 00 70 0f 54 e0 08   3d+02:20:37.924  WRITE DMA EXT
  25 00 98 e0 24 54 e0 08   3d+02:20:37.923  READ DMA EXT
  25 00 00 e0 20 54 e0 08   3d+02:20:37.921  READ DMA EXT
  25 00 00 e0 1c 54 e0 08   3d+02:20:37.919  READ DMA EXT

Error 11 occurred at disk power-on lifetime: 231 hours (9 days + 15 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 80 b0 dc b1 03  Error: ICRC, ABRT 128 sectors at LBA = 0x03b1dcb0 = 61988016

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  35 00 00 30 da b1 e0 08   3d+02:18:08.927  WRITE DMA EXT
  35 00 00 30 d6 b1 e0 08   3d+02:18:08.925  WRITE DMA EXT
  25 00 00 30 ea b1 e0 08   3d+02:18:08.921  READ DMA EXT
  25 00 00 30 e6 b1 e0 08   3d+02:18:08.917  READ DMA EXT
  25 00 00 30 e2 b1 e0 08   3d+02:18:08.913  READ DMA EXT

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      3980         -
# 2  Short offline       Completed without error       00%      3127         -
# 3  Short offline       Completed without error       00%      3127         -
# 4  Short offline       Completed without error       00%      3127         -
# 5  Short offline       Completed without error       00%      3127         -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
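
For anyone wondering how the "Error: ICRC, ABRT 96 sectors at LBA = ..." text relates to the register dump in each error entry, here is a small Python sketch of the arithmetic. It is not smartctl's own code, and the function name decode_error_registers is made up. It decodes only the two error-register bits that appear in this report, and it assumes the LBA is built from the low nibble of DH followed by CH, CL and SN, which is how the numbers in the entries above work out. ICRC is an interface CRC error, which generally points at the link (cable, connector, backplane) rather than the disk surface.

```python
# Bits of the ATA error register that show up in this report.
# Only these two are decoded here; any other bits are ignored.
ER_FLAGS = {0x80: "ICRC", 0x04: "ABRT"}

def decode_error_registers(er, sc, sn, cl, ch, dh):
    """Return (flag names, sector count, LBA) from the logged registers."""
    flags = [name for bit, name in ER_FLAGS.items() if er & bit]
    # LBA = low nibble of DH, then CH, CL, SN (most to least significant).
    lba = ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn
    return flags, sc, lba

# Registers from "Error 15" above: ER ST SC SN CL CH DH = 84 51 60 e0 92 de 0a
flags, sectors, lba = decode_error_registers(0x84, 0x60, 0xE0, 0x92, 0xDE, 0x0A)
print(flags, sectors, hex(lba), lba)
# ['ICRC', 'ABRT'] 96 0xade92e0 182358752
```

The result matches the line smartctl printed for Error 15 (96 sectors at LBA 182358752).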

 

 

** SMART Report (Short) for another problematic Toshiba (I saw nothing about it in the logs):

 

Attached, since it went beyond the maximum character count. I haven't checked whether it is on the same controller (perhaps you can tell from the system log?).  The server is in a rack without rails, and I just haven't gotten to it yet.

 

 

 

General Diagnosis/Curiosities:

 

Curious about this:

Jan 23 23:06:06 Tower kernel: acpi PNP0A03:00: ACPI _OSC support notification failed, disabling PCIe ASPM (Minor Issues)
Jan 23 23:06:06 Tower kernel: acpi PNP0A03:00: Unable to request _OSC control (_OSC support mask: 0x08)

 

IDE is disabled in the BIOS, so why do I get this:

Jan 23 23:06:06 Tower kernel: atiixp 0000:00:14.1: simplex device: DMA disabled (Errors)
Jan 23 23:06:06 Tower kernel: ide1: DMA disabled (Errors)

 

syslog02-28-2014.txt

other_problematic_toshiba.txt


I get those same 2 errors every time I boot the machine.

 

Feb 28 18:11:34 unRAID kernel: atiixp 0000:00:14.1: simplex device: DMA disabled (Errors)
Feb 28 18:11:34 unRAID kernel: ide1: DMA disabled (Errors)

 

I removed the drives from the hot-swap cages and copied tons of data to and from the disks, and I haven't had any SATA link resets since. I was copying directly to each drive (\\UNRAID\disk1, \\UNRAID\disk2, etc.). While the drives were in the cage, they would always start showing SATA reset errors. I got the errors on ATA1 (Seagate NAS 4TB) and ATA3 (old Hitachi 2TB, from when Hitachi drives were still made by Hitachi). I tried both new and old cables and swapped them at least 6 times, but the errors persisted. That's when I removed the drives from the cage, which ended the ATA errors.

 

So, try some new cables, and if that doesn't work, remove the drives from the hot-swap cage and connect them directly.
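
One way to check whether a cable or cage change helped is to note the raw value of SMART attribute 199 (UDMA_CRC_Error_Count) before and after. As far as I know it is a lifetime counter that only goes up, so if it stays put after a lot of copying, the link is probably clean. Below is a minimal Python sketch that pulls that value out of saved smartctl -a text. The function name crc_error_count is made up, and it assumes the attribute table layout shown in the report above (ID in the first column, raw value in the tenth).

```python
def crc_error_count(smart_text):
    """Return the raw value of attribute 199, or None if it is not listed."""
    for line in smart_text.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns; the ID is the first one
        # and the raw value is the tenth.
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[9])
    return None

# Attribute line copied from the SMART report in the first post.
report = (
    "199 UDMA_CRC_Error_Count    0x000a   200   200   000    "
    "Old_age   Always       -       15"
)
print(crc_error_count(report))
```

Save the smartctl output before the change, run the copies again, then compare the two numbers.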


