grandprix Posted February 28, 2014

Thank you for viewing the post. It appears long, I know, however, I'm hopeful that I placed the information in an organized enough manner to make it a little easier on the eyes.

"Priority" Diagnosis/Real Concern(s)

** Parity Drive having issues according to:

System Log Excerpts:

Jan 23 23:06:06 Tower kernel: ata3.00: ATA-8: TOSHIBA DT01ACA300, 43NPEWxxS, MX6OABB0, max UDMA/133 (Drive related)
Jan 23 23:06:06 Tower kernel: ata3.00: 5860533168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA (Drive related)
Jan 23 23:06:06 Tower kernel: ata3.00: configured for UDMA/133 (Drive related)

Feb 12 22:57:44 Tower kernel: ata3.00: exception Emask 0x50 SAct 0x0 SErr 0x400800 action 0x6 frozen (Errors)
Feb 12 22:57:44 Tower kernel: ata3.00: irq_stat 0x08000000, interface fatal error (Errors)
Feb 12 22:57:44 Tower kernel: ata3: SError: { HostInt Handshk } (Errors)
Feb 12 22:57:44 Tower kernel: ata3.00: failed command: WRITE DMA EXT (Minor Issues)
Feb 12 22:57:44 Tower kernel: ata3.00: cmd 35/00:00:88:f3:ad/00:04:a7:00:00/e0 tag 0 dma 524288 out (Drive related)
Feb 12 22:57:44 Tower kernel: res 50/00:00:87:f3:ad/00:00:a7:00:00/e7 Emask 0x50 (ATA bus error) (Errors)
Feb 12 22:57:44 Tower kernel: ata3.00: status: { DRDY } (Drive related)
Feb 12 22:57:44 Tower kernel: ata3: hard resetting link (Minor Issues)
Feb 12 22:57:44 Tower kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300) (Drive related)
Feb 12 22:57:44 Tower kernel: ata3.00: configured for UDMA/133 (Drive related)
Feb 12 22:57:44 Tower kernel: ata3: EH complete (Drive related)

Feb 26 21:51:10 Tower kernel: ata3.00: exception Emask 0x50 SAct 0x0 SErr 0x400800 action 0x6 frozen (Errors)
Feb 26 21:51:10 Tower kernel: ata3.00: irq_stat 0x08000000, interface fatal error (Errors)
Feb 26 21:51:10 Tower kernel: ata3: SError: { HostInt Handshk } (Errors)
Feb 26 21:51:10 Tower kernel: ata3.00: failed command: WRITE DMA EXT (Minor Issues)
Feb 26 21:51:10 Tower kernel: ata3.00: cmd 35/00:00:c0:2a:55/00:04:b9:00:00/e0 tag 0 dma 524288 out (Drive related)
Feb 26 21:51:10 Tower kernel: res 50/00:00:bf:2a:55/00:00:b9:00:00/e9 Emask 0x50 (ATA bus error) (Errors)
Feb 26 21:51:10 Tower kernel: ata3.00: status: { DRDY } (Drive related)
Feb 26 21:51:10 Tower kernel: ata3: hard resetting link (Minor Issues)
Feb 26 21:51:11 Tower kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300) (Drive related)
Feb 26 21:51:11 Tower kernel: ata3.00: configured for UDMA/133 (Drive related)
Feb 26 21:51:11 Tower kernel: ata3: EH complete (Drive related)

Feb 27 11:12:04 Tower kernel: ata3.00: exception Emask 0x50 SAct 0x0 SErr 0x400800 action 0x6 frozen (Errors)
Feb 27 11:12:04 Tower kernel: ata3.00: irq_stat 0x08000000, interface fatal error (Errors)
Feb 27 11:12:04 Tower kernel: ata3: SError: { HostInt Handshk } (Errors)
Feb 27 11:12:04 Tower kernel: ata3.00: failed command: WRITE DMA EXT (Minor Issues)
Feb 27 11:12:04 Tower kernel: ata3.00: cmd 35/00:00:38:6e:31/00:04:c0:00:00/e0 tag 0 dma 524288 out (Drive related)
Feb 27 11:12:04 Tower kernel: res 50/00:00:37:6e:31/00:00:c0:00:00/e0 Emask 0x50 (ATA bus error) (Errors)
Feb 27 11:12:04 Tower kernel: ata3.00: status: { DRDY } (Drive related)
Feb 27 11:12:04 Tower kernel: ata3: hard resetting link (Minor Issues)
Feb 27 11:12:04 Tower kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300) (Drive related)
Feb 27 11:12:04 Tower kernel: ata3.00: configured for UDMA/133 (Drive related)
Feb 27 11:12:04 Tower kernel: ata3: EH complete (Drive related)

** SMART Report (Short) for Parity Drive:

***************************************************************************
** Statistics for /dev/sdc TOSHIBA_DT01ACA300_43NPEWxxS - PARITY **
***************************************************************************
smartctl -a -d ata /dev/sdc
smartctl 5.40 2010-10-16 r3189 [i486-slackware-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Device Model: TOSHIBA DT01ACA300
Serial Number: 43NPEWxxS
Firmware Version: MX6OABB0
User Capacity: 3,000,592,982,016 bytes
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 4
Local Time is: Fri Feb 28 11:35:43 2014 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x85) Offline data collection activity was aborted by an interrupting command from host. Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: (22222) seconds.
Offline data collection capabilities: (0x5b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. No Conveyance Self-test supported. Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported. General Purpose Logging supported.
Short self-test routine recommended polling time: ( 1) minutes.
Extended self-test routine recommended polling time: ( 255) minutes.
SCT capabilities: (0x003d) SCT Status supported. SCT Error Recovery Control supported. SCT Feature Control supported. SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b 100   100   016    Pre-fail Always  -           0
  2 Throughput_Performance  0x0005 139   139   054    Pre-fail Offline -           71
  3 Spin_Up_Time            0x0007 135   135   024    Pre-fail Always  -           423 (Average 425)
  4 Start_Stop_Count        0x0012 100   100   000    Old_age  Always  -           120
  5 Reallocated_Sector_Ct   0x0033 100   100   005    Pre-fail Always  -           0
  7 Seek_Error_Rate         0x000b 100   100   067    Pre-fail Always  -           0
  8 Seek_Time_Performance   0x0005 124   124   020    Pre-fail Offline -           33
  9 Power_On_Hours          0x0012 100   100   000    Old_age  Always  -           3980
 10 Spin_Retry_Count        0x0013 100   100   060    Pre-fail Always  -           0
 12 Power_Cycle_Count       0x0032 100   100   000    Old_age  Always  -           19
192 Power-Off_Retract_Count 0x0032 100   100   000    Old_age  Always  -           120
193 Load_Cycle_Count        0x0012 100   100   000    Old_age  Always  -           120
194 Temperature_Celsius     0x0002 214   214   000    Old_age  Always  -           28 (Min/Max 15/35)
196 Reallocated_Event_Count 0x0032 100   100   000    Old_age  Always  -           0
197 Current_Pending_Sector  0x0022 100   100   000    Old_age  Always  -           0
198 Offline_Uncorrectable   0x0008 100   100   000    Old_age  Offline -           0
199 UDMA_CRC_Error_Count    0x000a 200   200   000    Old_age  Always  -           15

SMART Error Log Version: 1
ATA Error Count: 15 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes, SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 15 occurred at disk power-on lifetime: 232 hours (9 days + 16 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 60 e0 92 de 0a  Error: ICRC, ABRT 96 sectors at LBA = 0x0ade92e0 = 182358752
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time  Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
35 00 00 40 92 de e0 08 3d+03:55:14.424  WRITE DMA EXT
25 00 00 40 a6 de e0 08 3d+03:55:14.420  READ DMA EXT
25 00 00 40 a2 de e0 08 3d+03:55:14.416  READ DMA EXT
25 00 00 40 9e de e0 08 3d+03:55:14.411  READ DMA EXT
25 00 00 40 9a de e0 08 3d+03:55:14.405  READ DMA EXT

Error 14 occurred at disk power-on lifetime: 232 hours (9 days + 16 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 40 80 f9 5f 02  Error: ICRC, ABRT 64 sectors at LBA = 0x025ff980 = 39844224
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time  Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
35 00 00 c0 f6 5f e0 08 3d+03:17:38.705  WRITE DMA EXT
35 00 00 c0 f2 5f e0 08 3d+03:17:38.703  WRITE DMA EXT
25 00 80 48 08 60 e0 08 3d+03:17:38.701  READ DMA EXT
25 00 00 48 04 60 e0 08 3d+03:17:38.696  READ DMA EXT
25 00 00 48 00 60 e0 08 3d+03:17:38.693  READ DMA EXT

Error 13 occurred at disk power-on lifetime: 231 hours (9 days + 15 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 40 f0 5e 2d 0a  Error: ICRC, ABRT 64 sectors at LBA = 0x0a2d5ef0 = 170745584
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time  Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
35 00 00 30 5d 2d e0 08 3d+02:44:20.508  WRITE DMA EXT
35 00 00 30 59 2d e0 08 3d+02:44:20.505  WRITE DMA EXT
35 00 00 30 55 2d e0 08 3d+02:44:20.503  WRITE DMA EXT
35 00 00 30 51 2d e0 08 3d+02:44:20.501  WRITE DMA EXT
25 00 00 30 65 2d e0 08 3d+02:44:20.498  READ DMA EXT

Error 12 occurred at disk power-on lifetime: 231 hours (9 days + 15 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 c0 b0 14 54 04  Error: ICRC, ABRT 192 sectors at LBA = 0x045414b0 = 72619184
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time  Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
35 00 00 70 13 54 e0 08 3d+02:20:37.926  WRITE DMA EXT
35 00 00 70 0f 54 e0 08 3d+02:20:37.924  WRITE DMA EXT
25 00 98 e0 24 54 e0 08 3d+02:20:37.923  READ DMA EXT
25 00 00 e0 20 54 e0 08 3d+02:20:37.921  READ DMA EXT
25 00 00 e0 1c 54 e0 08 3d+02:20:37.919  READ DMA EXT

Error 11 occurred at disk power-on lifetime: 231 hours (9 days + 15 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 80 b0 dc b1 03  Error: ICRC, ABRT 128 sectors at LBA = 0x03b1dcb0 = 61988016
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time  Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
35 00 00 30 da b1 e0 08 3d+02:18:08.927  WRITE DMA EXT
35 00 00 30 d6 b1 e0 08 3d+02:18:08.925  WRITE DMA EXT
25 00 00 30 ea b1 e0 08 3d+02:18:08.921  READ DMA EXT
25 00 00 30 e6 b1 e0 08 3d+02:18:08.917  READ DMA EXT
25 00 00 30 e2 b1 e0 08 3d+02:18:08.913  READ DMA EXT

SMART Self-test log structure revision number 1
Num Test_Description Status                  Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline    Completed without error 00%       3980            -
# 2 Short offline    Completed without error 00%       3127            -
# 3 Short offline    Completed without error 00%       3127            -
# 4 Short offline    Completed without error 00%       3127            -
# 5 Short offline    Completed without error 00%       3127            -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
   1       0       0 Not_testing
   2       0       0 Not_testing
   3       0       0 Not_testing
   4       0       0 Not_testing
   5       0       0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

** SMART Report (Short) for another problem Toshiba (nothing in logs mentioned that I saw):

Attached (went beyond maximum characters) - haven't looked to see if it is on the same controller or not (perhaps you can tell by the system log?). It's on a rack, without rails, just haven't gotten to it just yet.
General Diagnosis/Curiosities:

Curious about this:

Jan 23 23:06:06 Tower kernel: acpi PNP0A03:00: ACPI _OSC support notification failed, disabling PCIe ASPM (Minor Issues)
Jan 23 23:06:06 Tower kernel: acpi PNP0A03:00: Unable to request _OSC control (_OSC support mask: 0x08)

IDE is disabled in the BIOS, so why this:

Jan 23 23:06:06 Tower kernel: atiixp 0000:00:14.1: simplex device: DMA disabled (Errors)
Jan 23 23:06:06 Tower kernel: ide1: DMA disabled (Errors)

syslog02-28-2014.txt
other_problematic_toshiba.txt
dgaschk Posted February 28, 2014

Try a new SATA cable.
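For anyone following along, one rough way to judge whether a cable swap helped is to watch attribute 199 (UDMA_CRC_Error_Count) in the `smartctl -a` output. In the report above its raw value is 15, which matches the 15 ICRC errors in the drive's error log. The raw count typically does not reset, so what matters is whether it keeps climbing after the swap. The sketch below is only an illustration: the helper name `crc_error_count` is made up here, and it assumes the ten-column attribute layout that smartctl 5.40 printed in this thread.

```python
def crc_error_count(smartctl_output):
    """Return the raw value of attribute 199 (UDMA_CRC_Error_Count), or None.

    Hypothetical helper: assumes rows laid out as
    ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE.
    """
    for line in smartctl_output.splitlines():
        fields = line.split()
        # An attribute row has at least ten whitespace-separated fields.
        if len(fields) >= 10 and fields[0] == "199" and fields[1] == "UDMA_CRC_Error_Count":
            return int(fields[9])
    return None


# Two rows copied from the SMART report in this thread.
sample = """\
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 15
"""

print(crc_error_count(sample))  # 15
```

Save the number before changing the cable, then compare it after a parity check or a large copy. If it stays at 15, the new cable is probably doing its job.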
Superorb Posted March 1, 2014

I have those same 2 errors as well every time I boot the machine.

Feb 28 18:11:34 unRAID kernel: atiixp 0000:00:14.1: simplex device: DMA disabled (Errors)
Feb 28 18:11:34 unRAID kernel: ide1: DMA disabled (Errors)

I removed the drives from the hot-swap cages and copied tons of data to/from the disks, and I haven't had any SATA link resets since. I was copying directly to each drive (\\UNRAID\disk1, \\UNRAID\disk2, etc). When they were in the cage they would always begin to show SATA reset errors. I did get the errors on ATA1 (Seagate NAS 4TB) and ATA3 (old Hitachi 2TB, when Hitachi was still made by Hitachi). I bought new cables and used old cables, but the errors persisted. I changed the cables at least 6 times and the problem still persisted, so that's when I removed them from the cage, which ended the ATA errors.

So, try some new cables, and if that doesn't work, remove the drives from the hot-swap cage and connect them directly.
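If you want to compare before and after a cable or cage change, a small script can count the "hard resetting link" lines per ATA port in a saved syslog. This is just a sketch under assumptions: the function name `count_link_resets` is invented for illustration, and the pattern is based only on the kernel lines quoted in this thread, so other kernels or log formats may need a different regex.

```python
import re
from collections import Counter

# Matches lines such as:
#   Feb 12 22:57:44 Tower kernel: ata3: hard resetting link (Minor Issues)
RESET_RE = re.compile(r"kernel: (ata\d+): hard resetting link")


def count_link_resets(syslog_text):
    """Count 'hard resetting link' events per ATA port (hypothetical helper)."""
    counts = Counter()
    for line in syslog_text.splitlines():
        match = RESET_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Lines taken from the syslog excerpts in the first post.
sample = """\
Feb 12 22:57:44 Tower kernel: ata3: hard resetting link (Minor Issues)
Feb 26 21:51:10 Tower kernel: ata3: hard resetting link (Minor Issues)
Feb 27 11:12:04 Tower kernel: ata3: hard resetting link (Minor Issues)
Feb 27 11:12:04 Tower kernel: ata3: EH complete (Drive related)
"""

for port, resets in sorted(count_link_resets(sample).items()):
    print(port, resets)  # ata3 3
```

Run it against a syslog captured with the drives in the cage and again with them connected directly; a port whose count drops to zero points at the cage or cabling rather than the drive.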