November 4, 2014

Hi everyone,

I had just added a precleared drive to my array and left it overnight while copying data to it. By morning it was red-balled. This error coincides with the time I added the disk to the array:

Nov 4 01:07:53 towerS emhttp: disk9 mount error: 32 (Errors)

After that there are no more errors while files are being copied. Then, about 7 hours later, there is a flood of errors (see attached file). The errors are of this type:

Nov 4 08:33:31 towerS kernel: sas: sas_eh_handle_sas_errors: task 0xf4260e00 is aborted (Errors)
Nov 4 08:33:31 towerS kernel: sas: ata7: end_device-5:2: cmd error handler (Errors)
Nov 4 08:33:31 towerS kernel: ata7.00: exception Emask 0x0 SAct 0x80 SErr 0x0 action 0x6 frozen (Errors)
Nov 4 08:33:31 towerS kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) (Errors)
Nov 4 08:33:36 towerS kernel: sas: sas_ata_task_done: SAS error 8a (Errors)
Nov 4 08:33:36 towerS kernel: ata7.00: failed to IDENTIFY (I/O error, err_mask=0x11) (Errors)
Nov 4 08:33:44 towerS kernel: ata7.00: failed to IDENTIFY (I/O error, err_mask=0x5) (Errors)
Nov 4 08:33:49 towerS kernel: sas: sas_ata_task_done: SAS error 8a (Errors)
Nov 4 08:33:49 towerS kernel: ata7.00: disabled (Errors)
Nov 4 08:33:49 towerS kernel: sd 5:0:2:0: [sdh] Unhandled error code (Errors)
Nov 4 08:33:49 towerS kernel: end_request: I/O error, dev sdh, sector 1969226936 (Errors)
Nov 4 08:33:49 towerS kernel: sd 5:0:2:0: [sdh] Unhandled error code (Errors)
Nov 4 08:33:49 towerS kernel: end_request: I/O error, dev sdh, sector 1970817768 (Errors)
Nov 4 08:33:49 towerS kernel: sd 5:0:2:0: [sdh] Unhandled error code (Errors)
Nov 4 08:33:49 towerS kernel: sd 5:0:2:0: [sdh] Unhandled error code (Errors)
Nov 4 08:33:49 towerS kernel: sd 5:0:2:0: [sdh] Unhandled error code (Errors)

and, predominantly, of this type:

Nov 4 08:33:49 towerS kernel: md: disk9 read error, sector=1969226872 (Errors)
. . .
Nov 4 08:33:49 towerS kernel: md: disk9 write error, sector=1970817784 (Errors)
. . .

The disk is on a SASLP-MV8 card using a SAS-to-4-SATA breakout cable. The same cable connects two more drives without problems. Also, this kind of cable has a locking clip, so I doubt it was moved.

Any suggestions as to whether this is a hard disk problem or something else?

red_ball_4_nov_2014.txt
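One way to read a syslog like the one above is to line up the link-level events (the `ataN.NN` lines) against the `md` read/write errors. A minimal sketch in Python, assuming only the line formats quoted in this post; the names `summarize`, `MD_ERR`, and `ATA_EVENT` are hypothetical and the patterns will not cover every kernel message:

```python
import re
from collections import Counter

# Matches md-layer lines such as "kernel: md: disk9 read error, sector=1969226872"
MD_ERR = re.compile(r"md: (disk\d+) (read|write) error, sector=(\d+)")
# Matches libata device events such as "ata7.00: disabled" or "ata7.00: failed to IDENTIFY"
ATA_EVENT = re.compile(r"(ata\d+\.\d+): (disabled|failed to IDENTIFY|exception)")


def summarize(lines):
    """Count md read/write errors per disk, track the sector range, and collect ata events."""
    md_counts = Counter()
    sectors = {}      # disk -> (lowest sector, highest sector) seen in md errors
    ata_events = []   # (device, event) in log order
    for line in lines:
        m = MD_ERR.search(line)
        if m:
            disk, kind, sector = m.group(1), m.group(2), int(m.group(3))
            md_counts[(disk, kind)] += 1
            lo, hi = sectors.get(disk, (sector, sector))
            sectors[disk] = (min(lo, sector), max(hi, sector))
            continue
        a = ATA_EVENT.search(line)
        if a:
            ata_events.append((a.group(1), a.group(2)))
    return md_counts, sectors, ata_events
```

If the `ataN` events (timeouts, failed IDENTIFY, "disabled") come first and the `md` errors only follow once the device has dropped off the bus, that pattern suggests the drive or its link stopped responding, rather than the drive reporting bad sectors one by one.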
November 4, 2014

The most common reason for red-balled drives is cabling problems. Just because your cable works fine for two drives, you cannot assume it is working fine for this one. A SMART report would tell us whether the drive itself is failing or whether there is some issue with its connection to the server.
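As a rough rule of thumb when reading a SMART report: non-zero Reallocated_Sector_Ct (5), Current_Pending_Sector (197), or Offline_Uncorrectable (198) point at the drive's media, while a non-zero UDMA_CRC_Error_Count (199) points at corruption on the link between drive and controller, which is often the cable. A minimal sketch of that heuristic, assuming you already have the raw values as a dict; `triage` is a hypothetical helper, not a smartmontools feature:

```python
# Attribute IDs commonly associated with problems on the drive's media.
MEDIA_ATTRS = {5: "Reallocated_Sector_Ct", 197: "Current_Pending_Sector", 198: "Offline_Uncorrectable"}
# UDMA CRC errors are counted when transfers between drive and controller are corrupted.
LINK_ATTRS = {199: "UDMA_CRC_Error_Count"}


def triage(raw_values):
    """raw_values: dict mapping SMART attribute ID -> raw value (int). Heuristic only."""
    media = {MEDIA_ATTRS[i]: v for i, v in raw_values.items() if i in MEDIA_ATTRS and v > 0}
    link = {LINK_ATTRS[i]: v for i, v in raw_values.items() if i in LINK_ATTRS and v > 0}
    if media:
        return "possible drive media problem", media
    if link:
        return "possible cabling/connection problem", link
    return "no media or link errors recorded", {}
```

A clean result does not rule out the cable: a connector that drops out entirely, as later posts in this thread suggest, may make the disk disappear without ever incrementing the CRC counter. These counters are also cumulative, so comparing them before and after an incident is more telling than a single reading.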
November 4, 2014 (original poster)

I get this:

smartctl -a -d ata /dev/sdh
smartctl 6.2 2013-07-26 r3841 [i686-linux-3.9.11p-unRAID] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

Read Device Identity failed: Input/output error

A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.

I guess the drive stopped responding and I need to reboot in order to check. (I am currently copying the data on this disk, which is being emulated by the rest of the array, back to my PC; then I will reboot.) But does the fact that it does not respond imply anything about the nature of the problem?
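smartctl's exit status is a bitmask documented in its man page (RETURN VALUES), which helps separate "the drive could not be reached" from "the drive reports problems". A "Read Device Identity failed" error like the one above would presumably show up as bit 1. A minimal sketch that decodes the status, with the bit descriptions paraphrased from the man page; `decode_exit_status` and `check` are hypothetical helpers, and `check` assumes smartctl is on the PATH and is run as root:

```python
import subprocess

# Bit meanings paraphrased from the smartctl man page (RETURN VALUES section).
SMARTCTL_EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed, device did not return IDENTIFY DEVICE data, or device is in low-power mode",
    2: "some SMART or other ATA command to the disk failed",
    3: "SMART status check returned DISK FAILING",
    4: "prefail attributes at or below threshold",
    5: "some attributes were at or below threshold in the past",
    6: "device error log contains records of errors",
    7: "device self-test log contains records of errors",
}


def decode_exit_status(status):
    """Return the descriptions of every bit set in a smartctl exit status."""
    return [msg for bit, msg in SMARTCTL_EXIT_BITS.items() if status & (1 << bit)]


def check(device):
    """Run 'smartctl -a' on a device (e.g. '/dev/sdh') and decode its exit status."""
    result = subprocess.run(["smartctl", "-a", device], capture_output=True, text=True)
    return decode_exit_status(result.returncode)
```

Bits 1 and 2 on their own mean smartctl could not talk to the drive properly; they say nothing about the drive's health, which is why a reboot or reseated cable is needed before SMART data can be judged.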
November 4, 2014 (original poster)

Quote: try smartctl -a -A /dev/sdh

I did:

smartctl 6.2 2013-07-26 r3841 [i686-linux-3.9.11p-unRAID] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor: /5:0:2:0
Product:
User Capacity: 600,332,565,813,390,450 bytes [600 PB]
Logical block size: 774843950 bytes
Physical block size: 3099375800 bytes
Lowest aligned LBA: 14896
scsiModePageOffset: response length too short, resp_len=47 offset=50 bd_len=46
scsiModePageOffset: response length too short, resp_len=47 offset=50 bd_len=46
>> Terminate command early due to bad response to IEC mode page
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.

It seems I have a 600 PB HDD, by the way. SMART works for my other drives.
November 4, 2014

I was working on a similar problem last night. The cable was loose. It's a locking cable, but I guess I didn't lock it properly. I had the same "IEC mode page" error, which seemed to indicate a communication failure with the disk.
November 4, 2014

Normally the cabling is not so bad that the SMART report won't run! But re-securing the cable would be my first step. You could also try connecting this disk to a known-good port and cable.
November 5, 2014 (original poster)

I rebooted the server after backing up the files from the problematic disk. The SMART status does not show any problems:

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   154   130   021    Pre-fail  Always       -       9258
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       663
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   089   089   000    Old_age   Always       -       8089
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       234
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       132
193 Load_Cycle_Count        0x0032   195   195   000    Old_age   Always       -       16118
194 Temperature_Celsius     0x0022   114   105   000    Old_age   Always       -       38
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

I will run a preclear cycle and get back to you (it takes a long time; this is a SATA I motherboard).
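To check a table like this one mechanically, you can compare each normalized VALUE against its THRESH and look at the WHEN_FAILED column. A minimal sketch, assuming the ten-column layout shown above with a single-token RAW_VALUE (some drives print raw values such as "38 (Min/Max 20/45)", which this simple split would skip); `parse_attributes` and `failing` are hypothetical helpers:

```python
def parse_attributes(text):
    """Parse smartctl attribute rows (ten whitespace-separated fields) into dicts."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows start with a numeric ID; the header ("ID# ...") and other lines are skipped.
        if len(parts) != 10 or not parts[0].isdigit():
            continue
        id_, name, flag, value, worst, thresh, type_, updated, when_failed, raw = parts
        rows.append({
            "id": int(id_), "name": name, "flag": flag,
            "value": int(value), "worst": int(worst), "thresh": int(thresh),
            "type": type_, "updated": updated, "when_failed": when_failed, "raw": raw,
        })
    return rows


def failing(rows):
    """Names of attributes at/below a non-zero threshold, or with a WHEN_FAILED entry."""
    return [r["name"] for r in rows
            if (r["thresh"] > 0 and r["value"] <= r["thresh"]) or r["when_failed"] != "-"]
```

For the table above, `failing(parse_attributes(...))` comes back empty, and the media and CRC counters (5, 197, 198, 199) all have raw value 0, which is consistent with the drive itself being healthy.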
November 9, 2014 (original poster)

So I ran a preclear cycle with no problems. I will try writing some data after a parity check, which on my SATA I motherboard takes more than 25 hours even though all disks are only 2 TB. But I think everything is OK. It is still strange that the drive stopped responding, but maybe a little mystery in life is a good thing.
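For a sense of scale, a parity check that reads all disks in parallel has to read each 2 TB disk end to end, so the elapsed time implies an average per-disk read rate. A minimal sketch of that arithmetic, assuming 2 TB means 2 × 10^12 bytes (the decimal figure drive vendors use) and that the disks are read concurrently; `avg_throughput_mb_s` is a hypothetical helper:

```python
def avg_throughput_mb_s(disk_bytes, hours):
    """Average per-disk read rate (decimal MB/s) implied by reading disk_bytes in the given time."""
    return disk_bytes / (hours * 3600) / 1e6


rate = avg_throughput_mb_s(2e12, 25)  # about 22.2 MB/s
```

Since the check takes more than 25 hours, the average rate is at most about 22 MB/s per disk, far below what a single modern drive can sustain, which points at the shared controller or bus being the bottleneck rather than the disks.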