Red balled drive, first use after preclear

November 4, 2014

Hi everyone,

I had just added a precleared drive to my array and left it overnight while copying data to it. By the morning it was red-balled.

I noticed this error, which coincides with the time I added the disk to the array:

Nov  4 01:07:53 towerS emhttp: disk9 mount error: 32 (Errors)

After that, there are no more errors while the files are being copied. About 7 hours later there is a flood of errors (see attached file). The errors are of this type:

Nov  4 08:33:31 towerS kernel: sas: sas_eh_handle_sas_errors: task 0xf4260e00 is aborted (Errors)
Nov  4 08:33:31 towerS kernel: sas: ata7: end_device-5:2: cmd error handler (Errors)
Nov  4 08:33:31 towerS kernel: ata7.00: exception Emask 0x0 SAct 0x80 SErr 0x0 action 0x6 frozen (Errors)
Nov  4 08:33:31 towerS kernel:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) (Errors)
Nov  4 08:33:36 towerS kernel: sas: sas_ata_task_done: SAS error 8a (Errors)
Nov  4 08:33:36 towerS kernel: ata7.00: failed to IDENTIFY (I/O error, err_mask=0x11) (Errors)
Nov  4 08:33:44 towerS kernel: ata7.00: failed to IDENTIFY (I/O error, err_mask=0x5) (Errors)
Nov  4 08:33:49 towerS kernel: sas: sas_ata_task_done: SAS error 8a (Errors)
Nov  4 08:33:49 towerS kernel: ata7.00: disabled (Errors)
Nov  4 08:33:49 towerS kernel: sd 5:0:2:0: [sdh] Unhandled error code (Errors)
Nov  4 08:33:49 towerS kernel: end_request: I/O error, dev sdh, sector 1969226936 (Errors)
Nov  4 08:33:49 towerS kernel: sd 5:0:2:0: [sdh] Unhandled error code (Errors)
Nov  4 08:33:49 towerS kernel: end_request: I/O error, dev sdh, sector 1970817768 (Errors)
Nov  4 08:33:49 towerS kernel: sd 5:0:2:0: [sdh] Unhandled error code (Errors)
Nov  4 08:33:49 towerS kernel: sd 5:0:2:0: [sdh] Unhandled error code (Errors)
Nov  4 08:33:49 towerS kernel: sd 5:0:2:0: [sdh] Unhandled error code (Errors)

and, predominantly, of this type:

Nov  4 08:33:49 towerS kernel: md: disk9 read error, sector=1969226872 (Errors)
.
.
.
Nov  4 08:33:49 towerS kernel: md: disk9 write error, sector=1970817784 (Errors)
.
.
.
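To check the timeline described above against the attached log, the md errors can be tallied per disk and the gap between the first and last entries measured. A minimal Python sketch, assuming the syslog format shown in this post; the function name `summarize`, the regular expressions and the hard-coded year 2014 are illustrative assumptions, not part of unRAID:

```python
import re
from collections import Counter
from datetime import datetime

# Syslog prefix, e.g. "Nov  4 08:33:49 towerS kernel: ..." (the day is space-padded).
LINE_RE = re.compile(r"^(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ (?P<msg>.*)$")
# unRAID md driver error, e.g. "md: disk9 read error, sector=1969226872".
MD_RE = re.compile(r"md: (?P<disk>disk\d+) (?P<op>read|write) error, sector=\d+")


def summarize(log_text, year=2014):
    """Count md read/write errors per disk; also return the first and last syslog timestamps."""
    counts = Counter()
    stamps = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that are not syslog entries
        # Syslog omits the year, so prepend an assumed one before parsing.
        stamps.append(datetime.strptime(f"{year} {m['stamp']}", "%Y %b %d %H:%M:%S"))
        md = MD_RE.search(m["msg"])
        if md:
            counts[(md["disk"], md["op"])] += 1
    return counts, min(stamps), max(stamps)


sample = """\
Nov  4 01:07:53 towerS emhttp: disk9 mount error: 32 (Errors)
Nov  4 08:33:49 towerS kernel: md: disk9 read error, sector=1969226872 (Errors)
Nov  4 08:33:49 towerS kernel: md: disk9 write error, sector=1970817784 (Errors)
"""
counts, first, last = summarize(sample)
print(dict(counts))  # {('disk9', 'read'): 1, ('disk9', 'write'): 1}
print(last - first)  # 7:25:56
```

Feeding it the whole attached file instead of the three sample lines would give the totals for the entire night.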

The disk is on a SASLP-MV8 card using a SAS-to-4SATA cable. The same cable connects 2 more drives without problems. Additionally, it is the kind of cable that has a locking clip, so I doubt it was moved.

Any suggestions as to whether this is a hard disk problem or something else?

red_ball_4_nov_2014.txt

November 4, 2014

The most common reason for red-balled drives is cabling problems. The fact that your cable works fine for 2 drives does not mean it is working fine for this one.

The SMART report would tell us whether the drive itself is failing or whether there is some issue with its connection to the server.
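As a rule of thumb (not an official smartctl or unRAID rule), nonzero raw values for reallocated, pending or offline-uncorrectable sectors point at the disk itself, while a nonzero UDMA_CRC_Error_Count points at the link between drive and controller, i.e. the cable or port. A minimal Python sketch of that triage; `triage` and its messages are hypothetical:

```python
# Raw values of these attributes usually point at the disk surface or heads.
MEDIA_ATTRS = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}
# CRC errors on the SATA link usually point at the cable, backplane or controller port.
LINK_ATTRS = {199: "UDMA_CRC_Error_Count"}


def triage(raw_values):
    """raw_values maps SMART attribute ID -> raw value; returns a list of rough findings."""
    findings = []
    for attr_id, name in MEDIA_ATTRS.items():
        if raw_values.get(attr_id, 0) > 0:
            findings.append(f"drive media: {name}={raw_values[attr_id]}")
    for attr_id, name in LINK_ATTRS.items():
        if raw_values.get(attr_id, 0) > 0:
            findings.append(f"cable/port: {name}={raw_values[attr_id]}")
    return findings or ["no SMART evidence either way"]


# Hypothetical raw values, for illustration only.
print(triage({5: 0, 197: 0, 198: 0, 199: 12}))  # ['cable/port: UDMA_CRC_Error_Count=12']
```

A clean result does not clear the cable: a drive that drops off the bus entirely may not record any CRC errors at all.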

November 4, 2014

Author

I get this:

smartctl -a -d ata /dev/sdh
smartctl 6.2 2013-07-26 r3841 [i686-linux-3.9.11p-unRAID] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

Read Device Identity failed: Input/output error

A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.

I guess the drive stopped responding and I need to reboot before I can check. (I am currently copying the data on this disk, which is being virtualized by the rest of the array, back to my PC; then I will reboot.) But does the fact that it does not respond imply anything about the nature of the problem?

November 4, 2014

try

smartctl -a -A /dev/sdh

November 4, 2014

Author

I did:

smartctl 6.2 2013-07-26 r3841 [i686-linux-3.9.11p-unRAID] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               /5:0:2:0
Product:
User Capacity:        600,332,565,813,390,450 bytes [600 PB]
Logical block size:   774843950 bytes
Physical block size:  3099375800 bytes
Lowest aligned LBA:   14896
scsiModePageOffset: response length too short, resp_len=47 offset=50 bd_len=46
scsiModePageOffset: response length too short, resp_len=47 offset=50 bd_len=46
>> Terminate command early due to bad response to IEC mode page
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.

It seems I have a 600-petabyte HDD!

BTW, SMART works for my other drives.
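The identity values above cannot belong to a real disk, which fits the idea that the controller was no longer talking to the drive properly. A small Python sketch that flags such values; the function names and thresholds (512 or 4096-byte logical sectors, power-of-two physical sectors, a 20 TB ceiling) are illustrative assumptions, not checks that smartctl performs:

```python
def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0


def implausible_fields(capacity_bytes, logical_block, physical_block, max_capacity=20 * 10**12):
    """Return the identity values that look like garbage rather than real drive data.

    Thresholds are assumptions: 512 or 4096-byte logical sectors, a power-of-two
    physical sector no smaller than the logical one, and a 20 TB capacity ceiling.
    """
    problems = []
    if logical_block not in (512, 4096):
        problems.append(f"logical block size {logical_block}")
    if not is_power_of_two(physical_block) or physical_block < logical_block:
        problems.append(f"physical block size {physical_block}")
    if capacity_bytes > max_capacity:
        problems.append(f"capacity {capacity_bytes:,} bytes")
    return problems


# Values copied from the smartctl output in this post.
print(implausible_fields(600_332_565_813_390_450, 774_843_950, 3_099_375_800))
# ['logical block size 774843950', 'physical block size 3099375800', 'capacity 600,332,565,813,390,450 bytes']

# Plausible-looking values for a 2 TB drive pass cleanly.
print(implausible_fields(2_000_398_934_016, 512, 4096))  # []
```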

November 4, 2014

I was working on a similar problem last night. The cable was loose. It's a locking cable, but I guess I didn't lock it properly... I had the same "IEC mode page" error, which seemed to indicate a communications failure with the disk.

November 4, 2014

Normally a cabling problem is not so bad that the SMART report won't run!

But re-securing the cable would be my first step. You could try connecting this disk to a known good port / cable.

November 5, 2014

Author

I rebooted the PC after copying the files from the problematic disk to safety.

The SMART status does not show any problems:

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   154   130   021    Pre-fail  Always       -       9258
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       663
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   089   089   000    Old_age   Always       -       8089
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       234
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       132
193 Load_Cycle_Count        0x0032   195   195   000    Old_age   Always       -       16118
194 Temperature_Celsius     0x0022   114   105   000    Old_age   Always       -       38
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

I will run a preclear cycle and get back to you (it takes a long time, since this is a SATA I motherboard).
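To read an attribute table like the one in this post mechanically, the output of `smartctl -A` can be split into its ten columns. A minimal Python sketch, assuming the column layout shown here; `parse_smart_attributes` is a hypothetical helper, and only a few rows of the table are reused as sample input:

```python
def parse_smart_attributes(text):
    """Parse a `smartctl -A` attribute table into {id: (name, raw_value_string)}."""
    attrs = {}
    for line in text.splitlines():
        # 10 columns; maxsplit keeps RAW_VALUE whole if it contains spaces.
        fields = line.split(None, 9)
        if len(fields) == 10 and fields[0].isdigit():  # skips the "ID#" header row
            attrs[int(fields[0])] = (fields[1], fields[9])
    return attrs


table = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0022   114   105   000    Old_age   Always       -       38
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
"""
attrs = parse_smart_attributes(table)
print(attrs[194])  # ('Temperature_Celsius', '38')
print(all(attrs[i][1] == "0" for i in (5, 197, 198, 199)))  # True
```

The four attributes usually watched for media or link problems (5, 197, 198 and 199) all have a raw value of 0 in this report.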

November 9, 2014

Author

So I ran a preclear cycle with no problems. I will try writing some data after a parity check, which on my SATA I motherboard takes more than 25 hours even though all the disks are only 2TB.
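For scale, a quick calculation of the average read rate that a 25-hour check over 2TB disks implies. A sketch assuming decimal units (2TB = 2 × 10^12 bytes) and that every disk is read in parallel over the whole check; `avg_throughput_mb_s` is a hypothetical helper:

```python
def avg_throughput_mb_s(disk_bytes, hours):
    """Average per-disk read rate (decimal MB/s) needed to read disk_bytes in the given hours."""
    return disk_bytes / (hours * 3600) / 1e6


# 2 TB disks and the roughly 25 hour parity check reported above.
rate = avg_throughput_mb_s(2e12, 25)
print(f"{rate:.1f} MB/s")  # 22.2 MB/s
```

Since the check takes more than 25 hours, the real average is somewhat lower than this.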

But I think everything is OK. It is still strange that the drive stopped responding, but perhaps a little mystery in life is a good thing.
