(SOLVED) Array has 1 disk with read errors


ptr78

Hi,

I received a notification stating: "Array has 1 disk with read errors". This happened during a parity check.

From the diagnostics, here are the syslog entries from the time the errors occurred:

Nov 21 15:14:47 Tower kernel: sd 9:0:0:0: [sdh] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Nov 21 15:14:47 Tower kernel: sd 9:0:0:0: [sdh] tag#0 Sense Key : 0x3 [current] [descriptor] 
Nov 21 15:14:47 Tower kernel: sd 9:0:0:0: [sdh] tag#0 ASC=0x11 ASCQ=0x0 
Nov 21 15:14:47 Tower kernel: sd 9:0:0:0: [sdh] tag#0 CDB: opcode=0x88 88 00 00 00 00 01 3c a9 8c 60 00 00 04 00 00 00
Nov 21 15:14:47 Tower kernel: print_req_error: critical medium error, dev sdh, sector 5312712464
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712400
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712408
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712416
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712424
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712432
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712440
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712448
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712456
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712464
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712472
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712480
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712488
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712496
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712504
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712512
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712520
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712528
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712536
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712544
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712552
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712560
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712568
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712576
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712584
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712592
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712600
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712608
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712616
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712624
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712632
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712640
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712648
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712656
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712664
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712672
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712680
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712688
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712696
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712704
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712712
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712720
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712728
Nov 21 15:15:01 Tower sSMTP[21826]: Creating SSL connection to host
Nov 21 15:15:01 Tower sSMTP[21826]: SSL connection using TLS_AES_256_GCM_SHA384
Nov 21 15:15:04 Tower sSMTP[21826]: Sent mail for email@removed.com (221 2.0.0 closing connection e27sm1387940lfb.79 - gsmtp) uid=0 username=xxx outbytes=786
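
The CDB line in the log above can be decoded to see which sectors the failing command covered. The following is a minimal sketch, assuming the standard READ(16) layout (opcode 0x88, bytes 2-9 hold the start LBA, bytes 10-13 hold the transfer length). The 64-sector partition offset and the 8-sector step per md line are assumptions inferred from the numbers in the log itself; the log does not state them.

```python
def decode_read16(cdb_hex):
    """Return (start_lba, sector_count) from a READ(16) CDB hex dump."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a READ(16) CDB")
    start_lba = int.from_bytes(cdb[2:10], "big")  # bytes 2-9: 64-bit LBA
    count = int.from_bytes(cdb[10:14], "big")     # bytes 10-13: transfer length
    return start_lba, count


# CDB copied from the syslog line for sdh.
start, count = decode_read16("88 00 00 00 00 01 3c a9 8c 60 00 00 04 00 00 00")
end = start + count  # first sector after the command's range

failed_dev_sector = 5312712464               # from "critical medium error"
first_md, last_md = 5312712400, 5312712728   # first and last md read errors

# Assumption: md sectors are partition-relative, so the difference
# between the device sector and the first md sector is the offset.
partition_offset = failed_dev_sector - first_md

print(start, count)                           # 5312711776 1024
print(start <= failed_dev_sector < end)       # True
print(partition_offset)                       # 64
# Assumption: each md line covers 8 sectors (the step seen in the log).
print(last_md + 8 + partition_offset == end)  # True
```

Under those assumptions, the 42 md lines cover the stretch from the failing sector to the end of that single 1024-sector read, which would point to one failed command rather than 42 separate bad spots.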

 

SMART report for the disk:

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    9
  3 Spin_Up_Time            POS--K   170   164   021    -    6500
  4 Start_Stop_Count        -O--CK   072   072   000    -    28062
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
  7 Seek_Error_Rate         -OSR-K   200   200   000    -    0
  9 Power_On_Hours          -O--CK   055   055   000    -    32973
 10 Spin_Retry_Count        -O--CK   100   100   000    -    0
 11 Calibration_Retry_Count -O--CK   100   100   000    -    0
 12 Power_Cycle_Count       -O--CK   083   083   000    -    17784
192 Power-Off_Retract_Count -O--CK   200   200   000    -    38
193 Load_Cycle_Count        -O--CK   191   191   000    -    28024
194 Temperature_Celsius     -O---K   122   108   000    -    28
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    0
198 Offline_Uncorrectable   ----CK   100   253   000    -    0
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R--   200   200   000    -    1
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning
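
As a rough way to read the attribute table above, the sketch below parses a few of its rows and checks two things: whether any normalized VALUE has fallen to its THRESH, and whether the raw counts of attributes 5, 197 and 198 are nonzero. Watching those three attributes is a common rule of thumb, not a rule taken from this thread, and the helper names are hypothetical.

```python
import re

# Rows copied verbatim from the SMART report above (subset).
SMART_ROWS = """\
  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    9
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    0
198 Offline_Uncorrectable   ----CK   100   253   000    -    0
200 Multi_Zone_Error_Rate   ---R--   200   200   000    -    1
"""

# ID, name, flags, VALUE, WORST, THRESH, FAIL, RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\S+)\s+(\d+)"
)


def parse_smart(text):
    """Map attribute ID -> dict with name, value, thresh and raw."""
    rows = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows[int(m.group(1))] = {
                "name": m.group(2),
                "value": int(m.group(4)),
                "thresh": int(m.group(6)),
                "raw": int(m.group(8)),
            }
    return rows


rows = parse_smart(SMART_ROWS)

# Assumption: these three IDs are the ones worth watching most closely.
WATCHED = (5, 197, 198)
nonzero_watched = {i: rows[i]["raw"] for i in WATCHED if rows[i]["raw"] != 0}

# Attributes with a nonzero threshold whose VALUE has reached it.
at_threshold = [i for i, r in rows.items() if r["thresh"] and r["value"] <= r["thresh"]]

print(nonzero_watched)  # {}
print(at_threshold)     # []
```

Both results are empty for this report, so the table on its own does not flag the disk; the UNC entry in the error log is a separate signal.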


SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
Device Error Count: 1
	CR     = Command Register
	FEATR  = Features Register
	COUNT  = Count (was: Sector Count) Register
	LBA_48 = Upper bytes of LBA High/Mid/Low Registers ]  ATA-8
	LH     = LBA High (was: Cylinder High) Register    ]   LBA
	LM     = LBA Mid (was: Cylinder Low) Register      ] Register
	LL     = LBA Low (was: Sector Number) Register     ]
	DV     = Device (was: Device/Head) Register
	DC     = Device Control Register
	ER     = Error register
	ST     = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1 [0] occurred at disk power-on lifetime: 32968 hours (1373 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 01 3c a9 8f 10 40 00  Error: UNC at LBA = 0x13ca98f10 = 5312712464

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 04 00 00 00 00 01 3c a9 8c 60 40 00 40d+04:22:09.241  READ FPDMA QUEUED
  60 04 00 00 00 00 01 3c a9 88 60 40 00 40d+04:22:09.234  READ FPDMA QUEUED
  60 03 68 00 00 00 01 3c a9 84 f8 40 00 40d+04:22:09.229  READ FPDMA QUEUED
  60 04 00 00 00 00 01 3c a9 80 f8 40 00 40d+04:22:09.222  READ FPDMA QUEUED
  60 00 98 00 00 00 01 3c a9 80 60 40 00 40d+04:22:09.221  READ FPDMA QUEUED
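
The register dump above can be cross-checked against the syslog with a little arithmetic. This is a minimal sketch, assuming the column order printed by the report (ER, a spacer, ST, two COUNT bytes, three LBA_48 bytes, then LH, LM, LL) and the standard ATA meaning of error-register bit 6 (UNC) and status-register bit 0 (ERR).

```python
# Register line copied from "After command completion occurred".
regs = "40 -- 51 00 00 00 01 3c a9 8f 10 40 00".split()

error_reg = int(regs[0], 16)
status_reg = int(regs[2], 16)

# LBA_48 (3 bytes) followed by LH, LM, LL gives the 48-bit LBA.
lba = int("".join(regs[5:11]), 16)

unc_bit_set = bool(error_reg & 0x40)   # bit 6: uncorrectable data
err_bit_set = bool(status_reg & 0x01)  # bit 0: command ended in error

# Power-on time of the error, and how long before the report it happened.
days, hours = divmod(32968, 24)
hours_before_report = 32973 - 32968    # Power_On_Hours minus error time

print(lba)                        # 5312712464
print(unc_bit_set, err_bit_set)   # True True
print(days, hours)                # 1373 16
print(hours_before_report)        # 5
```

The decoded LBA matches the sector in the kernel's "critical medium error" line, and the error was logged about 5 power-on hours before the report was taken, so the SMART entry and the syslog entry appear to describe the same event.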

 

The disk is quite old, but I was hoping to use it a bit longer. Does this look bad?

I plan to swap SATA cables with another disk to check whether the cable is to blame. I also plan to run a file system check, an extended SMART test, and a new parity check. Is there anything else I should do?

Thank you for any help!

34 minutes ago, johnnie.black said:

It was a disk problem (UNC @ LBA). These errors can sometimes be intermittent, but they are never a good sign. You can run an extended SMART test: if it passes, keep monitoring the disk; if it fails, the disk needs replacing.

Thank you for the very fast reply. I'll do that.

Actually, I examined the syslog and noticed something else as well: many lines like this one: "Nov 20 23:30:46 Tower kernel: print_req_error: I/O error, dev sdi, sector 1644746976". There are about 60 of them from the last 40 days of operation. Often there are 3-5 on the same day and then nothing for several days. The disk is an old one and I use it only for temporary storage, so I can just replace it if it fails. But what do those errors mean? That is, is it possible that some data corruption has happened, or do those lines mean that a write failed and the OS retried and succeeded?

Here is the latest one:

Nov 20 23:30:46 Tower kernel: sd 9:0:1:0: [sdi] tag#1 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x00
Nov 20 23:30:46 Tower kernel: sd 9:0:1:0: [sdi] tag#1 CDB: opcode=0x28 28 00 5c 09 48 98 00 00 08 00
Nov 20 23:30:46 Tower kernel: print_req_error: I/O error, dev sdi, sector 1544112280
Nov 20 23:30:46 Tower kernel: sd 9:0:1:0: [sdi] tag#2 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Nov 20 23:30:46 Tower kernel: sd 9:0:1:0: [sdi] tag#2 Sense Key : 0x2 [current] 
Nov 20 23:30:46 Tower kernel: sd 9:0:1:0: [sdi] tag#2 ASC=0x4 ASCQ=0x2 
Nov 20 23:30:46 Tower kernel: sd 9:0:1:0: [sdi] tag#2 CDB: opcode=0x28 28 00 62 08 d8 e0 00 00 60 00
Nov 20 23:30:46 Tower kernel: print_req_error: I/O error, dev sdi, sector 1644746976

This is the first one, from Oct 20:

Oct 20 14:33:43 Tower kernel: sd 9:0:1:0: [sdi] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x00
Oct 20 14:33:43 Tower kernel: sd 9:0:1:0: [sdi] tag#0 CDB: opcode=0x28 28 00 00 01 bf a8 00 00 08 00
Oct 20 14:33:43 Tower kernel: print_req_error: I/O error, dev sdi, sector 114600
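
The sdi entries above can be decoded the same way as any SCSI log line. The sketch below assumes the standard READ(10) layout (opcode 0x28, bytes 2-5 hold the LBA, bytes 7-8 hold the transfer length) and uses the standard SCSI meanings for the two sense-key/ASC/ASCQ triples that appear in this thread. The lookup table is deliberately limited to those two; it is not a complete decoder.

```python
def decode_read10(cdb_hex):
    """Return (start_lba, sector_count) from a READ(10) CDB hex dump."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    start_lba = int.from_bytes(cdb[2:6], "big")  # bytes 2-5: 32-bit LBA
    count = int.from_bytes(cdb[7:9], "big")      # bytes 7-8: transfer length
    return start_lba, count


# (sense key, ASC, ASCQ) -> meaning, only for the codes seen in this thread.
SENSE_MEANINGS = {
    (0x3, 0x11, 0x00): "MEDIUM ERROR: unrecovered read error",
    (0x2, 0x04, 0x02): "NOT READY: logical unit not ready, "
                       "initializing command required",
}

# CDBs copied from the sdi log lines.
sdi_cdbs = [
    "28 00 5c 09 48 98 00 00 08 00",
    "28 00 62 08 d8 e0 00 00 60 00",
    "28 00 00 01 bf a8 00 00 08 00",
]
decoded = [decode_read10(c) for c in sdi_cdbs]

for start_lba, count in decoded:
    print(start_lba, count)
# 1544112280 8
# 1644746976 96
# 114600 8

print(SENSE_MEANINGS[(0x2, 0x04, 0x02)])
```

Opcode 0x28 is a read, so the commands shown here are failed reads rather than failed writes, and the decoded LBAs match the sectors in the "I/O error" lines. The sense data logged for sdi is "not ready" rather than the medium error seen on sdh, which suggests the drive was not responding at that moment. These lines alone do not show whether the reads were retried successfully.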

 

  • ptr78 changed the title to (SOLVED) Array has 1 disk with read errors
