ptr78 Posted November 21, 2019

Hi,

I got a notification stating: "Array has 1 disk with read errors". This happened during a parity check.

From the diagnostics, here are the syslog entries from the time the errors happened:

Nov 21 15:14:47 Tower kernel: sd 9:0:0:0: [sdh] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Nov 21 15:14:47 Tower kernel: sd 9:0:0:0: [sdh] tag#0 Sense Key : 0x3 [current] [descriptor]
Nov 21 15:14:47 Tower kernel: sd 9:0:0:0: [sdh] tag#0 ASC=0x11 ASCQ=0x0
Nov 21 15:14:47 Tower kernel: sd 9:0:0:0: [sdh] tag#0 CDB: opcode=0x88 88 00 00 00 00 01 3c a9 8c 60 00 00 04 00 00 00
Nov 21 15:14:47 Tower kernel: print_req_error: critical medium error, dev sdh, sector 5312712464
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712400
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712408
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712416
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712424
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712432
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712440
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712448
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712456
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712464
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712472
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712480
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712488
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712496
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712504
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712512
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712520
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712528
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712536
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712544
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712552
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712560
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712568
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712576
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712584
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712592
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712600
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712608
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712616
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712624
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712632
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712640
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712648
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712656
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712664
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712672
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712680
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712688
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712696
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712704
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712712
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712720
Nov 21 15:14:47 Tower kernel: md: disk4 read error, sector=5312712728
Nov 21 15:15:01 Tower sSMTP[21826]: Creating SSL connection to host
Nov 21 15:15:01 Tower sSMTP[21826]: SSL connection using TLS_AES_256_GCM_SHA384
Nov 21 15:15:04 Tower sSMTP[21826]: Sent mail for [email protected] (221 2.0.0 closing connection e27sm1387940lfb.79 - gsmtp) uid=0 username=xxx outbytes=786

SMART report about the error:

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    9
  3 Spin_Up_Time            POS--K   170   164   021    -    6500
  4 Start_Stop_Count        -O--CK   072   072   000    -    28062
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
  7 Seek_Error_Rate         -OSR-K   200   200   000    -    0
  9 Power_On_Hours          -O--CK   055   055   000    -    32973
 10 Spin_Retry_Count        -O--CK   100   100   000    -    0
 11 Calibration_Retry_Count -O--CK   100   100   000    -    0
 12 Power_Cycle_Count       -O--CK   083   083   000    -    17784
192 Power-Off_Retract_Count -O--CK   200   200   000    -    38
193 Load_Cycle_Count        -O--CK   191   191   000    -    28024
194 Temperature_Celsius     -O---K   122   108   000    -    28
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    0
198 Offline_Uncorrectable   ----CK   100   253   000    -    0
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R--   200   200   000    -    1
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
Device Error Count: 1
        CR     = Command Register
        FEATR  = Features Register
        COUNT  = Count (was: Sector Count) Register
        LBA_48 = Upper bytes of LBA High/Mid/Low Registers ]  ATA-8
        LH     = LBA High (was: Cylinder High) Register    ]   LBA
        LM     = LBA Mid (was: Cylinder Low) Register      ] Register
        LL     = LBA Low (was: Sector Number) Register     ]
        DV     = Device (was: Device/Head) Register
        DC     = Device Control Register
        ER     = Error register
        ST     = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1 [0] occurred at disk power-on lifetime: 32968 hours (1373 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 01 3c a9 8f 10 40 00  Error: UNC at LBA = 0x13ca98f10 = 5312712464

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 04 00 00 00 00 01 3c a9 8c 60 40 00 40d+04:22:09.241  READ FPDMA QUEUED
  60 04 00 00 00 00 01 3c a9 88 60 40 00 40d+04:22:09.234  READ FPDMA QUEUED
  60 03 68 00 00 00 01 3c a9 84 f8 40 00 40d+04:22:09.229  READ FPDMA QUEUED
  60 04 00 00 00 00 01 3c a9 80 f8 40 00 40d+04:22:09.222  READ FPDMA QUEUED
  60 00 98 00 00 00 01 3c a9 80 60 40 00 40d+04:22:09.221  READ FPDMA QUEUED

The disk is quite old, but I was hoping to use it a bit longer. Does this seem bad?

I am planning to swap SATA cables with another disk to check whether the cable is to blame. I also plan to run a file system check, an extended SMART test, and a new parity check. Is there anything else I should do?

Thank you for any help!

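As a rough cross-check of the logs above, the Python sketch below decodes the READ(16) CDB from the syslog (opcode 0x88, with the start LBA in bytes 2-9 and the transfer length in bytes 10-13) and tests whether the UNC LBA from the SMART error log falls inside that request. The function and variable names are illustrative only. The interpretation of the 64-sector gap between the device sector and the first md sector as a partition start offset is an assumption, not something the logs state.

```python
def decode_read16(cdb_hex: str) -> tuple[int, int]:
    """Return (start_lba, sector_count) from a READ(16) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)  # fromhex() skips the spaces between bytes
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a 16-byte READ(16) CDB")
    start_lba = int.from_bytes(cdb[2:10], "big")  # bytes 2-9: logical block address
    count = int.from_bytes(cdb[10:14], "big")     # bytes 10-13: transfer length
    return start_lba, count


CDB = "88 00 00 00 00 01 3c a9 8c 60 00 00 04 00 00 00"  # from the syslog
UNC_LBA = 0x13CA98F10          # "Error: UNC at LBA" from the SMART error log
FIRST_MD_SECTOR = 5312712400   # first "md: disk4 read error" sector

start, count = decode_read16(CDB)
in_range = start <= UNC_LBA < start + count
# Assumption: this gap is the partition start offset on the array disk.
md_offset = UNC_LBA - FIRST_MD_SECTOR

print(f"request covers sectors {start}..{start + count - 1}")
print(f"UNC LBA {UNC_LBA} inside request: {in_range}")
print(f"device sector minus first md sector: {md_offset}")
```

If the check holds, the kernel error, the md read errors, and the SMART log entry all describe the same failed read rather than separate events.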
JorgeB Posted November 21, 2019

It was a disk problem (UNC @ LBA). These errors can sometimes be intermittent but are never a good sign. You can run an extended SMART test and, if it passes, keep monitoring the disk; if it fails, the disk needs replacing.

ptr78 Posted November 21, 2019

34 minutes ago, johnnie.black said:
It was a disk problem (UNC @ LBA). These errors can sometimes be intermittent but are never a good sign. You can run an extended SMART test and, if it passes, keep monitoring the disk; if it fails, the disk needs replacing.

Thank you for the very fast reply. I'll do that.

Actually, I examined the syslog and saw something else as well: many rows like this one: "Nov 20 23:30:46 Tower kernel: print_req_error: I/O error, dev sdi, sector 1644746976". There are about 60 of them from the last 40 days of operation. Often there are 3-5 on the same day and then nothing for several days. The disk is an old one and I use it only for temporary storage, so I can just replace it if it fails.

But what do those errors mean? Is it possible that some data corruption has happened, or do those lines mean that an operation failed and the OS retried and succeeded?

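To see how such errors cluster by day, a minimal Python sketch along these lines could tally them per device. The regular expression is written against the log lines quoted in this thread, the sample lines are copied from the thread's syslog excerpts, and the function name `count_io_errors` is hypothetical.

```python
import re
from collections import Counter

# Matches lines such as:
# "Nov 20 23:30:46 Tower kernel: print_req_error: I/O error, dev sdi, sector 1644746976"
IO_ERROR = re.compile(
    r"^(?P<day>\w{3}\s+\d{1,2}) \d{2}:\d{2}:\d{2} \S+ kernel: "
    r"print_req_error: I/O error, dev (?P<dev>\w+), sector (?P<sector>\d+)"
)


def count_io_errors(lines, dev="sdi"):
    """Count 'I/O error' lines per calendar day for one device."""
    per_day = Counter()
    for line in lines:
        match = IO_ERROR.match(line)
        if match and match.group("dev") == dev:
            per_day[match.group("day")] += 1
    return per_day


sample = [
    "Oct 20 14:33:43 Tower kernel: print_req_error: I/O error, dev sdi, sector 114600",
    "Nov 20 23:30:46 Tower kernel: print_req_error: I/O error, dev sdi, sector 1544112280",
    "Nov 20 23:30:46 Tower kernel: print_req_error: I/O error, dev sdi, sector 1644746976",
    # A medium error on another disk; deliberately not counted.
    "Nov 21 15:14:47 Tower kernel: print_req_error: critical medium error, dev sdh, sector 5312712464",
]
counts = count_io_errors(sample)
for day, n in counts.items():
    print(day, n)
```

Any iterable of lines works as input, so an open syslog file can be passed in place of `sample`; the file location depends on the system.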
JorgeB Posted November 21, 2019

Best to post the complete diags; it helps to see the errors in context.

ptr78 Posted November 21, 2019

Here is the latest one:

Nov 20 23:30:46 Tower kernel: sd 9:0:1:0: [sdi] tag#1 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x00
Nov 20 23:30:46 Tower kernel: sd 9:0:1:0: [sdi] tag#1 CDB: opcode=0x28 28 00 5c 09 48 98 00 00 08 00
Nov 20 23:30:46 Tower kernel: print_req_error: I/O error, dev sdi, sector 1544112280
Nov 20 23:30:46 Tower kernel: sd 9:0:1:0: [sdi] tag#2 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Nov 20 23:30:46 Tower kernel: sd 9:0:1:0: [sdi] tag#2 Sense Key : 0x2 [current]
Nov 20 23:30:46 Tower kernel: sd 9:0:1:0: [sdi] tag#2 ASC=0x4 ASCQ=0x2
Nov 20 23:30:46 Tower kernel: sd 9:0:1:0: [sdi] tag#2 CDB: opcode=0x28 28 00 62 08 d8 e0 00 00 60 00
Nov 20 23:30:46 Tower kernel: print_req_error: I/O error, dev sdi, sector 1644746976

This is the first one from Oct 20:

Oct 20 14:33:43 Tower kernel: sd 9:0:1:0: [sdi] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x00
Oct 20 14:33:43 Tower kernel: sd 9:0:1:0: [sdi] tag#0 CDB: opcode=0x28 28 00 00 01 bf a8 00 00 08 00
Oct 20 14:33:43 Tower kernel: print_req_error: I/O error, dev sdi, sector 114600

JorgeB Posted November 21, 2019

I meant the diagnostics: Tools -> Diagnostics

ptr78 Posted November 21, 2019

Ah, OK. Please see the attachment.

JorgeB Posted November 21, 2019

The disk looks fine and those errors are likely spin-down related. This appears to happen sometimes when disks are on an LSI HBA; see for example here for a workaround.

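The sense data in the two syslog excerpts also separates the two cases. The Python sketch below maps the sense key and ASC/ASCQ pairs seen in this thread to their standard SCSI meanings. The lookup tables are a tiny excerpt covering only those codes, and the function name `describe_sense` is illustrative.

```python
# Small excerpt of SCSI sense codes: only the values seen in this thread.
SENSE_KEYS = {
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
}
ASC_ASCQ = {
    (0x04, 0x02): "Logical unit not ready, initializing command required",
    (0x11, 0x00): "Unrecovered read error",
}


def describe_sense(key: int, asc: int, ascq: int) -> str:
    """Return a readable description for a sense key and ASC/ASCQ pair."""
    key_name = SENSE_KEYS.get(key, f"unknown sense key 0x{key:x}")
    detail = ASC_ASCQ.get((asc, ascq), f"unknown ASC/ASCQ 0x{asc:02x}/0x{ascq:02x}")
    return f"{key_name}: {detail}"


sdh = describe_sense(0x3, 0x11, 0x0)  # disk4 during the parity check
sdi = describe_sense(0x2, 0x04, 0x2)  # the recurring sdi I/O errors
print("sdh:", sdh)
print("sdi:", sdi)
```

The sdh code reports a read the drive could not complete from the medium, while the sdi code reports a drive that was not ready to serve the request, which is consistent with the spin-down explanation rather than with a failing surface.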
ptr78 Posted November 21, 2019

Thank you again for your excellent help!