dalmaar Posted December 12, 2016

Hello, looking for some advice on how to proceed after a parity check. I have never gotten any kind of error on this unRAID box until now. I ran a parity check (with write corrections), and while it finished close to previous parity check times, it finished with sync errors.

Last checked on Sun 11 Dec 2016 10:10:08 PM CST (today), finding 988 errors.
Duration: 15 hours, 31 minutes, 38 seconds. Average speed: 143.1 MB/s

After examining the logfile, I can see what might be a more serious issue. A bunch of "correcting parity" log entries is followed by a block of read errors (see snippet below); this happened one more time, near the end of the parity check.

----------------------------------------------------------------------------------
(Begin LOGFILE SNIPPET)
. .
Dec 11 21:56:05 Tower2 kernel: md: correcting parity, sector=15482671128
Dec 11 21:56:05 Tower2 kernel: md: correcting parity, sector=15482671136
Dec 11 21:56:05 Tower2 kernel: md: correcting parity, sector=15482671144
Dec 11 21:56:05 Tower2 kernel: md: correcting parity, sector=15482671152
Dec 11 21:56:05 Tower2 kernel: md: correcting parity, sector=15482671160
Dec 11 21:56:05 Tower2 kernel: md: correcting parity, sector=15482671168
Dec 11 21:56:05 Tower2 kernel: md: correcting parity, sector=15482671176
Dec 11 21:56:05 Tower2 kernel: md: correcting parity, stopped logging
Dec 11 21:56:11 Tower2 kernel: ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Dec 11 21:56:11 Tower2 kernel: ata5.00: irq_stat 0x40000001
Dec 11 21:56:11 Tower2 kernel: ata5.00: failed command: READ DMA EXT
Dec 11 21:56:11 Tower2 kernel: ata5.00: cmd 25/00:90:10:b8:dc/00:03:9a:03:00/e0 tag 5 dma 466944 in
Dec 11 21:56:11 Tower2 kernel: res 51/40:00:c8:b8:dc/00:00:9a:03:00/00 Emask 0x9 (media error)
Dec 11 21:56:11 Tower2 kernel: ata5.00: status: { DRDY ERR }
Dec 11 21:56:11 Tower2 kernel: ata5.00: error: { UNC }
Dec 11 21:56:11 Tower2 kernel: ata5.00: configured for UDMA/133
Dec 11 21:56:11 Tower2 kernel: sd 5:0:0:0: [sdf] tag#5 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Dec 11 21:56:11 Tower2 kernel: sd 5:0:0:0: [sdf] tag#5 Sense Key : 0x3 [current] [descriptor]
Dec 11 21:56:11 Tower2 kernel: sd 5:0:0:0: [sdf] tag#5 ASC=0x11 ASCQ=0x4
Dec 11 21:56:11 Tower2 kernel: sd 5:0:0:0: [sdf] tag#5 CDB: opcode=0x88 88 00 00 00 00 03 9a dc b8 10 00 00 03 90 00 00
Dec 11 21:56:11 Tower2 kernel: blk_update_request: I/O error, dev sdf, sector 15483058376
Dec 11 21:56:11 Tower2 kernel: ata5: EH complete
Dec 11 21:56:11 Tower2 kernel: md: disk4 read error, sector=15483058312
Dec 11 21:56:11 Tower2 kernel: md: disk4 read error, sector=15483058320
Dec 11 21:56:11 Tower2 kernel: md: disk4 read error, sector=15483058328
Dec 11 21:56:11 Tower2 kernel: md: disk4 read error, sector=15483058336
. .
Dec 11 21:56:11 Tower2 kernel: md: disk4 read error, sector=15483059032
Dec 11 21:56:14 Tower2 kernel: ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Dec 11 21:56:14 Tower2 kernel: ata5.00: irq_stat 0x40000001
Dec 11 21:56:14 Tower2 kernel: ata5.00: failed command: READ DMA EXT
Dec 11 21:56:14 Tower2 kernel: ata5.00: cmd 25/00:40:a0:bb:dc/00:05:9a:03:00/e0 tag 8 dma 688128 in
Dec 11 21:56:14 Tower2 kernel: res 51/40:00:a0:bb:dc/00:00:9a:03:00/00 Emask 0x9 (media error)
Dec 11 21:56:14 Tower2 kernel: ata5.00: status: { DRDY ERR }
Dec 11 21:56:14 Tower2 kernel: ata5.00: error: { UNC }
Dec 11 21:56:14 Tower2 kernel: ata5.00: configured for UDMA/133
Dec 11 21:56:14 Tower2 kernel: sd 5:0:0:0: [sdf] tag#8 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Dec 11 21:56:14 Tower2 kernel: sd 5:0:0:0: [sdf] tag#8 Sense Key : 0x3 [current] [descriptor]
Dec 11 21:56:14 Tower2 kernel: sd 5:0:0:0: [sdf] tag#8 ASC=0x11 ASCQ=0x4
Dec 11 21:56:14 Tower2 kernel: sd 5:0:0:0: [sdf] tag#8 CDB: opcode=0x88 88 00 00 00 00 03 9a dc bb a0 00 00 05 40 00 00
Dec 11 21:56:14 Tower2 kernel: blk_update_request: I/O error, dev sdf, sector 15483059104
Dec 11 21:56:14 Tower2 kernel: md: disk4 read error, sector=15483059040
. .
Dec 11 22:10:08 Tower2 kernel: md: sync done. time=55898sec
Dec 11 22:10:08 Tower2 kernel: md: recovery thread sync completion status: 0
(End LOGFILE SNIPPET)
-------------------------------------------------------------------------------------

The SMART report on drive 4 shows 0 reallocated and 0 pending sectors:

5 Reallocated sector count 0x0033 100 100 010 Pre-fail Always Never 0
197 Current pending sector 0x0012 100 100 000 Old age Always Never 0

But there is one SMART value I have never noticed before, marked in yellow as a warning. I have not kept records, so I am not sure whether it was there before the parity check.

187 Reported uncorrect 0x0032 098 098 000 Old age Always Never 2

I usually just leave the array alone during the parity check, but this time I accidentally started a copy to this unRAID box (I thought I was writing to my other unRAID box) and aborted the copy once I realized I was on the wrong one. However, the copy was going to disk2, not disk4 where the errors occurred.

My question is how to proceed. I can try another parity check, but if this seems more serious I can start looking into a replacement. I have never seen any drive on this array red ball. Or does this look like some glitch that happened? I have the server behind a UPS and did not detect any power issues or anything. Did I "cause" this with my copy to the array during the parity check?

System Specs:
unRAID system: unRAID server Pro, version 6.1.9 (not upgraded to current version yet)
Motherboard: Gigabyte Technology Co., Ltd. - F2A88XM-D3H
Processor: AMD A10-7850K Radeon R7, 12 Compute Cores 4C+8G @ 3.7 GHz
HVM: Enabled
IOMMU: Enabled
Memory: 16 GB (max. installable capacity 16 GB)
Array:
4 data - 8TB Seagate archive (Shingle) drives
1 parity - 8TB Seagate archive (Shingle) drive

Not running anything too spectacular, just a basic NAS.

Thanks, any help/advice is appreciated.
Dalmaar
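For readers who want to cross-check the sector numbers in the snippet above, here is a minimal Python sketch. It assumes the usual libata field order for the `cmd`/`res` dumps (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device) and the standard READ(16) CDB layout (8-byte LBA, then a 4-byte transfer length). The function names are made up for illustration. The 64-sector gap between the `sdf` sector and the `md: disk4` sector is simply what the two log lines differ by; it would be consistent with a partition that starts at sector 64, but the log does not say so.

```python
def lba_from_taskfile(taskfile: str) -> int:
    """Decode the 48-bit LBA from a libata 'cmd' or 'res' taskfile dump."""
    # "51/40:00:c8:b8:dc/00:00:9a:03:00/00" -> status, low fields, high (hob) fields, device
    _status, low, high, _device = taskfile.split("/")
    lo = [int(x, 16) for x in low.split(":")]    # feature, nsect, lbal, lbam, lbah
    hi = [int(x, 16) for x in high.split(":")]   # hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah
    # Most significant byte first: hob_lbah, hob_lbam, hob_lbal, lbah, lbam, lbal
    return int.from_bytes(bytes([hi[4], hi[3], hi[2], lo[4], lo[3], lo[2]]), "big")


def read16_range(cdb_hex: str) -> tuple[int, int]:
    """Return (start LBA, sector count) from a READ(16) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if cdb[0] != 0x88:
        raise ValueError("not a READ(16) CDB")
    return int.from_bytes(cdb[2:10], "big"), int.from_bytes(cdb[10:14], "big")


# Values copied from the first error in the log snippet.
failed_lba = lba_from_taskfile("51/40:00:c8:b8:dc/00:00:9a:03:00/00")
start, count = read16_range("88 00 00 00 00 03 9a dc b8 10 00 00 03 90 00 00")

print(failed_lba)                # 15483058376, the sector in the blk_update_request line
print(start, count)              # start LBA and length of the failed read request
print(count * 512)               # 466944, the "dma ... in" byte count for tag 5
print(failed_lba - 15483058312)  # 64, the gap to the first md: disk4 read error
```

Both failed reads decode to sectors that sit a few hundred sectors apart, so the two errors point at one small region of the same disk, not at scattered failures.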
JorgeB Posted December 12, 2016

Since it was a correcting check, there is not much you can do now, unless you have checksums for all your data. I would give that disk a second chance, but not a third: another read error like that and I would replace it.
dalmaar Posted December 12, 2016

Thanks for the reply. No, I do not have file checksums in place. Should I now do another parity check without correction? I will definitely watch this drive; I always have a tail running on the logfile.

Dalmaar
JorgeB Posted December 12, 2016

You can do a non-correcting check to see if the issues with that disk continue.
dalmaar Posted December 13, 2016

Well, the second parity check finished (without corrections) and went well...

Last checked on Mon 12 Dec 2016 08:52:49 PM CST (today), finding 0 errors.
Duration: unavailable (no parity-check entries logged)

Logfile...
Dec 12 05:31:16 Tower2 kernel: mdcmd (482): check NOCORRECT
Dec 12 05:31:16 Tower2 kernel: md: recovery thread woken up ...
Dec 12 05:31:16 Tower2 kernel: md: recovery thread checking parity...
Dec 12 05:31:16 Tower2 kernel: md: using 1536k window, over a total of 7814026532 blocks.
Dec 12 20:52:49 Tower2 kernel: md: sync done. time=55293sec
Dec 12 20:52:49 Tower2 kernel: md: recovery thread sync completion status: 0

No changes in the SMART report. I guess at this point I will just keep an eye on this drive. I already have another drive on order, as I was going to expand the array.
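Since the GUI reported the duration as unavailable for this run, the figures can be recovered from the log lines themselves. The following is a rough Python sketch, assuming the "blocks" in the `md:` line are 1 KiB each (which gives about 8 TB, matching the drive size) and that MB means 10^6 bytes; the helper names are hypothetical.

```python
def seconds_of_day(hms: str) -> int:
    """Convert an HH:MM:SS syslog time to seconds since midnight."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def format_duration(seconds: int) -> str:
    """Render seconds as hours, minutes and seconds."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours} hours, {minutes} minutes, {secs} seconds"


def average_speed_mb_s(total_blocks: int, seconds: int, block_bytes: int = 1024) -> float:
    """Average throughput in MB/s, assuming 1 KiB blocks and 1 MB = 10**6 bytes."""
    return total_blocks * block_bytes / seconds / 1e6


TOTAL_BLOCKS = 7814026532  # from "over a total of 7814026532 blocks"

# Second (non-correcting) check: both timestamps fall on Dec 12.
elapsed = seconds_of_day("20:52:49") - seconds_of_day("05:31:16")
print(elapsed)  # 55293, the same value as "sync done. time=55293sec"
print(format_duration(elapsed))
print(round(average_speed_mb_s(TOTAL_BLOCKS, elapsed), 1))

# First (correcting) check, using time=55898sec from the earlier log.
print(format_duration(55898))
print(round(average_speed_mb_s(TOTAL_BLOCKS, 55898), 1))  # 143.1, as the GUI reported
```

Under those assumptions the first check reproduces the reported 143.1 MB/s, and the second check comes out slightly faster, which fits a run that hit no read errors.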