dalmaar

Possible disk issue after parity check?


Hello, I'm looking for some advice on how to proceed after a parity check.

 

I have never had any kind of error on this unRAID box until now. I ran a
parity check (with write corrections), and while it took about as long as
previous parity checks, it finished with sync errors.  :(

 

Last checked on Sun 11 Dec 2016 10:10:08 PM CST (today), finding 988 errors.

Duration: 15 hours, 31 minutes, 38 seconds. Average speed: 143.1 MB/s

 

After examining the logfile, I can see what might be a more serious issue.
A bunch of "correcting parity" log entries is followed by a block of read errors
(see snippet below). The read errors then happened one more time, near the end of
the parity check.

 

----------------------------------------------------------------------------------

(Begin LOGFILE SNIPPET)

.

.

Dec 11 21:56:05 Tower2 kernel: md: correcting parity, sector=15482671128

Dec 11 21:56:05 Tower2 kernel: md: correcting parity, sector=15482671136

Dec 11 21:56:05 Tower2 kernel: md: correcting parity, sector=15482671144

Dec 11 21:56:05 Tower2 kernel: md: correcting parity, sector=15482671152

Dec 11 21:56:05 Tower2 kernel: md: correcting parity, sector=15482671160

Dec 11 21:56:05 Tower2 kernel: md: correcting parity, sector=15482671168

Dec 11 21:56:05 Tower2 kernel: md: correcting parity, sector=15482671176

Dec 11 21:56:05 Tower2 kernel: md: correcting parity, stopped logging

Dec 11 21:56:11 Tower2 kernel: ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0

Dec 11 21:56:11 Tower2 kernel: ata5.00: irq_stat 0x40000001

Dec 11 21:56:11 Tower2 kernel: ata5.00: failed command: READ DMA EXT

Dec 11 21:56:11 Tower2 kernel: ata5.00: cmd 25/00:90:10:b8:dc/00:03:9a:03:00/e0 tag 5 dma 466944 in

Dec 11 21:56:11 Tower2 kernel:        res 51/40:00:c8:b8:dc/00:00:9a:03:00/00 Emask 0x9 (media error)

Dec 11 21:56:11 Tower2 kernel: ata5.00: status: { DRDY ERR }

Dec 11 21:56:11 Tower2 kernel: ata5.00: error: { UNC }

Dec 11 21:56:11 Tower2 kernel: ata5.00: configured for UDMA/133

Dec 11 21:56:11 Tower2 kernel: sd 5:0:0:0: [sdf] tag#5 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08

Dec 11 21:56:11 Tower2 kernel: sd 5:0:0:0: [sdf] tag#5 Sense Key : 0x3 [current] [descriptor]

Dec 11 21:56:11 Tower2 kernel: sd 5:0:0:0: [sdf] tag#5 ASC=0x11 ASCQ=0x4

Dec 11 21:56:11 Tower2 kernel: sd 5:0:0:0: [sdf] tag#5 CDB: opcode=0x88 88 00 00 00 00 03 9a dc b8 10 00 00 03 90 00 00

Dec 11 21:56:11 Tower2 kernel: blk_update_request: I/O error, dev sdf, sector 15483058376

Dec 11 21:56:11 Tower2 kernel: ata5: EH complete

Dec 11 21:56:11 Tower2 kernel: md: disk4 read error, sector=15483058312

Dec 11 21:56:11 Tower2 kernel: md: disk4 read error, sector=15483058320

Dec 11 21:56:11 Tower2 kernel: md: disk4 read error, sector=15483058328

Dec 11 21:56:11 Tower2 kernel: md: disk4 read error, sector=15483058336

.

.

Dec 11 21:56:11 Tower2 kernel: md: disk4 read error, sector=15483059032

Dec 11 21:56:14 Tower2 kernel: ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0

Dec 11 21:56:14 Tower2 kernel: ata5.00: irq_stat 0x40000001

Dec 11 21:56:14 Tower2 kernel: ata5.00: failed command: READ DMA EXT

Dec 11 21:56:14 Tower2 kernel: ata5.00: cmd 25/00:40:a0:bb:dc/00:05:9a:03:00/e0 tag 8 dma 688128 in

Dec 11 21:56:14 Tower2 kernel:        res 51/40:00:a0:bb:dc/00:00:9a:03:00/00 Emask 0x9 (media error)

Dec 11 21:56:14 Tower2 kernel: ata5.00: status: { DRDY ERR }

Dec 11 21:56:14 Tower2 kernel: ata5.00: error: { UNC }

Dec 11 21:56:14 Tower2 kernel: ata5.00: configured for UDMA/133

Dec 11 21:56:14 Tower2 kernel: sd 5:0:0:0: [sdf] tag#8 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08

Dec 11 21:56:14 Tower2 kernel: sd 5:0:0:0: [sdf] tag#8 Sense Key : 0x3 [current] [descriptor]

Dec 11 21:56:14 Tower2 kernel: sd 5:0:0:0: [sdf] tag#8 ASC=0x11 ASCQ=0x4

Dec 11 21:56:14 Tower2 kernel: sd 5:0:0:0: [sdf] tag#8 CDB: opcode=0x88 88 00 00 00 00 03 9a dc bb a0 00 00 05 40 00 00

Dec 11 21:56:14 Tower2 kernel: blk_update_request: I/O error, dev sdf, sector 15483059104

Dec 11 21:56:14 Tower2 kernel: md: disk4 read error, sector=15483059040

.

.

Dec 11 22:10:08 Tower2 kernel: md: sync done. time=55898sec

Dec 11 22:10:08 Tower2 kernel: md: recovery thread sync completion status: 0

 

(End LOGFILE SNIPPET)

-------------------------------------------------------------------------------------
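
For anyone trying to line up the numbers in the snippet above, here is a minimal Python sketch that decodes the two CDB lines. It assumes the opcode 0x88 commands are standard SCSI READ(16) CDBs (8-byte big-endian LBA in bytes 2-9, 4-byte transfer length in bytes 10-13) and that the drive reports 512-byte logical sectors. The names `Read16`, `decode_read16_cdb` and `SECTOR_BYTES` are made up for illustration.

```python
from typing import NamedTuple

SECTOR_BYTES = 512  # assumption: 512-byte logical sectors


class Read16(NamedTuple):
    start_lba: int      # first sector of the request (whole-device numbering)
    sector_count: int   # number of sectors requested


def decode_read16_cdb(cdb_hex: str) -> Read16:
    """Decode a READ(16) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("expected a 16-byte READ(16) CDB (opcode 0x88)")
    start_lba = int.from_bytes(cdb[2:10], "big")
    sector_count = int.from_bytes(cdb[10:14], "big")
    return Read16(start_lba, sector_count)


# The two CDB lines from the log snippet.
first = decode_read16_cdb("88 00 00 00 00 03 9a dc b8 10 00 00 03 90 00 00")
second = decode_read16_cdb("88 00 00 00 00 03 9a dc bb a0 00 00 05 40 00 00")

for req in (first, second):
    print(req.start_lba, req.sector_count, req.sector_count * SECTOR_BYTES)

# Position of the first reported bad sector inside the first request.
print(15483058376 - first.start_lba)

# Difference between the device sector (blk_update_request) and the md sector.
print(15483058376 - 15483058312)
```

Under those assumptions the first request starts at sector 15483058192 and covers 912 sectors (466944 bytes, matching "dma 466944 in"), and the second starts at 15483059104 and covers 1344 sectors (688128 bytes). The sector in the blk_update_request line is 64 higher than the first md "disk4 read error" sector in both cases, which would be consistent with md counting from the start of the data partition rather than the start of the device, although the log itself does not confirm that.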

 

The SMART report on drive 4 shows 0 reallocated and pending sectors.

 

5 Reallocated sector count 0x0033 100 100 010 Pre-fail Always Never 0

197 Current pending sector 0x0012 100 100 000 Old age Always Never 0

 

However, there is one SMART value I have never noticed before, marked in yellow as a warning.
I have not kept records, so I am not sure whether it was already there before the parity check.

 

187 Reported uncorrect 0x0032 098 098 000 Old age Always Never 2
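
Since there are no earlier records to compare against, one option is to save the attribute rows after each check and flag the ones worth watching. Below is a rough Python sketch that parses rows in the layout pasted above (ID, name, flag, value, worst, threshold, type, updated, failed, raw value). The watch list of attributes 5, 187 and 197 is only the set discussed in this post, and `SmartRow`, `parse_row` and `nonzero_watched` are hypothetical names, not part of any SMART tool.

```python
import re
from typing import List, NamedTuple, Optional


class SmartRow(NamedTuple):
    attr_id: int
    name: str
    raw_value: int


# Matches rows as pasted in this post, e.g.
# "187 Reported uncorrect 0x0032 098 098 000 Old age Always Never 2"
ROW_RE = re.compile(
    r"^\s*(\d+)\s+(.+?)\s+0x[0-9a-fA-F]{4}"   # id, name, flag
    r"\s+\d{3}\s+\d{3}\s+\d{3}"               # value, worst, threshold
    r"\s+(?:Pre-fail|Old age)\s+\S+\s+\S+"    # type, updated, failed
    r"\s+(\d+)\s*$"                           # raw value
)

WATCHED = {5, 187, 197}  # assumption: the attributes discussed in this post


def parse_row(line: str) -> Optional[SmartRow]:
    """Return the parsed row, or None if the line is not an attribute row."""
    m = ROW_RE.match(line)
    if m is None:
        return None
    return SmartRow(int(m.group(1)), m.group(2), int(m.group(3)))


def nonzero_watched(lines: List[str]) -> List[SmartRow]:
    """Return watched attributes whose raw value is greater than zero."""
    rows = (parse_row(line) for line in lines)
    return [r for r in rows if r and r.attr_id in WATCHED and r.raw_value > 0]


report = [
    "5 Reallocated sector count 0x0033 100 100 010 Pre-fail Always Never 0",
    "197 Current pending sector 0x0012 100 100 000 Old age Always Never 0",
    "187 Reported uncorrect 0x0032 098 098 000 Old age Always Never 2",
]
for row in nonzero_watched(report):
    print(row.attr_id, row.name, row.raw_value)
```

Run against the three rows from this post, only attribute 187 is flagged. Keeping the output of each run would make it possible to tell later whether the raw value is growing.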

 

I usually leave the array alone during a parity check, but this time I accidentally
started a copy to this unRAID box (I thought I was writing to my other one) and aborted
it once I realized my mistake. However, the copy was going to disk2, not disk4
where the errors occurred.

 

My question is how to proceed. I can try another parity check, but if this looks
more serious I can start looking into a replacement. I have never seen any drive
on this array red-ball.

 

Or does this look like a one-off glitch? I have the server behind a
UPS and did not detect any power issues. Did I "cause" this with
my copy to the array during the parity check?

 

System Specs:

unRAID system: unRAID server Pro, version 6.1.9 (not upgraded to current version yet  :-[ )

Motherboard: Gigabyte Technology Co., Ltd. - F2A88XM-D3H

Processor:         AMD A10-7850K Radeon R7, 12 Compute Cores 4C+8G @ 3.7 GHz

HVM:                 Enabled

IOMMU:         Enabled

Memory:       16 GB (max. installable capacity 16 GB)

 

Array:

4 data  - 8TB Seagate archive (Shingle) drives

1 parity - 8TB Seagate archive (Shingle) drive

 

Not running anything too spectacular, just a basic NAS.

 

Thanks, any help/advice is appreciated

Dalmaar


Since it was a correcting check, there is not much you can do now unless you have checksums for all your data.

 

I would give that disk a second chance, but not a third: one more read error like that and I would replace it.


Thanks for the reply,

 

No, I do not have file checksums in place.

 

Should I now do another parity check without correction?

 

I will definitely watch this drive  >:(  I always have a tail running on the logfile.
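
As an alternative to watching the tail by eye, here is a small Python sketch that counts md read errors per disk. It only assumes the "md: diskN read error, sector=..." line format seen in the snippet earlier in this thread. `count_read_errors` is a made-up helper name, and reading from /var/log/syslog is an assumption about where the log lives on the server.

```python
import re
from collections import Counter
from typing import Iterable

# Line format taken from the log snippet in this thread.
READ_ERROR_RE = re.compile(r"md: (disk\d+) read error, sector=(\d+)")


def count_read_errors(lines: Iterable[str]) -> Counter:
    """Count md read-error lines per disk name (e.g. 'disk4')."""
    counts: Counter = Counter()
    for line in lines:
        m = READ_ERROR_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


sample = [
    "Dec 11 21:56:05 Tower2 kernel: md: correcting parity, sector=15482671176",
    "Dec 11 21:56:11 Tower2 kernel: md: disk4 read error, sector=15483058312",
    "Dec 11 21:56:11 Tower2 kernel: md: disk4 read error, sector=15483058320",
    "Dec 11 21:56:11 Tower2 kernel: ata5: EH complete",
]
print(dict(count_read_errors(sample)))
```

To scan a real log, the same function could be fed an open file, for example `count_read_errors(open("/var/log/syslog"))`, adjusting the path as needed.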

 

Dalmaar

 

 

 


Well, the second parity check finished (without corrections) and went well...

 

Last checked on Mon 12 Dec 2016 08:52:49 PM CST (today), finding 0 errors.

Duration: unavailable (no parity-check entries logged)

 

Logfile...

Dec 12 05:31:16 Tower2 kernel: mdcmd (482): check NOCORRECT

Dec 12 05:31:16 Tower2 kernel: md: recovery thread woken up ...

Dec 12 05:31:16 Tower2 kernel: md: recovery thread checking parity...

Dec 12 05:31:16 Tower2 kernel: md: using 1536k window, over a total of 7814026532 blocks.

Dec 12 20:52:49 Tower2 kernel: md: sync done. time=55293sec

Dec 12 20:52:49 Tower2 kernel: md: recovery thread sync completion status: 0
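
The GUI shows the duration as unavailable for this run, but it can be worked out from the log lines above. The Python sketch below assumes the "blocks" in the md line are 1 KiB each (which gives roughly 8 TB, matching the drive size) and that the GUI's MB/s means 10^6 bytes per second. `format_duration` and `average_speed_mb_s` are illustrative names only.

```python
BLOCK_BYTES = 1024           # assumption: md reports 1 KiB blocks
TOTAL_BLOCKS = 7814026532    # from "over a total of 7814026532 blocks"


def format_duration(seconds: int) -> str:
    """Format seconds the way the GUI prints durations."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours} hours, {minutes} minutes, {secs} seconds"


def average_speed_mb_s(total_blocks: int, seconds: int) -> float:
    """Average speed in MB/s (10**6 bytes per second)."""
    return total_blocks * BLOCK_BYTES / seconds / 1e6


# First (correcting) check: time=55898sec. Second check: time=55293sec.
for label, secs in (("first check", 55898), ("second check", 55293)):
    speed = average_speed_mb_s(TOTAL_BLOCKS, secs)
    print(f"{label}: {format_duration(secs)}, {speed:.1f} MB/s")
```

With these assumptions the first check comes out at 15 hours, 31 minutes, 38 seconds and 143.1 MB/s, which matches what the GUI reported, and the second check works out to 15 hours, 21 minutes, 33 seconds at about 144.7 MB/s.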

 

There are no changes in the SMART report, so I guess at this point I will just keep an eye
on this drive. I already have another drive on order, as I was planning to expand the
array.

 

 
