Parity disk showing errors during parity check

bondoo0 · August 31, 2015

I'm on unRAID 6.0.1, and during my monthly parity check I started getting reallocated sector alerts and an increasing incorrect count (currently 12, with the check about 70% complete) on my parity drive. In the GUI I can see it showing about 400 errors for that drive. I assume that means it's time to replace my parity drive, but I thought I would confirm my diagnosis and see if there are any additional things I should do before replacing it. I do have a cold spare that has already been precleared and is ready to go in.

trurl · August 31, 2015

Tools - Diagnostics

bondoo0 · August 31, 2015

Not sure what diagnostics will tell you in this case (not trying to be difficult, just curious), since the errors going up and reallocated sectors messages seem to point to an issue with the hard drive? I don't think it's a bad cable since nothing has been moved/changed physically in the machine in months.

When I try to attach the zip, it is too large (I'm guessing because the syslog was originally 12 MB, so the zip file comes to 1192 KB). What should I remove to be allowed to upload?

bondoo0 · August 31, 2015

I looked through the logs: the larger one is syslog_1 (from before the log rolled), and the parity check is in the smaller, current log, so I removed syslog_1 from the diagnostics zip. This one shows the read errors, etc.

unraid-server1-diagnostics-20150831-0820.zip

RobJ · August 31, 2015

Just a comment, without having looked at anything: you always want the very first errors, not the later read errors and others. The later ones are just the aftereffects of the issue and can be ignored; they may not even be real. I would suggest zipping that first syslog and attaching it separately, as it contains the initial setup and the first and most important errors. If it still seems too big, you can chop it after the important part, as the later error repetition is never useful.

bondoo0 · August 31, 2015

The errors actually start in the smaller one that I was able to keep (and the previous log isn't the first one from startup since the machine has been up long enough for a couple of log rollovers). The smaller (current) log file starts on 8/30 @ 5:00 AM, and the first error happened at 8/31/15 01:40:55.

First error is this (about 1.5 hours into the parity check):

Aug 31 01:40:55 unraid-server1 kernel: ata12.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0

Aug 31 01:40:55 unraid-server1 kernel: ata12.00: BMDMA stat 0x4

Aug 31 01:40:55 unraid-server1 kernel: ata12.00: failed command: READ DMA EXT

Aug 31 01:40:55 unraid-server1 kernel: ata12.00: cmd 25/00:00:20:73:f6/00:04:43:00:00/e0 tag 29 dma 524288 in

Aug 31 01:40:55 unraid-server1 kernel: res 51/40:00:78:76:f6/40:00:43:00:00/00 Emask 0x9 (media error)

Aug 31 01:40:55 unraid-server1 kernel: ata12.00: status: { DRDY ERR }

Aug 31 01:40:55 unraid-server1 kernel: ata12.00: error: { UNC }

Aug 31 01:40:55 unraid-server1 kernel: ata12.00: configured for UDMA/133

Aug 31 01:40:55 unraid-server1 kernel: sd 7:0:0:0: [sdc] tag#29 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08

Aug 31 01:40:55 unraid-server1 kernel: sd 7:0:0:0: [sdc] tag#29 Sense Key : 0x3 [current] [descriptor]

Aug 31 01:40:55 unraid-server1 kernel: sd 7:0:0:0: [sdc] tag#29 ASC=0x11 ASCQ=0x4

Aug 31 01:40:55 unraid-server1 kernel: sd 7:0:0:0: [sdc] tag#29 CDB: opcode=0x88 88 00 00 00 00 00 43 f6 73 20 00 00 04 00 00 00

Aug 31 01:40:55 unraid-server1 kernel: blk_update_request: I/O error, dev sdc, sector 1140225656

Aug 31 01:40:55 unraid-server1 kernel: ata12: EH complete

Aug 31 01:40:55 unraid-server1 kernel: md: disk0 read error, sector=1140225592

...
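For anyone decoding these lines by hand, here is a minimal sketch of how the sector numbers in the excerpt relate to each other. It assumes the usual libata layout of the `cmd`/`res` lines (command or status, feature or error, sector count, three low LBA bytes, then the high-order bytes, then device); the function names are made up for illustration. The 64-sector gap between the `sdc` sector and the `md` sector is consistent with the data partition starting at sector 64, but the thread does not state the partition layout, so treat that as an inference.

```python
import re

# Lines copied from the syslog excerpt above.
CMD = "cmd 25/00:00:20:73:f6/00:04:43:00:00/e0 tag 29 dma 524288 in"
RES = "res 51/40:00:78:76:f6/40:00:43:00:00/00 Emask 0x9 (media error)"


def parse_taskfile(line):
    """Split a libata 'cmd' or 'res' line into its 12 register bytes."""
    fields = re.match(r"\s*(?:cmd|res)\s+(\S+)", line).group(1)
    return [int(b, 16) for b in re.split(r"[/:]", fields)]


def lba48(regs):
    """Assemble the 48-bit LBA from the low and high-order LBA bytes."""
    _, _, _, lbal, lbam, lbah, _, _, hob_lbal, hob_lbam, hob_lbah, _ = regs
    return (hob_lbah << 40 | hob_lbam << 32 | hob_lbal << 24
            | lbah << 16 | lbam << 8 | lbal)


def sector_count(regs):
    """Sector count of an LBA48 command (high byte first, then low byte).

    A count of 0 means 65536 sectors in LBA48; that case is not handled here.
    """
    return regs[7] << 8 | regs[2]


cmd = parse_taskfile(CMD)
res = parse_taskfile(RES)

start = lba48(cmd)          # first sector of the failed read request
count = sector_count(cmd)   # sectors requested (512 bytes each)
failed = lba48(res)         # sector the drive reported as unreadable

print("request start sector:", start)
print("request length:", count, "sectors =", count * 512, "bytes")
print("failing sector on sdc:", failed)
print("failing sector minus 64:", failed - 64)
```

The failing sector matches the `blk_update_request` line, the byte count matches `dma 524288`, and subtracting 64 gives the sector in the `md: disk0 read error` line.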

RobJ · August 31, 2015

You are correct, there is a series of blocks of true bad sectors, each block reallocated with all of its sectors, 14 so far, all on the parity drive. Nothing else appears wrong with the drive, but I would guess something mechanical has degraded. It needs to be Precleared, probably multiple times, until no further changes occur. Or attempt to RMA it.
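One way to judge whether "no further changes occur" between Preclear passes is to compare the raw SMART counters before and after each pass. The sketch below assumes the usual column layout of the attribute table printed by `smartctl -A`. The two sample tables are invented for illustration and are not taken from this thread's diagnostics, and the helper names are hypothetical.

```python
# Attributes that usually matter for a drive that is reallocating sectors.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")

# Invented sample output, NOT from the diagnostics in this thread.
SAMPLE_BEFORE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       112
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
"""
SAMPLE_AFTER = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       120
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""


def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} from an attribute table."""
    attrs = {}
    for line in text.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(parts) >= 10 and parts[0].isdigit():
            try:
                attrs[parts[1]] = int(parts[9])
            except ValueError:
                pass  # skip raw values that are not plain integers
    return attrs


def changed(before, after, names=WATCHED):
    """Return {name: (old, new)} for watched attributes whose raw value differs."""
    return {n: (before.get(n), after.get(n))
            for n in names if before.get(n) != after.get(n)}


diff = changed(parse_smart_attributes(SAMPLE_BEFORE),
               parse_smart_attributes(SAMPLE_AFTER))
print(diff or "no changes in watched attributes")
```

An empty result across consecutive passes would suggest the drive has stopped finding new bad sectors; counters that keep climbing would argue for the RMA.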

The go file has the following line. Should that u be in the size parameter?

mount -o remount,size=1u024M /var/log

The network config shows a huge number of dropped packets. I suspect the bond is not configured correctly, but I'm not an expert.

bondoo0 · August 31, 2015

> You are correct, there is a series of blocks of true bad sectors, each block reallocated with all of its sectors, 14 so far, all on the parity drive. Nothing else appears wrong with the drive, but I would guess something mechanical has degraded. It needs to be Precleared, probably multiple times, until no further changes occur. Or attempt to RMA it.

That's my thought as well. Guess it's time to swap the drive, run the preclears, and see where that goes.

> The go file has the following line. Should that u be in the size parameter?
>
> mount -o remount,size=1u024M /var/log

That line should have been removed. If you look later in the go file, I set the size to 512 MB instead of 1 GB, as a fix for the log filling up from dynamix system stats.
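To answer the "u" question directly, here is a small sketch that checks the `size=` value in a mount option string. It assumes the tmpfs `size` grammar is a decimal number with an optional k, m, or g suffix (either case) or a trailing `%`; the helper name is hypothetical, and `remount,size=512M` is only a guess at what the later line in the go file looks like, since the file itself is not quoted in the thread.

```python
import re

# Assumed grammar for a tmpfs size value: digits, then an optional
# k/m/g suffix (either case) or a percent sign.
SIZE_RE = re.compile(r"^\d+([kKmMgG]|%)?$")


def size_option_is_valid(mount_options):
    """Return True/False for the size= value, or None if there is no size option."""
    for opt in mount_options.split(","):
        key, _, value = opt.partition("=")
        if key == "size":
            return bool(SIZE_RE.match(value))
    return None


for options in ("remount,size=1u024M", "remount,size=512M"):
    print(options, "->", size_option_is_valid(options))
```

Under that assumption the stray `u` makes the value invalid, so that remount would most likely have been rejected and the later line would be the one that took effect.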

> The network config shows a huge number of dropped packets. I suspect the bond is not configured correctly, but I'm not an expert.

I'll have to take a look at that. I'm guessing it's because I had to swap a switch, so the drops are probably from that time, but it's something to keep an eye on for sure.
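A simple way to tell old drops from ongoing ones is to read the kernel's per-interface counters twice and compare. This is a minimal sketch that assumes the standard Linux `/sys/class/net/<iface>/statistics` files; the interface name `bond0` in the usage comment is an assumption, and the function names are made up for illustration.

```python
from pathlib import Path

COUNTERS = ("rx_dropped", "tx_dropped")


def read_drop_counters(iface, root="/sys/class/net"):
    """Read the dropped-packet counters for one interface from sysfs."""
    stats = Path(root) / iface / "statistics"
    return {name: int((stats / name).read_text()) for name in COUNTERS}


def delta(before, after):
    """Return how much each counter grew between two readings."""
    return {name: after[name] - before[name] for name in before}


# Usage on the server itself (assumes the bond interface is called bond0):
#
#   import time
#   first = read_drop_counters("bond0")
#   time.sleep(60)
#   print(delta(first, read_drop_counters("bond0")))
```

If the deltas stay at zero over a busy period, the large totals are probably left over from the switch swap. If they keep growing, the bond configuration deserves a closer look.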

Thanks for taking a look.
