
Repeated Parity Drive Failure


Shadey1


Hey all,

 

I bit the bullet last month and upgraded, switching out my old mobo, CPU, PSU and RAM.

I updated all the software, and for the last two weeks or so the parity drive keeps failing. I can't work out why, and it doesn't help that every time I unmount it and re-add it, the parity rebuild takes 2 days.

 

It died last night/this morning at 06:34. The server itself seemed to be working fine, but tonight a Kodi box couldn't stream, so I checked that the server wasn't down and that's when I found the failure.

I've attached the full diagnostics. The extract below starts just before the registered loss at 06:34 and covers the core period; everything before it in the log is from about an hour earlier.

 

It looks like preclear is throwing some errors, but surely that can't be what's killing the parity?

I even switched SATA ports on the mobo last time and that didn't fix it.

2 drives are on an expansion card; the parity was one of these until I switched it around.

 

Part of me wonders if it's the SATA port card. It came from the old machine and worked fine there, but the errors seem to imply it failed, and I don't know much about them!

 

Thanks in advance!

 

Quote

Jul 25 06:31:03 Wintermute kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jul 25 06:31:03 Wintermute kernel: ata2.00: failed command: FLUSH CACHE EXT
Jul 25 06:31:03 Wintermute kernel: ata2.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 26
Jul 25 06:31:03 Wintermute kernel:         res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 25 06:31:03 Wintermute kernel: ata2.00: status: { DRDY }
Jul 25 06:31:03 Wintermute kernel: ata2: hard resetting link
Jul 25 06:31:13 Wintermute kernel: ata2: softreset failed (device not ready)
Jul 25 06:31:13 Wintermute kernel: ata2: hard resetting link
Jul 25 06:31:23 Wintermute kernel: ata2: softreset failed (device not ready)
Jul 25 06:31:23 Wintermute kernel: ata2: hard resetting link
Jul 25 06:31:33 Wintermute kernel: ata2: link is slow to respond, please be patient (ready=0)
Jul 25 06:31:58 Wintermute kernel: ata2: softreset failed (device not ready)
Jul 25 06:31:58 Wintermute kernel: ata2: limiting SATA link speed to 3.0 Gbps
Jul 25 06:31:58 Wintermute kernel: ata2: hard resetting link
Jul 25 06:31:58 Wintermute kernel: ata2: SATA link down (SStatus 0 SControl 320)
Jul 25 06:31:58 Wintermute kernel: ata2: hard resetting link
Jul 25 06:32:08 Wintermute kernel: ata2: softreset failed (device not ready)
Jul 25 06:32:08 Wintermute kernel: ata2: hard resetting link
Jul 25 06:32:18 Wintermute kernel: ata2: softreset failed (device not ready)
Jul 25 06:32:18 Wintermute kernel: ata2: hard resetting link
Jul 25 06:32:29 Wintermute kernel: ata2: link is slow to respond, please be patient (ready=0)
Jul 25 06:32:53 Wintermute kernel: ata2: softreset failed (device not ready)
Jul 25 06:32:53 Wintermute kernel: ata2: limiting SATA link speed to 1.5 Gbps
Jul 25 06:32:53 Wintermute kernel: ata2: hard resetting link
Jul 25 06:32:58 Wintermute kernel: ata2: softreset failed (device not ready)
Jul 25 06:32:58 Wintermute kernel: ata2: reset failed, giving up
Jul 25 06:32:58 Wintermute kernel: ata2.00: disabled
Jul 25 06:32:58 Wintermute kernel: ata2: exception Emask 0x10 SAct 0x0 SErr 0x5050000 action 0xf t4
Jul 25 06:32:58 Wintermute kernel: ata2: irq_stat 0x00400040, connection status changed
Jul 25 06:32:58 Wintermute kernel: ata2: SError: { PHYRdyChg CommWake TrStaTrns DevExch }
Jul 25 06:32:58 Wintermute kernel: ata2: hard resetting link
Jul 25 06:33:04 Wintermute kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jul 25 06:33:04 Wintermute kernel: ata2.00: ATA-9: WDC WD60EFRX-68MYMN1,      WD-WX51D6427226, 82.00A82, max UDMA/133
Jul 25 06:33:04 Wintermute kernel: ata2.00: 11721045168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
Jul 25 06:33:04 Wintermute kernel: ata2.00: configured for UDMA/133
Jul 25 06:33:04 Wintermute kernel: sd 2:0:0:0: [sdc] tag#26 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Jul 25 06:33:04 Wintermute kernel: sd 2:0:0:0: [sdc] tag#26 Sense Key : 0x2 [current] 
Jul 25 06:33:04 Wintermute kernel: sd 2:0:0:0: [sdc] tag#26 ASC=0x4 ASCQ=0x21 
Jul 25 06:33:04 Wintermute kernel: sd 2:0:0:0: [sdc] tag#26 CDB: opcode=0x35 35 00 00 00 00 00 00 00 00 00
Jul 25 06:33:04 Wintermute kernel: print_req_error: I/O error, dev sdc, sector 0
Jul 25 06:33:04 Wintermute kernel: sd 2:0:0:0: rejecting I/O to offline device
Jul 25 06:33:04 Wintermute kernel: print_req_error: I/O error, dev sdc, sector 0
Jul 25 06:33:04 Wintermute kernel: ata2: EH complete
Jul 25 06:33:04 Wintermute kernel: ata2.00: detaching (SCSI 2:0:0:0)
Jul 25 06:33:04 Wintermute kernel: md: disk0 read error, sector=7928
Jul 25 06:33:04 Wintermute kernel: sd 2:0:0:0: [sdc] Synchronizing SCSI cache
Jul 25 06:33:04 Wintermute kernel: sd 2:0:0:0: [sdc] Stopping disk
Jul 25 06:33:04 Wintermute rc.diskinfo[8221]: SIGHUP received, forcing refresh of disks info.
Jul 25 06:33:04 Wintermute rc.diskinfo[8221]: SIGHUP received, forcing refresh of disks info.
Jul 25 06:33:04 Wintermute kernel: md: disk0 write error, sector=1449670864
Jul 25 06:33:04 Wintermute kernel: md: disk0 write error, sector=1449674832
Jul 25 06:33:04 Wintermute kernel: scsi 2:0:0:0: Direct-Access     ATA      WDC WD60EFRX-68M 0A82 PQ: 0 ANSI: 5
Jul 25 06:33:04 Wintermute kernel: sd 2:0:0:0: [sdj] 11721045168 512-byte logical blocks: (6.00 TB/5.46 TiB)
Jul 25 06:33:04 Wintermute kernel: sd 2:0:0:0: [sdj] 4096-byte physical blocks
Jul 25 06:33:04 Wintermute kernel: sd 2:0:0:0: [sdj] Write Protect is off
Jul 25 06:33:04 Wintermute kernel: sd 2:0:0:0: [sdj] Mode Sense: 00 3a 00 00
Jul 25 06:33:04 Wintermute kernel: sd 2:0:0:0: Attached scsi generic sg2 type 0
Jul 25 06:33:04 Wintermute kernel: sd 2:0:0:0: [sdj] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jul 25 06:33:05 Wintermute rc.diskinfo[8221]: PHP Warning: Use of undefined constant ID_MODEL - assumed 'ID_MODEL' (this will throw an Error in a future version of PHP) in /etc/rc.d/rc.diskinfo on line 470
Jul 25 06:33:05 Wintermute rc.diskinfo[8221]: PHP Warning: Use of undefined constant SERIAL_SHORT - assumed 'SERIAL_SHORT' (this will throw an Error in a future version of PHP) in /etc/rc.d/rc.diskinfo on line 470
Jul 25 06:33:13 Wintermute kernel: sdj: sdj1
Jul 25 06:33:13 Wintermute kernel: sd 2:0:0:0: [sdj] Attached SCSI disk
Jul 25 06:33:14 Wintermute kernel: md: disk0 write error, sector=7928
Jul 25 06:33:19 Wintermute rc.diskinfo[8221]: SIGHUP received, forcing refresh of disks info.
Jul 25 06:33:19 Wintermute rc.diskinfo[8221]: PHP Warning: Use of undefined constant ID_MODEL - assumed 'ID_MODEL' (this will throw an Error in a future version of PHP) in /etc/rc.d/rc.diskinfo on line 470
Jul 25 06:33:19 Wintermute rc.diskinfo[8221]: PHP Warning: Use of undefined constant SERIAL_SHORT - assumed 'SERIAL_SHORT' (this will throw an Error in a future version of PHP) in /etc/rc.d/rc.diskinfo on line 470
Jul 25 06:33:19 Wintermute unassigned.devices: Disk with serial 'WDC_WD60EFRX-68MYMN1_WD-WX51D6427226', mountpoint 'WDC_WD60EFRX-68MYMN1_WD-WX51D6427226' is not set to auto mount and will not be mounted...
Jul 25 06:33:21 Wintermute rc.diskinfo[8221]: PHP Warning: Use of undefined constant byte11h - assumed 'byte11h' (this will throw an Error in a future version of PHP) in /etc/rc.d/rc.diskinfo on line 662
Jul 25 06:33:21 Wintermute rc.diskinfo[8221]: PHP Warning: Use of undefined constant byte10h - assumed 'byte10h' (this will throw an Error in a future version of PHP) in /etc/rc.d/rc.diskinfo on line 662
Jul 25 06:33:21 Wintermute rc.diskinfo[8221]: PHP Warning: Use of undefined constant byte9h - assumed 'byte9h' (this will throw an Error in a future version of PHP) in /etc/rc.d/rc.diskinfo on line 662
Jul 25 06:33:21 Wintermute rc.diskinfo[8221]: PHP Warning: Use of undefined constant byte8h - assumed 'byte8h' (this will throw an Error in a future version of PHP) in /etc/rc.d/rc.diskinfo on line 662
Jul 25 06:33:21 Wintermute rc.diskinfo[8221]: PHP Warning: Use of undefined constant byte15h - assumed 'byte15h' (this will throw an Error in a future version of PHP) in /etc/rc.d/rc.diskinfo on line 663
Jul 25 06:33:21 Wintermute rc.diskinfo[8221]: PHP Warning: Use of undefined constant byte14h - assumed 'byte14h' (this will throw an Error in a future version of PHP) in /etc/rc.d/rc.diskinfo on line 663
Jul 25 06:33:21 Wintermute rc.diskinfo[8221]: PHP Warning: Use of undefined constant byte13h - assumed 'byte13h' (this will throw an Error in a future version of PHP) in /etc/rc.d/rc.diskinfo on line 663
Jul 25 06:33:21 Wintermute rc.diskinfo[8221]: PHP Warning: Use of undefined constant byte12h - assumed 'byte12h' (this will throw an Error in a future version of PHP) in /etc/rc.d/rc.diskinfo on line 663
Jul 25 06:33:35 Wintermute kernel: md: disk0 write error, sector=4002149576
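For anyone reading along who wants to pull the timeline out of an excerpt like this rather than eyeball it, here is a rough Python sketch. It counts the link-reset messages per ATA port, counts the md read/write errors, and measures how long the port kept retrying before the disk was disabled. The function name `summarize`, the regexes and the choice of which messages to count are my own assumptions, not anything Unraid ships, and the sample only includes a few lines copied from the log above.

```python
import re
from collections import Counter
from datetime import datetime

# A handful of lines copied from the excerpt above. In practice you would
# read the syslog text out of the diagnostics zip instead.
SAMPLE = """\
Jul 25 06:31:03 Wintermute kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jul 25 06:31:03 Wintermute kernel: ata2.00: failed command: FLUSH CACHE EXT
Jul 25 06:31:03 Wintermute kernel: ata2: hard resetting link
Jul 25 06:31:13 Wintermute kernel: ata2: softreset failed (device not ready)
Jul 25 06:32:58 Wintermute kernel: ata2: reset failed, giving up
Jul 25 06:32:58 Wintermute kernel: ata2.00: disabled
Jul 25 06:33:04 Wintermute kernel: md: disk0 read error, sector=7928
Jul 25 06:33:04 Wintermute kernel: md: disk0 write error, sector=1449670864
"""

# Shape of a kernel line: "Jul 25 06:31:03 Wintermute kernel: <message>"
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+kernel:\s+(.*)$")
ATA_RE = re.compile(r"^(ata\d+)(?:\.\d+)?: (.*)$")
MD_RE = re.compile(r"^md: (disk\d+) (read|write) error, sector=\d+$")
RESET_MESSAGES = ("hard resetting link", "softreset failed", "reset failed, giving up")


def summarize(text, year=2018):
    """Hypothetical helper: count reset and md error messages, time the drop."""
    events, md_errors = Counter(), Counter()
    first_exception = disabled_at = None
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        # Syslog stamps carry no year, so one is supplied (2018, going by
        # the diagnostics filename).
        when = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        msg = m.group(2).strip()
        ata, md = ATA_RE.match(msg), MD_RE.match(msg)
        if ata:
            port, detail = ata.groups()
            if detail.startswith("exception") and first_exception is None:
                first_exception = when
            if detail == "disabled":
                disabled_at = when
            for key in RESET_MESSAGES:
                if detail.startswith(key):
                    events[(port, key)] += 1
        elif md:
            md_errors[(md.group(1), md.group(2))] += 1
    seconds = None
    if first_exception and disabled_at:
        seconds = int((disabled_at - first_exception).total_seconds())
    return {"events": events, "md_errors": md_errors, "seconds_to_disable": seconds}


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Run against the full syslog, a summary like this makes it easier to compare one drop with the next, for example whether it is always the same `ataN` port that resets after moving cables or ports around.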

 

wintermute-diagnostics-20180725-2230.zip

12 hours ago, johnnie.black said:

Looks more like a cable/connection problem, but since the disk dropped offline there's no SMART report, check connections and post new diags.

 

I'll reseat the connections, rebuild parity again, then run SMART and post back. 2+ days, I suspect.

  • 2 weeks later...
5 hours ago, johnnie.black said:

SMART looks fine. There's a single UDMA CRC error, which points to a connection/cable problem, and since you replaced them you should hopefully be fine now.
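A quick way to keep an eye on that counter is to read SMART attribute 199 (UDMA_CRC_Error_Count) before and after changing cables. The raw value is a lifetime total and does not go back down, so what matters is whether it keeps climbing. The sketch below is only a rough illustration: the helper names are made up, the sample line merely follows the layout `smartctl -A` prints (its values are illustrative, not taken from the attached report), and actually calling `smartctl` assumes smartmontools is installed and you have root.

```python
import subprocess

# Illustrative text in the layout `smartctl -A` prints. The values are
# made up for the example and are NOT copied from the poster's report.
SAMPLE_SMART = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       1
"""


def crc_error_count(smartctl_output):
    """Return the raw value of attribute 199, or None if it is not listed."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have ten columns; the raw value is the last one.
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[9])
    return None


def read_smart_attributes(device):
    """Run `smartctl -A` on a device such as /dev/sdc and return its output."""
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True
    )
    return result.stdout


def crc_count_grew(before, after):
    """True when the lifetime CRC counter increased between two readings."""
    return before is not None and after is not None and after > before


if __name__ == "__main__":
    print(crc_error_count(SAMPLE_SMART))
```

If the count stays the same after the cable swap, the old error is just history; if `crc_count_grew` turns true, the link is still taking hits somewhere between the drive and the controller.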

 

I hope so! Thanks for your help, I'll let you know how it pans out!


So my parity dropped again last night. I have nothing to go on other than this, which I pulled from the quick log function in Unraid, as I only noticed the issue after I'd hit reboot!

 

 

unRAID Parity disk error: 07-08-2018 22:46
Alert [WINTERMUTE] - Parity disk in error state (disk dsbl)
WDC_WD60EFRX-68MYMN1_WD-WX51D6427226 (sdc)
 
Aug 7 23:03:24 Wintermute kernel: md: disk0 write error, sector=4577544704
Aug 7 23:03:24 Wintermute kernel: md: disk0 write error, sector=4577544712
Aug 7 23:03:24 Wintermute kernel: md: disk0 write error, sector=4577544720
Aug 7 23:03:24 Wintermute kernel: md: disk0 write error, sector=4577544728
Aug 7 23:03:24 Wintermute kernel: md: disk0 write error, sector=4577544736
Aug 7 23:03:24 Wintermute kernel: md: disk0 write error, sector=4577544744
Aug 7 23:03:24 Wintermute kernel: md: disk0 write error, sector=4577544752
Aug 7 23:03:24 Wintermute kernel: md: disk0 write error, sector=4577544760
Aug 7 23:03:24 Wintermute kernel: md: disk0 write error, sector=4577544768
Aug 7 23:03:24 Wintermute kernel: md: disk0 write error, sector=4577544776
Aug 7 23:03:24 Wintermute kernel: md: disk0 write error, sector=4577544784
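As a sanity check on what those lines are saying, the sector numbers can be grouped into runs. In this batch they step by 8 sectors, which at the 512-byte logical block size the kernel reported for this drive is 4096 bytes per step, so it reads as one contiguous stretch of writes failing in the same second rather than scattered bad spots. A rough Python sketch (the helper name and the stride of 8 are assumptions read off this log, not an Unraid API):

```python
import re

# The eleven lines from the quick log above.
LOG = """\
Aug 7 23:03:24 Wintermute kernel: md: disk0 write error, sector=4577544704
Aug 7 23:03:24 Wintermute kernel: md: disk0 write error, sector=4577544712
Aug 7 23:03:24 Wintermute kernel: md: disk0 write error, sector=4577544720
Aug 7 23:03:24 Wintermute kernel: md: disk0 write error, sector=4577544728
Aug 7 23:03:24 Wintermute kernel: md: disk0 write error, sector=4577544736
Aug 7 23:03:24 Wintermute kernel: md: disk0 write error, sector=4577544744
Aug 7 23:03:24 Wintermute kernel: md: disk0 write error, sector=4577544752
Aug 7 23:03:24 Wintermute kernel: md: disk0 write error, sector=4577544760
Aug 7 23:03:24 Wintermute kernel: md: disk0 write error, sector=4577544768
Aug 7 23:03:24 Wintermute kernel: md: disk0 write error, sector=4577544776
Aug 7 23:03:24 Wintermute kernel: md: disk0 write error, sector=4577544784
"""

SECTOR_RE = re.compile(r"md: disk0 write error, sector=(\d+)")
TOTAL_SECTORS = 11721045168  # from the earlier kernel line for this drive
SECTOR_BYTES = 512           # "512-byte logical blocks" in the same log
STRIDE = 8                   # step between consecutive errors in this log


def write_error_runs(text, stride=STRIDE):
    """Group failing sectors into [first, last] runs of evenly spaced errors."""
    sectors = sorted(int(s) for s in SECTOR_RE.findall(text))
    runs = []
    for sector in sectors:
        if runs and sector - runs[-1][1] == stride:
            runs[-1][1] = sector  # extends the current run
        else:
            runs.append([sector, sector])  # starts a new run
    return sectors, runs


if __name__ == "__main__":
    sectors, runs = write_error_runs(LOG)
    print(len(sectors), "write errors in", len(runs), "run(s)")
    print("bytes per step:", STRIDE * SECTOR_BYTES)
    print("position on disk: %.1f%%" % (100 * sectors[0] / TOTAL_SECTORS))
```

A single run in the middle of the disk, all logged in the same second, fits the picture of the whole drive dropping off the bus mid-write better than it fits a few bad sectors.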
 
Guess I'll re-seat the parity again and see when it disconnects, but I'm running out of ideas!
 
 
 

