Shadey1 Posted July 25, 2018 Hey all, I bit the bullet last month and upgraded, switching out my old mobo, CPU, PSU and RAM. I updated all the software, and for the last two weeks or so I keep seeing the parity disk fail. I can't work out why, and it doesn't help that when I unmount it and re-add it, it takes 2 days to rebuild parity. It died this morning at 06:34. When I checked the server tonight it was working fine, but a Kodi box couldn't stream, so I wanted to check the server wasn't down, and that's when I found it. I've attached the full diagnostics, but pulled the extract below, starting just before the registered loss at 06:34, as it covers the core time; everything previous is from an hour before. It looks like preclear is throwing some errors, but surely that can't be killing the parity? I even switched SATA ports on the mobo last time and that didn't save it. 2 drives are on an expansion card, and the parity was one of these until I switched it around. Part of me wonders if it's the SATA port card. It came from the old machine and worked fine there, but the errors imply it failed, and I don't know much about them! Thanks in advance!
Quote
Jul 25 06:31:03 Wintermute kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jul 25 06:31:03 Wintermute kernel: ata2.00: failed command: FLUSH CACHE EXT
Jul 25 06:31:03 Wintermute kernel: ata2.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 26
Jul 25 06:31:03 Wintermute kernel: res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 25 06:31:03 Wintermute kernel: ata2.00: status: { DRDY }
Jul 25 06:31:03 Wintermute kernel: ata2: hard resetting link
Jul 25 06:31:13 Wintermute kernel: ata2: softreset failed (device not ready)
Jul 25 06:31:13 Wintermute kernel: ata2: hard resetting link
Jul 25 06:31:23 Wintermute kernel: ata2: softreset failed (device not ready)
Jul 25 06:31:23 Wintermute kernel: ata2: hard resetting link
Jul 25 06:31:33 Wintermute kernel: ata2: link is slow to respond, please be patient (ready=0)
Jul 25 06:31:58 Wintermute kernel: ata2: softreset failed (device not ready)
Jul 25 06:31:58 Wintermute kernel: ata2: limiting SATA link speed to 3.0 Gbps
Jul 25 06:31:58 Wintermute kernel: ata2: hard resetting link
Jul 25 06:31:58 Wintermute kernel: ata2: SATA link down (SStatus 0 SControl 320)
Jul 25 06:31:58 Wintermute kernel: ata2: hard resetting link
Jul 25 06:32:08 Wintermute kernel: ata2: softreset failed (device not ready)
Jul 25 06:32:08 Wintermute kernel: ata2: hard resetting link
Jul 25 06:32:18 Wintermute kernel: ata2: softreset failed (device not ready)
Jul 25 06:32:18 Wintermute kernel: ata2: hard resetting link
Jul 25 06:32:29 Wintermute kernel: ata2: link is slow to respond, please be patient (ready=0)
Jul 25 06:32:53 Wintermute kernel: ata2: softreset failed (device not ready)
Jul 25 06:32:53 Wintermute kernel: ata2: limiting SATA link speed to 1.5 Gbps
Jul 25 06:32:53 Wintermute kernel: ata2: hard resetting link
Jul 25 06:32:58 Wintermute kernel: ata2: softreset failed (device not ready)
Jul 25 06:32:58 Wintermute kernel: ata2: reset failed, giving up
Jul 25 06:32:58 Wintermute kernel: ata2.00: disabled
Jul 25 06:32:58 Wintermute kernel: ata2: exception Emask 0x10 SAct 0x0 SErr 0x5050000 action 0xf t4
Jul 25 06:32:58 Wintermute kernel: ata2: irq_stat 0x00400040, connection status changed
Jul 25 06:32:58 Wintermute kernel: ata2: SError: { PHYRdyChg CommWake TrStaTrns DevExch }
Jul 25 06:32:58 Wintermute kernel: ata2: hard resetting link
Jul 25 06:33:04 Wintermute kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jul 25 06:33:04 Wintermute kernel: ata2.00: ATA-9: WDC WD60EFRX-68MYMN1, WD-WX51D6427226, 82.00A82, max UDMA/133
Jul 25 06:33:04 Wintermute kernel: ata2.00: 11721045168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
Jul 25 06:33:04 Wintermute kernel: ata2.00: configured for UDMA/133
Jul 25 06:33:04 Wintermute kernel: sd 2:0:0:0: [sdc] tag#26 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Jul 25 06:33:04 Wintermute kernel: sd 2:0:0:0: [sdc] tag#26 Sense Key : 0x2 [current]
Jul 25 06:33:04 Wintermute kernel: sd 2:0:0:0: [sdc] tag#26 ASC=0x4 ASCQ=0x21
Jul 25 06:33:04 Wintermute kernel: sd 2:0:0:0: [sdc] tag#26 CDB: opcode=0x35 35 00 00 00 00 00 00 00 00 00
Jul 25 06:33:04 Wintermute kernel: print_req_error: I/O error, dev sdc, sector 0
Jul 25 06:33:04 Wintermute kernel: sd 2:0:0:0: rejecting I/O to offline device
Jul 25 06:33:04 Wintermute kernel: print_req_error: I/O error, dev sdc, sector 0
Jul 25 06:33:04 Wintermute kernel: ata2: EH complete
Jul 25 06:33:04 Wintermute kernel: ata2.00: detaching (SCSI 2:0:0:0)
Jul 25 06:33:04 Wintermute kernel: md: disk0 read error, sector=7928
Jul 25 06:33:04 Wintermute kernel: sd 2:0:0:0: [sdc] Synchronizing SCSI cache
Jul 25 06:33:04 Wintermute kernel: sd 2:0:0:0: [sdc] Stopping disk
Jul 25 06:33:04 Wintermute rc.diskinfo[8221]: SIGHUP received, forcing refresh of disks info.
Jul 25 06:33:04 Wintermute rc.diskinfo[8221]: SIGHUP received, forcing refresh of disks info.
Jul 25 06:33:04 Wintermute kernel: md: disk0 write error, sector=1449670864
Jul 25 06:33:04 Wintermute kernel: md: disk0 write error, sector=1449674832
Jul 25 06:33:04 Wintermute kernel: scsi 2:0:0:0: Direct-Access ATA WDC WD60EFRX-68M 0A82 PQ: 0 ANSI: 5
Jul 25 06:33:04 Wintermute kernel: sd 2:0:0:0: [sdj] 11721045168 512-byte logical blocks: (6.00 TB/5.46 TiB)
Jul 25 06:33:04 Wintermute kernel: sd 2:0:0:0: [sdj] 4096-byte physical blocks
Jul 25 06:33:04 Wintermute kernel: sd 2:0:0:0: [sdj] Write Protect is off
Jul 25 06:33:04 Wintermute kernel: sd 2:0:0:0: [sdj] Mode Sense: 00 3a 00 00
Jul 25 06:33:04 Wintermute kernel: sd 2:0:0:0: Attached scsi generic sg2 type 0
Jul 25 06:33:04 Wintermute kernel: sd 2:0:0:0: [sdj] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jul 25 06:33:05 Wintermute rc.diskinfo[8221]: PHP Warning: Use of undefined constant ID_MODEL - assumed 'ID_MODEL' (this will throw an Error in a future version of PHP) in /etc/rc.d/rc.diskinfo on line 470
Jul 25 06:33:05 Wintermute rc.diskinfo[8221]: PHP Warning: Use of undefined constant SERIAL_SHORT - assumed 'SERIAL_SHORT' (this will throw an Error in a future version of PHP) in /etc/rc.d/rc.diskinfo on line 470
Jul 25 06:33:13 Wintermute kernel: sdj: sdj1
Jul 25 06:33:13 Wintermute kernel: sd 2:0:0:0: [sdj] Attached SCSI disk
Jul 25 06:33:14 Wintermute kernel: md: disk0 write error, sector=7928
Jul 25 06:33:19 Wintermute rc.diskinfo[8221]: SIGHUP received, forcing refresh of disks info.
Jul 25 06:33:19 Wintermute rc.diskinfo[8221]: PHP Warning: Use of undefined constant ID_MODEL - assumed 'ID_MODEL' (this will throw an Error in a future version of PHP) in /etc/rc.d/rc.diskinfo on line 470
Jul 25 06:33:19 Wintermute rc.diskinfo[8221]: PHP Warning: Use of undefined constant SERIAL_SHORT - assumed 'SERIAL_SHORT' (this will throw an Error in a future version of PHP) in /etc/rc.d/rc.diskinfo on line 470
Jul 25 06:33:19 Wintermute unassigned.devices: Disk with serial 'WDC_WD60EFRX-68MYMN1_WD-WX51D6427226', mountpoint 'WDC_WD60EFRX-68MYMN1_WD-WX51D6427226' is not set to auto mount and will not be mounted...
Jul 25 06:33:21 Wintermute rc.diskinfo[8221]: PHP Warning: Use of undefined constant byte11h - assumed 'byte11h' (this will throw an Error in a future version of PHP) in /etc/rc.d/rc.diskinfo on line 662
Jul 25 06:33:21 Wintermute rc.diskinfo[8221]: PHP Warning: Use of undefined constant byte10h - assumed 'byte10h' (this will throw an Error in a future version of PHP) in /etc/rc.d/rc.diskinfo on line 662
Jul 25 06:33:21 Wintermute rc.diskinfo[8221]: PHP Warning: Use of undefined constant byte9h - assumed 'byte9h' (this will throw an Error in a future version of PHP) in /etc/rc.d/rc.diskinfo on line 662
Jul 25 06:33:21 Wintermute rc.diskinfo[8221]: PHP Warning: Use of undefined constant byte8h - assumed 'byte8h' (this will throw an Error in a future version of PHP) in /etc/rc.d/rc.diskinfo on line 662
Jul 25 06:33:21 Wintermute rc.diskinfo[8221]: PHP Warning: Use of undefined constant byte15h - assumed 'byte15h' (this will throw an Error in a future version of PHP) in /etc/rc.d/rc.diskinfo on line 663
Jul 25 06:33:21 Wintermute rc.diskinfo[8221]: PHP Warning: Use of undefined constant byte14h - assumed 'byte14h' (this will throw an Error in a future version of PHP) in /etc/rc.d/rc.diskinfo on line 663
Jul 25 06:33:21 Wintermute rc.diskinfo[8221]: PHP Warning: Use of undefined constant byte13h - assumed 'byte13h' (this will throw an Error in a future version of PHP) in /etc/rc.d/rc.diskinfo on line 663
Jul 25 06:33:21 Wintermute rc.diskinfo[8221]: PHP Warning: Use of undefined constant byte12h - assumed 'byte12h' (this will throw an Error in a future version of PHP) in /etc/rc.d/rc.diskinfo on line 663
Jul 25 06:33:35 Wintermute kernel: md: disk0 write error, sector=4002149576
wintermute-diagnostics-20180725-2230.zip
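For anyone reading a log like the one above, a quick tally of the link-level messages per port can make the pattern easier to see than scrolling. This is only a rough sketch: the function name `summarize` and the pattern labels are made up for illustration, and the regexes match nothing beyond the message texts that appear in the quoted extract.

```python
import re
from collections import Counter

# Patterns copied from message texts in the quoted log; the labels are arbitrary.
PATTERNS = {
    "hard_reset": re.compile(r"(ata\d+): hard resetting link"),
    "softreset_failed": re.compile(r"(ata\d+): softreset failed"),
    "link_down": re.compile(r"(ata\d+): SATA link down"),
    "speed_limited": re.compile(r"(ata\d+): limiting SATA link speed"),
    "md_error": re.compile(r"md: (disk\d+) (?:read|write) error"),
}

def summarize(log_text):
    """Count link-level and md errors per device in raw syslog text."""
    counts = Counter()
    for label, pattern in PATTERNS.items():
        # findall returns the captured device name for every match,
        # so this works even if the log has lost its line breaks.
        for device in pattern.findall(log_text):
            counts[(device, label)] += 1
    return counts

# A few lines taken from the extract above, used as sample input.
sample = (
    "Jul 25 06:31:03 Wintermute kernel: ata2: hard resetting link\n"
    "Jul 25 06:31:13 Wintermute kernel: ata2: softreset failed (device not ready)\n"
    "Jul 25 06:31:13 Wintermute kernel: ata2: hard resetting link\n"
    "Jul 25 06:33:04 Wintermute kernel: md: disk0 read error, sector=7928\n"
)

result = summarize(sample)
for (device, label), n in sorted(result.items()):
    print(device, label, n)
```

Repeated resets and failed soft resets on a single port, as seen here on ata2, narrow the problem to that one link (cable, port, power, or the drive itself) rather than to the array as a whole.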
JorgeB Posted July 26, 2018 Looks more like a cable/connection problem, but since the disk dropped offline there's no SMART report. Check the connections and post new diags.
Shadey1 Posted July 26, 2018 12 hours ago, johnnie.black said: Looks more like a cable/connection problem, but since the disk dropped offline there's no SMART report, check connections and post new diags. I'll reseat the connections and redo parity again, then run SMART and post back. 2+ days, I suspect.
Shadey1 Posted August 4, 2018 It failed once mid-parity after just reseating the connections, so I took out the SATA data cable, replaced it, and used a different power cable off another multi-cable. I'll be back in nearly 2 days with some more results, but here is the SMART report taken 10 minutes into a roughly 2-day parity check. wintermute-smart-20180804-1154.zip
JorgeB Posted August 4, 2018 SMART looks fine. There's a single UDMA CRC error, which points to a connection/cable problem, and since you've replaced them you should hopefully be fine now.
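To keep an eye on that counter, something like the sketch below pulls the raw UDMA CRC error count (SMART attribute 199) out of the attribute table printed by `smartctl -A /dev/sdX`. The helper name `crc_error_count` is made up, and the sample table is illustrative only; it is not taken from the attached SMART report. The raw value is cumulative over the drive's life, so what matters is whether it keeps climbing after the cables were swapped.

```python
def crc_error_count(smartctl_output):
    """Return the raw UDMA_CRC_Error_Count (attribute 199), or None if absent."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with the numeric ID; RAW_VALUE is the tenth column.
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[9])
    return None

# Illustrative sample in the layout of `smartctl -A` output (values are assumptions).
sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       1
"""

count = crc_error_count(sample)
print(count)  # prints 1
```

Noting the value before and after a parity check gives a simple before/after comparison: an unchanged count suggests the new cable is holding up.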
Shadey1 Posted August 4, 2018 5 hours ago, johnnie.black said: SMART looks fine, there's a single UDMA CRC error that points to a connection/cable problem, so and since you replaced them you hopefully will be fine now. I hope so! Thanks for your help, I'll let you know how it pans out!
Shadey1 Posted August 6, 2018 Looking good so far! Now to figure out the stuttering that's started on Kodi on the Windows and RPi3 boxes in the past week or so! Can use VLC fine... EDIT: The Wi-Fi switch needed restarting... lol
Shadey1 Posted August 8, 2018 So my parity dropped again last night. All I have is this, which I pulled from the quick log function in unRAID, as it was only after I hit reboot that I noticed the issue!
unRAID Parity disk error: 07-08-2018 22:46 Alert [WINTERMUTE] - Parity disk in error state (disk dsbl) WDC_WD60EFRX-68MYMN1_WD-WX51D6427226 (sdc)
Aug 7 23:03:24 Wintermute kernel: md: disk0 write error, sector=4577544704
Aug 7 23:03:24 Wintermute kernel: md: disk0 write error, sector=4577544712
Aug 7 23:03:24 Wintermute kernel: md: disk0 write error, sector=4577544720
Aug 7 23:03:24 Wintermute kernel: md: disk0 write error, sector=4577544728
Aug 7 23:03:24 Wintermute kernel: md: disk0 write error, sector=4577544736
Aug 7 23:03:24 Wintermute kernel: md: disk0 write error, sector=4577544744
Aug 7 23:03:24 Wintermute kernel: md: disk0 write error, sector=4577544752
Aug 7 23:03:24 Wintermute kernel: md: disk0 write error, sector=4577544760
Aug 7 23:03:24 Wintermute kernel: md: disk0 write error, sector=4577544768
Aug 7 23:03:24 Wintermute kernel: md: disk0 write error, sector=4577544776
Aug 7 23:03:24 Wintermute kernel: md: disk0 write error, sector=4577544784
Guess I'll re-seat the parity again and see when it DCs, but I'm running out of ideas!
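A small aside on the sector numbers in that log: a quick check of the spacing between them, sketched below, shows they step by a constant amount. The 512-byte sector size is an assumption, based on the "512-byte logical blocks" line the kernel printed for this drive in the first log.

```python
SECTOR_BYTES = 512  # assumption: md sector numbers are in 512-byte units

# Sector numbers copied from the write errors quoted above.
sectors = [
    4577544704, 4577544712, 4577544720, 4577544728,
    4577544736, 4577544744, 4577544752, 4577544760,
    4577544768, 4577544776, 4577544784,
]

# Distinct gaps between consecutive failing sectors.
strides = {later - earlier for earlier, later in zip(sectors, sectors[1:])}
block_bytes = next(iter(strides)) * SECTOR_BYTES if len(strides) == 1 else None

print(strides, block_bytes)
```

Under that assumption the errors are back-to-back 4096-byte writes all failing in the same second. That is a reading of the numbers rather than a diagnosis, but it fits a disk that has stopped accepting writes altogether better than it fits scattered bad sectors.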
JorgeB Posted August 9, 2018 Post a new SMART report for that disk, but if the same disk continues to give problems after replacing the cables it might be failing, despite the healthy SMART. SMART can help predict issues, but not always.
Shadey1 Posted August 9, 2018 It failed after about 30 mins. I'll try again, and if that fails I'll order another. Left it doing a parity rebuild while we were away and it's been fine for over a day. Must've been the drive dying!