December 9, 2017

I think they are related, as they both started sometime last month. The call traces were first reported on Nov 11 and Parity2 started showing errors on Nov 18. I've attached both unRAID logs and the SMART error report. Thanks in advance!

unraid-diagnostics-20171209-0201.zip
smarterrorlog.txt

Edit: Added diagnostics from October as well; a lot of entries were repeating themselves there too. I also noticed there is a 4-day gap in my logs for some reason, Nov 10-14. On the 14th the log doesn't begin with the startup sequence like it normally does.

unraid-diagnostics-20171110-1307.zip

Edited August 6, 2018 by Acps
December 9, 2017

1 hour ago, Acps said:
I think they are related as they both started last month some time.

Unrelated. There were read errors on parity2:

Nov 22 04:42:18 unRaid kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Nov 22 04:42:18 unRaid kernel: ata4.00: irq_stat 0x40000001
Nov 22 04:42:18 unRaid kernel: ata4.00: failed command: READ DMA EXT
Nov 22 04:42:18 unRaid kernel: ata4.00: cmd 25/00:40:80:35:2c/00:05:c1:01:00/e0 tag 26 dma 688128 in
Nov 22 04:42:18 unRaid kernel: res 53/40:00:f0:35:2c/00:00:c1:01:00/00 Emask 0x8 (media error)
Nov 22 04:42:18 unRaid kernel: ata4.00: status: { DRDY SENSE ERR }
Nov 22 04:42:18 unRaid kernel: ata4.00: error: { UNC }
Nov 22 04:42:18 unRaid kernel: ata4.00: NCQ Send/Recv Log not supported
Nov 22 04:42:18 unRaid kernel: ata4.00: NCQ Send/Recv Log not supported
Nov 22 04:42:18 unRaid kernel: ata4.00: configured for UDMA/133
Nov 22 04:42:18 unRaid kernel: sd 5:0:0:0: [sdc] tag#26 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Nov 22 04:42:18 unRaid kernel: sd 5:0:0:0: [sdc] tag#26 Sense Key : 0x3 [current]
Nov 22 04:42:18 unRaid kernel: sd 5:0:0:0: [sdc] tag#26 ASC=0x11 ASCQ=0x0
Nov 22 04:42:18 unRaid kernel: sd 5:0:0:0: [sdc] tag#26 CDB: opcode=0x88 88 00 00 00 00 01 c1 2c 35 80 00 00 05 40 00 00
Nov 22 04:42:18 unRaid kernel: blk_update_request: I/O error, dev sdc, sector 7535867264
Nov 22 04:42:18 unRaid kernel: md: disk29 read error, sector=7535867200
Nov 22 04:42:18 unRaid kernel: md: disk29 read error, sector=7535867208

And they correspond to this SMART attribute:

187 Reported_Uncorrect 0x0032 097 097 000 Old_age Always - 3

They are not pending/bad sectors, so the disk may be fine for now. You should at least run an extended SMART test; if it passes, keep an eye on the disk, and if there are any more issues, replace it.
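For anyone reading a log like the one above by hand, here is a small Python sketch (not something from the thread) that decodes the CDB bytes on the `opcode=0x88` line. It assumes the standard 16-byte READ(16) layout, where bytes 2-9 hold the starting LBA and bytes 10-13 hold the transfer length in blocks, and it assumes 512-byte logical sectors. The function name `decode_read16` is made up for illustration.

```python
def decode_read16(cdb_hex):
    """Decode a SCSI READ(16) CDB given as space-separated hex bytes.

    Returns (starting_lba, block_count). Assumes the standard READ(16)
    layout: opcode 0x88, LBA in bytes 2-9, transfer length in bytes 10-13,
    both big-endian.
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a 16-byte READ(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")
    blocks = int.from_bytes(cdb[10:14], "big")
    return lba, blocks


# CDB bytes copied from the kernel log line in this thread.
lba, blocks = decode_read16("88 00 00 00 00 01 c1 2c 35 80 00 00 05 40 00 00")

SECTOR_BYTES = 512  # assumption: 512-byte logical sectors
print(lba, blocks, blocks * SECTOR_BYTES)  # 7535867264 1344 688128
```

The decoded LBA matches the `sector 7535867264` in the `blk_update_request` line, and 1344 blocks of 512 bytes match the `dma 688128` in the `cmd` line. The `md` line reports sector 7535867200, which is 64 lower; that would be consistent with the array partition starting at sector 64 on the disk, though that is my reading and not something the log states.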
May 17, 2018 Author

Looks like I might be having the same issue again. Is it my Parity2 disk that is having issues, or my Disk1?

unraid-smart-20180517-1305.zip
unraid-diagnostics-20180517-1303.zip
May 17, 2018

Two separate issues. There was another reported UNC error on parity2, and since it's the second time I would recommend replacing it now. As for disk1, it dropped offline, most likely because of the SASLP it's on, since that's a known issue. You'll need to reboot and post new diags so we can see the SMART report.
June 17, 2018 Author

Sorry, it took me a bit to get around to posting these again. Should I install the preclear plugin and try running that again on disk1? I tried to rebuild it twice and it wasn't working, just like when this happened in December.

unraid-diagnostics-20180616-2308.zip
unraid-smart-20180616-2352-parity2.zip
unraid-smart-20180617-0001-disk1.zip
June 19, 2018 Author

I accidentally posted in another thread that I thought was related to what is going on here: https://lime-technology.com/forums/topic/62206-dockerappdata-permission-issue/?do=findComment&comment=664652

On a side note, I was able to pull the SMART report from disk 1:

unraid-smart-20180618-2304.zip

Edited June 19, 2018 by Acps
July 21, 2018 Author

So I got two 5 TB replacement drives. Last night I was going to replace Disk1 first and rebuild the array, then replace Parity2 and rebuild again. I'm not sure what happened after I installed Disk1, but when I powered it up, Parity1 caught fire and melted the SATA connector from the power supply. I'm not sure yet whether Parity1 is damaged beyond repair. The SATA power cable obviously is, since it melted, but I might be able to clean up the disk's port and contacts to bring it back to life.

My question is this: since I have 2 parity drives, I can lose 2 disks and still not have any data loss. Technically Disk1 is dead, Parity1 might be, and Parity2 has some errors. What is my best route to avoid losing any data, if that's still possible? Can I use my two new 5 TB drives for Disk1 and Parity1, and rebuild with my faulty Parity2?

Thanks again for the help!
~Acps

Edited July 22, 2018 by Acps
July 22, 2018

8 hours ago, Acps said:
My question is, since I have 2 parity drives, I can lose 2 disks and still not have any data loss.

Yes, but

8 hours ago, Acps said:
and Parity2 has some errors.

This would be a third failing disk. It depends on what you mean by some errors.
July 22, 2018 Author

It was a UNC error reported on Parity2. I acknowledged the error and have since been able to run several parity checks without any errors.
July 22, 2018

1 hour ago, Acps said:
It was a UNC error reported on Parity2. I acknowledged the error and have since been able to run several parity checks without any errors.

But do you still have a non-zero UNC counter for the drive? Or could unRAID rewrite that sector and clear the counter?
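One way to answer the counter question is to read the raw value of attribute 187 from the drive's SMART attribute table. Below is a minimal Python sketch, assuming the usual `smartctl -A` column layout (ID, name, flag, value, worst, threshold, type, updated, when-failed, raw value). The helper name `smart_raw_value` is hypothetical, and the sample text is the attribute line quoted earlier in this thread, not fresh output from the drive.

```python
def smart_raw_value(attribute_table, attr_id):
    """Return the raw value of one SMART attribute as an int, or None.

    Expects text in the usual `smartctl -A` table layout, where the raw
    value is the tenth whitespace-separated column. Only suitable for
    attributes whose raw value is a plain integer, such as 187.
    """
    for line in attribute_table.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0] == str(attr_id):
            return int(fields[9])
    return None


# Attribute line as quoted from the December 2017 SMART report.
sample = "187 Reported_Uncorrect 0x0032 097 097 000 Old_age Always - 3"

raw = smart_raw_value(sample, 187)
if raw is None:
    print("attribute 187 not found")
elif raw > 0:
    print(f"Reported_Uncorrect is non-zero: {raw}")
else:
    print("Reported_Uncorrect is zero")
```

Running the same check against a current report and comparing the number with the older one shows whether the count has grown since the first error.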
July 22, 2018 Author

This is what my array looks like if I want to try to rebuild. Here are my logs from this morning:

unraid-diagnostics-20180722-0943.zip
August 6, 2018 Author

Just to follow up: I was able to rebuild just fine after replacing Disk1 and Parity1. Parity1 had to be replaced because I somehow set it on fire while checking Disk1 and Parity2 inside my case. It caught fire right at the SATA power port as soon as I powered it back on.

I also moved my RAID controller from a PCIe x4 slot to my PCIe x16 slot, put all my array disks exclusively on the RAID controller, and moved my 2 SSD cache drives to the 6 Gb/s SATA ports on my Gigabyte motherboard. I'm hoping that will clear up my drives dropping offline from the controller.

Thanks again for all the help jonnie, I really appreciate it. I wanted to document as much info as I could in this thread while working on this, in case I need to come back to it later or have similar issues in the future, as I have a terrible memory!

~Acps