crazycam425 Posted May 9, 2018 Can someone please help me understand what these errors mean and how to fix them? As far as I can tell I am not losing any data, but I would like to resolve this as soon as possible. These errors are happening constantly right now. Thank you!
JorgeB Posted May 10, 2018 Please post your diagnostics: Tools -> Diagnostics
crazycam425 Posted May 10, 2018 Here it is, sorry, I forgot to post that. unraid-diagnostics-20180509-2350.zip
JorgeB Posted May 10, 2018 Looks like you posted the diags from a different server.
crazycam425 Posted May 10, 2018 Dang, I don't know why I can't think right now. I do have two servers and I grabbed the wrong one. Here is the correct diagnostics file. Thanks! lakesidebpo-diagnostics-20180510-0724.zip Also, I don't know if I was getting these errors before I added the second SSD to the pool. I added the second SSD just a couple of days ago and that's when I noticed these errors.
JorgeB Posted May 10, 2018 Unfortunately the syslog doesn't show the start of the problem, but it looks like sde (cache2) dropped offline. You'll need to power down and check the cables. If the device comes back online, run a scrub on the pool and make sure there are no uncorrectable errors.
crazycam425 Posted May 10, 2018 I swapped out the SATA cable and this is what is happening now: not as many errors, but still a couple, plus a warning. I haven't done a scrub yet. What should I try now? lakesidebpo-diagnostics-20180510-1531.zip Just finished a scrub and it said no errors.
JorgeB Posted May 11, 2018 Still getting ATA errors:
May 10 15:30:37 LakesideBPO kernel: ata7.00: exception Emask 0x10 SAct 0x7 SErr 0x200100 action 0x6 frozen
May 10 15:30:37 LakesideBPO kernel: ata7.00: irq_stat 0x08000000, interface fatal error
May 10 15:30:37 LakesideBPO kernel: ata7: SError: { UnrecovData BadCRC }
May 10 15:30:37 LakesideBPO kernel: ata7.00: failed command: READ FPDMA QUEUED
May 10 15:30:37 LakesideBPO kernel: ata7.00: cmd 60/68:00:40:c0:e7/01:00:0e:00:00/40 tag 0 ncq dma 184320 in
May 10 15:30:37 LakesideBPO kernel: res 40/00:04:40:c0:e7/00:00:0e:00:00/40 Emask 0x10 (ATA bus error)
May 10 15:30:37 LakesideBPO kernel: ata7.00: status: { DRDY }
May 10 15:30:37 LakesideBPO kernel: ata7.00: failed command: READ FPDMA QUEUED
May 10 15:30:37 LakesideBPO kernel: ata7.00: cmd 60/10:08:b0:c1:e7/00:00:0e:00:00/40 tag 1 ncq dma 8192 in
May 10 15:30:37 LakesideBPO kernel: res 40/00:04:40:c0:e7/00:00:0e:00:00/40 Emask 0x10 (ATA bus error)
May 10 15:30:37 LakesideBPO kernel: ata7.00: status: { DRDY }
May 10 15:30:37 LakesideBPO kernel: ata7.00: failed command: READ FPDMA QUEUED
May 10 15:30:37 LakesideBPO kernel: ata7.00: cmd 60/78:10:c8:c1:e7/00:00:0e:00:00/40 tag 2 ncq dma 61440 in
May 10 15:30:37 LakesideBPO kernel: res 40/00:04:40:c0:e7/00:00:0e:00:00/40 Emask 0x10 (ATA bus error)
May 10 15:30:37 LakesideBPO kernel: ata7.00: status: { DRDY }
May 10 15:30:37 LakesideBPO kernel: ata7: hard resetting link
May 10 15:30:37 LakesideBPO kernel: ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
May 10 15:30:37 LakesideBPO kernel: ata7.00: configured for UDMA/133
May 10 15:30:37 LakesideBPO kernel: ata7: EH complete
CRC errors usually result from a bad SATA cable. The cache pool is a mess and full of read and write errors; once the cable issues are fixed, it is best to back up, reformat, and restore the data.
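For anyone who wants to check their own syslog for the same pattern, here is a minimal Python sketch that tallies the `SError` flags per ATA port. It is not an Unraid tool: the function name `count_serror_flags` and the regular expression are my own assumptions, written only against the line format quoted above, so other kernel versions may format these lines differently.

```python
import re
from collections import Counter

# Matches lines such as:
#   "... kernel: ata7: SError: { UnrecovData BadCRC }"
# The pattern is an assumption based on the log excerpt quoted in this thread.
SERROR_RE = re.compile(r"kernel: (ata\d+): SError: \{ ([^}]*)\}")


def count_serror_flags(lines):
    """Return a Counter keyed by (port, flag), e.g. ("ata7", "BadCRC")."""
    counts = Counter()
    for line in lines:
        match = SERROR_RE.search(line)
        if match is None:
            continue
        port = match.group(1)
        # The flags are whitespace-separated inside the braces.
        for flag in match.group(2).split():
            counts[(port, flag)] += 1
    return counts


# Three lines copied from the excerpt above.
sample = [
    "May 10 15:30:37 LakesideBPO kernel: ata7.00: irq_stat 0x08000000, interface fatal error",
    "May 10 15:30:37 LakesideBPO kernel: ata7: SError: { UnrecovData BadCRC }",
    "May 10 15:30:37 LakesideBPO kernel: ata7: hard resetting link",
]

for (port, flag), n in sorted(count_serror_flags(sample).items()):
    print(port, flag, n)
```

To run it against a real log, pass an open file object instead of `sample`. A port that keeps accumulating `BadCRC` entries is the one whose cable or connector deserves a second look.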
crazycam425 Posted May 11, 2018 So I swapped out the cables and now I am getting a bunch of UDMA CRC Error Count errors for 3 of my 5 drives, but I'm not getting many of the other errors. I'm not sure why, but my Unraid machine has started freezing after about 10 minutes of operation. I have security cameras running through a docker on this machine and I need it to run flawlessly. Can you please take a look at this latest diagnostics file and let me know what I need to do? I really appreciate all the help. Thank you! lakesidebpo-diagnostics-20180511-1645.zip
trurl Posted May 12, 2018 1 hour ago, crazycam425 said: "So I swapped out the cables and now I am getting a bunch of UDMA CRC Error Count errors for 3 of my 5 drives" You probably disturbed the connections on other disks when swapping cables.
JorgeB Posted May 12, 2018 6 hours ago, crazycam425 said: "So I swapped out the cables and now I am getting a bunch of UDMA CRC Error Count errors for 3 of my 5 drives" 23 hours ago, johnnie.black said: "CRC errors usually result from a bad SATA cable"
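One thing worth knowing when reading these numbers: the UDMA CRC error count is a cumulative SMART attribute (ID 199), so the raw value does not go back to zero after a cable is replaced. What matters is whether it keeps rising. The sketch below compares two saved attribute tables in the layout printed by `smartctl -A`. The helper names and the sample values are made up for illustration, and it assumes the raw value is the tenth whitespace-separated field on the attribute line.

```python
def crc_raw_value(smart_text):
    """Return the raw UDMA_CRC_Error_Count from attribute-table text, or None."""
    for line in smart_text.splitlines():
        fields = line.split()
        # Assumed layout: ID, name, flag, value, worst, thresh, type,
        # updated, when_failed, raw_value (ten fields in total).
        if len(fields) >= 10 and fields[1] == "UDMA_CRC_Error_Count":
            return int(fields[9])
    return None


def crc_increase(before_text, after_text):
    """Return how much the raw count grew between two snapshots, or None."""
    before = crc_raw_value(before_text)
    after = crc_raw_value(after_text)
    if before is None or after is None:
        return None
    return after - before


# Hypothetical snapshots; the numbers are invented, not from the diagnostics.
snapshot_before = (
    "199 UDMA_CRC_Error_Count    0x003e   200   200   000    "
    "Old_age   Always       -       7"
)
snapshot_after = (
    "199 UDMA_CRC_Error_Count    0x003e   200   200   000    "
    "Old_age   Always       -       12"
)

growth = crc_increase(snapshot_before, snapshot_after)
print("CRC errors added since the first snapshot:", growth)
```

A growth of zero over a few days of normal use suggests the connection is now sound, even though the attribute still shows the old total. A value that keeps climbing points to a cable or connector that is still marginal.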
crazycam425 Posted May 12, 2018 Would that be causing my machine to keep freezing? I'll try swapping out all the cables and post what happens.
JorgeB Posted May 12, 2018 Enough communication errors can cause freezing, or the appearance of freezing. Either way, it has to be fixed before you can check other issues.
pwm Posted May 12, 2018 3 hours ago, crazycam425 said: "Would that be causing my machine to keep freezing? I'll try swapping out all the cables and post what happens." Transfer errors mean the OS has to reissue the transfer again and again, and the program threads that are waiting for the data to be read or written have to stall in the meantime. If it's an important thread, the outcome is that the machine, or an important service, will look frozen or very sluggish.
crazycam425 Posted May 12, 2018 Tonight I'm going to get the machine from the office, swap out all the SATA cables, and see if that fixes everything. It is definitely freezing, though. My dockers, shares, and everything else stop working, and it never comes back up until I hard reset the machine. Is there anything else I should try besides the SATA cables?
crazycam425 Posted May 13, 2018 I swapped out the cables, then balanced and scrubbed the cache, and this is the diagnostics file. I am still getting one error but can't figure out how to fix it. Any suggestions? tower-diagnostics-20180512-1703.zip
JorgeB Posted May 13, 2018 The cache pool is still only using one device. To fix it, it's easier to back up, reformat the pool, and restore; you can use this procedure to help with the backup/restore.
Archived
This topic is now archived and is closed to further replies.