privateer Posted December 31, 2023

Dec 30 19:50:23 Tower kernel: ata4.00: exception Emask 0x10 SAct 0x80000000 SErr 0x4090000 action 0xe frozen
Dec 30 19:50:23 Tower kernel: ata4.00: irq_stat 0x00400040, connection status changed
Dec 30 19:50:23 Tower kernel: ata4: SError: { PHYRdyChg 10B8B DevExch }
Dec 30 19:50:23 Tower kernel: ata4.00: failed command: READ FPDMA QUEUED
Dec 30 19:50:23 Tower kernel: ata4.00: cmd 60/20:f8:a0:00:00/00:00:00:02:00/40 tag 31 ncq dma 16384 in
Dec 30 19:50:23 Tower kernel: res 40/00:f8:a0:00:00/00:00:00:02:00/40 Emask 0x10 (ATA bus error)
Dec 30 19:50:23 Tower kernel: ata4.00: status: { DRDY }
Dec 30 19:50:23 Tower kernel: ata4: hard resetting link
Dec 30 19:50:26 Tower kernel: ata1: link is slow to respond, please be patient (ready=0)
Dec 30 19:50:26 Tower kernel: ata2: link is slow to respond, please be patient (ready=0)
Dec 30 19:50:29 Tower kernel: ata4: link is slow to respond, please be patient (ready=0)
Dec 30 19:50:30 Tower kernel: ata1: COMRESET failed (errno=-16)
Dec 30 19:50:30 Tower kernel: ata2: COMRESET failed (errno=-16)
Dec 30 19:50:30 Tower kernel: ata2: hard resetting link
Dec 30 19:50:31 Tower kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
Dec 30 19:50:31 Tower kernel: ata1.00: configured for UDMA/133
Dec 30 19:50:33 Tower kernel: ata4: COMRESET failed (errno=-16)
Dec 30 19:50:33 Tower kernel: ata4: hard resetting link
Dec 30 19:50:33 Tower kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Dec 30 19:50:33 Tower kernel: ata2.00: configured for UDMA/133
Dec 30 19:50:33 Tower kernel: ata2: EH complete
Dec 30 19:50:38 Tower kernel: ata4: link is slow to respond, please be patient (ready=0)
Dec 30 19:50:43 Tower kernel: ata4: COMRESET failed (errno=-16)
Dec 30 19:50:43 Tower kernel: ata4: hard resetting link
Dec 30 19:50:46 Tower kernel: ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Dec 30 19:50:46 Tower kernel: ata4.00: configured for UDMA/133
Dec 30 19:50:46 Tower kernel: ata4: EH complete
Dec 30 19:50:47 Tower kernel: ata2.00: exception Emask 0x10 SAct 0x700027 SErr 0x4890000 action 0xe frozen
Dec 30 19:50:47 Tower kernel: ata2.00: irq_stat 0x0c400040, interface fatal error, connection status changed
Dec 30 19:50:47 Tower kernel: ata2: SError: { PHYRdyChg 10B8B LinkSeq DevExch }
Dec 30 19:50:47 Tower kernel: ata2.00: failed command: READ FPDMA QUEUED
Dec 30 19:50:47 Tower kernel: ata2.00: cmd 60/00:00:20:58:7d/04:00:85:01:00/40 tag 0 ncq dma 524288 in
Dec 30 19:50:47 Tower kernel: res 40/00:10:60:5d:7d/00:00:85:01:00/40 Emask 0x10 (ATA bus error)
Dec 30 19:50:47 Tower kernel: ata2.00: status: { DRDY }
Dec 30 19:50:47 Tower kernel: ata2.00: failed command: READ FPDMA QUEUED
Dec 30 19:50:47 Tower kernel: ata2.00: cmd 60/40:08:20:5c:7d/01:00:85:01:00/40 tag 1 ncq dma 163840 in
Dec 30 19:50:47 Tower kernel: res 40/00:10:60:5d:7d/00:00:85:01:00/40 Emask 0x10 (ATA bus error)
Dec 30 19:50:47 Tower kernel: ata2.00: status: { DRDY }
Dec 30 19:50:47 Tower kernel: ata2.00: failed command: READ FPDMA QUEUED
Dec 30 19:50:47 Tower kernel: ata2.00: cmd 60/d0:10:60:5d:7d/03:00:85:01:00/40 tag 2 ncq dma 499712 in
Dec 30 19:50:47 Tower kernel: res 40/00:10:60:5d:7d/00:00:85:01:00/40 Emask 0x10 (ATA bus error)
Dec 30 19:50:47 Tower kernel: ata2.00: status: { DRDY }
Dec 30 19:50:47 Tower kernel: ata2.00: failed command: READ FPDMA QUEUED
Dec 30 19:50:47 Tower kernel: ata2.00: cmd 60/00:28:30:61:7d/04:00:85:01:00/40 tag 5 ncq dma 524288 in
Dec 30 19:50:47 Tower kernel: res 40/00:10:60:5d:7d/00:00:85:01:00/40 Emask 0x10 (ATA bus error)

I have seen a bunch of errors related to ata but I'm not sure what exactly is triggering them. I removed the cable attached to what's labeled as SATA3_2 on my mobo but I'm still getting these issues. I rebooted while attempting to solve this but couldn't reboot from the GUI and had to use the button. Then Unraid couldn't unmount all the drives, so I forced an unclean shutdown. It came back up and Disk 17 fell out of the array with no prior warning. When I look at the attributes on Disk 17, it has a high raw read error rate and a high seek error rate, but I'm not sure whether that's caused by bad cables or by some hardware issue other than the disk. I think I may also be triggering it when running mover, but I can't tell. Any thoughts?

Attachment: tower-diagnostics-20231230-1952.zip
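For anyone trying to see which ports are involved in a log like the one above, here is a minimal Python sketch that tallies a few event types per ATA port. The function name `tally_ata_events` and the three event labels are made up for illustration, and the sample lines are copied from the excerpt in this post. It only counts lines and says nothing about the cause.

```python
import re
from collections import Counter

# Captures the port from kernel lines such as
# "... kernel: ata4.00: exception ..." or "... kernel: ata4: hard resetting link".
PORT_RE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: (.*)")


def tally_ata_events(lines):
    """Count exceptions, failed commands and COMRESET failures per ATA port.

    Hypothetical helper for illustration; the event labels are arbitrary.
    """
    counts = {}
    for line in lines:
        match = PORT_RE.search(line)
        if not match:
            continue  # e.g. the "res ..." continuation lines carry no port
        port, message = match.groups()
        per_port = counts.setdefault(port, Counter())
        if message.startswith("exception"):
            per_port["exception"] += 1
        elif message.startswith("failed command"):
            per_port["failed_command"] += 1
        elif message.startswith("COMRESET failed"):
            per_port["comreset_failed"] += 1
    return counts


# Sample lines copied from the syslog excerpt above.
sample = [
    "Dec 30 19:50:23 Tower kernel: ata4.00: exception Emask 0x10 SAct 0x80000000 SErr 0x4090000 action 0xe frozen",
    "Dec 30 19:50:23 Tower kernel: ata4.00: failed command: READ FPDMA QUEUED",
    "Dec 30 19:50:23 Tower kernel: res 40/00:f8:a0:00:00/00:00:00:02:00/40 Emask 0x10 (ATA bus error)",
    "Dec 30 19:50:30 Tower kernel: ata1: COMRESET failed (errno=-16)",
    "Dec 30 19:50:30 Tower kernel: ata2: COMRESET failed (errno=-16)",
    "Dec 30 19:50:33 Tower kernel: ata4: COMRESET failed (errno=-16)",
    "Dec 30 19:50:47 Tower kernel: ata2.00: exception Emask 0x10 SAct 0x700027 SErr 0x4890000 action 0xe frozen",
]

result = tally_ata_events(sample)
for port, per_port in sorted(result.items()):
    print(port, dict(per_port))
```

In the excerpt above, ata1, ata2 and ata4 all report COMRESET failures within a few seconds of each other, which is the kind of pattern a per-port tally makes easier to spot.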
trurl Posted December 31, 2023

Power problems? Are there any power splitters in your setup?
privateer Posted December 31, 2023 (edited)

26 minutes ago, trurl said: Power problems? Are there any power splitters in your setup?

I have SATA power splitters in my setup.
itimpi Posted December 31, 2023

5 hours ago, privateer said: I have SATA power splitters in my setup.

If using SATA->SATA splitters, make sure that you do not split any single SATA connector on a cable from the PSU more than 2 ways, as doing so can cause issues.
privateer Posted December 31, 2023

5 hours ago, itimpi said: If using SATA->SATA splitters, make sure that you do not attempt to split any SATA connector on a cable from the PSU more than 2 ways as doing so can cause issues.

That's not present in my setup.
privateer Posted December 31, 2023

Disk 17 shows "Unmountable: Unmounted or unsupportable file system." It was fairly full of data before this.
trurl Posted December 31, 2023

1 hour ago, privateer said: Disk 17 shows "Unmountable: Unmounted or unsupportable file system." It was fairly full of data before this.

Check the filesystem on disk17.
privateer Posted December 31, 2023

1 hour ago, trurl said: Check filesystem on disk17

Oof.

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being ignored because the -n option was used. Expect spurious inconsistencies which may be resolved by first mounting the filesystem to replay the log.
        - scan filesystem freespace and inode maps...
agf_freeblks 9058410, counted 9058399 in ag 2
agi_count 1984, counted 2048 in ag 2
agi_freecount 21, counted 13 in ag 2
agi_freecount 21, counted 13 in ag 2 finobt
sb_icount 34880, counted 35744
sb_ifree 506, counted 455
sb_fdblocks 603095616, counted 616701578
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
Metadata corruption detected at 0x438a03, xfs_inode block 0xa63aa00/0x4000
Metadata corruption detected at 0x438a03, xfs_inode block 0xa63aa20/0x4000
bad CRC for inode 174303744
bad magic number 0x16 on inode 174303744
bad version number 0xffffffaa on inode 174303744
bad next_unlinked 0xc0fc21c3 on inode 174303744
inode identifier 9179762597904482375 mismatch on inode 174303744
bad CRC for inode 174303745
bad magic number 0xccd3 on inode 174303745
bad version number 0xffffffba on inode 174303745
inode identifier 3371903399051595482 mismatch on inode 174303745
bad CRC for inode 174303746
bad magic number 0xdbe0 on inode 174303746
bad version number 0xffffffdf on inode 174303746
inode identifier 50872455103499597 mismatch on inode 174303746
bad CRC for inode 174303747
bad magic number 0xdd24 on inode 174303747
bad version number 0xffffffbd on inode 174303747
bad next_unlinked 0x9f522d11 on inode 174303747
inode identifier 9340838863122723239 mismatch on inode 174303747
bad CRC for inode 174303748
bad magic number 0x2043 on inode 174303748
bad version number 0xffffffa2 on inode 174303748
bad next_unlinked 0xf10ffca3 on inode 174303748
inode identifier 1184165229794778217 mismatch on inode 174303748
bad CRC for inode 174303749
bad magic number 0x66b7 on inode 174303749
bad version number 0x79 on inode 174303749
bad next_unlinked 0xb51219b6 on inode 174303749
inode identifier 14679859918268388760 mismatch on inode 174303749

Lots more of the bad CRC, bad magic, bad version, bad next_unlinked, and inode identifier lines.

Several of these:

imap claims a free inode 1155669479 is in use, would correct imap and clear inode

A few of these with various folder names:

entry "[FOLDER NAME]" at block 0 offset 152 in directory inode 6600634561 references free inode 1155669489
would clear inode number in entry at offset 152...

These as well:

entry "[FOLDER NAME]" in shortform directory 32911946759 references free inode 2600817779
would have junked entry "[FOLDER NAME]" in directory inode 32911946759

Many of both of these:

disconnected dir inode 4888060274, would move to lost+found

and

would have reset inode 6600634561 nlinks from 164 to 140
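When the dry-run output runs to thousands of lines like this, a rough summary can help before deciding how to proceed. The following is a minimal Python sketch, assuming the output has been saved as text. The function name `summarize_dry_run` is hypothetical, and the two patterns are taken only from the message wording quoted in this post, so other xfs_repair versions may word things differently.

```python
import re

# Patterns based solely on the dry-run lines quoted in this thread.
BAD_CRC_RE = re.compile(r"bad CRC for inode (\d+)")
LOST_FOUND_RE = re.compile(
    r"disconnected dir inode (\d+), would move to lost\+found"
)


def summarize_dry_run(text):
    """Return the distinct inode numbers flagged by the two patterns.

    Hypothetical helper: it reads saved output and changes nothing on disk.
    """
    bad_crc = sorted({int(n) for n in BAD_CRC_RE.findall(text)})
    orphans = sorted({int(n) for n in LOST_FOUND_RE.findall(text)})
    return {"bad_crc_inodes": bad_crc, "lost_found_dirs": orphans}


# A few lines copied from the output above.
sample = """\
bad CRC for inode 174303744
bad magic number 0x16 on inode 174303744
bad CRC for inode 174303745
bad CRC for inode 174303746
disconnected dir inode 4888060274, would move to lost+found
"""

summary = summarize_dry_run(sample)
print(len(summary["bad_crc_inodes"]), "inodes with bad CRC")
print(len(summary["lost_found_dirs"]), "directories headed for lost+found")
```

The counts only describe what the dry run reported. As the ALERT in the output says, some inconsistencies may be spurious while the log has not been replayed.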
trurl Posted December 31, 2023

Do it again without -n. If it asks for it, use -L. Post the results.
privateer Posted January 1

Fixed everything - thanks!
privateer Posted January 1

19 hours ago, trurl said: Do it again without -n, if it asks for it use -L. Post the results.

Sorry, I didn't save the results. I ran it and it looked clean after the repair. 30 GB ended up in lost+found.

However, a parity check started overnight and now I have tons of sync errors.
trurl Posted January 1

1 hour ago, privateer said: tons of sync errors

Did you do the filesystem check from the command line? Sounds like you may have gotten the command wrong and invalidated parity. Better to use the webUI; it will use the correct command.
trurl Posted January 1

If you ran the check on the sd device and not the md device, then that would invalidate parity. Checking the md device keeps parity in sync with changes.
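To illustrate why the device choice matters, here is a toy Python model, assuming single parity computed as a byte-wise XOR across the data disks. It is a sketch only and not Unraid's actual md driver. The class `ToyArray` and the methods `write_via_md` and `write_via_sd` are hypothetical names standing in for the parity-aware path and the raw-device path.

```python
from functools import reduce


def xor_parity(disks):
    """Byte-wise XOR across equally sized data 'disks'."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*disks))


class ToyArray:
    """Toy single-parity array; an assumption-laden model, not Unraid code."""

    def __init__(self, disks):
        self.disks = [bytearray(d) for d in disks]
        self.parity = bytearray(xor_parity(self.disks))

    def write_via_md(self, disk, offset, value):
        # Parity-aware path: fold the change (old XOR new) into parity.
        old = self.disks[disk][offset]
        self.disks[disk][offset] = value
        self.parity[offset] ^= old ^ value

    def write_via_sd(self, disk, offset, value):
        # Raw-device path: the data changes but parity is never told.
        self.disks[disk][offset] = value

    def sync_errors(self):
        # Count positions where stored parity no longer matches the data.
        expected = xor_parity(self.disks)
        return sum(1 for a, b in zip(expected, self.parity) if a != b)


array = ToyArray([b"\x01\x02", b"\x10\x20", b"\x0f\xf0"])

array.write_via_md(0, 0, 0xAA)
errors_after_md_write = array.sync_errors()

array.write_via_sd(1, 1, 0x55)
errors_after_sd_write = array.sync_errors()

print(errors_after_md_write, errors_after_sd_write)
```

In this model a repair that writes through the parity-aware path leaves the stored parity consistent, while the same write through the raw path shows up as a mismatch on the next check.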
privateer Posted January 1

2 hours ago, trurl said: If you did the check of the sd device and not the md device then that would invalidate parity. Checking md device keeps parity in sync with changes.

I did the md device per the instructions in the Unraid docs.

2 hours ago, trurl said: Did you do the filesystem check from the command line? Sounds like you may have gotten the command wrong and invalidated parity. Better to use the webUI it will use the correct command.

I used the UI, not the command line.