deano_southafrican Posted August 3, 2023

If anyone better at this kinda stuff could tell me what they think, I'd be very appreciative.

After a reboot or even a safe powerdown, the system immediately starts a parity check on startup, sometimes finding sync errors and sometimes not. I don't want to keep running parity checks, so I had a look at the logs, and it seems my disk 2 has XFS errors with corrupted metadata...

```
Aug 3 23:07:13 Tower kernel: ata2.00: failed command: READ FPDMA QUEUED
Aug 3 23:07:13 Tower kernel: ata2.00: cmd 60/40:f8:c0:75:1e/05:00:00:00:00/40 tag 31 ncq dma 688128 in
Aug 3 23:07:13 Tower kernel: res 40/00:f0:00:71:1e/00:00:00:00:00/40 Emask 0x50 (ATA bus error)
Aug 3 23:07:13 Tower kernel: ata2.00: status: { DRDY }
Aug 3 23:07:13 Tower kernel: ata2: hard resetting link
Aug 3 23:07:17 Tower kernel: mdcmd (37): nocheck cancel
Aug 3 23:07:19 Tower kernel: ata2: found unknown device (class 0)
Aug 3 23:07:20 Tower kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Aug 3 23:07:20 Tower kernel: ata2.00: configured for UDMA/100
Aug 3 23:07:20 Tower kernel: ata2: EH complete
Aug 3 23:07:22 Tower kernel: md: recovery thread: exit status: -4
Aug 3 23:07:34 Tower kernel: XFS (md2p1): Metadata corruption detected at xfs_dir3_data_reada_verify+0x53/0x64 [xfs], xfs_dir3_data_reada block 0x3d02fa30
Aug 3 23:07:34 Tower kernel: XFS (md2p1): Unmount and run xfs_repair
Aug 3 23:07:34 Tower kernel: XFS (md2p1): First 128 bytes of corrupted metadata buffer:
Aug 3 23:07:34 Tower kernel: 00000000: 22 aa 78 d4 c1 33 56 90 3d dd 4d 64 24 52 56 2e ".x..3V.=.Md$RV.
Aug 3 23:07:34 Tower kernel: 00000010: 5c 4d af 56 e8 16 83 e2 c2 a2 7b 8d 6d 48 45 99 \M.V......{.mHE.
Aug 3 23:07:34 Tower kernel: 00000020: 6f ba fa 58 d9 54 aa 75 6c af d4 c7 1e 1c 6e 8d o..X.T.ul.....n.
Aug 3 23:07:34 Tower kernel: 00000030: 42 a7 62 2a 3c ee 4a 31 d4 ab 58 a8 5d 81 ea a3 B.b*<.J1..X.]...
Aug 3 23:07:34 Tower kernel: 00000040: 9a be b5 30 d4 47 bf 4f 16 cd a8 3b e5 93 02 94 ...0.G.O...;....
Aug 3 23:07:34 Tower kernel: 00000050: f9 6f 47 83 de 9f 9d 95 0f f5 65 f5 1f 07 13 6b .oG.......e....k
Aug 3 23:07:34 Tower kernel: 00000060: 05 71 fc 6a 93 fc f2 61 b5 c3 78 c2 36 18 0c e2 .q.j...a..x.6...
Aug 3 23:07:34 Tower kernel: 00000070: e2 27 e2 c7 28 a4 58 13 91 e7 da 5e 61 7a fb 29 .'..(.X....^az.)
Aug 3 23:07:34 Tower kernel: XFS (md2p1): Metadata CRC error detected at xfs_dir3_block_read_verify+0x7c/0xf1 [xfs], xfs_dir3_block block 0x3d02fa30
Aug 3 23:07:34 Tower kernel: XFS (md2p1): Unmount and run xfs_repair
Aug 3 23:07:34 Tower kernel: XFS (md2p1): First 128 bytes of corrupted metadata buffer:
Aug 3 23:07:34 Tower kernel: 00000000: 22 aa 78 d4 c1 33 56 90 3d dd 4d 64 24 52 56 2e ".x..3V.=.Md$RV.
Aug 3 23:07:34 Tower kernel: 00000010: 5c 4d af 56 e8 16 83 e2 c2 a2 7b 8d 6d 48 45 99 \M.V......{.mHE.
Aug 3 23:07:34 Tower kernel: 00000020: 6f ba fa 58 d9 54 aa 75 6c af d4 c7 1e 1c 6e 8d o..X.T.ul.....n.
Aug 3 23:07:34 Tower kernel: 00000030: 42 a7 62 2a 3c ee 4a 31 d4 ab 58 a8 5d 81 ea a3 B.b*<.J1..X.]...
Aug 3 23:07:34 Tower kernel: 00000040: 9a be b5 30 d4 47 bf 4f 16 cd a8 3b e5 93 02 94 ...0.G.O...;....
Aug 3 23:07:34 Tower kernel: 00000050: f9 6f 47 83 de 9f 9d 95 0f f5 65 f5 1f 07 13 6b .oG.......e....k
Aug 3 23:07:34 Tower kernel: 00000060: 05 71 fc 6a 93 fc f2 61 b5 c3 78 c2 36 18 0c e2 .q.j...a..x.6...
Aug 3 23:07:34 Tower kernel: 00000070: e2 27 e2 c7 28 a4 58 13 91 e7 da 5e 61 7a fb 29 .'..(.X....^az.)
Aug 3 23:07:34 Tower kernel: XFS (md2p1): metadata I/O error in "xfs_da_read_buf+0x9a/0xff [xfs]" at daddr 0x3d02fa30 len 8 error 74
```

It says to unmount the disk and run xfs_repair. Is there anything I should know before doing this? I have my data backed up so I'm not stressed, but I'd like to not cause any further damage or complications.

Also, what would have caused this? The system is on a UPS and inverter for backup power...
Could it have been the Docker issues with 6.12.0? (I upgraded when it was released, but now I'm on 6.12.3.) I don't think it's a faulty cable, as I replaced the ones for my main array fairly recently, although it's still possible, I guess...

Thanks in advance!
deano_southafrican Posted August 3, 2023

Here are the diags, just in case the above isn't enough.

tower-diagnostics-20230803-2351.zip
JorgeB Posted August 4, 2023 (Solution)

You're getting an unclean shutdown. Start by adding 60 seconds to the default timeout (Settings -> Disk Settings); if that doesn't help, time how long it takes for the array to stop.
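
For anyone triaging a similar syslog, here is a minimal Python sketch that counts the XFS metadata errors and ATA link events per device in an excerpt like the one in the first post. The function name `summarize`, the two regular expressions, and the `SAMPLE` list are illustrative assumptions written against the exact wording of that excerpt; they are not part of Unraid or any official tool, and other kernel versions may word these messages differently.

```python
import re
from collections import Counter

# Assumption: patterns mirror the message wording seen in this thread's log.
XFS_RE = re.compile(
    r"XFS \((?P<dev>[^)]+)\): "
    r"(?:Metadata (?:corruption|CRC error) detected|metadata I/O error)"
)
ATA_RE = re.compile(
    r"(?P<port>ata\d+)(?:\.\d+)?: (?:failed command: |hard resetting link)"
)


def summarize(lines):
    """Return (xfs_errors, ata_events) as Counters keyed by device or port."""
    xfs_errors = Counter()
    ata_events = Counter()
    for line in lines:
        match = XFS_RE.search(line)
        if match:
            xfs_errors[match.group("dev")] += 1
            continue
        match = ATA_RE.search(line)
        if match:
            ata_events[match.group("port")] += 1
    return xfs_errors, ata_events


# A few lines copied from the log in the first post.
SAMPLE = [
    "Aug 3 23:07:13 Tower kernel: ata2.00: failed command: READ FPDMA QUEUED",
    "Aug 3 23:07:13 Tower kernel: ata2.00: status: { DRDY }",
    "Aug 3 23:07:13 Tower kernel: ata2: hard resetting link",
    "Aug 3 23:07:34 Tower kernel: XFS (md2p1): Metadata corruption detected at "
    "xfs_dir3_data_reada_verify+0x53/0x64 [xfs], xfs_dir3_data_reada block 0x3d02fa30",
    "Aug 3 23:07:34 Tower kernel: XFS (md2p1): Unmount and run xfs_repair",
    "Aug 3 23:07:34 Tower kernel: XFS (md2p1): Metadata CRC error detected at "
    "xfs_dir3_block_read_verify+0x7c/0xf1 [xfs], xfs_dir3_block block 0x3d02fa30",
    'Aug 3 23:07:34 Tower kernel: XFS (md2p1): metadata I/O error in '
    '"xfs_da_read_buf+0x9a/0xff [xfs]" at daddr 0x3d02fa30 len 8 error 74',
]

if __name__ == "__main__":
    xfs_errors, ata_events = summarize(SAMPLE)
    print("XFS errors by device:", dict(xfs_errors))
    print("ATA events by port:", dict(ata_events))
```

Seeing the ATA bus errors and the XFS errors side by side makes it easier to judge whether filesystem corruption lines up with link resets on a particular port; the counts alone do not establish a cause.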