Disk Errors, throwing Parity check on boot but SMART status healthy


Go to solution Solved by JorgeB,


If anyone better at this kind of stuff could tell me what they think, I'd be very appreciative.

 

After a reboot, or even a safe powerdown, the system immediately starts a parity check when it comes back up, sometimes finding sync errors and sometimes not. I don't want to keep running parity checks, so I had a look at the logs, and it seems my disk 2 has XFS errors with corrupted metadata:

```

Aug  3 23:07:13 Tower kernel: ata2.00: failed command: READ FPDMA QUEUED
Aug  3 23:07:13 Tower kernel: ata2.00: cmd 60/40:f8:c0:75:1e/05:00:00:00:00/40 tag 31 ncq dma 688128 in
Aug  3 23:07:13 Tower kernel:         res 40/00:f0:00:71:1e/00:00:00:00:00/40 Emask 0x50 (ATA bus error)
Aug  3 23:07:13 Tower kernel: ata2.00: status: { DRDY }
Aug  3 23:07:13 Tower kernel: ata2: hard resetting link
Aug  3 23:07:17 Tower kernel: mdcmd (37): nocheck cancel
Aug  3 23:07:19 Tower kernel: ata2: found unknown device (class 0)
Aug  3 23:07:20 Tower kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Aug  3 23:07:20 Tower kernel: ata2.00: configured for UDMA/100
Aug  3 23:07:20 Tower kernel: ata2: EH complete
Aug  3 23:07:22 Tower kernel: md: recovery thread: exit status: -4
Aug  3 23:07:34 Tower kernel: XFS (md2p1): Metadata corruption detected at xfs_dir3_data_reada_verify+0x53/0x64 [xfs], xfs_dir3_data_reada block 0x3d02fa30 
Aug  3 23:07:34 Tower kernel: XFS (md2p1): Unmount and run xfs_repair
Aug  3 23:07:34 Tower kernel: XFS (md2p1): First 128 bytes of corrupted metadata buffer:
Aug  3 23:07:34 Tower kernel: 00000000: 22 aa 78 d4 c1 33 56 90 3d dd 4d 64 24 52 56 2e  ".x..3V.=.Md$RV.
Aug  3 23:07:34 Tower kernel: 00000010: 5c 4d af 56 e8 16 83 e2 c2 a2 7b 8d 6d 48 45 99  \M.V......{.mHE.
Aug  3 23:07:34 Tower kernel: 00000020: 6f ba fa 58 d9 54 aa 75 6c af d4 c7 1e 1c 6e 8d  o..X.T.ul.....n.
Aug  3 23:07:34 Tower kernel: 00000030: 42 a7 62 2a 3c ee 4a 31 d4 ab 58 a8 5d 81 ea a3  B.b*<.J1..X.]...
Aug  3 23:07:34 Tower kernel: 00000040: 9a be b5 30 d4 47 bf 4f 16 cd a8 3b e5 93 02 94  ...0.G.O...;....
Aug  3 23:07:34 Tower kernel: 00000050: f9 6f 47 83 de 9f 9d 95 0f f5 65 f5 1f 07 13 6b  .oG.......e....k
Aug  3 23:07:34 Tower kernel: 00000060: 05 71 fc 6a 93 fc f2 61 b5 c3 78 c2 36 18 0c e2  .q.j...a..x.6...
Aug  3 23:07:34 Tower kernel: 00000070: e2 27 e2 c7 28 a4 58 13 91 e7 da 5e 61 7a fb 29  .'..(.X....^az.)
Aug  3 23:07:34 Tower kernel: XFS (md2p1): Metadata CRC error detected at xfs_dir3_block_read_verify+0x7c/0xf1 [xfs], xfs_dir3_block block 0x3d02fa30 
Aug  3 23:07:34 Tower kernel: XFS (md2p1): Unmount and run xfs_repair
Aug  3 23:07:34 Tower kernel: XFS (md2p1): First 128 bytes of corrupted metadata buffer:
Aug  3 23:07:34 Tower kernel: 00000000: 22 aa 78 d4 c1 33 56 90 3d dd 4d 64 24 52 56 2e  ".x..3V.=.Md$RV.
Aug  3 23:07:34 Tower kernel: 00000010: 5c 4d af 56 e8 16 83 e2 c2 a2 7b 8d 6d 48 45 99  \M.V......{.mHE.
Aug  3 23:07:34 Tower kernel: 00000020: 6f ba fa 58 d9 54 aa 75 6c af d4 c7 1e 1c 6e 8d  o..X.T.ul.....n.
Aug  3 23:07:34 Tower kernel: 00000030: 42 a7 62 2a 3c ee 4a 31 d4 ab 58 a8 5d 81 ea a3  B.b*<.J1..X.]...
Aug  3 23:07:34 Tower kernel: 00000040: 9a be b5 30 d4 47 bf 4f 16 cd a8 3b e5 93 02 94  ...0.G.O...;....
Aug  3 23:07:34 Tower kernel: 00000050: f9 6f 47 83 de 9f 9d 95 0f f5 65 f5 1f 07 13 6b  .oG.......e....k
Aug  3 23:07:34 Tower kernel: 00000060: 05 71 fc 6a 93 fc f2 61 b5 c3 78 c2 36 18 0c e2  .q.j...a..x.6...
Aug  3 23:07:34 Tower kernel: 00000070: e2 27 e2 c7 28 a4 58 13 91 e7 da 5e 61 7a fb 29  .'..(.X....^az.)
Aug  3 23:07:34 Tower kernel: XFS (md2p1): metadata I/O error in "xfs_da_read_buf+0x9a/0xff [xfs]" at daddr 0x3d02fa30 len 8 error 74

```
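In case it helps anyone sifting through a longer syslog than the excerpt above, here's a rough Python sketch that tallies the ATA and XFS messages per device. The regex patterns, the `summarize` function and the `SAMPLE` variable are my own assumptions, built only from the lines in this excerpt; other kernel versions may word these messages differently, so treat it as a starting point rather than a proper diagnostic tool.

```python
import re
from collections import Counter

# Patterns are assumptions derived only from the log excerpt in this post.
ATA_ERR = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: failed command: (.+)$")
ATA_LINK = re.compile(r"kernel: (ata\d+): SATA link up ([\d.]+ Gbps)")
XFS_ERR = re.compile(
    r"kernel: XFS \((\w+)\): "
    r"(Metadata corruption|Metadata CRC error|metadata I/O error)"
)

def summarize(lines):
    """Count ATA/XFS error messages per device and record SATA link speeds."""
    counts = Counter()
    link_speeds = {}
    for line in lines:
        line = line.rstrip()
        m = ATA_ERR.search(line)
        if m:
            counts[(m.group(1), "failed command: " + m.group(2))] += 1
            continue
        m = ATA_LINK.search(line)
        if m:
            # Keep the most recent negotiated speed seen for each port.
            link_speeds[m.group(1)] = m.group(2)
            continue
        m = XFS_ERR.search(line)
        if m:
            counts[(m.group(1), m.group(2))] += 1
    return counts, link_speeds

# A few lines copied from the excerpt above, used as sample input.
SAMPLE = """\
Aug  3 23:07:13 Tower kernel: ata2.00: failed command: READ FPDMA QUEUED
Aug  3 23:07:20 Tower kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Aug  3 23:07:34 Tower kernel: XFS (md2p1): Metadata corruption detected at xfs_dir3_data_reada_verify+0x53/0x64 [xfs], xfs_dir3_data_reada block 0x3d02fa30
Aug  3 23:07:34 Tower kernel: XFS (md2p1): Unmount and run xfs_repair
Aug  3 23:07:34 Tower kernel: XFS (md2p1): Metadata CRC error detected at xfs_dir3_block_read_verify+0x7c/0xf1 [xfs], xfs_dir3_block block 0x3d02fa30
Aug  3 23:07:34 Tower kernel: XFS (md2p1): metadata I/O error in "xfs_da_read_buf+0x9a/0xff [xfs]" at daddr 0x3d02fa30 len 8 error 74
""".splitlines()

if __name__ == "__main__":
    counts, link_speeds = summarize(SAMPLE)
    for (device, message), n in sorted(counts.items()):
        print(f"{device}: {message} x{n}")
    for port, speed in sorted(link_speeds.items()):
        print(f"{port}: link up at {speed}")
```

To run it against a real log, replace `SAMPLE` with the lines of a saved syslog file. Grouping by device makes it easier to see whether the ATA errors and the XFS errors line up in time and point at the same disk.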

The log says to unmount the disk and run xfs_repair. Is there anything I should know before doing this? I have my data backed up, so I'm not stressed, but I'd like to avoid causing any further damage or complications.

Also, what could have caused this? The system is on a UPS and inverter for backup power... Could it have been the Docker issues with 6.12.0? (I upgraded when it was released, but I'm now on 6.12.3.)

I don't think it's a faulty cable, as I replaced the ones for my main array fairly recently, although it's still possible I guess...

Thanks in advance!
