
Diagnosing parity check errors



I'm trying to figure out what is causing the following parity check errors on my server.  So far I have replaced the motherboard, CPU, memory, and PSU, hoping this would solve the issue (I wanted to upgrade the CPU anyway).  This was before the 8/23/2021 parity check, but I'm still getting errors.  I'm currently running another parity check, and sync errors are still coming up.

 

I have not replaced the hard drives, 2 LSI SAS9201-16e cards, 6 External Mini SAS cables, and a DIY DAS made up of 3 Dual Mini SAS SFF-8088 to SAS36P SFF-8087 Adapter and 6 Mini SAS 26Pin (SFF-8088) Male to 4 SATA 7Pin Female Cable.

 

Any ideas on where I should look next?

 

image.png.cfa05ec5d73d4eb6170e60087d8792a1.png

atlas-diagnostics-20210823-0831.zip


Disk10 has disconnected and doesn't appear in SMART. Check all connections, power and SATA, both ends, including splitters. Then post new diagnostics.

 

How long has disk10 been disabled? Did it just happen with this latest parity check or was it already disabled before you started the parity check?

 

Since your parity is invalid your best hope is to get disk10 working again.


Disk10 became disabled during the latest parity check.

 

After building the new computer, I reattached the 6 External Mini SAS cables to the DAS and on boot-up, 4 of the hard drives were not recognized.  I unplugged and re-plugged the 6 External Mini SAS cables and on next boot-up all drives were recognized.  I then ran the latest parity check.

 

I suspect there's something going on with the SAS connection.  I will check all connections and reboot to see if disk10 comes back.

  • 4 weeks later...

I have replaced the SATA and power connections, and disk ST4000DM000-1F2168_Z304M0WL is now showing up in Unassigned Devices.  I then assigned ST4000DM000-1F2168_Z304M0WL to the disk 10 slot and got a warning that the disk data will be erased when the array is started.

 

I'm assuming that instead of doing this, I need to do a "New Config" and recreate the parity drives.  I'm just not sure exactly which options I need to select in "New Config".  Also, will my other settings like Shares, dockers, and VMs still be there, or will I really need to start from scratch?

 

 


I performed a "New Config" and parity was rebuilt.  I then ran a non-correcting parity check and it returned errors.  I ran a second non-correcting parity check and it also returned errors.  I've attached the latest diagnostics.  Could the LSI 9201-16e cards have gone bad?  They are about the only things left that I haven't replaced besides the hard drives and power supply.

 atlas-diagnostics-20210922-0759.zip
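
For readers following along, here is a rough sketch of what a non-correcting check reports, assuming a simplified model of single XOR parity. It is not Unraid's actual implementation, and a second parity disk uses a different calculation that this ignores. The names `xor_parity` and `count_sync_errors`, the tiny block size, and the sample bytes are all made up for illustration.

```python
from functools import reduce


def xor_parity(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def count_sync_errors(data_disks, parity_disk, block_size=4):
    """Count blocks whose stored parity differs from parity recomputed
    from the data disks. Nothing is written back (non-correcting)."""
    errors = 0
    for off in range(0, len(parity_disk), block_size):
        expected = xor_parity([d[off:off + block_size] for d in data_disks])
        if expected != parity_disk[off:off + block_size]:
            errors += 1
    return errors


# Hypothetical 8-byte "disks" for illustration only.
disk1 = bytes([1, 2, 3, 4, 5, 6, 7, 8])
disk2 = bytes([8, 7, 6, 5, 4, 3, 2, 1])
parity = xor_parity([disk1, disk2])

# Simulate one bad byte arriving from disk2 (bad cable, controller, or drive).
disk2_bad = bytes([8, 7, 6, 5, 4, 3, 2, 0])

if __name__ == "__main__":
    print(count_sync_errors([disk1, disk2], parity))
    print(count_sync_errors([disk1, disk2_bad], parity))
```

In this model a mismatch only says that the data and parity disagree for a block. It does not say which disk, cable, or controller delivered the bad bytes, which is why the check alone cannot point at a single device.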


I'm in the process of testing the disks starting with changing the parity disk.  I'm in the middle of a parity rebuild and I see the following in the system log.  I can't tell which disk has the corruption.  Not sure what "dm-2" is.

 

Sep 24 11:08:22 Atlas kernel: XFS (dm-2): Metadata corruption detected at xfs_buf_ioend+0x51/0x284 [xfs], xfs_inode block 0x1796d41c8 xfs_inode_buf_verify
Sep 24 11:08:22 Atlas kernel: XFS (dm-2): Unmount and run xfs_repair
Sep 24 11:08:22 Atlas kernel: XFS (dm-2): First 128 bytes of corrupted metadata buffer:
Sep 24 11:08:22 Atlas kernel: 00000000: 53 f8 8c c5 e2 3f 2c ba bf f3 6c 7f 50 4b 18 fa  S....?,...l.PK..
Sep 24 11:08:22 Atlas kernel: 00000010: 4c c8 06 8d 5b b5 0a 13 f6 e4 57 9d 8e e1 b0 86  L...[.....W.....
Sep 24 11:08:22 Atlas kernel: 00000020: d9 7e 70 f0 75 a8 8e 17 da b5 51 3a 59 31 38 f9  .~p.u.....Q:Y18.
Sep 24 11:08:22 Atlas kernel: 00000030: 2d 20 3f ef 04 d2 89 e5 57 67 5b 9d 6c 92 e7 72  - ?.....Wg[.l..r
Sep 24 11:08:22 Atlas kernel: 00000040: 3f 73 f8 9b b4 50 6e ae 74 11 01 27 40 76 3b 38  ?s...Pn.t..'@v;8
Sep 24 11:08:22 Atlas kernel: 00000050: ec 89 37 25 9d 42 11 e3 d3 28 2c 93 a8 e6 5c df  ..7%.B...(,...\.
Sep 24 11:08:22 Atlas kernel: 00000060: 01 77 8e a9 22 e2 bf 8b 6b 03 f2 c4 ce 23 3f 1e  .w.."...k....#?.
Sep 24 11:08:22 Atlas kernel: 00000070: ab 06 41 e8 81 d0 07 47 7f 3b ec 97 ba 47 f9 df  ..A....G.;...G..
Sep 24 11:08:22 Atlas kernel: XFS (dm-2): metadata I/O error in "xfs_imap_to_bp+0x5c/0xa2 [xfs]" at daddr 0x1796d41c8 len 32 error 117
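
On the "dm-2" question: a "dm-N" name in the kernel log is a Linux device-mapper device (for example an encrypted volume), not a physical disk name. Below is a minimal sketch, not an Unraid tool, that pulls the device names out of XFS log lines like the ones above and looks each one up in sysfs (`/sys/block/dm-N/dm/name` and `/sys/block/dm-N/slaves`). The function names `xfs_devices` and `resolve_dm` are hypothetical, and the path to a saved syslog is whatever you pass on the command line.

```python
import re
from collections import Counter
from pathlib import Path

# Matches kernel lines such as "... kernel: XFS (dm-2): Unmount and run xfs_repair"
XFS_LINE = re.compile(r"kernel: XFS \((?P<dev>[^)]+)\): ")


def xfs_devices(log_lines):
    """Count XFS kernel messages per device name found in the log lines."""
    counts = Counter()
    for line in log_lines:
        match = XFS_LINE.search(line)
        if match:
            counts[match.group("dev")] += 1
    return counts


def resolve_dm(dev, sysfs_root="/sys/block"):
    """Return (mapper_name, [underlying devices]) for a dm-N device.

    Reads the standard sysfs entries; returns (None, []) if they are absent.
    """
    base = Path(sysfs_root) / dev
    name_file = base / "dm" / "name"
    name = name_file.read_text().strip() if name_file.exists() else None
    slaves_dir = base / "slaves"
    slaves = sorted(p.name for p in slaves_dir.iterdir()) if slaves_dir.is_dir() else []
    return name, slaves


if __name__ == "__main__":
    import sys

    # Usage: python script.py <path-to-saved-syslog>
    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as fh:
            for dev, count in xfs_devices(fh).items():
                print(dev, count, resolve_dm(dev))
```

The mapper name and underlying device it prints should then be matched against the array slots in the web UI. Which disk that turns out to be depends on the system, so treat the result as a pointer for where to run `xfs_repair`, not as proof of which hardware is at fault.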

 

atlas-diagnostics-20210924-1308.zip

