
Metadata CRC Error, Btree Block is suspect, Bad Magic and other XFS repair errors.


Solved by blackbird2150


Hi All,

 

I've been in hardware hell for the last few months trying to build, and then rebuild, a system that meets my needs. I've gone through two bad mobos and, having resolved everything else, now have this "one" last error.

 

I discovered it while running mover: the logs started throwing errors when attempting to write to Disk 2. Below is a sampling of the errors:

Metadata CRC error detected at 0x44228d, xfs_bnobt block 0x1ffffffe8/0x1000
btree block 8/1 is suspect, error -74
bad magic # 0x58414746 in btbno block 8/1
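
A side note on the "bad magic" line: XFS magic numbers are four ASCII characters packed into a 32-bit value, so the reported number can be decoded to see what the block actually contained. A minimal Python sketch (the helper name `decode_magic` is made up for illustration and is not part of any XFS tool):

```python
def decode_magic(value: int) -> str:
    """Interpret a 32-bit big-endian magic number as four ASCII characters."""
    return value.to_bytes(4, "big").decode("ascii")


# The value reported in the "bad magic" line above.
print(decode_magic(0x58414746))  # XAGF
```

"XAGF" is the magic of the allocation group free-space header (AGF), not of a free-space btree block, which would explain why the check rejects the block. What put that data there is not something the message alone can tell you.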

 

I then ran an XFS repair check in maintenance mode (xfs_repair -nv) and it spits out these errors every time. But here's where I'm struggling: I can't figure out what is causing it.
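
For anyone scripting the same check, here is a rough Python sketch of the read-only run. The `-n` (no modify) and `-v` (verbose) flags are real `xfs_repair` options; the device path `/dev/md2` is only an assumption for "Disk 2" and may differ on your system, and the function names are hypothetical.

```python
import shlex
import subprocess


def build_check_command(device: str) -> list[str]:
    """Build a read-only xfs_repair invocation: -n = no modify, -v = verbose."""
    return ["xfs_repair", "-n", "-v", device]


def run_check(device: str) -> bool:
    """Run the dry run and return True if no corruption was reported.

    Per the xfs_repair man page, in -n mode exit status 1 means corruption
    was detected and 0 means none was found.
    """
    result = subprocess.run(
        build_check_command(device), capture_output=True, text=True
    )
    print(result.stdout + result.stderr)
    return result.returncode == 0


# Only print the command here; /dev/md2 is an assumed device path.
print(shlex.join(build_check_command("/dev/md2")))
```

The array needs to be in maintenance mode so the filesystem is not mounted while `run_check` runs.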

 

I have swapped disks, replaced disks, taken disks out of my hot-swap bay and plugged them in directly, and swapped the HBA SAS cables. I feel I've tried every permutation I can think of. Unraid always reports Disk 2 as having the errors (again, after changing the disk and cable and taking the backplane out of the equation)... I'm at a loss.

 

My setup is:

Jonsbo N3 case

ASRock z690m-itx

i5-13500

2x16GB DDR4 3200

LSI-9211-8i - all 8 3.5" drives are connected through this HBA.

2x NVMe on the mobo.

 

Any thoughts on how to move forward?

 

Thanks!

 

Link to comment

Memtest is only definitive if it finds errors; sometimes memory produces errors only under certain loads.

 

Try running with only 1 stick and see if the disk errors change, then switch sticks and repeat.

 

Also, make sure you disable any XMP profiles. XMP is overclocking, regardless of whether the motherboard and memory say they should support it.

Link to comment
Posted (edited)

I was swapping the RAM in and out of slots and now I can't get it to POST.

 

Both RAM sticks inserted - 32GB recognized - yes to CRC errors

I put RAM A into Slot 1 - 16GB recognized - yes to CRC errors

I put RAM A into Slot 2 - 16GB recognized - yes to CRC errors

 

I put RAM B into Slot 1 - started failing to POST

 

Now no combination POSTs, even RAM A alone.

 

Can't tell if I damaged it or have uncovered my issue lol.

 

EDIT: I can get it to POST! Very interesting... I ordered some new sticks as a swap-out, and I'll continue working on these.

Edited by blackbird2150
Link to comment

So I swapped in a new set of RAM from a different brand, again 2x16GB DDR4 3200, and the sticks were immediately recognized and POSTed fine. Same issue, still just with Disk 2.

 

From the recommended cards list, there is a section:

Quote

Keep in mind that they need to be forward breakout cables (reverse breakout look the same but won't work, as the name implies they work for the reverse, SATA goes on the board/HBA and the miniSAS on a backplane), sometimes they are also called Mini SAS (SFF-8xxx Host) to 4X SATA (Target), this is the same as forward breakout.

 

I can't tell if that is my problem. The Jonsbo N3 has a backplane with 8 individual SATA connectors, shared Molex power (2x), and 2x SFF-8087.

 

I guess the next step is to get the LSI card replaced by the seller.

 

 

 

 

Link to comment

I took the disk whose serial number is associated with Disk 2 out of the backplane and connected it directly to the mobo via SATA. Same error. I then swapped in a clean drive connected directly to the mobo via SATA. Same error.

 

The strange part to me is that the errors are exactly the same each time I run xfs_repair -nv, down to the addresses.
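
If you want to confirm "exactly the same" rather than eyeball it, saved output from two runs can be diffed as sets of error lines. A minimal Python sketch, assuming the output was saved as text; the function names are hypothetical, and the sample text is just the three lines quoted in the first post:

```python
import re

HEX_RE = re.compile(r"0x[0-9a-fA-F]+")


def error_signature(log_text: str) -> set[str]:
    """Collect distinct lines that contain a hex address or the word 'error'."""
    return {
        line.strip()
        for line in log_text.splitlines()
        if HEX_RE.search(line) or "error" in line.lower()
    }


def compare_runs(first: str, second: str) -> dict[str, set[str]]:
    """Split the error lines of two runs into shared and run-specific sets."""
    sig_a, sig_b = error_signature(first), error_signature(second)
    return {
        "common": sig_a & sig_b,
        "only_first": sig_a - sig_b,
        "only_second": sig_b - sig_a,
    }


sample = """Metadata CRC error detected at 0x44228d, xfs_bnobt block 0x1ffffffe8/0x1000
btree block 8/1 is suspect, error -74
bad magic # 0x58414746 in btbno block 8/1
"""

diff = compare_runs(sample, sample)
print(len(diff["common"]), "shared lines;",
      "identical" if not (diff["only_first"] or diff["only_second"]) else "different")
```

Identical output across different physical disks suggests the corruption is in the filesystem contents themselves rather than something a cable or controller is introducing fresh on each run, though this comparison cannot prove that.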

 

Taking a step back, it feels like I've also eliminated the LSI card as a culprit by connecting directly to the mobo. So unless there is some defect on the mobo that would corrupt the same addresses across both the PCIe and SATA paths, is it possible that Unraid is somehow reporting bad information, like a stale error?

 

This particular drive doesn't have much data on it, so I'm not worried about data loss. Is there a reason I shouldn't run the actual XFS repair (without -n) now and see if that helps?

Link to comment
  • Solution

Running the XFS repair from the terminal ended up fixing this issue. Sharing in case anyone else comes across this thread in the future.

 

I ran it on the disk in question (Disk 2 for me). It's a 16TB drive and the run took about 18 hours. Throughout, it would say "Candidate Found, validating, unable to validate continuing". At the end it just said "sorry, couldn't find a candidate." and ended.

 

But that fixed it! All tests and checks are clear now. My gut says it wasn't an error I still had, but one I likely had previously that the system was still reporting.

Link to comment
