UNRAID 6.5.3 - Disk Issue from system log error?



Keep getting recurring errors in my system log:

Jan 15 10:06:19 NASserver kernel: BTRFS critical (device md6): corrupt leaf, slot offset bad: block=1187753492480, root=1, slot=159

Jan 15 10:06:19 NASserver kernel: BTRFS critical (device md6): corrupt leaf, slot offset bad: block=1187753492480, root=1, slot=159

 

And then this one repeats every 10 seconds:

Jan 15 10:53:36 NASserver kernel: pcieport 0000:00:03.2: AER: Corrected error received: id=0000
Jan 15 10:53:36 NASserver kernel: pcieport 0000:00:03.2: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=001a(Transmitter ID)
Jan 15 10:53:36 NASserver kernel: pcieport 0000:00:03.2: device [1022:1453] error status/mask=00001000/00006000
Jan 15 10:53:36 NASserver kernel: pcieport 0000:00:03.2: [12] Replay Timer Timeout 

 

I'm not sure whether a flaky disk (md6) is what has been causing the stability issues. I need help interpreting what is happening with the system.

 

Before I noticed the errors in the log this morning, I had tried to upgrade the system to 6.6.6 last night. After the upgrade, the system stopped disk 6 (md6) in the middle of the night and made it read-only, which really screwed up the system. I backed off the update this morning to 6.5.3 and it is all stable again. This is not the first issue with this system and the 6.6.x update: on previous upgrades to 6.6.x, the system couldn't stay up for more than 24 hours without the shares disappearing. On 6.5.x it has had months of stable uptime.

Edited by gdeyoung

Disk6 has metadata corruption. The best way to resolve it is to back up or move the data, format that disk, and move the data back.
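
If you want to confirm that the corruption messages all point at the same device, a small tally over the syslog can help. This is only a sketch: the function name `btrfs_critical_by_device` is made up for illustration, and it assumes the kernel lines look like the ones quoted in the first post.

```shell
# Hypothetical helper: count "BTRFS critical" kernel messages per device.
# Reads syslog text on stdin and prints "<device> <count>" for each device seen.
btrfs_critical_by_device() {
  awk '
    /BTRFS critical \(device / {
      # Pull out the name between "(device " and the closing ")".
      if (match($0, /\(device [^)]+\)/)) {
        dev = substr($0, RSTART + 8, RLENGTH - 9)
        count[dev]++
      }
    }
    END { for (d in count) print d, count[d] }
  '
}
```

You would feed it the log, for example `btrfs_critical_by_device < /var/log/syslog` (the log path is an assumption about your setup).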

 

As for what caused the corruption: I see you're using Ryzen, so make sure the RAM isn't overclocked. There are several cases on the forum where Ryzen corrupted data with overclocked RAM. Better yet, use a server with ECC RAM.

 

Also, btrfs metadata on HDDs should use the dup (duplicate) profile so it can recover from some corruptions. This is going to be the default in the future, but it isn't currently. You can convert each disk with:

 

btrfs balance start -mconvert=dup /mnt/diskX
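
To check which metadata profile a disk is using before or after the balance, you can look at the `Metadata` line of `btrfs filesystem df`. The helper below is a sketch, not an official tool: `metadata_profile` is a made-up name, and it assumes the usual output layout of lines such as `Metadata, DUP: total=..., used=...`.

```shell
# Hypothetical helper: print the btrfs metadata profile (e.g. "single" or "DUP").
# Reads the output of `btrfs filesystem df <mountpoint>` on stdin.
metadata_profile() {
  # Split on runs of spaces, commas and colons; field 1 is the block group
  # type and field 2 is its profile.
  awk -F'[ ,:]+' '$1 == "Metadata" { print $2 }'
}
```

Usage would be along the lines of `btrfs filesystem df /mnt/diskX | metadata_profile`, with `/mnt/diskX` replaced by the actual disk mount point.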

For the PCIe errors, this is usually a board compatibility problem: look for a BIOS update, move the offending card to a different PCIe slot, or replace the board with a different model.
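
As a side note on reading those AER lines: the `error status/mask` values are hex bitmasks, and the `[12]` in the log is simply the bit number that is set in the status word. A rough sketch for listing the set bits follows; `aer_bits` is a made-up helper name, and it only decodes bit positions, not their meaning.

```shell
# Hypothetical helper: print the bit numbers set in a hex AER status/mask value.
# Takes the hex digits without a "0x" prefix, as they appear in the kernel log.
aer_bits() {
  status=$(( 0x$1 ))
  bit=0
  while [ "$status" -ne 0 ]; do
    # Report the current bit if it is set, then shift to the next one.
    if [ $(( status & 1 )) -eq 1 ]; then
      echo "$bit"
    fi
    status=$(( status >> 1 ))
    bit=$(( bit + 1 ))
  done
}
```

For the status value in the log, `aer_bits 00001000` prints `12`, which matches the `[12] Replay Timer Timeout` line.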


Thanks for the reply. I have been working on this and running into a few errors. I'm trying to empty the md6 drive (the one with the BTRFS errors) so I can reformat it. The metadata problem is so bad that the copy fails, so I can't empty the drive. I have two copies of the data on other servers. I just want to reformat the drive and sync the data back. How do I reformat a drive I can't empty without triggering a rebuild?
