BTRFS raid10 corrupt leaf, but fine after reboot

July 25, 2023

Hello

I ran into a very weird problem yesterday: my raid10 with 6 disks suddenly went read-only with this error:

Jul 24 23:07:09 ramiroserver dnsmasq[10431]: using nameserver 8.8.8.8#53
Jul 24 23:07:09 ramiroserver dnsmasq[10431]: using nameserver 2003:dc:2f25:200:6ef0:49ff:fe45:27eb#53
Jul 24 23:07:09 ramiroserver dnsmasq[10431]: using nameserver 2003:dc:2f12:fb01:a236:9fff:febe:59dc#53
Jul 24 23:12:45 ramiroserver kernel: BTRFS critical (device sdh1): corrupt leaf: block=28466229657600 slot=142 extent bytenr=22370527346688 len=131072 invalid extent refs, have 1 expect >= inline 4222615553
Jul 24 23:12:45 ramiroserver kernel: BTRFS info (device sdh1): leaf 28466229657600 gen 610063 total ptrs 208 free space 59 owner 2
Jul 24 23:12:45 ramiroserver kernel: #011item 0 key (22370491891712 168 65536) itemoff 16230 itemsize 53
Jul 24 23:12:45 ramiroserver kernel: #011#011extent refs 1 gen 365653 flags 1
Jul 24 23:12:45 ramiroserver kernel: #011#011extent refs 1 gen 552994 flags 1
Jul 24 23:12:45 ramiroserver kernel: #011#011ref#0: extent data backref root 5 objectid 9942256 offset 409495535616 count 1
Jul 24 23:12:45 ramiroserver kernel: #011item 207 key (22370586890240 168 282624) itemoff 5259 itemsize 53
Jul 24 23:12:45 ramiroserver kernel: #011#011extent refs 1 gen 554551 flags 1
Jul 24 23:12:45 ramiroserver kernel: #011#011ref#0: extent data backref root 5 objectid 9942256 offset 552465006592 count 1
Jul 24 23:12:45 ramiroserver kernel: BTRFS error (device sdh1): block=28466229657600 write time tree block corruption detected
Jul 24 23:12:45 ramiroserver kernel: BTRFS: error (device sdh1) in btrfs_commit_transaction:2460: errno=-5 IO failure (Error while writing out transaction)
Jul 24 23:12:45 ramiroserver kernel: BTRFS info (device sdh1: state E): forced readonly
Jul 24 23:12:45 ramiroserver kernel: BTRFS warning (device sdh1: state E): Skipping commit of aborted transaction.
Jul 24 23:12:45 ramiroserver kernel: BTRFS: error (device sdh1: state EA) in cleanup_transaction:1958: errno=-5 IO failure

I took the array offline and ran btrfs check without repair, and it showed that everything is fine. So I restarted the server, and now the array is fully working. I ran a scrub today with the repair option, which came back without any errors:

UUID:             2df74f3a-347d-4ae8-9f13-e760dc67014a
Scrub started:    Tue Jul 25 01:44:23 2023
Status:           finished
Duration:         11:32:36
Total to scrub:   15.90TiB
Rate:             401.11MiB/s
Error summary:    no errors found

I am running a dual 2011 system with ECC memory and had no errors whatsoever before the btrfs error. The only problem is that, due to the macvlan issue, I had 1 unclean reset a week before this. I had the same thing 2 months ago, which corrupted my other raid1 array, but that one did not come back after a restart, so I had to reformat it.

I am a little worried now about whether this problem will come back. What is the real issue?

Thank you

July 26, 2023

Author

So I also ran a balance, and it also had no errors. Do I have to worry now that this can happen again at any time?

July 31, 2023

Community Expert

On 7/25/2023 at 1:33 PM, ramiro said:
write time tree block corruption detected

This usually means bad RAM or other kernel memory corruption. I recommend running memtest.
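As a rough aid for looking at the value the tree checker rejected, here is a minimal Python sketch that pulls the numbers out of the "corrupt leaf" line quoted in the first post and compares the two ref counts bit by bit. The regular expression and the helper names (`parse_corrupt_leaf`, `differing_bits`) are made up for this illustration, and it assumes that "have" is the ref count stored in the extent item and "inline" is the count derived from the inline backrefs, as the wording of the message suggests.

```python
import re

# Pattern written against the exact message format seen in this thread's log;
# other kernel versions may word the message differently (assumption).
CORRUPT_LEAF = re.compile(
    r"corrupt leaf: block=(?P<block>\d+) slot=(?P<slot>\d+) "
    r"extent bytenr=(?P<bytenr>\d+) len=(?P<len>\d+) "
    r"invalid extent refs, have (?P<have>\d+) expect >= inline (?P<inline>\d+)"
)


def parse_corrupt_leaf(line):
    """Return the numeric fields of a 'corrupt leaf' line, or None if no match."""
    match = CORRUPT_LEAF.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


def differing_bits(a, b):
    """Count the bit positions in which two non-negative integers differ."""
    return bin(a ^ b).count("1")


# The line from the first post, copied verbatim.
LINE = (
    "Jul 24 23:12:45 ramiroserver kernel: BTRFS critical (device sdh1): "
    "corrupt leaf: block=28466229657600 slot=142 extent bytenr=22370527346688 "
    "len=131072 invalid extent refs, have 1 expect >= inline 4222615553"
)

info = parse_corrupt_leaf(LINE)
print("inline count in hex:", hex(info["inline"]))
print("bits differing from 'have':", differing_bits(info["have"], info["inline"]))
```

For the line in this thread, the inline count is 0xfbb00001, which differs from the stored count of 1 in 10 bit positions, all in the upper 16 bits. That is plain arithmetic on the logged numbers. On its own it does not say whether the cause was RAM, other hardware, or a software bug.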

August 5, 2023

Author

Shouldn't there at least be an ECC error then, or can there be undetected memory errors with ECC?

The array still runs fine. It has actually been through 2 scrubs already because of the monthly scrub.
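On the question of whether any ECC errors were recorded: on Linux systems where an EDAC driver is loaded, the kernel exposes per-memory-controller counters for corrected and uncorrected errors under sysfs. The sketch below reads them. It assumes the standard EDAC layout (`/sys/devices/system/edac/mc/mc*/ce_count` and `ue_count`), and whether an EDAC driver is actually loaded on this particular server is not known from the thread. The function name `read_edac_counts` is made up for this example.

```python
from pathlib import Path

# Standard sysfs location of the EDAC memory controller counters.
# Assumption: an EDAC driver for the platform is loaded; if not, the
# directory is missing and the function returns an empty dict.
EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_edac_counts(root=EDAC_ROOT):
    """Return {controller: {"ce_count": int|None, "ue_count": int|None}}."""
    root = Path(root)
    counts = {}
    if not root.is_dir():
        return counts
    for controller in sorted(root.glob("mc[0-9]*")):
        entry = {}
        for name in ("ce_count", "ue_count"):
            try:
                entry[name] = int((controller / name).read_text().strip())
            except (OSError, ValueError):
                # Counter missing or unreadable: record that explicitly.
                entry[name] = None
        counts[controller.name] = entry
    return counts


if __name__ == "__main__":
    found = read_edac_counts()
    if not found:
        print("No EDAC memory controllers found (driver not loaded?)")
    for controller, entry in found.items():
        print(controller, "corrected:", entry["ce_count"],
              "uncorrected:", entry["ue_count"])
```

All counters at zero would only mean that the memory controller did not report an error through EDAC. It would not rule out corruption that happened elsewhere, for example in a CPU, a bus, or in software.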

August 6, 2023

Community Expert

If you are using ECC RAM, that's likely not the problem. It could be another hardware issue or a btrfs bug. Which Unraid release are you running? There has been an unusual number of reports of this happening with v6.12.x, so possibly some kernel/btrfs bug.

August 6, 2023

Author

I am running 6.12.3, and I have been running 6.12 since the RC release. Could it maybe be a communication problem between the CPUs? As far as I saw, the QPI link also has error correction, so that shouldn't be a problem either.
