ramiro Posted July 25, 2023

Hello,

I ran into a very weird problem yesterday: my raid10 with 6 disks suddenly went read-only with this error:

Jul 24 23:07:09 ramiroserver dnsmasq[10431]: using nameserver 8.8.8.8#53
Jul 24 23:07:09 ramiroserver dnsmasq[10431]: using nameserver 2003:dc:2f25:200:6ef0:49ff:fe45:27eb#53
Jul 24 23:07:09 ramiroserver dnsmasq[10431]: using nameserver 2003:dc:2f12:fb01:a236:9fff:febe:59dc#53
Jul 24 23:12:45 ramiroserver kernel: BTRFS critical (device sdh1): corrupt leaf: block=28466229657600 slot=142 extent bytenr=22370527346688 len=131072 invalid extent refs, have 1 expect >= inline 4222615553
Jul 24 23:12:45 ramiroserver kernel: BTRFS info (device sdh1): leaf 28466229657600 gen 610063 total ptrs 208 free space 59 owner 2
Jul 24 23:12:45 ramiroserver kernel: #011item 0 key (22370491891712 168 65536) itemoff 16230 itemsize 53
Jul 24 23:12:45 ramiroserver kernel: #011#011extent refs 1 gen 365653 flags 1
Jul 24 23:12:45 ramiroserver kernel: #011#011extent refs 1 gen 552994 flags 1
Jul 24 23:12:45 ramiroserver kernel: #011#011ref#0: extent data backref root 5 objectid 9942256 offset 409495535616 count 1
Jul 24 23:12:45 ramiroserver kernel: #011item 207 key (22370586890240 168 282624) itemoff 5259 itemsize 53
Jul 24 23:12:45 ramiroserver kernel: #011#011extent refs 1 gen 554551 flags 1
Jul 24 23:12:45 ramiroserver kernel: #011#011ref#0: extent data backref root 5 objectid 9942256 offset 552465006592 count 1
Jul 24 23:12:45 ramiroserver kernel: BTRFS error (device sdh1): block=28466229657600 write time tree block corruption detected
Jul 24 23:12:45 ramiroserver kernel: BTRFS: error (device sdh1) in btrfs_commit_transaction:2460: errno=-5 IO failure (Error while writing out transaction)
Jul 24 23:12:45 ramiroserver kernel: BTRFS info (device sdh1: state E): forced readonly
Jul 24 23:12:45 ramiroserver kernel: BTRFS warning (device sdh1: state E): Skipping commit of aborted transaction.
Jul 24 23:12:45 ramiroserver kernel: BTRFS: error (device sdh1: state EA) in cleanup_transaction:1958: errno=-5 IO failure

I took the array offline and ran btrfs check without repair, and it showed that everything is fine. I then restarted the server, and now the array is fully working. Today I ran a scrub with the repair option, and it came back without any errors:

UUID: 2df74f3a-347d-4ae8-9f13-e760dc67014a
Scrub started: Tue Jul 25 01:44:23 2023
Status: finished
Duration: 11:32:36
Total to scrub: 15.90TiB
Rate: 401.11MiB/s
Error summary: no errors found

I am running a dual-socket 2011 system with ECC memory and had no errors whatsoever before the btrfs error. The only problem is that, due to the macvlan issue, I had 1 unclean reset a week before this. The same thing happened 2 months ago and corrupted my other raid1 array, but that one did not come back after a restart, so I had to reformat it.

I am a little worried now about whether this problem will come back, and what the real issue is.

Thank you
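When a syslog excerpt mixes btrfs messages with unrelated entries (such as the dnsmasq lines above), it can help to pull out just the critical and error events before posting or comparing incidents. The following is a minimal Python sketch of that filtering. The regular expression is an assumption based only on the line shapes visible in this log, and the names `BTRFS_LINE` and `btrfs_events` are made up for illustration; other kernel versions may format these messages differently.

```python
import re
import sys

# Assumed line shapes, taken from the log in this thread:
#   "... kernel: BTRFS critical (device sdh1): corrupt leaf: ..."
#   "... kernel: BTRFS: error (device sdh1: state EA) in cleanup_transaction:1958: ..."
BTRFS_LINE = re.compile(
    r"kernel: BTRFS:? (?P<level>critical|error|warning|info) "
    r"\(device (?P<device>[^):\s]+)[^)]*\):? (?P<message>.*)"
)


def btrfs_events(lines, levels=("critical", "error")):
    """Return (level, device, message) tuples for btrfs lines at the given levels."""
    events = []
    for line in lines:
        match = BTRFS_LINE.search(line)
        if match and match.group("level") in levels:
            events.append(
                (match.group("level"), match.group("device"), match.group("message").strip())
            )
    return events


if __name__ == "__main__":
    # Usage: python btrfs_events.py <path-to-a-saved-syslog-copy>
    # The path is supplied by the user; nothing here assumes where the log lives.
    with open(sys.argv[1], errors="replace") as handle:
        for level, device, message in btrfs_events(handle):
            print(f"{level:8} {device:6} {message}")
```

Run against the excerpt above, this would keep the "corrupt leaf", "write time tree block corruption detected", and the two transaction error lines, and drop the dnsmasq, info, and warning entries.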
ramiro (Author) Posted July 26, 2023

I also ran a balance, and it completed without errors too. Do I have to worry now that this can happen again at any time?
JorgeB Posted July 31, 2023

On 7/25/2023 at 1:33 PM, ramiro said:
write time tree block corruption detected

This usually means bad RAM or other kernel memory corruption. I recommend running memtest.
ramiro (Author) Posted August 5, 2023

Shouldn't there at least be an ECC error then, or can memory errors go undetected even with ECC? The array still runs fine; it has actually completed 2 scrubs already because of the monthly scrub.
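One way to check whether the kernel has logged any ECC events is to read the EDAC counters that Linux exposes under `/sys/devices/system/edac/mc`, where each memory controller directory (`mc0`, `mc1`, ...) has a `ce_count` (corrected errors) and a `ue_count` (uncorrected errors) file. The sketch below assumes an EDAC driver for the platform is loaded; if the directory is missing or empty, that only means nothing is being reported, not that the memory is error-free. The function name `edac_error_counts` is made up for illustration.

```python
from pathlib import Path


def edac_error_counts(edac_root="/sys/devices/system/edac/mc"):
    """Return {controller: {"ce": int|None, "ue": int|None}} from EDAC sysfs.

    A value of None means the counter file was missing or unreadable.
    An empty dict means no memory controllers were found under edac_root.
    """
    counts = {}
    root = Path(edac_root)
    if not root.is_dir():
        return counts
    for controller in sorted(root.glob("mc[0-9]*")):
        entry = {}
        for key, filename in (("ce", "ce_count"), ("ue", "ue_count")):
            try:
                entry[key] = int((controller / filename).read_text().strip())
            except (OSError, ValueError):
                entry[key] = None
        counts[controller.name] = entry
    return counts


if __name__ == "__main__":
    results = edac_error_counts()
    if not results:
        print("No EDAC memory controllers found (driver may not be loaded).")
    for name, entry in results.items():
        print(f"{name}: corrected={entry['ce']} uncorrected={entry['ue']}")
```

Non-zero corrected counts would suggest ECC is catching errors; all zeros are consistent with healthy RAM but do not rule out corruption elsewhere.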
JorgeB Posted August 6, 2023

If you are using ECC RAM, that's likely not the problem. It could be another hardware issue or a btrfs bug. Which Unraid release are you running? There have been an unusual number of reports of this happening with v6.12.x, so it is possibly some kernel/btrfs bug.
ramiro (Author) Posted August 6, 2023

I am running 6.12.3, and I have been running 6.12 since the RC release. Could it maybe be a communication problem between the CPUs? As far as I can tell, the QPI link also has error correction, so that shouldn't be a problem either.