Write time tree block corruption detected / IO failure


Solved by JorgeB


Greetings, 
I noticed that my containers were not working anymore, so I checked the logs. Sure enough, as far as I understand it, something on the cache(?) seems to be corrupted due to an IO failure:

 

Quote

Apr  6 20:24:25 Tower kernel: BTRFS error (device loop2): block=154501120 write time tree block corruption detected
Apr  6 20:24:25 Tower kernel: BTRFS: error (device loop2) in btrfs_commit_transaction:2418: errno=-5 IO failure (Error while writing out transaction)
Apr  6 20:24:25 Tower kernel: BTRFS info (device loop2: state E): forced readonly
Apr  6 20:24:25 Tower kernel: BTRFS warning (device loop2: state E): Skipping commit of aborted transaction.
Apr  6 20:24:25 Tower kernel: BTRFS: error (device loop2: state EA) in cleanup_transaction:1982: errno=-5 IO failure
Apr  6 20:25:35 Tower  emhttpd: read SMART /dev/sde
Apr  6 20:25:55 Tower unraid-api[2931]: ⚠️ Caught exception: read ECONNRESET
Apr  6 20:25:56 Tower unraid-api[2931]: ⚠️ UNRAID API crashed with exit code 1
Apr  6 20:36:07 Tower webGUI: Successful login user root from 172.17.0.2
Apr  6 20:39:30 Tower kernel: monitor_nchan[1365]: segfault at 5cb6d8 ip 000000000043ad80 sp 00007fff151e93e0 error 4 in bash[426000+c5000]
Apr  6 20:39:30 Tower kernel: Code: 55 48 89 fd 53 48 8b 5f 08 48 85 db 74 74 4c 8d 35 03 29 0b 00 4c 8d 2d 9b a0 0c 00 4c 8d 25 36 3d 0b 00 0f 1f 80 00 00 00 00 <48> 8b 43 08 48 83 3b 00 4c 89 e2 4c 89 f7 49 0f 44 d5 48 8b 30 31
Apr  6 20:54:28 Tower  emhttpd: spinning down /dev/sdd
Apr  6 20:56:53 Tower  emhttpd: spinning down /dev/sdc
Apr  6 21:06:09 Tower  emhttpd: spinning down /dev/sde

 

I searched a bit and saw that for many people the issue wasn't really related to an actual IO failure, so I'd like to know whether I really should look into the physical side of things or not.
As always, the diagnostics zip is attached.
Also, there isn't any important information on the server, which is why there is no parity for the array or cache. Although I might seriously consider it if that could prevent such problems from happening again...

 

tower-diagnostics-20230407-0959.zip

  • Solution
10 minutes ago, Narvath said:

write time tree block corruption detected

This usually means bad RAM, and the diags show a clear bit flip:

 

parent transid verify failed on 23330816 wanted 144115188080351984 found 4496112

 

hex(4496112)=449AF0

hex(144115188080351984)=200000000449AF0
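
For anyone who wants to check the comparison themselves, here is a minimal Python sketch (not part of the diagnostics, just an illustration). It XORs the two transid values from the log line above and counts how many bits differ. The variable names are made up for the example.

```python
# Values copied from the "parent transid verify failed" log line above.
wanted = 144115188080351984  # transid the filesystem expected
found = 4496112              # transid actually read back

# XOR leaves a 1 only in the bit positions where the two values differ.
diff = wanted ^ found
flipped_bits = bin(diff).count("1")
bit_index = diff.bit_length() - 1  # position of the highest differing bit

print(f"wanted = {wanted:#018x}")
print(f"found  = {found:#018x}")
print(f"xor    = {diff:#018x}")
print(f"bits differing: {flipped_bits}, highest at bit index {bit_index}")
```

The two values differ in exactly one bit, which is the typical signature of a memory error rather than of on-disk damage.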

 

So start by running memtest


Ok well, I was going to ask which settings I should run it with, but within mere seconds I already have 110+ errors...

I'll try testing it in another slot just in case, but assuming it really is the RAM, I'll replace it. Then what should I do to get Unraid out of read-only (and to track down which file was corrupted so that I can delete/replace it)?

 

  • 3 weeks later...

Done. It looks like basically every Docker container is dead or had its settings reset, as I could not get their appdata folders backed up. Not the end of the world, simply annoying to set everything back up.
Thanks for the help, it's been very appreciated!
Now I'll go set up the appdata backup plugin and look into a RAID 1 cache pool to prevent that kind of loss from happening again.

Thanks again

