Narvath Posted April 7, 2023

Greetings, I noticed that my containers were no longer working, so I hopped onto the logs and, sure enough, to my understanding something on the cache seems to be corrupted due to an IO failure:

Quote:
Apr 6 20:24:25 Tower kernel: BTRFS error (device loop2): block=154501120 write time tree block corruption detected
Apr 6 20:24:25 Tower kernel: BTRFS: error (device loop2) in btrfs_commit_transaction:2418: errno=-5 IO failure (Error while writing out transaction)
Apr 6 20:24:25 Tower kernel: BTRFS info (device loop2: state E): forced readonly
Apr 6 20:24:25 Tower kernel: BTRFS warning (device loop2: state E): Skipping commit of aborted transaction.
Apr 6 20:24:25 Tower kernel: BTRFS: error (device loop2: state EA) in cleanup_transaction:1982: errno=-5 IO failure
Apr 6 20:25:35 Tower emhttpd: read SMART /dev/sde
Apr 6 20:25:55 Tower unraid-api[2931]: ⚠️ Caught exception: read ECONNRESET
Apr 6 20:25:56 Tower unraid-api[2931]: ⚠️ UNRAID API crashed with exit code 1
Apr 6 20:36:07 Tower webGUI: Successful login user root from 172.17.0.2
Apr 6 20:39:30 Tower kernel: monitor_nchan[1365]: segfault at 5cb6d8 ip 000000000043ad80 sp 00007fff151e93e0 error 4 in bash[426000+c5000]
Apr 6 20:39:30 Tower kernel: Code: 55 48 89 fd 53 48 8b 5f 08 48 85 db 74 74 4c 8d 35 03 29 0b 00 4c 8d 2d 9b a0 0c 00 4c 8d 25 36 3d 0b 00 0f 1f 80 00 00 00 00 <48> 8b 43 08 48 83 3b 00 4c 89 e2 4c 89 f7 49 0f 44 d5 48 8b 30 31
Apr 6 20:54:28 Tower emhttpd: spinning down /dev/sdd
Apr 6 20:56:53 Tower emhttpd: spinning down /dev/sdc
Apr 6 21:06:09 Tower emhttpd: spinning down /dev/sde

I searched around a bit and saw that for many people this issue wasn't actually caused by a real IO failure, so I'd like to know whether I should really look into the physical side of things or not. As always, the diagnostics zip is in the attached files. Also, there is no important data on the server, which is why there is no parity for the array or cache, although I might seriously consider adding it if that could prevent this kind of problem from happening again...

tower-diagnostics-20230407-0959.zip
JorgeB Posted April 7, 2023 (Solution)

10 minutes ago, Narvath said:
write time tree block corruption detected

This usually means bad RAM, and the diags show a clear bit flip:

parent transid verify failed on 23330816 wanted 144115188080351984 found 4496112

hex(4496112) = 449AF0
hex(144115188080351984) = 200000000449AF0

So start by running memtest.
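As a side note, a quick way to confirm that this really is a single flipped bit rather than general corruption is to XOR the two transid values from the message above. A minimal sketch in Python (the values come straight from the log; the variable names are mine):

```python
# Transaction IDs from the "parent transid verify failed" line above.
wanted = 144115188080351984
found = 4496112

diff = wanted ^ found              # set bits mark where the two values disagree
print(hex(wanted))                 # 0x200000000449af0
print(hex(found))                  # 0x449af0
print(bin(diff).count("1"))        # 1 -> exactly one bit differs
print(diff == 1 << 57)             # True: the difference is bit 57
```

A difference of exactly one power of two is the classic signature of a one-bit memory error, which is why the advice is memtest first rather than a filesystem repair.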
Narvath Posted April 7, 2023 (Author)

Ok well, I was going to ask what settings I should run it with, but in mere seconds I already have 110+ errors... I'll try the stick in another slot just in case, but assuming it really is the RAM: once I've replaced it, what should I do to get Unraid out of read-only (and to track down which files were corrupted so that I can delete/replace them)?
JorgeB Posted April 7, 2023

Post new diags after array start once the RAM issue is solved.
Narvath Posted April 26, 2023 (Author)

Ok, it took quite a while to receive the replacement RAM, but it's here now. Attached are new diagnostics taken with the new RAM (which passed MemTest). As of now, the network seems to be unreachable (among some other things), and I'm clueless about what should be done next...

tower-diagnostics-20230426-1123.zip
JorgeB Posted April 26, 2023

The cache has filesystem issues, likely from the previous bad RAM. Back up what you can and reformat.
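For the "back up what you can" step, a plain recursive copy is usually enough as long as the pool still mounts (even read-only). Below is a minimal sketch, assuming the cache is still mounted at /mnt/cache and that /mnt/user/backup is a hypothetical array share with enough free space; files sitting on damaged metadata will fail to read, so they are skipped and reported instead of aborting the whole copy:

```python
# Minimal sketch: copy whatever is still readable off the damaged cache
# before reformatting. Paths are assumptions -- /mnt/cache is the usual
# Unraid cache mount point, /mnt/user/backup is a hypothetical array share.
import shutil
from pathlib import Path

SRC = Path("/mnt/cache")
DST = Path("/mnt/user/backup/cache-rescue")

failed = []

for path in SRC.rglob("*"):
    target = DST / path.relative_to(SRC)
    try:
        if path.is_dir():
            target.mkdir(parents=True, exist_ok=True)
        elif path.is_file():
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)
    except OSError as err:
        # Corrupted files raise I/O errors; note them and keep going.
        failed.append((path, err))

print(f"{len(failed)} paths could not be copied")
for path, err in failed:
    print(path, err)
```

If the pool no longer mounts at all, btrfs restore from the command line is the usual fallback, but for a cache that is merely read-only a straight copy like this keeps things simple.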
Narvath Posted April 26, 2023 (Author)

Back up what is on the cache, or everything on the array too?
Narvath Posted April 26, 2023 (Author)

Done. It looks like basically every Docker container is dead or had its settings reset, since I could not get their appdata folders backed up. Not the end of the world, just annoying to set everything back up. Thanks for the help, it's been very much appreciated! Now I'll go set up the appdata backup plugin and look into a RAID 1 cache pool to prevent that kind of loss from happening again. Thanks again.