Narvath Posted April 7, 2023

Greetings, I noticed that my containers were no longer working, so I hopped onto the logs and, sure enough, to my understanding something on the cache seems to be corrupted due to an IO failure:

Quote:
Apr 6 20:24:25 Tower kernel: BTRFS error (device loop2): block=154501120 write time tree block corruption detected
Apr 6 20:24:25 Tower kernel: BTRFS: error (device loop2) in btrfs_commit_transaction:2418: errno=-5 IO failure (Error while writing out transaction)
Apr 6 20:24:25 Tower kernel: BTRFS info (device loop2: state E): forced readonly
Apr 6 20:24:25 Tower kernel: BTRFS warning (device loop2: state E): Skipping commit of aborted transaction.
Apr 6 20:24:25 Tower kernel: BTRFS: error (device loop2: state EA) in cleanup_transaction:1982: errno=-5 IO failure
Apr 6 20:25:35 Tower emhttpd: read SMART /dev/sde
Apr 6 20:25:55 Tower unraid-api[2931]: ⚠️ Caught exception: read ECONNRESET
Apr 6 20:25:56 Tower unraid-api[2931]: ⚠️ UNRAID API crashed with exit code 1
Apr 6 20:36:07 Tower webGUI: Successful login user root from 172.17.0.2
Apr 6 20:39:30 Tower kernel: monitor_nchan[1365]: segfault at 5cb6d8 ip 000000000043ad80 sp 00007fff151e93e0 error 4 in bash[426000+c5000]
Apr 6 20:39:30 Tower kernel: Code: 55 48 89 fd 53 48 8b 5f 08 48 85 db 74 74 4c 8d 35 03 29 0b 00 4c 8d 2d 9b a0 0c 00 4c 8d 25 36 3d 0b 00 0f 1f 80 00 00 00 00 <48> 8b 43 08 48 83 3b 00 4c 89 e2 4c 89 f7 49 0f 44 d5 48 8b 30 31
Apr 6 20:54:28 Tower emhttpd: spinning down /dev/sdd
Apr 6 20:56:53 Tower emhttpd: spinning down /dev/sdc
Apr 6 21:06:09 Tower emhttpd: spinning down /dev/sde

I searched around a bit and saw that for many people this issue wasn't actually caused by a real IO failure, so I'd like to know whether I should really look into the physical side of things or not. As always, the diagnostics zip is in the attached files. Also, there is no important data on the server, which is why there is no parity for the array or cache, although I might seriously consider adding it if that could prevent this kind of problem from happening again...

tower-diagnostics-20230407-0959.zip
JorgeB Posted April 7, 2023 (Solution)

10 minutes ago, Narvath said:
write time tree block corruption detected

This usually means bad RAM, and the diags show a clear bit flip:

parent transid verify failed on 23330816 wanted 144115188080351984 found 4496112

hex(4496112) = 449AF0
hex(144115188080351984) = 200000000449AF0

So start by running memtest.
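As a side note, a quick way to confirm that this really is a single flipped bit rather than general corruption is to XOR the two transid values from the message above. A minimal sketch in Python (the values come straight from the log; the variable names are mine):

```python
# Transaction IDs from the "parent transid verify failed" line above.
wanted = 144115188080351984
found = 4496112

diff = wanted ^ found              # set bits mark where the two values disagree
print(hex(wanted))                 # 0x200000000449af0
print(hex(found))                  # 0x449af0
print(bin(diff).count("1"))        # 1 -> exactly one bit differs
print(diff == 1 << 57)             # True: the difference is bit 57
```

A difference of exactly one power of two is the classic signature of a one-bit memory error, which is why the advice is memtest first rather than a filesystem repair.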
Narvath Posted April 7, 2023 (Author)

Ok well, I was going to ask what settings I should run it with, but in mere seconds I already have 110+ errors... I'll try the stick in another slot just in case, but assuming it really is the RAM: once I've replaced it, what should I do to get Unraid out of read-only (and to track down which files were corrupted so that I can delete/replace them)?
JorgeB Posted April 7, 2023

Post new diags after array start once the RAM issue is solved.
Narvath Posted April 26, 2023 (Author)

Ok, it took quite a while to receive the replacement RAM, but it's here now. Attached are new diagnostics taken with the new RAM (which passed MemTest). As of now, the network seems to be unreachable (among some other things), and I'm clueless about what should be done next...

tower-diagnostics-20230426-1123.zip
JorgeB Posted April 26, 2023

The cache has filesystem issues, likely from the previous bad RAM. Back up what you can and reformat.
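For the "back up what you can" step, a plain recursive copy is usually enough as long as the pool still mounts (even read-only). Below is a minimal sketch, assuming the cache is still mounted at /mnt/cache and that /mnt/user/backup is a hypothetical array share with enough free space; files sitting on damaged metadata will fail to read, so they are skipped and reported instead of aborting the whole copy:

```python
# Minimal sketch: copy whatever is still readable off the damaged cache
# before reformatting. Paths are assumptions -- /mnt/cache is the usual
# Unraid cache mount point, /mnt/user/backup is a hypothetical array share.
import shutil
from pathlib import Path

SRC = Path("/mnt/cache")
DST = Path("/mnt/user/backup/cache-rescue")

failed = []

for path in SRC.rglob("*"):
    target = DST / path.relative_to(SRC)
    try:
        if path.is_dir():
            target.mkdir(parents=True, exist_ok=True)
        elif path.is_file():
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)
    except OSError as err:
        # Corrupted files raise I/O errors; note them and keep going.
        failed.append((path, err))

print(f"{len(failed)} paths could not be copied")
for path, err in failed:
    print(path, err)
```

If the pool no longer mounts at all, btrfs restore from the command line is the usual fallback, but for a cache that is merely read-only a straight copy like this keeps things simple.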
Narvath Posted April 26, 2023 (Author)

Back up what is on the cache, or everything on the array too?
Narvath Posted April 26, 2023 (Author)

Done. It looks like basically every Docker container is dead or had its settings reset, since I could not get their appdata folders backed up. Not the end of the world, just annoying to set everything back up. Thanks for the help, it's been very much appreciated! Now I'll go set up the appdata backup plugin and look into a RAID 1 cache pool to prevent that kind of loss from happening again. Thanks again.