April 7, 2023 Greetings, I noticed that my containers were not working anymore, so I checked the logs. Sure enough, to my understanding, something seems to be corrupted on the cache(?) due to an IO failure:
Quote
Apr 6 20:24:25 Tower kernel: BTRFS error (device loop2): block=154501120 write time tree block corruption detected
Apr 6 20:24:25 Tower kernel: BTRFS: error (device loop2) in btrfs_commit_transaction:2418: errno=-5 IO failure (Error while writing out transaction)
Apr 6 20:24:25 Tower kernel: BTRFS info (device loop2: state E): forced readonly
Apr 6 20:24:25 Tower kernel: BTRFS warning (device loop2: state E): Skipping commit of aborted transaction.
Apr 6 20:24:25 Tower kernel: BTRFS: error (device loop2: state EA) in cleanup_transaction:1982: errno=-5 IO failure
Apr 6 20:25:35 Tower emhttpd: read SMART /dev/sde
Apr 6 20:25:55 Tower unraid-api[2931]: ⚠️ Caught exception: read ECONNRESET
Apr 6 20:25:56 Tower unraid-api[2931]: ⚠️ UNRAID API crashed with exit code 1
Apr 6 20:36:07 Tower webGUI: Successful login user root from 172.17.0.2
Apr 6 20:39:30 Tower kernel: monitor_nchan[1365]: segfault at 5cb6d8 ip 000000000043ad80 sp 00007fff151e93e0 error 4 in bash[426000+c5000]
Apr 6 20:39:30 Tower kernel: Code: 55 48 89 fd 53 48 8b 5f 08 48 85 db 74 74 4c 8d 35 03 29 0b 00 4c 8d 2d 9b a0 0c 00 4c 8d 25 36 3d 0b 00 0f 1f 80 00 00 00 00 <48> 8b 43 08 48 83 3b 00 4c 89 e2 4c 89 f7 49 0f 44 d5 48 8b 30 31
Apr 6 20:54:28 Tower emhttpd: spinning down /dev/sdd
Apr 6 20:56:53 Tower emhttpd: spinning down /dev/sdc
Apr 6 21:06:09 Tower emhttpd: spinning down /dev/sde
I searched a bit and saw that for many people the issue wasn't really related to an actual IO failure, so I'd like to know whether I really should look into the physical side of things or not. As always, the diagnostics zip is in the attached files. Also, there isn't any important information on the server, which is why there is no parity for the array and cache. Although I might seriously consider it if that could prevent such problems from happening again... tower-diagnostics-20230407-0959.zip
April 7, 2023 Solution
10 minutes ago, Narvath said: write time tree block corruption detected
This usually means bad RAM, and the diags show a clear bit flip:
parent transid verify failed on 23330816 wanted 144115188080351984 found 4496112
hex(4496112)=449AF0
hex(144115188080351984)=200000000449AF0
So start by running memtest.
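For anyone wanting to check the bit flip themselves, here is a minimal Python sketch (not part of the original reply, and the variable names are just illustrative). It XORs the "wanted" and "found" transid values from the log line above and lists which bit positions differ. A single differing bit is what you would expect from a memory error rather than from ordinary filesystem damage.

```python
# Values copied from the "parent transid verify failed" log line.
wanted = 144115188080351984  # transid the parent block expected
found = 4496112              # transid actually read back

# XOR leaves a 1 only in the bit positions where the two values differ.
diff = wanted ^ found
flipped_bits = [i for i in range(diff.bit_length()) if (diff >> i) & 1]

print(hex(found))    # 0x449af0
print(hex(wanted))   # 0x200000000449af0
print(flipped_bits)  # [57]
```

The two values are identical except for bit 57, so "wanted" is simply "found" with one extra bit set.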
April 7, 2023 Author OK, well, I was going to ask what settings I should run it with, but within mere seconds I already have 110+ errors... I'll try testing it in another slot just in case, but assuming it really is the RAM: once I replace it, what should I do to get Unraid out of read-only? (And how can I track down which file was corrupted so that I can delete/replace it?)
April 26, 2023 Author OK, so it took quite a while to receive the replacement RAM, but it's here now. Here are the new diagnostics with the new RAM (which passed MemTest). As of now, it seems like the network is unreachable (among some other things), and I'm clueless about what should be done next... tower-diagnostics-20230426-1123.zip
April 26, 2023 The cache has filesystem issues, likely from the previous bad RAM. Back up what you can and reformat.
April 26, 2023 Author Done. It looks like basically every Docker container is dead or had its settings reset, as I could not get their appdata folders backed up. Not the end of the world, simply annoying to set everything back up. Thanks for the help, it's very much appreciated! Now I'll go set up the appdata backup plugin and look into a RAID 1 cache pool to prevent that kind of loss from happening again. Thanks again