openam Posted December 19, 2023 (edited)

My docker just stopped working. I did try adding a new docker using the TRaSH guide. Not sure if something I did in there could have caused the problems. There are some BTRFS errors in the syslog:

Dec 18 03:18:05 palazzo kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 3, rd 0, flush 0, corrupt 0, gen 0
Dec 18 03:18:05 palazzo kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 4, rd 0, flush 0, corrupt 0, gen 0

There was a similar issue the other day. I restarted it, and it seemed to be working fine. I think I may try restarting and disabling the 2 new docker containers from the TRaSH guide.

palazzo-diagnostics-20231218-1929.zip
palazzo-diagnostics-20231215-1819.zip

Edited December 23, 2023 by openam
JorgeB Posted December 19, 2023

Pool is detecting data corruption, good idea to run memtest. Also change docker network to ipvlan.
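For anyone reading along: the `wr`/`rd`/`corrupt` counters in those `errs:` lines are cumulative per device. A minimal sketch of pulling them out of such a line, using a sample line copied from the post above (on a live server you would grep `/var/log/syslog`, or query the device directly; the mount point in the comment is assumed):

```shell
# The counters in a BTRFS "errs:" line are cumulative per device.
# Sample line copied from the syslog above; on the server you would
# read /var/log/syslog, or query the device directly with e.g.
#   btrfs device stats /mnt/cache    (mount point assumed)
line='Dec 18 03:18:05 palazzo kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 3, rd 0, flush 0, corrupt 0, gen 0'
# Strip everything up to "errs: " to leave just the counters.
echo "$line" | sed -n 's/.*errs: //p'
# prints: wr 3, rd 0, flush 0, corrupt 0, gen 0
```

A nonzero `wr` count means failed writes to the device, which fits the docker.img (loop2) problems described above.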
openam Posted December 20, 2023 (Author)

Attached picture of memtest run. Did 4+ passes with 0 errors. I have also rebooted and set the network to ipvlan. It's running a parity check again because of an unclean shutdown. I'm guessing that's because of the errors that were present. It looks like some of the errors in the syslog had to do with btrfs. The only btrfs drive I have is the cache drive. Is it possible that there is just something corrupt in the docker.img? Would blowing that away and rebuilding it be of any use? If so, are there detailed instructions for that?
itimpi Posted December 20, 2023

The diagnostics show that there appears to be corruption on the cache drive itself, and internally within the docker.img file (the loop device, which is presumably on the same drive). You need to fix the cache level before attempting to fix the docker.img level. Probably best to start with a scrub of the cache drive to see how that goes. Instructions for recreating/repopulating the docker.img file are in the online documentation, accessible via the Manual link at the bottom of the Unraid GUI.
openam Posted December 20, 2023 (Author)

What do you mean by a scrub of the cache drive? How do I go about that?
JorgeB Posted December 20, 2023

Click on the pool, then scroll down to the scrub section.
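The GUI scrub button wraps `btrfs scrub`. A sketch of the command-line equivalent (the `/mnt/cache` mount point is assumed from this thread, and the status text below is a fabricated sample in the shape btrfs-progs prints, included only so the parsing step runs anywhere):

```shell
# GUI equivalent on the command line (mount point assumed):
#   btrfs scrub start /mnt/cache     # kick off a scrub
#   btrfs scrub status /mnt/cache    # check progress / error summary
# Fabricated sample of the error summary, in the usual btrfs-progs
# shape, so the check below is runnable anywhere:
status='Error summary:    csum=20
  Corrected:      0
  Uncorrectable:  20
  Unverified:     0'
# Pull out the count of errors the scrub could not repair.
echo "$status" | sed -n 's/.*Uncorrectable:[[:space:]]*//p'
# prints: 20
```

On a single-device pool, csum errors are usually uncorrectable because there is no redundant copy to repair from, which matters for the next few posts.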
openam Posted December 20, 2023 (Author)

It finished the scrub and didn't find anything. I did, however, see this in the syslog. I'm guessing I might just be able to delete that file?

Dec 19 21:59:25 palazzo webGUI: Successful login user root from 10.0.0.127
Dec 19 23:31:36 palazzo monitor: Stop running nchan processes
Dec 20 01:41:50 palazzo kernel: BTRFS warning (device sdk1): csum failed root 5 ino 207444295 off 572157952 csum 0x6d807861 expected csum 0xbda7158d mirror 1
Dec 20 01:41:50 palazzo kernel: BTRFS error (device sdk1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 56, gen 0
Dec 20 01:41:50 palazzo shfs: copy_file: /mnt/cache/tv/parents/Show Name/Season 01/Show Name.01x05.Episode Name.mkv /mnt/disk5/tv/parents/Show Name/Season 01/Show Name.01x05.Episode Name.mkv.partial (5) Input/output error
Dec 20 01:41:50 palazzo kernel: BTRFS warning (device sdk1): csum failed root 5 ino 207444295 off 572157952 csum 0x6d807861 expected csum 0xbda7158d mirror 1
Dec 20 01:41:50 palazzo kernel: BTRFS error (device sdk1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 57, gen 0
Dec 20 01:41:50 palazzo kernel: BTRFS warning (device sdk1): csum failed root 5 ino 207444295 off 572157952 csum 0x6d807861 expected csum 0xbda7158d mirror 1
Dec 20 01:41:50 palazzo kernel: BTRFS error (device sdk1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 58, gen 0
Dec 20 01:41:50 palazzo kernel: BTRFS warning (device sdk1): csum failed root 5 ino 207444295 off 572157952 csum 0x6d807861 expected csum 0xbda7158d mirror 1
Dec 20 01:41:50 palazzo kernel: BTRFS error (device sdk1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 59, gen 0
Dec 20 01:41:50 palazzo kernel: BTRFS warning (device sdk1): csum failed root 5 ino 207444295 off 572157952 csum 0x6d807861 expected csum 0xbda7158d mirror 1
Dec 20 01:41:50 palazzo kernel: BTRFS error (device sdk1): bdev /dev/sdk1 errs: wr 0, rd 0, flush 0, corrupt 60, gen 0
Dec 20 02:20:03 palazzo kernel: PMS LoudnessCmd[971]: segfault at 0 ip 000014ad5e111080 sp 000014ad595240c8 error 4 in libswresample.so.4[14ad5e109000+18000] likely on CPU 6 (core 2, socket 0)
Dec 20 02:20:03 palazzo kernel: Code: 01 cf 4c 39 c7 72 e3 c3 cc cc 8d 04 49 48 98 4d 89 c1 49 29 c1 48 63 c2 48 63 c9 49 39 f9 76 75 f2 0f 10 05 02 05 ff ff 66 90 <0f> bf 16 0f 57 c9 f2 0f 2a ca f2 0f 59 c8 f2 0f 11 0f 0f bf 14 06
Dec 20 02:21:36 palazzo kernel: PMS LoudnessCmd[4247]: segfault at 0 ip 000014ef176be080 sp 000014ef120790c8 error 4 in libswresample.so.4[14ef176b6000+18000] likely on CPU 7 (core 3, socket 0)
Dec 20 02:21:36 palazzo kernel: Code: 01 cf 4c 39 c7 72 e3 c3 cc cc 8d 04 49 48 98 4d 89 c1 49 29 c1 48 63 c2 48 63 c9 49 39 f9 76 75 f2 0f 10 05 02 05 ff ff 66 90 <0f> bf 16 0f 57 c9 f2 0f 2a ca f2 0f 59 c8 f2 0f 11 0f 0f bf 14 06
JorgeB Posted December 20, 2023

14 minutes ago, openam said: "and didn't find anything"

It did find 20 checksum errors. Run a correcting scrub and post the output of that.
openam Posted December 20, 2023 (Author)

You're right, but it says they're uncorrectable. Apparently I don't know how to read that summary.
JorgeB Posted December 20, 2023

The corrupt files should be listed in the syslog. Delete/restore them, then re-run a scrub.
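The "csum failed" warnings carry the inode number of each corrupt file, so they can be extracted and deduplicated. A sketch, using sample lines copied from the syslog earlier in the thread (on the server you would grep `/var/log/syslog` instead; the mount point in the final comment is assumed):

```shell
# Extract the inode numbers from "csum failed" warnings so the
# corrupt files can be identified. Sample lines copied from the
# syslog above; on the server, grep /var/log/syslog instead.
log='Dec 20 01:41:50 palazzo kernel: BTRFS warning (device sdk1): csum failed root 5 ino 207444295 off 572157952 csum 0x6d807861 expected csum 0xbda7158d mirror 1
Dec 20 01:41:50 palazzo kernel: BTRFS warning (device sdk1): csum failed root 5 ino 207444295 off 572157952 csum 0x6d807861 expected csum 0xbda7158d mirror 1'
# Keep only csum-failure lines, capture the number after "ino ",
# and collapse repeats of the same inode.
echo "$log" | grep 'csum failed' | sed -n 's/.*ino \([0-9]*\) .*/\1/p' | sort -u
# prints: 207444295
# On the live pool, map each inode back to a path (mount point assumed):
#   btrfs inspect-internal inode-resolve 207444295 /mnt/cache
```

Resolving against the pool mount point (not `/mnt/user`) matters here, which is exactly the mistake that comes up a couple of posts below.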
openam Posted December 20, 2023 (Author)

I was down to just 6, but deleted them and ran again, and now it's showing 71. Do I just keep playing whack-a-mole?
openam Posted December 20, 2023 (Author)

Oh man. I've been deleting them out of /mnt/user instead of /mnt/cache.
trurl Posted December 20, 2023

/mnt/cache is included in /mnt/user.
openam Posted December 20, 2023 (Author)

Every time I delete some and re-run, it appears to get worse.
trurl Posted December 20, 2023

I know you did memtest, but btrfs csum errors are often caused by bad RAM. Did you have memory problems in the past?
openam Posted December 20, 2023 (Author)

I do not remember having memory problems in the past (which may mean I personally have them 😁). Another thing I did recently (about 1 week ago) was upgrade from 6.9.x to 6.12.x, which makes me wonder if it's something like this guy was seeing. I have heard of people running memtest for many more hours than I did. Should I let it run all night?
JorgeB Posted December 21, 2023

A couple of hours is usually enough to detect a serious issue.
openam Posted December 21, 2023 (Author)

I ended up letting it run all night, and it still shows no errors. I did order some cheap replacement memory sticks that should be here later tonight, though. Would it be possible to just switch out my entire cache drive for another SSD? I have a smaller one sitting around that I could format and throw in, then try to copy over the important things from the old device using Unassigned Devices, if it'll let me. Where is the configuration for the dockers stored? By that I mean the template configurations, not the appdata or volume mappings.
JorgeB Posted December 21, 2023 (Solution)

Copy what you can from the current pool and then reformat or use a different device to see if it no longer happens.
openam Posted December 21, 2023 (Author)

What's the best way to copy from the current pool, just targeting `/mnt/cache` and rsync to a new drive?
JorgeB Posted December 21, 2023

Yes, rsync is one way; you can also use Dynamix File Manager or Midnight Commander.
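For the rsync route, a minimal sketch. The real mount points (`/mnt/cache` as the source, and whatever the new pool is mounted as) are assumed from this thread; the snippet below runs against temporary directories so it is testable anywhere:

```shell
# Copying pool contents with rsync. On the server this would be e.g.
#   rsync -avh /mnt/cache/ /mnt/newcache/    (mount points assumed)
# Demonstrated on temp directories so the snippet runs anywhere.
src=$(mktemp -d); dst=$(mktemp -d)
mkdir -p "$src/appdata" && echo hello > "$src/appdata/test.txt"
# -a preserves permissions/times/symlinks, -v lists files, -h prints
# human-readable sizes; the trailing slash on the source copies its
# contents rather than the directory itself.
rsync -avh "$src/" "$dst/"
cat "$dst/appdata/test.txt"   # prints: hello
rm -rf "$src" "$dst"
```

The `-a` flag matters here because appdata contains files whose ownership and permissions the containers depend on.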
dirkinthedark Posted December 21, 2023

I think I'm having the same issue.
openam Posted December 21, 2023 (Author)

So after I copy everything to a new cache, do I just update the shares' references to use the new cache? Also, I got these warnings, but that kind of makes sense since I'm copying from one to the other.
trurl Posted December 21, 2023

5 hours ago, dirkinthedark said: "I think Im having the same issue"

Start your own thread with a description of your problems and your diagnostics.
dirkinthedark Posted December 22, 2023

I have. I'm simply letting this user know that many are having the same issue currently.