andyjayh Posted March 24, 2019
Hi all, my UniFi Docker container has just failed for some reason. I noticed this yesterday, along with a few other containers not responding, so I rebooted my server. Everything came back OK apart from the UniFi Controller container: although it had a green light against it, it only showed 'server starting' when I connected to the WebUI. When I look in the syslog I see the following error repeating the whole time the controller container is running:
Mar 23 21:35:03 Media kernel: BTRFS warning (device loop2): csum failed root 1796 ino 10101 off 41115648 csum 0xb3cf73fa expected csum 0x293aeffb mirror 1
This morning I got a message from Fix Common Problems saying that the Docker image was read-only. I stopped Docker and increased the image size to see if this was the issue, but to no avail. I stopped Docker again and ran a scrub:
scrub status for 266b6e65-1004-4619-8039-7a234e50f0f1
scrub started at Sun Mar 24 18:14:34 2019 and finished after 00:00:03
total bytes scrubbed: 3.65GiB with 2 errors
error details: csum=2
corrected errors: 0, uncorrectable errors: 2, unverified errors: 0
All other containers run fine; the csum errors only appear in the syslog when I start this one. The UniFi Controller never starts, and all I get is a 'server starting' message when I try to connect to the WebUI.
I assume I need to remove the Docker image file and restore from a CA backup? Obviously this will take all my containers back to the last backup; will this be a problem? I run Plex, Sonarr, Radarr, nzbget, UniFi-video and Tautulli.
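The warnings above repeat for a single `root`/`ino` pair on `loop2` (presumably the loop device backing the docker image). A minimal sketch that tallies `csum failed` kernel lines by device, tree root and inode, so it is easy to see whether the damage is confined to one file. `summarize_csum_failures` is a hypothetical helper name, and the parsing assumes the message format quoted in this thread.

```bash
#!/usr/bin/env bash
# Hypothetical helper: tally BTRFS "csum failed" warnings read from stdin.
# Assumes lines shaped like:
#   ... BTRFS warning (device loop2): csum failed root 1796 ino 10101 off ...
# Prints "<device> <root> <inode> <count>", one line per distinct triple, sorted.
summarize_csum_failures() {
  awk '
    /csum failed root [0-9]+ ino [0-9]+/ {
      dev = "?"
      # "(device " is 8 characters; strip it and the closing ")".
      if (match($0, /\(device [^)]*\)/))
        dev = substr($0, RSTART + 8, RLENGTH - 9)
      for (i = 1; i < NF; i++) {
        if ($i == "root") root = $(i + 1)
        if ($i == "ino") ino = $(i + 1)
      }
      count[dev " " root " " ino]++
    }
    END { for (k in count) print k, count[k] }
  ' | sort
}

# Example (syslog path assumed):
#   grep 'csum failed' /var/log/syslog | summarize_csum_failures
```

A single line with a large count points at one damaged file rather than widespread corruption, although it says nothing about the underlying cause.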
Squid Posted March 24, 2019
andyjayh said: I assume I need to remove the Docker image file and restore from a CA backup? Obviously this will take all my containers back to the last backup; will this be a problem? I run Plex, Sonarr, Radarr, nzbget, UniFi-video and Tautulli.
Nope. All the appdata is stored outside of the docker.img. Remove the docker image, then go to Apps, Previous Apps, check off what you want and you're back in business.
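To check Squid's point for a given container, its host-side mount sources can be listed with `docker inspect` and compared against the image mount. This sketch assumes Unraid mounts `docker.img` at `/var/lib/docker` (an assumption, not stated in the thread); `classify_paths` is a hypothetical helper name and `unifi` below is a placeholder container name.

```bash
#!/usr/bin/env bash
# Assumption: docker.img is mounted here, so only data under this path
# (container layers, named volumes) is lost when the image is recreated.
DOCKER_IMG_MOUNT=/var/lib/docker

# Hypothetical helper: read host paths on stdin, one per line, and label each.
classify_paths() {
  local path
  while IFS= read -r path; do
    [[ -z "$path" ]] && continue
    case "$path" in
      "$DOCKER_IMG_MOUNT" | "$DOCKER_IMG_MOUNT"/*) echo "inside-image $path" ;;
      *) echo "outside-image $path" ;;
    esac
  done
}

# Example ("unifi" is a placeholder container name):
#   docker inspect --format '{{range .Mounts}}{{println .Source}}{{end}}' unifi | classify_paths
```

Paths labelled `outside-image` (typically the appdata share) survive deleting the image; anything a container keeps only in its own writable layer does not.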
andyjayh Posted March 24, 2019
Squid said: Nope. All the appdata is stored outside of the docker.img. Remove the docker image, then go to Apps, Previous Apps, check off what you want and you're back in business.
Ah, I see. So leave the appdata well alone in this case. I'll give this a go, thanks.
andyjayh Posted March 24, 2019
Hmm, not great. I did the above, and this is what I'm now seeing in the syslog:
Mar 24 21:54:59 Media kernel: BTRFS warning (device sdl1): csum failed root 5 ino 17093788 off 4571136 csum 0x3235bd95 expected csum 0x00000000 mirror 2
Mar 24 21:54:59 Media kernel: BTRFS critical (device sdl1): corrupt node: root=7 block=93667393536 slot=86, bad key order, current (18446744073709551606 128 986823262208) next (18446744073707454454 128 986826604544)
When I go to the WebUI it cycles between a message about repairing the database and then server startup.
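The `csum failed` warning in this post is for `root 5`, the top-level subvolume on the cache device itself rather than the docker image, so the affected file can often be located from its inode. A minimal sketch that turns such log lines into `btrfs inspect-internal inode-resolve` commands to review before running; it assumes the pool is mounted at `/mnt/cache` and that the inodes live in the top-level subvolume, and `inode_resolve_cmds` is a hypothetical helper name. The `corrupt node: root=7` line concerns the checksum tree (tree id 7), which is metadata, so it has no file path to resolve.

```bash
#!/usr/bin/env bash
# Hypothetical helper: for each distinct inode in "csum failed root 5 ino N"
# lines on stdin, print (not run) the command that would map it to a path.
# Assumes root 5 inodes are reachable from the given mount point.
inode_resolve_cmds() {
  local mount="${1:-/mnt/cache}"
  awk -v mnt="$mount" '
    /csum failed root 5 ino [0-9]+/ {
      for (i = 1; i < NF; i++)
        if ($i == "ino") ino = $(i + 1)
      if (!(ino in seen)) {
        seen[ino] = 1
        print "btrfs inspect-internal inode-resolve " ino " " mnt
      }
    }'
}

# Example:
#   grep 'csum failed root 5 ' /var/log/syslog | inode_resolve_cmds /mnt/cache
```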
Squid Posted March 24, 2019
andyjayh said: When I go to the WebUI it is cycling between a message about repairing the database and then server startup.
Which means that the docker corruption was caused by an underlying file system problem on the cache drive. I'm not in a position to assist right now, but if @johnnie.black is around he can help, no problem.
andyjayh Posted March 24, 2019
Eek, sounds bad. Thanks for your help though.
Squid Posted March 24, 2019
andyjayh said: Eek, sounds bad. Thanks for your help though.
Probably not. He's just the resident expert. It wouldn't be a bad idea to post your diagnostics though...
JorgeB Posted March 25, 2019
andyjayh said: Mar 24 21:54:59 Media kernel: BTRFS warning (device sdl1): csum failed root 5 ino 17093788 off 4571136 csum 0x3235bd95 expected csum 0x00000000 mirror 2
These are checksum errors, possibly the result of one device dropping offline. Please post the diagnostics: Tools -> Diagnostics
andyjayh Posted March 25, 2019
Squid said: Probably not. He's just the resident expert. Wouldn't be a bad idea to post your diagnostics though...
Thanks, I hope not! Good idea, I'll sort that when I get home.
andyjayh Posted March 25, 2019
johnnie.black said: These are checksum errors, possibly the result of one device dropping offline, please post the diagnostics: Tools -> Diagnostics
Brilliant, I'll do that as soon as I get back home.
andyjayh Posted March 26, 2019
johnnie.black said: These are checksum errors, possibly the result of one device dropping offline, please post the diagnostics: Tools -> Diagnostics
Apologies for not doing this yesterday, but here is my diagnostics report. I've also just noticed that my Dashboard shows 'Flash Log Docker' at 100% on the second of the three graphs! I'm not sure where this is located, but none of my drives are full or near full, including the flash drive.
media-diagnostics-20190326-0843.zip
JorgeB Posted March 26, 2019
The syslog has rotated, so I can't see the beginning of the problem, but it does look like a cache device dropped offline. Reboot, and after a few minutes of array usage grab and post new diags.
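When the live syslog has rotated, the earliest disk-related message may still be in an older copy if one was kept. A minimal sketch that scans log files in the order given (oldest first) and prints the first BTRFS error/warning/critical or I/O error line; `first_disk_error` is a hypothetical name, and the rotated file names in the example (`syslog.1`, `syslog.2`) are an assumption about how rotation is configured.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the first disk-related error across the given
# files, read in argument order (pass the oldest rotated file first).
# Missing files are skipped silently. Exit status is non-zero if none found.
first_disk_error() {
  cat -- "$@" 2>/dev/null | grep -m 1 -E 'BTRFS (error|warning|critical)|I/O error'
}

# Example (rotated file names assumed):
#   first_disk_error /var/log/syslog.2 /var/log/syslog.1 /var/log/syslog
```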
andyjayh Posted March 26, 2019
johnnie.black said: Syslog rotated so can't see the beginning of the problem but it does look like a cache device dropped offline, reboot and after a few minutes of array usage grab and post new diags.
OK, thanks for looking. I've restarted the server and uploaded a new diagnostics file. It's a learning curve, but I now understand that what I was seeing in the Dashboard was my Docker log filling up, and I suspect this is causing part of the issue. I don't know whether this could have taken one of the cache drives offline and therefore caused the corruption I am now experiencing?
I need to understand why my log file is now filling up so quickly, when my server has in the past been up for months at a time without issue. UniFi Controller and UniFi Video are the new containers, so I suspect one of these is writing large amounts of log entries?
media-diagnostics-20190326-0944.zip
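To find out which log is actually growing, a sketch that lists the largest files under a directory. It assumes GNU `find` (for `-printf`) and Docker's default `json-file` log location, `/var/lib/docker/containers/<id>/<id>-json.log`; `largest_logs` is a hypothetical helper name. Checking `/var/log` as well may be worthwhile, since the middle dashboard bar might track the `/var/log` filesystem rather than container logs (an assumption; it varies by Unraid version).

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the N largest regular files under DIR, biggest
# first, as "<size in bytes><TAB><path>". Requires GNU find (-printf).
largest_logs() {
  local dir="$1" n="${2:-10}"
  find "$dir" -type f -printf '%s\t%p\n' 2>/dev/null | sort -rn | head -n "$n"
}

# Example usage (paths assumed, see above):
#   largest_logs /var/log 5
#   largest_logs /var/lib/docker/containers 5
```

If a container's `-json.log` turns out to be the culprit, the `json-file` driver's `--log-opt max-size=...` and `--log-opt max-file=...` options can cap its size.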
JorgeB Posted March 26, 2019
At some time in the past there were errors writing to both cache devices; these are hardware errors:
Mar 26 09:36:26 Media kernel: BTRFS info (device sdl1): bdev /dev/sdl1 errs: wr 7, rd 0, flush 0, corrupt 0, gen 0
Mar 26 09:36:26 Media kernel: BTRFS info (device sdl1): bdev /dev/sdk1 errs: wr 9, rd 0, flush 0, corrupt 0, gen 0
See here for more info on what to do: https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=700582
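The same per-device counters can be read with `btrfs dev stats`. A minimal sketch that filters that output down to non-zero counters so write errors like the ones above stand out; `nonzero_dev_stats` is a hypothetical helper name and `/mnt/cache` is assumed to be the pool's mount point.

```bash
#!/usr/bin/env bash
# Hypothetical helper: keep only non-zero counters from `btrfs dev stats`.
# Input lines look like "[/dev/sdl1].write_io_errs 7"; anything without
# exactly two fields (e.g. error messages) is ignored.
nonzero_dev_stats() {
  awk 'NF == 2 && $2 != 0 { print $1, $2 }'
}

# Example (mount point assumed):
#   btrfs dev stats /mnt/cache | nonzero_dev_stats
# After fixing the hardware cause, the counters can be reset with:
#   btrfs dev stats -z /mnt/cache
```

Resetting the counters after a fix makes any later increase easy to spot.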
andyjayh Posted March 26, 2019
OK, thanks for this. I've created the following script as suggested in the linked post:
#!/bin/bash
btrfs dev stats -c /mnt/cache
if [[ $? -ne 0 ]]; then /usr/local/emhttp/webGui/scripts/notify -i warning -s "ERRORS on cache pool"; fi
It ran with the following output:
[/dev/sdl1].write_io_errs 7
[/dev/sdl1].read_io_errs 0
[/dev/sdl1].flush_io_errs 0
[/dev/sdl1].corruption_errs 0
[/dev/sdl1].generation_errs 0
[/dev/sdk1].write_io_errs 9
[/dev/sdk1].read_io_errs 0
[/dev/sdk1].flush_io_errs 0
[/dev/sdk1].corruption_errs 0
[/dev/sdk1].generation_errs 0
/tmp/user.scripts/tmpScripts/btrfs check/script: line 4: syntax error: unexpected end of file
I'm not quite sure about the syntax error?
As expected, there are some write errors. As suggested in your linked post, this is most likely a cable fault, so until I can buy another pair of cables I should try reseating them? Once I've done this, is the only thing left to do to scrub the cache drive? Can I do this by selecting the primary cache drive and running the scrub tool? I've done this and it's reporting no errors.
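For checking scrub results from a script rather than the GUI, a sketch that extracts the uncorrectable-error count from scrub output in the format quoted at the start of this thread. `scrub_uncorrectable` is a hypothetical name; newer btrfs-progs versions may format the summary differently, in which case it prints nothing.

```bash
#!/usr/bin/env bash
# Hypothetical helper: pull the "uncorrectable errors" count from scrub output
# shaped like:
#   corrected errors: 0, uncorrectable errors: 2, unverified errors: 0
# Prints nothing if no such line is present.
scrub_uncorrectable() {
  sed -n 's/.*uncorrectable errors: \([0-9][0-9]*\).*/\1/p'
}

# Example, after a scrub has finished (mount point assumed):
#   btrfs scrub status /mnt/cache | scrub_uncorrectable
```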
JorgeB Posted March 26, 2019
andyjayh said: Not quite sure about the syntax error?
Sometimes copying from the forum inserts extra characters; copy/paste to Notepad first.
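One plausible reading of "unexpected end of file" here: if a non-breaking space or stray carriage return ended up next to `fi`, bash sees an ordinary word instead of the keyword that closes `if`. A minimal sketch that flags any byte other than tab or printable ASCII in a script; `check_script_chars` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report lines containing anything other than tabs and
# printable ASCII (CRLF endings, Unicode spaces picked up from a web page).
# Returns 0 if clean, 1 if suspicious bytes are found, 2 if unreadable.
check_script_chars() {
  local file="$1"
  if [[ ! -r "$file" ]]; then
    echo "cannot read: $file" >&2
    return 2
  fi
  # In the C locale every byte is one character, so [^<tab> -~] matches any
  # byte outside tab and 0x20-0x7E, including \r and UTF-8 multibyte bytes.
  if LC_ALL=C grep -n $'[^\t -~]' "$file"; then
    echo "$file: hidden or non-ASCII characters on the lines above" >&2
    return 1
  fi
  return 0
}

# Possible clean-up for CRLF endings only (writes a new file):
#   tr -d '\r' < script > script.fixed
```

Retyping the flagged line by hand, as JorgeB suggests via Notepad, is usually the simplest fix for other characters.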
andyjayh Posted April 16, 2019
I'll start a new thread for this, as it's clearly an issue with my cache drives; so far I haven't been able to resolve it and I've got completely stuck on how to fix the issues on the drives.