September 17, 2022
I noticed that none of my Docker containers were working. When I tried to restart the Docker service I got "Docker service unable to start." I rebooted to see if that would fix the issue, but Docker failed to start again after the reboot. I noticed a lot of BTRFS errors on my cache drives in the logs, so I ran a scrub. It didn't help; there are still tons of errors in the logs. I don't know what to do next. These are relatively new redundant cache drives, just a few weeks old. The first diagnostics file is from when I first noticed the problem, before the reboot. The second is from after the reboot and scrub. Please help!
Brad
tower-diagnostics-20220916-2012.zip
tower-diagnostics-20220916-2055.zip
September 17, 2022 Author
This is the last thing in the log when I try to start the Docker service:
Sep 16 21:06:08 Tower root: mount: /var/lib/docker: wrong fs type, bad option, bad superblock on /dev/loop2, missing codepage or helper program, or other error.
Sep 16 21:06:08 Tower kernel: BTRFS error (device loop2): bad tree block start, want 24931565568 have 6449542684566666880
Sep 16 21:06:08 Tower kernel: BTRFS error (device loop2): bad tree block start, want 24931565568 have 12792500698256506278
Sep 16 21:06:08 Tower kernel: BTRFS warning (device loop2): couldn't read tree root
Sep 16 21:06:08 Tower kernel: BTRFS error (device loop2): open_ctree failed
Sep 16 21:06:08 Tower root: mount error
Sep 16 21:06:08 Tower emhttpd: shcmd (651): exit status: 1
Should I recreate the Docker image file, or is something bigger happening here?
September 17, 2022 Author
Okay, I have recreated the Docker image and my containers are all working again. I'm just not sure what the underlying issue is/was. Can anything be determined from the logs?
September 17, 2022 Community Expert Solution
6 hours ago, BradJ said: I'm just not sure what the underlying issue is/was.
Cache2 dropped offline:
Sep 14 04:40:02 Tower kernel: sd 2:0:0:0: [sdk] tag#22 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=DRIVER_OK cmd_age=0s
Sep 14 04:40:02 Tower kernel: sd 2:0:0:0: [sdk] tag#22 CDB: opcode=0x2a 2a 00 00 00 08 80 00 00 08 00
Sep 14 04:40:02 Tower kernel: blk_update_request: I/O error, dev sdk, sector 2176 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0
### [PREVIOUS LINE REPEATED 1 TIMES] ###
Sep 14 04:40:02 Tower kernel: BTRFS warning (device sdj1): lost page write due to IO error on /dev/sdk1 (-5)
Check/replace cables and also see here for better pool monitoring.
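For anyone who wants to find this kind of drop-out in their own syslog, here is a minimal Python sketch that counts block-layer I/O errors per device. The regular expression is keyed only to the "I/O error, dev sdk, sector 2176" line quoted above; the function name `count_io_errors` and the sample text are illustrative assumptions, not part of Unraid or of the diagnostics tooling.

```python
import re
from collections import Counter

# Matches the block-layer message shape quoted in the post above,
# e.g. "I/O error, dev sdk, sector 2176". This pattern is an assumption
# based on that one line and may need adjusting for other log formats.
IO_ERROR = re.compile(r"I/O error, dev (\w+), sector (\d+)")


def count_io_errors(log_text):
    """Return a Counter mapping device name -> number of I/O error lines."""
    counts = Counter()
    for line in log_text.splitlines():
        match = IO_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample taken from the log excerpt in this thread.
SAMPLE_LOG = """\
Sep 14 04:40:02 Tower kernel: sd 2:0:0:0: [sdk] tag#22 CDB: opcode=0x2a 2a 00 00 00 08 80 00 00 08 00
Sep 14 04:40:02 Tower kernel: blk_update_request: I/O error, dev sdk, sector 2176 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0
Sep 14 04:40:02 Tower kernel: BTRFS warning (device sdj1): lost page write due to IO error on /dev/sdk1 (-5)
"""

if __name__ == "__main__":
    for device, count in count_io_errors(SAMPLE_LOG).items():
        print(f"{device}: {count} I/O error line(s)")
```

Note that this only counts lines that are literally present; collapsed markers such as "PREVIOUS LINE REPEATED" in anonymized diagnostics are not expanded, so the real number of errors may be higher.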
September 17, 2022 Author
JorgeB to the rescue again! I ran the script and I have tons of errors:
[/dev/sdb1].write_io_errs 0
[/dev/sdb1].read_io_errs 0
[/dev/sdb1].flush_io_errs 0
[/dev/sdb1].corruption_errs 0
[/dev/sdb1].generation_errs 0
[/dev/sdc1].write_io_errs 304324335
[/dev/sdc1].read_io_errs 4301948
[/dev/sdc1].flush_io_errs 2290865
[/dev/sdc1].corruption_errs 14826483
[/dev/sdc1].generation_errs 16809
Do you recommend I replace the cache2 cable and then run another scrub?
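The counters above are easy to check automatically. The following is a rough Python sketch, not the monitoring script referenced in this thread, that parses output in the `[/dev/sdX1].counter value` shape shown above and reports only the devices with non-zero counters. The function names `parse_device_stats` and `devices_with_errors` are hypothetical, and the sample is a subset of the posted output.

```python
import re

# One line of stats output as posted above: "[/dev/sdc1].write_io_errs 304324335"
STAT_LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$")


def parse_device_stats(text):
    """Parse stats text into {device: {counter_name: int_value}}."""
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match:
            device = stats.setdefault(match["dev"], {})
            device[match["counter"]] = int(match["value"])
    return stats


def devices_with_errors(stats):
    """Keep only devices that have at least one non-zero counter."""
    return {
        dev: {name: value for name, value in counters.items() if value}
        for dev, counters in stats.items()
        if any(counters.values())
    }


# Subset of the output posted in this thread.
SAMPLE_STATS = """\
[/dev/sdb1].write_io_errs 0
[/dev/sdb1].read_io_errs 0
[/dev/sdb1].corruption_errs 0
[/dev/sdc1].write_io_errs 304324335
[/dev/sdc1].read_io_errs 4301948
[/dev/sdc1].corruption_errs 14826483
"""

if __name__ == "__main__":
    failing = devices_with_errors(parse_device_stats(SAMPLE_STATS))
    for dev, counters in failing.items():
        print(dev, counters)
```

A check like this could be run on a schedule so that a pool member dropping offline is noticed when the counters first become non-zero, rather than weeks later.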
September 17, 2022 Community Expert
Yes. Also check/replace the power cable, and confirm that the scrub corrected all errors.
September 22, 2022 Author
There may have been some tension on the SATA cable, so I rerouted and reseated it. I ran another scrub and no errors are being reported. I reset the BTRFS stats as described in the post you referenced about the BTRFS monitor script, and after re-running the script all error counters are now 0. The script is now scheduled to run daily to monitor the cache pool. Once again, thank you JorgeB. I would be lost without you.