BradJ Posted September 17, 2022

I noticed that none of my Docker containers were working. When I tried to restart the Docker service, I got "Docker service unable to start." I rebooted to see if that would fix it, but Docker failed to start again after the reboot. I noticed a lot of BTRFS errors for my cache drives in the logs, so I ran a scrub. No help: there are still tons of errors in the logs, and I don't know what to do next. These are relatively new redundant cache drives, just a few weeks old.

The first diagnostics were taken right when I first noticed the problem, before the reboot. The second diagnostics are from after the reboot and scrub. Please help!

Brad

tower-diagnostics-20220916-2012.zip
tower-diagnostics-20220916-2055.zip
BradJ Posted September 17, 2022

This is the last thing in the log when I try to start the Docker service:

Sep 16 21:06:08 Tower root: mount: /var/lib/docker: wrong fs type, bad option, bad superblock on /dev/loop2, missing codepage or helper program, or other error.
Sep 16 21:06:08 Tower kernel: BTRFS error (device loop2): bad tree block start, want 24931565568 have 6449542684566666880
Sep 16 21:06:08 Tower kernel: BTRFS error (device loop2): bad tree block start, want 24931565568 have 12792500698256506278
Sep 16 21:06:08 Tower kernel: BTRFS warning (device loop2): couldn't read tree root
Sep 16 21:06:08 Tower kernel: BTRFS error (device loop2): open_ctree failed
Sep 16 21:06:08 Tower root: mount error
Sep 16 21:06:08 Tower emhttpd: shcmd (651): exit status: 1

Should I recreate the docker image file, or is something bigger happening here?
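The messages above all come from the BTRFS filesystem inside the Docker image (mounted from /dev/loop2 at /var/lib/docker). "open_ctree failed" means the kernel could not open that filesystem at all, which is why the mount fails. As a rough aid for reading excerpts like this, here is a minimal Python sketch that groups BTRFS kernel messages by device. The helper name btrfs_messages_by_device is hypothetical, and the regular expression assumes the message format shown in this log.

```python
import re
from collections import defaultdict

# Matches kernel lines such as:
#   "... kernel: BTRFS error (device loop2): open_ctree failed"
# The format is assumed from the log excerpt above, not from a spec.
BTRFS_RE = re.compile(r"BTRFS (error|warning|info) \(device (\S+?)\): (.*)$")


def btrfs_messages_by_device(lines):
    """Group BTRFS kernel messages by device as {device: [(level, message)]}."""
    grouped = defaultdict(list)
    for line in lines:
        match = BTRFS_RE.search(line)
        if match:
            level, device, message = match.groups()
            grouped[device].append((level, message))
    return dict(grouped)


log = [
    "Sep 16 21:06:08 Tower kernel: BTRFS error (device loop2): bad tree block start, want 24931565568 have 6449542684566666880",
    "Sep 16 21:06:08 Tower kernel: BTRFS warning (device loop2): couldn't read tree root",
    "Sep 16 21:06:08 Tower kernel: BTRFS error (device loop2): open_ctree failed",
    "Sep 16 21:06:08 Tower root: mount error",  # not a BTRFS line, ignored
]

for device, messages in btrfs_messages_by_device(log).items():
    print(device, len(messages), "message(s)")
    # open_ctree is where BTRFS sets up the filesystem at mount time.
    if any("open_ctree failed" in msg for _, msg in messages):
        print(f"  {device}: filesystem could not be opened, so the mount fails")
```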
BradJ Posted September 17, 2022

Okay, I have recreated the docker image and my Docker containers are all working again. I'm just not sure what the underlying issue is/was. Can anything be determined from the logs?
JorgeB Posted September 17, 2022 (Solution)

6 hours ago, BradJ said: I'm just not sure what the underlying issue is/was.

Cache2 dropped offline:

Sep 14 04:40:02 Tower kernel: sd 2:0:0:0: [sdk] tag#22 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=DRIVER_OK cmd_age=0s
Sep 14 04:40:02 Tower kernel: sd 2:0:0:0: [sdk] tag#22 CDB: opcode=0x2a 2a 00 00 00 08 80 00 00 08 00
Sep 14 04:40:02 Tower kernel: blk_update_request: I/O error, dev sdk, sector 2176 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0
### [PREVIOUS LINE REPEATED 1 TIMES] ###
Sep 14 04:40:02 Tower kernel: BTRFS warning (device sdj1): lost page write due to IO error on /dev/sdk1 (-5)

Check/replace cables and also see here for better pool monitoring.
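In the excerpt above, the name in parentheses (sdj1) identifies the filesystem, i.e. the pool, while the block-layer error and the path after "on" both point at sdk, the disk that actually failed the write. A minimal Python sketch for tallying such I/O errors per disk from a log excerpt follows. The helper name io_errors_by_disk is hypothetical, the patterns assume the message format shown here (it may differ on other kernel versions), and stripping trailing digits only maps sdX-style partition names such as sdk1 to their disk.

```python
import re
from collections import Counter

# Block-layer I/O errors, e.g.
#   "blk_update_request: I/O error, dev sdk, sector 2176 op 0x1:(WRITE) ..."
BLK_RE = re.compile(r"I/O error, dev (\w+), sector \d+")
# BTRFS lost writes, e.g.
#   "BTRFS warning (device sdj1): lost page write due to IO error on /dev/sdk1 (-5)"
LOST_RE = re.compile(r"lost page write due to IO error on /dev/(\w+)")


def io_errors_by_disk(lines):
    """Count I/O-error lines per disk, grouping sdX partitions with their disk."""
    counts = Counter()
    for line in lines:
        blk = BLK_RE.search(line)
        if blk:
            counts[blk.group(1)] += 1
            continue
        lost = LOST_RE.search(line)
        if lost:
            # Assumption: sdX naming, so "sdk1" -> "sdk" (not valid for nvme names).
            counts[lost.group(1).rstrip("0123456789")] += 1
    return counts


log = [
    "Sep 14 04:40:02 Tower kernel: blk_update_request: I/O error, dev sdk, sector 2176 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0",
    "Sep 14 04:40:02 Tower kernel: BTRFS warning (device sdj1): lost page write due to IO error on /dev/sdk1 (-5)",
]

print(io_errors_by_disk(log).most_common())
```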
BradJ Posted September 17, 2022

JorgeB to the rescue again! I ran the script and I have tons of errors:

[/dev/sdb1].write_io_errs 0
[/dev/sdb1].read_io_errs 0
[/dev/sdb1].flush_io_errs 0
[/dev/sdb1].corruption_errs 0
[/dev/sdb1].generation_errs 0
[/dev/sdc1].write_io_errs 304324335
[/dev/sdc1].read_io_errs 4301948
[/dev/sdc1].flush_io_errs 2290865
[/dev/sdc1].corruption_errs 14826483
[/dev/sdc1].generation_errs 16809

Do you recommend I replace the cache2 cable and then run another scrub?
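Each line of that output is one per-device error counter, so the whole report boils down to "which counters are non-zero". A minimal Python sketch, assuming the "[device].counter value" format shown above; parse_dev_stats and nonzero_counters are hypothetical helper names. The regular expression does not depend on line breaks, so it also copes with output pasted onto a single line.

```python
import re

# One counter per match, e.g. "[/dev/sdc1].write_io_errs 304324335"
STAT_RE = re.compile(r"\[(\S+?)\]\.(\w+)\s+(\d+)")


def parse_dev_stats(text):
    """Parse 'btrfs device stats' output into {device: {counter: value}}."""
    stats = {}
    for device, counter, value in STAT_RE.findall(text):
        stats.setdefault(device, {})[counter] = int(value)
    return stats


def nonzero_counters(stats):
    """Return (device, counter, value) for every counter that is not zero."""
    return [
        (device, counter, value)
        for device, counters in stats.items()
        for counter, value in counters.items()
        if value != 0
    ]


output = """\
[/dev/sdb1].write_io_errs 0
[/dev/sdb1].read_io_errs 0
[/dev/sdb1].flush_io_errs 0
[/dev/sdb1].corruption_errs 0
[/dev/sdb1].generation_errs 0
[/dev/sdc1].write_io_errs 304324335
[/dev/sdc1].read_io_errs 4301948
[/dev/sdc1].flush_io_errs 2290865
[/dev/sdc1].corruption_errs 14826483
[/dev/sdc1].generation_errs 16809
"""

for device, counter, value in nonzero_counters(parse_dev_stats(output)):
    print(f"{device} {counter} = {value}")
```

Here every non-zero counter belongs to /dev/sdc1, which matches the disk that dropped offline in the earlier log.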
JorgeB Posted September 17, 2022

Yes, check/replace the power cable as well, and confirm that the scrub corrected all errors.
BradJ Posted September 22, 2022

There may have been some tension on the SATA cable, so I rerouted and reseated it. I ran another scrub and no errors are reported. I reset the BTRFS stats as described in the post you referenced about the BTRFS monitor script, and after re-running the script all error counters are now 0. The script is now scheduled to run daily to monitor the cache pool.

Once again, thank you JorgeB. I would be lost without you.
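For the daily run, btrfs-progs can do the zero check itself: "btrfs device stats -c" (--check) exits non-zero when any counter is non-zero. A minimal Python sketch of such a scheduled check follows, assuming /mnt/cache is the pool's mount point and that the installed btrfs-progs supports -c. The names pool_has_errors and main are hypothetical, and how a failing exit status gets reported is left to whatever scheduler runs it.

```python
import subprocess
import sys

# Assumption: the cache pool is mounted here; adjust for your system.
POOL = "/mnt/cache"


def pool_has_errors(pool=POOL, run=subprocess.run):
    """Return True if 'btrfs device stats -c' reports any problem.

    With -c/--check, btrfs-progs exits non-zero when any error counter is
    non-zero; a failing command (e.g. a wrong path) also exits non-zero, so
    every non-zero status is treated as "needs attention".
    """
    try:
        result = run(["btrfs", "device", "stats", "-c", pool],
                     capture_output=True, text=True)
    except FileNotFoundError:
        # The btrfs binary is not installed or not on PATH.
        return True
    if result.returncode != 0:
        # Surface the counters (and any error text) in the scheduler's log.
        sys.stderr.write(result.stdout + result.stderr)
        return True
    return False


def main():
    """Exit status for a scheduler: 1 if the pool needs attention, else 0."""
    return 1 if pool_has_errors() else 0
```

A scheduled job would call sys.exit(main()). After fixing the cause and running a scrub, "btrfs device stats -z /mnt/cache" resets the counters so the next daily run starts from zero. The run parameter exists only so the check can be exercised without a real pool.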