
BTRFS issues


BradJ
Solved by JorgeB


I noticed that none of my Docker containers were working. When I tried to restart the Docker service, I got "Docker service unable to start."

 

I rebooted to see whether that would fix the issue. The first diagnostics file attached below was captured before the reboot.

 

After the reboot, Docker failed to start again. I noticed a lot of BTRFS errors for my cache drives in the logs, so I ran a scrub. That didn't help: the logs are still full of errors.

 

I don't know what to do next.  These are relatively new redundant cache drives,  just a few weeks old.  

 

The first diagnostics file is from when I first noticed the problem. The second is from after the reboot and scrub.

 

Please help!

 

Brad

 

tower-diagnostics-20220916-2012.zip tower-diagnostics-20220916-2055.zip


This is the last thing in the log when I try to start the Docker service:

 

Sep 16 21:06:08 Tower root: mount: /var/lib/docker: wrong fs type, bad option, bad superblock on /dev/loop2, missing codepage or helper program, or other error.
Sep 16 21:06:08 Tower kernel: BTRFS error (device loop2): bad tree block start, want 24931565568 have 6449542684566666880
Sep 16 21:06:08 Tower kernel: BTRFS error (device loop2): bad tree block start, want 24931565568 have 12792500698256506278
Sep 16 21:06:08 Tower kernel: BTRFS warning (device loop2): couldn't read tree root
Sep 16 21:06:08 Tower kernel: BTRFS error (device loop2): open_ctree failed
Sep 16 21:06:08 Tower root: mount error

Sep 16 21:06:08 Tower emhttpd: shcmd (651): exit status: 1

 

Should I recreate the docker image file or is something bigger happening here?

 

  • Solution
6 hours ago, BradJ said:

I'm just not sure what the underlying issue is/was. 

 

Cache2 dropped offline:

 

Sep 14 04:40:02 Tower kernel: sd 2:0:0:0: [sdk] tag#22 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=DRIVER_OK cmd_age=0s
Sep 14 04:40:02 Tower kernel: sd 2:0:0:0: [sdk] tag#22 CDB: opcode=0x2a 2a 00 00 00 08 80 00 00 08 00
Sep 14 04:40:02 Tower kernel: blk_update_request: I/O error, dev sdk, sector 2176 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0
### [PREVIOUS LINE REPEATED 1 TIMES] ###
Sep 14 04:40:02 Tower kernel: BTRFS warning (device sdj1): lost page write due to IO error on /dev/sdk1 (-5)

 

Check/replace cables and also see here for better pool monitoring.
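To see at a glance which device is throwing errors, lines like the ones quoted above can be tallied per device. The following is only a minimal Python sketch. It assumes the kernel keeps the `blk_update_request: I/O error, dev <name>` wording shown in this log (other kernel versions phrase it differently), and it assumes that a `PREVIOUS LINE REPEATED n TIMES` marker means n additional copies of the line before it. `count_io_errors` is a name invented for this example.

```python
import re
from collections import Counter

# Wording copied from the log excerpt above; other kernels may phrase it differently.
IO_ERROR = re.compile(r"I/O error, dev (?P<dev>\w+), sector \d+ op \S+:\((?P<op>\w+)\)")
# Assumption: this marker stands for n extra copies of the preceding line.
REPEAT = re.compile(r"\[PREVIOUS LINE REPEATED (\d+) TIMES\]")


def count_io_errors(log_text):
    """Count block-layer I/O errors per (device, operation) pair."""
    counts = Counter()
    last_key = None  # key of the previous line, if it was an I/O error
    for line in log_text.splitlines():
        match = IO_ERROR.search(line)
        if match:
            last_key = (match["dev"], match["op"])
            counts[last_key] += 1
            continue
        repeat = REPEAT.search(line)
        if repeat and last_key is not None:
            counts[last_key] += int(repeat.group(1))
        last_key = None
    return counts


SAMPLE = """\
Sep 14 04:40:02 Tower kernel: sd 2:0:0:0: [sdk] tag#22 CDB: opcode=0x2a 2a 00 00 00 08 80 00 00 08 00
Sep 14 04:40:02 Tower kernel: blk_update_request: I/O error, dev sdk, sector 2176 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0
### [PREVIOUS LINE REPEATED 1 TIMES] ###
Sep 14 04:40:02 Tower kernel: BTRFS warning (device sdj1): lost page write due to IO error on /dev/sdk1 (-5)
"""

if __name__ == "__main__":
    for (dev, op), n in count_io_errors(SAMPLE).items():
        print(f"{dev}: {n} {op} error(s)")
```

Errors that pile up on a single device, as with sdk here, point at that drive or its cabling rather than at the pool as a whole.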


JorgeB to the rescue again! :)

 

I ran the script and I have tons of errors:

[/dev/sdb1].write_io_errs 0
[/dev/sdb1].read_io_errs 0
[/dev/sdb1].flush_io_errs 0
[/dev/sdb1].corruption_errs 0
[/dev/sdb1].generation_errs 0
[/dev/sdc1].write_io_errs 304324335
[/dev/sdc1].read_io_errs 4301948
[/dev/sdc1].flush_io_errs 2290865
[/dev/sdc1].corruption_errs 14826483
[/dev/sdc1].generation_errs 16809
 

Do you recommend I replace the cache2 cable and then run another scrub?
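The counters above are in the format printed by `btrfs device stats`. As a rough illustration of what a monitoring script has to do with that output, the Python sketch below parses it and lists the devices that have any non-zero counter. It is not the script referenced in this thread. The names `parse_dev_stats` and `failing_devices` are made up for the example, and the sample text is copied from the output above.

```python
import re
from collections import defaultdict

# Matches lines such as "[/dev/sdc1].write_io_errs 304324335".
STAT_LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<name>\w+)\s+(?P<value>\d+)$")


def parse_dev_stats(text):
    """Return {device: {counter_name: value}} from `btrfs device stats` text."""
    stats = defaultdict(dict)
    for line in text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match:
            stats[match["dev"]][match["name"]] = int(match["value"])
    return dict(stats)


def failing_devices(stats):
    """Keep only devices with at least one non-zero counter."""
    return {
        dev: {name: value for name, value in counters.items() if value}
        for dev, counters in stats.items()
        if any(counters.values())
    }


SAMPLE = """\
[/dev/sdb1].write_io_errs 0
[/dev/sdb1].read_io_errs 0
[/dev/sdb1].flush_io_errs 0
[/dev/sdb1].corruption_errs 0
[/dev/sdb1].generation_errs 0
[/dev/sdc1].write_io_errs 304324335
[/dev/sdc1].read_io_errs 4301948
[/dev/sdc1].flush_io_errs 2290865
[/dev/sdc1].corruption_errs 14826483
[/dev/sdc1].generation_errs 16809
"""

if __name__ == "__main__":
    for dev, counters in failing_devices(parse_dev_stats(SAMPLE)).items():
        print(dev, counters)
```

On a live system the text would come from running `btrfs device stats` against the pool's mount point (probably `/mnt/cache` on a default Unraid setup, but that path is an assumption here). Only one of the two pool members shows errors, which is consistent with a single drive having dropped offline.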

 


There may have been some tension on the SATA cable, so I rerouted and reseated it.

 

I ran another scrub and no errors were reported.

 

I reset the BTRFS stats according to the post you referenced about the BTRFS monitor script.  After re-running the script all errors are now 0.

 

The script is now scheduled to run daily to monitor the cache pool.

 

Once again, thank you JorgeB.  I would be lost without you.
