[6.7] BTRFS errors, ACPI BIOS Errors



Hey Folks,

I could really use some help. I've run into trouble that I first noticed after upgrading from 6.6.7 to 6.7, though it might not be related to the upgrade.

 

I first noticed that Docker was filling up /var/log and my Docker containers were stopping on me. I turned off a number of containers and rebooted to see if I could isolate which one was log-spamming. That helped, but after a few hours my containers stopped again, complaining about running out of disk space on /mnt/user/appdata, where there's clearly enough disk space.

 

In that process, I was exploring the system logs and found the errors below.

 

I downgraded from 6.7 back to 6.6.7 to see if that would fix it, and it hasn't. The ACPI BIOS errors are present in 6.6.7, as are the BTRFS write errors, so it's likely a coincidence that these errors showed up shortly after the 6.7 upgrade.

 

I'm now seeing errors from Fix Common Problems: "**** Unable to write to cache ****   **** Unable to write to Docker Image ****", which is no surprise if the cache disks are having issues.

 

Any clue where to start on this issue?

May 14 07:51:55 lenny kernel: ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT0.SPT3._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
May 14 07:51:55 lenny kernel: ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT0.SPT2._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
May 14 07:51:55 lenny kernel: ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT0.SPT1._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
May 14 07:51:55 lenny kernel: ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT0.SPT5._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
May 14 07:51:55 lenny kernel: ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT0.SPT0._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
May 14 07:51:55 lenny kernel: ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT0.SPT1._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
May 14 07:51:55 lenny kernel: ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT0.SPT2._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
May 14 07:51:55 lenny kernel: ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT0.SPT0._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
May 14 07:51:55 lenny kernel: ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT0.SPT3._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
May 14 07:51:55 lenny kernel: ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT0.SPT5._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
May 14 10:18:18 lenny kernel: loop: Write error at byte offset 852369408, length 4096.
May 14 10:18:18 lenny kernel: print_req_error: I/O error, dev loop2, sector 1664768
May 14 10:18:18 lenny kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 20, rd 0, flush 0, corrupt 0, gen 0
May 14 11:07:15 lenny kernel: loop: Write error at byte offset 876773376, length 4096.
May 14 11:07:15 lenny kernel: print_req_error: I/O error, dev loop2, sector 1712448
May 14 11:07:15 lenny kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 21, rd 0, flush 0, corrupt 0, gen 0
May 14 19:19:21 lenny kernel: loop: Write error at byte offset 1133211648, length 4096.
May 14 19:19:21 lenny kernel: print_req_error: I/O error, dev loop2, sector 2213280
May 14 19:19:21 lenny kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 22, rd 0, flush 0, corrupt 0, gen 0
May 14 20:06:46 lenny kernel: loop: Write error at byte offset 12656640, length 4096.
May 14 20:06:46 lenny kernel: print_req_error: I/O error, dev loop2, sector 24720
May 14 20:06:46 lenny kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 23, rd 0, flush 0, corrupt 0, gen 0
May 14 20:17:13 lenny kernel: dhcpcd[1674]: segfault at 88 ip 00000000004216d2 sp 00007ffd89a6dd70 error 4 in dhcpcd[407000+31000]
May 14 20:17:50 lenny kernel: loop: Write error at byte offset 146325504, length 4096.
May 14 20:17:50 lenny kernel: print_req_error: I/O error, dev loop2, sector 285792
May 14 20:17:50 lenny kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 24, rd 0, flush 0, corrupt 0, gen 0
May 14 20:20:20 lenny kernel: loop: Write error at byte offset 159694848, length 4096.
May 14 20:20:20 lenny kernel: print_req_error: I/O error, dev loop2, sector 311904
May 14 20:20:20 lenny kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 25, rd 0, flush 0, corrupt 0, gen 0
May 14 20:22:21 lenny kernel: loop: Write error at byte offset 16392192, length 4096.
May 14 20:22:21 lenny kernel: print_req_error: I/O error, dev loop2, sector 32016
May 14 20:22:21 lenny kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 26, rd 0, flush 0, corrupt 0, gen 0
May 14 20:25:00 lenny kernel: loop: Write error at byte offset 167190528, length 4096.
May 14 20:25:00 lenny kernel: print_req_error: I/O error, dev loop2, sector 326528
May 14 20:25:00 lenny kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 27, rd 0, flush 0, corrupt 0, gen 0
May 14 20:26:51 lenny kernel: loop: Write error at byte offset 2207227904, length 4096.
May 14 20:26:51 lenny kernel: print_req_error: I/O error, dev loop2, sector 4310784
May 14 20:26:51 lenny kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 28, rd 0, flush 0, corrupt 0, gen 0
May 14 20:26:51 lenny kernel: loop: Write error at byte offset 65536, length 4096.
May 14 20:26:51 lenny kernel: print_req_error: I/O error, dev loop2, sector 128
May 14 20:26:51 lenny kernel: print_req_error: I/O error, dev loop2, sector 128
May 14 20:26:51 lenny kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 29, rd 0, flush 0, corrupt 0, gen 0
May 14 20:26:51 lenny kernel: BTRFS error (device loop2): error writing primary super block to device 1
May 14 20:26:51 lenny kernel: BTRFS: error (device loop2) in write_all_supers:3781: errno=-5 IO failure (1 errors while writing supers)
May 14 20:26:51 lenny kernel: BTRFS: error (device loop2) in cleanup_transaction:1846: errno=-5 IO failure

 

 

Edited by kmwoley

You can ignore the ACPI errors, though a BIOS update might help.

 

The btrfs errors come from the docker image. They are caused by the cache pool filesystem being fully allocated, i.e., the pool is returning "not enough space" errors.

 

See below to fix it, but also keep in mind that you're running a raid1 pool with devices of different sizes, so the free space reported in the GUI will be wrong. The maximum usable size is the size of the smallest device, so you're pretty close to actually being out of space.

 

https://lime-technology.com/forums/topic/62230-out-of-space-errors-on-cache-drive/?do=findComment&comment=610551

23 hours ago, johnnie.black said:

The btrfs errors are from the docker image, caused by the cache pool filesystem being fully allocated, i.e., it's giving not enough space errors.

 

Thanks for the pointer. After reading that thread, here's what I did. I'd appreciate it if you could check my understanding and help me work out how to avoid this in the future.

 

1) Checked the space reported by btrfs...

root@lenny:~# btrfs fi usage /mnt/cache
Overall:
    Device size:                 352.13GiB
    Device allocated:            238.48GiB
    Device unallocated:          113.64GiB
    Device missing:                  0.00B
    Used:                        182.85GiB
    Free (estimated):             84.24GiB      (min: 84.24GiB)
    Data ratio:                       2.00
    Metadata ratio:                   2.00
    Global reserve:              376.52MiB      (used: 0.00B)

Data,RAID1: Size:118.21GiB, Used:90.79GiB
   /dev/sdb1     118.21GiB
   /dev/sdc1     118.21GiB

Metadata,RAID1: Size:1.00GiB, Used:645.45MiB
   /dev/sdb1       1.00GiB
   /dev/sdc1       1.00GiB

System,RAID1: Size:32.00MiB, Used:48.00KiB
   /dev/sdb1      32.00MiB
   /dev/sdc1      32.00MiB

Unallocated:
   /dev/sdb1     113.64GiB
   /dev/sdc1       1.05MiB

The way I read that, my RAID1 cache drives have 27.42GiB free. Except for that 1.05MiB that's unallocated on /dev/sdc1, which I'm assuming is the problem...

 

2) I deleted some files off the drive(s) to make some space and ran the balance command as recommended:

btrfs balance start -dusage=75 /mnt/cache

That command completed with no errors.

 

Afterwards, here's the free space that's reported...

root@lenny:/mnt/cache# btrfs fi usage /mnt/cache
Overall:
    Device size:                 352.13GiB
    Device allocated:            188.16GiB
    Device unallocated:          163.97GiB
    Device missing:                  0.00B
    Used:                        179.96GiB
    Free (estimated):             84.62GiB      (min: 84.62GiB)
    Data ratio:                       2.00
    Metadata ratio:                   2.00
    Global reserve:              330.16MiB      (used: 0.00B)

Data,RAID1: Size:92.05GiB, Used:89.41GiB
   /dev/sdb1      92.05GiB
   /dev/sdc1      92.05GiB

Metadata,RAID1: Size:2.00GiB, Used:585.58MiB
   /dev/sdb1       2.00GiB
   /dev/sdc1       2.00GiB

System,RAID1: Size:32.00MiB, Used:32.00KiB
   /dev/sdb1      32.00MiB
   /dev/sdc1      32.00MiB

Unallocated:
   /dev/sdb1     138.80GiB
   /dev/sdc1      25.16GiB

 

So, my question is... how much free space on the cache do I actually have now? How do I report on it and monitor it so I can set up some alerts when it's getting close to being a problem in the future?

 

Given the catastrophic nature of running out of cache space (i.e. my entire infrastructure ground to a halt), I can't rely on this system if this is going to happen without warning in the future.

 

Any guidance would be helpful.

Thanks!

5 minutes ago, kmwoley said:

Except for that 1.05MiB that's unallocated on /dev/sdc1, which I'm assuming is the problem...

Correct, because of that no new metadata chunks could be allocated.
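
To make that failure mode concrete, here is a minimal Python sketch (not from the thread). It assumes that a new RAID1 chunk needs room for one copy on each of two different devices, and it uses 1GiB as an illustrative chunk size; the function name `can_allocate_raid1_chunk` and the chunk size are assumptions, not btrfs internals. The unallocated figures are copied from the two `btrfs fi usage` outputs above.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def can_allocate_raid1_chunk(unallocated_bytes, chunk_bytes=GIB):
    """Return True if at least two devices can each hold one chunk copy.

    unallocated_bytes maps device name -> unallocated bytes.
    chunk_bytes is an assumed chunk size, used only for illustration.
    """
    devices_with_room = [
        device for device, free in unallocated_bytes.items() if free >= chunk_bytes
    ]
    return len(devices_with_room) >= 2


# "Unallocated" section before the balance (first output).
before = {"/dev/sdb1": 113.64 * GIB, "/dev/sdc1": 1.05 * MIB}
# "Unallocated" section after the balance (second output).
after = {"/dev/sdb1": 138.80 * GIB, "/dev/sdc1": 25.16 * GIB}

print("before balance:", can_allocate_raid1_chunk(before))
print("after balance:", can_allocate_raid1_chunk(after))
```

Under these assumptions the pool fails the check before the balance, because only /dev/sdb1 has room, and passes it afterwards.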

 

5 minutes ago, kmwoley said:

So, my question is... how much free space on the cache do I actually have now?

The smallest device is 128GB and you currently have around 96.6GB (90GiB) used, so you have around 31.4GB free.
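
The same arithmetic as a small Python sketch, for anyone who wants to redo it with their own numbers. It assumes a two-device raid1 pool, where usable space is capped by the smallest device. Device sizes are reconstructed by summing each device's lines in the second `btrfs fi usage` output above; the helper names `device_size` and `raid1_free_bytes` are made up for this example.

```python
GIB = 1024 ** 3  # binary unit, as printed by btrfs
GB = 1000 ** 3   # decimal unit, as printed on drive labels


def device_size(allocated_gib, unallocated_gib):
    """Device size in bytes: allocated chunks plus unallocated space."""
    return (sum(allocated_gib) + unallocated_gib) * GIB


def raid1_free_bytes(device_sizes, used_bytes):
    """Free space for a two-device raid1 pool (assumption: exactly two devices)."""
    return min(device_sizes) - used_bytes


# Data, Metadata and System chunks (32MiB = 32/1024 GiB), then Unallocated.
sdb1 = device_size([92.05, 2.00, 32 / 1024], 138.80)
sdc1 = device_size([92.05, 2.00, 32 / 1024], 25.16)

used = 90 * GIB  # roughly 89.41GiB data plus metadata, rounded as in the reply
free = raid1_free_bytes([sdb1, sdc1], used)

print(f"smallest device: {min(sdb1, sdc1) / GB:.1f} GB")
print(f"free: {free / GB:.1f} GB ({free / GIB:.1f} GiB)")
```

The GB and GiB figures differ by about 7 percent, which is worth keeping in mind when comparing `btrfs` output against drive sizes.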


Thanks for your help. I kinda get it, but when looking at the disk usage before making these changes I would have also said that I had 31GB free. Am I interpreting that first `btrfs fi usage...` command wrong?

 

How am I to know in the future when a balance operation is required before I "run out of space"?

 

I ask because I'd like to write a cron job that warns me before I run out of space.

 

Thanks again for the help.

4 hours ago, kmwoley said:

How am I to know in the future when a balance operation is required before I "run out of space"?

This shouldn't happen again on the newer kernels after a balance. If you want to keep an eye on it, monitor the unallocated space on the smallest device: if it drops below 1GiB, there could be a problem again.
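
A rough Python sketch of that check, suitable as a starting point for a cron job. It parses the "Unallocated:" section of `btrfs filesystem usage` output and reports any device below a threshold. The function names, the 1GiB default and the /mnt/cache default path are assumptions taken from this thread, and the parser only handles the output layout shown in the posts above (plus bare byte counts as printed with `-b`).

```python
import re
import subprocess

UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}
# Matches lines such as "   /dev/sdc1       1.05MiB" or "   /dev/sdc1   1101004".
DEVICE_LINE = re.compile(r"^\s*(/dev/\S+)\s+([\d.]+)\s*(B|KiB|MiB|GiB|TiB)?\s*$")
THRESHOLD_BYTES = 1024**3  # 1GiB, the figure suggested in the reply above


def parse_unallocated(usage_text):
    """Return {device: unallocated bytes} from the 'Unallocated:' section."""
    unallocated = {}
    in_section = False
    for line in usage_text.splitlines():
        if line.strip() == "Unallocated:":
            in_section = True
            continue
        if not in_section:
            continue
        match = DEVICE_LINE.match(line)
        if match is None:
            in_section = False  # blank line or next section ends the list
            continue
        device, number, unit = match.groups()
        unallocated[device] = int(float(number) * UNITS[unit or "B"])
    return unallocated


def devices_below_threshold(usage_text, threshold=THRESHOLD_BYTES):
    """Return the devices whose unallocated space is under the threshold."""
    return {d: b for d, b in parse_unallocated(usage_text).items() if b < threshold}


def read_usage(mount_point="/mnt/cache"):
    """Run btrfs and return its output (needs root and the btrfs tool)."""
    return subprocess.run(
        ["btrfs", "filesystem", "usage", mount_point],
        check=True, capture_output=True, text=True,
    ).stdout


# Demo on the "Unallocated" section from the first output in this thread.
SAMPLE = """Unallocated:
   /dev/sdb1     113.64GiB
   /dev/sdc1       1.05MiB
"""
for device, free in devices_below_threshold(SAMPLE).items():
    print(f"WARNING: {device} has only {free} bytes unallocated")
```

On a live system, replacing `SAMPLE` with `read_usage()` would check the real pool; how the warning gets delivered (email, notification agent, and so on) is left open here.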
