kmwoley Posted May 15, 2019 (edited)
Hey Folks, I could really use some help. I noticed this trouble after upgrading to 6.7 from 6.6.7, but it might not be related to the upgrade. I first noticed that Docker was filling up /var/log and my Docker containers were stopping on me. I turned off a number of my Docker containers and rebooted to see if I could isolate which container was log-spamming. That helped, but after a few hours my Docker containers stopped again, complaining about running out of disk space on /mnt/user/appdata, even though there's clearly enough disk space there. In the process, I was exploring the system logs and found the errors below.

I downgraded back to 6.6.7 from 6.7 to see if that would fix it, and it hasn't. The ACPI BIOS errors are present in 6.6.7, as are the BTRFS write errors... so it's likely a coincidence that these errors showed up shortly after the 6.7 upgrade. I'm now also seeing errors from Fix Common Problems: "**** Unable to write to cache **** **** Unable to write to Docker Image ****"... which is no surprise if the cache disks are having issues. Any clue where to start on this issue?
May 14 07:51:55 lenny kernel: ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT0.SPT3._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
May 14 07:51:55 lenny kernel: ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT0.SPT2._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
May 14 07:51:55 lenny kernel: ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT0.SPT1._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
May 14 07:51:55 lenny kernel: ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT0.SPT5._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
May 14 07:51:55 lenny kernel: ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT0.SPT0._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
May 14 07:51:55 lenny kernel: ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT0.SPT1._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
May 14 07:51:55 lenny kernel: ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT0.SPT2._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
May 14 07:51:55 lenny kernel: ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT0.SPT0._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
May 14 07:51:55 lenny kernel: ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT0.SPT3._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
May 14 07:51:55 lenny kernel: ACPI BIOS Error (bug): Could not resolve [\_SB.PCI0.SAT0.SPT5._GTF.DSSP], AE_NOT_FOUND (20180810/psargs-330)
May 14 10:18:18 lenny kernel: loop: Write error at byte offset 852369408, length 4096.
May 14 10:18:18 lenny kernel: print_req_error: I/O error, dev loop2, sector 1664768
May 14 10:18:18 lenny kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 20, rd 0, flush 0, corrupt 0, gen 0
May 14 11:07:15 lenny kernel: loop: Write error at byte offset 876773376, length 4096.
May 14 11:07:15 lenny kernel: print_req_error: I/O error, dev loop2, sector 1712448
May 14 11:07:15 lenny kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 21, rd 0, flush 0, corrupt 0, gen 0
May 14 19:19:21 lenny kernel: loop: Write error at byte offset 1133211648, length 4096.
May 14 19:19:21 lenny kernel: print_req_error: I/O error, dev loop2, sector 2213280
May 14 19:19:21 lenny kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 22, rd 0, flush 0, corrupt 0, gen 0
May 14 20:06:46 lenny kernel: loop: Write error at byte offset 12656640, length 4096.
May 14 20:06:46 lenny kernel: print_req_error: I/O error, dev loop2, sector 24720
May 14 20:06:46 lenny kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 23, rd 0, flush 0, corrupt 0, gen 0
May 14 20:17:13 lenny kernel: dhcpcd[1674]: segfault at 88 ip 00000000004216d2 sp 00007ffd89a6dd70 error 4 in dhcpcd[407000+31000]
May 14 20:17:50 lenny kernel: loop: Write error at byte offset 146325504, length 4096.
May 14 20:17:50 lenny kernel: print_req_error: I/O error, dev loop2, sector 285792
May 14 20:17:50 lenny kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 24, rd 0, flush 0, corrupt 0, gen 0
May 14 20:20:20 lenny kernel: loop: Write error at byte offset 159694848, length 4096.
May 14 20:20:20 lenny kernel: print_req_error: I/O error, dev loop2, sector 311904
May 14 20:20:20 lenny kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 25, rd 0, flush 0, corrupt 0, gen 0
May 14 20:22:21 lenny kernel: loop: Write error at byte offset 16392192, length 4096.
May 14 20:22:21 lenny kernel: print_req_error: I/O error, dev loop2, sector 32016
May 14 20:22:21 lenny kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 26, rd 0, flush 0, corrupt 0, gen 0
May 14 20:25:00 lenny kernel: loop: Write error at byte offset 167190528, length 4096.
May 14 20:25:00 lenny kernel: print_req_error: I/O error, dev loop2, sector 326528
May 14 20:25:00 lenny kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 27, rd 0, flush 0, corrupt 0, gen 0
May 14 20:26:51 lenny kernel: loop: Write error at byte offset 2207227904, length 4096.
May 14 20:26:51 lenny kernel: print_req_error: I/O error, dev loop2, sector 4310784
May 14 20:26:51 lenny kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 28, rd 0, flush 0, corrupt 0, gen 0
May 14 20:26:51 lenny kernel: loop: Write error at byte offset 65536, length 4096.
May 14 20:26:51 lenny kernel: print_req_error: I/O error, dev loop2, sector 128
May 14 20:26:51 lenny kernel: print_req_error: I/O error, dev loop2, sector 128
May 14 20:26:51 lenny kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 29, rd 0, flush 0, corrupt 0, gen 0
May 14 20:26:51 lenny kernel: BTRFS error (device loop2): error writing primary super block to device 1
May 14 20:26:51 lenny kernel: BTRFS: error (device loop2) in write_all_supers:3781: errno=-5 IO failure (1 errors while writing supers)
May 14 20:26:51 lenny kernel: BTRFS: error (device loop2) in cleanup_transaction:1846: errno=-5 IO failure

Edited May 17, 2019 by kmwoley: Attach diagnostics
JorgeB Posted May 15, 2019
Please post the diagnostics: Tools -> Diagnostics
kmwoley Posted May 15, 2019
8 hours ago, johnnie.black said: Please post the diagnostics: Tools -> Diagnostics
Done. Attached to the original post. Sorry I left those off. Thanks!
JorgeB Posted May 15, 2019
You can ignore the ACPI errors, though a BIOS update might help. The btrfs errors are from the docker image, caused by the cache pool filesystem being fully allocated (all the space on at least one device has been assigned to chunks, even if those chunks aren't full), so it's returning out-of-space errors. See below to fix it. Also keep in mind that you're running a raid1 with different size devices, so the free space reported on the GUI will be wrong: the max usable size is the same as the smallest device, which means you're pretty close to actually being out of space.
https://lime-technology.com/forums/topic/62230-out-of-space-errors-on-cache-drive/?do=findComment&comment=610551
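To make the "smallest device" rule concrete, here is a minimal bash sketch (not from the thread; the function name is hypothetical) of how btrfs RAID1 capacity works out. RAID1 stores each chunk on two different devices, so usable space is capped both by half the raw total and by how much the remaining devices can mirror the largest one: usable = min(total / 2, total - largest). For a two-device pool this collapses to the size of the smaller device.

```bash
#!/usr/bin/env bash
# Hypothetical helper: estimate usable capacity of a btrfs RAID1 pool.
# RAID1 keeps each chunk on two different devices, so usable space is
# bounded by half the raw total AND by what the other devices can mirror
# against the largest one: usable = min(total / 2, total - largest).
# Arguments: raw device sizes in bytes.
raid1_usable_bytes() {
  local total=0 largest=0 size
  for size in "$@"; do
    total=$(( total + size ))
    if (( size > largest )); then
      largest=$size
    fi
  done
  local half=$(( total / 2 ))
  local rest=$(( total - largest ))
  # Print the smaller of the two bounds.
  if (( half < rest )); then
    echo "$half"
  else
    echo "$rest"
  fi
}
```

For example, `raid1_usable_bytes 250000000000 128000000000` prints 128000000000: pairing a (hypothetical) 250GB device with a 128GB one still leaves only about 128GB usable, no matter what the GUI shows.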
kmwoley Posted May 16, 2019
23 hours ago, johnnie.black said: The btrfs errors are from the docker image, caused by the cache pool filesystem being fully allocated, i.e., it's giving not enough space errors.
Thanks for the pointer. After reading that thread, here's what I did... I'd appreciate it if you could check my understanding and help me learn how to avoid this in the future.

1) Checked the space reported by btrfs...

root@lenny:~# btrfs fi usage /mnt/cache
Overall:
    Device size:          352.13GiB
    Device allocated:     238.48GiB
    Device unallocated:   113.64GiB
    Device missing:           0.00B
    Used:                 182.85GiB
    Free (estimated):      84.24GiB  (min: 84.24GiB)
    Data ratio:                2.00
    Metadata ratio:            2.00
    Global reserve:       376.52MiB  (used: 0.00B)

Data,RAID1: Size:118.21GiB, Used:90.79GiB
   /dev/sdb1   118.21GiB
   /dev/sdc1   118.21GiB

Metadata,RAID1: Size:1.00GiB, Used:645.45MiB
   /dev/sdb1     1.00GiB
   /dev/sdc1     1.00GiB

System,RAID1: Size:32.00MiB, Used:48.00KiB
   /dev/sdb1    32.00MiB
   /dev/sdc1    32.00MiB

Unallocated:
   /dev/sdb1   113.64GiB
   /dev/sdc1     1.05MiB

The way I read that, my RAID1 cache drives have 27.42GiB free, except for the 1.05MiB that's unallocated on /dev/sdc1, which I'm assuming is the problem...

2) I deleted some files off the drive(s) to make some space and ran the balance command as recommended:

btrfs balance start -dusage=75 /mnt/cache

That command completed with no errors. Afterwards, here's the free space reported...
root@lenny:/mnt/cache# btrfs fi usage /mnt/cache
Overall:
    Device size:          352.13GiB
    Device allocated:     188.16GiB
    Device unallocated:   163.97GiB
    Device missing:           0.00B
    Used:                 179.96GiB
    Free (estimated):      84.62GiB  (min: 84.62GiB)
    Data ratio:                2.00
    Metadata ratio:            2.00
    Global reserve:       330.16MiB  (used: 0.00B)

Data,RAID1: Size:92.05GiB, Used:89.41GiB
   /dev/sdb1    92.05GiB
   /dev/sdc1    92.05GiB

Metadata,RAID1: Size:2.00GiB, Used:585.58MiB
   /dev/sdb1     2.00GiB
   /dev/sdc1     2.00GiB

System,RAID1: Size:32.00MiB, Used:32.00KiB
   /dev/sdb1    32.00MiB
   /dev/sdc1    32.00MiB

Unallocated:
   /dev/sdb1   138.80GiB
   /dev/sdc1    25.16GiB

So, my question is... how much free space on the cache do I actually have now? How do I report on it and monitor it so I can set up alerts before it gets close to being a problem again? Given the catastrophic nature of running out of cache space (i.e. my entire infrastructure ground to a halt), I can't rely on this system if this is going to happen without warning in the future. Any guidance would be helpful. Thanks!
JorgeB Posted May 16, 2019
5 minutes ago, kmwoley said: Except for that 1.05MB that's unallocated on /dev/sdc1 which I'm assuming is the problem...
Correct. Because of that, no new metadata chunks could be allocated.
5 minutes ago, kmwoley said: So, my question is... how much free space on the cache do I actually have now?
The smallest device is 128GB and you currently have around 96.6GB (90GiB) used, so you have around 31.4GB free.
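The 90GiB figure is half of the overall "Used: 179.96GiB": with a data and metadata ratio of 2.00, the overall number counts both RAID1 copies. A minimal bash sketch of the arithmetic follows. The function name is hypothetical, 128GB and 180GiB are round stand-ins for the real device size and the reported 179.96GiB, and it ignores allocated-but-unused chunks (the kind of slack a balance reclaims).

```bash
#!/usr/bin/env bash
# Hypothetical helper: free space left on a two-device btrfs RAID1 pool,
# ignoring allocated-but-unused chunks that a balance could reclaim.
# $1: size of the smallest device, in bytes
# $2: overall "Used" from `btrfs filesystem usage -b`, which counts both
#     RAID1 copies (Data ratio / Metadata ratio 2.00)
raid1_free_bytes() {
  local smallest=$1 overall_used=$2
  local per_copy=$(( overall_used / 2 ))   # undo the 2x RAID1 ratio
  echo $(( smallest - per_copy ))
}
```

For example, `raid1_free_bytes 128000000000 193273528320` (193273528320 bytes is 180GiB) prints 31363235840, which is about 31.4GB and lines up with the estimate above.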
kmwoley Posted May 17, 2019
Thanks for your help. I kinda get it, but looking at the disk usage before making these changes, I would also have said that I had 31GB free. Am I interpreting the output of that first `btrfs fi usage...` command wrong? How am I to know in the future when a balance operation is required before I "run out of space"? I ask because I'd like to write a cron job that warns me before I run out of space. Thanks again for the help.
JorgeB Posted May 17, 2019
4 hours ago, kmwoley said: How am I to know in the future when a balance operation is required before I "run out of space"?
This shouldn't happen again on the newer kernels after a balance, but if you want to keep an eye on it, monitor the unallocated space on the smallest device. If it drops below 1GiB, there could be a problem again.
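Following that rule of thumb, a cron-friendly check could parse the "Unallocated:" section of `btrfs filesystem usage` and warn when the smallest per-device figure drops below 1GiB. This is a minimal bash sketch under a few assumptions: the function names are hypothetical, it expects the output layout shown earlier in this thread (an "Unallocated:" header followed by one "/dev/... size" line per device), and it only prints a warning and returns non-zero, leaving the actual alerting up to you. It passes `-b` (raw bytes), but the parser also accepts KiB/MiB/GiB/TiB suffixes.

```bash
#!/usr/bin/env bash
# Hypothetical monitoring sketch for a btrfs cache pool.

# Read `btrfs filesystem usage` output on stdin and print the smallest
# per-device value from the "Unallocated:" section, in bytes.
# Accepts raw byte counts (-b) or KiB/MiB/GiB/TiB suffixes.
min_unallocated_bytes() {
  awk '
    function to_bytes(s,   n, u) {
      n = s + 0                      # numeric prefix, e.g. 1.05 from "1.05MiB"
      u = s; sub(/^[0-9.]+/, "", u)  # unit suffix, if any
      if (u == "KiB") n *= 1024
      else if (u == "MiB") n *= 1024 ^ 2
      else if (u == "GiB") n *= 1024 ^ 3
      else if (u == "TiB") n *= 1024 ^ 4
      return n
    }
    /^Unallocated:/ { in_section = 1; next }
    in_section && NF == 0 { in_section = 0 }   # a blank line ends the section
    in_section && $1 ~ /^\// {
      b = to_bytes($2)
      if (!found || b < min) { min = b; found = 1 }
    }
    END {
      if (!found) exit 1                       # no device lines seen
      printf "%.0f\n", min
    }
  '
}

# Warn on stderr and return 1 if any device in the pool has less than
# the threshold unallocated. Defaults: /mnt/cache and 1 GiB.
check_cache_unallocated() {
  local mount="${1:-/mnt/cache}"
  local threshold="${2:-1073741824}"
  local min
  if ! min="$(btrfs filesystem usage -b "$mount" | min_unallocated_bytes)"; then
    echo "could not read unallocated space for $mount" >&2
    return 2
  fi
  if (( min < threshold )); then
    echo "WARNING: a device in $mount has only $min bytes unallocated" >&2
    return 1
  fi
}
```

With the functions saved to a file (the path is a placeholder), a crontab entry along the lines of `*/30 * * * * bash -c '. /path/to/btrfs_check.sh && check_cache_unallocated /mnt/cache'` would run the check every 30 minutes; the non-zero exit status and stderr message are what you would hook into your notification method of choice. Against the first `btrfs fi usage` output above, the smallest value is /dev/sdc1's 1.05MiB, so the check would have fired well before the Docker image started failing.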