hugenbdd Posted January 31, 2017

Hi, I think I'm having a BTRFS issue.

2x 240GB SSD
unRAID 6.2.4

When I start to move files over to the cache drive, it sets /mnt/cache to read-only, causing all my dockers to fail. I'm seeing this in the syslog:

Jan 31 01:11:28 Tower kernel: ------------[ cut here ]------------
Jan 31 01:11:28 Tower kernel: WARNING: CPU: 7 PID: 26931 at fs/btrfs/extent-tree.c:4180 btrfs_free_reserved_data_space_noquota+0x5b/0x7b()
Jan 31 01:11:28 Tower kernel: Modules linked in: xt_nat veth ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_nat_ipv4 iptable_filter ip_tables nf_nat md_mod e1000e ptp pps_core coretemp kvm_intel kvm mpt3sas ahci raid_class i2c_i801 i2c_core libahci i5500_temp scsi_transport_sas acpi_cpufreq [last unloaded: ipmi_devintf]
Jan 31 01:11:28 Tower kernel: CPU: 7 PID: 26931 Comm: kworker/u50:13 Tainted: G W 4.4.30-unRAID #2
Jan 31 01:11:28 Tower kernel: Hardware name: Supermicro X8DT6/X8DT6, BIOS 2.0c 05/15/2012
Jan 31 01:11:28 Tower kernel: Workqueue: writeback wb_workfn (flush-btrfs-18)
Jan 31 01:11:28 Tower kernel: 0000000000000000 ffff880566523600 ffffffff8136f79f 0000000000000000
Jan 31 01:11:28 Tower kernel: 0000000000001054 ffff880566523638 ffffffff8104a4ab ffffffff812ada13
Jan 31 01:11:28 Tower kernel: 0000000000001000 ffff880603277400 ffff88062ff7a960 ffff880566523734
Jan 31 01:11:28 Tower kernel: Call Trace:
Jan 31 01:11:28 Tower kernel: [<ffffffff8136f79f>] dump_stack+0x61/0x7e
Jan 31 01:11:28 Tower kernel: [<ffffffff8104a4ab>] warn_slowpath_common+0x8f/0xa8
Jan 31 01:11:28 Tower kernel: [<ffffffff812ada13>] ? btrfs_free_reserved_data_space_noquota+0x5b/0x7b
Jan 31 01:11:28 Tower kernel: [<ffffffff8104a568>] warn_slowpath_null+0x15/0x17
Jan 31 01:11:28 Tower kernel: [<ffffffff812ada13>] btrfs_free_reserved_data_space_noquota+0x5b/0x7b
Jan 31 01:11:28 Tower kernel: [<ffffffff812c4e16>] btrfs_clear_bit_hook+0x143/0x272
Jan 31 01:11:28 Tower kernel: [<ffffffff812db58b>] clear_state_bit+0x8b/0x155
Jan 31 01:11:28 Tower kernel: [<ffffffff812db88d>] __clear_extent_bit+0x238/0x2c3
Jan 31 01:11:28 Tower kernel: [<ffffffff812dbd49>] clear_extent_bit+0x12/0x14
Jan 31 01:11:28 Tower kernel: [<ffffffff812dc2dc>] extent_clear_unlock_delalloc+0x46/0x18f
Jan 31 01:11:28 Tower kernel: [<ffffffff8111f019>] ? igrab+0x32/0x46
Jan 31 01:11:28 Tower kernel: [<ffffffff812d8c93>] ? __btrfs_add_ordered_extent+0x288/0x2cf
Jan 31 01:11:28 Tower kernel: [<ffffffff812c8c13>] cow_file_range+0x300/0x3bd
Jan 31 01:11:28 Tower kernel: [<ffffffff812c988f>] run_delalloc_range+0x321/0x331
Jan 31 01:11:28 Tower kernel: [<ffffffff812dc915>] writepage_delalloc.isra.14+0xaa/0x126
Jan 31 01:11:28 Tower kernel: [<ffffffff812dea19>] __extent_writepage+0x150/0x1f7
Jan 31 01:11:28 Tower kernel: [<ffffffff812ded16>] extent_write_cache_pages.isra.10.constprop.24+0x256/0x30c
Jan 31 01:11:28 Tower kernel: [<ffffffff812da4da>] ? submit_one_bio+0x81/0x88
Jan 31 01:11:28 Tower kernel: [<ffffffff812df214>] extent_writepages+0x46/0x57
Jan 31 01:11:28 Tower kernel: [<ffffffff812c69ca>] ? btrfs_direct_IO+0x28e/0x28e
Jan 31 01:11:28 Tower kernel: [<ffffffff812c555f>] btrfs_writepages+0x23/0x25
Jan 31 01:11:28 Tower kernel: [<ffffffff810c3738>] do_writepages+0x1b/0x24
Jan 31 01:11:28 Tower kernel: [<ffffffff8112a5d4>] __writeback_single_inode+0x3d/0x151
Jan 31 01:11:28 Tower kernel: [<ffffffff8112ab89>] writeback_sb_inodes+0x20d/0x3ad
Jan 31 01:11:28 Tower kernel: [<ffffffff8112ad9a>] __writeback_inodes_wb+0x71/0xa9
Jan 31 01:11:28 Tower kernel: [<ffffffff8112af80>] wb_writeback+0x10b/0x195
Jan 31 01:11:28 Tower kernel: [<ffffffff8112b4c9>] wb_workfn+0x18e/0x22b
Jan 31 01:11:28 Tower kernel: [<ffffffff8112b4c9>] ? wb_workfn+0x18e/0x22b
Jan 31 01:11:28 Tower kernel: [<ffffffff8105aede>] process_one_work+0x194/0x2a0
Jan 31 01:11:28 Tower kernel: [<ffffffff8105b894>] worker_thread+0x26b/0x353
Jan 31 01:11:28 Tower kernel: [<ffffffff8105b629>] ? rescuer_thread+0x285/0x285
Jan 31 01:11:28 Tower kernel: [<ffffffff8105fb24>] kthread+0xcd/0xd5
Jan 31 01:11:28 Tower kernel: [<ffffffff8105fa57>] ? kthread_worker_fn+0x137/0x137
Jan 31 01:11:28 Tower kernel: [<ffffffff81629f7f>] ret_from_fork+0x3f/0x70
Jan 31 01:11:28 Tower kernel: [<ffffffff8105fa57>] ? kthread_worker_fn+0x137/0x137
Jan 31 01:11:28 Tower kernel: ---[ end trace cc2c8a28b871c88c ]---

root@Tower:/mnt/cache# touch test.txt
touch: cannot touch 'test.txt': No space left on device

It appears to happen when I get to about 40-50% full on the cache drive. When I "move" files off, it appears to go away. I have run the scrub tool several times without any errors. Is it possible that my SSDs are going bad?

Thanks
Dave
JorgeB Posted January 31, 2017

Post your diagnostics and the output of:

btrfs fi show /mnt/cache
btrfs fi df /mnt/cache
btrfs device stats /mnt/cache
hugenbdd Posted January 31, 2017

Sorry... posting below.

root@Tower:/var/log# btrfs fi show /mnt/cache
Label: none  uuid: e2234ca5-51be-4ccb-9302-3194731c77e3
    Total devices 2 FS bytes used 67.43GiB
    devid 1 size 223.57GiB used 223.57GiB path /dev/sdg1
    devid 2 size 223.57GiB used 223.57GiB path /dev/sdh1

root@Tower:/var/log# btrfs device stats /mnt/cache
[/dev/sdg1].write_io_errs 0
[/dev/sdg1].read_io_errs 0
[/dev/sdg1].flush_io_errs 0
[/dev/sdg1].corruption_errs 0
[/dev/sdg1].generation_errs 0
[/dev/sdh1].write_io_errs 0
[/dev/sdh1].read_io_errs 0
[/dev/sdh1].flush_io_errs 0
[/dev/sdh1].corruption_errs 0
[/dev/sdh1].generation_errs 0

Currently I can write to the cache, but I'm still getting the kernel warnings from fs/btrfs/extent-tree.

Thanks
Dave
JorgeB Posted January 31, 2017

You didn't post this one:

btrfs fi df /mnt/cache
hugenbdd Posted January 31, 2017

Oops.

root@Tower:/var/log# btrfs fi df /mnt/cache
Data, RAID1: total=222.54GiB, used=66.79GiB
System, RAID1: total=32.00MiB, used=48.00KiB
Metadata, RAID1: total=1.00GiB, used=728.33MiB
GlobalReserve, single: total=256.00MiB, used=0.00B
JorgeB Posted January 31, 2017

The problem is with the docker image; you should delete and re-create it. The cache itself is fine, although there's a lot of slack on the filesystem, so you should run a balance:

btrfs balance start -dusage=75 /mnt/cache
hugenbdd Posted January 31, 2017

Stopped the array, disabled Docker (set to no), and deleted the docker.img file. Tried to run the balance command, but got an error.

root@Tower:/mnt/cache# rm docker.img
root@Tower:/mnt/cache# ls -ltr
total 0
drwxrwxrwx 1 nobody users 246 Jan 30 22:41 appdata/
drwxrwxrwx 1 nobody users  16 Jan 31 01:00 Download/
drwxrwxrwx 1 nobody users  32 Jan 31 01:00 Movies/
drwxrwxrwx 1 nobody users  24 Jan 31 01:00 TV/
drwxrwxrwx 1 nobody users 240 Jan 31 08:36 transcode/
root@Tower:/mnt/cache# btrfs balance start -dusage=75 /mnt/cache
ERROR: error during balancing '/mnt/cache': No space left on device
There may be more info in syslog - try dmesg | tail
root@Tower:/mnt/cache# btrfs fi show /mnt/cache
Label: none  uuid: e2234ca5-51be-4ccb-9302-3194731c77e3
    Total devices 2 FS bytes used 30.86GiB
    devid 1 size 223.57GiB used 223.57GiB path /dev/sdg1
    devid 2 size 223.57GiB used 223.57GiB path /dev/sdh1
root@Tower:/mnt/cache# btrfs fi df /mnt/cache
Data, RAID1: total=222.54GiB, used=30.15GiB
System, RAID1: total=32.00MiB, used=48.00KiB
Metadata, RAID1: total=1.00GiB, used=725.61MiB
GlobalReserve, single: total=256.00MiB, used=0.00B
root@Tower:/mnt/cache# btrfs device stats /mnt/cache
[/dev/sdg1].write_io_errs 0
[/dev/sdg1].read_io_errs 0
[/dev/sdg1].flush_io_errs 0
[/dev/sdg1].corruption_errs 0
[/dev/sdg1].generation_errs 0
[/dev/sdh1].write_io_errs 0
[/dev/sdh1].read_io_errs 0
[/dev/sdh1].flush_io_errs 0
[/dev/sdh1].corruption_errs 0
[/dev/sdh1].generation_errs 0
JorgeB Posted January 31, 2017

I was afraid of that, as almost all space is allocated. v6.3 includes a more recent kernel and btrfs-progs that are a lot better at avoiding this. You can try to delete some files (the larger the better) and run the balance again. If it still doesn't work, it's best to back up the cache and re-format. You can follow this procedure, but format the cache instead of replacing it: https://lime-technology.com/forum/index.php?topic=48508.msg516110#msg516110
hugenbdd Posted January 31, 2017

Okay, trying to format the cache drives. Is there a how-to on the format? I don't see any option to format the drives. I have removed them, started the array, stopped the array, then added them back, but it does not look like it has formatted them. Didn't find much with a search either.

Thanks
Dave
JorgeB Posted January 31, 2017

Leave just cache1 assigned and change cache slots to 1, click on it and change the file system to XFS, then start the array and you'll have the option to format. Do it, stop the array, change cache slots to 2, assign the other cache device, and format again.
hugenbdd Posted January 31, 2017

That did it, thanks for the quick support. Now just have to move my appdata back over to /mnt/cache and redo all the dockers.

Thanks
Dave
JorgeB Posted January 31, 2017

You can follow the rest of the cache replace procedure to move appdata back.

It's important to regularly monitor the filesystem slack, especially on unRAID <6.3. When there's a 20% or bigger difference between total and used, run a balance so the same thing doesn't happen again.

This is bad:
Data, RAID1: total=222.54GiB, used=66.79GiB

This is good:
Data, RAID10: total=906.00GiB, used=882.08GiB
hugenbdd Posted January 31, 2017

So I want to be clear about what you're saying. Hit the "Balance" button on the cache drive page, under "Balance Status"? Or run the command you gave me?

btrfs balance start -dusage=75 /mnt/cache

If it's the command, can I put it in a weekly cron job? Or will 6.3 be out soon enough?

Thanks
Dave
JorgeB Posted January 31, 2017

You can use the Balance button on the cache page, but by default it will run a full balance, so it will take longer. The command I gave only balances chunks where data usage is at 75% or below, so it's faster, and you can use lower values: 50, 25, etc. You can also replace the default command options (-dconvert=raid1 -mconvert=raid1) with other filters, e.g. (-dusage=50).

No need to do it daily; monitor the filesystem for a few days/weeks and see how it progresses. A balance will re-write the data (all data if it's a full balance), so it will cause wear on the SSDs.
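The "use lower values first" advice above is sometimes scripted as an escalating series of -dusage passes. Here's a rough bash sketch of that idea; the step values and the /mnt/cache path are just examples, and the live btrfs call is left commented out since it needs a real pool (and you'd want to watch the first runs by hand).

```shell
#!/bin/bash
# Sketch: step up the -dusage filter gradually, so only as many chunks
# as necessary get rewritten (less SSD wear than a full balance).

# Pure helper: given the current -dusage value, print the next step up,
# or fail once we hit the ceiling. The steps are example values.
next_usage_step() {
    case $1 in
        0)  echo 5  ;;
        5)  echo 25 ;;
        25) echo 50 ;;
        50) echo 75 ;;
        *)  return 1 ;;
    esac
}

# How it might be driven (not run here; needs a live btrfs mount):
# pct=0
# while pct=$(next_usage_step "$pct"); do
#     btrfs balance start -dusage="$pct" /mnt/cache || break
#     # Stop early once enough unallocated space has been reclaimed.
# done
```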
John_M Posted January 31, 2017

This ought to be a FAQ. I just rebalanced my cache after reading this thread.

Before:
root@Lapulapu:~# btrfs fi df /mnt/cache
Data, RAID1: total=236.44GiB, used=44.44GiB
System, RAID1: total=32.00MiB, used=64.00KiB
Metadata, RAID1: total=2.00GiB, used=303.91MiB
GlobalReserve, single: total=38.64MiB, used=0.00B

After:
root@Lapulapu:~# btrfs fi df /mnt/cache
Data, RAID1: total=45.00GiB, used=44.43GiB
System, RAID1: total=32.00MiB, used=16.00KiB
Metadata, RAID1: total=1.00GiB, used=297.05MiB
GlobalReserve, single: total=31.78MiB, used=0.00B
JonathanM Posted January 31, 2017

Looks like it should be pretty easy to parse the output of the fi df command and recommend whether or not to balance. Squid, how about an FCP addon?
JorgeB Posted February 1, 2017

Yes, maybe an FCP warning would be best. Note that normally btrfs should delete unused data chunks, so a manual balance should not be needed, but in practice this does not always happen, especially on older kernels, so it's a good idea to keep ahead of possible filesystem-full issues.

Issues can happen when all available space is allocated, like what happened to the OP:

devid 1 size 223.57GiB used 223.57GiB path /dev/sdg1
devid 2 size 223.57GiB used 223.57GiB path /dev/sdh1

So the filesystem is completely allocated. Here we can see the actual used space:

Data, RAID1: total=222.54GiB, used=66.79GiB
System, RAID1: total=32.00MiB, used=48.00KiB
Metadata, RAID1: total=1.00GiB, used=728.33MiB
GlobalReserve, single: total=256.00MiB, used=0.00B

The totals are the different allocated chunk types, mainly data and metadata, and "used" is the actual used space. Issues arise when the filesystem is fully allocated and metadata is close to full: for any new write btrfs tries to allocate a new metadata chunk and fails, resulting in an out-of-space error.

So although it's good practice to keep slack (the difference between total and used) low, it's only a problem when the device is almost fully allocated. Ideally there should be a suggestion to run a balance when there's a big slack, and a warning when there's slack and the device is almost fully allocated.
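To make the slack arithmetic above concrete, here is a small bash helper. It is purely illustrative (the function name and sed pattern are made up for this sketch): it pulls total and used out of a Data line from btrfs fi df and prints the slack percentage. Run against the OP's Data line it works out to roughly 70%, far past the 20% rule of thumb mentioned earlier.

```shell
#!/bin/bash
# Hypothetical helper: compute slack % from a `btrfs fi df` Data line.
# Slack = (total - used) / total, i.e. allocated-but-unused chunk space.
slack_pct() {
    sed -n 's/.*total=\([0-9.]*\)GiB, used=\([0-9.]*\)GiB.*/\1 \2/p' |
    awk '{ printf "%d\n", ($1 - $2) * 100 / $1 }'
}

# The OP's Data line: ~70% slack.
echo "Data, RAID1: total=222.54GiB, used=66.79GiB" | slack_pct
```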
Squid Posted February 1, 2017

Johnnie, let me know what commands and output I'm looking for.
John_M Posted February 1, 2017

This is fascinating stuff. So while my cache pool had a lot of slack,

Data, RAID1: total=236.44GiB, used=44.44GiB

there was still sufficient free metadata space

Metadata, RAID1: total=2.00GiB, used=303.91MiB

for it not to be an immediate problem, while the OP was running dangerously low on free metadata space:

Metadata, RAID1: total=1.00GiB, used=728.33MiB

I've just installed 6.3.0-rc9 on this server, so I'll be interested to see if it becomes unbalanced again, and if so, how quickly.
JorgeB Posted February 1, 2017

On Squid's question about which commands and output to look for: for now I think you should just add a warning if there's some slack and the allocated space is near max. There's a newer command that will eventually replace btrfs fi df and gives all the info needed. I wasn't sure if it was available on v6.2, but it is:

btrfs fi usage /mnt/cache
Overall:
    Device size:           2.79TiB
    Device allocated:      1.78TiB
    Device unallocated:    1.01TiB
    Device missing:        0.00B
    Used:                  1.57TiB
    Free (estimated):      624.84GiB (min: 624.84GiB)
    Data ratio:            2.00
    Metadata ratio:        2.00
    Global reserve:        512.00MiB (used: 0.00B)

There is also info by device, but it's not needed for this. The values of interest are "Device unallocated" and the slack (the difference between "Device allocated" and "Used").

So based on these values, give a warning if there's significant slack (e.g. the difference between used and allocated is larger than 20%) AND unallocated space is 5% or less of the total device size. The user should then run a balance to reclaim unused chunks, e.g.:

btrfs balance start -dusage=50 /mnt/cache

50 means it will only balance chunks that are 50% or less occupied. This should be enough to reclaim most unused space, but it can be changed to a higher value if needed.

For the suggestion to run a balance based only on slack, let me monitor some devices for a while and see how they behave. In part due to how the cache is used, i.e. constantly filled and emptied by the mover, some slack may be unavoidable, and it's not an issue as long as there is enough unallocated space.
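Not speaking for what the FCP plugin actually ended up doing, but a bash sketch of the check described above could look like this. The field names match the btrfs fi usage output shown, and the 20%/5% thresholds are the ones suggested in the post; real output spacing may differ, so treat the parsing as an illustration rather than a finished plugin.

```shell
#!/bin/bash
# Illustration only: decide whether to suggest a balance from
# `btrfs fi usage` output on stdin. Warn ("BALANCE") when slack
# (allocated - used) exceeds 20% of the device size AND unallocated
# space is 5% or less; otherwise print "OK".
check_usage() {
    awk '
        function mib(v,  n) {          # convert 2.79TiB / 624.84GiB / 512.00MiB to MiB
            n = v + 0                  # awk drops the trailing unit on numeric coercion
            if (v ~ /TiB/) return n * 1024 * 1024
            if (v ~ /GiB/) return n * 1024
            return n
        }
        /Device size:/        { size    = mib($NF) }
        /Device allocated:/   { alloc   = mib($NF) }
        /Device unallocated:/ { unalloc = mib($NF) }
        /^ *Used:/            { used    = mib($NF) }
        END {
            slack = alloc - used
            if (slack > size * 0.20 && unalloc <= size * 0.05)
                print "BALANCE"
            else
                print "OK"
        }
    '
}
```

Fed the output shown above (21% . . . actually about 7.5% slack and plenty unallocated), it would print OK; a fully allocated pool like the OP's would print BALANCE.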
hugenbdd Posted February 1, 2017

I do run the mover every morning, as I can easily fill up the cache in a day.
hugenbdd Posted February 2, 2017

About 80GiB put on the cache today, then I ran the mover:

Data, RAID1: total=122.00GiB, used=46.53GiB
Data, single: total=1.00GiB, used=0.00B
System, RAID1: total=32.00MiB, used=48.00KiB
Metadata, RAID1: total=2.00GiB, used=629.91MiB
GlobalReserve, single: total=224.00MiB, used=1.83MiB

After running balance:

Data, RAID1: total=48.00GiB, used=46.52GiB
System, RAID1: total=32.00MiB, used=16.00KiB
Metadata, RAID1: total=1.00GiB, used=634.72MiB
GlobalReserve, single: total=224.00MiB, used=0.00B

Do you still think I don't need to run the balance in a cron job?
JorgeB Posted February 2, 2017

Only when the data total starts getting close to the device size. And since most chunks will be mostly empty, run the balance with a low usage setting, e.g.:

btrfs balance start -dusage=5 /mnt/cache

This should reclaim most unused chunks while avoiding unnecessary writes.
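For anyone who does want to automate the advice above from cron, here's a minimal sketch. The 90% threshold, the sed patterns, and the mount point are all assumptions on my part, and the actual balance call is commented out until you've checked the parsing against your own output.

```shell
#!/bin/bash
# Sketch of a cron-able check: only run a cheap -dusage=5 balance once
# the data chunk total creeps up on the device size.
# The 90% threshold is an example, not a recommendation.

needs_balance() {
    # $1 = data total in GiB, $2 = device size in GiB
    # awk handles the fractional GiB values btrfs prints.
    awk -v t="$1" -v s="$2" 'BEGIN { exit !(t > s * 0.9) }'
}

# How it might be wired up (untested; adjust the patterns to your output):
# total=$(btrfs fi df /mnt/cache | sed -n 's/^Data.*total=\([0-9.]*\)GiB.*/\1/p')
# size=$(btrfs fi show /mnt/cache | sed -n 's/.*devid 1 size \([0-9.]*\)GiB.*/\1/p')
# needs_balance "$total" "$size" && btrfs balance start -dusage=5 /mnt/cache
```

Against the numbers from earlier in the thread, a 222.54GiB data total on a 223.57GiB device would trigger it, while the post-balance 48.00GiB total would not.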
killeriq Posted July 7, 2018

Hello,

It seems I have a similar problem. First, Transmission was not able to download, saying "No space left on device", so I tried to back it up, but I get the same error in MC.

Also not sure why I got "user0" with the same folders as "user".

Tried those commands as well, but still the "no space" error:

root@unRAIDTower:~# btrfs fi show /mnt/cache
Label: none  uuid: 2bc7fced-04dc-491d-9449-09d79d7c8f5e
    Total devices 1 FS bytes used 74.12GiB
    devid 1 size 111.79GiB used 84.02GiB path /dev/sdd1

root@unRAIDTower:~# btrfs fi df /mnt/cache
Data, single: total=83.01GiB, used=73.98GiB
System, single: total=4.00MiB, used=16.00KiB
Metadata, single: total=1.01GiB, used=144.22MiB
GlobalReserve, single: total=65.17MiB, used=0.00B

root@unRAIDTower:~# btrfs device stats /mnt/cache
[/dev/sdd1].write_io_errs 0
[/dev/sdd1].read_io_errs 0
[/dev/sdd1].flush_io_errs 0
[/dev/sdd1].corruption_errs 0
[/dev/sdd1].generation_errs 0

root@unRAIDTower:~# btrfs balance start -dusage=75 /mnt/cache
Done, had to relocate 7 out of 87 chunks

Any clue what else to try? Or is formatting the whole cache drive the only option?

Thanks
JorgeB Posted July 7, 2018

5 hours ago, killeriq said:
"also not sure why i got 'user0' with same folders as 'user'"

Not the same: user0 excludes the cache, user includes the cache.

5 hours ago, killeriq said:
"Tried also those commands, but still 'no space' error"

Also not the same problem: your cache still has available space.