kl0wn

Out Of Space Errors On Cache Drive


I'm about ready to throw this box out the window...With that being said, here we go:

 

I recently started moving all of my docker data and images to the cache drive. The data in this share (share name: /mnt/user/cache_only) existed on all disks in the RAID, which was horrible for performance. So I made the share cache-only and began moving data from Disk 1, Disk 2, Disk 3... to /mnt/cache/cache_only. I was successful in moving most of the appdata except for Plex. About halfway through the transfer it said that my target had no space left, even though there was 60G of free space available. I aborted the process, moved some files back to other disks and started the process again. I checked the unRAID console and it's showing some really odd values for the cache disk. First off, this is a 120G SSD, but when this ugly bug pokes its head out it will just slap in whatever size it wants for "Disk Size"; in this example it's showing 69 with 0B free. This is causing some serious issues with the dockers and cache data, and I'm worried (this has happened 4 times just today) that at some point my docker data is going to get corrupted. What's even more odd is that I stopped trying to transfer the remaining Plex files over because of this error, yet now unRAID is just wigging out on its own with no intervention. Screenshots and syslog attached.

 

 

Size 69.png

BTRFS_FS_DF.png

 

tower-diagnostics-20171219-1244.zip

Edited by kl0wn


What's the history of your cache pool?  That is, was it always single-device?  How long ago was it formatted?


That's a problem with df thinking the disk is full:

 

Filesystem      Size  Used Avail Use% Mounted on
/dev/sdk1       112G   65G     0 100% /mnt/cache

The release you're on has a bug and does not include btrfs-usage in the diags. Can you post the output of:

 

btrfs fi usage /mnt/cache

I talked with @bonienlin in another thread about the possibility of the GUI getting the free space from btrfs-usage. If that's possible, it should be much more accurate and avoid these issues.
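
For anyone wanting to spot this symptom from the command line (df reporting 0 available while used space is far below the size), here is a rough sketch that filters POSIX-format df output. The function name `df_suspect` and the 90% cutoff are my own assumptions, not something unRAID or btrfs provides, and it assumes mount points without spaces.

```shell
#!/bin/sh
# Reads `df -P -k` output on stdin and prints mount points that report
# zero available space while used space is below 90% of the total size.
# The 90% cutoff is an arbitrary assumption for illustration.
df_suspect() {
    # POSIX df columns: Filesystem 1024-blocks Used Available Capacity Mounted-on
    awk 'NR > 1 && $4 == 0 && $3 < 0.9 * $2 { print $6 }'
}
```

It would be used as `df -P -k | df_suspect`. A hit only means df and the filesystem disagree; `btrfs fi usage` is still needed to see why.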

 


1 minute ago, johnnie.black said:

That's a problem with df thinking the disk is full:

 


Filesystem      Size  Used Avail Use% Mounted on
/dev/sdk1       112G   65G     0 100% /mnt/cache

The release you're on has a bug and does not include btrfs-usage in the diags. Can you post the output of:

 


btrfs fi usage /mnt/cache

I talked with @bonienlin in another thread about the possibility of the GUI getting the free space from btrfs-usage. If that's possible, it should be much more accurate and avoid these issues.

 

 

That's not going to correct 'disk full'.  The 'free space' is being accurately reported.  The question is, why is 'Data single' still only letting him use half the storage?

8 minutes ago, limetech said:

What's the history of your cache pool?  That is, was it always single-device?  How long ago was it formatted?

 

Single Cache Disk, never in a pool. I'm not too sure on the age...I believe I got it 3-4 years ago, so it would have been then.

 

8 minutes ago, johnnie.black said:

That's a problem with df thinking the disk is full:

 


Filesystem      Size  Used Avail Use% Mounted on
/dev/sdk1       112G   65G     0 100% /mnt/cache

The release you're on has a bug and does not include btrfs-usage in the diags. Can you post the output of:

 


btrfs fi usage /mnt/cache

I talked with @bonienlin in another thread about the possibility of the GUI getting the free space from btrfs-usage. If that's possible, it should be much more accurate and avoid these issues.

 

 

So the dockers and the rest of the system are looking at what the GUI is pulling? Is there a way to modify this behavior?

 

btrfs-usage-term.png

Edited by kl0wn

2 minutes ago, kl0wn said:

Single Cache Disk, never in a pool. I'm not too sure on the age...I believe I got it 3-4 years ago, so it would have been then.

 

Ok, please try this command and tell me if it gets out-of-space error:

 

cp /boot/bzimage /mnt/cache

That just copies a 4M file (the kernel image) to cache, bypassing shfs.  If that worked, you can then just delete the file:

rm /mnt/cache/bzimage
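
The same probe can be wrapped in a small helper so the test file always gets cleaned up. This is only a sketch: `write_probe` and the temporary file name are hypothetical, and all it does is copy a file straight into a directory and remove it again.

```shell
#!/bin/sh
# Copy a source file into a directory and delete the copy again,
# reporting whether the write succeeded.
# Usage: write_probe /boot/bzimage /mnt/cache
write_probe() {
    src=$1
    dir=$2
    dst="$dir/.write_probe.$$"   # hypothetical temp name, unique per shell PID
    if cp "$src" "$dst"; then
        rm -f "$dst"
        echo "ok: $dir"
    else
        rm -f "$dst"             # remove any partial copy
        echo "failed: $dir" >&2
        return 1
    fi
}
```

Pointing it at /mnt/cache writes to the device directly, while pointing it at a path under /mnt/user goes through shfs, so comparing the two shows which layer is raising the error.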

1 minute ago, limetech said:

 

Ok, please try this command and tell me if it gets out-of-space error:

 


cp /boot/bzimage /mnt/cache

That just copies a 4M file (the kernel image) to cache, bypassing shfs.  If that worked, you can then just delete the file:


rm /mnt/cache/bzimage

 

Worked no problem:

 

 

bzimage-test.png


 

That's not going to correct 'disk full'.  The 'free space' is being accurately reported.  The question is, why is 'Data single' still only letting him use half the storage?

 

Yes, there are 2 different issues: total and free space not being correctly reported, and the filesystem being fully allocated. The latter is because of the SSD allocation issue, and it shouldn't happen anymore on kernel 4.14.

 

OP, you need to balance your cache:

 

btrfs balance start -dusage=75 /mnt/cache

If you get ENOSPC, lower the 75 until you can complete a balance, e.g. try -dusage=50, 25 and so on, then run it again with a higher number until it completes with at least 75.
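
One way to script that retry process is to climb a ladder of thresholds from low to high and stop at the first failure. Note this is a variation on the advice above (ascending rather than starting at 75 and backing off), and the function name `stepped_balance` is hypothetical. The thresholds 1, 25, 50 and 75 are the values mentioned in this thread.

```shell
#!/bin/sh
# Run filtered data balances with increasing -dusage thresholds,
# stopping at the first one that fails (for example with ENOSPC).
# Usage: stepped_balance /mnt/cache
stepped_balance() {
    mnt=$1
    for pct in 1 25 50 75; do
        echo "balancing data chunks less than ${pct}% full"
        if ! btrfs balance start -dusage="$pct" "$mnt"; then
            echo "balance failed at -dusage=${pct}; free some space and retry" >&2
            return 1
        fi
    done
}
```

Starting low means the first passes only touch nearly empty chunks, which need the least working space, and each pass that succeeds returns space to the unallocated pool for the next one.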

 

 

5 minutes ago, johnnie.black said:

 

Yes, there are 2 different issues: total and free space not being correctly reported, and the filesystem being fully allocated. The latter is because of the SSD allocation issue, and it shouldn't happen anymore on kernel 4.14.

 

OP, you need to balance your cache:

 


btrfs balance start -dusage=75 /mnt/cache

If you get ENOSPC, lower the 75 until you can complete a balance, e.g. try -dusage=50, 25 and so on, then run it again with a higher number until it completes with at least 75.

 

OK so in the meantime what should I do? Is the kernel purpose built by unraid or can I yank it and apply it myself?

 

Also - I can't balance because it appears that is also looking at the value provided to the GUI...

 

 

btrfs-balance.png

Also - I can't balance because it appears that is also looking at the value provided to the GUI...

No, that is not about the GUI; it's about not having any space for new chunks. Try -dusage=1. If that still fails, you'll need to delete some files and try again; the larger the files, the better the chance of it working.

 

 

 

 

4 minutes ago, kl0wn said:

Is the kernel purpose built by unraid or can I yank it and apply it myself?

 

Latest rcs are already using kernel 4.14, but anyone coming from lower kernels still needs to do a first balance if the filesystem is fully or close to fully allocated like yours is.

Just now, johnnie.black said:

 

Latest rcs are already using kernel 4.14, but anyone coming from lower kernels still needs to do a first balance if the filesystem is fully or close to fully allocated like yours is.

 

OK but what's considered fully/close to being allocated? I have 50Gigs out of 120 left. I'll move my 20G docker image off and attempt to run the balance, because -dusage=1 wouldn't work either haha


Could also reformat the device:

 

  1. Copy all the files off that device you care about
  2. Stop array
  3. Change format of Cache to xfs
  4. Start Array
  5. Reformat cache disk
  6. Stop array
  7. Change format of cache back to btrfs
  8. Start array
  9. Reformat cache disk
  10. Copy files back.

There are simpler ways to reformat, but that will work.

 

1 minute ago, limetech said:

Could also reformat the device:

 

  1. Copy all the files off that device you care about
  2. Stop array
  3. Change format of Cache to xfs
  4. Start Array
  5. Reformat cache disk
  6. Stop array
  7. Change format of cache back to btrfs
  8. Start array
  9. Reformat cache disk
  10. Copy files back.

There are simpler ways to reformat, but that will work.

 

 

If I can get this done without reformatting that'd be ideal. I have Plex metadata on this drive that's ~40G and probably has 800k plus directories alone. Moving that to one of the disks I have in the box will take a ridonculous amount of time. Is this going to be an issue for other users after the official release? If so, is it mentioned anywhere that this needs to be done?

2 minutes ago, kl0wn said:

OK but what's considered fully/close to being allocated?

 

image.png.0e8f645e4714d0cc4593ffe57526feb9.png

 

btrfs first allocates chunks, mostly for data or metadata, before it can write to them. Although you have lots of data chunks with free space, your metadata is very close to being full:

 

image.png.c67378a6e60aed1021dcd048cb4cddd4.png

 

so btrfs tries to create a new chunk for metadata, it can't, hence the ENOSPC error.
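
To check for this condition without reading the numbers by eye, the text output of `btrfs fi usage` can be summarised with a little awk. This is a sketch only: `alloc_summary` is a hypothetical name, and it assumes the human-readable line layout that `btrfs fi usage` printed at the time of this thread ("Device unallocated:" and "Metadata,<profile>: Size:..., Used:...").

```shell
#!/bin/sh
# Summarise chunk allocation from `btrfs fi usage` text on stdin:
# prints unallocated bytes and how full the metadata chunks are.
alloc_summary() {
    awk '
    # Convert a size such as 2.02GiB or 24.00KiB to bytes.
    function to_bytes(s,    n, u) {
        n = s + 0                      # numeric prefix
        u = s; gsub(/[0-9.]/, "", u)   # unit suffix
        if (u == "KiB") return n * 1024
        if (u == "MiB") return n * 1024 ^ 2
        if (u == "GiB") return n * 1024 ^ 3
        if (u == "TiB") return n * 1024 ^ 4
        return n                       # plain bytes
    }
    /Device unallocated:/ { unalloc = to_bytes($3) }
    /^Metadata,/ {
        # Line looks like: Metadata,single: Size:2.02GiB, Used:1.60GiB
        size = $2; sub(/^Size:/, "", size); sub(/,$/, "", size)
        used = $3; sub(/^Used:/, "", used)
        pct = 100 * to_bytes(used) / to_bytes(size)
    }
    END {
        printf "unallocated_bytes=%.0f metadata_used_pct=%.0f\n", unalloc, pct
    }'
}
```

It would be used as `btrfs fi usage /mnt/cache | alloc_summary`. Almost no unallocated space combined with nearly full metadata is the combination described above: btrfs needs a new metadata chunk and has nowhere to put it.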

1 minute ago, johnnie.black said:

 

image.png.0e8f645e4714d0cc4593ffe57526feb9.png

 

btrfs first allocates chunks, mostly for data or metadata, before it can write to them. Although you have lots of data chunks with free space, your metadata is very close to being full:

 

image.png.c67378a6e60aed1021dcd048cb4cddd4.png

 

so btrfs tries to create a new chunk for metadata, it can't, hence the ENOSPC error.

 

aaaaah OK thanks for the education on that. I wasn't aware that's how it worked. I'll keep moving things off the cache and see if I can get it to balance. If I go the reformat route, should I just go with XFS? I haven't had any issues like this with that file system.

If I go the reformat route, should I just go with XFS? I haven't had any issues like this with that file system.

 

If you don't plan to create a multi-device pool and/or are not using or don't care about checksums and snapshots, it's probably best to use xfs. btrfs is getting better with new kernel releases but it still has some quirks/issues.

 

 

1 hour ago, limetech said:


 

That's not going to correct 'disk full'.  The 'free space' is being accurately reported.  The question is, why is 'Data single' still only letting him use half the storage?

No, but the different way of reporting free space will be based on the amount of active file data stored on the drive, which is indicative of how much free space the BTRFS volume could get after a balance.

 

Most design ideas for BTRFS are great. But there are a couple of things that they implemented in a way that makes you think maybe the Bastard Operator from Hell had a relative who was a developer.


Balance Complete. I'll start moving data back and see if the issue pops up again. Thanks for all the help!

 

 

BalanceComplete.png


Looks good. Keep an eye on it until you upgrade to a release using kernel 4.14. I recommend scheduling a weekly balance for anyone running kernel 4.13 or older. Even on 4.14 it's a good idea to keep an eye on it, at least initially; it's supposed to be fixed but it's too early to tell if it really is.


My cache drive is seeing the exact same thing since RC15.

 

Are there any specific stats people would need? Nothing will start for me.

 

I was able to run the copy command without issue despite the UI showing 0B available. That said, it's a 750GB SSD, and it shows as only 313GB used/313GB available.

 

Upon reboot it will return to normal for about 5 minutes.

 

Overall:
    Device size:                 698.64GiB
    Device allocated:            698.64GiB
    Device unallocated:           24.00KiB
    Device missing:                  0.00B
    Used:                        291.74GiB
    Free (estimated):            406.48GiB      (min: 406.48GiB)
    Data ratio:                       1.00
    Metadata ratio:                   1.00
    Global reserve:              424.28MiB      (used: 0.00B)

Data,single: Size:696.62GiB, Used:290.14GiB
   /dev/sdg1     696.62GiB

Metadata,single: Size:2.02GiB, Used:1.60GiB
   /dev/sdg1       2.02GiB

System,single: Size:4.00MiB, Used:96.00KiB
   /dev/sdg1       4.00MiB

Unallocated:
   /dev/sdg1      24.00KiB
 

My balance is running currently, although I'm not 100% sure how long a balance should take.

 

hades-diagnostics-20171219-2049.zip

Edited by wickedathletes


EDIT: I'm a moron and didn't read your entire post. My balance took around 15 minutes. To be fair, I had never run the operation before and my SSD had been in operation for 4 years, so I'm unsure if that affects anything. However, I'm guessing yours may take a bit longer due to the size of the SSD.

Edited by kl0wn

