Large copy/write on btrfs cache pool locking up server temporarily


14 hours ago, aptalca said:

Good question

@limetech is this issue on the radar? We'd like to be able to use btrfs cache pools but the io issue is a non-starter

 

Thanks

There are 126 posts in this topic, can someone please write a tldr?


Most Popular Posts

  • It looks to be resolved, I've disabled my weekly balance and so far so good, but I'd like to wait a few more weeks before saying it's fixed for sure.

  • This is fixed in next beta... just hang on a bit..

  • Running the latest beta 6.9.0-beta25 Formatted my SSD's to the new partition alignment.   Massive boost in speed on my Samsung 860Evo and Qvo drives! And not locking up when I do a


  • Community Expert
19 hours ago, limetech said:

There are 126 posts in this topic, can someone please write a tldr?

IIRC the issue is mostly for anyone using a Samsung drive because they have a different NAND erase block size and partitions starting on sector 64 aren't optimal.
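The alignment arithmetic behind that claim can be sketched as follows. This is a minimal illustration, not something posted in the thread: it assumes 512-byte logical sectors and uses a 1 MiB boundary as a stand-in, since the actual erase block size of the affected drives is not stated here. The function names are hypothetical.

```python
# Sketch of the partition alignment arithmetic.
# Assumptions (not from the thread): 512-byte logical sectors and a
# 1 MiB alignment boundary standing in for the unknown erase block size.

SECTOR_BYTES = 512          # assumed logical sector size
MIB = 1024 * 1024           # assumed alignment boundary


def partition_offset_bytes(start_sector, sector_bytes=SECTOR_BYTES):
    """Byte offset of a partition that starts at the given sector."""
    return start_sector * sector_bytes


def is_aligned(start_sector, boundary_bytes=MIB, sector_bytes=SECTOR_BYTES):
    """True if the partition start falls exactly on a boundary multiple."""
    return partition_offset_bytes(start_sector, sector_bytes) % boundary_bytes == 0
```

Under these assumptions a partition starting on sector 64 begins 32 KiB into the disk, which is not a multiple of 1 MiB, while one starting on sector 2048 begins at exactly 1 MiB.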

 

https://forums.unraid.net/topic/58381-large-copywrite-on-btrfs-cache-pool-locking-up-server-temporarily/?do=findComment&comment=641245

 

also posting the post he links to since it's not going directly there:

https://forums.unraid.net/topic/44104-unassigned-devices-managing-disk-drives-and-remote-shares-outside-of-the-unraid-array/?do=findComment&comment=640178

 

I have also been having my server lock up in the mornings lately. I recently added a second SSD to my cache, formatted as BTRFS. Here is one of the diagnostics I was able to grab before I shut the computer down.

finalizer-diagnostics-20191122-0449.zip

  • 3 weeks later...

I'm stumbling across this issue just before I setup my new build for unRAID. I have 2 970 EVO PLUS 500GB's I was going to use a cache pool using BTRFS. Guess I won't bother with that, can't hardware RAID them with current hardware, not going to get anything else to do that. I want redundancy with them, that's why I bought 2. Should I just get different SSD's? If so, which ones are good with no issues?

9 hours ago, Iceman24 said:

I'm stumbling across this issue just before I setup my new build for unRAID. I have 2 970 EVO PLUS 500GB's I was going to use a cache pool using BTRFS. Guess I won't bother with that, can't hardware RAID them with current hardware, not going to get anything else to do that. I want redundancy with them, that's why I bought 2. Should I just get different SSD's? If so, which ones are good with no issues?

If you were planning to use them for this specific purpose and have the option to return them still, I'd say to do that. Right now there is no idea if/when this issue will be resolved. 

 

As far as what to replace them with, I think anything non-Samsung will do. I believe the issue is specific to their drives, but someone correct me if I'm wrong.

7 minutes ago, deusxanime said:

If you were planning to use them for this specific purpose and have the option to return them still, I'd say to do that. Right now there is no idea if/when this issue will be resolved. 

 

As far as what to replace them with, I think anything non-Samsung will do. I believe the issue is specific to their drives, but someone correct me if I'm wrong.

I have some 1tb Adata drives and have the issue. I figured it was because they are kind of cheaper drives. I bought some 860s to swap them out with. If I don't use them here, I have plenty of places to use them

19 minutes ago, FearlessUser said:

I have some 1tb Adata drives and have the issue. I figured it was because they are kind of cheaper drives. I bought some 860s to swap them out with. If I don't use them here, I have plenty of places to use them

Good to know, I thought it was only Samsung drives affected. Definitely want to do some research before purchasing new ones then to be sure they'll work correctly. I blew $600 (at the time) on two 850 EVO 1TB drives specifically to use as my unRAID cache drives in a mirror and was quite frustrated that it didn't work (and still doesn't a couple years later!). Hopefully others will be spared the pain and expense.

I can dump one and keep one as XFS single cache drive for now, maybe adding another later if issue is resolved. I will need regular backups of data though from the cache drive that will house Dockers, etc.

 

Edit:

I'd much rather get drives that work, but which ones are those? I can't find an answer.

Edited by Iceman24

I would like to chime in here as well.

 

I have 2x1 TB NVME drives in a RAID1 using BTRFS. My radarr/sab downloads also all sit on the cache. During heavy downloading my iowait also goes as high as 40%. All dockers become unusable during this time. Running 6.7.2.  

 

System resources are not a problem with 64 GB of ram and a Ryzen 3900x, it seems to be the implementation of RAID1 Btrfs cache pools.
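For anyone wanting to put a number on iowait like the 40% above, here is a rough sketch that samples the aggregate `cpu` line of `/proc/stat` twice and reports the share of time spent in iowait between the samples. The function names and the one-second interval are illustrative choices, not something from this thread.

```python
# Sketch: estimate iowait percentage from two /proc/stat samples.
# Field order on the "cpu" line: user nice system idle iowait irq softirq steal ...
import time


def parse_cpu_line(stat_text):
    """Return the numeric fields of the aggregate 'cpu' line."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate cpu line found")


def iowait_percent(before, after):
    """Percentage of elapsed CPU time spent in iowait between two samples."""
    deltas = [b - a for a, b in zip(before, after)]
    # Only the first eight fields are summed; guest time is already
    # counted inside user/nice, so adding it would double count.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total


def sample_iowait(interval=1.0, path="/proc/stat"):
    """Read the stat file twice, `interval` seconds apart."""
    with open(path) as handle:
        before = parse_cpu_line(handle.read())
    time.sleep(interval)
    with open(path) as handle:
        after = parse_cpu_line(handle.read())
    return iowait_percent(before, after)
```

Calling `sample_iowait()` during a large copy to the cache gives a figure comparable to what `top` reports as `wa`.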

Edited by bobo89

3 hours ago, bobo89 said:

I would like to chime in here as well.

 

I have 2x1 TB NVME drives in a RAID1 using BTRFS. My radarr/sab downloads also all sit on the cache. During heavy downloading my iowait also goes as high as 40%. All dockers become unusable during this time. Running 6.7.2.  

 

System resources are not a problem with 64 GB of ram and a Ryzen 3900x, it seems to be the implementation of RAID1 Btrfs cache pools.

 

Everyone posting in here running 6.7.2 should upgrade to 6.8 Stable to at least remove the chance of your slowdown being from the "writes starves reads" bug.

  • 2 weeks later...

I'm on the latest release, 6.8, using two ADATA SU635 480GB 3D-NAND SATA SSDs in a BTRFS pool, and I also have this issue. I just built my server and had read to avoid the Samsung disks, but it seems I also get it with the ADATA disks. Will try with COW/checksums disabled as @johnnie.black mentioned.

nas4x12-diagnostics-20200102-1903.zip

Edited by drjUnraid

So it's been a couple of years and this is still an issue?  That's unfortunate.  I'm in the process of building a new 6.8 server and was planning on using a couple Samsung SSD drives for a cache pool.  Has anyone got that working without having the issues mentioned in this thread and if so using what SSD drives?  Thanks!

I'm using two Samsung 860 EVO 1TB drives in my cache pool in Raid1 and the server is NOT locking up for me when I transfer large files.  I already bought the drives before I saw this thread, but can still return them.  I like tweaking and tuning stuff so I was trying to reproduce the issues others are seeing in this thread before making the decision to possibly return the drives.  I can copy a 50GB file to the cache pool and don't see any issues.

 

My main Unraid server is still running 5.0.  I recently upgraded my backup server from 5.0rc11 to 6.8.  Also, I swapped the case from a 4U Norco 4020 to a silent mid tower because I'm relocating the server to a different location (noise is an issue) and added the SSDs.  

 

I installed a bunch of docker containers and a couple of VMs. Tonight when I shut down the server to add the second cache drive, my VMs were no longer visible in the GUI after the restart. I don't know why; I started another thread on that issue here:

 

 

My Hardware Components:
CPU: Intel Xeon E3-1220 Sandy Bridge
Motherboard: Supermicro X9SCM-IIF-O
RAM: 32GB - 4x Super Talent DDR3-1333 8GB ECC Micron
Controllers: 1x IBM M1015.  Flashed in IT mode.
Case: Antec P101 Silent
Power Supply: CORSAIR HX750
Flash: 4GB Cruzer Micro
Parity Drive: 1x4TB Seagate ST4000DM000 5900RPM 64MB 4x1000GB CC43
Data Drives:  5x4TB Seagate ST4000DM000 5900RPM 64MB 4x1000GB CC43
Cache Drives: 2x1TB Samsung SSD 860 EVO 1TB
 

Hard drives are connected to the M1015.  SSDs are connected to SATA3 ports on the motherboard.

 

Multiple times I copied a 50GB file from a Win10 PC to my Unraid server over gigabit ethernet:

 

[Image: transfer.jpg]

 

Cache pool during transfer:

[Image: cachepool.jpg]

 

Top during transfer:

[Image: top.jpg]

 

So, during the transfer I was at about 2 load average, highest I saw was ~3.  I still need to figure out what's going on with the VMs, so I couldn't test with those.  But during the transfer I used several docker containers and didn't notice any performance impacts, including:

  • Krusader - browsing files/folders on the server
  • CouchDB - exploring the GUI/interface
  • DokuWiki - editing wiki pages
  • Oracle Database - browsing with the console

 

Everything appears to be working for me with 2 Samsung SSDs in my cache pool while copying large files.  Should my test have reproduced the problem others are seeing?  Anything else I can/should try? 

 

Best Regards,

Jimmy

 

Edited by JimmyJoe

  • 2 weeks later...

I had similar symptoms using an older Samsung 830 SSD as a single Btrfs LUKS-encrypted cache. When copying very large files, iowait would hit the 80s and at some point the system became unresponsive, with write speeds around 80 MB/s. However, moving to LUKS-encrypted XFS did not help at all.

 

In my case, it had to do with LUKS encryption. After moving to a non-encrypted cache, either Btrfs or XFS, iowait was much lower and write speeds were around 200 MB/s. However, I'm on an i7-3770, which has AES acceleration, and I see barely any CPU utilization.

 

One guess is that the 830 controller doesn't handle incompressible data as well, but looking at reviews, that's where it shined compared to Sandforce controllers.  

 

Some searching led me to this post:

 

Quote

For large writes, the default multiqueue scheduler can end up filling multiple queues of sequential IO that look like random IO (to some devices that have trouble with internal multiqueue scheduling), so it may be worth trying the "none" queuing algorithm to see if this improves things.

Setting the IO Scheduler to none for my cache drive helped a bit, but lowering nr_requests with any IO scheduler helped more, at least in my case.
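A rough sketch of those two tweaks, using the standard Linux sysfs files `/sys/block/<device>/queue/scheduler` and `/sys/block/<device>/queue/nr_requests`. The device name `sdX`, the helper names and the value 32 are placeholders, not values from this post. Writing to the real files needs root, and the settings do not survive a reboot.

```python
# Sketch: inspect and change the IO scheduler and queue depth via sysfs.
# The helper names and the sys_block parameter are illustrative; the
# sysfs file names themselves are the standard Linux block-layer ones.
from pathlib import Path


def queue_dir(device, sys_block="/sys/block"):
    """Directory holding the queue tunables for one block device."""
    return Path(sys_block) / device / "queue"


def current_scheduler(device, sys_block="/sys/block"):
    """Return the active scheduler; the kernel marks it with [brackets]."""
    text = (queue_dir(device, sys_block) / "scheduler").read_text()
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return text.strip()


def set_scheduler(device, name, sys_block="/sys/block"):
    """Select a scheduler by name, e.g. 'none'."""
    (queue_dir(device, sys_block) / "scheduler").write_text(name)


def get_nr_requests(device, sys_block="/sys/block"):
    """Current number of requests allowed in the device queue."""
    return int((queue_dir(device, sys_block) / "nr_requests").read_text())


def set_nr_requests(device, value, sys_block="/sys/block"):
    """Lower (or raise) the queue depth for the device."""
    (queue_dir(device, sys_block) / "nr_requests").write_text(str(value))
```

For example, `set_scheduler("sdX", "none")` followed by `set_nr_requests("sdX", 32)` would apply both changes to a hypothetical cache device `sdX`. Which value helps, if any, will depend on the drive.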

Edited by robobub

  • 1 month later...

Exact same issue happening to me.  Server locks up completely when copying to BTRFS cache drive (single drive)

 

Seeing IOWAIT up to 50% plus

 

Samsung 850 Pro 2TB SSD using motherboard SATA

 

Raised in bug section as a problem.

 

Frankly surprised this doesn't appear to be getting looked into by LT, given how Samsung make arguably the most popular SSDs in the world?

Edited by sdamaged

  • Author
10 hours ago, sdamaged said:

Exact same issue happening to me.  Server locks up completely when copying to BTRFS cache drive (single drive)

 

Seeing IOWAIT up to 50% plus

 

Samsung 850 Pro 2TB SSD using motherboard SATA

 

Raised in bug section as a problem.

 

Frankly surprised this doesn't appear to be getting looked into by LT, given how Samsung make arguably the most popular SSDs in the world?

LT dropped by once and asked for a summary, then crickets. Try emailing them and linking to this thread

Messaged LT, let's hope they can help get this fixed!

  • 4 weeks later...

Hi there,


I just started with Unraid, but I am also affected. I have 2x 1TB 860 QVO SSDs.

My iowait sometimes goes above 60 and the server locks up almost completely. During a rebalance I see 2 x 500 MByte/s, so bandwidth or the controller is hardly the issue.


I tried configuring the SSDs as raid1 and raid0, same issue. I tried to figure out how to change to XFS, but unfortunately found out that the btrfs raid1 did not work as expected, so I am currently restoring from backups and re-downloading metadata :( This is very annoying!
I hope this gets fixed soon! It can't be so difficult to allow for a partition offset?


Server : UnraidPro 6.8.3, T620 2 x 2690v1 Xeon, 128GB, 8x8TB, 5x14TB - ssd's are on 2118IT p16 (trim enabled).

I was seeing this with a pool of 2 512Gb SSDs. I have since switched to a single Intel NVME drive and the problem has gone.

So this also seems related to all the other cases where Unraid appears frozen or unresponsive.

Why is no one looking into this?

 

It can't be so difficult to allow a different partition offset for some disks?

I just bought this Pro license and thought I would be getting some support for this as well.

The system otherwise looks really nice and promising, but what if the issues are not being fixed?

 

 

 

Edited by ephigenie

On 3/29/2020 at 9:13 PM, allanp81 said:

I was seeing this with a pool of 2 512Gb SSDs. I have since switched to a single Intel NVME drive and the problem has gone.

OK, I mean, this is also a possibility: "just throw more money at the problem".

However, I think this should concern the Limetech team, and there needs to be a bugfix for this.

 

The docker is up, because I tried earlier to update a single docker image (binhex-plexpass). It took 1h and I gave up. This is so bad.

I have a single SSD in my old box running plain Debian and 40+ containers (it was my previous media server) and have never had these kinds of performance issues. This is really a shame. I don't think it's anywhere near acceptable to have a 128GB, dual Xeon, 2 x SSD server sitting there idle, completely and utterly busy with itself.

I used mergerfs in my old box and it performed really nicely. Unraid looked better and more neatly integrated, and so that I wouldn't have to fiddle around with those things anymore, I bought into Unraid. Unfortunately, I only saw later that there are ZFS-based solutions that now have nice interfaces as well, and docker etc.

 

However, can we get this fixed now, please? What more information is needed to narrow down this bug?

 

[Attached screenshots: Screenshot2020-03-3113_00_50.png, Screenshot2020-03-3113_00_23.png]

 

  • 4 weeks later...

Same issues with 2 MX500's formatted BTRFS. Extremely disconcerting that this has been a known issue for so long. Seriously thinking about moving away from Unraid tbh. 

@limetech, bumping this thread your way again; we got your attention in November but have lost you since then.
The issue is that anyone using Samsung SSDs (and other brands too) in a btrfs cache pool in Unraid will see performance fall off a cliff due to partitions starting on sector 64. E.g., if you transfer a large file to/from the btrfs cache pool, all the dockers in Unraid will lock up.
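To check whether a given pool is in that situation, here is a small sketch that reads each partition's start sector from the standard Linux sysfs `start` files, which are reported in 512-byte units. The function names are made up for illustration; nothing here is an Unraid tool.

```python
# Sketch: list partition start sectors from sysfs and flag those that
# begin on sector 64. Function names are illustrative only.
from pathlib import Path


def partition_starts(sys_block="/sys/block"):
    """Map partition name -> start sector for every block device."""
    starts = {}
    for device in sorted(Path(sys_block).iterdir()):
        if not device.is_dir():
            continue
        for entry in sorted(device.iterdir()):
            # Partition directories contain both a 'partition' and a 'start' file.
            if (entry / "partition").is_file() and (entry / "start").is_file():
                starts[entry.name] = int((entry / "start").read_text())
    return starts


def partitions_starting_at(sector, sys_block="/sys/block"):
    """Names of partitions whose start sector equals `sector`."""
    return [name for name, start in partition_starts(sys_block).items()
            if start == sector]
```

Running `partitions_starting_at(64)` on the server lists the partitions that begin on sector 64.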
 

@wgards, best option for now is to drop your cache down to one drive and reformat to XFS.

  • Community Expert
36 minutes ago, wgards said:

Same issues with 2 MX500's formatted BTRFS. Extremely disconcerting that this has been a known issue for so long.

The problem is that it doesn't affect everyone; I have had a pool of MX500s working for more than a year without any issues.
