Large copy/write on btrfs cache pool locking up server temporarily



On 3/14/2018 at 4:28 PM, dnoyeb said:

I kicked it over to 2048, which seems to work well.  I've read that you should do it in multiples of 1536, however, so if you kicked it to 3 MB in GParted, you'd be set.

 

If I pull the drive out again, I'll probably kick it to 3 MB instead.  I've got some more details over in that other thread (unassigned devices) showing how much it impacted my copies.  I'm EXTREMELY happy now with the performance of doing large scale downloads...  I'll probably test with 4x70 GB downloads today and see how it handles it....  I'm on gigabit, so I'll routinely pull down at full gigabit, which really is a great test.

 

First, thanks for finding this. I have been wondering for a while why I had such high IO wait.

I found this post about alignment on 940 EVO. It suggests aligning at sector "12288 (6144 KiB), which is a multiple of 1536 KiB and 2048 KiB".
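
The arithmetic behind that suggestion can be checked with a short sketch. It assumes 512-byte logical sectors (an assumption, the post does not state the sector size), and the helper names are made up for illustration:

```python
from math import gcd

SECTOR_BYTES = 512  # assumption: start sectors are counted in 512-byte logical sectors


def start_offset_kib(start_sector: int, sector_bytes: int = SECTOR_BYTES) -> int:
    """Convert a partition start sector into an offset in KiB."""
    return start_sector * sector_bytes // 1024


def is_aligned(start_sector: int, boundary_kib: int,
               sector_bytes: int = SECTOR_BYTES) -> bool:
    """True if the partition start falls exactly on a multiple of boundary_kib."""
    return (start_sector * sector_bytes) % (boundary_kib * 1024) == 0


# Smallest offset that is a multiple of both 1536 KiB and 2048 KiB.
common_kib = 1536 * 2048 // gcd(1536, 2048)

print(start_offset_kib(12288))   # offset of sector 12288 in KiB
print(common_kib)                # least common multiple of the two boundaries
print(is_aligned(12288, 1536), is_aligned(12288, 2048))
print(is_aligned(2048, 1536))    # a 1 MiB start (sector 2048) against 1536 KiB
```

Under that assumption, sector 12288 sits at 6144 KiB, which is the least common multiple of 1536 KiB and 2048 KiB, while a start at sector 2048 (1 MiB) is not a multiple of 1536 KiB.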

 

I guess I'll swap a few SSDs around while waiting for a proper fix...

 

Link to comment

So, I'm at a point where I can make some changes to my server drive layout, and I'm trying to decide what the best plan is going forward. I currently still have 2x Samsung 850 EVO 1TB drives in BTRFS RAID1 for my cache pool. Pretty much the worst-case scenario, it seems, after reading back through this thread again. (And let me tell you, it is really chapping my hide that my ~$600 investment is crippled because the unRAID people seem not to care about these issues at all.)

 

Should I just split up the pool and reformat one of the drives in XFS to use as my new cache drive/pool, and then I can just do whatever with the other in Unassigned Devices? If I do that, what is the issue with alignment on Samsung SSDs and will that still cause problems? Or is that only an issue with BTRFS and not a problem with XFS? In which case the only solution to get the max performance out of them would be to use a tool to fix the alignment, rendering them both only usable in UD since unRAID won't work with them that way?

 

Or should I just say screw it and buy a new SSD to use as cache to avoid the above issues? Though I still can't use BTRFS RAID1, since that is not a manufacturer-specific issue, right? Ugh, where is the "just works" I wanted when moving to (and paying for) this product...

Link to comment
1 hour ago, deusxanime said:

Should I just split up the pool and reformat one of the drives in XFS to use as my new cache drive/pool, and then I can just do whatever with the other in Unassigned Devices? If I do that, what is the issue with alignment on Samsung SSDs and will that still cause problems? Or is that only an issue with BTRFS and not a problem with XFS? In which case the only solution to get the max performance out of them would be to use a tool to fix the alignment, rendering them both only usable in UD since unRAID won't work with them that way?

 

That's exactly what I have done. I have not experienced alignment issues with an XFS formatted Samsung Evo drive.

Link to comment
On 8/2/2018 at 1:53 AM, thomast_88 said:

 

That's exactly what I have done. I have not experienced alignment issues with an XFS formatted Samsung Evo drive.

 

Thanks, I'm going to give that a try to start with. I'll have to resolve to do more backups of my containers and VMs, but otherwise hopefully the performance improvement will be worth it.

Link to comment
  • 8 months later...
On 8/23/2017 at 3:50 PM, aptalca said:

What I did was
1) mount a second ssd through unassigned devices plugin,
2) shut down all Dockers and VMs (turn off the services in the settings so they don't automatically restart when the array starts),
3) rsync all data from cache to unassigned device (rsync preserves permissions, timestamps, etc. with the option "a"),
4) stop the array,
5) change the disk format from btrfs to xfs and
6) restart the array.

It will format the cache drive, which takes about a minute. Then you can transfer your data back to the cache drive and enable the docker and VM services

If you don't have a spare ssd, you can rsync to an array disk as well. Make sure you use a disk share and not a user share for that (ie. /mnt/diskX)
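
Step 3 above can be sketched as a small helper that only builds the rsync command line, so it can be reviewed before anything is copied. The function name and the example paths are hypothetical (the destination in particular depends on where the unassigned device is mounted); the only rsync options used are `-a` (archive mode, as mentioned in the steps) and `--dry-run`:

```python
import shlex
from typing import List


def build_rsync_command(source_dir: str, dest_dir: str,
                        dry_run: bool = False) -> List[str]:
    """Build an rsync archive-mode command copying the contents of source_dir into dest_dir."""
    args = ["rsync", "-a"]
    if dry_run:
        # Ask rsync to report what it would transfer without copying anything.
        args.append("--dry-run")
    # A trailing slash on the source copies the directory's contents
    # rather than creating a nested copy of the directory itself.
    args.append(source_dir.rstrip("/") + "/")
    args.append(dest_dir.rstrip("/") + "/")
    return args


# Hypothetical paths: adjust both to the actual cache and backup locations.
backup = build_rsync_command("/mnt/cache", "/mnt/disks/spare_ssd/cache_backup",
                             dry_run=True)
print(shlex.join(backup))
```

Once the printed command looks right, the same list could be handed to `subprocess.run(backup, check=True)`, and the source and destination swapped for the copy back after the reformat.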

I followed these instructions exactly.  Now, upon restarting the array, the system complains "Unmountable: No file system (no btrfs devices)"

 

When I stop the array and go into the drive config, xfs is not offered as a format option. The only options are auto, btrfs and btrfs-encrypted.  xfs is nowhere to be found.  I'm hesitant to remove the drive from the config and re-add the blank xfs formatted cache drive.

Link to comment
15 minutes ago, joedotmac said:

When I stop the array and go into the drive config, xfs is not offered as a format option. The only options are auto, btrfs and btrfs-encrypted.  xfs is nowhere to be found.

As long as there are multiple cache slots shown when the array is stopped, xfs is unavailable. Do you have a full backup of the data? I'm unclear what you are trying to do, and at what step you got hung up.

Link to comment

I ran into the issue described in the thread where trim operations are not occurring on a btrfs formatted cache drive.  This results in the cache drive space filling up.  The thread mentions that a fix is possible with a kernel update, which hasn't yet been implemented.  The current fix is to change the format of the cache disk from btrfs to xfs.

 

I have a full backup of the data: I ran rsync -av to copy everything that was on the cache drive to an unassigned drive.

 

I'm hung up around steps 5 and 6 and their result: the system does not make the xfs option available for the cache drive.

 

How do I eliminate the two empty slots in my cache drive configuration so that xfs is available as a format selection?

 

I can change the number of slots for the array disks, but the cache drive section has three slots, and the option to reduce the available slots to a single cache drive is grayed out.  The selection currently indicates three, and I'm using a single cache drive.

Edited by joedotmac
Link to comment
  • 5 months later...

Following this thread hoping for an update in the future on this. Like others, my multi-disk Samsung SSD cache in btrfs has terrible performance when moving large amounts of data. Unraid and all my dockers would lock up.

Moved to single disk SSD XFS and it flies now. Also moved to nightly backups of my appdata, but I'd ultimately like to get a pool going again in case of drive failure. Anyone know if this issue with btrfs has been reported as a bug yet?

Link to comment
11 hours ago, Zuluster said:

Following this thread hoping for an update in the future on this. Like others, my multi-disk Samsung SSD cache in btrfs has terrible performance when moving large amounts of data. Unraid and all my dockers would lock up.

Moved to single disk SSD XFS and it flies now. Also moved to nightly backups of my appdata, but I'd ultimately like to get a pool going again in case of drive failure. Anyone know if this issue with btrfs has been reported as a bug yet?

I don't think it was confirmed as a btrfs bug. It could be specific to unraid's implementation of btrfs. Not sure. We need the unraid team to look into it and chime in.

Link to comment

I will join in on having this issue.

6x Axiom 120 GB SSDs in btrfs raid10

 

Performance has never been great, but I was able to confirm that docker sab extracting and downloading to the cache pool destroys performance and causes the server to lag horribly.

Netdata pops with the warning:

 

(Attached image: btrfs-lag-issue.png)

 

I am curious if this is any way related to this other issue, which is being worked on for 6.8rc:

 

Btw, kudos to johnnie.black reporting and hounding that one to resolution.

 

For this one though, I will keep watching and maybe this can get some attention in the future.

 

 

 

Edited by semtex41
fixed formatting
Link to comment

It would be nice to have this fixed. I bought two 1TB Samsung 850 EVOs a couple years ago when I switched to unRAID specifically to use as my cache drive in RAID1. Cost over $300 each and I've never been able to run them as a mirror properly in that time. Had to split them due to this issue and use one as my cache/appdata with XFS filesystem and just use the other as an unassigned drive with my VMs on it. Pretty big waste of space and money sadly. 

Link to comment
1 hour ago, 0xPCP said:

Just ran into this issue myself. Set up two Samsung 970s in a btrfs cache pool. I was getting intermittent system hangs. Time to move all the data around and reformat to xfs, unfortunately.

 

Welcome to unRAID! It is a great system, sorry that your first experience as you are setting up is running into this year+ old bug that feels like it should have been resolved by now (considering Samsung is probably one of the most popular, if not the most popular, consumer SSD brand). Unfortunately it seems like some things get ignored/slip through the cracks, but for the most part it is really awesome. Kind of reminds me of Plex in that regard...

Link to comment
59 minutes ago, deusxanime said:

 

Welcome to unRAID! It is a great system, sorry that your first experience as you are setting up is running into this year+ old bug that feels like it should have been resolved by now (considering Samsung is probably one of the most popular, if not the most popular, consumer SSD brand). Unfortunately it seems like some things get ignored/slip through the cracks, but for the most part it is really awesome. Kind of reminds me of Plex in that regard...

At least now I know what the issue is. I was running memtests all night and not finding anything.

Link to comment
  • 2 weeks later...
13 hours ago, dis3as3d said:

+1 on this, I'm hitting the same issue with 2x Samsung 970s in a btrfs cache pool as well.  Has anyone found any fixes?

Your setup is almost identical to mine. There was no fix other than removing one drive from the cache pool and reformatting to ZFS. Definitely a weird edge case.

Link to comment
20 hours ago, dis3as3d said:

@0xPCP  Did reformatting into XFS (assuming you meant that) fix your issue?  I'm still hitting the same issue with a single drive formatted as XFS, both as a cache drive and as an unassigned drive mounted outside any arrays/pools/cache drives.

No issues at all with xfs formatted cache drive here (Samsung one, too)

Link to comment
6 hours ago, aptalca said:

No issues at all with xfs formatted cache drive here (Samsung one, too)

Seems like plenty of folks have Samsung drives working ok.  I'm wondering if it's something specific to the 970 EVO 1TB.  Seems like Unraid doesn't really register it all correctly.  For example: it has two temp sensors, but Unraid seems to monitor the lower temp of the two.  I'd expect you'd want to monitor the hotter of the two.

 

#	Attribute Name					Flag	Value	Worst	Threshold	Type	Updated	Failed	Raw Value
-	Critical warning				0x00
-	Temperature						42 Celsius
-	Available spare					100%
-	Available spare threshold		10%
-	Percentage used					0%
-	Data units read					1,122,860 [574 GB]
-	Data units written				1,181 [604 MB]
-	Host read commands				1,741,759
-	Host write commands				8,081
-	Controller busy time			3
-	Power cycles					44
-	Power on hours					4 (4h)
-	Unsafe shutdowns				31
-	Media and data integrity errors	0
-	Error information log entries	0
-	Warning comp. temperature time	0
-	Critical comp. temperature time	0
-	Temperature sensor 1			42 Celsius
-	Temperature sensor 2			51 Celsius
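
The idea of monitoring the hotter of the two sensors can be sketched as follows. This is only an illustration of the suggestion, not how Unraid reads temperatures; the function names are hypothetical, and the sample text simply reuses the three temperature lines from the report above:

```python
import re
from typing import Dict, Optional

# Sample lines copied from the attribute report above.
SMART_TEXT = """\
Temperature                     42 Celsius
Temperature sensor 1            42 Celsius
Temperature sensor 2            51 Celsius
"""


def sensor_temperatures(report: str) -> Dict[int, int]:
    """Map each 'Temperature sensor N' line to its reading in Celsius."""
    pattern = re.compile(r"Temperature sensor (\d+)\s+(\d+) Celsius")
    return {int(number): int(temp) for number, temp in pattern.findall(report)}


def hottest_sensor(report: str) -> Optional[int]:
    """Return the highest per-sensor reading, or None if no sensors are listed."""
    temps = sensor_temperatures(report)
    return max(temps.values()) if temps else None


print(sensor_temperatures(SMART_TEXT))
print(hottest_sensor(SMART_TEXT))
```

For the report above, the composite "Temperature" line matches sensor 1, while the hotter reading comes from sensor 2.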

 

Link to comment
6 hours ago, Vitalsignser said:

Is this something Unraid is looking at? I'd like to use two Samsung NVMe drives for cache. It seems like the current fix is to only use a single drive formatted as XFS?

Good question

@limetech is this issue on the radar? We'd like to be able to use btrfs cache pools, but the I/O issue is a non-starter

 

Thanks

Link to comment
