BVD

Everything posted by BVD

  1. Your PHP tuning needs to be 'compatible' (for lack of a better term) with your database tuning - if, for instance, you allow 200 PHP processes but only allow 100 DB connections in your DB config, nextcloud will complain. In addition, one tuning variable within postgresql.conf can impact another - the paragraph directly following the postgresql.conf sample variables has a brief explanation: you have a total number of worker processes, maintenance processes, and memory allocations for each, and the sum should be less than or equal to your shared buffers. You'll want to check out the postgres config reference linked in the guide for full explanations of each of the options and their impacts. The database tuning information is in the respective DB's page, linked from the main page above - the postgres one is more complete at this point, as it's what I've primarily used for anything that needs a database for many years, while my MariaDB/MySQL experience is strictly relegated to working on others' environments (so I don't have everything already written up for it to just copy from my deployment notes).
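     As a rough illustration of what I mean by 'compatible' - the values, container name, and file locations below are placeholders, not recommendations, so size them for your own hardware per the guide:
         # php-fpm pool config (wherever your image exposes it):
         #   pm.max_children = 100       # max concurrent PHP workers
         # postgresql.conf:
         #   max_connections = 120       # leave headroom above pm.max_children for maintenance/background connections
         # quick way to confirm what the DB side is actually running with:
         docker exec postgres psql -U postgres -c "SHOW max_connections;"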
  2. This is the reason I use the pre/post script capability of sanoid and stick with zfs snapshots. QEMU has the ability to quiesce the VM, which you trigger in the pre-script; the snapshot occurs, then the post-script resumes normal operations. Edit: Just to be clear, I do something similar to this to ensure my snapshots' consistency. It *is* still only crash consistent, which is the reason all my "complex" applications (databases etc) run in docker containers - I then have the granularity to tailor the pre/post scripts to each specific application's needs for application-consistent backups via snapshot.
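     For anyone curious what that looks like in practice, a minimal sketch - the template name, script paths, and VM name here are made up, and it assumes the qemu guest agent is running inside the VM:
         # sanoid.conf excerpt (hypothetical template):
         #   [template_vmbackup]
         #     pre_snapshot_script  = /usr/local/bin/vm-freeze.sh
         #     post_snapshot_script = /usr/local/bin/vm-thaw.sh
         # vm-freeze.sh - quiesce the guest's filesystems right before the snapshot
         virsh domfsfreeze myvm
         # vm-thaw.sh - thaw them again once the snapshot's been taken
         virsh domfsthaw myvm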
  3. Cross-posting here, as I finally made some progress on documenting some ZFS-on-UnRAID performance related stuff. I'm using my vacation week to work on some of this as a passion project - hope it's found to be helpful by some, and just note, it'll continue to grow/evolve as time allows 👍
  4. I posted what I hope to be a single place to reference for performance-related tuning/information over in the guides/general section, and it had a couple pieces I thought were applicable here, so I figured I'd cross-post the relevant bits - it goes into a great deal of the performance tuning one might wish to undertake with LSIO's Nextcloud+Postgres+Nginx containers, and I'm posting it in hopes that it helps some folks and reaches the right audience:
     * Postgres - My recommendation for nextcloud's backing database - keeping your application's performance snappy using PG to back systems with millions of files and tens or even hundreds of applications, and how to monitor and tune for your specific HW with your unique combination of applications.
     * Nextcloud - Using nextcloud, onlyoffice, elasticsearch, redis, postgres, nginx, some custom cron tasks, and customization of the linuxserver container (...and zfs) to get highly performant app responsiveness even while using apps like facial recognition, full text search, and online office file editing (among many others).
     I've collaborated some with the Ibracorp folks on their work towards an eventual nextcloud guide to help with some performance stuff, along with my notes as something of a 'first draft' for their written version, so hopefully there'll be a video version of this eventually for those who prefer that. Haven't finished documenting the whole of the facial recog part, nor elasticsearch.
     Just for some context: I've migrated my entire family off of google drive using nextcloud, including my dad's contracting business and my wife's small wfh custom clothing shop - about 1.1m files across 11 users (all of which are doing automatic phone photo and contacts backups), 16 linked devices between phones and computers, and 17 calendars. The time to the initial login page is less than a second, and scrolling through even the photos in gallery mode is almost instantaneous. Super impressed with the Nextcloud devs' work!!!
  5. Hello All! As I'd alluded to in my earlier SR-IOV guide, I've been (...slowly...) working on turning my server config/deployment notes into something that'd at least have the opportunity to be more useful to others using UnRAID. To get to the point as quickly as possible: The UnRAID Performance Compendium
     I'm posting this in the General section as it's all eventually going to run the gamut, from stuff that's 'generically UnRAID' to container/DB performance tuning, VMs, and so on. It's all written from the perspective of *my* servers though, so it's tinged with ZFS throughout - what this means in practice is that, while not all of the information/recommendations provided will apply to each person's systems, at least some part of them should be useful to most, if not all (all is the goal!). I've been using ZFS almost since its arrival on the open source scene, starting back with the release of OpenSolaris in late 2008, and using it as my filesystem of choice wherever possible ever since.
     I've been slowly documenting my setup as time's gone on, and as I was already doing so for myself, I thought it might be helpful to build it out a bit further in a form that could be referenced by others (if they so choose). I derive great satisfaction from doing things like this, relishing the times when work's given me projects where I get to create and then present technical content to technical folks... but with the lockdown I haven't gotten out much, and work's been so busy with other things that I haven't much been able to scratch that itch. However, I'm on vacation this week, and finally have a few of them polished up to the point that I feel they can be useful!
     Currently included guides are (always changing, updated 08.03.22):
     The Intro
     * Why would we want ZFS on UnRAID? What can we do with it? - A primer on what our use-case is for adding ZFS to UnRAID, what problems it helps solve, and why we should care. More of an opinion piece, but with enough backing data that I feel comfortable and confident in the stance taken here. Also details some use cases for ZFS's feature set (automating backups and DR, simplifying the process of testing upgrades of complex multi-application containers prior to implementing them into production, things like that).
     Application Deployment and Tuning:
     * Ombi - Why you don't need to migrate to MariaDB/MySQL to be performant even with a massive collection / user count, and how to do so.
     * Sonarr/Radarr/Lidarr - Currently a 'less done' version of the Ombi guide (as it's SQLite as well), but with some work (in progress / not done) towards getting around a few of the limitations put in place by the applications' hard-coded values.
     * Nextcloud - Using nextcloud, onlyoffice, elasticsearch, redis, postgres, nginx, some custom cron tasks, and customization of the linuxserver container (...and zfs) to get highly performant app responsiveness even while using apps like facial recognition, full text search, and online office file editing. Haven't finished documenting the whole of the facial recog part, nor elasticsearch.
     * Postgres - Keeping your applications' performance snappy using PG to back systems with millions of files and tens or even hundreds of applications, and how to monitor and tune for your specific HW with your unique combination of applications.
     * MariaDB (in progress) - I don't use Maria/MySQL much personally, but I've had to work with it a bunch for work, and it's pretty common in homelabbing given how long a history it has and the devs' desire to make supporting users of the DB easier (you can get yourself into a whole lot more trouble a whole lot quicker by mucking around without proper research in PG than in My/Maria imo). Personally though? Postgres all the way. Far more configurable, and more performant with appropriate resources/tuning.
     General UnRAID/Linux/ZFS related:
     * SR-IOV on UnRAID - The first guide I created specifically for UnRAID, posted directly to the forum as opposed to github. Users have noted going from tens of MB/s up to 700MB/s when moving from the default VM virtual NICs to SR-IOV NICs (see the thread for details).
     * Compiled general list of helpful commands - This one isn't ZFS specific, and I'm trying to add things from my bash profile aliases and the like over time as I use them. It will be constantly evolving, and includes things like "How many inotify watchers are in use... and what the hell is using so many?", restarting a service within an LSIO container, bulk downloading from archive.org, and commands that'll allow you to do UnRAID UI-only actions from the CLI (e.g. stop/start the array, others).
     * Common issues/questions/general information related to ZFS on UnRAID - As I see (or answer) the same issues fairly regularly in the zfs plugin thread, it seemed to make sense to start up a reference for these so it could just be linked to instead of re-typed each time lol. Also includes information on customization of the UnRAID shell and installing tools that aren't contained in the Dev/Nerd packs so you can run them as though they're natively included in the core OS.
     * Hosting the Docker image on ZFS - Squeezing the most performance out of your efforts to migrate off of the xfs/btrfs cache pool - if you're already going through the process of doing so, might as well make sure it's as performant as your storage will allow.
     You can see my (incomplete / more to be added) backlog of things to document on the primary page in case you're interested. I plan to post the relevant pieces where they make sense as well (e.g. the Nextcloud one to the lsio nextcloud support thread, cross-posting this link to the zfs plugin page... probably not much else at this point, but just so it reaches the right audience at least).
     Why Github for the guides instead of just posting them here to their respective locations? I'd already been working on documenting my homelab config information (for rebuilding in the event of a disaster) using Obsidian, so everything's already in markdown... I'd asked a few times about getting markdown support for the forums so I could just dump them here, but I think it must be too much of a pain to implement, so github seemed the best combination of minimizing the time spent re-editing pre-existing material, readability, and access.
     Hope this is useful to you fine folks!
     HISTORY:
     - 08.04.2022 - Added Common Issues/general info, and hosting docker.img on ZFS doc links
     - 08.06.2022 - Added MariaDB container doc as a work-in-progress page prior to completion, due to individual request
     - 08.07.2022 - Linked original SR-IOV guide, as this is closely tied to network performance
     - 08.21.2022 - Added the 'primer' doc, Why ZFS on UnRAID, and some example use-cases
  6. @stuoningur Finally had some time to sit down and type - as I was doing some quick napkin math today thinking about your situation, some points on the test setup/config:
     * 4-disk raidz1, so 3 'disks worth of IOPS' - the rough rule of thumb is ~100 IOPS per HDD in a raidz config (it varies a lot more than that of course)
     * the default block size of 128k means 'each IOP is 128k'
     * 128KB * 3 disks * 100 IOPS works out to ~38MB/s
     * your test was for a 256k block size, halving the IO - which results in our ~20MB/s
     Outside of the above:
     * Confirm your zpool has ashift set to 12. Nearly all newer non-enterprise drives (the Ironwolf being aimed at the SMB/NAS market instead) are 4k sector (w/ 512b emulation). Huge potential overhead depending on the implementation, and really no downsides, so it's win/win. Lots of good background information on this out there for further reading if interested.
     * Check your zfs dataset's configuration to ensure it's a one-to-one match for what you're comparing against - Wendell did his tests without case sensitivity, no compression, etc.
     * Validate your disk health via SMART, ensuring no UDMA/CRC errors, reallocated sectors, etc. are being encountered (which could easily contribute to hugely reduced performance).
     * Ensure the system is otherwise completely idle at the time of the test.
     * And finally, validate your hardware against the comparison point - Wendell's system had a 32GB l2arc, so the point about ensuring the file tested is bigger than the l2arc miiiiiiight've been one of those 'do as I say, not as I forgot to do' kind of things (he's a wicked busy dude, small misses happen to us all! However I don't think that's the case here, as ~45-60MB/s per drive for a 4-disk z1 is actually pretty average / not exactly unheard-of performance).
     Assuming the config 100% matches (or at least 'within reason'), the rest is unfortunately just going to be going through those steps mentioned earlier, ruling them out one by one until the culprit's determined.
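     A few of those config checks from the CLI (the pool/dataset names and the device below are placeholders - substitute your own):
         zpool get ashift tank                                    # want ashift=12 for 4Kn/512e drives
         zfs get recordsize,compression,casesensitivity,sync tank/testdataset
         smartctl -a /dev/sdX | grep -iE "udma|crc|realloc|pending"   # any non-zero counts here are suspect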
  7. Likely a permissions issue with the location - should have "drwxr-xr-x" perms for the nextcloud appdata dir, and if running as PUID/PGID 99/100, then ownership of that directory should be "nobody:users". If that's not the case, I'd correct the permissions and re-attempt, ensuring as little customization as possible (e.g. use the built in database to trial it, and if this works correctly, then the issue is likely related to whichever DB/web proxy container you're running).
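     Something like the below, assuming the stock appdata location - adjust the path to wherever yours actually lives:
         ls -ld /mnt/user/appdata/nextcloud                 # expect: drwxr-xr-x ... nobody users
         chown -R nobody:users /mnt/user/appdata/nextcloud
         chmod 755 /mnt/user/appdata/nextcloud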
  8. @SliMat I'm mobile right now so can't 100% validate the specific log locations, but you should hopefully be able to find them pretty easily with a bit of googling. From the container side of things, I know nextcloud has a support diagnostic bundle that it can collect as well... but if I recall correctly, it includes significantly more sensitive info than I'd feel comfortable asking you to post up here (not like "oh you're hacked now!1!!1!!" sensitive, but more local connectivity details, user account addresses, things like that). We could always move to DMs or something in that case? Up to you. As an alternative (or maybe in parallel, up to you!), I could probably swing some time today to do a remote session together and help you poke at it a bit. I'm normally so starved for time that I couldn't afford to read an email notification (lol), but I'm on vacation this week and time's a luxury I can finally afford (for a bit anyway hehehe). If you're willing to put in the work necessary to learn + self-teach/research + possibly take a little additional coaching, I'd be happy to invest some time in helping you do so. (I should be back to the house around 1400-1500CT, just for awareness, in the event you wanted to take me up on a second set of eyes.)
  9. Before anything further, I'd want to check a couple of the files (rename to the proper extension and open in whatever application uses that ext) - are the files indeed safe and accessible? Whatever it takes, make another copy of the data once that's confirmed - whether it's a usb thumb drive, heck even an old blank dvd if it comes to it... just something *OFF* the array in case the array is eventually found to be the cause. Once you have a safe, secondary, non-array-dependent copy (an unassigned drive would technically be alright, as long as the issue isn't something related to memory/segfaults), then we should start looking through the host's logs under /var/log, spot-check a couple of files' ACLs to make sure they've not gotten bent out of shape, and compare the host logs with those of nextcloud and whatever you're using for the RDBMS (maria/mysql/postgres).
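     Roughly what I have in mind - the share name, destination, and example file path below are all illustrative, so point them at your own:
         # get a second copy OFF the array first (destination here is an unassigned/USB disk)
         rsync -av /mnt/user/nextcloud/ /mnt/disks/usb_backup/nextcloud_copy/
         # then start digging through the host logs around the time things changed...
         grep -iE "error|fail" /var/log/syslog | tail -n 100
         # ...and spot-check ownership/permissions on a few of the affected files
         stat /mnt/user/nextcloud/someUser/files/somefile.pdf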
  10. Honestly too much to type - I'll hang loose for a bit while you're scoping this out further and just wait to hear back on whether you'd like a second set of eyes on it with ya. Best of luck, and enjoy the journey!
  11. This sounds like one of two things: 1. You're using the external storage add-on and it's misbehaving, or 2. inotify's lost its shit. Assuming you don't have a backup, that's your first act - browse your nextcloud share from the unraid console (your mapped directory for /data), then go to <yourUser> -> files_trashbin -> files and copy everything out somewhere off the server in case it's your array causing the grief. Once you've confirmed your backup's good, then we can worry about the NC side of it.
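     In other words, something along these lines (the share/user below are placeholders, and the destination should live somewhere off the array):
         cd /mnt/user/nextcloud/yourUser/files_trashbin/files
         rsync -av ./ /mnt/disks/usb_backup/trashbin_rescue/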
  12. Whole bunch to go through there, too much to type right now, but a few things to consider:
     * Your test is for 256k IO using the random read/write algorithm, with sync enabled.
     * The default zfs dataset has a 128k block size (half the test block size), so that's two write actions for each one from fio. With sync, you're having to physically finish and validate the write to disk before continuing - not an ideal workload for HDDs anyway.
     * On top of that, we've got a 64 IO depth (which is essentially "how long can my queue be") that's effectively halved by the default dataset blocksize - sort of 'cancelling it out', down to 32 in effect.
     The most important part though is this - in order to properly test your storage, the test needs to be representative of the workload. I pretty strongly doubt you'll primarily be doing synchronous random r/w 256k IO across some ~20TB of space, but in the event you do have at least some workload like that, you'll just want to ensure that one dataset on the pool is optimally configured to handle it, so that your results are "the best this hardware can provide".
     Also, I'd still be happy to set aside some time with you of course! As an FYI (just given the time of your response here), I'm in GMT-5, so I assume we're basically on opposite hours of each other, but I'm certain we could make some time that'd work for us both. You just let me know if/when you'd like to do so.
     I'm actually working on some zfs performance documentation geared towards unraid on github currently (going over different containers with recommendations on both how to configure their datasets as well as how to test+tune, general "databases on zfs" stuff, tunable options from the unraid/hypervisor side and when/how to use them, and so on), and the above post has been enough to kick me in the rear and get back to it. It's been an off-and-on thing for (months? Hell, idk), but I'll try to share it out as soon as at least *some* portion of it is fully "done". Maybe it'll help someone else down the line 👍
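     As an example of what 'representative' might look like for large, mostly-sequential media files - the pool/dataset names and job parameters below are placeholders, not recommendations:
         # dataset tuned for big sequential files
         zfs create -o recordsize=1M -o compression=lz4 tank/media
         # fio job that actually resembles that workload
         fio --name=seqwrite --directory=/mnt/tank/media \
             --rw=write --bs=1M --iodepth=8 --size=20G \
             --numjobs=1 --group_reporting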
  13. Also wanted to mention - if you ever decide you need that extra ~400MB/s or so (1.1GB/s being well within reason for a 10Gb connection), the above would still definitely apply - and the journey's most of the fun!!
  14. Glad to hear it, and happy to help! As luck would have it, I'm actually working on some performance tuning for 40Gb this week, and the number of things that play into it are wild to think about. Things most would never think of like chip architecture, bios settings (NUMA, memory interleaving, etc), driver specific flags, and so on - super interesting stuff!
  15. I couldnt call it a "known issue" per se, just that achieving full 10Gb throughput almost always takes some tuning. 10Gb is a whole other can of worms, and achieving that level of throughput requires both careful planning, and a decent amount of optimization (both host and hypervisor side). For one, you're far more likely to need to worry about your peak CPU frequency, context switching, and high interrupt counts. Beyond that, you're much more likely to encounter IO bottlenecks in "weird" (or at least previously unexpected) places. You'll have to start by determining where the bottleneck is. For instance, are you simply mounting an SMB share? Have you tested NFS to see if you get similar IO behavior? What about share passthrough in the VM config (this usually sucks)? Tried the virtio network driver instead of virtio-net? And so on and so on. Just changing things without at least having an inkling of where you're bottlenecking is a recipe for pain. I'd start by installing something like the netdata docker container; start it up, initiate your 10GB copy, and look for anything that appears to spike in the reported statistics. High IRQ remapping? Single core pegged at 100% utilization? What else does that core have going on if so? What's the reported disk utilization at that time? Once you figure out what the bottleneck is, then you can start doing research on how to correct it; the solution will be unique to your configuration and the cause of the bottleneck, so just be prepared to do a little googling, and some trial and error along the way. Happy hunting!
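     If it helps, a minimal netdata run looks something like this (see netdata's own docs for the fully-featured command; the mounts below just give it host visibility):
         docker run -d --name=netdata \
           -p 19999:19999 \
           -v /proc:/host/proc:ro \
           -v /sys:/host/sys:ro \
           --cap-add SYS_PTRACE \
           netdata/netdata
         # then open http://<server-ip>:19999, kick off the transfer, and watch per-core CPU,
         # interrupts/softirqs, and disk utilization for whatever spikes first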
  16. If you've 30-40 minutes free today, we can do a quick remote session and take a look if you'd like? We can probably sort out the cause in 10-15, but buffer never hurts. Shoot me a DM if you'd like, I'll be around off and on throughout the day 👍
  17. I honestly don't see them including it in the base OS, instead opting to better support ZFS datasets within the UI for normal cache/share/etc operations. e.g. you'd be able to create a zfs cache pool, but only if the optional zfs driver plugin is installed; otherwise, only btrfs and xfs are listed as options. That'd be how I'd do it anyway - forego any potential legal risk (no matter how small), limit the memory footprint by leaving it out of the base for those who don't need/want it, and numerous other reasons.
  18. The last time I tried it was about a month and a half ago, and at least at that time, it was all sorts of borked. The QEMU folks are working on it though - the first kernel with support built in was 5.18, so it's not even fully supported by the hypervisor yet (6.10.3 is still 5.15; 5.18 still seems to have some 'teething' issues last I'd heard). And just to be clear, by 'fully supported', I'm referring to the various alder lake specific components (IPI/thread director/et al) - doesn't mean it 'won't work', just that it 'won't function explicitly as designed'.
  19. Absolutely this - also, can confirm it was 10th gen, with the intro of Xe graphics for their integrated GPUs. Intel had already laid the groundwork to support Xe with the open source community (including working with ffmpeg) prior to release to help ensure a "smooth" launch. Plex's tweaks to it were out of tree and significantly behind current though, so they received none of those fixes/enhancements. I assume it has to do with their attempts to lock down transcoding to paid plex pass holders. Same issue with their SQLite implementation - they've hacked the hell out of it such that it's basically unsupportable by anyone but them... which means if it goes paws up, you're SOL when it comes to many manual DB repair options. I hate that plex has become the de facto media streaming tool given all this, and wish I could convince enough of my users to move away from it that I could stop supporting it 😓
  20. @calvados the next time this happens, some other things to look at:
     * Autosnapshot tools - sanoid/syncoid/auto-snapshot.sh, anything that automatically handles snapshot management and is currently configured on the system can cause this. If you have one, kill *that* process (e.g. sanoid) first, then retry.
     * Intermittent commands - you can try "umount -l /mnt/path" to do a "lazy" unmount; pretty commonly needed for a variety of reasons.
     Glad you got it sorted for now!
  21. @calvados can you check for anything in use on the pool? e.g.: ps -aux | grep cxUrPool - short of this, you can also try setting the mountpoint to legacy.
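     For reference, the sort of checks I mean - the mountpoint/dataset below are guesses at your layout, so adjust as needed, and lsof/fuser may need installing via NerdPack first:
         # anything still holding files open on the pool?
         lsof +D /mnt/cxUrPool 2>/dev/null | head
         fuser -vm /mnt/cxUrPool
         # or take the auto-mount out of zfs' hands entirely
         zfs set mountpoint=legacy cxUrPool/somedataset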
  22. Is your nginx (swag/nginx-proxy-manager) container listed as a trusted proxy in your config.php file? That's usually where most people who make it this far get hung up.
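     For anyone hitting this, the relevant bit of Nextcloud's config.php looks roughly like the below - the path assumes the linuxserver.io layout and the subnet is just an example (use whichever docker network your proxy container actually sits on):
         # check what's currently set:
         grep -A3 "trusted_proxies" /mnt/user/appdata/nextcloud/www/nextcloud/config/config.php
         # the entry itself, inside config.php:
         #   'trusted_proxies' => ['172.18.0.0/16'],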
  23. Aside from everything else, doesn't someone else's ownership of the unraid.<anything> domain constitute a problem (if not at least an area of concern) re: trademarks? Or how would all that work from a legal perspective?
  24. Please don't take this the wrong way, and I'm coming at this from a place of respect in helping you protect your data, but if you're not willing to do the legwork to research / find things of this nature, zfs might not be best suited for you... ZFS is super powerful, but it can also be quite dangerous if deployed without proper care, or at least a willingness to seek out answers oneself. Again, I understand you may be strapped for time, and I don't mean this as any kind of slight, not at all. Just that you'll need the willingness/time/patience to search out your own answers in order to be successful with zfs in the long run. Apologies in advance for any offense, none meant!