Everything posted by BVD

  1. @ljm42 Sorry to bug ya with a seemingly unnecessary follow-up, but I wanted to ask - do you know, or happen to recall, how this was initially sorted out? Debug's annoying, and pktt felt like crazy overkill, but I honestly couldn't think of any other way around it, and am wondering if there's some other, simpler method that would've pointed to the answer (initially I mean, prior to it being added to Tips and Tweaks). Thanks for your time!
  2. Since it's all bash, I'd increase syslog's logging level to debug and re-test, then evaluate based on what you see there. If you haven't attempted rebooting the host yet, that's worth a shot prior to investing the effort into debug - there are times where something in memory isn't updated to reflect config changes, which a reboot would then fix. Not common, but worth ruling out. If both of those come up empty (e.g. you find it's something network related), your next bet is unfortunately likely a packet trace (ugh). It's only been about 24 hours, and we're mostly volunteers here - we do what we can, but sometimes it takes us a bit. A little patience goes a long way ❤️
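     In case it helps, here's roughly what I mean by bumping the level (a sketch - assumes stock rsyslog, and the rc script path is from memory, so double-check it on your build):

       # route everything at debug severity and above to its own file
       echo '*.debug    /var/log/debug.log' >> /etc/rsyslog.conf
       # restart rsyslogd so the new rule takes effect (or just reboot)
       /etc/rc.d/rc.rsyslogd restart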
  3. @sonofdbn glad you got it sorted, and happy to have been able to help! 🎉
  4. Can you expand on what you mean (what the "this" part is)? How the plugin works, or snapshots? Or maybe the filesystem itself and how snapshots work within that filesystem? Not sure. If you're unfamiliar with advanced filesystems specifically, or just haven't worked with them previously, the first post on the zfs plugin support page has some handy reference links towards the bottom that should help as a primer.
  5. Did you ever track this down to determine the cause? I don't have any systems which exhibit this behavior, but I'd be curious to dive into it if you haven't already and are interested... Fuse is a HUGE pain in the nuts to troubleshoot from a performance perspective, but maybe we could at least track it down to a specific set of variables on the HW/network side where this ends up being seen (e.g. typically occurs on systems with a low max frequency, turbo boost disabled or unavailable, and dual CPUs, just for instance), and maybe even pin down what fuse is doing during the time that's taking so long (enable debug tracing, check out what the fuse, kernel, and filesystem APIs are doing, which calls those APIs are making and which are taking the longest, perhaps get a flame graph to make visualizing that part a bit easier - see the sketch below). I'd guess the reason a simple `mv filename /location` is slower than copying over SMB has to do with the fuse interrupt queue getting thrashed - with mv being a single-threaded, no-optimizations type of tool whose thread is on the same server doing the fuse ops, you're almost certainly going to have more interrupts than with SMB, given its massive history of perf optimizations and the fact that those optimizations were designed for exactly our use case (network file transfers). I could see how with *either* a limited number of cores (causing high IRQ load from the kernel side) or lower frequency (fuse's interrupt queue stacking up waiting for the CPU to finish its work with the filesystem/etc.), SMB could be more performant. ... I'm rambling. It's late. Anyway, hoping to hear your thoughts!
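     If you do end up wanting to poke at it, here's roughly the flame graph part (a sketch - assumes perf plus Brendan Gregg's FlameGraph scripts are on hand, and that shfs is the fuse process in question):

       # sample stacks from the fuse process for 30s while reproducing the slow mv
       # (-s grabs a single pid in case there's more than one shfs process)
       perf record -F 99 -g -p "$(pidof -s shfs)" -- sleep 30
       # fold the samples and render a flame graph to see where the time's going
       perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > shfs-fuse.svg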
  6. Why'd you use sudo? All that should be required is just the command 'updater.phar' - to rule out permissions issues having been introduced, I'd restore the app and DB from backup and re-run normally, just in case. If you're looking for further help, you'll want to increase the logging level of nextcloud from the cli, restart the container, then post up what you're seeing.
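     For the logging bit, something like this is all I mean (a sketch - occ's path and the user it runs as vary by image, so adjust to yours):

       # from inside the container: set nextcloud's log level to debug
       php occ log:manage --level debug
       # then watch the log while re-running updater.phar
       tail -f data/nextcloud.log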
  7. This sounds like a space issue - you're at something like 0.4% free space, and even if that equates to multiple GB, you have to account for the fact that the filesystem has journaling/maintenance tasks that use your free space to do their work. Any filesystem (and each disk contains a single unique filesystem - unraid just presents the array as a single namespace across them) starts acting more and more wonky the closer you get to capacity; even for write-once-read-many data like you'd typically put on the unraid array, 3% is the bare minimum I'd want free (5% is better). Add more capacity, and I'm near certain your issue will go away.
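     If you want to see exactly where each disk sits before adding capacity, from the unraid terminal:

       # per-disk usage across the array
       df -h /mnt/disk*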
  8. The power consumption one is something I thought was already there, but when I went to look for it, I didn't see it immediately off-hand. The second is the more interesting one to me honestly - I see a TON of people talking about idle power consumption, and I'm just wondering how much time people actually have these servers idle. For myself, I went back 96 hours, and there's only 2 hours in there that I'd call 'idle'. There's almost always at least one plex transcode going, tdarr will often see new media and get cracking away on converting it, Nextcloud is now up to 7 users and there's usually someone working on a document, sharing files, checking their email, or syncing something with their PCs/phones - just tons of stuff. I'm wondering whether 'idle power consumption' is a thing folks actually see (and how much) - even if my system utilization is above 'typical', it makes me wonder if we shouldn't instead be thinking about the system's efficiency (instructions per clock and the like) vs 'mine idles at 30w'... For the record, if I saw 30w on mine, it'd mean something's broken/misreporting 😅
  9. +1 for this; one of several things (some of which are admittedly self-inflicted lol) keeping me from heading to 6.10. My eth0 is a shared IPMI/host onboard, and I don't use it for anything other than IPMI as it's 1Gb and vlan'd off at the switch level anyway, instead accessing everything over eth1's 10Gb interface.
  10. I'd like it better if they sorta threw them and the docker logs in with the syslog (separate files of course) so you could just have them *all* sent to the configured syslog server.
  11. This should be moved to the support section IMO - this sub-forum is for guides created to assist with different components of unraid.
  12. @Partizanct That was actually intentional on my part 😅 This is primarily because there's precious little that applies to 'everything'. For instance, everyone says 'ashift=12' for the pool, right? But that means our physical-layer block size is set to 4K, and there's a huge amount of NAND out there that's 8K, even some that's 16K, leaving a lot to be desired. Or what about setting dnodesize to auto? This is great, but it really works best with xattr set to sa, and if you're not accessing the data primarily over NFS/iSCSI/SMB, you could actually lose (not much, likely, but some) performance. Heck, setting xattr to sa also means that pool is Linux-only now, losing the portability to BSD kernels (and others). I'd hate to recommend something like that too broadly, and then have a user find out years later, when they try to move the pool to some hot new BSD-based system with all the new bells and whistles, that they simply can't - because some guy online said it was a good idea and they never looked any further into it, right? Better that those values get researched, their implications understood, and folks choose what's best for them and their specific situation. Recommendations differ for HDD vs NAND as well.

      The other part of my reasoning goes back to what I feel is required for someone to be successful with ZFS (will to learn, ability to research, and time to invest in both). For this one doc at least, the idea isn't to give someone an all-inclusive summary of the best way to use ZFS on UnRAID overall, but to spark the something that gets them into the game if they read it and find themselves thinking 'this could've saved me hours last week on X, I wonder what else it can do...' I do give more explicit detail where possible though - for instance, postgres has its fileset configuration laid out, with explanations of why for each, same as I hope to continue doing for each other app as I find time to translate them from my deployment notes to the docs github.

      I mentioned there's precious little I'd say applies globally, but what does boils down to:
      - atime = off
      - compression = at least something more than 'off' - again, whether to use lz4 or zstd still kinda depends; if someone were using the old Westmere or Nehalem procs, lz4 is probably it for them

      Everything else has sane defaults for most systems, with recommendations for specific deployment needs... I'm sorry in advance - I know this isn't super helpful in and of itself! I just hope my reasoning on why I did it this way makes some kind of sense at least!
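      In command form, the two I'd apply everywhere look like this (a sketch - the pool/fileset name is hypothetical, substitute your own):

        zfs set atime=off tank/appdata
        zfs set compression=lz4 tank/appdata    # or zstd, CPU permitting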
  13. This sounds like a DB or data consistency issue, especially given it happened after a reboot (I assume you had to do a hard reset). Have you tried restoring a backup and starting the container pointed at that backup (restoring both the DB and the nextcloud appdata is a requirement here, as both are dependencies), and if so, does it fail at the same point? If it does, I'd start looking into methods to verify the integrity of your database, as well as the troubleshooting documentation from nextcloud re: their appdata dir and DB indices.
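      A couple of the usual first checks on the nextcloud side (a sketch - run occ however your particular image expects):

        # sanity-check/repair nextcloud's internal state
        php occ maintenance:repair
        # add any DB indices nextcloud knows it's missing
        php occ db:add-missing-indices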
  14. Have you checked permissions on the files it's reporting issues accessing? Confirmed that the container is running as the expected user account?
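      e.g. from the unraid terminal (container and path names hypothetical):

        # who owns the files the app is complaining about?
        ls -ln /mnt/user/appdata/nextcloud
        # and which uid/gid is the container actually running as?
        docker exec nextcloud id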
  15. Pull up the container's console and check the available storage from the container's side - just 'df -h' will show you something like:

        Filesystem      Size  Used  Avail  Use%  Mounted on
        overlay          80G   30G    51G   37%  /
        tmpfs            64M     0    64M    0%  /dev
        tmpfs            24G     0    24G    0%  /sys/fs/cgroup
        shm              64M     0    64M    0%  /dev/shm
        shfs             91T   61T    30T   68%  /data

      If nothing there is showing near full, do the same from your unraid terminal. Depending on what's full, it may be as simple as a restart of the container, up to possibly remounting a tmpfs volume with a larger capacity specified - whatever the case, finding out what's full should give you the breadcrumb trail needed to research and correct it. If nothing is legitimately showing 'full', I'd increase the nextcloud logging level and reattempt. Edit - this assumes you've verified all your array disks are set as available to the share mounted to the container's /data dir, of course, though I'm sure you've covered that lol. Doesn't matter how much free space you have in the array if the nextcloud container's share isn't allowed to use 'em all hehehe
  16. @Partizanct (and @Marshalleq if you're interested of course, more the merrier!) would you have time to give this a once-over? Why would we want ZFS on UnRAID? What can we do with it? This is much less a 'technical thing can technically be done X way' doc than a 'here's why you might be interested in it, what problems it solves, and in what ways'. Given that, and that these types of reference material can often be interpreted in numerous differing ways by different folks, I just want to make sure it's at least coherent, without going so deep into the weeds that someone newer to ZFS would just click elsewhere after seeing the Encyclopedia Britannica thrown at 'em as their 'introduction' lol. Open to any and all feedback here - again, this isn't supposed to get super technical, and has the unique goal of explaining why someone should care, as opposed to the rest of them, which go over how to actually do the stuff once you've decided you *do* care enough to put forth the effort - so there's no such thing as 'bad' or 'useless' feedback for this type of thing imo. Anyway, thanks for your time!
  17. 114k tracks, you friggin MONSTER YOU!!! I'm super interested in taking a look at this, it sounds like an exciting challenge to me 🎉. Maybe start by getting some measurements to quantify the slowness, just something you can easily reproduce (like 'when I go to the activity page, it takes X seconds to load', stuff like that), then incrementally test a few things to see what we come out with. When you say super slow with operations... Are you using lidarr extended, referring to those scripts maybe, or do you mean the basic lidarr maintenance stuff? With that many tracks, what's your sqlite DB size? ~2GB or so maybe? At that size... Man, there's a bunch of additional stuff to think about - the WAL size, the compile-time parameters for max pages before a checkpoint, hell, even the option to cache 'all' instead of metadata only as a last resort lol. Few starting points:

      1. You want to vacuum that sucker regularly, especially after mass media changes - I'd start with this, especially if you've never done it before. With the container stopped of course, but just cd to the dir and then:
         sqlite3 lidarr.db 'VACUUM;'
      2. Definitely try the txg modification from the postgres side of things - as long as all your pools are redundant (z1/z2/z3/mirror), make the change, then start the container and evaluate:
         echo 1 > /sys/module/zfs/parameters/zfs_txg_timeout
      3. In order to avoid NUMA issues, use lstopo, then pin the container to *just* one NUMA node's worth of cores (see the sketch at the end of this post) - at the very least, ensure only one CPU can operate the container's threads. Otherwise, you're certain to hit a significant IRQ penalty due to context switching...

      With the lidarr devs choosing to compile sqlite with the default variables (and hence, the DB parameters set at creation time are 'sqlite default'), we get hit multiple times by constraints that simply don't account for such massive DBs. For example, the default max number of pages prior to checkpointing the DB (a very 'expensive' operation) is 1000, and with a maximum page size of 64K, that means every 64MB encounters a checkpoint. Assuming you're at ~2GB, you've got a minimum of ~30 checkpoints if you were to 're-write' the database, which is bonkers.

      That's probably about as much as I'm comfortable recommending without actually looking at the thing - everything from modifying the zfs dataset to remove sa xattr and go back to posixacl (they do have the potential to incur a performance penalty, and aren't necessary if you never access the data outside of the terminal shell of your unraid server - and I mean 'ever', so please, anyone else reading this after the fact, please leave xattr set to sa unless you know you know better!) to throwing the database in a dedicated zvol, on down to crazy stuff like recompiling sqlite3 within the container image and rebuilding the database with new parameters to allow us to modify things like the sync behavior, that checkpoint option mentioned above, and a bunch of others - they're all on the table 😁 Anyway, just lemme know what the outcomes are like if you try any of the above, and/or if you'd eventually like to take me up on the second set of eyes; you can probably tell, but I'm eager to take a whack if you end up being game for it hehehe.
      _____
      ioztat is basically 'iostat, but at the zfs fileset level' - super helpful when trying to track down latency issues especially. I briefly touch on it here
      _____
      As for the variable page size - this is absolutely true. However, there are a lot of other factors at play here as well - for instance, as you rsync'd the data in bulk, everything was written in 64K chunks, because zfs had 'one big-ass file' (your lidarr.db) to write at once and could fill those 64K blocks you set the lidarr fileset to use (also meaning 'right after the copy' was the most performant it'd ever be with these settings, by proxy). On top of that, the extended attributes of a file can be anything from 255 bytes up to 64KB in linux, so with xattr=sa, each file we've got extended attributes for carries this linux metadata (which zfs thinks of as 'just data', and keeps its own metadata for, as it's contained within the file itself now), and *that* almost certainly won't be a full block... My one concern with the rsync is whether those pages line up block-wise with the records written. Correcting this is as easy as vacuuming the DB though, so the options above will cover that 👍 Anyway, this topic could get super long. Suffice it to say 'what you've read is very much true, but is wicked easy to take out of context, and often is taken as such by otherwise trusted online sources' lol.
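      To put numbers on point 3 above (a sketch - container name and core list are hypothetical; check lstopo for your actual topology first):

        # see which cores share a NUMA node
        lstopo
        # pin the lidarr container to one node's worth of cores (0-7 here)
        docker update --cpuset-cpus="0-7" lidarr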
  18. Keep in mind, while filesets can be renamed (and hence, their directory structure changed) on the fly, the existing data will still have been written under the prior fileset configuration. If you change things like recordsize, xattr - anything about the data itself - you'll need to copy the data off and back (or send it to a new fileset with those parameters) in order to apply that configuration to the data, as it's only applied at write time. ... I think I'm going to write a doc going over some more generalized zfs information at this point. I'd resisted the urge, as there are a couple detriments in my mind to doing so:

      1. ZFS, while powerful, isn't for everyone - one has to have the desire/drive (and just as importantly, the TIME) to do some of their own background research and learning, or it's super easy to have a much worse experience with zfs than with the alternatives (or worse yet, to cause yourself massive suffering by copy/pasting something you saw online).
      2. There's already so much out there on the basics; adding more stuff to the main page could mean people just skip reading anything there altogether - a short 'here's what you need to learn further' is more likely to be read than more than is absolutely necessary.

      Both of these, though, I think aren't necessarily going to be a problem here - I was helping out another fellow forum member with some nextcloud issues, and I'd asked if he'd be interested in proof-reading some of it for me once I got the time to write it, and he seemed amenable to the idea (we got his data back, AND his nextcloud instance running again - WOOOT!!). I just didn't know how useful it would be... But it sounds like there would be enough use to serve a purpose at least ❤️
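      The copy-off/copy-back bit can also be done with send/recv rather than at the file level (a sketch - names are hypothetical, and double-check that your zfs version supports recv -o):

        # snapshot the old fileset, then receive it into a new one with the desired properties
        zfs snapshot tank/appdata/lidarr@migrate
        zfs send tank/appdata/lidarr@migrate | zfs recv -o recordsize=64K tank/appdata/lidarr-new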
  19. Did you copy the data out and then back over after setting up the fileset? And what part of lidarr is slow - e.g. is it just general browsing, or a specific set of pages? Could you share your zfs get output for lidarr, and maybe check the output of ioztat as well to see what its IOPS usage looks like? Finally, how many songs+artists are we talkin? I think you might be talking about the dnode size setting (re: variable records) - this is something I haven't covered in the guides/docs yet, but maybe I should... You may also be talking about how it treats multiple inodes from differing objects when writing a record though, not entirely sure. I could take a look with you at some point if you'd like (we can get a webex going or something after taking this to DMs) - I wonder if maybe you're hitting a frequency limitation (sqlite is single threaded, so boost speeds are important). You might also disable the zfs txg timeout value I'd noted in the postgres guide; it'll help with anything DB related, but should only be used if all zpools on your system are redundant (no stripe-only pools). If this doesn't clear it up (it can be unset after the fact, it's a kernel-level change), I'd want to start looking at what our interrupt counts are like during slowness periods, arc hit rates, and so on. There's no reason a 1950x shouldn't be able to make lidarr with sqlite at least usable, even with ~40k songs (that's how many I've got currently, so it's the most I could comment on for now). Just lemme know - it sounds like audio's a pretty big deal for you, and I'd be happy to poke at it a bit with you, maybe get a better experience out of it for you.
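      For the zfs get piece, something like this covers the properties I'd want to see (fileset name hypothetical):

        zfs get recordsize,compression,atime,xattr,primarycache tank/appdata/lidarr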
  20. I actually do that in the guides, setting up the recordsize and explaining why it's set to what it is - it's equally important for the sqlite applications (like the 'arrs), where we set the page size to the max (64K) and configure the fileset to match. Only way to make sonarr history viewing any kind of performant with tens of thousands of shows 👍
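      For anyone after the mechanics of matching the two (a sketch - paths/names hypothetical; stop the app first, and note the new page size only takes effect once VACUUM rebuilds the DB):

        # (a WAL-mode DB needs 'PRAGMA journal_mode=DELETE;' first for page_size to change)
        sqlite3 sonarr.db 'PRAGMA page_size=65536; VACUUM;'
        zfs set recordsize=64K tank/appdata/sonarr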
  21. I tried to allude to this in the guides themselves by first noting the fileset specifications recommended for each given application, but I guess it may not've been clear - I use a specific fileset for each application. Not only is this the only way to actually tune zfs for each individual application (as you're applying filesystem-level features specific to those applications), but it's also the only way to fully take advantage of the snapshot and backup features in a meaningful way - this way, you can apply specific snapshot requirements to your DBs that you don't necessarily need for your more static applications. As a for-example, I've attached my zfs list output; filesets natively inherit the settings of their parents, so if you've got a bunch of apps with similar requirements, you don't have to keep manually setting them each time. My 'static-conf' filesets, which are almost never changed, the 'vms' fileset, which has all the img files, etc. - then just customize the few small changes needed for the others (...speaking of which... reminds me I need to move postgres over to wd/dock/dep - where all my databases [dependencies] are lol).
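      As a rough sketch of how that inheritance plays out when creating the hierarchy (pool name from my layout, settings just illustrative):

        # parent carries the common settings...
        zfs create -o atime=off -o compression=lz4 wd/dock
        # ...children inherit them, overriding only what differs
        zfs create wd/dock/static-conf
        zfs create -o recordsize=64K wd/dock/dep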
  22. If you apply the sqlite tuning, I'd argue that postgres isn't really required. Even with all the media I've stored, everything loads in less than a second after applying the tuning, along with the benefit of having all the container's data in a single location. To me, the added complexity isn't worth the negligible benefit (not to mention the additional request load on PG that could instead be used for more intensive applications such as nextcloud).
  23. Did you both go through the guide to optimize sqlite? None of the 'Arrs actually need postgres IMO, excepting possibly lidarr.
  24. I use postgres as the application database for everything that both supports (and actually needs) a full fledged DB actually. What do you mean?
  25. Happy to help! I've been at a cabin with limited connectivity the last 3 days or so, but I'm returning to civilization tomorrow and have a couple more days' worth of updates planned, so hopefully there'll be a bit more somewhere within the updates that'll help with some other parts of whatever you're workin on as well 👍