drumstyx

Everything posted by drumstyx

  1. Hooray! You did it! What a weird problem, lol. Now my first-world problem is that my internet is actually faster than my SATA drives (not to mention the disk shelf they're in is a DS4246, so even turbo write peaks at about 85MB/s), so I can't rely on the mover to keep up with data intake. Maybe I can throttle nzbget at a certain cache usage threshold...
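     Rough sketch of what that throttle could look like, run on a cron schedule via the User Scripts plugin -- untested, and it assumes nzbget's JSON-RPC interface (pausedownload/resumedownload called via the /jsonrpc URL); host, port and credentials are placeholders:

        #!/bin/bash
        # Pause nzbget downloads when the cache pool fills up, resume once mover catches up.
        # Host, port and credentials are placeholders -- adjust for your setup.
        NZB="http://nzbget:password@127.0.0.1:6789/jsonrpc"
        CACHE="/mnt/cache"
        PAUSE_AT=90    # % used at which to pause
        RESUME_AT=70   # % used at which to resume

        USED=$(df --output=pcent "$CACHE" | tail -1 | tr -dc '0-9')

        if [ "$USED" -ge "$PAUSE_AT" ]; then
          curl -s "$NZB/pausedownload" > /dev/null
        elif [ "$USED" -le "$RESUME_AT" ]; then
          curl -s "$NZB/resumedownload" > /dev/null
        fi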
  2. Fair enough -- I've attached diags. I'm not quite sure where to look in them for this issue. As far as I can tell, there have been no dropouts or errors on the cache other than a failure to write due to insufficient space. datnas-diagnostics-20221230-1334.zip
  3. Right, but why is there a filesystem issue when it's "full"? Isn't the global reserve supposed to prevent this? I guess I didn't have minimum free space configured on the cache, so my bad on that, but filling a filesystem shouldn't result in an unresolvable readonly state. If I accept the risk of some corrupted data in whatever was last written, is there any way to just say "to hell with integrity, delete this file"?
  4. Long story short: I upgraded to 3Gbps fiber internet and accidentally filled my 2TB cache pool (dual 2TB NVMe drives) with the automated download processes. The filesystem went readonly, so I tried a reboot, and of course it came back readonly. On top of this, a disk in my array happened to fail a couple of days ago and I didn't notice, so I'm now rebuilding onto a warm spare I had in there to replace it.
     Problem is, now I can't run mover, and even after copying data off manually, I can't delete anything to free up space. The global reserve is at 512MiB and says 0 is used, but it's still not happy. The rebuild means I can't go into maintenance mode right now. I'll wait it out if I have to, but it'd be awesome if I could figure out how to delete even a single file from the cache to free it up a bit. Scrub doesn't work (readonly), balancing doesn't work (readonly)... seems like I'm stuck? It's frustrating because I went with a dual-drive btrfs cache specifically to prevent issues like this, and the suggestions I see on here boil down to "best bet is to back up the cache, reformat, and rebuild it".
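     For anyone who finds this later: these are the sorts of commands that get suggested for a full/readonly btrfs pool. I can't vouch for them in this exact state, so treat it as a rough checklist rather than a fix:

        # see where the space actually went (data vs metadata vs global reserve)
        btrfs filesystem usage /mnt/cache

        # try to get it writable again -- only works if btrfs will allow it
        mount -o remount,rw /mnt/cache

        # if that works, free space without allocating much new data first...
        truncate -s 0 /mnt/cache/path/to/some/large/file
        rm /mnt/cache/path/to/some/large/file

        # ...then reclaim empty/near-empty data chunks once there's a little headroom
        btrfs balance start -dusage=5 /mnt/cache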
  5. Does this happen to have tools that used to be included in the old devpack plugin? Awesome to have an equivalent for nerdpack, but devpack was also indispensable for some uses! Seems to be a VERY similar codebase, so likely not a difficult port, if you're up for it!
  6. config/stop runs very early in the shutdown process -- not sure if you meant to quote me where I said "To run things LATE in the shutdown process is hard" there, but it holds true. It works great for various full-stack cleanup, or external notifications/hooks, but is technically not the best place for commands that affect the status of the server hardware (like my much earlier posts about turning the AC power off at the plug).
  7. The stop script is just about the earliest you could want anything to run. Shutdown is initiated with /sbin/init 0, which sets the runlevel to 0, at which point the absolute earliest hook is the first line of /etc/rc.d/rc.0 (after "#!/bin/bash", of course). The number after "rc." in the script name is the runlevel: runlevel 0 is shutdown (officially "halt") and runlevel 6 is reboot. In Unraid, rc.0 is just an alias for rc.6, and one script handles both cases by checking which runlevel it was called as. The only scripts that run before /boot/config/stop are those in /etc/rc.d/rc0.d/ or /etc/rc.d/rc6.d/ (depending on runlevel), which at this point contain only the flash backup from the My Servers plugin, if applicable.
     All that to say: running things EARLY in the shutdown process is easy -- /boot/config/stop. Running things LATE in the shutdown process is hard, and would involve either modifying the base Unraid image on flash, or adding something to the go script that modifies rc.6 to your liking on boot (either copying in a complete replacement file, which would have to be updated manually with each OS update, or some awk/sed/bash magic to modify the last if statement in rc.6).
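     If anyone actually wants to go down that road, something along these lines in the go script is the sed/bash-magic version -- untested, the hook script name is made up, and the pattern assumes rc.6 still calls /sbin/poweroff near the end (check with grep -n poweroff /etc/rc.d/rc.6 first). The hook gets staged in RAM because /boot will likely already be unmounted by that point, and with networking gone too it can only do local things:

        # in /boot/config/go: stage the hook in RAM and patch the in-RAM copy of rc.6 on every boot
        cp /boot/config/my_late_hook.sh /usr/local/bin/my_late_hook.sh
        chmod +x /usr/local/bin/my_late_hook.sh
        if ! grep -q my_late_hook /etc/rc.d/rc.6; then
          sed -i '\|/sbin/poweroff|i /usr/local/bin/my_late_hook.sh  # my_late_hook' /etc/rc.d/rc.6
        fi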
  8. I've experienced various hardware failures over the years, and they've been a pain in the arse! Not the fault of Unraid, just because I didn't take all the precautions necessary -- failed cache, failed drives, failed USB. Thing is, I can mitigate almost everything by just having some redundancy -- dual parity, cache pool, etc. -- which I've now done. Got this thing almost bulletproof, but for one thing: a USB drive failure.
     Is it possible, or are there any plans to make it possible, to have a boot pool or some redundancy for the boot drive? I know that, in theory, the boot drive is only needed for power-on and power-off (actually, not even sure it's needed for power-off?), but I do write a rotating syslog to my boot drive because it's very useful for debugging array and system bugs. I could write to an unassigned device or the cache, but the boot drive is guaranteed to exist as long as the system is running, and if any of those other drives drop because of a system-wide failure, I don't want to lose the logs. I'm willing to sacrifice a boot drive every few years, but I'd REALLY like that not to cause an outage that requires me to be physically present to fix.
  9. Long story short, I had a cache drive failure (ADATA NVMe SSD, quite a surprise) and after recovery I'm looking to improve reliability. I've picked up a 2TB Samsung NVMe drive to complement the ADATA warranty replacement when it arrives (also 2TB). Thing is, I know the process when an array drive fails, but I have no idea what happens when a BTRFS drive fails in a pool. I've read some horror stories about a failed drive causing an unrecoverable situation. Of course I back up important stuff to the array (and important stuff on the array is backed up offsite), but recovering from that is still a pain in the butt. So, can anyone walk me through what a BTRFS cache pool drive failure looks like?
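     From what I've pieced together so far (please correct me if I've got this wrong), the raw btrfs side of replacing a failed pool member looks roughly like this -- as I understand it, the Unraid GUI normally handles it by stopping the array and assigning the new device to the pool slot, so this is more for understanding than a how-to, and the device names and devid are placeholders:

        btrfs filesystem show /mnt/cache                 # note the devid of the missing/failed drive
        mount -o degraded /dev/nvme0n1p1 /mnt/cache      # only needed if the pool won't mount on its own
        btrfs replace start -f <devid> /dev/nvme1n1p1 /mnt/cache
        btrfs replace status /mnt/cache

        # alternative: add the new device, drop the missing one, then rebalance back to raid1
        btrfs device add /dev/nvme1n1p1 /mnt/cache
        btrfs device remove missing /mnt/cache
        btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/cache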
  10. Does Unraid work fine running on cores other than 0 these days? I remember when I first started playing around with pinning/isolation years ago, Unraid didn't play nice if I pinned/isolated core 0, even if I left, say, core 3 (the last core on a quad-core CPU) entirely untouched. So I've always just let Unraid have the first core, which is no problem with homogeneous CPUs. If I went 12th gen, though, I'd really hate to give up a P-core if I could hand it an E-core for its minor management stuff.
  11. Got it -- runlevel is 0 for shutdown, 6 for reboot, and 3 for normal operation. For posterity, here's my hacked-together script for it:

        #!/bin/sh
        currentRunLevel=$(runlevel | awk '{print $NF}')
        echo "*********** THE CURRENT RUN LEVEL IS ***********"
        echo $currentRunLevel
        if [ "$currentRunLevel" = "0" ]; then
          snmpset -v 1 -c private 192.168.42.20 1.3.6.1.4.1.318.1.1.12.3.3.1.1.4.1 integer 5
          snmpset -v 1 -c private 192.168.42.20 1.3.6.1.4.1.318.1.1.12.3.3.1.1.4.2 integer 5
          snmpset -v 1 -c private 192.168.42.20 1.3.6.1.4.1.318.1.1.12.3.3.1.1.4.5 integer 5
        fi

     The echoes at the top were just for me to verify that the runlevel has already changed by the time the script executes, and indeed it seems it can be relied on.
     To add: this script runs pretty much immediately when a shutdown is triggered (at least from the GUI), BEFORE anything is unloaded. Technically that means it's not the ideal spot for it, but since Unraid force-kills things after 90 seconds (at least, that's what I assume when it says it's allowing 90 seconds for graceful shutdown), my timers getting triggered over SNMP are fine at 180 seconds. The only downsides to the timeouts are that coming back up after a power loss is a rather long process (I've got a Raspberry Pi that listens for the UPS OFFBATTERY state and brings things back, but it needs to wait longer than the timeout to make sure the server isn't still trying to shut down), and that power is drawn from the UPS for longer than strictly necessary -- though only by a minute or two. You just can't make the timings too tight, otherwise any hangup during shutdown could kill the power too soon.
     EDIT: This isn't a thread about SNMP specifically, but I thought I'd note what the heck those long strings are: OIDs for the outlets on my PDU. APC provides MIB files to load into an MIB browser, and with some digging you can figure out how to control various things. It's a pain, not to mention archaic, but it's neat once it's set up. I'm positive I'll hate myself if I ever move outlets around and forget I set this up, but for now it's a blast!
  12. Come to think of it, this will probably run on both reboot and shutdown -- if the outlet timeout is long enough, and the scripts robust enough, it shouldn't be a problem for functionality, but it could slow down reboots considerably. Any ideas as to how I might ensure it only runs on shutdown?
  13. Ah, this could be perfect -- do you know if it runs before or after networking is unloaded, or can you point me to what script calls the stop script? The ideal place to put this script is RIGHT before the poweroff call, just like a UPS killpower command, but just like the UPS killpower command, this won't work over SNMP because networking is already unloaded by then. Annoyingly, filesystems appear to be unmounted after stopping networking, so the only real option is to have a long poweroff delay on the outlets and send the request at the beginning of the shutdown. Probably fair to assume it's before the rc.0 script is called at all, so I'll give it a shot!
  14. Looking to run a script that shuts down my disk shelves on Unraid shutdown. I've configured poweroff delays on my switched PDU, and I'm sending SNMP commands to the PDU to make it work. Problem is, I'm using /etc/rc.d/rc.0 as the script location, and that's in the RAM disk, so it doesn't persist across reboots. There used to be a powerdown script, but that's apparently been deprecated, and rc.0 just calls /sbin/poweroff, which is a binary. Basically, I'm looking for the shutdown equivalent of the go script.
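     For anyone landing here later: the answer turned out to be /boot/config/stop (see the rc.0/rc.6 breakdown a few posts up). It lives on flash, so it persists across reboots, and runs early in the shutdown sequence. A bare-bones starting point:

        #!/bin/bash
        # /boot/config/stop -- runs near the start of Unraid's shutdown/reboot sequence
        logger "stop script: runlevel is $(runlevel | awk '{print $NF}')"   # 0 = shutdown, 6 = reboot
        # snmpset / cleanup commands go here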
  15. I'm trying to fine-tune my UPS power-down and power-restored workflows. I run 2 DS4246 disk shelves, and they need to be powered up for a minute or two before Unraid boots to avoid panics about missing disks. I've got a switched PDU, and a UPS with an outlet group. The outlet group on the UPS powers non-critical systems that I want to come back after a power outage, not only my Unraid server, so I can't just switch that group off/on based on power status. So I've been working on switching the PDU outlets directly.
     Honestly, it's a long story, but the bottom line: I need to power up those 2 disk shelves when Unraid boots, then have it wait a couple of minutes for them to spin up before continuing with mounting the array. My understanding is that the go script runs asynchronously to array booting and mounting, so that's out, and User Scripts only specifies "at" array start, not whether it runs before or after. What are my options here?
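     One idea I'm toying with (untested, and it assumes the array only autostarts once the go script launches emhttp): power the shelf outlets on, then gate the emhttp line in /boot/config/go behind a wait for the shelf disks to show up. The outlet OIDs, command value and disk count are placeholders:

        # /boot/config/go (excerpt)
        snmpset -v 1 -c private 192.168.42.20 <shelf-outlet-OID-1> integer 1   # immediate on -- check your PDU's MIB
        snmpset -v 1 -c private 192.168.42.20 <shelf-outlet-OID-2> integer 1

        EXPECTED=24    # total disks expected from both shelves
        for i in $(seq 1 60); do                                        # give up after ~5 minutes
          FOUND=$(ls /dev/disk/by-id/ 2>/dev/null | grep -c '^scsi-')   # rough count; adjust the pattern
          [ "$FOUND" -ge "$EXPECTED" ] && break
          sleep 5
        done

        /usr/local/sbin/emhttp &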
  16. Heck, I'd even pay for multiple licenses to make this possible -- I already own 2, but I don't want to run 2 whole machines and deal with network drive mapping for sharing media between them
  17. Update: I uninstalled almost all plugins, and when it settled down, I slowly reinstalled some. I think it *might* have been a misconfigured 'fix common problems' plugin.
  18. I've got a number of unassigned drives I keep connected as warm spares, as I got a good deal on a bunch of 8TB drives a while back. They're precleared, and I intend to keep them dead idle until I need to either expand or replace a drive, at which point I can do it remotely (I keep my server at a different location than where I primarily live). I had some trouble with the Unassigned Devices plugin, in that it wouldn't recognize unassigned drives connected via my HBA (all drives reside in a NetApp DS4246 shelf), so I uninstalled the plugin and am just using Unraid's native handling, which at least allows me to spin them up/down.
     The trouble is, I spin them all down and, seemingly at random, they come back! None of my docker containers have access to /dev directly, and I can't figure out what plugin could possibly be causing it. Does the mover maybe accidentally hit those drives when scanning for changes?
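     One thing I might try is logging spin state on a schedule to at least catch when they wake, then comparing the timestamps against the mover schedule and plugin cron jobs -- something like this (device names and log path are placeholders, and I'm assuming smartctl's -n standby really does avoid waking a sleeping drive just to ask):

        #!/bin/bash
        # log whether the warm spares are spun up; run every few minutes via cron/User Scripts
        for d in /dev/sdq /dev/sdr /dev/sds; do
          if smartctl -i -n standby "$d" > /dev/null 2>&1; then
            state="spun up"
          else
            state="standby"
          fi
          echo "$(date '+%F %T') $d $state" >> /boot/logs/spinwatch.log
        done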
  19. Quick question on this -- do I need to be careful with every Unraid upgrade now? Should I only upgrade through the plugin's settings tool, rather than the built-in tool?
  20. Necro-ing this thread to put a vote in for this. It's not a *huge* deal, but would be very nice. It basically completes the loop on all writes only ever happening when mover runs, and parity drives staying spun down 23 hours out of the day.
  21. Very nice, super handy! A few feature requests:
     - Ability to reorder disk tray layouts, even if just with an up/down button, or something as simple as an order field. I've got hotswap units in my main server (3x4 drives) and one of them is sideways, plus a DS4243 shelf, so I have 3 configs, and I accidentally ended up with a couple out of order. Minor issue, but it just looks strange to see 2 before 3.
     - Line breaks in the dashboard view and tools area.
  22. I've shucked 3 Elements drives, and all 3 had EMAZ drives, so I'm fairly certain the new ones will be the same, but I've heard the My Book enclosures have EZAZ, and I've heard some things about them. I thought EMAZ was air-filled, but looking at SMART, you're right, they're helium -- so I guess I'll just have to see what's in the EZAZ drive when it arrives. Maybe I'll keep one for kicks to see how it compares to the EMAZ drives.
  23. I've done a lot of confusing reading, and as I've got a bunch of My Books and Elements drives coming in the mail (and My Books tend to have EZAZ vs. the EMAZ in the Elements/Easystore devices), I'm trying to figure out which drives I should actually want. Some info points to one being helium-filled while the other is air-filled, some points to a lower cache on EZAZ, or to EMAZ being more likely to be a white-labelled Red, but I'm honestly confused -- which drives do I want? I plan to only shuck the ones with the drives I want.
  24. Running a parity check right now for the first time since loading 4 drives into my DS4243 drive shelf, and it's fairly slow -- around 45MB/s, compared with the previous average of around 100-125MB/s. Is the shelf itself something of a bottleneck? I'm using an LSI HBA, as recommended, and each individual drive is fine in terms of speed, but I'm thinking maybe the total bandwidth available from the shelf is only a few hundred MB/s? Is this to be expected? On that note, what happens when I start running 20-odd disks in this thing?
     Ah, there we go -- rebooted and I'm seeing 108MB/s or so. Much better! I wonder why...
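     For future reference, the back-of-the-napkin ceiling for a fully loaded shelf (assuming stock IOM3 modules, i.e. 3Gb/s SAS on a 4-lane wide port -- worth verifying on your shelf):

        4 lanes x 3 Gb/s = 12 Gb/s  ~= 1.2 GB/s raw, call it ~1 GB/s after protocol overhead
        ~1 GB/s / 24 drives         ~= 40-50 MB/s per drive with every bay reading at once

     So a parity check across 20-odd disks would likely be limited by the shelf rather than the drives; 4 drives shouldn't come anywhere near that ceiling, which fits with the reboot fixing it.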