Everything posted by 11rcombs

  1. So in the end, what was the actual bug here, and how did it manifest? I'm mostly wondering if there's anything libsqlite's doing that relies on particular implementation-defined kernel behavior that isn't actually guaranteed, in which case I'd want to report that to the sqlite devs with a description of a repro case.
  2. > Yes maybe docker related - using a bind mount of what?

     Of whatever the affected sqlite database is on, just as opposed to using a volume. Again, I don't know how those features are implemented, but there's a chance it might hit a different code path.

     > This is going to generate an absolute mountain of data because often you have to run days or weeks before you see corruption.

     Could maybe do something clever with `logrotate` plus something that kills the affected service soon after it detects db corruption (a rough sketch of the detection half is below)? But otherwise, yeah, sometimes you just need to generate several gigabytes of logs and then sift through them to find the useful couple of megs.
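     To illustrate the detection half (this is my own sketch, not anything Plex or the mover ships): a small C program can run sqlite's `PRAGMA quick_check` against the database and exit nonzero when the result isn't "ok", and a wrapper script could then stop the affected service and preserve the logs for inspection.

         #include <stdio.h>
         #include <string.h>
         #include <sqlite3.h>

         /* Callback for PRAGMA quick_check/integrity_check: a healthy database
          * returns a single row containing "ok"; anything else describes damage. */
         static int check_cb(void *ok_flag, int ncols, char **vals, char **names) {
             if (ncols > 0 && vals[0] && strcmp(vals[0], "ok") == 0)
                 *(int *)ok_flag = 1;
             return 0;
         }

         int main(int argc, char **argv) {
             if (argc < 2) {
                 fprintf(stderr, "usage: %s /path/to/db.sqlite\n", argv[0]);
                 return 2;
             }
             sqlite3 *db = NULL;
             /* Open read-only so a missing path isn't silently created */
             if (sqlite3_open_v2(argv[1], &db, SQLITE_OPEN_READONLY, NULL) != SQLITE_OK) {
                 fprintf(stderr, "open failed: %s\n", sqlite3_errmsg(db));
                 return 1;
             }
             int ok = 0;
             char *err = NULL;
             /* quick_check is cheaper than a full integrity_check; either works here */
             if (sqlite3_exec(db, "PRAGMA quick_check;", check_cb, &ok, &err) != SQLITE_OK) {
                 fprintf(stderr, "check failed: %s\n", err ? err : "unknown");
                 sqlite3_free(err);
             }
             sqlite3_close(db);
             /* Nonzero exit = corruption detected; a wrapper decides what to kill */
             return ok ? 0 : 1;
         }

     It needs the sqlite dev headers to build (link with -lsqlite3); sqlite's own locking makes it reasonably safe to run against the live database from cron or a loop.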
  3. If it's an sqlite bug, it's certainly not an sqlite _regression_, since it happens on Plex (we ship our own sqlite) and in Docker containers. Here's a question: has anyone seen this on a standalone PMS install, or only on Docker? I haven't run into this issue since updating to the affected versions, and I run PMS directly on the host. I've seen rare cases of sqlite corruption on my system in the past, but they were attributable to the cache/mover bug (which can be ruled out as the cause of this issue, since it reproduces on systems without cache drives, and even when bypassing shfs altogether by using an individual disk md mount). I'm not especially familiar with how Docker volumes are implemented, but has anyone tried using a bind mount instead? Could be interesting to see if that gives different results. Otherwise, I'd suggest having someone who sees this frequently attach strace to an affected app, watching all filesystem I/O syscalls, and see exactly what's going on when the corruption occurs. Just make sure to use `-f` to get all threads and spawned children, and direct the output to somewhere with a lot of free space.
  4. I've just tested on 6.7.3-rc4 and confirmed that I can still reproduce the mover issue I described here: Is it possible that this issue is actually just a manifestation of that one? That would be consistent with what I've seen with that issue in the past. Has anyone seen this issue on a setup with no cache drive, or a share set not to use it (or to use it exclusively)?
  5. Super, from what I've seen that sounds like it should be robust. Thanks for the detail.
  6. Any news on this? Just want to make sure it doesn't slip through the cracks.
  7. Yeah, there's no great way around that (short of mounting the filesystem with mandatory locks enabled and using those, but not a lot of software expects to need to handle that on Linux). The only way the kernel provides to find out whether a file is open at a given time is indeed iterating over /proc the way fuser or lsof does (roughly sketched below). You could keep doing the fuser check as a best-effort attempt to avoid clobbering writes made directly to /mnt/cache (it's not like it's getting _worse_, at least), but still lock within shfs to guarantee no writes are dropped when going through /mnt/user.
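     For concreteness, that /proc iteration boils down to something like the sketch below. This is a simplified version of the approach fuser/lsof take, not their actual code: it only compares fd symlink targets (ignoring mmap()ed and deleted files) and silently skips processes it can't inspect. The raciness is the point: the answer can already be stale by the time the scan returns.

         #include <stdio.h>
         #include <string.h>
         #include <dirent.h>
         #include <unistd.h>
         #include <limits.h>
         #include <ctype.h>

         /* Walk /proc/<pid>/fd/* and readlink() each entry to see whether any
          * process currently has `path` open. Returns 1 if found, 0 if not. */
         static int file_is_open(const char *path) {
             DIR *proc = opendir("/proc");
             if (!proc)
                 return -1;
             struct dirent *p;
             int found = 0;
             while (!found && (p = readdir(proc))) {
                 if (!isdigit((unsigned char)p->d_name[0]))
                     continue; /* not a PID directory */
                 char fddir[PATH_MAX];
                 snprintf(fddir, sizeof(fddir), "/proc/%s/fd", p->d_name);
                 DIR *fds = opendir(fddir);
                 if (!fds)
                     continue; /* process exited, or we lack permission */
                 struct dirent *f;
                 while ((f = readdir(fds))) {
                     char link[PATH_MAX], target[PATH_MAX];
                     snprintf(link, sizeof(link), "%s/%s", fddir, f->d_name);
                     ssize_t n = readlink(link, target, sizeof(target) - 1);
                     if (n < 0)
                         continue; /* "." / ".." or vanished fd */
                     target[n] = '\0';
                     if (strcmp(target, path) == 0) {
                         found = 1;
                         break;
                     }
                 }
                 closedir(fds);
             }
             closedir(proc);
             return found;
         }

         int main(int argc, char **argv) {
             if (argc < 2)
                 return 2;
             int r = file_is_open(argv[1]);
             printf("%s is %sopen\n", argv[1], r > 0 ? "" : "not ");
             return r > 0 ? 0 : 1;
         }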
  8. I'm actually only seeing test1 on the array disk after the test. It looks like the procedure goes like this:
       1. move runs in_use to check if the file has open handles
       2. having passed, move calls ioctl 0xe008e001 to invoke the move routine in shfs
       3. shfs open()s the file on both /mnt/cache and /mnt/disk[…]
       4. shfs calls sendfile() to instruct the kernel to copy the file's contents from cache to array with low overhead
       5. shfs close()s both files
       6. shfs unlink()s the original on /mnt/cache
     During the period where two copies of the file exist (starting when it's open()ed on /mnt/disk[…] and ending when it's unlink()ed from /mnt/cache), if you open() the file on /mnt/user, shfs will give you the one on /mnt/cache. This means that any write to a point in the file before the position sendfile() has copied so far (or to any position, if the write takes place after the sendfile() copy finishes) will be dropped.
     Important life lesson I learned while researching this: do not pipe `strace -fp [shfs PID]` to a file on the user filesystem.
     You could probably cut a fair bit of the cost by calling fork()+execve() yourself (using the full path to fuser and losetup, and presumably calling getxattr directly instead of shelling out to getfattr), instead of doing what I assume is a call to system(), which expensively invokes /bin/sh to in turn invoke /bin/bash to run in_use; a rough sketch of that is below. But yeah, doing actual locking here would solve the issue properly.
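     A rough sketch of the fork()+execve() idea follows. It assumes fuser lives at /usr/bin/fuser and supports `-s` (silent) mode, and it leaves out the losetup and getfattr parts of in_use entirely:

         #include <stdio.h>
         #include <unistd.h>
         #include <sys/types.h>
         #include <sys/wait.h>

         /* Run fuser directly via fork()+execve() instead of system(), which
          * avoids spawning /bin/sh (and in turn bash) for every file moved. */
         static int path_in_use(const char *path) {
             pid_t pid = fork();
             if (pid < 0)
                 return -1;
             if (pid == 0) {
                 /* Child: exec fuser silently; it exits 0 if some process
                  * has the file open. Path/flags are assumptions. */
                 char *args[] = { "fuser", "-s", (char *)path, NULL };
                 char *envp[] = { NULL };
                 execve("/usr/bin/fuser", args, envp);
                 _exit(127); /* exec failed */
             }
             int status = 0;
             if (waitpid(pid, &status, 0) < 0)
                 return -1;
             return WIFEXITED(status) && WEXITSTATUS(status) == 0;
         }

         int main(int argc, char **argv) {
             if (argc < 2)
                 return 2;
             int r = path_in_use(argv[1]);
             printf("%s: %s\n", argv[1], r > 0 ? "in use" : "not in use");
             return 0;
         }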
  9. No more than ever before, but that's not really saying much. As far as I can tell, this issue has always been around (both with the current C move program and the previous rsync/shell approach).
  10. I noticed that the move program does attempt to avoid moving files that are currently open, by calling the `/usr/local/sbin/in_use` shell script (which in turn calls fuser). This prevents most issues, but misses the case where the file isn't opened for writing until _after_ the check is complete; thus, the race condition.
  11. I wrote a very basic test program to verify this:

          #include <stdint.h>
          #include <unistd.h>
          #include <fcntl.h>

          int main() {
              uint32_t val = 0;

              /* Seed test1 with a zero counter */
              int fd0 = open("test1", O_RDWR | O_CREAT, 0644);
              write(fd0, &val, sizeof(val));
              close(fd0);

              while (val < 100000) {
                  uint32_t val2 = 0;
                  int fd1 = open("test1", O_RDWR | O_CREAT, 0644);
                  int fd2 = open("test2", O_RDWR | O_CREAT, 0644);

                  /* Read the counter back from test1 */
                  read(fd1, &val2, sizeof(val2));
                  close(fd1);

                  val++;
                  val2++;

                  /* Write the incremented counters back out */
                  fd1 = open("test1", O_RDWR | O_CREAT, 0644);
                  write(fd1, &val2, sizeof(val2));
                  write(fd2, &val, sizeof(val));
                  close(fd1);
                  close(fd2);
              }
              return 0;
          }

      On a normal disk, the resulting "test1" and "test2" files will always contain identical 4-byte integers (100000). We can see this with a hex dump:

          $ cat test1 test2 | hexdump -C
          00000000  a0 86 01 00 a0 86 01 00  |........|

      However, if the move program is run on test1 while the program runs, we can desynchronize:

          $ echo /mnt/cache/[path]/test1 | move -d 2
          $ cat test1 test2 | hexdump -C
          00000000  9f 86 01 00 a0 86 01 00  |........|

      Note that the two files now differ by 1.

      Losing writes can result in a lot of unexpected behaviors; I think it might be responsible for corruption I've seen in files downloaded by Transmission, as well as in sqlite databases (I saw corruption in my Plex Media Server database last night that appears consistent with lost writes, and happened about the same time as I ran the mover script).

      I'm not sure what the best solution for this problem is, as I'm not familiar with the internals of the mover program or the shfs ioctl it uses. One route could be to copy to a tmp file on the destination drive, then, while holding an internal lock on the file as exposed by FUSE, verify that it hasn't changed since the copy started, and only then have it take the place of the source. Alternately, while a file is being moved, shfs could expose it to userspace such that reads come from the source file, but writes go to both the source and the destination.
  12. Currently, shfs returns non-stable (apparently incrementing) inode numbers from stat() calls. This causes problems in applications that assume an inode change indicates a file content change. For instance, nano warns about the file having changed when saving if its inode has changed, and git considers an inode change to be a tree change for some purposes, which can prevent some operations from succeeding. Passing through the inodes from the underlying disk does potentially create a problem of multiple files in /mnt/user sharing the same inode. This could be worked around by returning `underlying_inode * nb_data_drives + drive_idx` (sketched below), which is potentially problematic if the drive is near the 64-bit limit on inodes, but that seems extremely unlikely, and you'd have the same issue with incrementing-at-read-time inodes anyway.
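      A tiny sketch of that mapping (the names here are mine, not shfs internals): with 8 data drives, inode 123456 on disk 3 maps to 123456 * 8 + 3 = 987651, and the value stays stable across repeated stat() calls while never colliding with another drive's inodes.

          #include <stdint.h>

          /* Combine the underlying disk's inode with the disk's index so the
           * synthetic inode is stable and collision-free across data drives. */
          static uint64_t shfs_synthetic_inode(uint64_t underlying_inode,
                                               uint64_t nb_data_drives,
                                               uint64_t drive_idx) {
              return underlying_inode * nb_data_drives + drive_idx;
          }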
  13. > What testing did you perform to ensure stability? Have you tested every combination of network configuration as well as with each file sharing protocol, VMs, Docker, etc? Also, I think you'd also need to craft a webgui front end control for this as well. Maybe you could take a swing at creating a plugin for this? Point is that it's a low priority item because there is a lot more to it than just enabling the kernel flag and the benefits are pretty minimal right now.

      Sorry; didn't mean to imply that was all that needed to be done for full support. It would've been more accurate to have said "…adding basic support is as simple as…". However, enabling the kernel build flag and setting the sysctl `net.ipv6.conf.all.disable_ipv6=1`, or providing it as a kernel module that's not loaded by default, would allow users to test a variety of scenarios without causing potential backwards-compatibility problems in the interim.
  14. Sorry? I'm confused. A number of plugins create a user for their services to run under, or run as `nobody`, either by using sudo or su in their startup scripts, or by calling setuid/setgid once started. A compromise to one of those services shouldn't allow an attacker to overwrite the entire boot drive.
  15. I've tested on a custom kernel, and found that indeed adding support is as simple as setting CONFIG_IPV6=y in the kernel config.
  16. Currently, the boot flash drive is mounted world-writable during startup. This creates potential privilege-escalation vulnerabilities, since a process running as an individual user could add lines to /boot/config/go, or overwrite the kernel image or initramfs. Could /boot be mounted with perms set to 0755 instead (something along the lines of the sketch below)?
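      Since the flash is FAT-formatted, one way to get there would be the vfat umask mount option. This is only a sketch: the device path is a placeholder, and the real startup scripts would pass their own source, target, and options.

          #include <stdio.h>
          #include <sys/mount.h>

          /* Sketch: mount the FAT boot flash with umask=0022 so everything
           * comes up 0755 instead of world-writable. Needs root; the device
           * path here is illustrative only. */
          int main(void) {
              if (mount("/dev/disk/by-label/UNRAID", "/boot", "vfat", 0,
                        "umask=0022") != 0) {
                  perror("mount /boot");
                  return 1;
              }
              return 0;
          }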
  17. +1 to this. Basic support should (in theory) be as simple as enabling IPV6 in the kernel config.
  18. That seems like a good workaround, but could this feature be added in a future release?
  19. Is it possible to replace a disk with a new one of the same size or larger without rebuilding the array? I'd prefer to avoid having the array unprotected for any period, and I've got an older Seagate 2TB with a few bad sectors that I'd like to replace with a 4TB WD Red.
  20. > How does driver support for these things help with media servers. Players I get, they render the content. Servers, doesn't make sense to me, but please help explain...

      VAAPI enables most of H.264 (and some other codecs) decoding and encoding to be done on the GPU, which can be very fast and very power-efficient (at the expense of compression efficiency). Graphics drivers are required to use VAAPI.
  21. DRM is the kernel's graphics subsystem and i915 is the Intel graphics driver module; they usually aren't required on UnRAID, but including them allows use of Intel VAAPI hardware video decoding/encoding, which is useful if you're using an UnRAID box as a media server. The total size of the extra modules is about 1.3MB, plus some small amount for the other bits of extra code that have to be compiled into the kernel (e.g. DMA shared buffers). I'm currently testing with a kernel I built with these modules enabled and finding it stable and the hardware encoding features functional (on my Haswell i7); a quick way to check that the stack is working is sketched below. Think we could get these kernel configuration changes made in a 6.0 beta 7 or 8?
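      For reference, a minimal check that i915 plus libva can actually talk to the GPU might look like the sketch below. The render-node path is an assumption (older setups may need /dev/dri/card0 instead), and it needs the libva headers installed; link against -lva and -lva-drm.

          #include <stdio.h>
          #include <fcntl.h>
          #include <unistd.h>
          #include <va/va.h>
          #include <va/va_drm.h>

          /* Open a DRM device and bring up VA-API on it, printing the driver
           * vendor string if initialization succeeds. */
          int main(void) {
              int fd = open("/dev/dri/renderD128", O_RDWR);
              if (fd < 0) {
                  perror("open DRM device");
                  return 1;
              }
              VADisplay dpy = vaGetDisplayDRM(fd);
              int major = 0, minor = 0;
              if (!dpy || vaInitialize(dpy, &major, &minor) != VA_STATUS_SUCCESS) {
                  fprintf(stderr, "vaInitialize failed\n");
                  close(fd);
                  return 1;
              }
              printf("VA-API %d.%d, driver: %s\n", major, minor,
                     vaQueryVendorString(dpy));
              vaTerminate(dpy);
              close(fd);
              return 0;
          }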
  22. > shfs is a "stackable" file system implemented using FUSE. It's been designed primarily for access via network services such as Samba (SMB) and is not posix compliant, e.g., hard links are not supported. Those kinds of operations are best performed directly to mounted disk devices.

      I can understand the issue with hard links, and I'm working around it with a symlink to my cache drive for the specific case that needed it, but the mtimes-on-symlinks issue with tar is rather strange, as symlinks seem to be otherwise perfectly functional.

      "bump" for this issue (though I'm not sure if that makes sense)
  23. > shfs is a "stackable" file system implemented using FUSE. It's been designed primarily for access via network services such as Samba (SMB) and is not posix compliant, e.g., hard links are not supported. Those kinds of operations are best performed directly to mounted disk devices.

      I can understand the issue with hard links, and I'm working around it with a symlink to my cache drive for the specific case that needed it, but the mtimes-on-symlinks issue with tar is rather strange, as symlinks seem to be otherwise perfectly functional.