Everything posted by 11rcombs

  1. So in the end, what was the actual bug here, and how did it manifest? I'm mostly wondering if there's anything libsqlite's doing that relies on particular implementation-defined kernel behavior that isn't actually guaranteed, in which case I'd want to report that to the sqlite devs with a description of a repro case.
  2. > Yes maybe docker related - using a bind mount of what?

     Of whatever the affected sqlite database is on, just as opposed to using a volume. Again, I don't know how those features are implemented, but there's a chance it might hit a different code path.

     > This is going to generate an absolute mountain of data because often you have to run days or weeks before you see corruption.

     Could maybe do something clever with `logrotate` plus something that kills the affected service soon after it detects db corruption (a rough sketch of the detection half is below)? But otherwise, yeah, sometimes you just need to generate several gigabytes of logs and then sift through them to find the useful couple of megs.
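     To illustrate the detection half (this is my own sketch, not anything Plex or the mover ships): a small C program can run sqlite's `PRAGMA quick_check` against the database and exit nonzero when the result isn't "ok", and a wrapper script could then stop the affected service and preserve the logs for inspection.

         #include <stdio.h>
         #include <string.h>
         #include <sqlite3.h>

         /* Callback for PRAGMA quick_check/integrity_check: a healthy database
          * returns a single row containing "ok"; anything else describes damage. */
         static int check_cb(void *ok_flag, int ncols, char **vals, char **names) {
             if (ncols > 0 && vals[0] && strcmp(vals[0], "ok") == 0)
                 *(int *)ok_flag = 1;
             return 0;
         }

         int main(int argc, char **argv) {
             if (argc < 2) {
                 fprintf(stderr, "usage: %s /path/to/db.sqlite\n", argv[0]);
                 return 2;
             }
             sqlite3 *db = NULL;
             /* Open read-only so a missing path isn't silently created */
             if (sqlite3_open_v2(argv[1], &db, SQLITE_OPEN_READONLY, NULL) != SQLITE_OK) {
                 fprintf(stderr, "open failed: %s\n", sqlite3_errmsg(db));
                 return 1;
             }
             int ok = 0;
             char *err = NULL;
             /* quick_check is cheaper than a full integrity_check; either works here */
             if (sqlite3_exec(db, "PRAGMA quick_check;", check_cb, &ok, &err) != SQLITE_OK) {
                 fprintf(stderr, "check failed: %s\n", err ? err : "unknown");
                 sqlite3_free(err);
             }
             sqlite3_close(db);
             /* Nonzero exit = corruption detected; a wrapper decides what to kill */
             return ok ? 0 : 1;
         }

     It needs the sqlite dev headers to build (link with -lsqlite3); sqlite's own locking makes it reasonably safe to run against the live database from cron or a loop.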
  3. If it's an sqlite bug, it's certainly not an sqlite _regression_, since it happens on Plex (we ship our own sqlite) and in Docker containers. Here's a question: has anyone seen this on a standalone PMS install, or only on Docker? I haven't run into this issue since updating to the affected versions, and I run PMS directly on the host. I've seen rare cases of sqlite corruption on my system in the past, but they were attributable to the cache/mover bug (which can be ruled out as the cause of this issue, since it reproduces on systems without cache drives, and even when bypassing shfs altogether by using an individual disk md mount). I'm not especially familiar with how Docker volumes are implemented, but has anyone tried using a bind mount instead? Could be interesting to see if that gives different results. Otherwise, I'd suggest having someone who sees this frequently attach strace to an affected app, watching all filesystem I/O syscalls, and see exactly what's going on when the corruption occurs. Just make sure to use `-f` to get all threads and spawned children, and direct the output to somewhere with a lot of free space.
  4. I've just tested on 6.7.3-rc4 and confirmed that I can still reproduce the mover issue I described here: Is it possible that this issue is actually just a manifestation of that one? That would be consistent with what I've seen with that issue in the past. Has anyone seen this issue on a setup with no cache drive, or a share set not to use it (or to use it exclusively)?
  5. Super, from what I've seen that sounds like it should be robust. Thanks for the detail.
  6. Any news on this? Just want to make sure it doesn't slip through the cracks.
  7. Yeah, there's no great way around that (short of mounting the filesystem with mandatory locks enabled and using those, but not a lot of software expects to need to handle that on Linux). The only way the kernel provides to find out whether a file is open at a given time is indeed iterating over /proc the way fuser or lsof does (roughly sketched below). You could keep doing the fuser check as a best-effort attempt to avoid clobbering writes made directly to /mnt/cache (it's not like it's getting _worse_, at least), but still lock within shfs to guarantee no writes are dropped when going through /mnt/user.
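     For concreteness, that /proc iteration boils down to something like the sketch below. This is a simplified version of the approach fuser/lsof take, not their actual code: it only compares fd symlink targets (ignoring mmap()ed and deleted files) and silently skips processes it can't inspect. The raciness is the point: the answer can already be stale by the time the scan returns.

         #include <stdio.h>
         #include <string.h>
         #include <dirent.h>
         #include <unistd.h>
         #include <limits.h>
         #include <ctype.h>

         /* Walk /proc/<pid>/fd/* and readlink() each entry to see whether any
          * process currently has `path` open. Returns 1 if found, 0 if not. */
         static int file_is_open(const char *path) {
             DIR *proc = opendir("/proc");
             if (!proc)
                 return -1;
             struct dirent *p;
             int found = 0;
             while (!found && (p = readdir(proc))) {
                 if (!isdigit((unsigned char)p->d_name[0]))
                     continue; /* not a PID directory */
                 char fddir[PATH_MAX];
                 snprintf(fddir, sizeof(fddir), "/proc/%s/fd", p->d_name);
                 DIR *fds = opendir(fddir);
                 if (!fds)
                     continue; /* process exited, or we lack permission */
                 struct dirent *f;
                 while ((f = readdir(fds))) {
                     char link[PATH_MAX], target[PATH_MAX];
                     snprintf(link, sizeof(link), "%s/%s", fddir, f->d_name);
                     ssize_t n = readlink(link, target, sizeof(target) - 1);
                     if (n < 0)
                         continue; /* "." / ".." or vanished fd */
                     target[n] = '\0';
                     if (strcmp(target, path) == 0) {
                         found = 1;
                         break;
                     }
                 }
                 closedir(fds);
             }
             closedir(proc);
             return found;
         }

         int main(int argc, char **argv) {
             if (argc < 2)
                 return 2;
             int r = file_is_open(argv[1]);
             printf("%s is %sopen\n", argv[1], r > 0 ? "" : "not ");
             return r > 0 ? 0 : 1;
         }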
  8. I'm actually only seeing test1 on the array disk after the test. It looks like the procedure goes like this:
       1. move runs in_use to check if the file has open handles
       2. having passed, move calls ioctl 0xe008e001 to invoke the move routine in shfs
       3. shfs open()s the file on both /mnt/cache and /mnt/disk[…]
       4. shfs calls sendfile() to instruct the kernel to copy the file's contents from cache to array with low overhead
       5. shfs close()s both files
       6. shfs unlink()s the original on /mnt/cache
     During the period where two copies of the file exist (starting when it's open()ed on /mnt/disk[…] and ending when it's unlink()ed from /mnt/cache), if you open() the file on /mnt/user, shfs will give you the one on /mnt/cache. This means that any write to a point in the file before the position sendfile() has copied so far (or to any position, if the write takes place after the sendfile() copy finishes) will be dropped.
     Important life lesson I learned while researching this: do not pipe `strace -fp [shfs PID]` to a file on the user filesystem.
     You could probably cut a fair bit of the cost by calling fork()+execve() yourself (using the full path to fuser and losetup, and presumably calling getxattr directly instead of shelling out to getfattr), instead of doing what I assume is a call to system(), which expensively invokes /bin/sh to in turn invoke /bin/bash to run in_use; a rough sketch of that is below. But yeah, doing actual locking here would solve the issue properly.
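     A rough sketch of the fork()+execve() idea follows. It assumes fuser lives at /usr/bin/fuser and supports `-s` (silent) mode, and it leaves out the losetup and getfattr parts of in_use entirely:

         #include <stdio.h>
         #include <unistd.h>
         #include <sys/types.h>
         #include <sys/wait.h>

         /* Run fuser directly via fork()+execve() instead of system(), which
          * avoids spawning /bin/sh (and in turn bash) for every file moved. */
         static int path_in_use(const char *path) {
             pid_t pid = fork();
             if (pid < 0)
                 return -1;
             if (pid == 0) {
                 /* Child: exec fuser silently; it exits 0 if some process
                  * has the file open. Path/flags are assumptions. */
                 char *args[] = { "fuser", "-s", (char *)path, NULL };
                 char *envp[] = { NULL };
                 execve("/usr/bin/fuser", args, envp);
                 _exit(127); /* exec failed */
             }
             int status = 0;
             if (waitpid(pid, &status, 0) < 0)
                 return -1;
             return WIFEXITED(status) && WEXITSTATUS(status) == 0;
         }

         int main(int argc, char **argv) {
             if (argc < 2)
                 return 2;
             int r = path_in_use(argv[1]);
             printf("%s: %s\n", argv[1], r > 0 ? "in use" : "not in use");
             return 0;
         }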
  9. No more than ever before, but that's not really saying much. As far as I can tell, this issue has always been around (both with the current C move program and the previous rsync/shell approach).
  10. I noticed that the move program does attempt to avoid moving files that are currently open, by calling the `/usr/local/sbin/in_use` shell script (which in turn calls fuser). This prevents most issues, but misses the case where the file isn't opened for writing until _after_ the check is complete; thus, the race condition.
  11. I wrote a very basic test program to verify this:

          #include <stdint.h>
          #include <unistd.h>
          #include <fcntl.h>

          int main() {
              uint32_t val = 0;

              /* Seed test1 with a zero counter */
              int fd0 = open("test1", O_RDWR | O_CREAT, 0644);
              write(fd0, &val, sizeof(val));
              close(fd0);

              while (val < 100000) {
                  uint32_t val2 = 0;
                  int fd1 = open("test1", O_RDWR | O_CREAT, 0644);
                  int fd2 = open("test2", O_RDWR | O_CREAT, 0644);

                  /* Read the counter back from test1 */
                  read(fd1, &val2, sizeof(val2));
                  close(fd1);

                  val++;
                  val2++;

                  /* Write the incremented counters back out */
                  fd1 = open("test1", O_RDWR | O_CREAT, 0644);
                  write(fd1, &val2, sizeof(val2));
                  write(fd2, &val, sizeof(val));
                  close(fd1);
                  close(fd2);
              }
              return 0;
          }

      On a normal disk, the resulting "test1" and "test2" files will always contain identical 4-byte integers (100000). We can see this with a hex dump:

          $ cat test1 test2 | hexdump -C
          00000000  a0 86 01 00 a0 86 01 00  |........|

      However, if the move program is run on test1 while the program runs, we can desynchronize:

          $ echo /mnt/cache/[path]/test1 | move -d 2
          $ cat test1 test2 | hexdump -C
          00000000  9f 86 01 00 a0 86 01 00  |........|

      Note that the two files now differ by 1.

      Losing writes can result in a lot of unexpected behaviors; I think it might be responsible for corruption I've seen in files downloaded by Transmission, as well as in sqlite databases (I saw corruption in my Plex Media Server database last night that appears consistent with lost writes, and happened about the same time as I ran the mover script).

      I'm not sure what the best solution for this problem is, as I'm not familiar with the internals of the mover program or the shfs ioctl it uses. One route could be to copy to a tmp file on the destination drive, then, while holding an internal lock on the file as exposed by FUSE, verify that it hasn't changed since the copy started, and only then have it take the place of the source. Alternately, while a file is being moved, shfs could expose it to userspace such that reads come from the source file, but writes go to both the source and the destination.
  12. Currently, shfs returns non-stable (apparently incrementing) inode numbers from stat() calls. This causes problems in applications that assume an inode change indicates a file content change. For instance, nano warns about the file having changed when saving if its inode has changed, and git considers an inode change to be a tree change for some purposes, which can prevent some operations from succeeding. Passing through the inodes from the underlying disk does potentially create a problem of multiple files in /mnt/user sharing the same inode. This could be worked around by returning `underlying_inode * nb_data_drives + drive_idx` (sketched below), which is potentially problematic if the drive is near the 64-bit limit on inodes, but that seems extremely unlikely, and you'd have the same issue with incrementing-at-read-time inodes anyway.
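      A tiny sketch of that mapping (the names here are mine, not shfs internals): with 8 data drives, inode 123456 on disk 3 maps to 123456 * 8 + 3 = 987651, and the value stays stable across repeated stat() calls while never colliding with another drive's inodes.

          #include <stdint.h>

          /* Combine the underlying disk's inode with the disk's index so the
           * synthetic inode is stable and collision-free across data drives. */
          static uint64_t shfs_synthetic_inode(uint64_t underlying_inode,
                                               uint64_t nb_data_drives,
                                               uint64_t drive_idx) {
              return underlying_inode * nb_data_drives + drive_idx;
          }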
  13. > What testing did you perform to ensure stability? Have you tested every combination of network configuration as well as with each file sharing protocol, VMs, Docker, etc? Also, I think you'd also need to craft a webgui front end control for this as well. Maybe you could take a swing at creating a plugin for this? Point is that it's a low priority item because there is a lot more to it than just enabling the kernel flag and the benefits are pretty minimal right now.

      Sorry; didn't mean to imply that was all that needed to be done for full support. It would've been more accurate to have said "…adding basic support is as simple as…". However, enabling the kernel build flag and setting the sysctl `net.ipv6.conf.all.disable_ipv6=1`, or providing it as a kernel module that's not loaded by default, would allow users to test a variety of scenarios without causing potential backwards-compatibility problems in the interim.
  14. Sorry? I'm confused. A number of plugins create a user for their services to run under, or run as `nobody`, either by using sudo or su in their startup scripts, or by calling setuid/setgid once started. A compromise to one of those services shouldn't allow an attacker to overwrite the entire boot drive.
  15. I've tested on a custom kernel, and found that indeed adding support is as simple as setting CONFIG_IPV6=y in the kernel config.
  16. Currently, the boot flash drive is mounted world-writable during startup. This creates potential privilege-escalation vulnerabilities, since a process running as an individual user could add lines to /boot/config/go, or overwrite the kernel image or initramfs. Could /boot be mounted with perms set to 0755 instead (something along the lines of the sketch below)?
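      Since the flash is FAT-formatted, one way to get there would be the vfat umask mount option. This is only a sketch: the device path is a placeholder, and the real startup scripts would pass their own source, target, and options.

          #include <stdio.h>
          #include <sys/mount.h>

          /* Sketch: mount the FAT boot flash with umask=0022 so everything
           * comes up 0755 instead of world-writable. Needs root; the device
           * path here is illustrative only. */
          int main(void) {
              if (mount("/dev/disk/by-label/UNRAID", "/boot", "vfat", 0,
                        "umask=0022") != 0) {
                  perror("mount /boot");
                  return 1;
              }
              return 0;
          }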
  17. +1 to this. Basic support should (in theory) be as simple as enabling IPV6 in the kernel config.
  18. That seems like a good workaround, but could this feature be added in a future release?
  19. Is it possible to replace a disk with a new one of the same size or larger without rebuilding the array? I'd prefer to avoid having the array unprotected for any period, and I've got an older Seagate 2TB with a few bad sectors that I'd like to replace with a 4TB WD Red.
  20. > How does driver support for these things help with media servers. Players I get, they render the content. Servers, doesn't make sense to me, but please help explain...

      VAAPI enables most of H.264 (and some other codecs) decoding and encoding to be done on the GPU, which can be very fast and very power-efficient (at the expense of compression efficiency). Graphics drivers are required to use VAAPI.
  21. DRM is the kernel's graphics subsystem and i915 is the Intel graphics driver module; they usually aren't required on UnRAID, but including them allows use of Intel VAAPI hardware video decoding/encoding, which is useful if you're using an UnRAID box as a media server. The total size of the extra modules is about 1.3MB, plus some small amount for the other bits of extra code that have to be compiled into the kernel (e.g. DMA shared buffers). I'm currently testing with a kernel I built with these modules enabled and finding it stable and the hardware encoding features functional (on my Haswell i7); a quick way to check that the stack is working is sketched below. Think we could get these kernel configuration changes made in a 6.0 beta 7 or 8?
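      For reference, a minimal check that i915 plus libva can actually talk to the GPU might look like the sketch below. The render-node path is an assumption (older setups may need /dev/dri/card0 instead), and it needs the libva headers installed; link against -lva and -lva-drm.

          #include <stdio.h>
          #include <fcntl.h>
          #include <unistd.h>
          #include <va/va.h>
          #include <va/va_drm.h>

          /* Open a DRM device and bring up VA-API on it, printing the driver
           * vendor string if initialization succeeds. */
          int main(void) {
              int fd = open("/dev/dri/renderD128", O_RDWR);
              if (fd < 0) {
                  perror("open DRM device");
                  return 1;
              }
              VADisplay dpy = vaGetDisplayDRM(fd);
              int major = 0, minor = 0;
              if (!dpy || vaInitialize(dpy, &major, &minor) != VA_STATUS_SUCCESS) {
                  fprintf(stderr, "vaInitialize failed\n");
                  close(fd);
                  return 1;
              }
              printf("VA-API %d.%d, driver: %s\n", major, minor,
                     vaQueryVendorString(dpy));
              vaTerminate(dpy);
              close(fd);
              return 0;
          }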
  22. > shfs is a "stackable" file system implemented using FUSE. It's been designed primarily for access via network services such as Samba (SMB) and is not posix compliant, e.g., hard links are not supported. Those kinds of operations are best performed directly to mounted disk devices.

      I can understand the issue with hard links, and I'm working around it with a symlink to my cache drive for the specific case that needed it, but the mtimes-on-symlinks issue with tar is rather strange, as symlinks seem to be otherwise perfectly functional.

      "bump" for this issue (though I'm not sure if that makes sense)
  23. > shfs is a "stackable" file system implemented using FUSE. It's been designed primarily for access via network services such as Samba (SMB) and is not posix compliant, e.g., hard links are not supported. Those kinds of operations are best performed directly to mounted disk devices.

      I can understand the issue with hard links, and I'm working around it with a symlink to my cache drive for the specific case that needed it, but the mtimes-on-symlinks issue with tar is rather strange, as symlinks seem to be otherwise perfectly functional.