Spitko

Members
  • Posts: 12

  1. This usually happens while compiling code. The behavior is as follows:
     - First, the compile hangs at a certain spot. The system is still responsive, but the CL processes just spin forever.
     - In Task Manager, disk usage for the C: drive is stuck at 100%, though actual I/O is fairly low at this point.
     - Around this time, Windows starts complaining in the event log: "Reset to device, \Device\RaidPort2, was issued." This happens frequently.
     - Eventually, Visual Studio itself hangs, and the system becomes less and less responsive until it requires a manual restart. You can't kill the stuck CL processes, so something is likely hung deep in the driver.

     The VM has three disks:

       <disk type='file' device='disk'>
         <driver name='qemu' type='raw' cache='writeback' discard='unmap'/>
         <source file='/mnt/user/vms/Windows 10/vdisk1.img' index='2'/>
         <backingStore/>
         <target dev='hdc' bus='scsi'/>
         <boot order='1'/>
         <alias name='scsi0-0-0-2'/>
         <address type='drive' controller='0' bus='0' target='0' unit='2'/>
       </disk>
       <disk type='block' device='disk'>
         <driver name='qemu' type='raw' cache='writeback' discard='unmap'/>
         <source dev='/dev/disk/by-id/ata-Samsung_SSD_860_EVO_1TB_S3Z8NB0M305963H'/>
         <target dev='hdd' bus='scsi'/>
         <address type='drive' controller='0' bus='0' target='0' unit='3'/>
       </disk>
       <hostdev mode='subsystem' type='pci' managed='yes'>
         <driver name='vfio'/>
         <source>
           <address domain='0x0000' bus='0x41' slot='0x00' function='0x0'/>
         </source>
         <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
       </hostdev>

     The compile is happening on the NVMe drive that's passed through at the bottom, but the error points to one of the drives above it. I would suspect the first entry (the OS is installed on that one) given the error and its likely cause, as the middle drive is entirely idle. Ideas?

     For now I've copied the image to a raw NVMe device, which appears to work around the problem, but that is obviously less than ideal from a scaling perspective. As a starting point, I ran memtest overnight and it came back clean.

     Hardware:
     - AMD Threadripper 1950X
     - Asus ROG Zenith Extreme
     - LSI Logic SAS 9207-8i

     Nothing in the Unraid logs (VM or system) corresponds to the event.

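     A hedged sketch of two libvirt tweaks sometimes tried for SCSI reset/timeout storms under heavy I/O. Neither is a confirmed fix for this issue; cache='none' assumes the backing storage supports O_DIRECT, and the iothread assumes an <iothreads>1</iothreads> element is declared at the domain level.

       <!-- Experiment 1: bypass the host page cache on the image-backed disk
            (cache='none' requires O_DIRECT support from the backing filesystem). -->
       <driver name='qemu' type='raw' cache='none' io='native' discard='unmap'/>

       <!-- Experiment 2: give the virtio-scsi controller a dedicated iothread
            (assumes <iothreads>1</iothreads> exists in the domain XML). -->
       <controller type='scsi' index='0' model='virtio-scsi'>
         <driver iothread='1' queues='4'/>
       </controller>
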
  2. Thirding this. My sensors suddenly all vanished (Beta 25, if that helps). Manually editing sensors.conf and removing an extra dash from the jc42 sensor brought them back after they all dropped off the face of the earth. Upon further sleuthing, this doesn't ACTUALLY fix the issue: the sensors command complains and the labels don't work properly if you have overlaps (i.e., "temp1" isn't properly pinned to jc42). The "correct" fix seems to be adding a bus statement before the chip, e.g.:

       chip "k10temp-pci-00c3"
           label "temp2" "CPU Temp"

       bus "i2c-0" "SMBus adapter"
       chip "jc42-i2c-0-19"
           label "temp1" "MB Temp"

     That clears the error, though I'm still having trouble getting the label statement to work properly; time to hit the man pages, I guess.

     HOWEVER, it's worth noting that jc42 sensors are SMBus memory (DIMM) temperature sensors, so this has mostly been a goose chase; motherboard temp can't be read yet because Unraid is still missing a driver for the ITE IT8665E.

     That said, selecting jc42 permanently breaks the plugin, since it won't remove the line from sensors.conf, and once selected it writes a bad line that breaks the sensors command. The current fix is to remove the line from sensors.conf manually. The plugin should either handle SMBus sensors properly or, as a hotfix, just blacklist jc42 and handle the failure mode better.

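     For reference, a minimal sketch of how one might confirm the adapter name that the bus statement has to match (assumptions: i2cdetect comes from i2c-tools and may not be installed on stock Unraid; the chip name is the one from my config above).

       i2cdetect -l                              # list i2c buses, e.g. "i2c-0  smbus  SMBus adapter ..."
       cat /sys/class/i2c-adapter/i2c-*/name     # same information via sysfs
       sensors -u jc42-i2c-0-19                  # raw readings, to check the chip actually responds
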
  3. Two bugs with autofan have cropped up since updating to the latest version and Unraid 6.9 beta:

     1) It does not respect the minimum PWM and will shut fans down entirely when below the temperature range. This is unexpected behavior and should either be toggleable or at least be clearly worded, since many servers use the same fans for general system airflow.

     2) It seems to incorrectly detect the highest disk temperature. Example logs:

       Jul 2 20:40:56 jibril autofan: Highest disk temp is 36C, adjusting fan speed from: OFF (0% @ 0rpm) to: 136 (53% @ 0rpm)
       Jul 2 20:42:03 jibril autofan: Highest disk temp is 35C, adjusting fan speed from: 136 (53% @ 4021rpm) to: OFF (0% @ 3448rpm)
       Jul 2 20:45:13 jibril autofan: Highest disk temp is 36C, adjusting fan speed from: OFF (0% @ 0rpm) to: 136 (53% @ 0rpm)
       Jul 2 20:46:20 jibril autofan: Highest disk temp is 35C, adjusting fan speed from: 136 (53% @ 4000rpm) to: OFF (0% @ 3579rpm)
       Jul 2 20:49:30 jibril autofan: Highest disk temp is 36C, adjusting fan speed from: OFF (0% @ 0rpm) to: 136 (53% @ 0rpm)
       Jul 2 20:50:37 jibril autofan: Highest disk temp is 35C, adjusting fan speed from: 136 (53% @ 4043rpm) to: OFF (0% @ 3448rpm)
       Jul 2 20:52:45 jibril autofan: Highest disk temp is 36C, adjusting fan speed from: OFF (0% @ 0rpm) to: 136 (53% @ 0rpm)
       Jul 2 20:53:52 jibril autofan: Highest disk temp is 35C, adjusting fan speed from: 136 (53% @ 4043rpm) to: OFF (0% @

     Meanwhile I'm getting high-temperature alarms on two drives; the coldest spinning drive is 44C.

     Possibly of interest: Disk 1 was spun down at the time, so Unraid didn't show its temperature, but I noticed later that the logged values do roughly match that drive. It's also the only non-SAS drive in my array.

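     A rough sketch of how one might cross-check what autofan reports, by polling drive temperatures directly with smartctl (assumptions: the /dev/sd? glob and awk fields are examples; some controllers need -d sat, and querying a spun-down drive may wake it).

       for d in /dev/sd?; do
         # SATA drives report attribute 194 Temperature_Celsius; SAS drives report
         # "Current Drive Temperature" in the same -A output.
         t=$(smartctl -A "$d" | awk '/Temperature_Celsius/ {print $10; exit} /Current Drive Temperature/ {print $4; exit}')
         echo "$d: ${t:-n/a}C"
       done
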
  4. Your issue sounds unrelated to mine... you should probably open a new thread.
  5. I've had to turn caching off on all shares, as anything writing to the cache for more than a few moments brings the whole server to a crawl. Writing lots of smaller new files does seem more stable than large file writes, though; not sure yet whether that's a useful data point. Also, if it helps, the SSDs are both ADATA SU635 (ASU635SS-240GQ-R). I knew going in that QLC drives are fairly flawed, but I don't think anything I'm doing here should be hitting the limitations of the tech. The drives are rated for 520/450 MB/s read/write; while some people report lower speeds, those are still an order of magnitude above what I'm getting here.

  6. I've seen a few threads on slow cache, but the performance here isn't "oh, that could be better"; it's typically worse than just writing straight to disk. As a test, I ran `dd if=/dev/zero of=file.test bs=1024k count=8k` and, well, a picture is worth a thousand words:

       853789+0 records in
       853789+0 records out
       6994239488 bytes (7.0 GB, 6.5 GiB) copied, 298.39 s, 23.4 MB/s

     btrfs filesystem df:

       Data, RAID1: total=84.00GiB, used=81.58GiB
       System, single: total=4.00MiB, used=16.00KiB
       Metadata, single: total=1.01GiB, used=156.73MiB
       GlobalReserve, single: total=84.41MiB, used=0.00B

       No balance found on '/mnt/cache'

     I have the Dynamix TRIM plugin installed. I also tried manually trimming /mnt/cache right before running the test, just to make sure it didn't error and was really running.

     Pool setup: Unraid 6.7.2, no useful log output.

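     For completeness, a sketch of the sanity checks I'd run around a test like this (assuming the pool is mounted at /mnt/cache; these are generic btrfs/util-linux commands, not anything Unraid-specific).

       fstrim -v /mnt/cache                 # reports how many bytes were actually trimmed
       btrfs filesystem usage /mnt/cache    # allocation vs. real usage per profile
       # Bypass the page cache so the reported rate reflects device write speed:
       dd if=/dev/zero of=/mnt/cache/file.test bs=1M count=8192 oflag=direct status=progress
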
  7. I'm a programmer, so I'm all too familiar with Unix timestamps. The files were created with this Docker image, by mounting a remote SMB share and copying the files from the old server to the local one. It looks like creating files locally doesn't reproduce this bug; it's likely specific to SMB-to-local transfers.
  8. Update! I think I found the issue; it might be a mix of a Samba/SMB bug and possibly an Unraid bug, or alternatively a bug in Krusader (as shipped by binhex). I did a bit more digging and statted two (different) files, one that worked and one that didn't.

       File: Bad.mp4
       Size: 120134495  Blocks: 234640  IO Block: 4096  regular file
       Device: 21h/33d  Inode: 4157  Links: 1
       Access: (0666/-rw-rw-rw-)  Uid: ( 99/ nobody)  Gid: ( 100/ users)
       Access: 1969-12-31 15:59:59.000000000 -0800
       Modify: 2019-08-16 15:28:00.169449270 -0700
       Change: 2019-08-27 19:07:52.107529449 -0700
       Birth: -

       File: Good.mp4
       Size: 182610839  Blocks: 356664  IO Block: 4096  regular file
       Device: 21h/33d  Inode: 3967  Links: 1
       Access: (0666/-rw-rw-rw-)  Uid: ( 99/ nobody)  Gid: ( 100/ users)
       Access: 2019-08-27 19:21:43.859255029 -0700
       Modify: 2019-08-27 19:52:42.613932984 -0700
       Change: 2019-08-27 19:52:42.613175754 -0700
       Birth: -

     The only real difference I can see is that the bad file has an invalid/missing access time. So, as a test, I ran touch Bad.mp4 and suddenly it works fine.

     As a note, opening the file in a media player doesn't seem to update the access time; I assume this is a (reasonable) optimization, meaning the only way to unstick the bad files is to write a script that touches them all. Which might be a fine workaround, but before I do that, does anyone want to dig deeper here, or have a slightly less brute-force solution?

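     A minimal sketch of that brute-force fix: update only the access time on files whose atime predates a sane cutoff (the bad files report 1969-12-31). The share path is a made-up example; run the -print form first to see what would be touched.

       find /mnt/user/Media -type f ! -newerat "2000-01-01" -print
       find /mnt/user/Media -type f ! -newerat "2000-01-01" -exec touch -a {} +
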
  9. Nope, permissions are identical between the files in question. Names as well; I even tried renaming the old folder, making a new one with the same name as the old, and everything worked fine in that one (i.e., my test program could now create, read, and write files in the new folder). If I rename the old folder back, the problem recurs. I checked permissions with ls -n as well, and made sure there weren't just two groups named "users" or something; the permissions are absolutely identical, unless there's some additional bit/flag I'm not aware of that doesn't show in ls -ln. It's honestly the weirdest dang thing.

     Also, to be clear: the folder exists in /mnt/user, which is where I copied the files to in Docker and where I'm checking permissions from. I assumed copying straight to the diskN paths would be a bad idea (also because I copied more than a drive's worth of data).

     And as a reminder, the weirdest (by far) part is that I can manage the files from Explorer just fine. I can open them in Media Player Classic without issue, I can rename and delete them, etc. But VLC will reliably give a "VLC is unable to open the MRL" error. BUT, if I take the exact same file and copy it to a different folder on the same share with Explorer, it works fine! This is all from the same Windows machine, and at no point am I getting UACed or asked to do anything additional.

     New file permissions:
       -rw-rw-rw- 1 nobody users 605445386 Aug 7 23:41 Test.mp4
     Old file permissions:
       -rw-rw-rw- 1 nobody users 605445386 Aug 7 23:41 Test.mp4

     I can also take this new file, rename it, and copy it back to the old folder, and it still plays fine. And to rule out a VLC-specific quirk, I get similar results in GIMP, so it does seem to center on programs using cross-platform toolkits like GTK. Also worth noting that VLC can play the same file from the old NAS (Synology) just fine, using the same mechanism (a mapped network drive over SMB).

     Edit: I also found another thread from 2018 with the same problem (by searching for the exact VLC error, a cryptic "filesystem error: read error: No error"), but no solution.

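     On the "some additional bit/flag" question, a sketch of metadata checks that go beyond ls -ln (the path is an example; getfattr/getfacl come from the attr/acl packages, and lsattr may not work on the FUSE /mnt/user path).

       getfattr -d -m - /mnt/user/Media/Old/Bad.mp4   # extended attributes (user.*, security.*)
       getfacl /mnt/user/Media/Old/Bad.mp4            # POSIX ACLs beyond the basic mode bits
       lsattr /mnt/user/Media/Old/Bad.mp4             # filesystem attribute flags
       stat /mnt/user/Media/Old/Bad.mp4               # full timestamps, including atime
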
  10. Further testing on this issue:
     1) Found the "Docker Safe New Perms" tool via a plugin; running it didn't yield any different results.
     2) Tried making a new directory and pointing my script at that; it was able to write files fine, and files written this way can be read just fine.
     3) Permissions between the "bad" files and the good ones appear identical, including user and group.
     4) Also tried logging in with a user account. This creates files under the correct user (in the "users" group), but those files can still be read as nobody just fine.

     I'm very confused now. I'd also like to get this sorted out before my trial runs out if possible (one day left), so if anyone has ideas on what to check or try, please let me know.

  11. I couldn't find a "docker safe" anything under Tools, but there was a "New Permissions" tool, which looks like the right thing, perhaps? (unraidip/Tools/NewPerms) I ran it on all disks against one of the shares I've been testing with. No change in behavior.

     Edit: Also, to follow up on permissions, here's one of the affected files:
       -rw-rw-rw- 1 nobody users 976202725 Aug 17 00:50 test.mp4

  12. OK, this is the weirdest thing I've seen, but it's the last quirk preventing me from retiring my Synology box, so here we go.

     I migrated all the data over to the new shares via a Docker image. This seemed to work fine, and I can view the files in Explorer and add/remove/edit/open them just fine. HOWEVER, certain programs are unable to read the files. They can traverse directories, but will either be unable to see any files, or will give obtuse errors when trying to display or open them. I've confirmed this behavior with both GIMP and VLC under Windows: for GIMP, the file-open dialog errors out when files are present in the path, and for VLC it fails to open the file (though the file-open widget itself works fine).

     The shares themselves are pretty straightforward. They're default-configured public shares, and on the Windows machines I've tried both mapping them as network drives and going via UNC paths. I've also confirmed this behavior on two machines with completely different configurations.

     I was able to repro this in some software I wrote, and the behavior is similar: I can check whether a directory exists and it returns the expected result, but checking whether a file exists returns false regardless, and attempting to stat or open a file produces "not found" errors. Running as admin doesn't appear to affect the behavior. The Synology box, however, does not exhibit these problems.

     Any ideas on what might be causing this?

     EDIT: Partial solution/workaround here; may require further investigation to prevent this from happening to others.

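     A minimal sketch of the kind of check described above, not my actual test program; the UNC path is a placeholder for an affected file on the share.

       # repro.py: directory checks succeed while file checks fail for "bad" files
       import os

       path = r"\\TOWER\Media\Old\Bad.mp4"   # hypothetical affected file

       print("dir exists :", os.path.isdir(os.path.dirname(path)))   # reportedly True
       print("file exists:", os.path.isfile(path))                   # reportedly False for bad files
       try:
           print("stat       :", os.stat(path))
       except OSError as e:
           print("stat failed:", e)                                   # "not found"-style error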