Rune

Everything posted by Rune

  1. Disk is dying. Unmount failed. Replaced the SATA cable and booted it back up: lots of clicking noises, and dmesg full of link resets and retries. Mount failed again, so I pulled the disk and booted the array degraded. (A sketch for confirming the drive itself is failing follows after this list.)
  2. I have an unraid array of 4 disks + parity, each formatted as an individual ZFS volume. One of the disks just suspended itself: "zpool clear" seems to hang (though it does reset the CKSUM counter to zero). Replacing the SATA cable seems like an obvious next step... but I worry that if I stop the array to swap the cable or reboot, I won't be able to start the array again, or I'll get stuck in maintenance mode or something if the drive won't mount. I don't have a huge amount of time to deal with it today, and since unraid doesn't see the disk as "failed", its contents aren't emulated (users just get errors when accessing anything on that disk), and I don't want to get stuck with everything offline in maintenance mode overnight. (Commands for inspecting and clearing the pool are sketched after this list.)
  3. Did some more testing on this, and there is definitely some write amplification going on here. I transferred everything off of the cache and marked it as 'No' for all shares, which dropped the write IO on the cache to zero. At that point there was a reasonably constant 50-200kB/s of write IO on the array (including parity), and only on the disk I had moved the VM disks to. I moved everything except the VMs back, one share at a time, and the write IO stayed on the array. Shutting down the VMs stopped the write IO. I then moved the VM disks back to the cache pool, set the 'No COW' flag (chattr +C) on them, and as soon as I started one VM up, the IO was already above the 50-200kB/s average I saw on the array. On cache it's about 900kB/s-3.8MB/s, which is roughly an 18-19x write amplification for being on the cache pool instead of the array. (See the iostat/chattr sketch after this list.)
  4. Ran with a single XFS-encrypted cache disk for a while, then moved to a 3-disk BTRFS-encrypted cache pool. Since that change, I've noticed constant write IO on the cache disks of about 2-3MB/s (per the Main page). iostat on the unraid host confirms the IO is happening, but no other tool gives me any idea of what is doing the writing (e.g. iotop just points at shfs). I have 3 VMs with disks on the cache pool (they were there prior to the change); monitoring inside the VMs, iostat says there is maybe 10kB/s of average write IO on each. I NFS-mount in a bunch of stuff, but nfsiostat tells me there's not much going on there, and it's mostly read traffic. Since Btrfs is copy-on-write, I checked and made sure that the VM disks, libvirt, docker, and other random-write stuff are COW-disabled (chattr +C, verified with lsattr). With no IO coming from SMB clients, only minor IO coming from VMs and NFS mounts, and COW disabled... how can I tell where all the extra IO is coming from? I can account for, say, 100kB/s, but I'm seeing 3MB/s (1.5MB/s when you account for btrfs writing redundant blocks). (A per-process IO accounting sketch follows after this list.)
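
For the dying disk in post 1, a minimal sketch of confirming that the drive itself (rather than the cable) is at fault, using the kernel log and SMART counters; /dev/sdX is a stand-in for whatever device letter unraid assigned:

    # Look for ATA link resets and medium errors around the failure
    dmesg | grep -iE 'ata|sd' | tail -n 50

    # Clicking plus non-zero reallocated/pending sector counts points at the
    # drive itself, not cabling
    smartctl -a /dev/sdX | grep -iE 'reallocated|pending|uncorrect'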
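For the suspended ZFS disk in post 2: on unraid, each array disk formatted as ZFS is its own single-device pool, so the state can be inspected and cleared per pool. A sketch, where "disk3" is a hypothetical pool name (take the real one from zpool list):

    # Show pool health, the SUSPENDED state, and the READ/WRITE/CKSUM counters
    zpool status -v disk3

    # Ask ZFS to resume I/O and reset the error counters; this hangs if the
    # device is still unresponsive, which is itself diagnostic
    zpool clear disk3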
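The per-device write rates in post 3 can be watched with iostat from sysstat, sampling every few seconds:

    # kB read/written per second, per device; compare cache devices to array disks
    iostat -dk 5

One caveat on the 'No COW' flag: on btrfs, chattr +C only takes effect on empty files, so it needs to be set on the directory (new files inherit it) and the VM images copied in afresh. A sketch, assuming the images live under /mnt/cache/domains:

    chattr +C /mnt/cache/domains                          # new files in here inherit No-COW
    cp --reflink=never vm.img /mnt/cache/domains/vm.img   # rewrite the image so the flag applies
    lsattr /mnt/cache/domains/vm.img                      # verify the 'C' attribute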
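For post 4's question of where the writes originate: accumulated per-process counters get past the "iotop points at shfs" problem to some degree, because /proc/<pid>/io records both syscall-level and block-layer bytes per process. A sketch:

    # Accumulated IO since start (-a), active processes only (-o), grouped by
    # process rather than thread (-P)
    iotop -aoP

    # Raw counters for one suspect, e.g. a qemu VM: wchar is bytes written at
    # the syscall level, write_bytes is what reached the block layer
    cat /proc/$(pgrep -f qemu | head -n 1)/io

Note that on a copy-on-write filesystem much of the block-layer traffic lands in kernel worker threads (btrfs workers) rather than the originating process, so the gap between the per-process wchar totals and iostat's device totals is roughly the overhead being hunted here.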