eeking

Members
  • Posts: 55
  • Joined
  • Last visited

Converted

  • Gender
    Undisclosed



  1. I just want to say I don't think you're crazy. What you've outlined makes perfect sense to me as well. Successful recovery would have to assume that no bitrot had happened on the other disks in the same blocks, and that parity hadn't been updated during a parity check while the bitrot was present. In fact, if a system like you mentioned were implemented, it would be a good idea for the parity check to consult block checksums on the data disks before deciding to update parity when a difference is detected.
     The problem I think others were trying to get at is that, currently, the parity part of the unraid software is "dumb" and doesn't know anything about filesystems. It seems to me the recovery method you've described could be implemented as an add-on/plugin just by doing raw reads of the parity and data drives, once you've consulted the filesystem and partition info to get the file's actual location on the target disk. And it seems like it would be safest to write out a new copy of the file and delete the old one rather than try to directly correct the flipped bits on disk. The problem is that this would require precise knowledge of the structure of the xfs/zfs/btrfs filesystems to isolate just the data and not the metadata.
     If you tried to just correct the data bits in place, the parity system would flip the bits on the parity drive and parity would be out of sync with the corrected data. (If I understand correctly, when writing to a disk, the parity system checks the current value on the data disk - which would have been flipped due to rot - and compares it to the new value. It then flips the bit on parity if the value changed on the data disk. That's how it avoids needing to spin up all disks when you write to just one.)
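     A minimal sketch of that read-modify-write step, using made-up byte values (nothing here is read from a real array; it only shows the arithmetic):

        #!/bin/bash
        # Toy illustration of XOR parity read-modify-write on a single byte.
        # A real array does this per bit across every sector of every disk.
        old_data=0xA7      # value currently on the data disk (may already contain rot)
        new_data=0xB2      # value being written
        old_parity=0x5C    # current value on the parity disk

        # Parity only needs the old data, the new data, and the old parity:
        #   new_parity = old_parity XOR old_data XOR new_data
        new_parity=$(( old_parity ^ old_data ^ new_data ))
        printf 'new parity byte: 0x%02X\n' "$new_parity"

     Because the rotted value is taken as the legitimate "old data", an in-place correction flips the corresponding parity bits too, leaving parity out of sync with the now-correct data - which is the problem described above.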
  2. I bought a pre-built system from Limetech back in 2007, the Lian-Li PC-A16 case with 15 hot-swap bays in the front. The system has been in storage for roughly the past 3 years through a move, and today when I decided to get it out and power it up I ran into some issues. Maybe the power supply, I'm not certain. After being on for about an hour working on a parity check, it just abruptly shut off. Now it generally will not power on at all, though it may come on for a few seconds before shutting down again - sometimes basically immediately, sometimes getting all the way to the USB's unraid boot menu before shutting off again.
     Given the age of the system, it's probably worth rebuilding rather than just fixing whatever this isolated issue may be. But is anything worth saving? The case and SATA drive bays? I have experience building PCs, but nothing quite like this. And life has happened, so it's been a good while since I've built anything and I'm not up to speed on current hardware trends.
     I'm attaching the "system" folder from an old diagnostics run on the USB, since I can't access the live system to pull info about the hardware. It looks like two Promise PDC40718 PCI cards provided four SATA ports each - those are probably worth replacing with something modern - and the motherboard provided the other 7. There's nothing else particularly special about the system. I only ever really used it for media storage. I'll probably want to run Plex or some DLNA server docker once I get it running again, primarily for local use. TIA for any input!
     system.zip
  3. Unraid 6.6.7. Currently having some problems getting docker to start. I thought maybe the image file was corrupt or my cache drive had problems, so I've reformatted the cache and recreated the docker image file. Still the webui reports "Docker Service failed to start" and the docker log looks like this:
        time="2019-09-18T23:13:15.560042404-04:00" level=info msg="libcontainerd: started new docker-containerd process" pid=10204
        time="2019-09-18T23:13:15.560204769-04:00" level=info msg="parsed scheme: \"unix\"" module=grpc
        time="2019-09-18T23:13:15.560228333-04:00" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
        time="2019-09-18T23:13:15.560334583-04:00" level=info msg="ccResolverWrapper: sending new addresses to cc: [{unix:///var/run/docker/containerd/docker-containerd.sock 0 <nil>}]" module=grpc
        time="2019-09-18T23:13:15.560374678-04:00" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
        time="2019-09-18T23:13:15.560470587-04:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc420184a20, CONNECTING" module=grpc
        time="2019-09-18T23:13:15-04:00" level=info msg="starting containerd" revision=468a545b9edcd5932818eb9de8e72413e616e86e version=v1.1.2
        time="2019-09-18T23:13:15-04:00" level=info msg="loading plugin "io.containerd.content.v1.content"..." type=io.containerd.content.v1
        time="2019-09-18T23:13:15-04:00" level=info msg="loading plugin "io.containerd.snapshotter.v1.btrfs"..." type=io.containerd.snapshotter.v1
        time="2019-09-18T23:13:15-04:00" level=info msg="loading plugin "io.containerd.snapshotter.v1.aufs"..." type=io.containerd.snapshotter.v1
        time="2019-09-18T23:13:15-04:00" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.aufs" error="modprobe aufs failed: "modprobe: FATAL: Module aufs not found in directory /lib/modules/4.18.20-unRAID\n": exit status 1"
        time="2019-09-18T23:13:15-04:00" level=info msg="loading plugin "io.containerd.snapshotter.v1.native"..." type=io.containerd.snapshotter.v1
        time="2019-09-18T23:13:15-04:00" level=info msg="loading plugin "io.containerd.snapshotter.v1.overlayfs"..." type=io.containerd.snapshotter.v1
        time="2019-09-18T23:13:15-04:00" level=info msg="loading plugin "io.containerd.snapshotter.v1.zfs"..." type=io.containerd.snapshotter.v1
        time="2019-09-18T23:13:15-04:00" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.zfs" error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the zfs snapshotter"
        time="2019-09-18T23:13:15-04:00" level=info msg="loading plugin "io.containerd.metadata.v1.bolt"..." type=io.containerd.metadata.v1
        time="2019-09-18T23:13:15-04:00" level=warning msg="could not use snapshotter aufs in metadata plugin" error="modprobe aufs failed: "modprobe: FATAL: Module aufs not found in directory /lib/modules/4.18.20-unRAID\n": exit status 1"
        time="2019-09-18T23:13:15-04:00" level=warning msg="could not use snapshotter zfs in metadata plugin" error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the zfs snapshotter"
        time="2019-09-18T23:13:35.572502759-04:00" level=warning msg="grpc: addrConn.createTransport failed to connect to {unix:///var/run/docker/containerd/docker-containerd.sock 0 <nil>}. Err :connection error: desc = \"transport: error while dialing: dial unix:///var/run/docker/containerd/docker-containerd.sock: timeout\". Reconnecting..." module=grpc
        time="2019-09-18T23:13:35.572634843-04:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc420184a20, TRANSIENT_FAILURE" module=grpc
        time="2019-09-18T23:13:35.572890270-04:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc420184a20, CONNECTING" module=grpc
        time="2019-09-18T23:13:55.573014687-04:00" level=warning msg="grpc: addrConn.createTransport failed to connect to {unix:///var/run/docker/containerd/docker-containerd.sock 0 <nil>}. Err :connection error: desc = \"transport: error while dialing: dial unix:///var/run/docker/containerd/docker-containerd.sock: timeout\". Reconnecting..." module=grpc
        time="2019-09-18T23:13:55.573120481-04:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc420184a20, TRANSIENT_FAILURE" module=grpc
        time="2019-09-18T23:13:55.573354144-04:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc420184a20, CONNECTING" module=grpc
        time="2019-09-18T23:14:15.573428114-04:00" level=warning msg="Failed to dial unix:///var/run/docker/containerd/docker-containerd.sock: grpc: the connection is closing; please retry." module=grpc
        time="2019-09-18T23:14:30.562752369-04:00" level=warning msg="daemon didn't stop within 15 secs, killing it" module=libcontainerd pid=10204
        Failed to connect to containerd: failed to dial "/var/run/docker/containerd/docker-containerd.sock": context deadline exceeded
     Any thoughts?
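     A few things I'd check from a shell while it's failing (the docker.img path is an assumption about where the image file lives on this setup, and running the daemon by hand with --debug is just a generic way to get more output than the service log shows):

        #!/bin/bash
        # Is the containerd socket ever created while docker is trying to start?
        ls -l /var/run/docker/containerd/

        # Any stale docker/containerd processes left over from earlier attempts?
        ps aux | grep -E 'docker|containerd' | grep -v grep

        # Do the cache drive and image file look sane (free space, existence)?
        df -h /mnt/cache
        ls -lh /mnt/cache/docker.img    # assumed location of the docker image file

        # Last resort: run the daemon in the foreground with debug logging.
        dockerd --debug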
  4. Apologies, yes - completely unresponsive from the webui, ssh, or even a keyboard plugged into the machine when this kernel panic happens. I expect it's related to kernel changes in the most recent versions of unraid, since this issue started shortly after upgrading. No other hardware in the system changed.
  5. Running 6.1.3. Nothing odd going on except that the server seems to hang within a day of bootup. Nothing interesting seems to be captured in syslog, except possibly:
        Oct 19 19:32:23 Tower ntpd[1408]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized
     but the log is attached anyhow. A camera snapshot of the panic is attached too, as that's the only way I've managed to capture any portion of it so far.
     EDIT: I'm currently running with the "noapic" kernel option to see if that resolves the issue. So far the time error doesn't appear in the log.
     syslog.txt
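     For anyone trying the same thing, this is roughly where the option goes on the flash drive - the label and other lines may differ between installs, so treat it as an example rather than the exact file:

        # /boot/syslinux/syslinux.cfg (excerpt) - add noapic to the append line
        label unRAID OS
          menu default
          kernel /bzimage
          append initrd=/bzroot noapic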
  6. Looks like maybe my issue is different. I'm seeing a kernel panic after connecting a monitor. The last log entry from a shell was the mover script ending again. Gonna look into how to capture the panic for analysis.
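     One option I'm looking at is netconsole, which streams kernel messages (including a panic) to another machine over UDP. A rough sketch - the IPs, interface name, and MAC are placeholders for my own network, and the listener flags vary between netcat versions:

        # On the receiving PC: listen for UDP log messages (flags vary by netcat flavor).
        nc -u -l -p 6666

        # On the unraid server: point netconsole at that PC.
        # Format: netconsole=<src-port>@<src-ip>/<interface>,<dst-port>@<dst-ip>/<dst-mac>
        modprobe netconsole netconsole=6665@192.168.1.10/eth0,6666@192.168.1.20/aa:bb:cc:dd:ee:ff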
  7. I've been having the same problem lately. I come home from work to find the machine completely unresponsive remotely - can't connect to the webui or ssh. I'd chalked it up to failing hardware on an old hdd (because after one of the lockups the webui reported it as missing), but I continue to have the same issue even after replacing and removing that drive (which has no faults reported by smart and matches all file checksums against the reconstructed replacement). The only things I run besides vanilla unraid are deluge and crashplan dockers, and I haven't even been running the crashplan one since this issue started.
     The last couple of times, I left an ssh connection open from my PC with "tail -f /var/log/syslog" to try capturing what happened, but nothing interesting showed up there. One time the last entry was the mover script ending (after doing nothing) and the next time the last entry was a spindown. I guess the next step is to hook a monitor up to it; I don't believe AMT is an option for me.
     This seems to have started after a recent upgrade in unraid versions. I'm not sure if it started with 6.1.2 or 6.1.3 - I'm pretty sure I skipped some versions somewhere - but currently I'm experiencing this with 6.1.3. I may try a downgrade if I can't capture anything with a monitor connected to the machine.
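     Another thing I may try is periodically copying the syslog to the flash drive so the tail of it survives a hard lockup - a crude, untested sketch (the filename is arbitrary, and the repeated writes to /boot are only meant for temporary debugging):

        #!/bin/bash
        # Mirror the live syslog to the flash drive once a minute so the last
        # entries before a lockup are still there after a hard reboot.
        while true; do
            cp /var/log/syslog /boot/syslog-mirror.txt
            sleep 60
        done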
  8. I'm using the execute plugin to run a script to fix permissions after each torrent finishes. Something like this, though my script-fu is weak and there might yet be something wrong with it:
        #!/bin/bash
        # Arguments passed in by the deluge Execute plugin, in order.
        torrentid="$1"
        torrentname="$2"
        torrentpath="$3"
        # Drop the owner's execute bit, clear group/other, copy the owner's
        # permissions to group/other, then re-add execute on directories.
        chmod -R u-x,go-rwx,go+u,ugo+X "$torrentpath/$torrentname"
     I put the script in the deluge config directory, made it executable, then told the execute plugin to run it when a torrent completes.
  9. In my own adventure to get YaRSS2 working, I found that the UI for it would only show up on the Linux GTK client. The Windows clients (and I tried several versions) would show UI for other plugins, but not for YaRSS.
  10. I'm sure I don't have a definitive answer, but I'll give a personal example. The deluge dockerization linked in the sticky specifies bridge networking with only two ports exposed. I wanted to enable randomization of ports in deluge, using UPnP on my router to get them forwarded through NAT. Due to the randomization, it's impossible to know ahead of time which ports to map into the container, so I instead chose host mode for the network so any port would work. A rough sketch of the difference is below.
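     (The image name and volume path here are placeholders, not the exact ones from the sticky.)

        # Bridge mode: only the ports mapped at creation time are reachable.
        docker run -d --name deluge \
          --net=bridge \
          -p 8112:8112 -p 58846:58846 \
          -v /mnt/cache/appdata/deluge:/config \
          somerepo/deluge    # placeholder image name

        # Host mode: the container shares the host's network stack, so whatever
        # random port deluge picks (and UPnP forwards) works without a mapping.
        docker run -d --name deluge \
          --net=host \
          -v /mnt/cache/appdata/deluge:/config \
          somerepo/deluge    # placeholder image name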
  11. Ah... that makes sense then. I thought it was bit-for-bit parity at the device level, but it's actually at partition level. Thanks for the info!
  12. I guess I don't understand - the responses above suggested I rebuild from parity to replace the drive, rather than take my roundabout route. Won't that overwrite the partition table on the new drive, no matter what I do with pre-clear?
  13. Thanks for the quick responses. The new drive is an advanced format drive, while the failing one is not and is not 4K-aligned. That was one reason I was considering the roundabout route. Any suggestions for a good way to correct this after the rebuild? I'm familiar with moving partitions around with gparted, but never in this environment.
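     For checking alignment before and after, something like this should do (sdX is a placeholder for the actual device):

        # Show partition start sectors; for 4K alignment the start (in 512-byte
        # sectors) should be divisible by 8.
        fdisk -l /dev/sdX

        # Or let parted check partition 1 directly.
        parted /dev/sdX align-check optimal 1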
  14. I found this old topic, but comments there indicate it doesn't work for recent 5.0 RCs. Conceptually it would seem to work regardless of the unraid version, but the specifics of adjusting and trusting the array to remove the drive may have changed. I've got an old 500GB drive that appears to be failing. I'll be replacing it with a 1TB drive, so my plan is to:
     1. Parity check the existing array
     2. Preclear the 1TB drive
     3. Add the 1TB drive
     4. Copy the contents of the failing 500GB drive to the 1TB drive
     5. Zero the failing drive (see the sketch below)
     6. Remove the failing drive
     Of course the specifics of step 6 are the key to maintaining parity. I've currently shut down my server entirely until the replacement drive arrives, so I don't have the webui up to examine what buttons are available there. Seems like there was a "trust array" option that could be used, followed by a parity *check*. Suggestions?
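     For step 5, my understanding is that the zeroing has to go through the parity-protected md device so parity is updated as the disk is cleared. A very rough sketch, assuming the failing disk happens to be disk 5 (the md number is a placeholder; I'd confirm the right one in the webui before running anything like this):

        # DANGER: irreversibly wipes the disk behind /dev/md5.
        # Writing through /dev/mdX (rather than /dev/sdX) lets unraid update parity
        # as the zeros are written, so parity stays valid once the drive is removed.
        dd if=/dev/zero of=/dev/md5 bs=1M

     Once the disk is all zeros, it contributes nothing to the parity calculation, which is why it can then be dropped from the array and parity only needs a check rather than a rebuild.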