eeking

Everything posted by eeking

  1. I just want to say I don't think you're crazy. What you've outlined seems to make perfect sense to me as well. Successful recovery would have to assume that no bitrot had happened on other disks in the same blocks, or that parity hadn't been updated during a parity check while the bitrot was active. In fact, if a system like you mentioned were implemented, it would be a good idea for the parity check to consult block checksums on the data disks before deciding to update parity when a difference is detected.
     The problem I think others were trying to get at is that, currently, the parity part of the unraid software is "dumb" and doesn't know anything about filesystems. Seems to me like the recovery method you've described could be implemented as an add-on/plugin just by doing raw reads of the parity and data drives, once you've consulted the filesystem and partition info to get the file's actual location on the target disk.
     And it seems like it would be safest to write out a new copy of the file and delete the old one rather than try to directly correct the flipped bits on disk. The problem is that this would require precise knowledge of the structure of the xfs/zfs/btrfs filesystems to isolate just the data and not the metadata. If you tried to just correct the data bits in place, the parity system would flip the bits on the parity drive and parity would be out of sync with the corrected data. (If I understand correctly, when writing to a disk, the parity system reads the current value on the data disk - which would have been flipped due to rot - compares it to the new value, and flips the corresponding bit on parity if the value changed. That's how it avoids needing to spin up all disks when you write to just one. There's a rough sketch of that update below.)
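     To make that last parenthetical concrete, here is a minimal toy sketch of a single-parity read-modify-write update (the byte values are made up for illustration; real parity operates on whole sectors through the md driver):
        #!/bin/bash
        # Toy model of an XOR read-modify-write parity update on a single byte.
        old_data=$(( 0xA5 ))    # current byte on the data disk (possibly already bit-rotted)
        new_data=$(( 0xB7 ))    # byte being written
        old_parity=$(( 0x3C ))  # current byte on the parity disk
        # Only the old and new data values are needed - the other data disks stay spun down:
        new_parity=$(( old_parity ^ old_data ^ new_data ))
        printf 'new parity byte: 0x%02X\n' "$new_parity"
        # If old_data had been silently flipped by rot, parity still reflects the original value,
        # so writing a "corrected" value back in place folds the rotted value into parity and
        # leaves it out of sync with the other disks - the problem described above.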
  2. I bought a pre-built system from Limetech back in 2007, the Lian-Li PC-A16 case with 15 hot-swap bays in the front. The system has been in storage for roughly the past 3 years through a move, and today when I decided to get it out and power it up I ran into some issues. Maybe the power supply, I'm not certain. After being on for about an hour working on a parity check, it abruptly shut off. Now it generally will not power on at all, though it may come on for a few seconds before shutting down again - sometimes basically immediately, sometimes getting all the way to the USB's unraid boot menu before shutting off.
     Given the age of the system, it's probably worth rebuilding rather than just fixing whatever this isolated issue may be. But is anything worth saving? The case and SATA drive bays? I have experience building PCs, but nothing quite like this. And life has happened, so it's been a good while since I've built anything and I'm not up to speed on current hardware trends.
     I'm attaching the "system" folder from an old diagnostics run on the USB, since I can't access the live system to pull info about the hardware. It looks like two Promise PDC40718 PCI cards provided four SATA ports each - those are probably worth replacing with something modern - and the motherboard provided the other 7. There's nothing else particularly special about the system. I only ever really used it for media storage. I'll probably want to run Plex or some DLNA server docker once I get it running again, primarily for local use. TIA for any input! system.zip
  3. Unraid 6.6.7. Currently having some problems getting docker to start. I thought maybe the image file was corrupt or my cache drive had problems, so I've reformatted the cache and recreated the docker image file. Still the webui reports "Docker Service failed to start" and the docker log looks like this:
        time="2019-09-18T23:13:15.560042404-04:00" level=info msg="libcontainerd: started new docker-containerd process" pid=10204
        time="2019-09-18T23:13:15.560204769-04:00" level=info msg="parsed scheme: \"unix\"" module=grpc
        time="2019-09-18T23:13:15.560228333-04:00" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
        time="2019-09-18T23:13:15.560334583-04:00" level=info msg="ccResolverWrapper: sending new addresses to cc: [{unix:///var/run/docker/containerd/docker-containerd.sock 0 <nil>}]" module=grpc
        time="2019-09-18T23:13:15.560374678-04:00" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
        time="2019-09-18T23:13:15.560470587-04:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc420184a20, CONNECTING" module=grpc
        time="2019-09-18T23:13:15-04:00" level=info msg="starting containerd" revision=468a545b9edcd5932818eb9de8e72413e616e86e version=v1.1.2
        time="2019-09-18T23:13:15-04:00" level=info msg="loading plugin "io.containerd.content.v1.content"..." type=io.containerd.content.v1
        time="2019-09-18T23:13:15-04:00" level=info msg="loading plugin "io.containerd.snapshotter.v1.btrfs"..." type=io.containerd.snapshotter.v1
        time="2019-09-18T23:13:15-04:00" level=info msg="loading plugin "io.containerd.snapshotter.v1.aufs"..." type=io.containerd.snapshotter.v1
        time="2019-09-18T23:13:15-04:00" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.aufs" error="modprobe aufs failed: "modprobe: FATAL: Module aufs not found in directory /lib/modules/4.18.20-unRAID\n": exit status 1"
        time="2019-09-18T23:13:15-04:00" level=info msg="loading plugin "io.containerd.snapshotter.v1.native"..." type=io.containerd.snapshotter.v1
        time="2019-09-18T23:13:15-04:00" level=info msg="loading plugin "io.containerd.snapshotter.v1.overlayfs"..." type=io.containerd.snapshotter.v1
        time="2019-09-18T23:13:15-04:00" level=info msg="loading plugin "io.containerd.snapshotter.v1.zfs"..." type=io.containerd.snapshotter.v1
        time="2019-09-18T23:13:15-04:00" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.zfs" error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the zfs snapshotter"
        time="2019-09-18T23:13:15-04:00" level=info msg="loading plugin "io.containerd.metadata.v1.bolt"..." type=io.containerd.metadata.v1
        time="2019-09-18T23:13:15-04:00" level=warning msg="could not use snapshotter aufs in metadata plugin" error="modprobe aufs failed: "modprobe: FATAL: Module aufs not found in directory /lib/modules/4.18.20-unRAID\n": exit status 1"
        time="2019-09-18T23:13:15-04:00" level=warning msg="could not use snapshotter zfs in metadata plugin" error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the zfs snapshotter"
        time="2019-09-18T23:13:35.572502759-04:00" level=warning msg="grpc: addrConn.createTransport failed to connect to {unix:///var/run/docker/containerd/docker-containerd.sock 0 <nil>}. Err :connection error: desc = \"transport: error while dialing: dial unix:///var/run/docker/containerd/docker-containerd.sock: timeout\". Reconnecting..." module=grpc
        time="2019-09-18T23:13:35.572634843-04:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc420184a20, TRANSIENT_FAILURE" module=grpc
        time="2019-09-18T23:13:35.572890270-04:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc420184a20, CONNECTING" module=grpc
        time="2019-09-18T23:13:55.573014687-04:00" level=warning msg="grpc: addrConn.createTransport failed to connect to {unix:///var/run/docker/containerd/docker-containerd.sock 0 <nil>}. Err :connection error: desc = \"transport: error while dialing: dial unix:///var/run/docker/containerd/docker-containerd.sock: timeout\". Reconnecting..." module=grpc
        time="2019-09-18T23:13:55.573120481-04:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc420184a20, TRANSIENT_FAILURE" module=grpc
        time="2019-09-18T23:13:55.573354144-04:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc420184a20, CONNECTING" module=grpc
        time="2019-09-18T23:14:15.573428114-04:00" level=warning msg="Failed to dial unix:///var/run/docker/containerd/docker-containerd.sock: grpc: the connection is closing; please retry." module=grpc
        time="2019-09-18T23:14:30.562752369-04:00" level=warning msg="daemon didn't stop within 15 secs, killing it" module=libcontainerd pid=10204
        Failed to connect to containerd: failed to dial "/var/run/docker/containerd/docker-containerd.sock": context deadline exceeded
     Any thoughts?
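     In case it helps anyone hitting the same symptom (containerd never answering on its socket), these are the kinds of generic checks that can be run from ssh while the service is failing - the paths are the defaults referenced in the log above:
        df -h /var/lib/docker                 # is the docker image mounted and not full?
        ls -l /var/run/docker/containerd/     # does the containerd socket ever get created?
        ps aux | grep -i containerd           # is docker-containerd running or dying immediately?
        tail -n 50 /var/log/syslog            # any loop device / filesystem errors at the same time?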
  4. Apologies, yes - completely unresponsive from the webui, ssh, or even a keyboard plugged into the machine when this kernel panic happens. I expect it's related to kernel changes in the most recent versions of unraid, as this issue started recently after upgrading. No other hardware in the system changed.
  5. Running 6.1.3. Nothing odd going on except that the server seems to hang within a day of bootup. Nothing interesting seems to be captured in syslog, except possibly:
        Oct 19 19:32:23 Tower ntpd[1408]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized
     but the log is attached anyhow. A camera snapshot of the panic is attached as well, as that's the only way I've managed to capture any portion of it so far. EDIT: I'm currently running with the "noapic" kernel option (see the snippet below) to see if that resolves the issue. So far the time error doesn't appear in the log. syslog.txt
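     For anyone wanting to try the same experiment, this is the kind of edit I mean - adding "noapic" to the append line in the flash drive's syslinux config (shown as an excerpt; the exact contents may differ on your install, so back the file up first):
        # /boot/syslinux/syslinux.cfg (excerpt)
        label unRAID OS
          menu default
          kernel /bzimage
          append noapic initrd=/bzroot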
  6. Looks like maybe my issue is different. I'm seeing a kernel panic after connecting a monitor. Last log entry from a shell was the mover script ending again. Gonna look into how to capture the panic for analysis.
  7. I've been having the same problem lately. I come home from work to find the machine completely unresponsive remotely - can't connect to the webui or ssh. I'd chalked it up to failing hardware on an old hdd (because after one of the lockups the webui reported it as missing), but I continue to have the same issue even after replacing and removing that drive (which has no faults reported by smart and matches all file checksums against the reconstructed replacement).
     The only things I run besides vanilla unraid are deluge and crashplan dockers, and I haven't even been running the crashplan one since this issue started. The last couple of times, I left an ssh connection open from my PC with "tail -f /var/log/syslog" to try to capture what happened, but nothing interesting showed up there. One time the last entry was the mover script ending (after doing nothing), and the next time the last entry was a spindown.
     I guess the next step is to hook a monitor up to it; I don't believe AMT is an option for me. This seems to have started after a recent upgrade in unraid versions. I'm not sure if it started with 6.1.2 or 6.1.3 - I'm pretty sure I skipped some versions somewhere - but currently I'm experiencing this with 6.1.3. I may try a downgrade if I can't capture anything with a monitor connected to the machine.
  8. I'm using the Execute plugin to run a script that fixes permissions after each torrent finishes. Something like this, though my script-fu is weak and there might yet be something wrong with it:
        #!/bin/bash
        torrentid="$1"
        torrentname="$2"
        torrentpath="$3"
        chmod -R u-x,go-rwx,go+u,ugo+X "$torrentpath/$torrentname"
     I put the script in the deluge config directory, made it executable, then told the Execute plugin to run it when a torrent completes (a quick manual test is sketched below).
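     For testing, the script can be exercised by hand with the same three arguments the Execute plugin passes on "torrent complete" (the script name and paths here are made-up examples, not anything the plugin defines):
        # hypothetical script name and container paths, for testing only
        chmod +x /config/fixperms.sh
        /config/fixperms.sh "sometorrentid" "Some.Torrent.Dir" "/downloads/completed"
        ls -ld "/downloads/completed/Some.Torrent.Dir"   # confirm the resulting permissions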
  9. In my own adventure to get YaRSS2 working, I found that the UI for it would only show up on the Linux GTK client. The Windows clients (and I tried several versions) would show UI for other plugins, but not for YaRSS.
  10. I'm sure I don't have a definitive answer, but I'll give a personal example. The deluge dockerization linked in the sticky uses bridge networking with only two ports published. I wanted to enable randomization of ports in deluge, using UPnP on my router to get them forwarded through NAT. Due to the randomization, it would be impossible to know ahead of time which ports to set up in the container, so I instead chose host mode for the network to enable any port (rough comparison below).
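     Roughly the difference, with a placeholder image name and ports (not the exact ones from the sticky):
        # bridge mode: only the explicitly published ports reach the container
        docker run -d --name deluge -p 8112:8112 -p 58846:58846 some/deluge-image
        # host mode: the container shares the host's network stack, so whatever random
        # port deluge picks (and UPnP forwards on the router) just works
        docker run -d --name deluge --net=host some/deluge-image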
  11. Ah... that makes sense then. I thought it was bit-for-bit parity at the device level, but it's actually at partition level. Thanks for the info!
  12. I guess I don't understand - the responses above suggested I rebuild from parity to replace the drive, rather than take my roundabout route. Won't that overwrite the partition table on the new drive, no matter what I do with pre-clear?
  13. Thanks for the quick responses. The new drive is an Advanced Format drive, while the failing one is not and isn't 4K-aligned. That was one reason I was considering the roundabout route. Any suggestions for a good way to correct this after the rebuild? I'm familiar with moving partitions around with gparted, but never in this environment.
  14. I found this old topic, but comments there indicate it doesn't work for recent 5.0 RCs. Conceptually it would seem to work regardless of the unraid version, but the specifics of adjusting and trusting the array to remove the drive may have changed. I've got an old 500GB drive that appears to be failing. I'll be replacing it with a 1TB drive, so my plan is to:
        1. Parity check the existing array
        2. Preclear the 1TB drive
        3. Add the 1TB drive
        4. Copy the contents of the failing 500GB drive to the 1TB drive
        5. Zero the failing drive (something like the sketch below)
        6. Remove the failing drive
     Of course the specifics of step 6 are the key to maintaining parity. I've currently shut down my server entirely until the replacement drive arrives, so I don't have the webui up to examine what buttons are available there. Seems like there was a "trust array" option that could be used, followed by a parity *check*. Suggestions?
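     For step 5, my understanding (please correct me if this is wrong) is that the zeroing has to go through the md device rather than the raw disk so that parity is updated as the zeros are written - something along these lines, with a made-up device number, after triple-checking which mdX maps to the failing disk and that nothing else is using it:
        # HYPOTHETICAL example - /dev/md3 stands in for whichever md device is the failing drive.
        # Writing through /dev/mdX keeps parity in sync; writing to the raw /dev/sdX would not.
        dd if=/dev/zero of=/dev/md3 bs=1M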
  15. I don't have it automated, but my process is to move completed torrents that I wish to keep (as opposed to delete after use), and keep seeding, to a "torrents" directory on the root of cache. I then tell transmission in the GUI (using the 5.0-rc transmission plugin with the updated web gui) that the torrent is now in /mnt/user/torrents. This user share is not exported via SMB or NFS, but it allows me to move a file to any drive and seed it from the same path. Then the mover script takes care of moving the files out to the array drives.
     After that is done, I create hard links with "cp -rl" to create the shared/media-library names for my files (example below). Symbolic links would probably work just as well as hard links. This allows me to continue seeding with only one stored copy of the file but use different filenames for adding to XBMC. It's also a good way to keep torrent nfo files and sample files out of the media library, as those hard links can be deleted while retaining the original file. This may not be the best setup, but it works for me.
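     A concrete example of the hard-link step, with made-up paths (hard links have to stay on the same filesystem, so this is done per array disk):
        # one stored copy, two names: the torrent keeps seeding from "torrents",
        # while XBMC sees a cleanly named copy under the media directory
        cp -rl "/mnt/disk1/torrents/Some.Release.720p" "/mnt/disk1/movies/Some Movie (2012)"
        # deleting the media-library link later doesn't touch the seeded original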
  16. I had just upgraded and didn't have any extras configured to capture the syslog in case of a crash. I'll try to reproduce it this week; for now I must sleep, and a parity check is running right now anyway. As for hardware, I'm still running the MD-1500/LL I bought from Limetech six(!!!) years ago. I have upgraded the RAM to 2GB since that time but otherwise haven't done much with it except add drives. About time to replace some fans though - getting pretty loud.
  17. This is my first time trying out one of the 5.0-rcs. Upgrade from 4.7 went smoothly, no apparent issues at this time. Though I believe I experienced the same crash/hang when trying to stop the array that others have experienced.
  18. I think there's an error in the transctl script? There's no way to start transmission with it currently:
        case "$1" in
          start)
            # Starts the Transmission deamon
            if [ -f $PIDFILE ]; then
              echo "Starting Transmission deamon"
              rm -f $PIDFILE
              ...
            else
              echo "Transmission is already running"
            fi
     Seems you can only start it if a PIDFILE already exists, which is the test for whether it's already running or not... but then I haven't done a lot of shell scripting, so I might be reading it wrong. (What I'd have expected is sketched below.)
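     If the test really is inverted, I'd have expected the intent to be something like this - just a sketch of the logic, keeping the original's elision rather than guessing at how transctl actually launches the daemon:
        start)
          # start only if no PID file exists yet
          if [ ! -f "$PIDFILE" ]; then
            echo "Starting Transmission deamon"
            ...   # launch the daemon and write $PIDFILE
          else
            echo "Transmission is already running"
          fi
          ;;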
  19. It's been an interesting week. First my cache drive started to fail, so I replaced it. Then, as a result of what I was doing there to troubleshoot and replace it, and some inexplicable lockups, I noted some odd errors in the syslog. The root cause of that issue seems to have been that when I added RAM several months ago, I failed to notice that it was a different clock/timing than the RAM already present. Memtest gave errors with all the RAM present, but isolating each stick yielded no errors. The end result was that I removed the 2x512MB Super Talent 533MHz DDR2 sticks and kept only the 2x1GB Patriot 800MHz DDR2 sticks instead of running all of them together. Memtest then yielded no errors.
     Since I was doing all that work, I went ahead and installed the less power-hungry Celeron 440 processor that had been sitting around in its box for a few years because I was too timid to mess with any of the hardware Limetech had shipped to me. Its FSB speed matched the upgraded RAM, so it made sense.
     Anyway, now everything seems to run fine. I can set up everything properly in the BIOS, boot into unraid, and have none of the mysterious errors in the syslog. *But* if the system loses power (say, when I try to move it back into the server closet) I get bad CMOS checksum and "overclocking failed" errors. I'm not doing any overclocking. The obvious answer is that the battery is dead, but I've tried 4 different replacements, old and new, all with the same results. So long as power is maintained and the green LED on the motherboard remains lit, I can shut down or reboot and all is well - but lose power and the CMOS loses its settings. My only guess is that I stirred up some dust during all that work and it's lodged somewhere shorting the CMOS, but I've used a lot of canned air to no avail. I've updated the BIOS to the latest version as well. I found this hit from someone having the same problem 3 years ago, but no solution: http://www.tomshardware.com/forum/216195-30-cmos-checksum-help Any suggestions?
  20. I've been using UnTorrent for a while now, thanks! It works great. I'd like to have the "Data Directory" plugin added: http://code.google.com/p/rutorrent/wiki/PluginDataDir
  21. That turned out to be the case for me, after seeing the same errors. For whatever reason, right-click "save as" on your package gave me a ~400KB file, while simply clicking on your link got me the full ~14MB file. Installed and working fine now.
  22. I'll admit I'm not entirely certain about the internals, but isn't the md driver part of unraid open source? Only emhttp is closed source. So I think it's practical that users could continue to compile their own updated kernels with the md driver to support new hardware. That's not what we want, certainly, but even if unraid development halted tomorrow, the functionality as it exists today could be carried forward almost indefinitely.
  23. Been seeing this too, lately. Seems to have coincided with switching over to a new router running Tomato. That or OpenDNS. Nothing on the unRaid box changed, and I used to be able to keep rTorrent running continuously without problems. Thought it might be an out-of-RAM issue, so I set up a cache drive and added some swap... didn't help.
  24. I'm also running 0.01.17-beta, and the built-in upgrade check tells me it is current.