optim

Everything posted by optim

  1. Jon, just wanted to follow up since my last email. I'm probably restating what you already know, so chances are it's of no value, but here is what I've noticed so far when the system hangs:
     - Stop all VMs and Dockers and perform heavy I/O on the system (i.e. disk-to-disk copies) and all is fine. It will happily copy for hours.
     - Have a Docker do heavy I/O (i.e. Sonarr unpacking all the mono dependencies on startup) without any disk activity on the host and all is fine. No freeze. Although I did previously (a few weeks back) manage to get a Sabnzbd Docker to kill the system without host disk activity when processing a large queue.
     - Have a VM do heavy I/O (i.e. an Ubuntu server VM with Sabnzbd processing a queue of hundreds of gigabytes) without disk activity on the host and there is no freeze.
     - Multiple VMs and/or Dockers taxing the system at the same time will bring on a hang as well. I had a script processing some PDFs for language detection and renaming while Sabnzbd worked on some downloads in the Ubuntu VM, and it caused a hang.
     - Start a VM and/or Docker with heavy disk activity while performing heavy host I/O as above and the system will hang within 10-15 minutes.
     By hang I mean the disk-to-disk copy (using mc) will freeze, the Docker will be unresponsive and the web UI will cease to respond. Only a power cycle will bring the system back. You can telnet to the system, but trying to run a command such as iotop will never return and will hang that ssh session. During all of the above there are no dmesg entries, nothing in the syslog, no significant I/O (as per a previously started iotop), and the system load will just creep up until you power cycle. I've left it in that state to see if it would recover and had a system load in the high 70s after a few hours with no end in sight. It really seems as though the system is hitting some kind of a deadlock between Docker, KVM and the Linux host OS. When the hang starts, nothing is happening in terms of CPU, disk or network I/O. All the I/O just suspends (it doesn't even get to error out). It just ceases to respond, which is why I think the various pieces (processes, threads, whatever) are waiting on another piece. That's why I describe it as a deadlock, at least in terms of its symptoms. I wish I could be more descriptive, but there's nothing much to go on. I wish there was a rogue process or I/O stream that could be identified as a smoking gun. In terms of how often it happens, I've gone 3-4 days without a hang and then had 4-5 hangs in a day. It's fairly random and very frustrating. Luckily the machine has IPMI so I can power cycle it remotely. Again, nothing here is probably of value, but I figured it couldn't hurt to send it along anyway. Feel free to ignore it; I'm just filling in the time waiting for the next beta...
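     The "everything just waits" symptom described above usually shows up as tasks stuck in uninterruptible sleep (state D). As a rough illustration only (not something from the post), here is a short Python sketch that lists such tasks straight from /proc:

     #!/usr/bin/env python3
     # Sketch: list tasks stuck in uninterruptible sleep (state "D"), which is
     # what a storage-layer deadlock typically looks like from userspace.
     import os

     def d_state_tasks():
         """Yield (pid, comm) for every task currently in state D."""
         for pid in filter(str.isdigit, os.listdir("/proc")):
             try:
                 with open("/proc/%s/stat" % pid) as f:
                     data = f.read()
             except IOError:
                 continue  # task exited while we were scanning
             # comm is wrapped in parentheses and may contain spaces,
             # so find the closing ")" instead of naively splitting.
             rparen = data.rfind(")")
             comm = data[data.find("(") + 1:rparen]
             state = data[rparen + 1:].split()[0]
             if state == "D":
                 yield int(pid), comm

     if __name__ == "__main__":
         for pid, comm in d_state_tasks():
             print("%d  %s" % (pid, comm))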
  2. I had switched to NZBGet for a few months up until about a week ago. I finally gave up on it because no matter how much I tweaked the settings I couldn't get it to saturate my download speed. It kept pulsing on and off and left a lot of unused bandwidth on the table. After my previous years of Sabnzbd I figured I'd go back, and it worked perfectly right out of the box.
     Yes, I believe the LT stance is that BTRFS is intended mainly for the cache. Using it on array drives is considered experimental (my words, not theirs, but the gist I took away from reading the forums). I converted the 29 drives back from ZFS single-drive zvols and SnapRAID dual parity on Ubuntu 14.04, and the loss of checksums and snapshots was too big a deal for me to use XFS. The fact that the BTRFS on-disk format is considered stable, combined with the fact that Unraid doesn't use any really quirky BTRFS features on data drives and ships with a very recent 4.4.6 kernel on the 6.2 betas, led me to take a chance on BTRFS. At the end of the day nothing on the Unraid data drives is truly critical for me.
     Anything critical is on my 9TB ZFS (BTW, I LOVE ZFS!) array, which I have hosted in an Ubuntu 16.04 server VM with the motherboard's SATA controller passed through (on the same Unraid 6.2 box). The VM has a Python script I wrote that manages snapshots (72 hourly, 365 daily, 156 weekly, 600 monthly) and also backs up via CrashPlan to a secondary 6.1.9 XFS Unraid box onsite and offsite to the CrashPlan cloud. So anything valuable is taken care of.
     So far I have to say that BTRFS on Unraid has been rock solid, but time will tell. I have cranked this 6.2 box enough times this week that I thought for sure I'd have troubles. So far all my scrubs have come back clean, so my faith in BTRFS is slowly growing. Maybe one day it'll earn back some of the respect it lost during the early releases, but right now it's still not something I'd recommend to anyone as a primary solution. I lost data to it years ago and swore I'd never look at it again, but I have to admit it is gradually winning me back...
     In my blue-sky world, hopefully BTRFS remains stable so its scope within Unraid could be expanded. Subvolume support so we could use snapshots would be a welcome addition to Unraid's toolset, at least in my eyes. Using snapshots combined with Samba shadow vfs support (Windows Previous Versions) converted me to ZFS years ago, and it'd be nice if we could get there with BTRFS built into the kernel (no ZFS modules!) while still maintaining the "appliance"-like nature of Unraid. All this is my humble opinion of course, and like politics and religion, file systems can be divisive topics.
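     The retention scheme mentioned above (72 hourly, 365 daily, 156 weekly, 600 monthly) boils down to keeping the newest N snapshots per tier and destroying the rest. Below is a minimal sketch of that pruning idea; it is not the author's actual script, and the "<dataset>@<tier>-<timestamp>" naming convention is only an illustrative assumption:

     #!/usr/bin/env python3
     # Sketch: tiered ZFS snapshot pruning (keep the newest N per tier).
     # Assumes a "<dataset>@<tier>-<timestamp>" naming scheme, e.g.
     # "tank/docs@hourly-20160412-2000" (an assumption, not from the post).
     import subprocess

     RETENTION = {"hourly": 72, "daily": 365, "weekly": 156, "monthly": 600}

     def snapshots(dataset):
         """Return snapshot names for a dataset, oldest first."""
         out = subprocess.check_output(
             ["zfs", "list", "-H", "-t", "snapshot", "-o", "name",
              "-s", "creation", "-r", dataset])
         return out.decode().split()

     def prune(dataset, dry_run=True):
         snaps = snapshots(dataset)
         for tier, keep in RETENTION.items():
             tagged = [s for s in snaps
                       if s.split("@", 1)[1].startswith(tier + "-")]
             for victim in tagged[:-keep]:  # everything older than the newest N
                 if dry_run:
                     print("would destroy", victim)
                 else:
                     subprocess.check_call(["zfs", "destroy", victim])

     if __name__ == "__main__":
         prune("tank/docs")  # dataset name is only an example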
  3. Thanks for the feedback. I've been waiting for my parity to rebuild since last night before trying any of your suggested changes. However, while the parity was rebuilding I did slowly start enabling Dockers and monitored their effect on the load of the system. I went in thinking Plex would be a huge load on the system, but in the end I was able to load all Dockers except Sabnzbd. Once the parity rebuilt, I enabled Sabnzbd and the load went through the roof and the web UI stopped responding. Sniffing around while that happened, I noticed that for some reason I was unable to read (ls) the Incomplete folder (on disk 7, not the user share) that Sabnzbd was also trying to read from when it hung. After a reboot I started a btrfs scrub on the drive in question to make sure it wasn't an FS error that was causing Sabnzbd to lock up, which would in turn take out the web UI. So while the jury is still out until the scrub is finished (I'm 1 TB into a 3 TB drive), it seems that if I don't run the Sabnzbd Docker then all is well. My server was stable enough to rebuild dual 4TB parity drives in about 15 hours without Sabnzbd running.
     Interestingly, as a test I've created an Ubuntu JEOS 14.04 VM and installed Sabnzbd there. It has been happily running stable for the last hour or so, even with the host server load staying above 15. So it's almost as though Docker is choking when Sabnzbd has a lot to process, but KVM keeps chugging along. I would have thought that with less overhead Docker would perform better.
     So, again, thanks for the suggestions, but I'm reluctant to try them at the moment as things are somewhat stable, with everything at least working. I still need to figure my way through the various combinations of technologies, but in the end it is amazing what this one box is doing. It's:
     - a 73 TB media server
     - a 9 TB ZFS RAID5 server with Samba and 3 years and counting of Windows Previous Versions for all my documents, family photos, etc.
     - a downloading powerhouse (Sabnzbd, Sonarr, FlexGet, etc.)
     - a CrashPlan offsite backup enabler for my 9 TB of critical data above, plus selected media (thanks to the 9p passthrough)
     I've consolidated multiple machines into this thing and, thanks to KVM, passthrough, 9p, virtio and all the other Linux and Unraid goodness, it's running on less electrical power, with less equipment and more than adequate performance for home use. I know I sound like an infomercial/fanboy, but it really is awesome when you step back and think about it (and when everything works! lol). So, good job LT/volunteers/open source communities!
  4. Did a little more testing related to the high load/unresponsive server tonight. I re-enabled all the disabled Dockers and queued up some downloads. I now have a par2 repair stuck in the download queue that puts enough I/O strain on the server to have it lock up within 10 minutes of booting. I've rebooted 3 times to ensure that it will lock up consistently. I then did something daring (or maybe stupid) to eliminate the possibility of it being the dual parity. I unassigned both my parity drives and rebooted the server to see if it would lock up. The good news is that it locked up within 10 minutes of booting, so I'm now assuming it does not have anything to do with the new/changed dual parity code (sorry for doubting you Tom!). The bad news is I'm more stumped than ever as to what it could be.
     Below is what top and iotop are reporting at roughly the same time. The server is sitting in an unresponsive state right now, although my previously connected ssh sessions continue to update the top and iotop screens. You'll notice that the top command shows high load and the wa figure indicates it's waiting on I/O of some sort. But iotop doesn't show any significant disk use. In fact there is no disk use, and if I leave it long enough the drives spin down as per their settings (seen in syslog and dmesg). I'm not enough of a Linux guru to figure out where to look next, so if anyone has suggestions on what the next steps could be, please pass them along. Thanks!

     top:
     top - 20:53:21 up 42 min, 3 users, load average: 38.53, 38.31, 32.92
     Tasks: 1031 total, 2 running, 1029 sleeping, 0 stopped, 0 zombie
     %Cpu(s): 10.7 us, 9.2 sy, 0.0 ni, 0.0 id, 80.2 wa, 0.0 hi, 0.0 si, 0.0 st
     KiB Mem : 32989816 total, 17111804 free, 1649940 used, 14228072 buff/cache
     KiB Swap: 0 total, 0 free, 0 used. 30570024 avail Mem

       PID USER     PR  NI    VIRT    RES   SHR S  %CPU %MEM    TIME+ COMMAND
     21618 nobody   20   0  513616  11372  3148 R  99.7  0.0 29:55.60 /usr/bin/par2 r /incomplete-d+
     18167 nobody   20   0  432220 162120 37060 S  60.2  0.5 17:19.48 ./Plex Media Server
      7785 root     20   0   83092  21380  7160 S   6.9  0.1  2:42.74 /usr/bin/python /usr/sbin/iot+
      8696 root     20   0   25892   4212  2468 R   1.0  0.0  0:22.84 top
       292 root     39  19       0      0     0 S   0.3  0.0  0:00.22 [khugepaged]
     11685 root     20   0   25772   3880  2368 S   0.3  0.0  0:21.00 top
         1 root     20   0    4372   1640  1532 S   0.0  0.0  0:07.00 init
         2 root     20   0       0      0     0 S   0.0  0.0  0:00.03 [kthreadd]
         3 root     20   0       0      0     0 S   0.0  0.0  0:00.21 [ksoftirqd/0]
         5 root      0 -20       0      0     0 S   0.0  0.0  0:00.00 [kworker/0:0H]
         7 root     20   0       0      0     0 S   0.0  0.0  0:01.18 [rcu_preempt]
         8 root     20   0       0      0     0 S   0.0  0.0  0:00.00 [rcu_sched]
         9 root     20   0       0      0     0 S   0.0  0.0  0:00.00 [rcu_bh]
        10 root     rt   0       0      0     0 S   0.0  0.0  0:00.01 [migration/0]
        11 root     rt   0       0      0     0 S   0.0  0.0  0:00.01 [migration/1]
        12 root     20   0       0      0     0 S   0.0  0.0  0:00.08 [ksoftirqd/1]
        14 root      0 -20       0      0     0 S   0.0  0.0  0:00.00 [kworker/1:0H]
        15 root     rt   0       0      0     0 S   0.0  0.0  0:00.01 [migration/2]
        16 root     20   0       0      0     0 S   0.0  0.0  0:00.02 [ksoftirqd/2]
        18 root      0 -20       0      0     0 S   0.0  0.0  0:00.00 [kworker/2:0H]
        19 root     rt   0       0      0     0 S   0.0  0.0  0:00.01 [migration/3]
        20 root     20   0       0      0     0 S   0.0  0.0  0:00.07 [ksoftirqd/3]
        21 root     20   0       0      0     0 S   0.0  0.0  0:00.09 [kworker/3:0]
        22 root      0 -20       0      0     0 S   0.0  0.0  0:00.00 [kworker/3:0H]
        23 root     rt   0       0      0     0 S   0.0  0.0  0:00.01 [migration/4]
        24 root     20   0       0      0     0 S   0.0  0.0  0:00.06 [ksoftirqd/4]
        26 root      0 -20       0      0     0 S   0.0  0.0  0:00.00 [kworker/4:0H]
        27 root     rt   0       0      0     0 S   0.0  0.0  0:00.01 [migration/5]
        28 root     20   0       0      0     0 S   0.0  0.0  0:00.02 [ksoftirqd/5]
        30 root      0 -20       0      0     0 S   0.0  0.0  0:00.00 [kworker/5:0H]

     iotop:
     Total DISK READ :  0.00 B/s | Total DISK WRITE :  0.00 B/s
     Actual DISK READ:  0.00 B/s | Actual DISK WRITE:  0.00 B/s
       TID PRIO USER   DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
      4945 be/4 root    0.00 B/s   0.00 B/s  0.00 %  0.03 % [kworker/u16:12]
         1 be/4 root    0.00 B/s   0.00 B/s  0.00 %  0.00 % init
         2 be/4 root    0.00 B/s   0.00 B/s  0.00 %  0.00 % [kthreadd]
         3 be/4 root    0.00 B/s   0.00 B/s  0.00 %  0.00 % [ksoftirqd/0]
         5 be/0 root    0.00 B/s   0.00 B/s  0.00 %  0.00 % [kworker/0:0H]
         7 be/4 root    0.00 B/s   0.00 B/s  0.00 %  0.00 % [rcu_preempt]
         8 be/4 root    0.00 B/s   0.00 B/s  0.00 %  0.00 % [rcu_sched]
         9 be/4 root    0.00 B/s   0.00 B/s  0.00 %  0.00 % [rcu_bh]
        10 rt/4 root    0.00 B/s   0.00 B/s  0.00 %  0.00 % [migration/0]
        11 rt/4 root    0.00 B/s   0.00 B/s  0.00 %  0.00 % [migration/1]
        12 be/4 root    0.00 B/s   0.00 B/s  0.00 %  0.00 % [ksoftirqd/1]
        14 be/0 root    0.00 B/s   0.00 B/s  0.00 %  0.00 % [kworker/1:0H]
        15 rt/4 root    0.00 B/s   0.00 B/s  0.00 %  0.00 % [migration/2]
        16 be/4 root    0.00 B/s   0.00 B/s  0.00 %  0.00 % [ksoftirqd/2]
        18 be/0 root    0.00 B/s   0.00 B/s  0.00 %  0.00 % [kworker/2:0H]
        19 rt/4 root    0.00 B/s   0.00 B/s  0.00 %  0.00 % [migration/3]
        20 be/4 root    0.00 B/s   0.00 B/s  0.00 %  0.00 % [ksoftirqd/3]
        21 be/4 root    0.00 B/s   0.00 B/s  0.00 %  0.00 % [kworker/3:0]
        22 be/0 root    0.00 B/s   0.00 B/s  0.00 %  0.00 % [kworker/3:0H]
        23 rt/4 root    0.00 B/s   0.00 B/s  0.00 %  0.00 % [migration/4]
        24 be/4 root    0.00 B/s   0.00 B/s  0.00 %  0.00 % [ksoftirqd/4]
        26 be/0 root    0.00 B/s   0.00 B/s  0.00 %  0.00 % [kworker/4:0H]
        27 rt/4 root    0.00 B/s   0.00 B/s  0.00 %  0.00 % [migration/5]
        28 be/4 root    0.00 B/s   0.00 B/s  0.00 %  0.00 % [ksoftirqd/5]
        30 be/0 root    0.00 B/s   0.00 B/s  0.00 %  0.00 % [kworker/5:0H]
        31 rt/4 root    0.00 B/s   0.00 B/s  0.00 %  0.00 % [migration/6]
        32 be/4 root    0.00 B/s   0.00 B/s  0.00 %  0.00 % [ksoftirqd/6]
        33 be/4 root    0.00 B/s   0.00 B/s  0.00 %  0.00 % [kworker/6:0]
        34 be/0 root    0.00 B/s   0.00 B/s  0.00 %  0.00 % [kworker/6:0H]
        35 rt/4 root    0.00 B/s   0.00 B/s  0.00 %  0.00 % [migration/7]

     dmesg | tail -n 50:
     [  646.361780] eth0: renamed from veth85dfe66
     [  646.370907] docker0: port 4(veth71faf9b) entered forwarding state
     [  646.370923] docker0: port 4(veth71faf9b) entered forwarding state
     [  646.466434] docker0: port 2(veth415f089) entered forwarding state
     [  648.772820] device veth6254b33 entered promiscuous mode
     [  648.772972] docker0: port 5(veth6254b33) entered forwarding state
     [  648.772988] docker0: port 5(veth6254b33) entered forwarding state
     [  648.773860] docker0: port 5(veth6254b33) entered disabled state
     [  651.842532] docker0: port 3(vethcda3321) entered forwarding state
     [  653.168843] eth0: renamed from veth3c62677
     [  653.173696] docker0: port 5(veth6254b33) entered forwarding state
     [  653.173712] docker0: port 5(veth6254b33) entered forwarding state
     [  661.378664] docker0: port 4(veth71faf9b) entered forwarding state
     [  667.530740] BTRFS info (device loop1): disk space caching is enabled
     [  667.530744] BTRFS: has skinny extents
     [  668.226770] docker0: port 5(veth6254b33) entered forwarding state
     [  668.917968] BTRFS info (device loop1): new size for /dev/loop1 is 1073741824
     [  668.925642] tun: Universal TUN/TAP device driver, 1.6
     [  668.925643] tun: (C) 1999-2004 Max Krasnyansky <[email protected]>
     [  670.188941] device virbr0-nic entered promiscuous mode
     [  670.303816] virbr0: port 1(virbr0-nic) entered listening state
     [  670.303830] virbr0: port 1(virbr0-nic) entered listening state
     [  670.326386] virbr0: port 1(virbr0-nic) entered disabled state
     [ 2500.891496] mdcmd (63): spindown 19
     [ 2501.318623] mdcmd (64): spindown 21
     [ 2503.464412] mdcmd (65): spindown 9
     [ 2503.868055] mdcmd (66): spindown 10
     [ 2504.154809] mdcmd (67): spindown 11
     [ 2505.158121] mdcmd (68): spindown 14
     [ 2505.585249] mdcmd (69): spindown 17
     [ 2507.589672] mdcmd (70): spindown 1
     [ 2508.026666] mdcmd (71): spindown 3
     [ 2509.029710] mdcmd (72): spindown 4
     [ 2509.456308] mdcmd (73): spindown 8
     [ 2510.460182] mdcmd (74): spindown 12
     [ 2510.897515] mdcmd (75): spindown 13
     [ 2511.184286] mdcmd (76): spindown 15
     [ 2511.471072] mdcmd (77): spindown 16
     [ 2511.757819] mdcmd (78): spindown 20
     [ 2513.185802] mdcmd (79): spindown 5
     [ 2514.473687] mdcmd (80): spindown 6
     [ 2518.143213] mdcmd (81): spindown 26
     [ 2520.572942] mdcmd (82): spindown 2
     [ 2527.584083] mdcmd (83): spindown 22
     [ 2533.878293] mdcmd (84): spindown 23
     [ 2535.306545] mdcmd (85): spindown 7
     [ 2535.593328] mdcmd (86): spindown 18
     [ 2536.022367] mdcmd (87): spindown 24
     [ 2536.449974] mdcmd (88): spindown 25
     [ 2536.736169] mdcmd (89): spindown 27

     tail -n 50 /var/log/syslog:
     Apr 12 20:22:19 unmedia root: Starting libvirtd...
     Apr 12 20:22:19 unmedia kernel: tun: Universal TUN/TAP device driver, 1.6
     Apr 12 20:22:19 unmedia kernel: tun: (C) 1999-2004 Max Krasnyansky <[email protected]>
     Apr 12 20:22:19 unmedia emhttp: nothing to sync
     Apr 12 20:22:19 unmedia rc.unRAID[18670][18674]: Processing /etc/rc.d/rc.unRAID.d/ start scripts.
     Apr 12 20:22:20 unmedia kernel: device virbr0-nic entered promiscuous mode
     Apr 12 20:22:21 unmedia avahi-daemon[12607]: Joining mDNS multicast group on interface virbr0.IPv4 with address 192.168.122.1.
     Apr 12 20:22:21 unmedia avahi-daemon[12607]: New relevant interface virbr0.IPv4 for mDNS.
     Apr 12 20:22:21 unmedia avahi-daemon[12607]: Registering new address record for 192.168.122.1 on virbr0.IPv4.
     Apr 12 20:22:21 unmedia kernel: virbr0: port 1(virbr0-nic) entered listening state
     Apr 12 20:22:21 unmedia kernel: virbr0: port 1(virbr0-nic) entered listening state
     Apr 12 20:22:21 unmedia dnsmasq[19079]: started, version 2.75 cachesize 150
     Apr 12 20:22:21 unmedia dnsmasq[19079]: compile time options: IPv6 GNU-getopt no-DBus i18n IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify
     Apr 12 20:22:21 unmedia dnsmasq-dhcp[19079]: DHCP, IP range 192.168.122.2 -- 192.168.122.254, lease time 1h
     Apr 12 20:22:21 unmedia dnsmasq-dhcp[19079]: DHCP, sockets bound exclusively to interface virbr0
     Apr 12 20:22:21 unmedia dnsmasq[19079]: reading /etc/resolv.conf
     Apr 12 20:22:21 unmedia dnsmasq[19079]: using nameserver 192.168.10.1#53
     Apr 12 20:22:21 unmedia dnsmasq[19079]: read /etc/hosts - 2 addresses
     Apr 12 20:22:21 unmedia dnsmasq[19079]: read /var/lib/libvirt/dnsmasq/default.addnhosts - 0 addresses
     Apr 12 20:22:21 unmedia dnsmasq-dhcp[19079]: read /var/lib/libvirt/dnsmasq/default.hostsfile
     Apr 12 20:22:21 unmedia kernel: virbr0: port 1(virbr0-nic) entered disabled state
     Apr 12 20:38:02 unmedia sshd[24198]: Accepted none for root from 192.168.10.248 port 50879 ssh2
     Apr 12 20:52:51 unmedia kernel: mdcmd (63): spindown 19
     Apr 12 20:52:52 unmedia kernel: mdcmd (64): spindown 21
     Apr 12 20:52:54 unmedia kernel: mdcmd (65): spindown 9
     Apr 12 20:52:54 unmedia kernel: mdcmd (66): spindown 10
     Apr 12 20:52:54 unmedia kernel: mdcmd (67): spindown 11
     Apr 12 20:52:55 unmedia kernel: mdcmd (68): spindown 14
     Apr 12 20:52:56 unmedia kernel: mdcmd (69): spindown 17
     Apr 12 20:52:58 unmedia kernel: mdcmd (70): spindown 1
     Apr 12 20:52:58 unmedia kernel: mdcmd (71): spindown 3
     Apr 12 20:52:59 unmedia kernel: mdcmd (72): spindown 4
     Apr 12 20:53:00 unmedia kernel: mdcmd (73): spindown 8
     Apr 12 20:53:01 unmedia kernel: mdcmd (74): spindown 12
     Apr 12 20:53:01 unmedia kernel: mdcmd (75): spindown 13
     Apr 12 20:53:01 unmedia kernel: mdcmd (76): spindown 15
     Apr 12 20:53:02 unmedia kernel: mdcmd (77): spindown 16
     Apr 12 20:53:02 unmedia kernel: mdcmd (78): spindown 20
     Apr 12 20:53:03 unmedia kernel: mdcmd (79): spindown 5
     Apr 12 20:53:05 unmedia kernel: mdcmd (80): spindown 6
     Apr 12 20:53:08 unmedia kernel: mdcmd (81): spindown 26
     Apr 12 20:53:11 unmedia kernel: mdcmd (82): spindown 2
     Apr 12 20:53:18 unmedia kernel: mdcmd (83): spindown 22
     Apr 12 20:53:24 unmedia kernel: mdcmd (84): spindown 23
     Apr 12 20:53:26 unmedia kernel: mdcmd (85): spindown 7
     Apr 12 20:53:26 unmedia kernel: mdcmd (86): spindown 18
     Apr 12 20:53:26 unmedia kernel: mdcmd (87): spindown 24
     Apr 12 20:53:27 unmedia kernel: mdcmd (88): spindown 25
     Apr 12 20:53:27 unmedia kernel: mdcmd (89): spindown 27
     Apr 12 21:01:01 unmedia sshd[27286]: Accepted none for root from 192.168.10.248 port 51560 ssh2
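     The pattern in the output above (load climbing and %wa pegged, yet zero throughput in iotop) points at tasks blocked inside the kernel rather than actually moving data. One possible next step, offered here only as an illustration and not something suggested in the thread, is to dump the kernel stack of every task in state D via /proc/<pid>/stack, which requires root:

     #!/usr/bin/env python3
     # Sketch: print the kernel stack of every task stuck in uninterruptible
     # sleep (state "D").  Requires root; /proc/<pid>/stack shows where in the
     # kernel the task is waiting.
     import os

     def task_state(pid):
         """Return (comm, state) for a pid, parsed from /proc/<pid>/stat."""
         with open("/proc/%s/stat" % pid) as f:
             data = f.read()
         rparen = data.rfind(")")
         comm = data[data.find("(") + 1:rparen]
         return comm, data[rparen + 1:].split()[0]

     def dump_blocked_stacks():
         for pid in filter(str.isdigit, os.listdir("/proc")):
             try:
                 comm, state = task_state(pid)
                 if state != "D":
                     continue
                 with open("/proc/%s/stack" % pid) as f:
                     stack = f.read()
             except (IOError, OSError):
                 continue  # task exited, or stack not readable
             print("=== %s (%s) ===" % (pid, comm))
             print(stack)

     if __name__ == "__main__":
         dump_blocked_stacks()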
  5. I'm in the same situation. I've tried disabling all plugins, VMs and Dockers but it still doesn't improve the lockups. The system becomes unresponsive with a load (according to top) of >50. iotop doesn't show any I/O activity and top says the CPU is not busy, yet the load remains extremely high. dmesg doesn't have anything of interest, with the last messages being about spindowns. The system will not shut down once the load gets that high, so I have to resort to powering off. I should also mention that I can connect through telnet while this is happening, but depending on what command I issue the session will lock up. For example, a "btrfs fi sh" will never return.
     I also noticed that any significant concurrent I/O will bring on the problem quickly, which made me wonder if perhaps there is some kind of deadlock/race condition happening with the new dual parity code. Totally unsubstantiated (sorry Tom, I'm not trying to point fingers!), just offering my uninformed guess at least. My other thought was maybe BTRFS was dying under the concurrent I/O, but then again BTRFS was solid when I transferred the 70+ TB from my ZFS disks onto BTRFS so I could move back to Unraid. I did that using Ubuntu 16.04 Beta, which used the 4.4 kernel as well, and had 3 disks copying at the same time (from 3 other disks, not thrashing). Average throughput on the hardware saturated a SATA2 connection and I never had a lockup in the two weeks it took me to move the data.
     All 27 data drives in the array are formatted with BTRFS and are spread across two Norco 24-bay enclosures using an Intel SAS expander. This setup was reliable using Ubuntu 14.04 and ZoL, so I know the hardware is solid. Something just needs to be tweaked a little to make it reliable. Also, FWIW, the heavy I/O that brings on the lockup was not using any of the drives on the expander. It was done by moving data from one drive to another using mc in an ssh session and having NZBGet working on uncompressing a large 200GB download. Diags are attached and you can PM me if you want me to test anything for you... Thanks to the LimeTech staff and volunteers for all your efforts! unmedia-diagnostics-20160412-0734.zip
  6. Can somebody please post the LSI_MegaRAID_to_SAS2008(P10).zip file somewhere, as the link is still not working? I received three of these cards via FedEx yesterday and have had six 3TB drives sitting on my desk waiting for a couple of weeks. It figures the link stopped working as soon as I got the controller... Or I can PM my email address if some kind soul could email it to me. Thanks for any help!
  7. TestDisk & PhotoRec. Both are free and open source utils. The Windows version include a minimal (Cygwin?) Linux environment and they can read a bunch of partition types, including volumes spread across RAID stripes. See the wiki for the full details: http://www.cgsecurity.org/wiki/TestDisk PhotoRec is bundled in the TestDisk download. I told it not to look for a specific partition type and it picked out the corrupt ReiserFS partition. So far PhotoRec has picked up 325 mkv's off the volume, and they are are playable. Unfortunately the names are all generic. It's just over half way through scanning the volume.
  8. Actually, I think it is. Sorry, I probably didn't explain it well enough. To be clear, I pulled the reconstructed disk13 and examined it on another Windows 7 system using a USB dock. So this is a drive that was originally dd'ed with zeroes, placed into the UnRAID box to replace a missing drive, rebuilt, pulled from the system to be placed in another box, and then scanned for data. There are no parity-on-the-fly recalcs here, as it is outside of the UnRAID environment. The Windows version of PhotoRec I'm using has pulled a bunch of playable mkv's and jpg cover art. From a drive that was originally dd'ed! That is my point. The drive is not mountable and says it needs to be formatted, but somehow parity combined with all the other disks was able to reconstitute at least some of the data. Once again, I understand I'm off the beaten path here, and I'm not looking for any resolution, just observing what I see. I think it is a good thing that at least the data is getting reconstructed, but I am questioning why the disk is flagged as requiring formatting. It's likely Tom is correct in that maybe parity was somehow faulty, but I find it curious as the beta7 scan done hours earlier passed with no errors.
  9. Well, the data rebuild finished this morning. The Unformatted status remained and disk13 was not available for mounting, as it said it was missing the superblock. I only have options in the GUI to format the drive at this point. But here is the strange thing. I took down the server and pulled the rebuilt disk13. Now I am running PhotoRec on the drive to recover any readable data that might be found. This drive had a number of TV episodes on it in mkv format, and the funny thing is that so far I have been able to recover a number of these episodes from the rebuilt drive. It is still progressing. So my point is, while the drive is somehow flagged as unformatted/unusable, the data rebuild is restoring data from the parity calcs. I still have my original disk13, and I will probably re-add it and redo the parity drive, but I'm just trying to document what's going on here in case anyone else sees something similar. BTW, I went back and checked my emails (I had unMenu emails enabled), and the parity check had finished around 4:00 AM on the 11th. Again, that check was done in beta7 and came back with no errors. I started the drive swap (under beta 8d) around 8:30 AM on the 11th, so it wasn't a few days like I said in an earlier email, but actually a few hours.
  10. Now that you mention it, one had run a couple of days prior, but not immediately before replacing the drive. The last parity check had found no errors, if that's of any significance.
  11. Yes it was empty. It had been part of a ZFS RAIDZ array from a previous test setup, but was dd'ed with zeros prior to being used.
  12. Upgraded to beta8d from beta7 and all is working well, except for one thing that struck me as odd. Perhaps it is correct, but I thought I'd post to make sure. I was using a Seagate 2TB drive as disk13 in my setup and wanted to pull the drive to use it in another machine, intending to replace it with a spare Western Digital 2TB I had lying around. Rather than pre-clearing (the drive is well burned in), I figured I would swap the drives and let the Data Rebuild take care of putting all the data back. As a failsafe, I would keep the Seagate on my desk until the rebuild had completed correctly. So upon restarting the server, I got the usual checkbox and Start Data Rebuild button. Once clicked, the rebuild started, but I also saw a box asking if I want to Format the drive, and the drive is listed as unformatted, even though the rebuild is proceeding correctly (I think!). To understand what I am saying better, have a look at the screen capture attached to my post. I don't know if this is a bug or a feature, so maybe Tom (or someone else!) can enlighten me. Thanks for any help! Daniel