Cull2ArcaHeresy

Everything posted by Cull2ArcaHeresy

  1. But sadly "Connect to OpenVPN: We are still working on this script.", which makes me assume the back end isn't ready yet either.
  2. Guessing it has to do with max speed tweaks, which I will look into after getting constant, consistent connections and activity. EDIT: now that I'm finally creating a combined file, the default ones I have start with the following, so maybe it's always been udp by default (I thought the defaults were tcp):

       client
       dev tun
       proto udp
       remote LOCATION.privateinternetaccess.com 1198
  3. Was that built off the default pia files? Looking at mine, the gcm and auth lines are different (quickly compared, so there are probably more differences).
  4. While changing pia vpn files to get to different endpoints during whatever is going on, what is the best way to measure "quality" for each? There are different tools to measure container stats, but connecting directly to rtorrent could also give data. I have enough seeding torrents that upload should be constantly at the limit (when the connection is good); fewer downloads, but that would be easy to change. Trying to figure out the best way to keep track of upload and download for each vpn file. Early on this would be raw totals for each day, and later more of a line graph, probably the raw numbers scaled by how many hours that vpn file was used that day (other options too, like hourly average, but that is a later thing). So say your connection is bad right now: you stop the container and run this script, which gives you a list of vpn files and their daily quality history (from your runs) and asks you to select which one you want to copy in for the container to use. Then you start the container again and it will connect using the chosen file. While the container is running, the script logs the network usage (rough sketch below).
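     A minimal sketch of the logging half, assuming the container is named rtorrentvpn and keeps its configs under /mnt/user/appdata/rtorrentvpn; the container name, paths, log location, and one-minute interval are all placeholders, not anything the container actually provides:

       #!/bin/bash
       # log cumulative container network totals once a minute, tagged with
       # whichever .ovpn file is currently in place (hypothetical layout)
       ACTIVE=$(basename "$(ls /mnt/user/appdata/rtorrentvpn/openvpn/*.ovpn | head -n1)")
       LOG=/boot/vpn-quality.log
       while sleep 60; do
         # docker stats prints cumulative NetIO as "rx / tx", e.g. "1.2GB / 300MB"
         NETIO=$(docker stats --no-stream --format '{{.NetIO}}' rtorrentvpn)
         echo "$(date '+%F %T') $ACTIVE $NETIO" >> "$LOG"
       done

     Parsing that log per day and per .ovpn name afterwards gives the raw daily totals described above.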
  5. Compared to a "graceful stop", would this cause any issues like rtorrent corrupting a file or leakage (or other)? Also, rather than a hard threshold, a count that resets each hour might be much better, so a series of reconnects spread out over time wouldn't cause a restart (rough sketch below).
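     A minimal sketch of that hourly-resetting counter, assuming a once-a-minute check loop; the threshold value, ping target, and flag-file path are all made up:

       #!/bin/bash
       # count failures per clock hour; only act when a single hour
       # accumulates too many of them
       HOUR=$(date +%H); FAILS=0; THRESHOLD=5
       while sleep 60; do
         NOW=$(date +%H)
         [ "$NOW" != "$HOUR" ] && { HOUR=$NOW; FAILS=0; }  # new hour, reset count
         if ! ping -c1 -W5 privateinternetaccess.com >/dev/null 2>&1; then
           FAILS=$((FAILS+1))
         fi
         [ "$FAILS" -ge "$THRESHOLD" ] && { touch /tmp/restart.flag; FAILS=0; }
       done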
  6. I had assumed that it more or less did something like 1-[ establish vpn conn ], 2-[ lock down connections (including dns) ], 3-[ start rtorrent/rutorrent ], 4-[ monitor connection and if need be kill *torrent and go back to #1 ]. Knowing the dns issue (and yeah, docker env variables are a pain to change), it is a much more closed, locked-down system. Can a container restart itself? Rough idea: it counts the number of fails (connection or pia forward), and at some arbitrary threshold it restarts (or kills itself, with the restart flag in the docker command)? Could always have it create a file when that threshold is met, and a script running on unraid sees the file, deletes it, and restarts the container (sketched below)...but I don't know if it is a big enough problem to do that. Just to be clear: all pia with the same creds, or all other with the same creds. I'm not against mixing them, but keeping them the same for simplicity.
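     A minimal sketch of that host-side watcher, assuming the container is named rtorrentvpn and its appdata share is visible to the host; both names are placeholders:

       #!/bin/bash
       # run on the unraid host (e.g. via the User Scripts plugin): when the
       # container drops a flag file, remove the flag and restart the container
       FLAG=/mnt/user/appdata/rtorrentvpn/restart.flag
       while sleep 30; do
         if [ -f "$FLAG" ]; then
           rm -f "$FLAG"
           docker restart rtorrentvpn
         fi
       done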
  7. This method requires restarting the container; any reason not to have the line where an openvpn connection is established pick a random *.ovpn file (sketch below)? Been meaning to try it, just haven't yet. This way, whenever it resets the connection it will try a new one (or pick the same one again at random).
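     The random pick itself is a one-liner; a sketch assuming the configs live in /config/openvpn (an assumed path, check where the container actually keeps them):

       # pick one .ovpn at random from the config dir
       VPN_CONFIG=$(find /config/openvpn -maxdepth 1 -name '*.ovpn' | shuf -n1)
       echo "using $VPN_CONFIG"
       # then hand it to openvpn, e.g.: openvpn --config "$VPN_CONFIG"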
  8. Only officially supported one left? I deleted my post since you had a factually correct answer; it just didn't pop up until after I had made my comment.
  9. Does your vpn have port forwarding? I use pia, and sometimes pia and rtorrent will pick the same port and it will show open, but most of the time it shows closed. If I found the list of pia port ranges again (it was an old list, so probably bad), I would set my range to match. But be it pia or something else, port open or closed, I get the same connection where it will be full speed every now and then, but most of the time it is <1M up/down. I did not buy pia for this; I already had it.
  10. One thing is that # comments out the line, so if you know that is the right range, then remove the #.
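     In an rtorrent.rc that looks like the following (the range shown is only an example, not pia's actual range):

       # commented out, so ignored:
       #port_range = 49160-49300
       # active once the leading "#" is removed:
       port_range = 49160-49300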
  11. They basically switched it to Switzerland, it seems.
  12. Have had a few issues with a seedbox where rutorrent crashes and rtorrent gets stuck, so I had to kill and start instead of restart. With this container, the few times I have crashed rutorrent it did a similar thing, so I had to stop the container and then start it again and wait through a longer start time (was like 20 minutes). Monitoring the supervisor log showed that it was working on it, though. The main thing I've done that has caused crashes so far is dumping too many torrent files into the watch dir at the same time.
  13. PIA (Switzerland) connects and the ip is confirmed to be there. rutorrent saying the port is open does not seem to have an impact on speed, but most of the time it shows the port open. My internet is 200/20, so the rtorrent config has bandwidth limited to 100 down/10 up (12500/1250 KB/s, see below). In waves I'll have full connections at the limits, but most of the time up/down is less than 1 MB (whether 50+ active torrents or 2). Seeing stuff about PIA transitioning and udp problems, but I have connections (most of the time); the speed is just almost always minuscule. In case this is a disk IO issue, I'm planning on adding 2 user shares to the container, with 1 being cache-only (downloading) and the other being normal both (seeding). Have not started this yet because it seems less likely to be the issue.
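     For reference, those limits as rtorrent config lines (classic .rc syntax, values in KB/s; newer builds also accept the throttle.* form):

       # 100 Mbit/s down and 10 Mbit/s up, expressed in KB/s
       download_rate = 12500
       upload_rate = 1250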
  14. All my sata seagate drives (bare and shucked) have that attribute and a large list of others; the sas hitachi drives have few attributes available, and 240 isn't one of them. Wonder if it's a seagate-only thing.
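     A quick way to check which drives report it, assuming smartctl is available (it ships with unraid) and substituting the actual device for /dev/sdX:

       # list SMART attributes and keep only ID 240 (Head_Flying_Hours),
       # which mostly seagate drives seem to report
       smartctl -A /dev/sdX | awk '$1 == 240'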
  15. The drop at 80% is weird, but I should have a full picture in a week when a complete run is recorded. Almost 2 days is slow (1d 19h), but tolerable compared to 3.5 days. Since it looks like the slowdown is a 6.8.x issue, I'll mark this as solved.
  16. Excel was not cooperating with how I was trying to graph, so 3 graphs (left out time left). Stats are grabbed every minute and parsed afterwards from a text file, since this was thrown together quickly (sketch below). Will have it fixed and grab the whole run when the monthly check runs on the 1st, and then will add it to the linked post. The old config of an 8-disk array did at least 1 or 2 parity checks after updating to 6.8.3. Being as there was a check on 5-20 & 5-28, 3 or 4+ checks have happened on 6.8.3, and many others in the 6.8.x range (I tend to update 1 to 3 months after an update is out, as that requires a restart). Currently at 6.03tb (50.2%), and with only the 5 array 12tb disks (and 2 parity) being read it is still at ~80 (def better than 30/40). Guess I crossed some threshold of disk count to cause slowness in 6.8, probably.
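     A minimal sketch of the minute-by-minute grabber, assuming unraid's custom /proc/mdstat exposes the mdResyncPos/mdResyncDt/mdResyncDb fields (worth verifying on your version); the output path is a placeholder:

       #!/bin/bash
       # sample parity-check progress once a minute into a text file
       # for later parsing/graphing
       OUT=/boot/parity-speed.txt
       while sleep 60; do
         echo "=== $(date '+%F %T')" >> "$OUT"
         grep -E 'mdResyncPos|mdResyncDt|mdResyncDb' /proc/mdstat >> "$OUT"
       done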
  17. When mover ran (< 1 gig of files), reads dipped a bit but bounced back and have maintained 75 to 88. Still quite a bit slower than it used to be. Position is 2.14tb (17.8%). After it passes the 4tb mark I'll update, as that's the only main variable left then. Gonna script something to grab the speeds as it goes, to be able to see a graph over time and correlations with the disk speed graph.
  18. I thought I forgot something: raza-diagnostics-20200723-1214.zip
  19. TLDR: why is my parity check speed down to 40 from the 100+ range? Had 8 data disks (12tb and 6tb) previously. Added 16 4tb drives and everything seemed fine, but when the monthly parity check came it took 3.5 days instead of about 1. The duration correlates to the speed staying around 40. Although currently at 1.48tb (12.4%) and getting 70, that is still way less. Does the parity calculation get exponentially more difficult and make it slower, or is something else going on? The disk speed docker test does not show the 4tb disks capped at 40. A 6tb seagate barracuda (shucked) was added since the first slow check. Being as everything else seemed fine, I hadn't worried about that issue, thinking that if it did the same thing at the next monthly check then I'd post. I had to perform an unclean shutdown earlier, which triggered the check, but I canceled it to add the 6tb disk and restarted the check. Note: the 89.7 one is from when I was doing a lot of IO.
  20. Having found your reply here (and stuff elsewhere), I ran "btrfs scrub start /mnt/cache" and below is the output, plus scrub status and dev stats. Your reply to this thread was ironic timing, being not long after this finished while I was doing docker stuff. I have not run a second scrub yet, if I even should. Neither the docker.img file nor the backup I made (renamed correctly) worked, further making it seem like image corruption. I followed the "***OFFICIAL GUIDE*** Restoring your Docker Applications in a New Image File" instructions and the dockers are added back (plex quickly tested and seems fine, I assume the rest are). Not seeing errors in the log file anymore. I assume reset stats (comment I linked), and then add monitoring (you linked)? Or clear stats, run another scrub, clear stats, and then add monitoring (the reset-and-rescrub part is sketched after the output below)? Due to a wrong network config and having no idea what I had set the unraid password to, I had to clear the password hash in the shadow file on the USB. Unraid rotated mac addresses between nics with the hardware change...so it appeared I could have used port 3/4, after having gotten in...mini-rant over. I think the idrac power-off command caused an ungraceful shutdown, which then later interrupted a parity check when pushing (not holding) the front power button to power off. Everything seems to be fine now, but most VMs are still off and I am not doing heavy data transfers (VMs off, including the one that is the "gateway" for the seedbox). It is now ~9 days until a parity check, but since one was interrupted (plus the shutdown that triggered the check), I assume I should run one? Speaking of cache drives, I know having a 2tb and two 1tb cache drives is not standard (plus the mix of qlc and tlc...maybe the worst part). Is there any reason to switch to all 1tb drives instead of mixed sizes? Thinking about getting another 1tb to bring the cache pool up to 2.5tb, but I can go down on cache size if it is better (down to 1tb, or 1.5tb with a new 1tb drive).

       root@Raza:~# btrfs scrub start /mnt/cache
       WARNING: errors detected during scrubbing, corrected

       root@Raza:~# btrfs scrub status -d /mnt/cache
       UUID: b376c540-8bb3-4580-86bf-fd19d5c8ed08
       scrub device /dev/sdc1 (id 1) history
           Scrub started:    Mon Jun 22 01:39:43 2020
           Status:           finished
           Duration:         0:14:13
           Total to scrub:   474.03GiB
           Rate:             507.75MiB/s
           Error summary:    verify=478 csum=7468
             Corrected:      7946
             Uncorrectable:  0
             Unverified:     0
       scrub device /dev/sdb1 (id 2) history
           Scrub started:    Mon Jun 22 01:39:43 2020
           Status:           finished
           Duration:         0:14:19
           Total to scrub:   475.00GiB
           Rate:             505.87MiB/s
           Error summary:    csum=10124
             Corrected:      10124
             Uncorrectable:  0
             Unverified:     0
       scrub device /dev/sdn1 (id 3) history
           Scrub started:    Mon Jun 22 01:39:43 2020
           Status:           finished
           Duration:         0:33:47
           Total to scrub:   949.03GiB
           Rate:             430.95MiB/s
           Error summary:    no errors found

       root@Raza:~# btrfs dev stats /mnt/cache
       [/dev/sdc1].write_io_errs    574718
       [/dev/sdc1].read_io_errs     407964
       [/dev/sdc1].flush_io_errs    6103
       [/dev/sdc1].corruption_errs  7468
       [/dev/sdc1].generation_errs  478
       [/dev/sdb1].write_io_errs    724802
       [/dev/sdb1].read_io_errs     490716
       [/dev/sdb1].flush_io_errs    6103
       [/dev/sdb1].corruption_errs  10124
       [/dev/sdb1].generation_errs  0
       [/dev/sdn1].write_io_errs    0
       [/dev/sdn1].read_io_errs     0
       [/dev/sdn1].flush_io_errs    0
       [/dev/sdn1].corruption_errs  0
       [/dev/sdn1].generation_errs  0

       root@Raza:~# btrfs scrub status /mnt/cache
       UUID: b376c540-8bb3-4580-86bf-fd19d5c8ed08
       Scrub started:    Mon Jun 22 01:39:43 2020
       Status:           finished
       Duration:         0:33:47
       Total to scrub:   1.69TiB
       Rate:             859.00MiB/s
       Error summary:    verify=478 csum=17592
         Corrected:      18070
         Uncorrectable:  0
         Unverified:     0
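     For the clear-stats-then-rescrub sequence, a sketch (btrfs dev stats -z prints the counters and then resets them to zero):

       # print and zero the per-device error counters
       btrfs dev stats -z /mnt/cache
       # run a second scrub in the foreground and wait for it
       btrfs scrub start -B /mnt/cache
       # counters should stay at zero if the corruption is gone
       btrfs dev stats /mnt/cache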
  21. Update: if I pull all the drive trays halfway out, boot the server, and then push them all in, the VM is available after starting the array. Thinking back to after this issue started, I think the only other boot with VMs successfully available was when I did it this same way. Either way, no docker at all. All of my smart reports are passed on all drives (parity/array/cache, didn't check unassigned). Going to power off and reseat the lsi cards and cables, and then do a normal boot with all drives connected.
  22. My drives are exos, plus a few barracuda. After getting my docker/VM issue fixed, I'll see what the hitachi drives ordered with the DAS say (before and after preclearing). The thought is that maybe consumer (or older?) drives do not have all the same metrics as enterprise ones. All current drives are SATA, the new ones are SAS, if that would make a diff.
  23. Added an external lsi card to attach a DAS. Changed the pcie slots that the 2 internal lsi cards are in and removed a nic (the VM that was using it has been able to use 1 from the 4-port). The DAS is off, to reduce devices that could be causing issues and to reduce boot time spent indexing devices. Sometimes when I reboot, the VM tab will have "Libvirt Service failed to start.", but sometimes the VMs are there and work fine. Have not been able to get the docker page to not say "Docker Service failed to start." (it used to work). Researching "bad tree block start" (was in the log), I found and made a backup of docker.img, ready to delete/recreate, but looking at the log chunk below, my concern is that one of my cache drives is failing. Would think card/cable, but all of those say sdc, not sdc & sdb (or sdn).

       Jun 21 20:09:46 Raza kernel: BTRFS info (device sdc1): read error corrected: ino 0 off 5201959788544 (dev /dev/sdc1 sector 18198928)
       Jun 21 20:09:46 Raza kernel: BTRFS info (device sdc1): read error corrected: ino 0 off 5201959792640 (dev /dev/sdc1 sector 18198936)
       Jun 21 20:09:46 Raza kernel: BTRFS error (device sdc1): parent transid verify failed on 5202249596928 wanted 10386593 found 10375290
       Jun 21 20:09:46 Raza kernel: BTRFS info (device sdc1): read error corrected: ino 0 off 5202249596928 (dev /dev/sdc1 sector 18764960)
       Jun 21 20:09:46 Raza kernel: BTRFS info (device sdc1): read error corrected: ino 0 off 5202249601024 (dev /dev/sdc1 sector 18764968)
       Jun 21 20:09:46 Raza kernel: BTRFS info (device sdc1): read error corrected: ino 0 off 5202249605120 (dev /dev/sdc1 sector 18764976)
       Jun 21 20:09:46 Raza kernel: BTRFS info (device sdc1): read error corrected: ino 0 off 5202249609216 (dev /dev/sdc1 sector 18764984)
       Jun 21 20:09:46 Raza kernel: BTRFS error (device sdc1): parent transid verify failed on 5202217517056 wanted 10386551 found 10375235
       Jun 21 20:09:46 Raza kernel: BTRFS info (device sdc1): read error corrected: ino 0 off 5202217517056 (dev /dev/sdc1 sector 18702304)
       Jun 21 20:09:46 Raza kernel: BTRFS info (device sdc1): read error corrected: ino 0 off 5202217521152 (dev /dev/sdc1 sector 18702312)
       Jun 21 20:09:46 Raza kernel: BTRFS error (device sdc1): parent transid verify failed on 5201256808448 wanted 10386601 found 10385206
       Jun 21 20:09:46 Raza kernel: BTRFS error (device sdc1): parent transid verify failed on 5201545854976 wanted 10384892 found 10379573
       Jun 21 20:09:46 Raza kernel: BTRFS error (device sdc1): parent transid verify failed on 5201542184960 wanted 10384888 found 10379564
       Jun 21 20:09:46 Raza kernel: BTRFS error (device sdc1): parent transid verify failed on 5201257611264 wanted 10386601 found 10385206
       Jun 21 20:09:46 Raza ker

       raza-diagnostics-20200621-2030.zip
  24. Not sure where I got that from then. Guessing you are talking about "9 Power on hours" vs "240 Head flying hours" from the disk attributes? Checking a few disks, the parity ones having the highest numbers have a "240 Head flying hours" of ~22.5% of "9 Power on hours".

      I'd love to have actual backups, but not there yet, $. I'm sitting at 3.12tb free of 78tb (plus 1.10tb free of 2tb cache). Adding in backups of other devices (still need to do), we can round that to a full 80. I want to move my 3 x 8tb and 1 x 6tb external drives onto the array, data first, then probably shuck them too. I definitely ordered too many of those 4tb sas drives, being I forgot the 4 external ones would be added (and the whole drive limit that started this 3-tier idea). But that would put me at ~110tb of data (+ stacks of full 250g drives that need to be emptied), so the 100tb worth of disks ordered wouldn't even cover that ~110tb. Eventually I plan to get another server (low powered, since it's just a backup) full of large/slow/cheap drives, create a backup locally, then get it in a datacenter somewhere doing continuous backup (and the whole 2-locations backup rule). But that would be more of a hot backup (pending upload speeds of ~9gigs/hr) vs your cold backup, although wouldn't a snapshot/timemachine-style backup be the best of both (just maybe slower restore speeds)? What do you mean WORM drive? If really write once read many, wouldn't your server be a "cold backup" categorically, which would explain the low active/read/write time (maybe 10% is normal, but my 22.5% seems low to me as well)?

      And yea, I've been contemplating making an intelligent plex caching plugin for unRAID to keep the next unwatched episodes in cache for the X most recently watched shows. Ideally it would watch your watch rate for each show and keep the number of episodes you watch a day cached. Either where X is set, or it does as many shows as it can before it reaches whatever storage limit you set (or a combination). It reads from the plex db and caches the next episode (Z) as you are watching episode Y of some show (or maybe the trakt.tv api instead of the plex db). For your case you could always add another plex library pointing at a cache-only share and copy the series there manually. I have a second TV lib for 1 episode in each of 2 shows that plex wouldn't grab; I should investigate that again and delete the lib if it's fixed. Thinking about your use case, it would make sense to add a selector to the plugin where you could tell it to keep a selection in cache. I haven't looked into how to make the mover move a file to cache, but thinking maybe touch -ac would do it. Does mover actually move files to cache? I thought it copied them to it, and then if nothing was changed it deleted them off cache, and if something was changed it moved them back, overwriting the file on the array. Rereading the "Use cache" text, I had forgotten that mover does not do any kind of intelligent thing like moving what you're working with to cache...so the plugin would have to move the file from array to cache itself, and keep an active read on the file so mover does not take the files away while they are in use (rough sketch after the move steps below). This might be a little more difficult. My use case is that with many shows in plex, when I want to go back or forward more than 30 seconds it gets stuck at loading and I have to close and reopen it. I'm assuming it has something to do with reading from disk, so cache would fix it.
      If I am thinking this through right, my correct (and simple) move steps would be (after getting the hardware set up):
      1. insert the 16 drives and preclear them
      2. add the 16 to the array
      3. move data off the 4 external drives
      4. shuck those drives and insert them
      5. preclear the 4 shucked
      6. add the 4 shucked to the array
      7. in turns (only 4 bays open), preclear the other 9 4tb sas drives so they are ready to be used when a drive fails (almost sounds morbid 😛 )
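      A minimal sketch of the plugin's core move-to-cache step from above, assuming unraid's usual /mnt/cache and /mnt/diskN layout; the share name, paths, and episode file are made up for illustration:

        #!/bin/bash
        # "move" one episode from an array disk to the cache pool so the user
        # share serves it from cache; operate on disk paths, not /mnt/user
        SRC="/mnt/disk3/Media/TV/SomeShow/S01E05.mkv"   # hypothetical array copy
        DST="/mnt/cache/Media/TV/SomeShow/S01E05.mkv"   # same relative path on cache
        mkdir -p "$(dirname "$DST")"
        cp --preserve=all "$SRC" "$DST" && rm "$SRC"    # copy to cache, drop array copy
        # holding a read handle open would keep mover from taking it back, e.g.:
        # tail -f "$DST" > /dev/null &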
  25. I'm adding 24 more bays; when adding the 8 in the array and 2 parity, that would be 34. The new drives also have a (presumably) higher failure rate; the seller told me to expect 1 a year to fail if running 24/7. With the higher failures, the disk rebuilds will put more stress on the original 8+2 drives. The vault layer would use all 24 bays and not have 4 free (think about if using a 60+ bay). The vault layer would also have the more-likely-to-fail drives add extra stress only to the same set of drives for rebuilds. This is assuming that a rebuild is more stressful than the monthly parity check. For ease of use it would make archiving a seamless integration, similar to not knowing if a file is on cache or array when you access it (I know you can find out). A future version would have the archive section made of slower, larger disks. Haven't messed with this yet, but guessing it is not compatible with unRAID: https://github.com/45Drives/autotier Maybe I chased this idea too far down the rabbit hole, and the right move is to add 20 drives (28 array + 2 parity) and then have 5 hot spares; in that config I would go from 78tb + 2tb cache to 158tb + 2tb cache, so still doubling space. Concerned about the rebuild stress still, but maybe that is me being paranoid. If just adding 20 to the array, I'm thinking I would probably only add them 2/3/4/5 at a time (as space is needed) to reduce power-on hours, thus reducing the drive failure rate...but every time I add drives the parity would need to be rebuilt.