Posts posted by weirdcrap

  1. You can run the User Scripts plugin to schedule your scripts.

     

    Public key auth would be your solution to allowing it to run with no manual intervention.
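     

    If you haven't done key auth before, the gist is: generate a key pair on the backup server, copy the public half to the main server, then point the script's ssh/rsync call at the key. A rough sketch (the key path and "mainserver" hostname are just placeholders):

    ssh-keygen -t ed25519 -f /root/.ssh/backup_key -N ""       # generate a key pair with no passphrase
    ssh-copy-id -i /root/.ssh/backup_key.pub root@mainserver   # copy the public key to the main server
    ssh -i /root/.ssh/backup_key root@mainserver 'echo ok'     # test: should log in without a password prompt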

     

    LimeTech actually published a guide about the very thing you are trying to do: https://unraid.net/blog/unraid-server-to-server-backups-with-rsync-and-wireguard


    It uses WireGuard, but if you have another solution just skip that part and start at the section titled "Setting up Site A and Site B for RSYNC"

     

    I believe SpaceInvaderOne has also done a walkthrough video on the subject. EDIT: I was wrong, or at least I can't find it.

     

    As for adding SSHPass, you could see if it is already listed in the plugin's available packages, or ask for it to be added to that list. Though IMO public key auth is the better way to go anyway.

  2. UPDATE 3/3/2021: I have definitively determined my performance issues are caused by WireGuard. I do not yet know if or when I'll find a solution.

     

    UPDATE 8/1/2022: This is still very much broken. I try a file transfer every couple of months and it continues to be horribly slow. Using RSYNC over SSH outside the Wireguard tunnel works great and is what I will continue to use until I can figure this sh*t out.


    FINAL UPDATE 11/24/2022: See my last post here for solution and TL;DR:

     

    Let me preface all of this by saying I'm not sure where my issue lies, so I'm going to lay out what I know and hopefully get some ideas on where to look for my performance woes.
     

    The before times:

    Before setting up WireGuard I had SSH open to the world (with security and precautions in place) on my main server so that once a month my backup server could connect and push and pull content as defined in my backup script. This all worked splendidly for years and I always got my full speeds up to the bandwidth limit I set in my rsync parameters.
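     

    (For reference, the bandwidth limiting in rsync is done with its --bwlimit option; an illustrative call with made-up paths, capping at roughly 10MB/s:)

    rsync -avu --bwlimit=10000 -e "ssh -i /path/to/key" root@mainserver:/mnt/user/Backups/ /mnt/user/Backups/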

     

    Now: 

    With the release of WireGuard for UnRAID I quickly shut down my SSH port forward and set up WireGuard. I have one tunnel for my administrative devices and a second tunnel which serves as server-to-server access between NODE and VOID.

     

    NODE is my main server, and runs 6.8.3 stable. It is located on a 100Mbps/100Mbps fiber line.

    UPDATE: As a last-ditch effort I upgraded NODE to 6.9.0-RC2 as well; no change in the issue.

     

    VOID is my backup, runs 6.9.0-RC2 and lives in my home on a 400Mbps/20Mbps cable line.

     

     

    In this setup, my initial rsync session will go full speed for anywhere from 5-30 minutes, then suddenly and dramatically drop in speed, down to 10Mbps or less and stay there until I cancel the transfer. I can restart the transfer immediately and regain full speed for a time, but it always eventually falls again.

     

    Here is my rsync call: 

    rsync -avu --stats --numeric-ids --progress --delete -e "ssh -i /mnt/cache/.watch/id_rsa -T -o Compression=no -x -o StrictHostKeyChecking=no" root@NODE:/mnt/user/TV/Popeye/ /mnt/user/TV/Popeye/
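     

    For anyone not fluent in rsync flags, a quick breakdown of that call:

    # -a                   archive mode (recursive, preserves permissions, times, symlinks, etc.)
    # -v -u                verbose; skip files that are newer on the receiving side
    # --stats --progress   print a transfer summary and per-file progress
    # --numeric-ids        transfer uid/gid numbers instead of mapping by name
    # --delete             remove files on the destination that no longer exist on the source
    # -e "ssh ..."         run over ssh with: -i (key file), -T (no pseudo-tty), -o Compression=no,
    #                      -x (no X11 forwarding), -o StrictHostKeyChecking=no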

     

    Here is a small sample of the rsync transfer log to illustrate the sudden and sharp drop in speed:

    Season 1938/Popeye - S1938E09 - Mutiny Ain't Nice DVD [BTN].mkv
        112,422,538 100%   10.80MB/s    0:00:09 (xfr#24, to-chk=58/135)
    Season 1938/Popeye - S1938E10 - Goonland DVD [BTN].avi
         72,034,304 100%    9.76MB/s    0:00:07 (xfr#25, to-chk=57/135)
    Season 1938/Popeye - S1938E11 - A Date to Skate DVD [BTN].mkv
        138,619,127 100%   10.44MB/s    0:00:12 (xfr#26, to-chk=56/135)
    Season 1938/Popeye - S1938E12 - Cops Is Always Right DVD [BTN].mkv
        127,109,972 100%   11.02MB/s    0:00:10 (xfr#27, to-chk=55/135)
    Season 1939/Popeye - S1939E01 - Customers Wanted DVD [BTN].mkv
        114,673,044 100%   10.50MB/s    0:00:10 (xfr#28, to-chk=54/135)
    Season 1939/Popeye - S1939E02 - Aladdin and His Wonderful Lamp DVD [BTN].mkv
        325,996,501 100%   11.69MB/s    0:00:26 (xfr#29, to-chk=53/135)
    Season 1939/Popeye - S1939E03 - Leave Well Enough Alone DVD [BTN].mkv
        105,089,182 100%   11.30MB/s    0:00:08 (xfr#30, to-chk=52/135)
    Season 1939/Popeye - S1939E04 - Wotta Nitemare DVD [BTN].mkv
        149,742,115 100%  754.78kB/s    0:03:13 (xfr#31, to-chk=51/135)
    Season 1939/Popeye - S1939E05 - Ghosks Is The Bunk DVD [BTN].mkv
        114,536,257 100%  675.53kB/s    0:02:45 (xfr#32, to-chk=50/135)
    Season 1939/Popeye - S1939E06 - Hello, How Am I DVD [BTN].mkv
         92,083,730 100%  700.03kB/s    0:02:08 (xfr#33, to-chk=49/135)
    Season 1939/Popeye - S1939E07 - It's The Natural Thing to Do DVD [BTN].mkv
        110,484,799 100%  715.66kB/s    0:02:30 (xfr#34, to-chk=48/135)
    Season 1939/Popeye - S1939E08 - Never Sock a Baby DVD [BTN].mkv
         97,660,132 100%  716.88kB/s    0:02:13 (xfr#35, to-chk=47/135)
    Season 1940/Popeye - S1940E01 - Shakespearian Spinach DVD [BTN].mkv
        102,543,357 100%  632.64kB/s    0:02:38 (xfr#36, to-chk=46/135)
    Season 1940/Popeye - S1940E02 - Females is Fickle DVD [BTN].mkv
        102,363,188 100%  674.34kB/s    0:02:28 (xfr#37, to-chk=45/135)
    Season 1940/Popeye - S1940E03 - Stealin' Ain't Honest DVD [BTN].mkv
        100,702,236 100%  732.80kB/s    0:02:14 (xfr#38, to-chk=44/135)
    Season 1940/Popeye - S1940E04 - Me Feelins is Hurt DVD [BTN].mkv
        111,018,052 100%  672.35kB/s    0:02:41 (xfr#39, to-chk=43/135)
    Season 1940/Popeye - S1940E05 - Onion Pacific DVD [BTN].mkv
        103,088,015 100%  650.18kB/s    0:02:34 (xfr#40, to-chk=42/135)
    Season 1940/Popeye - S1940E06 - Wimmin is a Myskery DVD [BTN].mkv
         61,440,000  59%  757.02kB/s    0:00:56  ^C
    rsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at rsync.c(701) [generator=3.2.3]

     

    And here is the accompanying stats page during the same transfer. You can see the sudden decline around 11:46, which coincides with the drop in transfer speed above:

    [Screenshot: network activity graph showing the sudden drop around 11:46]

     

    I don't see anything telling in the system logs on either server when this speed drop happens. It almost seems like a buffer is filling up and not being emptied quickly enough, causing the speed to tank.
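     

    If it is a buffer thing, the only place I know to look would be the socket buffer sysctls and the per-connection TCP stats on both ends while a transfer is running (just a guess on my part):

    sysctl net.core.rmem_max net.core.wmem_max   # maximum socket buffer sizes
    sysctl net.ipv4.tcp_rmem net.ipv4.tcp_wmem   # TCP autotuning min/default/max
    ss -tin                                      # per-connection cwnd/rtt/retransmit stats during the copy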

     

     

    What I don't think it is:


    I don't think my issue is with WireGuard or my ISP speeds on either end. While the transfer is crawling along over SSH at sub-par speeds, I can easily browse to NODE over WireGuard from my Windows or Mac computer, pick any file to copy over the tunnel, and fully saturate the sending server's upload with no issues while SSH is choking in the background:

    [Screenshot: file copy over the WireGuard tunnel saturating the upload while the rsync/SSH transfer crawls]

     

     

    Could it have something to do with the SSH changes that took place between 6.8.3 and 6.9.0? None of the changes I'm aware of sound like the culprit, but I could be wrong. So besides that, I'm pretty much out of ideas on what it could be without just playing with random ssh and rsync options.

     

    Let me know if there is some other info I can provide, below are both servers diagnostic files:

     

    node-diagnostics-20210204-0751.zip

    void-diagnostics-20210204-0752.zip

     

    EDIT: I just realized LimeTech has a guide about this published: https://unraid.net/blog/unraid-server-to-server-backups-with-rsync-and-wireguard

     

    I looked it over and I'm not really doing anything different, except not passing -z (compression) to rsync and disabling compression for the SSH connection. A lot of what I transfer is video and doesn't compress well, so why waste the CPU cycles on it.

  3. Single or dual parity?

     

    UnRAID should be smart enough to keep track of your drive locations without you having to do anything special with a single parity disk.

     

    Dual parity is something I'm not familiar with as I don't currently use it but it does make a difference.

     

    With that in mind, you should probably grab a copy of your diagnostics zip and take a screenshot of the Main tab, where all your drives and serial numbers are displayed in the current order, just in case.

     

     

  4. 1 hour ago, Squid said:

    The way that I did it was to look at all the .plgs within /boot/config/plugins and see which code section was long enough to hit a line 87, then copied/pasted that section and saw that line 87 contained an ==, then uninstalled the plugin, rebooted, and saw the error didn't reappear.

     

    The way most people would have done it is to uninstall plugins one by one, rebooting each time, and see when the error disappeared.

    Oh yeah, I thought about trying to dig through the PLGs myself, but I didn't really know what I was looking for, so I figured asking would be quicker. I'd just have a million questions about the PLGs instead lol.

     

    @dlandon thank you for the quick fix! Glad it wasn't anything serious.

  5. Decided to dive straight into beta 35 (from stable) with my backup server to get ahead of any issues I might encounter as my backup and main are configured very similarly.

     

    I've noticed two innocuous errors on the monitor attached to the server. They don't show up in the logs anywhere that I can find and there is nothing preceding or following them.

    [Screenshot: the two console errors on the attached monitor]

     

    I rebooted into safe mode and those two lines don't appear. So it has something to do with a plugin, but I'm not sure how to go about figuring out which one. I poked through the beta threads and tried some searching but didn't see this mentioned anywhere else yet.

     

    I haven't noticed any lost or broken functionality yet, so they appear to be harmless, but I would feel better knowing where they are coming from and what damage, if any, they may cause.

     

    Diagnostics:

    void-diagnostics-20201203-1515.zip

  6. UnRAID v6.8.3

     

    Diagnostics and the previous two system logs capturing roughly my last 30 days of uptime: node-diagnostics-20201129-0852.zip

     

    A very strange issue I have just now started experiencing: twice in the last 48 hours, UnRAID has completely lost its ability to resolve DNS names until the server is rebooted. All attempts to ping by name result in "name or service not known", and UnRAID is unable to resolve any names for update checks and the like. Pinging by IP address works without issue, and my WireGuard server continues to provide access to the system.

     

     

    I have made no recent changes to network setup or DNS. My network settings are statically configured and I utilize 3 upstream DNS servers which all remain pingable during this outage: 8.8.8.8, 1.1.1.1, 8.8.4.4

     

    When this issue occurs I can get into other devices on the network and they can resolve names just fine, so this appears to be exclusive to this machine.
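     

    Next time it happens I'll try querying the upstream servers directly, to see whether it's the local resolver or the path to the DNS servers (assuming nslookup is actually available on the box):

    ping -c1 8.8.8.8               # raw IP connectivity to the upstream still works
    nslookup github.com 8.8.8.8    # query an upstream directly, bypassing /etc/resolv.conf
    nslookup github.com 1.1.1.1
    cat /etc/resolv.conf           # confirm the static DNS entries are still present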

     

    If you look at my syslog from 11/28-11/29 (included in the zip above) you will see I rebooted the server and after a few minutes DNS resolution started working again and I was able to update some plugins and whatnot.

     

    Nov 28 11:49:12 Node emhttpd: cmd: /usr/local/emhttp/plugins/dynamix.plugin.manager/scripts/plugin update fix.common.problems.plg
    Nov 28 11:49:12 Node root: plugin: creating: /boot/config/plugins/fix.common.problems/fix.common.problems-2020.11.28-x86_64-1.txz - downloading from URL https://raw.githubusercontent.com/Squidly271/fix.common.problems/master/archive/fix.common.problems-2020.11.28-x86_64-1.txz
    Nov 28 11:49:12 Node root: plugin: checking: /boot/config/plugins/fix.common.problems/fix.common.problems-2020.11.28-x86_64-1.txz - MD5
    Nov 28 11:49:12 Node root: plugin: running: /boot/config/plugins/fix.common.problems/fix.common.problems-2020.11.28-x86_64-1.txz
    Nov 28 11:49:13 Node root: plugin: running: anonymous
    Nov 28 12:49:00 Node kernel: veth375a643: renamed from eth0
    Nov 28 12:49:00 Node kernel: docker0: port 1(veth20db4d4) entered disabled state
    Nov 28 12:49:00 Node kernel: docker0: port 1(veth20db4d4) entered disabled state
    Nov 28 12:49:00 Node kernel: device veth20db4d4 left promiscuous mode
    Nov 28 12:49:00 Node kernel: docker0: port 1(veth20db4d4) entered disabled state
    Nov 28 13:51:25 Node kernel: mdcmd (47): spindown 4
    Nov 28 13:51:26 Node kernel: mdcmd (48): spindown 6
    Nov 28 13:51:26 Node kernel: mdcmd (49): spindown 7
    Nov 28 13:51:26 Node kernel: mdcmd (50): spindown 9
    Nov 28 13:51:52 Node kernel: mdcmd (51): set md_write_method 0
    Nov 28 13:51:52 Node kernel: 
    Nov 28 14:00:43 Node kernel: mdcmd (52): spindown 2
    Nov 28 14:01:36 Node kernel: mdcmd (53): spindown 1
    Nov 28 15:21:54 Node kernel: mdcmd (54): set md_write_method 1
    Nov 28 15:21:54 Node kernel: 
    Nov 28 17:25:38 Node kernel: mdcmd (55): spindown 1
    Nov 28 17:25:48 Node kernel: mdcmd (56): spindown 2
    Nov 28 17:25:50 Node kernel: mdcmd (57): spindown 9
    Nov 28 17:25:53 Node kernel: mdcmd (58): spindown 7
    Nov 28 17:25:56 Node kernel: mdcmd (59): set md_write_method 0
    Nov 28 17:25:56 Node kernel: 
    Nov 28 19:40:22 Node kernel: mdcmd (60): spindown 4
    Nov 28 19:52:48 Node kernel: mdcmd (61): spindown 8
    Nov 28 20:41:07 Node kernel: mdcmd (62): spindown 6
    Nov 28 21:27:50 Node kernel: mdcmd (63): spindown 9
    Nov 28 23:20:56 Node kernel: mdcmd (64): spindown 4
    Nov 28 23:27:19 Node kernel: mdcmd (65): spindown 5
    Nov 28 23:41:38 Node kernel: mdcmd (66): spindown 7
    Nov 29 00:00:01 Node Docker Auto Update: Community Applications Docker Autoupdate running
    Nov 29 00:00:01 Node Docker Auto Update: Checking for available updates
    Nov 29 00:00:02 Node Docker Auto Update: No updates will be installed
    Nov 29 00:15:03 Node kernel: mdcmd (67): spindown 8
    Nov 29 00:43:31 Node kernel: mdcmd (68): spindown 2
    Nov 29 01:00:01 Node root: Fix Common Problems Version 2020.11.28
    Nov 29 01:00:01 Node root: Fix Common Problems: Error: Unable to communicate with GitHub.com
    Nov 29 01:00:01 Node root: Fix Common Problems: Other Warning: Could not check for blacklisted plugins
    Nov 29 01:00:12 Node root: Fix Common Problems: Other Warning: Could not perform docker application port tests
    Nov 29 01:00:12 Node sSMTP[31740]: Unable to locate smtp.gmail.com
    Nov 29 01:00:12 Node sSMTP[31740]: Cannot open smtp.gmail.com:465
    Nov 29 01:01:11 Node kernel: mdcmd (69): set md_write_method 1

    Everything was great until 1AM when FCP wanted to run and was unable to resolve github.com again... I see nothing between those two events that explains my sudden loss of DNS or why this seems to be all of a sudden happening daily.

     

    I'd love some help in figuring this out as it brings my server and dockers to a grinding halt with so many parts depending on name resolution.

     

    EDIT: In the time it took me to draft this topic I have lost DNS resolution again. I'm at a loss as to what has suddenly changed to cause this...

     

    EDIT2: Hmm, it almost seems to be related to Docker? I stopped the dockers and the Docker service and now it's back up? I'm honestly just grasping at straws here though; I can't find a rhyme or reason yet. Bringing Docker back online doesn't seem to immediately break it.

     

    UPDATE: Well, it's now been 48 hours with no issues... I still have no idea what caused this, but it seems to have resolved itself.

  7. Is your AT&T router in bridge mode, or otherwise allowing pfSense to handle IP addressing, routing, DNS, etc.? Putting the AT&T device in bridge mode essentially turns it into a dumb modem, allowing pfSense to handle your network traffic and security.

     

    If not, you may be double-NATing yourself, which could be your biggest problem here.

     

    I couldn't find home-user instructions and am not familiar with AT&T-provided routers, but this should be a start:

    https://www.att.com/support/smallbusiness/article/smb-internet/KM1188700/

     

     

    Beyond that, I had on-and-off issues with remote access until I added server:private-domain: "plex.direct" to the DNS Resolver custom options box in pfSense.

     

    https://forums.plex.tv/t/secure-connections-lan-and-pfsense/123319/9

    ^I didn't find the NAT+Proxy setting in that thread necessary; mine is set to the system default, which for me keeps NAT reflection disabled.


    That, and making sure the port forward was set up properly, was (IIRC) all I had to do to get Plex working behind pfSense.

     

    As mfwade mentioned, if you use pfBlockerNG and have GeoIP filtering on, you may want to turn it off until you get Plex working, to eliminate it as a potential problem.

  8. 4 minutes ago, JorgeB said:

    For this last case yes, my issue was without a check running, a fan in a 5in3 cage just stopped.

    Yeah, I would be interested in both your original use case (a failed fan) and my fringe case (failed AC plus a parity check kicking off).

     

    I have 4x 5in3 drive cages with separate fans in my second server and would definitely be interested in having the ability to stop the array or shut the system down if one of those failed and my drives started heating up real bad.

     

    To put my mind at ease a bit, when you had this happen to you @JorgeB did you notice the "cooked" drives failed at a higher rate than the others?

     

    I'm waiting for someone to get into the DC and check the AC before my system gets powered up, so right now I'm just doing a lot of reading on overheated drives and possible issues I may encounter.

  9. 16 minutes ago, JorgeB said:

    That only helps if it's doing a parity check.

    Well in this case that would have helped me a bit. The drives were toasty but not overheating before the parity check kicked off. 

     

    I'll look into that plugin as I apparently can't trust people to remember to turn on the flippin air conditioner after a power outage.

     

    It would have saved me from several of my drives toasting themselves out of warranty coverage (the older Reds have a max op temp of 60C).

     

    Anyone have experience with RMA'ing drives that are overheated? Is that something normally checked by WD? I'm just trying to educate myself for the future on how likely I am to be screwed by this little incident if I end up trying to RMA some of these down the road.

  10. I would be interested in seeing this come to fruition for scenarios where cooling is normally adequate but a sudden failure leads to skyrocketing temps.

     

    I had a power outage where my main server is hosted; the server stayed up on battery, but when the power came back the AC was not switched back on and a parity check kicked off.

     

    This led to my disks running at 60-62C for the whole night until I woke up and saw the 50+ alerts from UnRAID and shut the server down. Every single one of my disks reached 60C at one time or another.

     

    I'm also thinking about stronger fans that can move more air, but in a locked DC closet with limited airflow and no AC, I think a shutdown would always be the safer option.

  11. 5 minutes ago, cowger said:

    Sorry for not being clear on this.  It's an Android T95 box for TV:

    https://www.amazon.com/gp/product/B0897QCBF7

     

    Ah so Android 10 then, not AndroidTV.

     

    See if the My Files app lets you specify that you want to connect to the public share as a guest. The "Files" app on my Android 11 phone doesn't appear to support browsing my local network, so I can't really test your specific scenario out.

     

    Let me know if you have trouble connecting with Kodi; I can try to help with it, though I mostly use it strictly for the Plex add-on so I can watch offline when the internet goes down (the Plex app on the Shield breaks without internet).

  12. Just now, JorgeB said:

    Don't know why it's reporting pending sectors; did it report they changed to 0 after?

     

    The disk does appear to be failing, though.

    Nope, I got two notifications from UnRAID, one about the read errors and a second about the 2 pending sectors. I went to check the SMART stats and saw the discrepancy, canceled the check and shut down the server to swap the disk with another.

     

    The diag file posted above is from before I shut down the server and after the alerts were generated.

     


     

    I'll let the parity check finish and order a replacement disk. I missed my warranty window by about 6 months =(

  13. To be clear, is this actually an AndroidTV box like the Shield TV, or is it an "Android box for TVs" like the ones you can find all over Amazon? I ask because some of these actually run AndroidTV while others just run Android, and they can look and act differently.

     

    The "My Files" app is not a part of AndroidTV that I'm aware of so I imagine you are using a set top box with Android installed.

     

    On my 2019 Shield TV I was able to use Kodi's file manager to access my UnRAID shares with credentials by passing the credentials as part of the SMB path call:
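     

    The path format is just the username and password in front of the host, something like this (made-up credentials and a generic "TOWER" hostname):

    smb://shareuser:sharepassword@TOWER/Public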

     

    [Screenshot: Kodi file manager SMB source with the credentials embedded in the path]

     

    Requiring credentials for public shares is unfortunately rather common when dealing with SMB shares, in my experience. It is not an UnRAID issue specifically; certain OSes and apps can handle it, others can't. I use File Commander on my Shield to sideload apps from my UnRAID public share and it works fine (the guest box is required, it doesn't like blank credentials):

    [Screenshot: File Commander connection settings with the guest option enabled]

     

  14. UnRAID v6.8.3

     

    void-diagnostics-20200915-0707.zip

     

    My monthly parity check on my backup server has produced read errors on disk 1 for the last two months...

     

    The first time it happened there was no report of pending or reallocated sectors, and the drive passed both a short and a long SMART self-test, so I wrote it off as a fluke and went on with my life.

     

    This morning it happened again within minutes of the parity check starting, this time with UnRAID claiming there are 2 pending sectors:

     

    [Screenshot: UnRAID notification reporting 2 pending sectors]

    However, when I go to look at the drive stats in the context menu, SMART doesn't report any pending or reallocated sectors??

    [Screenshot: SMART attributes showing no pending or reallocated sectors]

     

    I plan on moving the drive to a different slot and seeing if the error follows the disk or stays with the slot.

     

    Anyone ever seen UnRAID misreport pending sectors like that before? Is SMART just slow on the uptake?

     

    EDIT: Swapped disk with another slot, rerunning nocorrect check now.

     

    EDIT2: It appears to be following the disk: different slot, same disk with read errors. No reports of reallocated sectors by UnRAID this time, just read errors.

     

    Sep 15 07:25:06 VOID kernel: mdcmd (57): check nocorrect
    Sep 15 07:25:06 VOID kernel: md: recovery thread: check P ...
    Sep 15 07:25:24 VOID emhttpd: cmd: /usr/local/emhttp/plugins/dynamix/scripts/tail_log syslog
    Sep 15 07:26:20 VOID emhttpd: cmd: /usr/local/emhttp/plugins/dynamix/scripts/tail_log syslog
    Sep 15 07:28:31 VOID ntpd[1859]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized
    Sep 15 07:28:52 VOID kernel: mpt2sas_cm1: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
    Sep 15 07:28:52 VOID kernel: mpt2sas_cm1: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
    Sep 15 07:28:52 VOID kernel: mpt2sas_cm1: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
    Sep 15 07:28:52 VOID kernel: mpt2sas_cm1: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
    Sep 15 07:28:52 VOID kernel: mpt2sas_cm1: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
    Sep 15 07:28:52 VOID kernel: mpt2sas_cm1: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
    Sep 15 07:28:52 VOID kernel: sd 10:0:0:0: [sdp] tag#3130 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
    Sep 15 07:28:52 VOID kernel: sd 10:0:0:0: [sdp] tag#3130 Sense Key : 0x3 [current] 
    Sep 15 07:28:52 VOID kernel: sd 10:0:0:0: [sdp] tag#3130 ASC=0x11 ASCQ=0x0 
    Sep 15 07:28:52 VOID kernel: sd 10:0:0:0: [sdp] tag#3130 CDB: opcode=0x88 88 00 00 00 00 00 01 ba 94 b0 00 00 04 00 00 00
    Sep 15 07:28:52 VOID kernel: print_req_error: critical medium error, dev sdp, sector 29005968
    Sep 15 07:28:52 VOID kernel: md: disk1 read error, sector=29005904
    Sep 15 07:28:52 VOID kernel: md: disk1 read error, sector=29005912
    Sep 15 07:28:52 VOID kernel: md: disk1 read error, sector=29005920
    Sep 15 07:28:52 VOID kernel: md: disk1 read error, sector=29005928
    Sep 15 07:29:16 VOID kernel: sd 10:0:0:0: attempting task abort! scmd(00000000ee3221de)
    Sep 15 07:29:16 VOID kernel: sd 10:0:0:0: [sdp] tag#3104 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 40 e5 00
    Sep 15 07:29:16 VOID kernel: scsi target10:0:0: handle(0x0009), sas_address(0x4433221104000000), phy(4)
    Sep 15 07:29:16 VOID kernel: scsi target10:0:0: enclosure logical id(0x5c81f660e69c9f00), slot(7) 
    Sep 15 07:29:17 VOID kernel: sd 10:0:0:0: task abort: SUCCESS scmd(00000000ee3221de)
    Sep 15 07:29:17 VOID kernel: sd 10:0:0:0: Power-on or device reset occurred
    Sep 15 07:29:22 VOID kernel: sd 10:0:0:0: Power-on or device reset occurred
    Sep 15 07:29:34 VOID kernel: mpt2sas_cm1: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
    Sep 15 07:29:34 VOID kernel: mpt2sas_cm1: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
    Sep 15 07:29:34 VOID kernel: sd 10:0:0:0: [sdp] tag#3105 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
    Sep 15 07:29:34 VOID kernel: sd 10:0:0:0: [sdp] tag#3105 Sense Key : 0x3 [current] 
    Sep 15 07:29:34 VOID kernel: sd 10:0:0:0: [sdp] tag#3105 ASC=0x11 ASCQ=0x0 
    Sep 15 07:29:34 VOID kernel: sd 10:0:0:0: [sdp] tag#3105 CDB: opcode=0x88 88 00 00 00 00 00 01 ba c8 b0 00 00 04 00 00 00
    Sep 15 07:29:34 VOID kernel: print_req_error: critical medium error, dev sdp, sector 29019160
    Sep 15 07:29:34 VOID kernel: md: disk1 read error, sector=29019096
    Sep 15 07:29:34 VOID kernel: md: disk1 read error, sector=29019104
    Sep 15 07:29:34 VOID kernel: md: disk1 read error, sector=29019112
    Sep 15 07:29:34 VOID kernel: md: disk1 read error, sector=29019120
    Sep 15 07:29:34 VOID kernel: md: disk1 read error, sector=29019128
    Sep 15 07:29:34 VOID kernel: md: disk1 read error, sector=29019136
    Sep 15 07:29:34 VOID kernel: md: disk1 read error, sector=29019144
    Sep 15 07:29:34 VOID kernel: md: disk1 read error, sector=29019152
    Sep 15 07:29:34 VOID kernel: md: disk1 read error, sector=29019160
    Sep 15 07:29:34 VOID kernel: md: disk1 read error, sector=29019168
    Sep 15 07:29:34 VOID kernel: md: disk1 read error, sector=29019176
    Sep 15 07:29:34 VOID kernel: md: disk1 read error, sector=29019184
    Sep 15 07:29:34 VOID kernel: md: disk1 read error, sector=29019192
    Sep 15 07:29:34 VOID kernel: md: disk1 read error, sector=29019200
    Sep 15 07:29:34 VOID kernel: md: disk1 read error, sector=29019208
    Sep 15 07:29:34 VOID kernel: md: disk1 read error, sector=29019216
    Sep 15 07:29:34 VOID kernel: md: disk1 read error, sector=29019224
    Sep 15 07:29:34 VOID kernel: md: disk1 read error, sector=29019232
    Sep 15 07:29:34 VOID kernel: md: disk1 read error, sector=29019240
    Sep 15 07:29:39 VOID rc.diskinfo[12312]: SIGHUP received, forcing refresh of disks info.
    Sep 15 07:30:57 VOID emhttpd: cmd: /usr/local/emhttp/plugins/dynamix/scripts/tail_log syslog

     

    EDIT: "solved" per say. I know the drive is dying though I do find the ghost re-allocated sectors reported strange. I have a new one on order.

  15. 2 hours ago, CorneliousJD said:

    I had the same issue this morning, I actually just removed the container and re-added it and it's working again now.

     

    For what it's worth, very much *not* a fan of the animated gif logo in my docker page either. ;)

    If anyone else is not either, I changed

     

    FROM: 

    
    https://raw.githubusercontent.com/Organizr/docker-organizr/master/logo.gif

    TO: 

    
    https://raw.githubusercontent.com/causefx/Organizr/v2-master/plugins/images/organizr/logo-no-border.png

     

    Just came here to see why the update broke the container and what the deal was with the spinning logo. Also not a fan of the animated logo.

     

    Thanks for the quick fix.

  16. Is anyone else noticing that the Plex docker is starting to get killed with Out of Memory errors during Plex's server maintenance window?


    This morning alone Plex has been OOM reaped dozens of times, sometimes as often as every minute!! https://pastebin.com/8BcJnQ5J

     

    This is happening across two different servers with different memory amounts and client loads (both use the LSIO Plex docker). The errors begin within minutes of the maintenance window opening and never appear after it has closed. I do have hard RAM limits set in the dockers' configs; however, they have never approached or hit those limits in normal operation, so I'm hesitant to increase them (the limits are in place to prevent runaway RAM consumption by dockers).

     

    NODEFlix has an 8GB RAM limit set for the PMS docker

    NOTYOFLIX has a 6GB RAM limit set for the PMS docker
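     

    (For anyone wondering, a hard RAM limit like this is typically just Docker's --memory flag added to the container's Extra Parameters in the template, e.g.:)

    --memory=8g   # NODEFlix's PMS container
    --memory=6g   # NOTYOFLIX's PMS container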

     

    [Screenshots: Docker memory limit settings for both servers]

     

    I haven't changed anything about either server's Plex settings or Docker configurations recently. The only thing I can think of is that Plex's new "Detect Intro" feature runs as a scheduled task, so I have disabled it and will monitor whether the errors return with that setting off. However, I don't recall seeing this issue when that feature was introduced; this problem just appeared a few days ago...

     

    Attached are both server diagnostic zips.

     

    node-diagnostics-20200705-0545.zip
    void-diagnostics-20200705-0545.zip

     

    I've posted about my issues over on the Plex forums, since this thread is too massive for anything to be seen or addressed. https://forums.plex.tv/t/server-maintenance-leads-to-excessive-resource-utilization/611012/

  17. On 6/28/2020 at 5:54 PM, ljm42 said:

    I'm looking into this. To clarify, what browser on what device are you using when you do this?

     

    From my testing, it appears to be a browser issue. When I press the Apply button using Chrome on Android, the tunnel stops but does not start back up. If I use Chrome on Windows it works fine. In either case, it does not matter whether I am connected via WireGuard or directly via wifi.

     

    Can you confirm that you see the same?

    For me it is somewhat different.

     

    It is 100% reproducible with Chrome (currently version 83.0.4103.116, 64-bit) on Windows 10 v1909; the tunnel goes down and stays down.

    I just tried with Chrome on my Android (Pixel 3A XL w/ Android 10) over LTE and it also brought down the tunnel and did not restart it. 

     

    The first set of stop/start is me on Windows 10 adding a test peer then logging in locally and re-enabling the tunnel.

    The second set was me logging in via my android and removing said peer, which also brought down the tunnel.

    [Screenshot: syslog showing the two sets of tunnel stop/start events]

     

    Connected directly to the web interface via the LAN (not a VPN), I can make changes to the tunnel settings in Chrome on Windows 10 and the tunnel rolls without issue. The tunnel only stops and stays down when I'm managing WireGuard over a WireGuard connection.

     

    EDIT: Also happens in latest Firefox on Windows.

     

    EDIT2: I tried to manage WireGuard over WireGuard again, this time using RDP from one Windows machine to another Windows machine, which I then used to access the UnRAID webui. Management over the RDP connection tunneled through WireGuard successfully brought the tunnel down and back up.


    So my problem seems to be that any direct attempt to manage the WireGuard server over a WireGuard connection results in the tunnel going down and staying down. If I connect to another machine on the LAN over WireGuard and use that machine to manage the WireGuard server, then it seems to go down and come back up gracefully.

     

     

  18. Is WireGuard supposed to just stop the tunnel and leave it stopped when adding a new peer, or really when making any changes at all?

     

    I set up WireGuard remote access to LAN for my phone and PC, no problem, super easy as advertised.

     

    I'm connected over WireGuard managing UnRAID, and when I go to add a peer and hit Apply, the UnRAID webui stops working because the tunnel has been stopped:

     

    Jun 25 09:41:05 Node wireguard: Tunnel WireGuard-wg0 stopped

    Jun 25 09:43:20 Node webGUI: Successful login user root from xxx.xxx.xxx.xxx

    Jun 25 09:43:24 Node wireguard: Tunnel WireGuard-wg0 started

     

    Thankfully I have other remote access methods to this server, so I was able to go in and restart the tunnel, but I don't see how this could be by design... shouldn't it be able to gracefully roll the connection?

     

    I'll make a new thread if this is unexpected behavior where troubleshooting can be done.

     

    EDIT: I just got kicked again, merely trying to change the connection type for a peer that isn't even currently in use. It just stopped the tunnel and left it off...

     

    EDIT2: It sounds like, depending on how peers are added, active session interruption could be avoided: https://manpages.debian.org/unstable/wireguard-tools/wg.8.en.html#COMMANDS:~:text=syncconf
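     

    From that man page, the trick appears to be syncconf with a stripped config, which updates the peer list without tearing the interface down; roughly (assuming wg-quick is what manages the tunnel):

    wg syncconf wg0 <(wg-quick strip wg0)   # apply config changes to wg0 without restarting the tunnel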

     

    EDIT3: I am just so utterly lost on how to make my main server talk to my backup server directly over WireGuard. I currently have SSH and rsync running a monthly backup of my data; I would like to stop leaving SSH open to the net, but I can't get "server to server" or "remote access to server" to work to save my life.

     

    I followed your "rough instructions" of setup server to server on one and import on the other but now i have a second tunnel I don't really want. Do I have to have a second tunnel for this to work? Can I not just add the server as a peer to my existing tunnel with my phone and home PC?

     

    I got server-to-server working. I'm still not sure whether a second port forward and tunnel was required, but at least the Chinese will stop spamming my logs with SSH brute-force attempts (key-based auth only, so it is more aesthetic than a real security concern).

  19. 21 minutes ago, johnnie.black said:

    Not a bad idea to run a scrub to check for more corruption, but since it's a single device pool it can only detect corruption, not fix it.

    Yeah, I ran a second scrub after deleting the corrupted file and it reports no further errors:

     

    
    UUID:             cc9f1614-fc5d-406a-8ee7-58a5651dc9ae
    Scrub started:    Thu May 21 07:58:40 2020
    Status:           finished
    Duration:         0:02:48
    Total to scrub:   75.17GiB
    Rate:             458.17MiB/s
    Error summary:    no errors found

    Thanks for reminding me about not being able to repair on a single-device pool; I forgot that was the case.

  20. 2 minutes ago, johnnie.black said:

    It means that block doesn't have the checksum it should have, i.e., the data is corrupt; you can fix it by deleting the file or overwriting it.

     

    The most likely cause would be the SSD; there's a bad block that was reallocated:

    
    183 Runtime_Bad_Block       PO--C-   099   099   010    -    1

     

    The SSD firmware shouldn't reallocate a block containing data without being able to write it correctly to another place, but it's known to happen; it happened to me a few years ago with a SanDisk SSD. Another option would be a one-time RAM bit flip; if it was bad RAM in general it would likely cause more issues.

     

     

    OK cool, so I don't necessarily need to do a scrub repair? Neat, I'll just delete the file then.

     

    Thanks for the reassurance 😃

  21. Unraid v6.8.3

     

    void-diagnostics-20200521-0651.zip <--- diagnostics before any troubleshooting.

     

    I'm receiving the following error from my cache drive; it is always the same inode #:

     

    BTRFS warning (device sdg1): csum failed root 5 ino 156381873 off 143360 csum 0xf58f6015 expected csum 0xf58f6055 mirror 1

    I ran a find on the inode # and it is an Emby poster:

     

    find /mnt/cache -inum 156381873
    /mnt/cache/appdata/EmbyServer/data/collections/Toy Story Collection [boxset]/poster.jpg

     

    I just finished a scrub:

    May 21 07:08:25 VOID ool www[20615]: /usr/local/emhttp/plugins/dynamix/scripts/btrfs_scrub 'start' '/mnt/cache' '-r'
    
    May 21 07:10:32 VOID kernel: BTRFS warning (device sdg1): checksum error at logical 1627265449984 on dev /dev/sdg1, physical 57454903296, root 5, inode 156381873, offset 143360, length 4096, links 1 (path: appdata/EmbyServer/data/collections/Toy Story Collection [boxset]/poster.jpg)
    
    May 21 07:10:32 VOID kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
    
    
    UUID:             cc9f1614-fc5d-406a-8ee7-58a5651dc9ae
    Scrub started:    Thu May 21 07:08:25 2020
    Status:           finished
    Duration:         0:03:08
    Total to scrub:   75.15GiB
    Rate:             409.40MiB/s
    Error summary:    csum=1
      Corrected:      0
      Uncorrectable:  0
      Unverified:     0


    Should I attempt to repair the corrupted block with BTRFS Scrub? Or should I just delete the affected file and let it be regenerated?
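     

    If deleting is the answer, I assume the routine is just: remove the poster, let Emby regenerate it, and re-run a scrub to confirm the csum error is gone. Roughly:

    rm "/mnt/cache/appdata/EmbyServer/data/collections/Toy Story Collection [boxset]/poster.jpg"
    btrfs scrub start -B /mnt/cache   # -B waits for the scrub to finish
    btrfs scrub status /mnt/cache     # should end with "Error summary: no errors found"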

     

    I plan on running a memtest later to ensure it isn't bad RAM, though I think if it was bad RAM I would have more than just one bad file, given this error has been going on for over a week.

     

    I appear to have at least one reallocated sector (a reserve block used) according to SMART for the SSD: void-smart-20200521-0711.zip

     

    I wanted to check how much data I have written to this cache drive, so I found a calculator. This seems wildly out of bounds for a 3.5-year-old SSD; there is no way I have written 300TB through this drive.
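     

    (The math behind those calculators is presumably just SMART's Total_LBAs_Written multiplied by the logical sector size; something like this, assuming the drive actually reports that attribute and uses 512-byte sectors:)

    smartctl -A /dev/sdg | awk '/Total_LBAs_Written/ {printf "%.1f TB written\n", $NF*512/1e12}'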

    [Screenshot: TBW calculator result showing roughly 300TB written]

     

    BTW, why aren't warnings like this picked up by FCP (Fix Common Problems)? It would be nice to have BTRFS errors reported (a notification generated) for those with cache devices.

     
