
Posts posted by weirdcrap

  1. @dmacias The latest version of unRAR supplied is subject to a directory traversal vulnerability:

     

    https://nvd.nist.gov/vuln/detail/CVE-2022-30333#vulnCurrentDescriptionTitle

     

    I found a Slackware package for 6.1.7 (needs to be >= 6.1.2), so assuming this is usable in UnRAID it should be a simple swap? *SEE EDIT BELOW

    https://slackware.pkgs.org/current/slackers/unrar-6.1.7-x86_64-1cf.txz.html

     

     

     

    Those of you who don't want to wait for the plugin to be updated can place your packages in /boot/extra and they will be installed at boot.

     

    This is unsupported, not recommended (Fix Common Problems will produce a warning on scan), and may break your system, but in my testing with the few packages I use it works fine for the time being.

     

    EDIT: I totally misread the CVE version number. It's 6.12 or later, not 6.1.2. So the version I linked above, while newer than what's in NerdPack, still doesn't address that CVE. I can't seem to find a 6.12 version for Slackware anywhere? Can anyone else?
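If anyone wants to sanity-check a candidate package against the CVE threshold, a version-aware comparison with `sort -V` does the job. This is just a sketch with the versions from this post hard-coded; on a real server you'd parse the installed version out of the `unrar` banner instead:

```shell
# Sketch: compare an unrar version against the 6.12 fix threshold for
# CVE-2022-30333. Versions are hard-coded from this thread.
required="6.12"
installed="6.1.7"

# sort -V is version-aware: whichever string sorts first is older.
oldest=$(printf '%s\n%s\n' "$required" "$installed" | sort -V | head -n1)
if [ "$oldest" = "$installed" ] && [ "$installed" != "$required" ]; then
    echo "unrar $installed predates the $required fix - still vulnerable"
else
    echo "unrar $installed includes the fix"
fi
```

Running this against the 6.1.7 package I linked reports it as still vulnerable, which matches the conclusion in the edit above.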

  2. It looks like way back in May a python3 process triggered the kernel's out-of-memory (OOM) killer, which killed QEMU.

     

    May  7 21:05:54 Tower kernel: python3 invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
    
    ....
    
    May  7 21:05:54 Tower kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=3ea9e18112ed2dd06031f35c6351a2de56ee7b8fb2ce9631019b9f7ca51e3881,mems_allowed=0,global_oom,task_memcg=/machine/qemu-1-V-DLC.libvirt-qemu,task=qemu-system-x86,pid=15497,uid=0
    May  7 21:05:54 Tower kernel: Out of memory: Killed process 15497 (qemu-system-x86) total-vm:17946876kB, anon-rss:17028860kB, file-rss:72kB, shmem-rss:23104kB, UID:0 pgtables:35208kB oom_score_adj:0
    May  7 21:05:54 Tower kernel: oom_reaper: reaped process 15497 (qemu-system-x86), now anon-rss:16kB, file-rss:68kB, shmem-rss:4kB

     

    I only see the one OOM kill, and it hasn't happened again in several months, so it's probably safe to simply ignore it unless it recurs.

     

    You appear to have some VMs running, assigned about 24GB of RAM if I'm reading your XMLs correctly. Between that and the Dockers listed in your sig, you may be pushing it a little too close on your RAM usage sometimes, as the underlying OS needs some RAM for itself.

     

    You could try setting artificial limits for the Dockers (using --memory=XG in the extra parameters for each Docker, where X is the limit in GB) and reducing the assigned RAM on the VMs a gigabyte or two at a time until the errors stop for good.

     

    But again, if this only happened the one time and things seem to be running OK otherwise, I'd probably just leave it be.
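To put rough numbers on that budgeting, here's the arithmetic as a sketch; all the figures below are made up for illustration, not read from the diagnostics:

```shell
# Hypothetical RAM budget: whatever the VMs plus the capped Dockers can
# claim in the worst case has to leave headroom for the OS itself.
total_gb=32        # installed RAM (example value)
vm_gb=24           # total assigned to VMs
docker_cap_gb=6    # sum of --memory=XG caps across containers

headroom=$((total_gb - vm_gb - docker_cap_gb))
echo "Worst-case RAM left for the OS: ${headroom}GB"
```

With numbers like these there's only 2GB of slack, which is the kind of margin where one hungry process invokes the OOM killer.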

  3. Built a new system with this board and UnRAID reports a maximum installable RAM capacity of 64GB. But everything I can find (MSI website, Newegg, etc.) suggests it should support up to 128GB.

     


     

    I don't plan on installing more than 64GB of RAM in this anytime soon if ever, but I'm still curious if UnRAID is detecting this incorrectly or if the advertised RAM capacity is wrong...

     

    Anyone got this board and actually have it maxed out with RAM?

    node-diagnostics-20220731-2129.zip

  4. On 7/20/2022 at 4:36 PM, Torben said:

    Me too. And even NFS speed over WireGuard got worse, running at about 60% of the speed it was before. So I tried to find a workaround and installed an Ubuntu VM with the folders mounted via mount-tags on the receiving unRAID server, which I found out avoids the problem (even a freshly installed test server on different hardware on 6.8.3 works; the same server on 6.9+ shows the problem). So far it's been working great for a couple of weeks now. In the beginning I wanted to do some more investigating on why it works in the VM - or whether I can replicate the problem in the VM - since I think having a VM just for doing what the host was/should be capable of is "a bit" unnecessary, but well, a lot of time spent and no real idea left where to continue.

     

    I'm curious what happens when you have new hardware.

    I may have to go the VM route as well. The new hardware made no difference in the file transfer speeds. Not that I honestly expected it to.😥

  5. On 7/26/2022 at 10:46 AM, crzynik said:

    Running Unraid 6.11.0-RC1, and both NerdPack and dev tools seem to load forever. Also, the Dynamix temp plugin complains of no Perl installed, which tells me NerdPack isn't being initialized properly, perhaps? Happy to provide logs or whatever is needed.

     

    Error I see in Unraid log: `root: Warning: preg_grep() expects parameter 2 to be array, string given in /usr/local/emhttp/plugins/NerdPack/scripts/packagemanager on line 79`

    Yeah this appears to be broken and any previously installed tools from it are missing...

     

    My whole *arr setup relies on SSHFS and it appears to be missing now.

     

    Interestingly, I have no errors in my logs, but the NerdPack interface loads infinitely, and as I mentioned above my previously installed packages are missing.

  6. @Quick_FOX my resolv.conf is intact with Docker running.

     

    @Squid I can give the RC a try. It's difficult for me to diagnose, as it seems to happen randomly, maybe once a month or so.

     

    I will also disable NETBIOS as suggested.

     

    EDIT: Waiting to try the RC until NerdPack gets updated, as I need several of the tools it offers.

  7. 57 minutes ago, Spike87 said:

    Hello,

     

    I have nearly the same problem. From time to time, most of my UnRAID server is unable to connect to the internet.

    SSHing into UnRAID and pinging or running nslookup on a WAN IP/DNS address succeeds, but checking for plugin/Docker updates fails, and there's no access to the Home Assistant Docker in host mode.

    Restarting the Docker service resolves the problem.

     

    Any Ideas?

    nas-diagnostics-20220725-2108.zip

    Unfortunately no, I'm at a loss with my own issue already, lol. I'd take a look at what has already been suggested in this thread to see if you can narrow your issue down further.

     

    So when you lose connectivity you're still able to resolve hostnames via ping? I'm not able to resolve any hostnames at all EXCEPT with nslookup. It seems to be the only thing still capable of resolving domain names when I run into whatever is causing this issue...

     

    This doesn't seem to be a common problem; I've only found a few threads that sound similar enough to my issue, and unfortunately none of them have yielded any further clues as to what's wrong.

     

    This server is getting rebuilt with all new hardware next month, so I'm trying to just keep it coasting until then and see if the new hardware magically resolves any of the issues I've been having.

  8. Just lost DNS again. Still no idea why nslookup works while ping, checking for plugin/Docker updates, etc. all fail.

     

    Interestingly, I regained DNS functionality this time without having to restart or do anything other than wait. Very strange...

  9. 19 hours ago, Vr2Io said:

     

    Can you successfully ping the gateway (router) and the internet by IP?

    When this happens I can reach the gateway and the internet, but by IP only. I can SSH from the affected unRAID server into the gateway by IP, and I can ping outside servers like 8.8.8.8. For all intents and purposes the internet is working, just not name resolution.

    Quote

    I suspect it is a DNS server problem; maybe try using a public DNS server to troubleshoot.

    UnRAID was set to use only external DNS servers (8.8.8.8, 1.1.1.1, 9.9.9.9) when this started occurring. I've tried switching to my router's DNS server for troubleshooting, but doing so made no difference.

     

    I looked at the link you posted, but I don't really know how UnRAID's DNS system works. Does nslookup use a different lookup method than ping in UnRAID? That might explain why nslookup works while a simple ping to an external domain does not.

  10. Alright, so this finally happened again, now on v6.10.1.


    I ran nslookup against google.com and it returned a proper answer from my LAN router as well as 8.8.8.8:

    root@Node:~# ping google.com
    ping: google.com: Name or service not known
    root@Node:~# nslookup google.com
    Server:         192.168.20.254
    Address:        192.168.20.254#53
    
    Non-authoritative answer:
    Name:   google.com
    Address: 172.217.4.206
    Name:   google.com
    Address: 2607:f8b0:4009:806::200e
    
    
    
    root@Node:~# nslookup google.com 8.8.8.8
    Server:         8.8.8.8
    Address:        8.8.8.8#53
    
    Non-authoritative answer:
    Name:   google.com
    Address: 172.217.0.174
    Name:   google.com
    Address: 2607:f8b0:4009:808::200e
    
    root@Node:~# nslookup google.com 192.168.20.254
    Server:         192.168.20.254
    Address:        192.168.20.254#53
    
    Non-authoritative answer:
    Name:   google.com
    Address: 172.217.4.206
    Name:   google.com
    Address: 2607:f8b0:4009:806::200e
    
    root@Node:~# 

     

    So then wtf is causing this?

     

    Sonarr/Radarr can't reach any of my external resources, I can't ping domain names from the terminal, nothing DNS-related seems to work, yet nslookup seems to suggest DNS is fine?
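One way to narrow it down (this is my guess at the mechanism, not a confirmed explanation): nslookup builds and sends its own DNS queries directly to the server, while ping and most apps resolve through glibc's resolver. `getent hosts` exercises that same glibc path, so it should fail exactly when ping does:

```shell
# If this fails while nslookup succeeds, the problem is in the libc
# resolver path, not the DNS servers themselves.
getent hosts google.com || echo "libc resolver failed"

# The libc side is driven by these two files; a truncated resolv.conf
# or a mangled hosts: line would break ping but leave nslookup working.
cat /etc/resolv.conf 2>/dev/null
grep '^hosts:' /etc/nsswitch.conf 2>/dev/null || true
```

If `getent` fails while `nslookup` keeps answering, comparing those two files before and after the outage would be the next step.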

  11. On 5/6/2022 at 4:25 AM, Vr2Io said:

     

    You can use nslookup to troubleshoot

     

    i.e. nslookup www.google.com x.x.x.x

     

    x.x.x.x can be the actual IP of a DNS server or relay (router, private DNS, etc.)

    Duh, why didn't I think of nslookup. I wanted to use dig, but the BIND package seems to have disappeared from NerdPack. I was also hoping for a way to check the literal status of the service, something equivalent to systemctl status ServiceName (I'm a Debian guy mostly).

     

    I haven't lost DNS again so far. I also switched my first DNS server to my router rather than an external server. I saw some threads here about DNS issues being resolved by doing that, so I figured why not.

  12. Unraid v6.10-rc5

     

    Diagnostics are from a fresh boot so my previous syslog is also attached:

    node-diagnostics-20220505-0737.zip

    syslog-20220505-073254.txt

     

    This issue is not new for me in RC5; it's happened before, but it's been years since it last occurred. It's now happened twice in the last week. DNS is statically configured to use Google, Cloudflare, & Quad9.

     

    Out of the blue my server will lose the ability to resolve any hostnames at all. I don't realize it's happened until I log into Radarr/Sonarr and see errors about all my indexers, download clients, and literally anything requiring a DNS lookup being unreachable.

     

    I console in and confirm that I can't ping any hostname; I get a "name or service not known" error. Pinging internal and external IP addresses works fine. I don't see anything in the syslog to indicate what the issue might be.

     

    I can usually fix it with a restart, but it's annoying to have to reboot the entire server just to get DNS back. I've tried just stopping the array, shuffling the DNS servers, and hitting apply in hopes of reviving DNS, but it generally doesn't work.

     

    Is there a way to roll the UnRAID DNS service without having to reboot the entire machine? Or at least a way to verify that the DNS resolver is still running when this issue occurs?

    If the DNS service is up and running in UnRAID I may have to investigate issues on the LAN but no other client seems to experience this issue except for my server.

     

    Besides the loss of DNS, networking seems entirely unaffected. My WireGuard VPN continues to work, Sonarr and Radarr are still available remotely, and I can access shares and the WebUI.
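My working assumption (not confirmed) is that stock UnRAID has no standalone DNS daemon to restart: glibc just reads /etc/resolv.conf on each lookup, so "rolling the DNS service" would amount to rewriting that file. A sketch against a scratch copy (on a live server the target would be /etc/resolv.conf itself, backed up first):

```shell
# Demo on a scratch file; on a real server you'd edit /etc/resolv.conf.
resolv=$(mktemp)

cat > "$resolv" <<'EOF'
nameserver 8.8.8.8
nameserver 1.1.1.1
nameserver 9.9.9.9
EOF

# glibc re-reads this file for lookups, so no service restart is needed;
# count the entries just to confirm the rewrite took.
grep -c '^nameserver' "$resolv"
rm -f "$resolv"
```

If resolution still fails after rewriting the real file, that would point away from resolv.conf and toward something deeper in the resolver path.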

  13. Having an issue with the New Apps page of CA. Every time I open it I get "An error occurred. Could not find any New Apps".

     

    The individual category pages work correctly and I can search for and install apps; I just can't get the New Apps page to load. Not a huge deal since it otherwise works, but I'm curious what's wrong.

     

    I've restarted the server and everything else (dockers, webui, etc) seems to work fine. My other server with the same config and CA version works fine.

     

    I enabled debug logging for CA; the log is attached along with my server diagnostics.

     

    node-diagnostics-20220306-0614.zip
    CA-Logging-20220306-0608.zip

  14. On 10/20/2021 at 4:03 PM, Phoenix26 said:

    Anyone else getting an error when trying to use iotop?

     

    iotop
    Traceback (most recent call last):
      File "/usr/sbin/iotop", line 17, in <module>
        main()
      File "/usr/lib64/python2.7/site-packages/iotop/ui.py", line 620, in main
        main_loop()
      File "/usr/lib64/python2.7/site-packages/iotop/ui.py", line 610, in <lambda>
        main_loop = lambda: run_iotop(options)
      File "/usr/lib64/python2.7/site-packages/iotop/ui.py", line 508, in run_iotop
        return curses.wrapper(run_iotop_window, options)
      File "/usr/lib64/python2.7/curses/wrapper.py", line 22, in wrapper
        stdscr = curses.initscr()
      File "/usr/lib64/python2.7/curses/__init__.py", line 33, in initscr
        fd=_sys.__stdout__.fileno())
    _curses.error: setupterm: could not find terminal

     

    Tried re-installing it but it made no difference.

    Unraid V6.9.2

    I'm on 6.10 RC2 and iotop is working fine for me.

  15. So as a follow-up to my post here:

    around the same time I started having those parity errors the server started having kernel panics that would lock up the server, producing no logs.

     

    I initially wrote this off as being related to the RAM issue, but it's been two weeks of normal functionality since replacing the RAM and now the server has again just dropped offline. I'm not seeing parity mismatches like before, so I don't think this is the same issue. The monitor wasn't on when it went down, so I have no clue what happened.

     

    I've had the external rsyslog server option turned on in UnRAID for months, but every single time the server hard locks, it locks up so thoroughly that nothing useful ever gets logged. It's just whatever the last normal log entry was, then the server starting up again. Is there any other way to better capture what's happening? I'd like to avoid mirroring logs to the flash drive, and given that I never get anything useful to my log server on the LAN, I doubt mirroring locally would be any more productive...

     

    2021-10-10T13:27:50-05:00 Node sshd[28719]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
    2021-10-10T13:27:50-05:00 Node sshd[28719]: Starting session: command for root from xx.xx.xx.xx port 22016 id 0
    2021-10-10T13:34:30-05:00 Node sshd[28719]: Close session: user root from xx.xx.xx.xx port 22016 id 0
    2021-10-10T13:34:30-05:00 Node sshd[28719]: Received disconnect from xx.xx.xx.xx port 22016:11: disconnected by user
    2021-10-10T13:34:30-05:00 Node sshd[28719]: Disconnected from user root xx.xx.xx.xx port 22016
    2021-10-10T13:34:30-05:00 Node sshd[28719]: pam_unix(sshd:session): session closed for user root
    2021-10-10T13:34:30-05:00 Node sshd[3724]: Connection from xx.xx.xx.xx port 9719 on xx.xx.xx.xx port 22 rdomain ""
    2021-10-10T13:34:31-05:00 Node sshd[3724]: Accepted key RSA  found at /etc/ssh/root.pubkeys:1
    2021-10-10T13:34:31-05:00 Node sshd[3724]: Accepted publickey for root from xx.xx.xx.xx port 9719 ssh2: RSA SHA256
    2021-10-10T13:34:31-05:00 Node sshd[3724]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
    2021-10-10T13:34:31-05:00 Node sshd[3724]: Starting session: command for root from xx.xx.xx.xx port 9719 id 0
    2021-10-10T20:05:10-05:00 Node rsyslogd: [origin software="rsyslogd" swVersion="8.2002.0" x-pid="8183" x-info="https://www.rsyslog.com"] start
    2021-10-10T20:05:10-05:00 Node root: plugin: skipping: /boot/config/plugins/enhanced.log/enhanced.log.cfg already exists
    2021-10-10T20:05:10-05:00 Node root: plugin: skipping: /boot/config/plugins/enhanced.log/custom_syslog.conf already exists
    2021-10-10T20:05:10-05:00 Node root: plugin: skipping: /boot/config/plugins/enhanced.log/syslog_filter.conf already exists
    2021-10-10T20:05:10-05:00 Node root: plugin: running: anonymous
    2021-10-10T20:05:10-05:00 Node root: 
    2021-10-10T20:05:10-05:00 Node root: -----------------------------------------------------------
    2021-10-10T20:05:10-05:00 Node root:  Plugin enhanced.log is installed.
    2021-10-10T20:05:10-05:00 Node root:  Copyright 2015-2021, Dan Landon
    2021-10-10T20:05:10-05:00 Node root:  Version: 2021.08.21
    2021-10-10T20:05:10-05:00 Node root: -----------------------------------------------------------
    2021-10-10T20:05:10-05:00 Node root: 
    2021-10-10T20:05:10-05:00 Node root: plugin: enhanced.log.plg installed
    2021-10-10T20:05:10-05:00 Node root: plugin: installing: /boot/config/plugins/fix.common.problems.plg
    2021-10-10T20:05:10-05:00 Node root: plugin: skipping: /boot/config/plugins/fix.common.problems/fix.common.problems-2021.08.05-x86_64-1.txz already exists
    2021-10-10T20:05:10-05:00 Node root: plugin: running: /boot/config/plugins/fix.common.problems/fix.common.problems-2021.08.05-x86_64-1.txz
    2021-10-10T20:05:10-05:00 Node root: 
    2021-10-10T20:05:10-05:00 Node root: +==============================================================================
    2021-10-10T20:05:10-05:00 Node root: | Installing new package /boot/config/plugins/fix.common.problems/fix.common.problems-2021.08.05-x86_64-1.txz
    2021-10-10T20:05:10-05:00 Node root: +==============================================================================
    2021-10-10T20:05:10-05:00 Node root: 

     

    I've asked that the monitor be left on so I can hopefully get some sort of clue as to wtf is going on, as I don't know what else to do besides just start replacing hardware. I guess I could upgrade to 6.10-RC1, but it was stable on 6.9.2 for months before this, so I don't think it has anything to do with the OS version.

    node-diagnostics-20211010-2007.zip

     

    EDIT: I have this photo of a kernel panic from when I was still having the RAM issue, but it's the only clue I've got right now.

    [photo attachment: kernel panic screen]

  16. 6 hours ago, ChatNoir said:

    65536 is exactly 2^16.

     

    That is odd that you get that value from a notification and nothing from the GUI.

    You might want to do an extended SMART test on that drive after the parity check.

    I ran a short SMART test yesterday, but I'll turn off drive sleep and run an extended test now that the parity check is finished.

     

    5 hours ago, JorgeB said:

    Those are usually due to firmware issues, value changed and went back to 0, that should be safe to ignore.

    That's good to hear. I know it isn't one of the default monitored SMART attributes (I assume for reasons like this), but I had enabled it after coming across a recommendation in another thread, while troubleshooting some disk issues, that it's useful for drives from certain vendors.

     

    I've never seen a WD drive with anything but zero for that attribute, but I have Seagate drives in my other server that all report a very high number for it, so I don't monitor it on VOID.

     

    EDIT: Oh, and the correcting check completed with zero errors! Thanks Turl and JorgeB for helping me figure out it was the RAM. I think this is the first time I've ever had a computer issue where it was actually a bad stick of RAM.
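ChatNoir's 2^16 observation is easy to sanity-check, and it fits JorgeB's firmware-glitch explanation: a raw value with exactly one set bit looks like a transient bit flip in the attribute's raw field rather than 65,536 real read errors. A quick check in the shell:

```shell
# 65536 is a single set bit (bit 16). For any power of two,
# n & (n - 1) == 0, i.e. exactly one bit is set.
n=65536
echo $((1 << 16))       # reconstructs the reported value: 65536
echo $((n & (n - 1)))   # 0 -> power of two, a single flipped bit
```

A genuine error counter climbing to 65,536 would almost never land on an exact power of two and then return to zero.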

  17. It seems like the RAM has done the trick; it's ~75% through a correcting check with no errors and it hasn't hard locked or crashed yet.

     

    However, I just got a random notification from the server letting me know the raw read error rate on disk 1 is some ridiculous number:

    28-09-2021 05:27 PM	Unraid Disk 1 SMART health [1]	Warning [NODE] - raw read error rate is 65536	WDC_WD80EFAX-68KNBN0_VAJBBYUL (sdd)

    Which is odd, because when I go to check the SMART stats in the GUI it says my raw read error rate is zero???

  18. @JorgeB It finished the non-correcting check: 73 errors.

     

    Within a few hours of starting a new correcting check with the second set of RAM, it has again hard locked and the server is unresponsive.

     

    So at this point I've tried two correcting checks, and it kernel panics each time with this RAM. Should I assume it's bad and replace it?

     

    I find it interesting that it only happens during the correcting check.

     

    I had no problems with the first set of RAM.

  19. 14 hours ago, JorgeB said:

    yep.

    Well, it hard locked and crashed within a few hours of the second set being installed and a parity check started.

     

    I've got someone going over today to power it off and back on. If it continues to be unstable with this set of DIMMs, I wager a replacement is in order?

     

    EDIT: Unclean shutdown. I'm letting it run its non-correcting check; it's already found 73 errors. It was in the middle of importing a bunch of stuff from Sonarr, but it was all going to the cache drive (mover doesn't run till 3AM), so that wouldn't be the cause of these new parity errors, right?

     

    After the non-correcting check is finished should I continue with the correcting checks?
