weirdcrap
-
Posts
460 -
Posts posted by weirdcrap
-
-
It looks like way back in May, python3 invoked the out-of-memory killer and it killed QEMU.
May 7 21:05:54 Tower kernel: python3 invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
....
May 7 21:05:54 Tower kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=3ea9e18112ed2dd06031f35c6351a2de56ee7b8fb2ce9631019b9f7ca51e3881,mems_allowed=0,global_oom,task_memcg=/machine/qemu-1-V-DLC.libvirt-qemu,task=qemu-system-x86,pid=15497,uid=0
May 7 21:05:54 Tower kernel: Out of memory: Killed process 15497 (qemu-system-x86) total-vm:17946876kB, anon-rss:17028860kB, file-rss:72kB, shmem-rss:23104kB, UID:0 pgtables:35208kB oom_score_adj:0
May 7 21:05:54 Tower kernel: oom_reaper: reaped process 15497 (qemu-system-x86), now anon-rss:16kB, file-rss:68kB, shmem-rss:4kB
I only see the one OOM kill, and it hasn't happened again in several months, so it is probably safe to ignore unless it recurs.
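If you want to pull the same details out of your own syslog, here is a minimal Python sketch that parses an "Out of memory: Killed process" line like the one above and converts the kB figures to GiB. The regex and the function name parse_oom_kill are my own for illustration, not anything UnRAID ships.

```python
import re

# Matches the kernel's "Out of memory: Killed process ..." line as it appears in the log above.
OOM_RE = re.compile(
    r"Out of memory: Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB"
)


def parse_oom_kill(line):
    """Return pid, process name and memory figures (in GiB) from an OOM kill line, or None."""
    match = OOM_RE.search(line)
    if match is None:
        return None
    kib_per_gib = 1024 * 1024
    return {
        "pid": int(match.group("pid")),
        "name": match.group("name"),
        "total_vm_gib": int(match.group("total_vm")) / kib_per_gib,
        "anon_rss_gib": int(match.group("anon_rss")) / kib_per_gib,
    }


if __name__ == "__main__":
    sample = (
        "May 7 21:05:54 Tower kernel: Out of memory: Killed process 15497 "
        "(qemu-system-x86) total-vm:17946876kB, anon-rss:17028860kB, file-rss:72kB"
    )
    info = parse_oom_kill(sample)
    print(f"{info['name']} (pid {info['pid']}): {info['anon_rss_gib']:.1f} GiB resident")
```

For the line above this works out to roughly 16.2 GiB resident for the QEMU process at the time it was killed.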
You appear to have some VMs running with about 24GB of RAM assigned, if I'm reading your XMLs correctly. With that plus the dockers listed in your sig, you may sometimes be pushing a little too close to your RAM limit, as the underlying OS needs some RAM for itself.
You could try setting artificial limits for the dockers (using --memory=XG in the extra parameters for each docker, where X is the limit in GB) and reducing the assigned RAM on the VMs a gigabyte or two at a time until the errors stop for good.
But again, if this only happened the one time and things seem to be running OK otherwise, I'd probably just leave it be.
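A rough way to sanity-check the numbers before trimming anything is to add up the VM allocations and the per-docker --memory limits, then compare that against installed RAM minus some headroom for the OS. This is only a sketch: the 32 GB total, the container limits and the 2 GB reserve below are made-up placeholders, and only the roughly 24 GB for VMs comes from the XMLs.

```python
def ram_headroom_gb(total_gb, vm_gb, container_limits_gb, os_reserve_gb=2.0):
    """Return the GB left after VMs, container limits and an OS reserve (negative = overcommitted)."""
    committed = sum(vm_gb) + sum(container_limits_gb) + os_reserve_gb
    return total_gb - committed


if __name__ == "__main__":
    # Hypothetical figures, except the ~24 GB assigned to VMs mentioned above.
    left = ram_headroom_gb(total_gb=32, vm_gb=[24], container_limits_gb=[2, 2, 1])
    print(f"headroom: {left:+.1f} GB")
```

A negative or near-zero result would suggest the limits or VM allocations are still too generous for the installed RAM.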
-
Built a new system with this board and UnRAID reports a max installable RAM capacity of 64GB. But everything I can find (MSI website, Newegg, etc.) suggests it should support up to 128GB.
I don't plan on installing more than 64GB of RAM in this anytime soon if ever, but I'm still curious if UnRAID is detecting this incorrectly or if the advertised RAM capacity is wrong...
Anyone got this board and actually have it maxed out with RAM?
-
On 7/20/2022 at 4:36 PM, Torben said:
Me too. And even NFS speed over wireguard got worse, running at about 60% of the speed it was before. So I tried to find a workaround and installed an Ubuntu VM with the folders mounted via mount-tags on the receiving unRAID server, which I found out causes the problem (even a freshly installed test server on different hardware on 6.8.3 works, same server on 6.9+ shows the problem). So far it's been working great for a couple of weeks now. In the beginning I wanted to do some more investigating on why it works in the VM (or if I can replicate the problem in the VM), since I think having a VM just for doing what the host was/should be capable of is "a bit" unnecessary, but well, a lot of time spent and no real idea left where to continue.
I'm curious what happens when you have new hardware.
I may have to go the VM route as well. The new hardware made no difference in the file transfer speeds. Not that I honestly expected it to.😥
-
On 7/26/2022 at 10:46 AM, crzynik said:
Running Unraid RC 6.11.0-RC1 and both nerdpack and dev tools seem to load forever. Also the dynamix temp plugin complains of no perl installed which tells me the nerdpack isn't being initialized properly perhaps? Happy to provide logs or whatever is needed.
Error I see in Unraid log: `root: Warning: preg_grep() expects parameter 2 to be array, string given in /usr/local/emhttp/plugins/NerdPack/scripts/packagemanager on line 79`
Yeah this appears to be broken and any previously installed tools from it are missing...
My whole *arr setup relies on SSHFS and it appears to be missing now.
Interestingly I have no errors in my logs but trying to load the NerdPack interface just loads infinitely and as I mentioned above my previously installed packages are missing.
-
@Quick_FOX my resolv.conf is intact with docker running.
@Squid I can give the RC a try. It's difficult for me to diagnose as it seems to happen randomly. Maybe once a month or so.
I will also disable NETBIOS as suggested.
EDIT: Waiting on trying the RC until NerdPack gets updated as I need several of the tools it offers.
-
57 minutes ago, Spike87 said:
Hello,
I have nearly the same problem. From time to time most of my UnRAID Server is not able to connect to the internet.
SSH into UnRAID and ping or nslookup a WAN IP/DNS Address is successful, but not checking for plugin/docker updates, no access to Home Assistant Docker in Host Mode.
Restarting the Docker Service resolves the problem.
Any Ideas?
Unfortunately no, I'm at a loss with my own issue already lol. I'd take a look at what has already been suggested in this thread to see if you can narrow your issue down further.
So when you lose connectivity you're still able to resolve hostnames via ping? I'm not able to resolve any hostnames at all EXCEPT with NSLookup. It seems to be the only thing still capable of resolving domain names when I run into whatever is causing this issue...
This doesn't seem to be a common problem, I've only found a few threads on it that sound similar enough to my issue and unfortunately none of them have yielded any further clues as to what's wrong.
This server is getting rebuilt with all new hardware next month so I'm trying to just keep it coasting until then, see if the new hardware magically resolves any of the issues I've been having.
-
Just lost DNS again. Still no idea why NSLookup works but things like ping, checking for plugin/docker updates, etc all fail.
Interestingly, I regained DNS functionality without having to restart or do anything other than wait this time. Very strange...
-
Still having the same problems with this. I'm going to rebuild the server from scratch with all new hardware and we'll see if that makes any difference. I'm not holding out any hope though.
-
19 hours ago, Vr2Io said:
Can you successfully ping the gateway (router) and the internet by IP?
When this happens I can reach the gateway and internet, but by IP only. I can SSH from the affected unraid server into the gateway by IP, and I can ping outside servers like 8.8.8.8. For all intents and purposes the internet is working, just not name resolution.
Quote: I suspect it is a problem with your DNS server; maybe try using a public DNS server for troubleshooting.
UnRAID was set to use only external DNS servers (8.8.8.8, 1.1.1.1, 9.9.9.9) when this started occurring. I've tried switching to my router's DNS server for troubleshooting but doing so made no difference.
I looked at that link you've posted but I don't really know how UnRAID's DNS system works. Does nslookup use a different lookup method vs ping in UnRAID? That might explain why nslookup works while a simple ping to an external domain does not.
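One possible explanation, based on general Linux behavior and not something I've confirmed for UnRAID: nslookup builds DNS queries itself and sends them straight to a server, while ping goes through the C library's resolver (getaddrinfo), which consults /etc/nsswitch.conf, /etc/hosts and /etc/resolv.conf. Here is a minimal Python sketch to compare the two paths. The hand-built query packet and the function names are mine, for illustration only.

```python
import socket
import struct


def build_query(hostname, query_id=0x1234):
    """Build a minimal DNS query packet for an A record (RFC 1035 wire format)."""
    # Header: id, flags (recursion desired), 1 question, 0 answer/authority/additional records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def direct_dns_answers(hostname, server, timeout=3.0):
    """Ask `server` directly over UDP (roughly what nslookup does); return the answer count."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(hostname), (server, 53))
        reply, _ = sock.recvfrom(512)
    # ANCOUNT is the fourth 16-bit field of the 12-byte header.
    return struct.unpack("!HHHHHH", reply[:12])[3]


def system_resolver_ok(hostname):
    """Resolve through the C library (the path ping takes); True if it succeeds."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


if __name__ == "__main__":
    host = "google.com"
    print("system resolver:", system_resolver_ok(host))
    try:
        print("direct query answers:", direct_dns_answers(host, "8.8.8.8"))
    except OSError as exc:
        print("direct query failed:", exc)
```

If the first line prints False while the direct query still returns answers, the DNS servers are reachable and the failure is somewhere in the local resolver path rather than on the network.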
-
Bump. This continues to happen. Ping tells me "Name or service not known" but nslookup returns responses, so I guess DNS is working. But then what is happening to my ability to resolve names?
-
Alright so this finally happened again. Now on v6.10.1
I ran nslookup against google.com and it returned a proper answer from my LAN router as well as 8.8.8.8:
root@Node:~# ping google.com
ping: google.com: Name or service not known
root@Node:~# nslookup google.com
Server: 192.168.20.254
Address: 192.168.20.254#53
Non-authoritative answer:
Name: google.com
Address: 172.217.4.206
Name: google.com
Address: 2607:f8b0:4009:806::200e
root@Node:~# nslookup google.com 8.8.8.8
Server: 8.8.8.8
Address: 8.8.8.8#53
Non-authoritative answer:
Name: google.com
Address: 172.217.0.174
Name: google.com
Address: 2607:f8b0:4009:808::200e
root@Node:~# nslookup google.com 192.168.20.254
Server: 192.168.20.254
Address: 192.168.20.254#53
Non-authoritative answer:
Name: google.com
Address: 172.217.4.206
Name: google.com
Address: 2607:f8b0:4009:806::200e
root@Node:~#
So then wtf is causing this?
Sonarr/Radarr can't reach any of my external resources, I can't ping domain names from the terminal, and nothing DNS-related seems to work, yet nslookup suggests DNS is fine?
-
On 5/6/2022 at 4:25 AM, Vr2Io said:
You can use nslookup to troubleshoot
i.e. nslookup www.google.com x.x.x.x
x.x.x.x can be actual IP of DNS server or relay ( router, private DNS etc )
Duh, why didn't I think of nslookup. I was wanting to use dig but the BIND package seems to have disappeared from NerdPack. I was also hoping for a way to check the literal status of the service, something equivalent to systemctl status ServiceName (I'm a Debian guy mostly).
I haven't lost DNS again so far. I also switched my first DNS server to my router rather than an external server. Saw some threads here about DNS issues being resolved doing that so I figured why not.
-
What happened to the bind package? Am I just not seeing it? Or is it because I'm on RC5?
-
Unraid v6.10-rc5
Diagnostics are from a fresh boot so my previous syslog is also attached:
node-diagnostics-20220505-0737.zip
This issue is not new for me in RC5; it's happened before, but it's been years since it last occurred. It's now happened twice in the last week. DNS is statically configured to use Google, Cloudflare, & Quad9.
Out of the blue my server will lose the ability to resolve any host names at all. I don't realize it's happened until I log into Radarr/sonarr and see errors about all my indexers, download clients, literally anything requiring a DNS lookup being unreachable.
I console in and confirm that I can't ping any hostname with a "name or service unknown" error. Pinging internal and external IP addresses works fine. I don't see anything in the syslog to indicate what the issue might be.
I can usually fix it with a restart but it's annoying to have to reboot the entire server just to get DNS back up. I've tried just stopping the array, shuffling the DNS servers, and hitting apply in hopes of reviving DNS but it generally doesn't work.
Is there a way to roll the UnRAID DNS service without having to reboot the entire machine? Or at least a way to verify that the DNS resolver is still running when this issue occurs?
If the DNS service is up and running in UnRAID I may have to investigate issues on the LAN, but no other client seems to experience this issue except for my server. Besides the loss of DNS, networking seems entirely unaffected. My Wireguard VPN continues to work, sonarr and radarr are still available remotely, and I can access shares and the webui.
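I'm not aware of a dedicated DNS daemon to check here. On a typical Linux system, name resolution is done in-process by the C library using /etc/resolv.conf and /etc/nsswitch.conf, and I'm assuming UnRAID works the same way. Under that assumption, one quick thing to check when it breaks is whether those files still look sane. A minimal Python sketch (the function names are mine):

```python
def nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        parts = line.split("#", 1)[0].split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


def hosts_sources(nsswitch_text):
    """Return the lookup sources on the 'hosts:' line of nsswitch.conf-style text."""
    for line in nsswitch_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line.startswith("hosts:"):
            return line[len("hosts:"):].split()
    return []


def read_text(path):
    """Read a file, returning an empty string if it is missing or unreadable."""
    try:
        with open(path, encoding="utf-8", errors="replace") as handle:
            return handle.read()
    except OSError:
        return ""


if __name__ == "__main__":
    print("nameservers:", nameservers(read_text("/etc/resolv.conf")))
    print("hosts sources:", hosts_sources(read_text("/etc/nsswitch.conf")))
```

An empty nameserver list, or a hosts line without dns in it, would point at local configuration rather than the upstream DNS servers.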
-
11 minutes ago, Squid said:
Not quite sure.
Try this
rm -rf /tmp/community.applications
rm /boot/config/plugins/community.applications/community.applications.cfg
And reload the apps tab
That seems to have fixed it, thanks!
-
Having an issue with the New Apps page of CA. Every time I open it I get "An error occurred. Could not find any New Apps".
The individual category pages work correctly, I can search for and install apps, I just can't get the new apps page to load. Not a huge deal since it otherwise works but I'm curious what's wrong.
I've restarted the server and everything else (dockers, webui, etc) seems to work fine. My other server with the same config and CA version works fine.
I enabled debug logging for CA and it is attached along with my server diagnostic.
node-diagnostics-20220306-0614.zip
CA-Logging-20220306-0608.zip
-
On 10/20/2021 at 4:03 PM, Phoenix26 said:
Anyone else getting an error when trying to use iotop?
iotop
Traceback (most recent call last):
  File "/usr/sbin/iotop", line 17, in <module>
    main()
  File "/usr/lib64/python2.7/site-packages/iotop/ui.py", line 620, in main
    main_loop()
  File "/usr/lib64/python2.7/site-packages/iotop/ui.py", line 610, in <lambda>
    main_loop = lambda: run_iotop(options)
  File "/usr/lib64/python2.7/site-packages/iotop/ui.py", line 508, in run_iotop
    return curses.wrapper(run_iotop_window, options)
  File "/usr/lib64/python2.7/curses/wrapper.py", line 22, in wrapper
    stdscr = curses.initscr()
  File "/usr/lib64/python2.7/curses/__init__.py", line 33, in initscr
    fd=_sys.__stdout__.fileno())
_curses.error: setupterm: could not find terminal
Tried re-installing it but it made no difference.
Unraid V6.9.2
I'm on 6.10 RC2 and iotop is working fine for me.
-
So as a follow up to my post here:
Around the same time I started having those parity errors, the server started having kernel panics that would lock it up, producing no logs.
I initially wrote this off as being related to the RAM issue, but it's been two weeks of normal functionality since replacing the RAM and now the server has again just dropped offline. I'm not seeing parity mismatches like before so I don't think this is the same issue. The monitor wasn't on when it went down so I have no clue what happened.
I've had the external rsyslog server option turned on in UnRAID for months, but every single time the server hard locks, it locks up so thoroughly that nothing useful ever gets logged. It's just whatever the last normal log entry was, then the server starting up again. Is there any other way to better capture what's happening? I'd like to avoid mirroring logs to the flash drive, and given how I never get anything to my log server on the LAN, I doubt mirroring locally would be any more productive...
2021-10-10T13:27:50-05:00 Node sshd[28719]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
2021-10-10T13:27:50-05:00 Node sshd[28719]: Starting session: command for root from xx.xx.xx.xx port 22016 id 0
2021-10-10T13:34:30-05:00 Node sshd[28719]: Close session: user root from xx.xx.xx.xx port 22016 id 0
2021-10-10T13:34:30-05:00 Node sshd[28719]: Received disconnect from xx.xx.xx.xx port 22016:11: disconnected by user
2021-10-10T13:34:30-05:00 Node sshd[28719]: Disconnected from user root xx.xx.xx.xx port 22016
2021-10-10T13:34:30-05:00 Node sshd[28719]: pam_unix(sshd:session): session closed for user root
2021-10-10T13:34:30-05:00 Node sshd[3724]: Connection from xx.xx.xx.xx port 9719 on xx.xx.xx.xx port 22 rdomain ""
2021-10-10T13:34:31-05:00 Node sshd[3724]: Accepted key RSA found at /etc/ssh/root.pubkeys:1
2021-10-10T13:34:31-05:00 Node sshd[3724]: Accepted publickey for root from xx.xx.xx.xx port 9719 ssh2: RSA SHA256
2021-10-10T13:34:31-05:00 Node sshd[3724]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
2021-10-10T13:34:31-05:00 Node sshd[3724]: Starting session: command for root from xx.xx.xx.xx port 9719 id 0
2021-10-10T20:05:10-05:00 Node rsyslogd: [origin software="rsyslogd" swVersion="8.2002.0" x-pid="8183" x-info="https://www.rsyslog.com"] start
2021-10-10T20:05:10-05:00 Node root: plugin: skipping: /boot/config/plugins/enhanced.log/enhanced.log.cfg already exists
2021-10-10T20:05:10-05:00 Node root: plugin: skipping: /boot/config/plugins/enhanced.log/custom_syslog.conf already exists
2021-10-10T20:05:10-05:00 Node root: plugin: skipping: /boot/config/plugins/enhanced.log/syslog_filter.conf already exists
2021-10-10T20:05:10-05:00 Node root: plugin: running: anonymous
2021-10-10T20:05:10-05:00 Node root:
2021-10-10T20:05:10-05:00 Node root: -----------------------------------------------------------
2021-10-10T20:05:10-05:00 Node root: Plugin enhanced.log is installed.
2021-10-10T20:05:10-05:00 Node root: Copyright 2015-2021, Dan Landon
2021-10-10T20:05:10-05:00 Node root: Version: 2021.08.21
2021-10-10T20:05:10-05:00 Node root: -----------------------------------------------------------
2021-10-10T20:05:10-05:00 Node root:
2021-10-10T20:05:10-05:00 Node root: plugin: enhanced.log.plg installed
2021-10-10T20:05:10-05:00 Node root: plugin: installing: /boot/config/plugins/fix.common.problems.plg
2021-10-10T20:05:10-05:00 Node root: plugin: skipping: /boot/config/plugins/fix.common.problems/fix.common.problems-2021.08.05-x86_64-1.txz already exists
2021-10-10T20:05:10-05:00 Node root: plugin: running: /boot/config/plugins/fix.common.problems/fix.common.problems-2021.08.05-x86_64-1.txz
2021-10-10T20:05:10-05:00 Node root:
2021-10-10T20:05:10-05:00 Node root: +==============================================================================
2021-10-10T20:05:10-05:00 Node root: | Installing new package /boot/config/plugins/fix.common.problems/fix.common.problems-2021.08.05-x86_64-1.txz
2021-10-10T20:05:10-05:00 Node root: +==============================================================================
2021-10-10T20:05:10-05:00 Node root:
I've asked that the monitor be left on so I can hopefully get some sort of a clue as to wtf is going on as I don't know what else to do besides just start replacing hardware. I guess I could upgrade to 6.10-RC1 but it was stable on 6.9.2 for months before this so I don't think it has anything to do with the OS version.
node-diagnostics-20211010-2007.zip
EDIT: I have this photo of a kernel panic from when I was still having the RAM issue, but it's the only clue I've got right now.
-
Apparently I'm just going to have to un-monitor that smart attribute. Disk1 just keeps randomly hitting me with notifications for the raw read error rate even though it isn't actually incrementing whatsoever.
-
6 hours ago, ChatNoir said:
65536 is exactly 2^16.
That is odd that you get that value from a notification and nothing from the GUI.
You might want to do an extended SMART test on that drive after the parity check.
I ran a short SMART test yesterday but I'll turn off drive sleep and run an extended one now that the parity check is finished.
5 hours ago, JorgeB said:
Those are usually due to firmware issues, value changed and went back to 0, that should be safe to ignore.
That's good to hear. I know it isn't one of the default monitored SMART attributes (I assume for reasons like this), but I had enabled it after coming across a recommendation in another thread, when I was troubleshooting some disk issues, that it's useful for drives from certain vendors.
I've never seen a WD drive with anything but zero for that attribute, but I have Seagate drives in my other server that all report a very high number for it, so I don't monitor it on VOID.
EDIT: Oh and the correcting check completed with zero errors! Thanks Turl and JorgeB for helping me figure out it was the RAM. I think this is the first time I've ever had a computer issue and it was actually a bad stick of RAM.
-
It seems like the RAM has done the trick; it's ~75% through a correcting check with no errors and it hasn't hard locked or crashed yet.
However, I just got a random notification from the server letting me know my raw read error rate on disk 1 is some ridiculous number:
28-09-2021 05:27 PM Unraid Disk 1 SMART health [1] Warning [NODE] - raw read error rate is 65536 WDC_WD80EFAX-68KNBN0_VAJBBYUL (sdd)
Which is odd because when I go to check the smart stats in the GUI it says my raw read error rate is zero???
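One guess, and it is only a guess since vendors don't document this: the 48-bit SMART raw value is sometimes treated as several smaller fields packed together, so a single bit set in an upper field shows up as a large decimal number. A small Python sketch that splits a raw value into three 16-bit words (the splitting scheme is an assumption for illustration, not WD's documented format):

```python
def split_raw_value(raw):
    """Split a 48-bit SMART raw value into three 16-bit words, highest word first."""
    if not 0 <= raw < 1 << 48:
        raise ValueError("SMART raw values are 48 bits wide")
    return ((raw >> 32) & 0xFFFF, (raw >> 16) & 0xFFFF, raw & 0xFFFF)


if __name__ == "__main__":
    # 65536 is the value from the notification above.
    print(split_raw_value(65536))  # prints (0, 1, 0)
```

Seen that way, 65536 is just a 1 in the middle word with the low word at zero, which might explain a counter that reads zero in the GUI if the GUI happened to sample it after the value went back.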
-
7 hours ago, JorgeB said:
You should try that if possible.
I was able to find the same kit NIB on ebay so once it gets here I will replace and run a test with just the two new sticks to see if the panics stop.
-
@JorgeB It finished the non-correcting check, 73 errors.
Within a few hours of starting a new correcting check with the second set of RAM it has again hard locked and the server is unresponsive.
So at this point I've tried two correcting checks and it kernel panics each time with this RAM. Should I assume it's bad and replace?
I find it interesting that it only happens during the correcting check.
I had no problems with the first set of RAM.
-
14 hours ago, JorgeB said:
yep.
Well, it hard locked and crashed within a few hours of the second set being installed and a parity check being started.
I've got someone going over today to power it off and back on. If it continues to be unstable with this set of DIMMs, I wager a replacement is in order?
EDIT: Unclean shutdown. I'm letting it run its non-correcting check, and it's already found 73 errors. It was in the middle of importing a bunch of stuff from sonarr, but it was all going to the cache drive (mover doesn't run till 3AM) so that wouldn't be the cause of these new parity errors, right?
After the non-correcting check is finished should I continue with the correcting checks?
unRAID 6 NerdPack - CLI tools (iftop, iotop, screen, kbd, etc.)
in Plugin Support
Posted · Edited by weirdcrap
@dmacias The latest version of unRAR supplied is subject to a directory traversal vulnerability:
https://nvd.nist.gov/vuln/detail/CVE-2022-30333#vulnCurrentDescriptionTitle
I found a Slackware package for 6.1.7 (needs to be >= 6.1.2), so assuming this is usable in UnRAID it should be a simple swap? *SEE EDIT BELOW
https://slackware.pkgs.org/current/slackers/unrar-6.1.7-x86_64-1cf.txz.html
Those of you who don't want to wait for the plugin to be updated can place your packages in /boot/extra for install.
This is unsupported, not recommended by FCP (it will produce a warning on scan), and may break your system but in my testing with the few packages I use it works fine for the time being.
EDIT: I totally misread the CVE version #. It's 6.12 or later, not 6.1.2. So the version I linked above, while newer than what's in NerdPack, still doesn't address that CVE. I can't seem to find a 6.12 version for Slackware anywhere? Can anyone else?
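The mix-up is easy to make because version numbers compare field by field, not as decimals: 6.1.7 is older than 6.12 because 1 < 12 in the second field. A minimal Python sketch of that comparison (the helper names are mine, and it only handles plain dotted numeric versions):

```python
def version_tuple(version):
    """Turn a dotted numeric version like '6.1.7' into a tuple of ints for comparison."""
    return tuple(int(part) for part in version.split("."))


def is_fixed(version, first_fixed="6.12"):
    """True if `version` is at or past the first fixed release."""
    return version_tuple(version) >= version_tuple(first_fixed)


if __name__ == "__main__":
    for candidate in ("6.1.2", "6.1.7", "6.12"):
        print(candidate, is_fixed(candidate))
```

Only 6.12 comes back True here, which matches the corrected reading of the CVE.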