elbweb
Members · 7 posts


  1. Hello! I've been having some issues lately where most Docker services become unresponsive, and it seems to be because the system itself has high CPU usage. I'm assuming it's locking some resource. My problem is less about the high usage and more that I'm having a hard time *seeing* where the usage is coming from.

     I've included 3 files that are all from the exact same time (I had all the windows open at once and took a screenshot):

     • The main Unraid GUI shows ~50% usage, with 10 threads being maxed out.
     • The top command from a command line shows about ~30% total CPU usage across multiple processes, with 10% being in containerd. (My understanding is that this is a percentage of total system capacity, so 30% here would be ~7 threads at 100%, similar to what's shown in the Unraid GUI?)
     • Docker stats shows mostly no usage. (My understanding is that these percentages are relative, e.g. 100% would be one thread at 100%, not the system's total capacity.)

     So, my questions: are the commands shown in top that a Docker container has created but that aren't managed by the container (e.g. the influxd line item with 6% CPU, which is ~1.5 threads in my case) something not counted by the docker stats command? Is there any way to determine what's causing this CPU usage? I haven't had this issue in months of running basically the same setup. Thanks in advance!
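One way to attribute host-level CPU hogs to containers (or rule them out) is to walk the top consumers from ps and check each PID's cgroup. This is a rough sketch, not from the original post, and it assumes Docker's cgroup v1 layout (where the 64-hex container ID appears in /proc/<pid>/cgroup), which may differ on other setups:

```shell
# Sketch: for the top CPU consumers on the host, check whether each PID
# belongs to a Docker container by looking for a 64-hex container ID in
# its cgroup file. Host-side processes (containerd, shfs, kernel threads)
# will show no ID -- those are exactly the ones docker stats won't count.
ps -eo pid,pcpu,comm --sort=-pcpu | head -n 10 | tail -n +2 |
while read -r pid pcpu comm; do
  cid=$(grep -oE '[0-9a-f]{64}' "/proc/$pid/cgroup" 2>/dev/null | head -n 1)
  if [ -n "$cid" ]; then
    # Resolve the ID to a container name (needs the docker CLI available).
    name=$(docker inspect --format '{{.Name}}' "$cid" 2>/dev/null)
    echo "$pid ${pcpu}% $comm -> container ${name:-$cid}"
  else
    echo "$pid ${pcpu}% $comm -> host process (not in docker stats)"
  fi
done
```

Running this next to `top` should show whether something like the influxd line is inside a container's cgroup or running as a plain host process.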
  2. Hardware: X399, Threadripper 2920X, 64GB RAM, SAS 9211-8i, 12x 8TB drives, and 1TB + 512GB + 512GB NVMe for the btrfs cache.

     Background: I bought a new server case (Supermicro 846). I tried the new server backplane / swappable bays and everything worked fine. In moving over to the new case I (I'm assuming) broke the motherboard - it would no longer POST. I bought a new motherboard, replaced it, and it started booting, but with lots of strange issues. So far I've tried a few things:

     • Reinstalled 6.8.3 (instead of the Nvidia driver version)
     • Every combination of disabled C-states and PSU idle power states (some kernel dumps referenced CPUIDLE in the error)
     • Disabled all Docker autostarts (it seemed like I got some Docker-related errors?)
     • Removed references to the old pass-through GPU from my Plex container (thinking GUIDs might be different?)
     • Latest BIOS on the motherboard

     Currently there are no autostart VMs or Dockers; I autostart the array and it crashes after a few minutes. Occasionally it will not boot at all. There is a tower diagnostics zip in the logs folder from about 90 minutes before the log below; I'm not sure what I did to trigger it, though. I've attached it.
It had once been on long enough that I had enabled mirroring the syslog to flash; I got this chunk before it died:

Apr 9 21:23:04 Tower kernel: traps: notify[7202] general protection ip:68d370 sp:7ffeee22bfb8 error:0 in php[433000+2b4000]
Apr 9 21:23:05 Tower rsyslogd: [origin software="rsyslogd" swVersion="8.1908.0" x-pid="7171" x-info="https://www.rsyslog.com"] start
Apr 9 21:24:07 Tower kernel: notify[7605]: segfault at 502 ip 000000000065caae sp 00007ffebc8df200 error 4 in php[433000+2b4000]
Apr 9 21:24:07 Tower kernel: Code: 15 81 8c 24 b4 00 00 00 00 00 00 01 83 ea 01 89 94 24 d0 00 00 00 48 8b 40 08 48 83 f8 01 76 28 48 3d ff 01 00 00 76 15 31 ed <80> 38 3f 40 0f 94 c5 48 01 c5 4d 85 e4 0f 84 7f 05 00 00 81 8c 24
Apr 9 21:24:07 Tower kernel: mdcmd (49): nocheck Pause
Apr 9 21:24:50 Tower init: Switching to runlevel: 0
Apr 9 21:24:50 Tower init: Trying to re-exec init
Apr 9 21:25:34 Tower ntpd[2465]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized
Apr 9 21:25:48 Tower kernel: rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
Apr 9 21:25:48 Tower kernel: rcu: 21-...0: (67 ticks this GP) idle=836/1/0x4000000000000000 softirq=3068/3068 fqs=58815
Apr 9 21:25:48 Tower kernel: rcu: (detected by 18, t=240007 jiffies, g=14481, q=73080)
Apr 9 21:25:48 Tower kernel: Sending NMI from CPU 18 to CPUs 21:
Apr 9 21:25:48 Tower kernel: NMI backtrace for cpu 21
Apr 9 21:25:48 Tower kernel: CPU: 21 PID: 5299 Comm: unraidd0 Tainted: G D O 4.19.107-Unraid #1
Apr 9 21:25:48 Tower kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X399 Taichi, BIOS P3.90 12/04/2019
Apr 9 21:25:48 Tower kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x6b/0x171
Apr 9 21:25:48 Tower kernel: Code: 42 f0 8b 07 30 e4 09 c6 f7 c6 00 ff ff ff 74 0e 81 e6 00 ff 00 00 75 1a c6 47 01 00 eb 14 85 f6 74 0a 8b 07 84 c0 74 04 f3 90 <eb> f6 66 c7 07 01 00 c3 48 c7 c2 40 07 02 00 65 48 03 15 80 6a f8
Apr 9 21:25:48 Tower kernel: RSP: 0018:ffffc9000730bd80 EFLAGS: 00000002
Apr 9 21:25:48 Tower kernel: RAX: 0000000000000101 RBX: ffff888ff8830b08 RCX: 0000000000000000
Apr 9 21:25:48 Tower kernel: RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff889031c3fd70
Apr 9 21:25:48 Tower kernel: RBP: ffff889031c3fd70 R08: 0000000000000000 R09: ffffc9000730bd48
Apr 9 21:25:48 Tower kernel: R10: 0000000000000fe0 R11: ffff888ff8830b88 R12: ffff888ff8830af8
Apr 9 21:25:48 Tower kernel: R13: ffff889031c3f800 R14: ffff888ff8831540 R15: ffff888ffc16e800
Apr 9 21:25:48 Tower kernel: FS: 0000000000000000(0000) GS:ffff88903d340000(0000) knlGS:0000000000000000
Apr 9 21:25:48 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 9 21:25:48 Tower kernel: CR2: 0000000000514e54 CR3: 0000000fec62c000 CR4: 00000000003406e0
Apr 9 21:25:48 Tower kernel: Call Trace:
Apr 9 21:25:48 Tower kernel: _raw_spin_lock_irq+0x1d/0x20
Apr 9 21:25:48 Tower kernel: release_stripe+0x1b/0x3d [md_mod]
Apr 9 21:25:48 Tower kernel: unraidd+0x12d7/0x136e [md_mod]
Apr 9 21:25:48 Tower kernel: ? __switch_to_asm+0x35/0x70
Apr 9 21:25:48 Tower kernel: ? __schedule+0x4f7/0x548
Apr 9 21:25:48 Tower kernel: ? md_thread+0xee/0x115 [md_mod]
Apr 9 21:25:48 Tower kernel: md_thread+0xee/0x115 [md_mod]
Apr 9 21:25:48 Tower kernel: ? wait_woken+0x6a/0x6a
Apr 9 21:25:48 Tower kernel: ? md_open+0x2c/0x2c [md_mod]
Apr 9 21:25:48 Tower kernel: kthread+0x10c/0x114
Apr 9 21:25:48 Tower kernel: ? kthread_park+0x89/0x89
Apr 9 21:25:48 Tower kernel: ret_from_fork+0x22/0x40
Apr 9 21:26:33 Tower root: Status of all loop devices
Apr 9 21:26:33 Tower root: /dev/loop1: [2049]:4 (/boot/bzfirmware)
Apr 9 21:26:33 Tower root: /dev/loop2: [0037]:260 (/mnt/cache/system/docker/docker.img)
Apr 9 21:26:33 Tower root: /dev/loop0: [2049]:3 (/boot/bzmodules)
Apr 9 21:26:33 Tower root: Active pids left on /mnt/*
Apr 9 21:26:33 Tower root: USER PID ACCESS COMMAND
Apr 9 21:26:33 Tower root: /mnt/cache: root kernel mount /mnt/cache
Apr 9 21:26:33 Tower root: /mnt/disk1: root kernel mount /mnt/disk1
Apr 9 21:26:33 Tower root: /mnt/disk10: root kernel mount /mnt/disk10
Apr 9 21:26:33 Tower root: /mnt/disk2: root kernel mount /mnt/disk2
Apr 9 21:26:33 Tower root: /mnt/disk3: root kernel mount /mnt/disk3
Apr 9 21:26:33 Tower root: /mnt/disk4: root kernel mount /mnt/disk4
Apr 9 21:26:33 Tower root: /mnt/disk5: root kernel mount /mnt/disk5
Apr 9 21:26:33 Tower root: /mnt/disk6: root kernel mount /mnt/disk6
Apr 9 21:26:33 Tower root: /mnt/disk7: root kernel mount /mnt/disk7
Apr 9 21:26:33 Tower root: /mnt/disk8: root kernel mount /mnt/disk8
Apr 9 21:26:33 Tower root: /mnt/disk9: root kernel mount /mnt/disk9
Apr 9 21:26:33 Tower root: /mnt/user: root kernel mount /mnt/user
Apr 9 21:26:33 Tower root: /mnt/user0: root kernel mount /mnt/user0
Apr 9 21:26:33 Tower root: Active pids left on /dev/md*
Apr 9 21:26:33 Tower root: USER PID ACCESS COMMAND
Apr 9 21:26:33 Tower root: /dev/md1: root kernel mount /mnt/disk1
Apr 9 21:26:33 Tower root: /dev/md10: root kernel mount /mnt/disk10
Apr 9 21:26:33 Tower root: /dev/md2: root kernel mount /mnt/disk2
Apr 9 21:26:33 Tower root: /dev/md3: root kernel mount /mnt/disk3
Apr 9 21:26:33 Tower root: /dev/md4: root kernel mount /mnt/disk4
Apr 9 21:26:33 Tower root: /dev/md5: root kernel mount /mnt/disk5
Apr 9 21:26:33 Tower root: /dev/md6: root kernel mount /mnt/disk6
Apr 9 21:26:33 Tower root: /dev/md7: root kernel mount /mnt/disk7
Apr 9 21:26:33 Tower root: /dev/md8: root kernel mount /mnt/disk8
Apr 9 21:26:33 Tower root: /dev/md9: root kernel mount /mnt/disk9
Apr 9 21:26:33 Tower root: Generating diagnostics...
Apr 9 21:26:39 Tower kernel: BUG: unable to handle kernel paging request at 00000000000096d1
Apr 9 21:26:39 Tower kernel: PGD fdf792067 P4D fdf792067 PUD fdf178067 PMD 0
Apr 9 21:26:39 Tower kernel: Oops: 0000 [#2] SMP NOPTI
Apr 9 21:26:39 Tower kernel: CPU: 5 PID: 207 Comm: kworker/u256:6 Tainted: G D O 4.19.107-Unraid #1
Apr 9 21:26:39 Tower kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X399 Taichi, BIOS P3.90 12/04/2019
Apr 9 21:26:39 Tower kernel: Workqueue: events_power_efficient gc_worker
Apr 9 21:26:39 Tower kernel: RIP: 0010:gc_worker+0x8c/0x270
Apr 9 21:26:39 Tower kernel: Code: 93 00 48 8b 15 e4 9a 93 00 3b 05 c2 9a 93 00 75 dd 39 cd 72 02 31 ed 89 e8 48 8d 04 c2 4c 8b 30 41 f6 c6 01 0f 85 4a 01 00 00 <41> 0f b6 46 37 49 c7 c0 f0 ff ff ff 41 ff c5 48 6b c0 38 49 29 c0
Apr 9 21:26:39 Tower kernel: RSP: 0018:ffffc90006ecbe60 EFLAGS: 00010246
Apr 9 21:26:39 Tower kernel: RAX: ffff889031125bb0 RBX: 0000000000000000 RCX: 0000000000010000
Apr 9 21:26:39 Tower kernel: RDX: ffff889031100000 RSI: 0000000000000175 RDI: ffffffff822aa760
Apr 9 21:26:39 Tower kernel: RBP: 0000000000004b76 R08: ffffffffffffffb8 R09: 0000746e65696369
Apr 9 21:26:39 Tower kernel: R10: 8080808080808080 R11: fefefefefefefeff R12: ffffffff822aa760
Apr 9 21:26:39 Tower kernel: R13: 0000000000000001 R14: 000000000000969a R15: ffff888fe4180000
Apr 9 21:26:39 Tower kernel: FS: 0000000000000000(0000) GS:ffff88903cf40000(0000) knlGS:0000000000000000
Apr 9 21:26:39 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 9 21:26:39 Tower kernel: CR2: 00000000000096d1 CR3: 0000001035c16000 CR4: 00000000003406e0
Apr 9 21:26:39 Tower kernel: Call Trace:
Apr 9 21:26:39 Tower kernel: process_one_work+0x16e/0x24f
Apr 9 21:26:39 Tower kernel: worker_thread+0x1e2/0x2b8
Apr 9 21:26:39 Tower kernel: ? rescuer_thread+0x2a7/0x2a7
Apr 9 21:26:39 Tower kernel: kthread+0x10c/0x114
Apr 9 21:26:39 Tower kernel: ? kthread_park+0x89/0x89
Apr 9 21:26:39 Tower kernel: ret_from_fork+0x22/0x40
Apr 9 21:26:39 Tower kernel: Modules linked in: ipt_MASQUERADE iptable_filter iptable_nat nf_nat_ipv4 nf_nat ip_tables xfs md_mod bonding igb(O) edac_mce_amd kvm_amd kvm btusb btrtl btbcm btintel bluetooth crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 i2c_piix4 crypto_simd wmi_bmof mxm_wmi i2c_core k10temp mpt3sas ecdh_generic cryptd glue_helper raid_class ccp scsi_transport_sas nvme ahci nvme_core libahci wmi pcc_cpufreq button acpi_cpufreq [last unloaded: igb]
Apr 9 21:26:39 Tower kernel: CR2: 00000000000096d1
Apr 9 21:26:39 Tower kernel: ---[ end trace 0d847ac0fcfecec6 ]---

tower-diagnostics-20200409-1959.zip
  3. Hello! Recently the Fix Common Problems plugin let me know about a hacking attempt, and that's definitely what the logs imply. It looks like someone was attempting to connect via a bunch of standard / known users and passwords (this was repeated over two days, hundreds of times a day, with similar information):

     Jan 9 02:22:03 Tower sshd[1979]: Failed password for mysql from 91.xxx.x.x port 56816 ssh2
     Jan 9 02:22:03 Tower sshd[1979]: Connection closed by authenticating user mysql 91.xxx.x.x port 56816 [preauth]
     Jan 9 04:43:27 Tower sshd[130858]: Invalid user nginx from 91.xxx.x.x port 52020
     Jan 9 04:43:27 Tower sshd[130858]: error: Could not get shadow information for NOUSER
     Jan 9 04:43:27 Tower sshd[130858]: Failed password for invalid user nginx from 91.xxx.x.x port 52020 ssh2
     Jan 9 04:43:27 Tower sshd[130858]: Connection closed by invalid user nginx 91.xxx.x.x port 52020 [preauth]

     The part I don't understand is the ports, and what this log really means. My server is exposed to the internet, but only on a non-standard port that is forwarded to SSH, and on port 80 (redirected to 443)/443. One of the port 443 redirects goes to the Unraid web portal, but it is hidden behind an NGINX auth - on top of the Unraid auth itself. So, my question: how was a login attempt made on these different ports? And beyond taking down the access I have exposed, what else should I be doing to limit this? Thanks!
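For limiting this kind of brute-force attempt, the usual first step is tightening the SSH daemon itself. A minimal sketch of standard OpenSSH sshd_config directives (not from the original post; "myuser" is a placeholder, and key-based login must be working before password auth is disabled):

```
# /etc/ssh/sshd_config hardening sketch -- standard OpenSSH directives.
PasswordAuthentication no          # key-only auth defeats password spraying
PermitRootLogin prohibit-password  # root only with a key, never a password
AllowUsers myuser                  # reject all other usernames (mysql, nginx, ...)
MaxAuthTries 3                     # drop the connection after 3 bad attempts
LoginGraceTime 20                  # close idle pre-auth connections quickly
```

Note that on Unraid, sshd_config changes need to be persisted to the flash drive (e.g. via the config/ssh directory or a go-file step) or they are lost on reboot.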
  4. Hello! A little while ago (about 6 weeks) I added a few drives, doubled the memory, added two GPUs, and added a third NVMe to the cache array. Since then (I've only just now noticed) my parity checks have been getting errors. I'm wondering what the best way is to track down what's causing these errors. It seems like I should be able to use the SMART information, but is there any way to do that server-wide without going into each report manually?

     The server details: 2920X, 64GB RAM, 12x 8TB disks (1 parity, 1 precleared to swap), 3 cache NVMe (512GB + 512GB + 1TB).
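A server-wide SMART sweep can be done from the console with a short loop. This is a hedged sketch (not from the original post): it assumes sd* device naming and that smartctl from smartmontools is on the PATH, and it filters for the attributes that most often indicate a failing disk or bad cabling:

```shell
# Sketch: print the key SMART error counters for every array drive at once,
# instead of opening each report by hand.
for dev in /dev/sd[a-z]; do
  echo "== $dev =="
  # -A prints the vendor attribute table; keep only the usual trouble signs.
  smartctl -A "$dev" 2>/dev/null |
    grep -E 'Reallocated_Sector|Current_Pending|Offline_Uncorrectable|UDMA_CRC_Error' ||
    echo "  (no SMART data for $dev)"
done
```

Nonzero Reallocated/Pending/Uncorrectable counts point at the disk itself; a climbing UDMA_CRC_Error count usually points at cabling or the HBA connection, which is worth checking first after a case swap.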
  5. So - I went through and disabled one of the power states in my BIOS and haven't had the problem again yet. No idea if that's the solution, but so far so good. Thanks @johnnie.black
  6. The cache drive isn't full - it has about 400GB free. From what I read about the 'error' in the log, it's more about writes being directed straight to the drive for one reason or another, instead of landing in the cache and then being migrated. I've since turned off caching for the two shares that are stored on disk, so it's a clean break: shares are either on cache or on disk.

     Thanks! I'll look into this! It's hard to tell since I can't 'trigger' it to fail, but it can't hurt!
  7. Hello! I've been running Unraid for a bit now and generally loving it. I have a problem where, every few days, the whole machine stops responding. I have it plugged into an external monitor and keyboard; when it enters this state, not even Num Lock will toggle on the keyboard. I piped the system logs to the flash drive in hopes of getting more information (as the logs shown while it was running were fine). This is the last bit of the log, which doesn't seem interesting at all:

     Jun 5 01:54:47 Tower shfs: share cache full
     Jun 5 01:55:19 Tower shfs: share cache full
     Jun 5 02:47:07 Tower shfs: share cache full
     Jun 5 02:48:37 Tower shfs: share cache full
     Jun 5 03:00:01 Tower Plugin Auto Update: Checking for available plugin updates
     Jun 5 03:00:05 Tower Plugin Auto Update: Community Applications Plugin Auto Update finished
     Jun 5 03:40:13 Tower crond[2476]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null

     From the looks of the log file itself, it was last written to about an hour after the last log entry. When it comes back online it takes about 12 hours to verify the array if there is no usage, 24-36 while the server gets used. That seems normal, but I'd like to avoid this altogether. I have the 'performance' mode turned on.

     General hardware: AMD 2920X, 32GB RAM, 8x 8TB HDD, 2x 512GB NVMe cache. My downloads, appdata, domains, system and isos shares all live on cache only. Any help would be appreciated!

     tower-diagnostics-20190605-1333.zip
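When the syslog goes quiet before a hard lockup, a simple cron heartbeat can narrow down when the box actually died. This is a rough sketch, not from the original post; the /mnt/cache path and the /boot log location are assumptions for a standard Unraid layout:

```shell
# Sketch: emit one heartbeat line with a timestamp and current cache usage.
# Scheduled every minute and appended to the flash drive, the last line in
# the file pins down when the machine stopped responding, independently of
# whether syslog was still being flushed.
usage=$(df -h /mnt/cache 2>/dev/null | awk 'NR==2{print $5}')
echo "$(date '+%b %e %T') alive; cache used: ${usage:-n/a}"
# From cron (or the User Scripts plugin), append to flash so it survives
# the crash, e.g.:
#   * * * * * /boot/heartbeat.sh >> /boot/heartbeat.log 2>&1
```

Comparing the last heartbeat against the last syslog entry also shows whether the box hung all at once or limped along (as the "written to about an hour after the last log entry" observation suggests).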