kharntiitar

Members
  • Posts

    50

Converted

  • Gender
    Undisclosed
  • Location
    Australia

kharntiitar's Achievements

Rookie (2/14)

4

Reputation

  1. For anybody else looking up similar questions some 3+ years later: the very helpful "archived notification menu" can be found under webui > tools > archived notifications. You can then click on a notification, and it will give you further details, as per extrobe's "NOK" paste. (Thanks for the pointers, they resolved my issue, which also ended up being a hot disk.)
  2. Cool cool, makes sense. I record both, so I’ll just keep the purchased date in your app and keep my secondary spreadsheet for the extra details. Many thanks!!
  3. Hiya @olehj, first off, many thanks for this great plugin!!! I've a question for you: how difficult/feasible would it be to add an extra manual column? Specifically, I'd like to have a "Manufactured Date". I realise this info could be added in the comments area, but that begins to look untidy quickly. Cheers
  4. So I did the `killall lsof` about an hour ago, and it's just started a dozen or so instances of the previously mentioned 'lsof' command, and all CPU cores are at 100%. I failed to mention before that the whole system lags when this happens; even ICMP latency to it goes up by 5-20 seconds, due to CPU wait time.
  5. Hi one and all, for the past few weeks I've been having some weird CPU issues: I would see my docker images go to 80-160%, and a whole heap of "lsof" tasks would start being created.

     PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
     3035 201 39 19 186904 154500 0 S 104.6 0.5 276:22.82 netdata
     9886 nobody 20 0 1745432 424548 19832 S 100.3 1.3 176:57.72 mono
     8705 nobody 20 0 4245124 506380 19328 S 97.0 1.5 567:37.83 Radarr
     29112 root 20 0 1142100 92976 1136 S 78.3 0.3 218:40.05 shfs
     23106 root 20 0 118948 12160 4768 R 59.2 0.0 0:11.66 runc
     23159 root 20 0 188844 9564 4768 R 58.6 0.0 0:10.05 runc
     23700 root 20 0 188780 11592 4704 R 51.3 0.0 0:02.11 runc
     6192 root 20 0 2572 92 0 R 42.4 0.0 4:09.84 lsof
     11290 root 20 0 3036 2032 1788 R 42.1 0.0 2:52.91 lsof
     13951 root 20 0 3036 2008 1764 R 41.8 0.0 2:10.79 lsof
     1186 root 20 0 3036 1992 1748 R 41.1 0.0 5:40.82 lsof
     5954 root 20 0 3036 2032 1788 R 41.1 0.0 4:29.93 lsof
     17990 root 20 0 3036 1992 1748 R 41.1 0.0 1:17.00 lsof
     22466 root 20 0 3036 2020 1772 R 40.8 0.0 0:14.81 lsof
     14587 root 20 0 3036 2104 1860 R 40.5 0.0 2:06.38 lsof
     4547 root 20 0 3036 1920 1664 R 39.8 0.0 4:48.70 lsof
     8947 root 20 0 3036 2044 1804 R 39.8 0.0 3:36.63 lsof
     16682 root 20 0 3036 1944 1692 R 39.8 0.0 1:32.10 lsof
     17283 root 20 0 3036 2088 1844 R 39.5 0.0 1:24.76 lsof
     21801 root 20 0 3036 2008 1764 R 39.5 0.0 0:22.42 lsof
     18524 root 20 0 3036 2008 1764 R 39.1 0.0 1:10.05 lsof
     23589 root 20 0 12488 10748 2068 R 39.1 0.0 0:02.45 find
     19585 root 20 0 3036 1936 1684 R 38.8 0.0 0:46.96 lsof
     8086 root 20 0 3036 1932 1684 R 38.5 0.0 3:45.83 lsof
     12631 root 20 0 3036 2028 1788 R 38.5 0.0 2:34.27 lsof
     13322 root 20 0 3036 1988 1748 R 38.2 0.0 2:23.64 lsof
     3817 root 20 0 3036 2008 1764 R 37.2 0.0 5:01.57 lsof
     18737 root 20 0 3036 2020 1772 R 37.2 0.0 1:02.63 lsof
     21037 root 20 0 3036 2048 1804 R 35.9 0.0 0:28.03 lsof
     11991 root 20 0 3036 1988 1748 R 35.5 0.0 2:44.38 lsof
     2731 root 20 0 3036 2100 1860 R 34.5 0.0 5:15.58 lsof
     23379 root 20 0 3036 2096 1860 R 28.9 0.0 0:03.30 lsof
     530 root 20 0 1712768 1.1g 22432 S 8.6 3.6 179:43.71 qemu-system-x86
     1968 root 20 0 21.0g 815380 20404 R 6.9 2.5 35:21.06 influxd
     1027 root 20 0 1691316 1.2g 22204 S 6.6 3.7 58:46.69 qemu-system-x86
     23179 nobody 20 0 4312 3276 1768 R 4.9 0.0 0:00.83 unrar
     10366 root 20 0 842064 92836 6196 S 3.6 0.3 672:58.29 telegraf
     23750 root 20 0 16656 4540 3652 R 2.0 0.0 0:00.06 snmpget

     At first I thought it was the docker images/disk that was the issue, but I learnt today that it is far more likely that these lsof tasks are the cause, and that the docker CPU utilisation is a symptom. All of the lsof instances look the same:

     root 11289 0.0 0.0 3840 2916 ? S 14:15 0:00 sh -c LANG='en_US.UTF8' lsof -Owl /mnt/disk[0-9]* 2>/dev/null|awk '/^shfs/ && $0!~/\.AppleD(B|ouble)/ && $5=="REG"'|awk -F/ '{print $4}'
     root 11290 48.7 0.0 3036 2032 ? R 14:15 3:08 lsof -Owl /mnt/disk1 /mnt/disk10 /mnt/disk11 /mnt/disk12 /mnt/disk13 /mnt/disk14 /mnt/disk15 /mnt/disk2 /mnt/disk3 /mnt/disk4 /mnt/disk5 /mnt/disk6 /mnt/disk7 /mnt/disk8 /mnt/disk9

     I've grepped through various folders that I thought might have something, and the only thing I could find that came close was the plugin file for dynamix' stop shell:

     # find /boot/config/plugins -type f -exec grep -H lsof {} \;
     /boot/config/plugins/dynamix.stop.shell.plg:for PID in $(lsof /mnt/disk[0-9]* $cache /mnt/user /mnt/user0 2>/dev/null|awk '/^(bash|sh|mc) /{print $2}'); do

     I have nothing in crontab that's even slightly close, and no other scripts set to run that are at all similar in user.scripts etc. Performing a `killall lsof` will make it good for a short while, but there is still something performing an lsof on unassigned drives.

     I also had one Google search, which looked promising, and led me to here, but then I couldn't find the text within the actual page. I did have a CPU issue a while back due to the Ryzen CPU, but that has (presumably) been resolved for a while now.

     I can post diagnostics if needed, but if anybody has suggestions, I'd highly appreciate it!!!
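One way to narrow down what keeps spawning these `lsof` processes is to group them by parent PID. The following is only a minimal sketch, assuming `ps -eo pid,ppid,comm` style output is piped in; the function name `lsof_parents` is made up for illustration and is not part of Unraid or any plugin.

```shell
#!/bin/sh
# Hypothetical helper: reads "PID PPID COMMAND" rows on stdin
# (as printed by `ps -eo pid,ppid,comm`) and prints one line per
# parent PID with the number of lsof children it currently has.
lsof_parents() {
    awk '$3 == "lsof" { count[$2]++ }
         END { for (p in count) print p, count[p] }' | sort -n
}
```

Running `ps -eo pid,ppid,comm | lsof_parents` and then `ps -o pid,ppid,args -p <PPID>` on the busiest parent should show what launched them. In the `ps` output above the direct parent is an `sh -c` wrapper, so it may be necessary to repeat the lookup one level up to reach the script or daemon behind it.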
  6. @paperblankets I got mine solved, hopefully you can too. I installed a second docker, which worked fine, then had a look at the file structure, the permissions were very different, one by one I chmod'd every single file and directory in /var/log for poste's docker. I now get no errors at all, even after reboot. Running Version 2.2.21 Free. Hope it helps.
  7. A very short time after this, I rebooted the container, and ran the logrotate again. This time I got a bunch of errors for items in /var/log: " because parent directory has insecure permissions (It's world writable or writable by group which is not "root") Set "su" directive in config file to tell logrotate which user/group should be used for rotation." I changed permissions on those folders to 755, and chowned them to root, and it all started working, with no errors. On reboot though, it reclaims the folders and the errors come back. I tried rolling the docker back 2 months, and then 2 years, but got the same issue. Might look at editing the "su" config for it.
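To see which directories logrotate is likely to object to before changing anything, something like the following may help. It is a rough sketch: `insecure_dirs` is a hypothetical helper name, it relies on GNU find's `-perm /022` test, and it flags any directory writable by group or others, which is slightly broader than the check described in the error message (that message only complains about group write access when the group is not root).

```shell
#!/bin/sh
# Hypothetical helper: list directories under $1 that are writable
# by group or by others (mode bits 020 or 002 set).
# Assumes GNU find, which supports the "-perm /MODE" (any bit) form.
insecure_dirs() {
    find "$1" -type d -perm /022 | sort
}
```

Inside the container, `insecure_dirs /var/log` would then list the candidates for the 755 / chown root fix mentioned above.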
  8. Can't tell you why, but I'm having the exact same issue... exact same number and everything. It's working fine, I've just been getting that email every day for the past while.

     root@mail:/etc/cron.daily# logrotate /etc/logrotate.conf
     pkill: cannot allocate 4611686018427387903 bytes
  9. Well, uptime is now 49 hours, so I'm going to go ahead and close this off as solved. Thank you very much @johnnie.black for your help and for pointing me to the solution!!! TL;DR: dropping the RAM speed from 3200MHz to (auto) 2133MHz solved the issue.
  10. So far so good, uptime 21 hours 47 minutes since the ram speed change. Being that I'd previously had the same C-State settings, I doubt that will have made a difference, but who knows?! Will keep monitoring
  11. Purely out of interest, setting the memory to defaults, changed the speed to 2133 MHz.
  12. Thanks @johnnie.black. I note two things from that: first, that my alleged max supported memory speed for 2nd gen is not 3200 as per what the motherboard says (and what I've configured), so I'll lower this and see how that goes. Second, I also note from the link in that thread that I shouldn't be turning off C-states, and should instead leave it on auto. I'll do that as well. Will let you know how I get on tomorrow
  13. Of note, one other thing I did was to disable C-States, in case the NVME drive was somehow going to sleep, I haven't undone this change.
  14. Version 6.8.3 2020-03-05

      Hi all,

      So I've spent a bit of time narrowing this down, but in the interest of full disclosure, I will explain everything I have done up until now (as I have made some big changes).

      A couple of weeks ago, I upgraded my unraid server to a Ryzen 7 2700X with an ASUS X470-PRO motherboard. I also added two M.2 NVMe drives. I used one of the NVMe drives as cache, and followed instructions here to move my appdata to the second NVMe drive.

      I noted that after about 72 hours I was having an issue where the server was unresponsive, and (after a bit of troubleshooting) found that the cache drive had become disconnected. This occurred twice more, after just a few hours, so I took that drive back to the shop, and they are testing it for me.

      I had a spare NVMe drive, which was a bit older/slower, but put that in as the cache drive. Then I started getting different errors: I noticed that multiple instances of "runc" were getting high CPU (> 100%), as were several dockers. Shutting down dockers was still possible, but it took about an hour (as compared to a normal 30 seconds to 2 minutes).

      I installed a second NIC, and set up my dockers and VMs to run through the second NIC, but it occurred again: high "runc" and docker CPU utilisation, but I was able to communicate just fine with my VMs (suggesting it's not the entire network stack, and is definitely just eth0). It was at this point I noticed something similar to the following:

      Jun 21 10:10:48 kernel: igb 0000:07:00.0 eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
      Jun 21 10:10:58 kernel: igb 0000:07:00.0 eth0: igb: eth0 NIC Link is Down
      Jun 21 10:11:01 kernel: igb 0000:07:00.0 eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
      Jun 21 10:16:47 apcupsd[21860]: Communications with UPS lost.
      Jun 21 10:17:16 kernel: igb 0000:07:00.0 eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
      Jun 21 10:17:29 kernel: igb 0000:07:00.0 eth0: igb: eth0 NIC Link is Down

      This was repeated throughout the syslog, and was echoed as interface up/down on the switch that my server connects to.

      Today I also found a trace in my syslog which would appear to be exactly what the issue is; however, I'm not sure exactly what it means, or what I can do about it.

      Jun 21 09:19:22 kernel: ------------[ cut here ]------------ Jun 21 09:19:22 kernel: NETDEV WATCHDOG: eth0 (igb): transmit queue 0 timed out Jun 21 09:19:22 kernel: WARNING: CPU: 11 PID: 28598 at net/sched/sch_generic.c:465 dev_watchdog+0x161/0x1bb Jun 21 09:19:22 kernel: Modules linked in: vhost_net tun vhost tap kvm_amd ccp kvm xt_CHECKSUM ipt_REJECT ip6table_mangle ip6table_nat nf_nat_ipv6 ip6table_filter ip6_tables wireguard ip6_udp_tunnel udp_tunnel iptable_raw iptable_mangle xt_nat veth macvlan ipt_MASQUERADE iptable_filter iptable_nat nf_nat_ipv4 nf_nat ip_tables xfs nfsd lockd grace sunrpc md_mod e1000e igb(O) edac_mce_amd crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel mpt3sas pcbc wmi_bmof mxm_wmi aesni_intel aes_x86_64 crypto_simd i2c_piix4 cryptd i2c_core nvme raid_class ahci k10temp libahci scsi_transport_sas glue_helper nvme_core wmi button [last unloaded: ccp] Jun 21 09:19:22 kernel: CPU: 11 PID: 28598 Comm: runc Tainted: G O 4.19.107-Unraid #1 Jun 21 09:19:22 kernel: Hardware name: System manufacturer System Product Name/PRIME X470-PRO, BIOS 5406 11/13/2019 Jun 21 09:19:22 kernel: RIP: 0010:dev_watchdog+0x161/0x1bb Jun 21 09:19:22 kernel: Code: 5f 94 00 00 75 39 48 89 ef c6 05 4e 5f 94 00 01 e8 a1 a8 fd ff 44 89 e9 48 89 ee 48 c7 c7 57 2a da 81 48 89 c2 e8 cd 0b af ff <0f> 0b eb 11 41 ff c5 48 81 c2 40 01 00 00 41 39 cd 75 95 eb 13 48 Jun 21 09:19:22 kernel: RSP: 0018:ffff8887fe8c3ea0 EFLAGS: 00010286 Jun 21 09:19:22 kernel: RAX: 0000000000000000 RBX: ffff8887f770e438 
RCX: 0000000000000007 Jun 21 09:19:22 kernel: RDX: 0000000000000b7e RSI: 0000000000000002 RDI: ffff8887fe8d64f0 Jun 21 09:19:22 kernel: RBP: ffff8887f770e000 R08: 0000000000000003 R09: 0000000000000400 Jun 21 09:19:22 kernel: R10: 0000000000000000 R11: 0000000000000058 R12: ffff8887f770e41c Jun 21 09:19:22 kernel: R13: 0000000000000000 R14: ffff8887f6f06940 R15: 000000000000000b Jun 21 09:19:22 kernel: FS: 0000000000d6a880(0000) GS:ffff8887fe8c0000(0000) knlGS:0000000000000000 Jun 21 09:19:22 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jun 21 09:19:22 kernel: CR2: 000014fd54626880 CR3: 0000000242d90000 CR4: 00000000003406e0 Jun 21 09:19:22 kernel: Call Trace: Jun 21 09:19:22 kernel: Jun 21 09:19:22 kernel: call_timer_fn+0x18/0x7b Jun 21 09:19:22 kernel: ? qdisc_reset+0xc0/0xc0 Jun 21 09:19:22 kernel: expire_timers+0x7e/0x8d Jun 21 09:19:22 kernel: run_timer_softirq+0x72/0x120 Jun 21 09:19:22 kernel: ? enqueue_hrtimer.isra.0+0x23/0x27 Jun 21 09:19:22 kernel: ? __hrtimer_run_queues+0xdd/0x10b Jun 21 09:19:22 kernel: ? 
ktime_get+0x44/0x95 Jun 21 09:19:22 kernel: __do_softirq+0xc9/0x1d7 Jun 21 09:19:22 kernel: irq_exit+0x5e/0x9d Jun 21 09:19:22 kernel: smp_apic_timer_interrupt+0x80/0x93 Jun 21 09:19:22 kernel: apic_timer_interrupt+0xf/0x20 Jun 21 09:19:22 kernel: Jun 21 09:19:22 kernel: RIP: 0010:prepend_path+0xb1/0x205 Jun 21 09:19:22 kernel: Code: 44 24 14 41 89 c2 eb 13 49 39 c6 74 4c 4d 8b 5e 18 4c 8d 68 20 49 89 c6 4c 89 db 48 8b 44 24 08 48 3b 58 08 74 7b 49 8b 55 00 <48> 39 da 74 09 4c 8b 5b 18 49 39 db 75 45 48 39 da 49 8b 46 10 74 Jun 21 09:19:22 kernel: RSP: 0018:ffffc9000d703cf8 EFLAGS: 00000283 ORIG_RAX: ffffffffffffff13 Jun 21 09:19:22 kernel: RAX: ffff88822463f088 RBX: ffff8887a0fb5140 RCX: ffffc9000d703da8 Jun 21 09:19:22 kernel: RDX: ffff8887a0fb5140 RSI: ffff88822463f088 RDI: ffffc9000d703da8 Jun 21 09:19:22 kernel: RBP: ffffc9000d703d68 R08: 0000000000000000 R09: ffffc9000d703da8 Jun 21 09:19:22 kernel: R10: 000000000008b216 R11: ffff8887a0fb5140 R12: 000000000053cd36 Jun 21 09:19:22 kernel: R13: ffff8881726b4920 R14: ffff8881726b4900 R15: ffffc9000d703d64 Jun 21 09:19:22 kernel: ? __dentry_path.part.0+0xa7/0x115 Jun 21 09:19:22 kernel: __d_path+0x59/0x86 Jun 21 09:19:22 kernel: seq_path_root+0x40/0x95 Jun 21 09:19:22 kernel: show_mountinfo+0xc5/0x260 Jun 21 09:19:22 kernel: seq_read+0x231/0x313 Jun 21 09:19:22 kernel: __vfs_read+0x32/0x132 Jun 21 09:19:22 kernel: ? 
__switch_to_asm+0x41/0x70 Jun 21 09:19:22 kernel: vfs_read+0xa4/0x124 Jun 21 09:19:22 kernel: ksys_read+0x60/0xb2 Jun 21 09:19:22 kernel: do_syscall_64+0x57/0xf2 Jun 21 09:19:22 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9 Jun 21 09:19:22 kernel: RIP: 0033:0x4a41c0 Jun 21 09:19:22 kernel: Code: 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 49 c7 c2 00 00 00 00 49 c7 c0 00 00 00 00 49 c7 c1 00 00 00 00 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 20 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30 Jun 21 09:19:22 kernel: RSP: 002b:000000c00005db10 EFLAGS: 00000202 ORIG_RAX: 0000000000000000 Jun 21 09:19:22 kernel: RAX: ffffffffffffffda RBX: 000000c00001c000 RCX: 00000000004a41c0 Jun 21 09:19:22 kernel: RDX: 0000000000001000 RSI: 000000c0000e3000 RDI: 0000000000000004 Jun 21 09:19:22 kernel: RBP: 000000c00005db60 R08: 0000000000000000 R09: 0000000000000000 Jun 21 09:19:22 kernel: R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000001 Jun 21 09:19:22 kernel: R13: 0000000000000075 R14: 0000000000895f8a R15: 0000000000000038 Jun 21 09:19:22 kernel: ---[ end trace ad1ca502756d72cd ]--- The NIC that it is talking about, is the onboard nic, lspci output is here: 07:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03) Subsystem: ASUSTeK Computer Inc. 
I211 Gigabit Network Connection Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 38 Region 0: Memory at fc500000 (32-bit, non-prefetchable) [size=128K] Region 2: I/O ports at b000 [size=32] Region 3: Memory at fc520000 (32-bit, non-prefetchable) [size=16K] Capabilities: [40] Power Management version 3 Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+) Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME- Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+ Address: 0000000000000000 Data: 0000 Masking: 00000000 Pending: 00000000 Capabilities: [70] MSI-X: Enable+ Count=5 Masked- Vector table: BAR=3 offset=00000000 PBA: BAR=3 offset=00002000 Capabilities: [a0] Express (v2) Endpoint, MSI 00 DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+ RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset- MaxPayload 256 bytes, MaxReadReq 512 bytes DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend- LnkCap: Port #7, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <2us, L1 <16us ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+ LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+ ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- LnkSta: Speed 2.5GT/s (ok), Width x1 (ok) TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported AtomicOpsCap: 32bit- 64bit- 128bitCAS- DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled AtomicOpsCtl: ReqEn- LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis- Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- 
Compliance De-emphasis: -6dB LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1- EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest- Capabilities: [100 v2] Advanced Error Reporting UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol- CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr- CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+ AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn- MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap- HeaderLog: 00000000 00000000 00000000 00000000 Capabilities: [140 v1] Device Serial Number 4c-ed-fb-ff-ff-7a-07-47 Capabilities: [1a0 v1] Transaction Processing Hints Device specific mode supported Steering table in TPH capability structure Kernel driver in use: igb Kernel modules: igb If I stop all dockers from cli (I've had to write a script and do it via screen, as I keep getting disconnected), everything goes back to normal, and I can start them again fine, without rebooting the system. At least, until the next time it dies. If anyone might be able to point me in the right direction as to what I've done wrong (and I'm sure it's me, as I made a heap of changes) I'd really appreciate it.. If I can provide more info, or clarify something, just say the word.
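To get a feel for how often the link is flapping, the "NIC Link is Up/Down" lines in the syslog can be tallied per interface. This is just a sketch under a few assumptions: the log lines have the same shape as the igb messages quoted above, and `link_flaps` is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Hypothetical helper: reads syslog text on stdin and counts
# "<iface> NIC Link is Up|Down" messages per interface and state.
# Prints lines of the form "<iface> <state> <count>".
link_flaps() {
    awk '{
        for (i = 2; i + 3 <= NF; i++)
            if ($i == "NIC" && $(i+1) == "Link" && $(i+2) == "is")
                count[$(i-1) " " $(i+3)]++
    }
    END { for (k in count) print k, count[k] }' | sort
}
```

It could be run as `link_flaps < /var/log/syslog` (assuming that is where the syslog lives on the system in question); comparing the counts before and after a change such as swapping the cable or switch port gives a quick indication of whether it made any difference.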