DieFalse

Members
  • Posts: 432
  • Days Won: 1

Everything posted by DieFalse

  1. GuildDarts, the fix worked. I was able to add an animate-on-hover-only SVG icon. I only created one folder and put four things in it, and it works. I will fully test later, but so far no network interruption either.
  2. I continue to have problems with GSA and Arcanine. Arcanine became completely unresponsive last night. GSA is SSH'able but nothing will run correctly. Neither is usable in its state, so I will be forced to revert to 6.9.2 soon to bring them back online. I would like to give all the information I can to help resolve this; please let me know what to provide. I can even provide remote access if you want to PM me.
  3. Ok, Arcanine is on 6.10rc1 with no issues as of this morning. GSA I could not get into in any way other than SSH, as previously mentioned, so it still has the issues originally reported, except now an error 500 on loading the WebUI instead of a 302. I set use_ssl to no and got into the WebUI; call traces are still happening heavily. The kernel files should all be stock now: since I had previously downgraded, I manually copied 6.9.2 to root, rebooted, then upgraded to 6.10rc1 through the UI and verified the bz files matched the downloaded zip for 6.10rc1. gsa-diagnostics-20210822-1057.zip
  4. I will restore via the instructions this evening. The btrfs issues were resolved in another thread, which led to me trying 6.10, as the call traces occurred on previous versions too, leading me here. I'll advise when both are 100% stock; I will then upgrade to 6.10rc1 again and see if any difference is noticed.
  5. That plugin has been removed; the behaviour exists on both GSA and Arcanine. There was no "revert" option, so I assume that if Arcanine is the only one with bad .bz files, I will need to do something to revert? I thought that when you run an upgrade it installs the latest .bz files, so that makes me think something would have to modify the .bz post-upgrade?
  6. No, I haven't ever had a non-stock kernel on GSA; Arcanine used to, for Mellanox and something else, but it went to stock over three releases ago. How do I restore the .bz files to stock if they're modified?
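One way to answer the "are my .bz files stock?" question is a checksum comparison against the files in the downloaded release zip. A minimal sketch, assuming the flash drive and the unzipped release are both reachable as directories (the helper name and the /tmp demo paths are illustrative, not an official Unraid procedure):

```shell
# Hypothetical helper: compare one boot file on the flash drive against the
# same file from the extracted release zip.
# Usage: check_stock FLASH_DIR RELEASE_DIR FILE
check_stock() {
  a=$(md5sum "$1/$3" | awk '{print $1}')
  b=$(md5sum "$2/$3" | awk '{print $1}')
  if [ "$a" = "$b" ]; then echo "$3: stock"; else echo "$3: MODIFIED"; fi
}

# Demo with scratch files standing in for /boot and the unzipped release:
mkdir -p /tmp/flash /tmp/release
printf 'kernel-bits' > /tmp/flash/bzimage
printf 'kernel-bits' > /tmp/release/bzimage
check_stock /tmp/flash /tmp/release bzimage   # bzimage: stock
printf 'patched' > /tmp/flash/bzimage
check_stock /tmp/flash /tmp/release bzimage   # bzimage: MODIFIED
```

In real use the loop would run over every bz* file on /boot; any "MODIFIED" result means restoring that file from the release zip (which is effectively what copying the stock files back and rebooting does).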
  7. Thank you - I will try that to see if I can get into the WebUI, but the hangs and inability to maneuver anything, even in SSH, are worrisome. Anything in the diagnostics useful?
  8. Makes sense - sorry, I didn't want to reset until I knew it was the right thing to do, and I missed the "lifetime" note. Thanks again. The call traces apparently relate to the nvidia modules, so I opened a thread specific to that.
  9. I have been tracking a continuous call trace problem for many days and it seems the built-in nvidia kernel driver is causing them. Any insight to alleviate this would be great. I am not sure if this is a "bug" or a "support issue" so wanted to start here and be moved if needed.

     System Info:
       Unraid Version: 6.9.2 && 6.10-rc1
       Kernel: 5.10.28-Unraid
       Compile Date: Wed Apr 7 08:23:18 PDT 2021
     nVidia Info:
       Nvidia Driver Version: 470.63.01 (latest stable)
       Installed GPU(s): 0: Quadro P1000 43:00.0

     Aug 20 07:10:58 GSA kernel: ------------[ cut here ]------------
     Aug 20 07:10:58 GSA kernel: WARNING: CPU: 15 PID: 0 at net/netfilter/nf_nat_core.c:614 nf_nat_setup_info+0x6c/0x6aa [nf_nat]
     Aug 20 07:10:58 GSA kernel: Modules linked in: nvidia_uvm(PO) xt_mark xt_comment xt_nat veth nfsv3 nfs nfs_ssc xt_CHECKSUM ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle nf_tables vhost_net tun vhost vhost_iotlb tap macvlan xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter xfs nfsd lockd grace sunrpc md_mod nvidia_drm(PO) nvidia_modeset(PO) drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops nvidia(PO) drm backlight agpgart ip6table_filter ip6_tables iptable_filter ip_tables x_tables mlx4_en mlx4_core tg3 sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper ipmi_ssif rapl intel_cstate i2c_core intel_uncore mpt3sas input_leds led_class raid_class scsi_transport_sas megaraid_sas wmi acpi_power_meter ipmi_si button [last unloaded: mlx4_core]
     Aug 20 07:10:58 GSA kernel: CPU: 15 PID: 0 Comm: swapper/15 Tainted: P W O 5.10.28-Unraid #1
     Aug 20 07:10:58 GSA kernel: Hardware name: Dell Inc. PowerEdge R720xd/0JP31P, BIOS 2.9.0 12/06/2019
     Aug 20 07:10:58 GSA kernel: RIP: 0010:nf_nat_setup_info+0x6c/0x6aa [nf_nat]
     Aug 20 07:10:58 GSA kernel: Code: 89 fb 49 89 f6 41 89 d4 76 02 0f 0b 48 8b 93 80 00 00 00 89 d0 25 00 01 00 00 45 85 e4 75 07 89 d0 25 80 00 00 00 85 c0 74 07 <0f> 0b e9 77 05 00 00 48 8b 83 90 00 00 00 4c 8d 6c 24 20 48 8d 73
     Aug 20 07:10:58 GSA kernel: RSP: 0018:ffffc90000494810 EFLAGS: 00010202
     Aug 20 07:10:58 GSA kernel: RAX: 0000000000000080 RBX: ffff88830f355b80 RCX: ffff88821645e500
     Aug 20 07:10:58 GSA kernel: RDX: 0000000000000180 RSI: ffffc900004948ec RDI: ffff88830f355b80
     Aug 20 07:10:58 GSA kernel: RBP: ffffc900004948d8 R08: 000000007313a8c0 R09: 0000000000000000
     Aug 20 07:10:58 GSA kernel: R10: 0000000000000158 R11: ffff88814f13bf00 R12: 0000000000000000
     Aug 20 07:10:58 GSA kernel: R13: 000000007313a800 R14: ffffc900004948ec R15: 0000000000000001
     Aug 20 07:10:58 GSA kernel: FS: 0000000000000000(0000) GS:ffff88debf5c0000(0000) knlGS:0000000000000000
     Aug 20 07:10:58 GSA kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
     Aug 20 07:10:58 GSA kernel: CR2: 00000000017d62f0 CR3: 000000000200a001 CR4: 00000000000606e0
     Aug 20 07:10:58 GSA kernel: Call Trace:
     Aug 20 07:10:58 GSA kernel:
     Aug 20 07:10:58 GSA kernel: ? __fib_validate_source+0x24c/0x2a5
     Aug 20 07:10:58 GSA kernel: ? ipt_do_table+0x4bb/0x5c0 [ip_tables]
     Aug 20 07:10:58 GSA kernel: ? ipt_do_table+0x570/0x5c0 [ip_tables]
     Aug 20 07:10:58 GSA kernel: __nf_nat_alloc_null_binding+0x5f/0x76 [nf_nat]
     Aug 20 07:10:58 GSA kernel: nf_nat_inet_fn+0x91/0x183 [nf_nat]
     Aug 20 07:10:58 GSA kernel: nf_nat_ipv4_local_in+0x25/0xa9 [nf_nat]
     Aug 20 07:10:58 GSA kernel: nf_hook_slow+0x39/0x8e
     Aug 20 07:10:58 GSA kernel: nf_hook.constprop.0+0xb1/0xd8
     Aug 20 07:10:58 GSA kernel: ? ip_protocol_deliver_rcu+0xfe/0xfe
     Aug 20 07:10:58 GSA kernel: ip_local_deliver+0x49/0x75
     Aug 20 07:10:58 GSA kernel: ip_sabotage_in+0x43/0x4d [br_netfilter]
     Aug 20 07:10:58 GSA kernel: nf_hook_slow+0x39/0x8e
     Aug 20 07:10:58 GSA kernel: nf_hook.constprop.0+0xb1/0xd8
     Aug 20 07:10:58 GSA kernel: ? l3mdev_l3_rcv.constprop.0+0x50/0x50
     Aug 20 07:10:58 GSA kernel: ip_rcv+0x41/0x61
     Aug 20 07:10:58 GSA kernel: __netif_receive_skb_one_core+0x74/0x95
     Aug 20 07:10:58 GSA kernel: netif_receive_skb+0x79/0xa1
     Aug 20 07:10:58 GSA kernel: br_handle_frame_finish+0x30d/0x351
     Aug 20 07:10:58 GSA kernel: ? skb_copy_bits+0xe8/0x197
     Aug 20 07:10:58 GSA kernel: ? ipt_do_table+0x570/0x5c0 [ip_tables]
     Aug 20 07:10:58 GSA kernel: ? br_pass_frame_up+0xda/0xda
     Aug 20 07:10:58 GSA kernel: br_nf_hook_thresh+0xa3/0xc3 [br_netfilter]
     Aug 20 07:10:58 GSA kernel: ? br_pass_frame_up+0xda/0xda
     Aug 20 07:10:58 GSA kernel: br_nf_pre_routing_finish+0x23d/0x264 [br_netfilter]
     Aug 20 07:10:58 GSA kernel: ? br_pass_frame_up+0xda/0xda
     Aug 20 07:10:58 GSA kernel: ? br_handle_frame_finish+0x351/0x351
     Aug 20 07:10:58 GSA kernel: ? nf_nat_ipv4_pre_routing+0x1e/0x4a [nf_nat]
     Aug 20 07:10:58 GSA kernel: ? br_nf_forward_finish+0xd0/0xd0 [br_netfilter]
     Aug 20 07:10:58 GSA kernel: ? br_handle_frame_finish+0x351/0x351
     Aug 20 07:10:58 GSA kernel: NF_HOOK+0xd7/0xf7 [br_netfilter]
     Aug 20 07:10:58 GSA kernel: ? br_nf_forward_finish+0xd0/0xd0 [br_netfilter]
     Aug 20 07:10:58 GSA kernel: br_nf_pre_routing+0x229/0x239 [br_netfilter]
     Aug 20 07:10:58 GSA kernel: ? br_nf_forward_finish+0xd0/0xd0 [br_netfilter]
     Aug 20 07:10:58 GSA kernel: br_handle_frame+0x25e/0x2a6
     Aug 20 07:10:58 GSA kernel: ? br_pass_frame_up+0xda/0xda
     Aug 20 07:10:58 GSA kernel: __netif_receive_skb_core+0x335/0x4e7
     Aug 20 07:10:58 GSA kernel: ? dev_gro_receive+0x55d/0x578
     Aug 20 07:10:58 GSA kernel: __netif_receive_skb_list_core+0x78/0x104
     Aug 20 07:10:58 GSA kernel: netif_receive_skb_list_internal+0x1bf/0x1f2
     Aug 20 07:10:58 GSA kernel: gro_normal_list+0x1d/0x39
     Aug 20 07:10:58 GSA kernel: napi_complete_done+0x79/0x104
     Aug 20 07:10:58 GSA kernel: mlx4_en_poll_rx_cq+0xa8/0xc7 [mlx4_en]
     Aug 20 07:10:58 GSA kernel: net_rx_action+0xf4/0x29d
     Aug 20 07:10:58 GSA kernel: __do_softirq+0xc4/0x1c2
     Aug 20 07:10:58 GSA kernel: asm_call_irq_on_stack+0x12/0x20
     Aug 20 07:10:58 GSA kernel:
     Aug 20 07:10:58 GSA kernel: do_softirq_own_stack+0x2c/0x39
     Aug 20 07:10:58 GSA kernel: __irq_exit_rcu+0x45/0x80
     Aug 20 07:10:58 GSA kernel: common_interrupt+0x119/0x12e
     Aug 20 07:10:58 GSA kernel: asm_common_interrupt+0x1e/0x40
     Aug 20 07:10:58 GSA kernel: RIP: 0010:arch_local_irq_enable+0x4/0x8
     Aug 20 07:10:58 GSA kernel: Code: d4 39 18 00 48 83 c4 28 4c 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 9c 58 66 66 90 66 90 c3 fa 66 66 90 66 66 90 c3 fb 66 66 90 <66> 66 90 c3 55 8b af 28 04 00 00 b8 01 00 00 00 45 31 c9 53 45 31
     Aug 20 07:10:58 GSA kernel: RSP: 0018:ffffc900000f7ea0 EFLAGS: 00000246
     Aug 20 07:10:58 GSA kernel: RAX: ffff88debf5e2380 RBX: 0000000000000004 RCX: 000000000000001f
     Aug 20 07:10:58 GSA kernel: RDX: 0000000000000000 RSI: 000000003333348b RDI: 0000000000000000
     Aug 20 07:10:58 GSA kernel: RBP: ffffe8fffddfed00 R08: 00003d6121a8df03 R09: 00003d667688e0ff
     Aug 20 07:10:58 GSA kernel: R10: 00000000000664e6 R11: 071c71c71c71c71c R12: 00003d6121a8df03
     Aug 20 07:10:58 GSA kernel: R13: ffffffff820c5dc0 R14: 0000000000000004 R15: 0000000000000000
     Aug 20 07:10:58 GSA kernel: cpuidle_enter_state+0x101/0x1c4
     Aug 20 07:10:58 GSA kernel: cpuidle_enter+0x25/0x31
     Aug 20 07:10:58 GSA kernel: do_idle+0x1a6/0x214
     Aug 20 07:10:58 GSA kernel: cpu_startup_entry+0x18/0x1a
     Aug 20 07:10:58 GSA kernel: secondary_startup_64_no_verify+0xb0/0xbb
     Aug 20 07:10:58 GSA kernel: ---[ end trace d61aac45b3f9ccb8 ]---
  10. Ok, I deleted the files and re-ran scrub, and it comes back with:

      Scrub started: Thu Aug 19 12:34:41 2021
      Status: finished
      Duration: 6:06:46
      Total to scrub: 34.36TiB
      Rate: 1.60GiB/s
      Error summary: no errors found

      BUT the script you linked to returns this:

      [/dev/sdb1].write_io_errs 0
      [/dev/sdb1].read_io_errs 0
      [/dev/sdb1].flush_io_errs 0
      [/dev/sdb1].corruption_errs 1137
      [/dev/sdb1].generation_errs 0

      So I still have corruption that neither scrub nor the log is showing, I think... advice?
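Worth noting here: `btrfs device stats` counters are lifetime totals, so a later clean scrub does not zero them; stale corruption_errs can outlive an already-fixed problem. A small sketch of reading and totaling the counters, using the output captured above (live data would come from `btrfs device stats /mnt/cache`, where the mount point is an assumption, and `btrfs device stats -z /mnt/cache` prints and then resets them):

```shell
# Sample output as posted; in practice pipe the live command instead.
stats='[/dev/sdb1].write_io_errs 0
[/dev/sdb1].read_io_errs 0
[/dev/sdb1].flush_io_errs 0
[/dev/sdb1].corruption_errs 1137
[/dev/sdb1].generation_errs 0'

# Total all counters: non-zero means errors were recorded at some point,
# possibly before the underlying cause was fixed.
total=$(printf '%s\n' "$stats" | awk '{ s += $2 } END { print s }')
echo "$total"   # 1137
```

If the total stays at zero after a reset plus a fresh scrub, the corruption is historical rather than ongoing.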
  11. I will this afternoon and report back.
  12. Upgraded to 6.10-rc1 to attempt to resolve a kernel issue (call traces) and now I can only access SSH, not the WebUI. The WebUI times out regardless of https://ip, http://ip, fqdn, or hash.unraid.net.

      Notes:
      - While I can SSH in, running anything freezes the session for eternity unless it is a basic non-drawing command (ls, ps, etc. work; top, htop, mc freeze).
      - While I can see the Samba or NFS shares, trying to browse into any of them freezes the session.
      - It seems the server is overloaded. I was able to run "docker stop $(docker ps -q)", which worked, but I still can not do anything of use. I also ran "/etc/rc.d/rc.libvirt stop" and the server is still hung; currently waiting to see if "fuser -mv /mnt/disk* /mnt/user/*" returns anything, as it is hung.

      netstat shows the nginx server listening on the right ports:

      netstat -tulpn | grep LISTEN
      tcp 0 0 0.0.0.0:111 0.0.0.0:* LISTEN 7642/rpcbind
      tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN 47166/nginx: master
      tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 7447/sshd: /usr/sbi
      tcp 0 0 0.0.0.0:443 0.0.0.0:* LISTEN 47166/nginx: master
      tcp 0 0 0.0.0.0:41787 0.0.0.0:* LISTEN 7646/rpc.statd
      tcp6 0 0 :::111 :::* LISTEN 7642/rpcbind
      tcp6 0 0 :::80 :::* LISTEN 47166/nginx: master
      tcp6 0 0 :::22 :::* LISTEN 7447/sshd: /usr/sbi
      tcp6 0 0 :::36151 :::* LISTEN 7646/rpc.statd
      tcp6 0 0 :::443 :::* LISTEN 47166/nginx: master

      /etc/rc.d/rc.nginx is running and restarting it makes no difference. curl localhost returns a 302, so I am assuming this has to do with the DNS redirect that Unraid uses to the hash URL (is there any way to disable that, since I use my own DNS and hostnames with valid certs anyway, and it would be easy to configure?).

      I had to mv /boot/previous to /boot to restore access and control. I was unable to pull diagnostics as all methods timed out, and pulling the USB was not possible due to physical access limitations until I restored. I did collect two sets during the issues, though, and have attached them.

      Diagnostics attached from both servers:
      - Arcanine = AMD Threadripper on X399D8A-2T
      - GSA = Intel Xeon Dell PowerEdge R720xd
      The similarity is that the 10GbE SFP+ card is the same in both units.
      arcanine-diagnostics-20210819-1207.zip gsa-diagnostics-20210819-0946.zip gsa-diagnostics-20210819-1005.zip syslog.zip
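A quick way to turn a netstat wall like the one above into a focused check is to filter for the process of interest and pull out just the ports. A sketch using the captured output (live use would pipe `netstat -tulpn` directly):

```shell
# Captured listener lines from the post, trimmed to the nginx/sshd rows.
netstat_out='tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN 47166/nginx: master
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 7447/sshd: /usr/sbi
tcp 0 0 0.0.0.0:443 0.0.0.0:* LISTEN 47166/nginx: master'

# Print the port of every socket nginx holds: field 4 is local addr:port.
printf '%s\n' "$netstat_out" | awk '/nginx/ { split($4, a, ":"); print a[2] }'
```

Seeing 80 and 443 here confirms the listeners exist, which is why the 302 points at a redirect/DNS issue rather than nginx being down.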
  13. Excellent, I will do that. Loaded the RC1 for 6.10 and am having an issue accessing the WebUI (it never loads), so I will have to sort that first; it will need another thread.
  14. Ok, two uncorrectables. Any help with what to do to resolve them / find which files they are so I can fix them?
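When a scrub reports uncorrectable errors, btrfs usually logs the affected inode and path in syslog, so the broken files can be deleted or restored from backup. A sketch of extracting the path; the log line below is a made-up illustration of the kernel's message format, not taken from this system (in practice: `grep 'checksum error' /var/log/syslog`):

```shell
# Illustrative BTRFS warning line; real lines carry the actual device,
# offsets, and path.
line='kernel: BTRFS warning (device sdb1): checksum error at logical 1053184 on dev /dev/sdb1, physical 1118720, root 5, inode 257, offset 0, length 4096, links 1 (path: appdata/file.bin)'

# Pull out the file path from the trailing "(path: ...)" field:
printf '%s\n' "$line" | sed -n 's/.*(path: \(.*\))$/\1/p'   # appdata/file.bin
```

Running that extraction over the whole syslog gives the list of files to restore; after replacing them, a re-scrub should come back clean.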
  15. Ok. I set up the script in User Scripts and scheduled it. I have also started a scrub now that the parity check finished (the reboot caused that, and it had zero errors). I will report back when the scrub finishes, and will then try the beta to see if it sorts the call traces.
  16. Here is the syslog. I was able to SSH in this time; using the shutdown script failed. It appears I have some sort of corruption occurring, so I definitely need your advice. syslog
  17. Yes, I did use the "only animate on hover" option, as I can't stand continual motion; moving items grab my attention (ADHD), so hover-only is what I need. When you push the hover fix, I will retry and maybe we can troubleshoot the communication issue.
  18. I installed Docker Folders and have two weird issues. Let me start by saying I followed Ibracorps' video and guide and love the idea of it working.

      First, I tried to use a single-folder animated or static icon, and it immediately breaks the folders: it shows one folder (the one the image is assigned to) and then all the remaining dockers separate with no image. I tried multiple images, and even a local file vs. a URL to an image.

      Second, stranger than the above: my cloudflared stops communicating with my LAN, and many of my dockers lose connectivity when in a folder. Two dockers in their own separate folders can't talk, but two in the same folder can. ALL are on Bridge or Host. I did try using a custom Docker network, but it can't be assigned: it shows in the drop-down, but on select-then-apply Docker says "unable to find network "custom: __________"".

      Removing the Folders plugin immediately resolves the above issues.
  19. In an effort to keep the thread clean and on point, I adjusted this thread to the single issue it helped resolve: corruption on the cache pool. I recently resolved an MCE event by replacing hardware; everything tests perfectly and I receive no errors. While likely unrelated, I wanted to mention it. For the last three mornings I have awoken to find my server completely unreachable, offline, and frozen. Even the iDRAC console yields no response, and I have to warm-cycle the server to bring it back online. Any insight available would be helpful. Attached is the diagnostics pulled when it loaded this morning. gsa-diagnostics-20210816-0916.zip
  20. This seems to be my best option. Thanks, all, for the support. I wonder if they don't release, or they cover, the PSIDs for security now? Strange indeed.
  21. @SimonF there is no PSID on the disk that I can find, unless it's under the "Tamper Evident Label", which I find weird; these are new replacement drives from Dell, so it's possible. I did download sedutil to see if I could get any info, but can't run it on Unraid:

      The Kernel flag libata.allow_tpm is not set correctly
      Please see the readme note about setting the libata.allow_tpm
      Invalid or unsupported disk /dev/sdz
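For reference, the readme note sedutil points at means booting the kernel with libata.allow_tpm=1. On Unraid that would mean adding the flag to the append line in /boot/syslinux/syslinux.cfg and rebooting; the fragment below is a sketch of what the stock boot entry would look like with the flag added (label names and the rest of the append line may differ on a given system):

```
# /boot/syslinux/syslinux.cfg -- add libata.allow_tpm=1 to the append line
label Unraid OS
  menu default
  kernel /bzimage
  append libata.allow_tpm=1 initrd=/bzroot

# After reboot, this should print 1:
#   cat /sys/module/libata/parameters/allow_tpm
```

With the flag in place, sedutil should at least get past the "not set correctly" error and be able to interrogate the drive.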
  22. Ok. So even though it read that it was not protected, I did get this: "sg_format failed: Data protect". So it seems there is some sort of protection, but how do I clear it? I will try the RAID controller next; I need to open the machine for RAM removal anyway.
  23. This does not seem to be the case; there is no PSID on the drive and smartctl doesn't show protection:

      root@GSA:~# smartctl -i /dev/sdaa
      smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.10.28-Unraid] (local build)
      Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

      === START OF INFORMATION SECTION ===
      Vendor: SEAGATE
      Product: ST91000642SS
      Revision: ASF9
      Compliance: SPC-4
      User Capacity: 1,000,204,886,016 bytes [1.00 TB]
      Logical block size: 512 bytes
      Rotation Rate: 7200 rpm
      Form Factor: 2.5 inches
      Logical Unit id: 0x5000c50055ca49d3
      Serial number: 9XG36YA6
      Device type: disk
      Transport protocol: SAS (SPL-3)
      Local Time is: Mon Jul 19 12:48:33 2021 CDT
      SMART support is: Available - device has SMART capability.
      SMART support is: Enabled
      Temperature Warning: Enabled
  24. Great call-out - is there any way for me to check from within Unraid? They're new, so wiping them is ok also.
  25. Diagnostics attached - I have 6 brand new blank drives that are Dell Seagates. They won't attach, mount, or select. gsa-diagnostics-20210719-0249.zip