billchurch

Members
  • Posts: 11

Everything posted by billchurch

  1. Same here. It seems unnecessary and is just noise that makes it difficult to find real problems in the log.
  2. Nine days of uptime, no issues so far. I think I'm now going to add Docker back into the mix and see what happens. The log files have been clear, with no more kernel entries...
  3. This is irritating and embarrassing. I know I checked that before, and when @John_M mentioned it I did a double-take, and sure enough there's an update. Now I just have to figure out how to make the update stick, lol. Thanks for calling that out, guys, I'll give that a shot. Also... if this **is** the issue, sorry for wasting everyone's time.
  4. Okay, so I just decided to disable Docker and KVM altogether. KVM had a virtual machine defined, but it wasn't powered on or being used. The longest uptime I've had so far has been 8 days, so I guess I'll see what I get now. I'm a little bit discouraged, as this platform is made to be a NAS platform. I can't find others with this problem AFAIK, so maybe it is bad hardware, but it's been a pretty painful thing to try and diagnose.
  5. Oh, for sure. It has the most recent BIOS update, and the board is supposedly supported until 2020. Is it possible that the macvlan stuff is making things unstable? I can't remember when I actually enabled that. I'm going to disable Unifi (since that's the only thing that's really using it) and see if that helps. I plan on running a full memtest tonight. I let it go through about 7% (5-6 tests, IIRC) but would like to see a full run.
  6. Yeah, it seems like that's not the problem. Farther down I'm seeing this, which is the most concerning: "Bad page map". It seems almost like it's pointing to memory, but I'm running ECC and not seeing any ECC-related errors, so I'm not sure. This thing, with the same hardware configuration, had been stable for months...
     Nov 20 08:29:02 nas kernel: BUG: Bad page map in process php-fpm7 pte:ffff880f7acfd9b8 pmd:56ef73067
     Nov 20 08:29:02 nas kernel: addr:000000000a2169c4 vm_flags:00000075 anon_vma: (null) mapping:000000005445b8e9 index:85
     Nov 20 08:29:02 nas kernel: file:xmlreader.so fault:filemap_fault mmap:btrfs_file_mmap readpage:btrfs_readpage
     Nov 20 08:29:02 nas kernel: CPU: 6 PID: 6559 Comm: php-fpm7 Tainted: G B D W 4.18.17-unRAID #1
     Nov 20 08:29:02 nas kernel: Hardware name: Supermicro A1SAi/A1SRi, BIOS 1.1a 08/27/2015
     Nov 20 08:29:02 nas kernel: Call Trace:
     Nov 20 08:29:02 nas kernel: dump_stack+0x5d/0x79
     Nov 20 08:29:02 nas kernel: print_bad_pte+0x212/0x22f
     Nov 20 08:29:02 nas kernel: _vm_normal_page+0x50/0xa6
     Nov 20 08:29:02 nas kernel: unmap_page_range+0x4b6/0x88a
     Nov 20 08:29:02 nas kernel: unmap_vmas+0x4b/0x7f
     Nov 20 08:29:02 nas kernel: exit_mmap+0xc8/0x16a
     Nov 20 08:29:02 nas kernel: ? wake_bit_function+0x1/0x20
     Nov 20 08:29:02 nas kernel: mmput+0x4d/0xe5
     Nov 20 08:29:02 nas kernel: do_exit+0x3a4/0x8a4
     Nov 20 08:29:02 nas kernel: ? dput.part.5+0xdf/0xea
     Nov 20 08:29:02 nas kernel: do_group_exit+0x9a/0x9a
     Nov 20 08:29:02 nas kernel: get_signal+0x417/0x44c
     Nov 20 08:29:02 nas kernel: ? wait_woken+0x68/0x68
     Nov 20 08:29:02 nas kernel: do_signal+0x31/0x59d
     Nov 20 08:29:02 nas kernel: ? inet_accept+0x3e/0x127
     Nov 20 08:29:02 nas kernel: ? put_unused_fd+0x31/0x40
     Nov 20 08:29:02 nas kernel: ? __do_page_fault+0x379/0x40b
     Nov 20 08:29:02 nas kernel: exit_to_usermode_loop+0x25/0x96
     Nov 20 08:29:02 nas kernel: do_syscall_64+0xdf/0xe6
     Nov 20 08:29:02 nas kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
     Nov 20 08:29:02 nas kernel: RIP: 0033:0x14ed7bb025e4
     Nov 20 08:29:02 nas kernel: Code: Bad RIP value.
     Nov 20 08:29:02 nas kernel: RSP: 002b:00007fff4e407778 EFLAGS: 00000246 ORIG_RAX: 000000000000002b
     Nov 20 08:29:02 nas kernel: RAX: fffffffffffffe00 RBX: 000014ed7bd3eb88 RCX: 000014ed7bb025e4
     Nov 20 08:29:02 nas kernel: RDX: 00007fff4e4077f0 RSI: 00007fff4e4077f8 RDI: 0000000000000008
     Nov 20 08:29:02 nas kernel: RBP: 000000000000002b R08: 0000000000000000 R09: 0000000000000000
     Nov 20 08:29:02 nas kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
     Nov 20 08:29:02 nas kernel: R13: 00007fff4e407f08 R14: 0000000000000000 R15: 0000000000000000
     Nov 20 08:29:02 nas kernel: BUG: Bad page map in process php-fpm7 pte:ffff880f7acc5998 pmd:6b1c61067
     Nov 20 08:29:02 nas kernel: addr:00000000c2da8294 vm_flags:00000075 anon_vma: (null) mapping:000000002c2a5265 index:15e
     Nov 20 08:29:02 nas kernel: file:libldap_r-2.4.so.2.10.9 fault:filemap_fault mmap:btrfs_file_mmap readpage:btrfs_readpage
     Nov 20 08:29:02 nas kernel: CPU: 7 PID: 6557 Comm: php-fpm7 Tainted: G B D W 4.18.17-unRAID #1
     Nov 20 08:29:02 nas kernel: Hardware name: Supermicro A1SAi/A1SRi, BIOS 1.1a 08/27/2015
     Nov 20 08:29:02 nas kernel: Call Trace:
     Nov 20 08:29:02 nas kernel: dump_stack+0x5d/0x79
     Nov 20 08:29:02 nas kernel: print_bad_pte+0x212/0x22f
     Nov 20 08:29:02 nas kernel: _vm_normal_page+0x50/0xa6
     Nov 20 08:29:02 nas kernel: unmap_page_range+0x4b6/0x88a
     Nov 20 08:29:02 nas kernel: unmap_vmas+0x4b/0x7f
     Nov 20 08:29:02 nas kernel: exit_mmap+0xc8/0x16a
     Nov 20 08:29:02 nas kernel: mmput+0x4d/0xe5
     Nov 20 08:29:02 nas kernel: do_exit+0x3a4/0x8a4
     Nov 20 08:29:02 nas kernel: ? handle_mm_fault+0x159/0x1a8
     Nov 20 08:29:02 nas kernel: do_group_exit+0x9a/0x9a
     Nov 20 08:29:02 nas kernel: __x64_sys_exit_group+0xf/0xf
     Nov 20 08:29:02 nas kernel: do_syscall_64+0x57/0xe6
     Nov 20 08:29:02 nas kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
     Nov 20 08:29:02 nas kernel: RIP: 0033:0x14ed7bacbf9a
     Nov 20 08:29:02 nas kernel: Code: Bad RIP value.
     Nov 20 08:29:02 nas kernel: RSP: 002b:00007fff4e4075f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
     Nov 20 08:29:02 nas kernel: RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 000014ed7bacbf9a
     Nov 20 08:29:02 nas kernel: RDX: 00000000000004ca RSI: 0000000000000000 RDI: 0000000000000000
     Nov 20 08:29:02 nas kernel: RBP: 00000000000004ca R08: 00000000000000ca R09: f270ebec9d813dc2
     Nov 20 08:29:02 nas kernel: R10: 000055661af14178 R11: 0000000000000246 R12: 000055661afccf40
     Nov 20 08:29:02 nas kernel: R13: 00007fff4e407728 R14: 0000000000000001 R15: 000055661afcc640
  7. My syslog from today is attached. It seems like this was the first occurrence this morning:
     Nov 20 05:01:50 nas kernel: WARNING: CPU: 6 PID: 23287 at net/netfilter/nf_conntrack_core.c:763 __nf_conntrack_confirm+0x96/0x4fc
     Nov 20 05:01:50 nas kernel: Modules linked in: xt_CHECKSUM iptable_mangle ipt_REJECT ebtable_filter ebtables ip6table_filter ip6_tables vhost_net tun vhost tap xt_nat macvlan ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 iptable_filter ip_tables nf_nat xfs nfsd lockd grace sunrpc md_mod ipmi_devintf bonding igb i2c_algo_bit intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper mpt3sas ahci libahci intel_cstate raid_class scsi_transport_sas ipmi_ssif i2c_i801 i2c_core pcc_cpufreq button ipmi_si acpi_cpufreq [last unloaded: i2c_algo_bit]
     Nov 20 05:01:50 nas kernel: CPU: 6 PID: 23287 Comm: kworker/6:1 Tainted: G B D W 4.18.17-unRAID #1
     Nov 20 05:01:50 nas kernel: Hardware name: Supermicro A1SAi/A1SRi, BIOS 1.1a 08/27/2015
     Nov 20 05:01:50 nas kernel: Workqueue: events macvlan_process_broadcast [macvlan]
     Nov 20 05:01:50 nas kernel: RIP: 0010:__nf_conntrack_confirm+0x96/0x4fc
     Nov 20 05:01:50 nas kernel: Code: c1 ed 20 89 2c 24 e8 26 f7 ff ff 8b 54 24 04 89 ef 89 c6 41 89 c5 e8 bc f8 ff ff 84 c0 75 b9 49 8b 86 80 00 00 00 a8 08 74 02 <0f> 0b 4c 89 f7 e8 04 ff ff ff 49 8b 86 80 00 00 00 0f ba e0 09 73
     Nov 20 05:01:50 nas kernel: RSP: 0018:ffff880fefd83d30 EFLAGS: 00010202
     Nov 20 05:01:50 nas kernel: RAX: 0000000000000188 RBX: ffff880d928d3200 RCX: 0000000000000101
     Nov 20 05:01:50 nas kernel: RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffffff81e092a0
     Nov 20 05:01:50 nas kernel: RBP: 000000000000d9d7 R08: 0000000009d52fd0 R09: 0000000000000000
     Nov 20 05:01:50 nas kernel: R10: 0000000000000000 R11: ffff880d50e60000 R12: ffffffff81e8ccc0
     Nov 20 05:01:50 nas kernel: R13: 0000000000006b28 R14: ffff880d9c5d63c0 R15: ffff880d9c5d6418
     Nov 20 05:01:50 nas kernel: FS: 0000000000000000(0000) GS:ffff880fefd80000(0000) knlGS:0000000000000000
     Nov 20 05:01:50 nas kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
     Nov 20 05:01:50 nas kernel: CR2: 0000146bfa86f000 CR3: 0000000001e0a000 CR4: 00000000001006e0
     Nov 20 05:01:50 nas kernel: Call Trace:
     Nov 20 05:01:50 nas kernel: <IRQ>
     Nov 20 05:01:50 nas kernel: ipv4_confirm+0xaf/0xb7 [nf_conntrack_ipv4]
     Nov 20 05:01:50 nas kernel: nf_hook_slow+0x37/0x96
     Nov 20 05:01:50 nas kernel: ip_local_deliver+0xa7/0xd5
     Nov 20 05:01:50 nas kernel: ? inet_del_offload+0x3e/0x3e
     Nov 20 05:01:50 nas kernel: ip_rcv+0x2dc/0x317
     Nov 20 05:01:50 nas kernel: ? ip_local_deliver_finish+0x1aa/0x1aa
     Nov 20 05:01:50 nas kernel: __netif_receive_skb_core+0x6b2/0x740
     Nov 20 05:01:50 nas kernel: process_backlog+0x7e/0x116
     Nov 20 05:01:50 nas kernel: net_rx_action+0x10b/0x274
     Nov 20 05:01:50 nas kernel: __do_softirq+0xce/0x1c8
     Nov 20 05:01:50 nas kernel: do_softirq_own_stack+0x2a/0x40
     Nov 20 05:01:50 nas kernel: </IRQ>
     Nov 20 05:01:50 nas kernel: do_softirq+0x4d/0x59
     Nov 20 05:01:50 nas kernel: netif_rx_ni+0x1c/0x22
     Nov 20 05:01:50 nas kernel: macvlan_broadcast+0x10f/0x153 [macvlan]
     Nov 20 05:01:50 nas kernel: ? __switch_to_asm+0x34/0x70
     Nov 20 05:01:50 nas kernel: macvlan_process_broadcast+0xd5/0x131 [macvlan]
     Nov 20 05:01:50 nas kernel: process_one_work+0x16e/0x243
     Nov 20 05:01:50 nas kernel: ? cancel_delayed_work_sync+0xa/0xa
     Nov 20 05:01:50 nas kernel: worker_thread+0x1dc/0x2ac
     Nov 20 05:01:50 nas kernel: kthread+0x10b/0x113
     Nov 20 05:01:50 nas kernel: ? kthread_flush_work_fn+0x9/0x9
     Nov 20 05:01:50 nas kernel: ret_from_fork+0x35/0x40
     Nov 20 05:01:50 nas kernel: ---[ end trace cf2d1fc891b38b47 ]---
     oldsyslog.txt
  8. I just caught one, and I think I have the full dump now. At least this wasn't a kernel panic, but all network activity stopped. This coincided with me trying to update my Unifi docker container. In fact, Unifi was already unresponsive before I started, so I figured I'd just update it. Then docker got about halfway through, to the point where it was shutting the container down, and I lost network connectivity. I was able to console in locally (IPMI), read the syslog, and get screenshots. I copied the syslog to the USB stick and I'm waiting for it to reboot now, hoping it sticks.
  9. I've had a few fairly consistent kernel panics on most versions of UNRAID v6.6.x. I can't say exactly when they started because I didn't always notice them, and I think I initially blamed a couple on some power loss we had.
     Hardware is a Supermicro MBD-A1SRi-2758F-O Mini ITX Server Motherboard. It had been running just fine with Proxmox and with previous versions of UNRAID. I also have all drives connected to an LSISAS2008 : FWVersion(18.00.00.00), ChipRevision(0x03), BiosVersion(07.35.00.00) (SAS9211-8I 8PORT Int 6GB Sata+sas Pcie 2.0), which I believe is used by a few people here. Additionally, I have 64GB of ECC RAM. All of this had been running solid for a few months (since at least December of last year to maybe a month or so ago). I've got four 1 Gb NICs aggregated together to a Unifi switch which is configured for aggregation.
     I have IPMI on this device, but I can only see the last bits of the console. What I've captured is attached, but I fear the useful information is above those screens, from what I can tell.
     I was using AFP on this server. I saw an article saying this might cause some issues, so I've disabled AFP altogether (I was using it for TimeMachine backups). There are only a few docker containers: binhex-jenkins, duckdns, letsencrypt (not running), Netdata (not running), and unifi, all up to date. I enabled mcelog, or thought I did, with nerdtools, but I don't see it running, so I may need to do something else. I also set up CA Fix Common Problems and did everything it said to do.
     The usage on this system is pretty light. Right now it's mostly just used for backups from a couple of Proxmox servers, and their backup schedules don't seem to correlate with the panics... that I can tell.
     Is there a correct way to make sure mcelog is running and logging? What else can I do here to troubleshoot this? I just now updated to 6.6.5, and I'd like to make any other settings changes now that I have a reboot, then try to catch this thing and get some usable data.
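
For anyone digging through a syslog like the ones quoted in the posts above, here is a minimal Python sketch for finding the first occurrence of each kernel report. It is not anything Unraid ships. It assumes the standard "Mon DD HH:MM:SS host kernel: ..." line layout seen in those excerpts, and the names HEADER, report_key and summarize_kernel_reports are made up for illustration.

```python
import re
from collections import Counter

# Header line of a kernel report in the syslog layout quoted in the posts, e.g.
# "Nov 20 08:29:02 nas kernel: BUG: Bad page map in process php-fpm7 ..."
HEADER = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+\S+\s+kernel:\s+"
    r"(?P<kind>BUG|WARNING):\s+(?P<detail>.*)$"
)


def report_key(kind, detail):
    """Build a short grouping key so repeated reports collapse together."""
    if kind == "WARNING" and " at " in detail:
        # Keep the source location, drop the CPU/PID prefix.
        return "WARNING at " + detail.split(" at ", 1)[1].split()[0]
    # For BUG lines keep the first three words, dropping addresses.
    return kind + ": " + " ".join(detail.split()[:3])


def summarize_kernel_reports(lines):
    """Return (first_seen, counts) for kernel BUG/WARNING header lines."""
    first_seen = {}
    counts = Counter()
    for line in lines:
        match = HEADER.match(line.strip())
        if not match:
            continue
        key = report_key(match["kind"], match["detail"])
        counts[key] += 1
        # Only the earliest timestamp per key is recorded.
        first_seen.setdefault(key, match["stamp"])
    return first_seen, counts
```

To use it, pass an open file, for example `summarize_kernel_reports(open(path))`, where `path` points at a copy of the syslog saved off the server, and print the two results. The earliest entry is usually the one worth reading first, since later reports on an already tainted kernel may be knock-on effects.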