Sinopsis

Members

  • Posts: 22
  • Joined
  • Last visited

Sinopsis's Achievements

Noob (1/14)

Reputation: 1

  1. I was able to solve this by starting another container running the MySQL version shown in the log file, connecting to that container, and shutting MySQL down cleanly with the following command: mysqladmin shutdown -p. Then restart your other container with the latest tag (or whatever tag you prefer). Rough steps are sketched below.
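     (A minimal sketch of those steps, assuming Docker on the host. The container name "mysql-rescue", the appdata path, and the 5.7 tag are examples only; substitute the version your log file actually reports and your real data path.)

       # Start a throwaway container running the older MySQL version against the existing data directory
       docker run -d --name mysql-rescue \
         -v /mnt/user/appdata/mysql:/var/lib/mysql \
         mysql:5.7

       # Once recovery finishes, shut MySQL down cleanly, then remove the helper container
       docker exec -it mysql-rescue mysqladmin shutdown -p
       docker rm mysql-rescue

       # Recreate the normal container on the latest tag
       docker run -d --name mysql \
         -v /mnt/user/appdata/mysql:/var/lib/mysql \
         mysql:latest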
  2. It's an old Supermicro rackmount server, so that shouldn't be a problem.
  3. Just stop the array, pull the 2 parity drives, replace them, and start the array?
  4. If I'm OK taking the risk, can I just pull both my parity drives, throw new ones in, and let it rebuild them both at the same time?
  5. My trial has expired. What is the process for moving everything to a new USB before purchasing a license? Will anything be lost?
  6. Two VMs are currently active: one is Windows Server 2019 and the other is Home Assistant (HassOS).
  7. I was watching the system log this time when it crashed. This was in it, and the console is a little different this time:

     Jul 9 23:29:17 SERVER1 kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
     Jul 9 23:29:17 SERVER1 kernel: PGD 0 P4D 0
     Jul 9 23:29:17 SERVER1 kernel: Oops: 0000 [#1] SMP PTI
     Jul 9 23:29:17 SERVER1 kernel: CPU: 5 PID: 3593 Comm: CPU 10/KVM Tainted: G W O 4.19.107-Unraid #1
     Jul 9 23:29:17 SERVER1 kernel: Hardware name: Supermicro X9DRH-7TF/7F/iTF/iF/X9DRH-7TF/7F/iTF/iF, BIOS 3.3 07/13/2018
     Jul 9 23:29:17 SERVER1 kernel: RIP: 0010:drop_spte+0x4b/0x78 [kvm]
     Jul 9 23:29:17 SERVER1 kernel: Code: 4c 01 e0 72 09 ba ff ee 00 00 48 c1 e2 1f 48 01 d0 ba f5 ff 7f 00 4c 89 e6 48 c1 e8 0c 48 c1 e2 29 48 c1 e0 06 48 8b 54 10 28 <48> 2b 72 40 48 89 d7 48 c1 fe 03 e8 63 d6 ff ff 48 89 ef 48 89 c6
     Jul 9 23:29:17 SERVER1 kernel: RSP: 0018:ffffc9000ce53c50 EFLAGS: 00010202
     Jul 9 23:29:17 SERVER1 kernel: RAX: 000000007f20a640 RBX: ffffc900243250e0 RCX: 0000000000000000
     Jul 9 23:29:17 SERVER1 kernel: RDX: 0000000000000000 RSI: ffff889fc8299668 RDI: 7fffc4408733186c
     Jul 9 23:29:17 SERVER1 kernel: RBP: ffffc9000cb14000 R08: 0000000000000001 R09: 0000000000000000
     Jul 9 23:29:17 SERVER1 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff889fc8299668
     Jul 9 23:29:17 SERVER1 kernel: R13: 0000000000000000 R14: ffff8884a1450000 R15: ffff8884a1450008
     Jul 9 23:29:17 SERVER1 kernel: FS: 0000152a383ff700(0000) GS:ffff889fff940000(0000) knlGS:0000000000000000
     Jul 9 23:29:17 SERVER1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
     Jul 9 23:29:17 SERVER1 kernel: CR2: 0000000000000040 CR3: 0000000124c1e005 CR4: 00000000000626e0
     Jul 9 23:29:17 SERVER1 kernel: Call Trace:
     Jul 9 23:29:17 SERVER1 kernel: kvm_zap_rmapp+0x3a/0x5e [kvm]
     Jul 9 23:29:17 SERVER1 kernel: ? kvm_io_bus_read+0x43/0xcc [kvm]
     Jul 9 23:29:17 SERVER1 kernel: kvm_unmap_rmapp+0x5/0x9 [kvm]
     Jul 9 23:29:17 SERVER1 kernel: kvm_handle_hva_range+0x11c/0x159 [kvm]
     Jul 9 23:29:17 SERVER1 kernel: ? kvm_zap_rmapp+0x5e/0x5e [kvm]
     Jul 9 23:29:17 SERVER1 kernel: kvm_mmu_notifier_invalidate_range_start+0x49/0x8f [kvm]
     Jul 9 23:29:17 SERVER1 kernel: __mmu_notifier_invalidate_range_start+0x78/0xc9
     Jul 9 23:29:17 SERVER1 kernel: change_protection+0x300/0x879
     Jul 9 23:29:17 SERVER1 kernel: change_prot_numa+0x13/0x22
     Jul 9 23:29:17 SERVER1 kernel: task_numa_work+0x20b/0x2b5
     Jul 9 23:29:17 SERVER1 kernel: task_work_run+0x77/0x88
     Jul 9 23:29:17 SERVER1 kernel: exit_to_usermode_loop+0x4b/0xa2
     Jul 9 23:29:17 SERVER1 kernel: do_syscall_64+0xdf/0xf2
     Jul 9 23:29:17 SERVER1 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
     Jul 9 23:29:17 SERVER1 kernel: RIP: 0033:0x152a3f5e14b7
     Jul 9 23:29:17 SERVER1 kernel: Code: 00 00 90 48 8b 05 d9 29 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a9 29 0d 00 f7 d8 64 89 01 48
     Jul 9 23:29:17 SERVER1 kernel: RSP: 002b:0000152a383fe678 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
     Jul 9 23:29:17 SERVER1 kernel: RAX: 0000000000000000 RBX: 000000000000ae80 RCX: 0000152a3f5e14b7
     Jul 9 23:29:17 SERVER1 kernel: RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000001f
     Jul 9 23:29:17 SERVER1 kernel: RBP: 0000152a3988a2c0 R08: 000055c2583d0770 R09: 000000000000ffff
     Jul 9 23:29:17 SERVER1 kernel: R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
     Jul 9 23:29:17 SERVER1 kernel: R13: 0000152a3dcc0002 R14: 0000000000001072 R15: 0000000000000000
     Jul 9 23:29:17 SERVER1 kernel: Modules linked in: vhost_net tun vhost tap kvm_intel kvm cdc_acm ccp xt_CHECKSUM ipt_REJECT ip6table_mangle ip6table_nat nf_nat_ipv6 iptable_mangle ip6table_filter ip6_tables xt_nat veth macvlan ipt_MASQUERADE iptable_filter iptable_nat nf_nat_ipv4 nf_nat ip_tables xfs md_mod ixgbe(O) sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper isci ipmi_ssif intel_cstate mpt3sas nvme libsas i2c_i801 ahci raid_class pcc_cpufreq scsi_transport_sas intel_uncore i2c_core intel_rapl_perf nvme_core libahci wmi ipmi_si button [last unloaded: tun]
     Jul 9 23:29:17 SERVER1 kernel: CR2: 0000000000000040
     Jul 9 23:29:17 SERVER1 kernel: ---[ end trace 1c4b462ac4b3e0e1 ]---
     Jul 9 23:29:17 SERVER1 kernel: RIP: 0010:drop_spte+0x4b/0x78 [kvm]
     Jul 9 23:29:17 SERVER1 kernel: Code: 4c 01 e0 72 09 ba ff ee 00 00 48 c1 e2 1f 48 01 d0 ba f5 ff 7f 00 4c 89 e6 48 c1 e8 0c 48 c1 e2 29 48 c1 e0 06 48 8b 54 10 28 <48> 2b 72 40 48 89 d7 48 c1 fe 03 e8 63 d6 ff ff 48 89 ef 48 89 c6
     Jul 9 23:29:17 SERVER1 kernel: RSP: 0018:ffffc9000ce53c50 EFLAGS: 00010202
     Jul 9 23:29:17 SERVER1 kernel: RAX: 000000007f20a640 RBX: ffffc900243250e0 RCX: 0000000000000000
     Jul 9 23:29:17 SERVER1 kernel: RDX: 0000000000000000 RSI: ffff889fc8299668 RDI: 7fffc4408733186c
     Jul 9 23:29:17 SERVER1 kernel: RBP: ffffc9000cb14000 R08: 0000000000000001 R09: 0000000000000000
     Jul 9 23:29:17 SERVER1 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff889fc8299668
     Jul 9 23:29:17 SERVER1 kernel: R13: 0000000000000000 R14: ffff8884a1450000 R15: ffff8884a1450008
     Jul 9 23:29:17 SERVER1 kernel: FS: 0000152a383ff700(0000) GS:ffff889fff940000(0000) knlGS:0000000000000000
     Jul 9 23:29:17 SERVER1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
     Jul 9 23:29:17 SERVER1 kernel: CR2: 0000000000000040 CR3: 0000000124c1e005 CR4: 00000000000626e0
  8. Not sure if this is somehow related, but two times today while mover was running, I started getting tons of errors like this:

     Jul 9 17:00:59 SERVER1 move: move: create_parent: /mnt/cache/media/Movies/The Fifth Element (1997) (PG-13)/extrafanart error: Read-only file system
  9. For sure... Hyper-V is rather lacking, although, to be fair, if it had USB pass-through I probably would have just left it as a Windows box on a RAID10 volume; I'm much more comfortable with M$. No, it has the most current BIOS update, from 7/2017, and I think the only thing that update addressed was the Spectre vulnerability. I'll try moving it off 0/12 and see if it's more stable. If it crashes again, I'll swap the USB and disks to the 2nd box and move that box's components to this box to see if I experience the same behavior. If so, I'll try disabling IOMMU (I'm not familiar with that).
  10. I pulled a pair of these out of our datacenter and brought them home: https://www.supermicro.com/products/motherboard/Xeon/C600/X9DRH-7F.cfm They were rock solid as our Hyper-V hypervisors for several years with no issues. The only difference I can think of is that I've flashed the onboard LSI 2208 to a 2308 HBA instead.
  11. I had crashes before with the default path (on the cache mount), but couldn't get the console to come up via IPMI in the previous crashes, so I was unable to see the call stack. This is the first time it's crashed where I was able to not only see the console but interact with it: I could log in and use the CLI, but had no network connectivity. I couldn't shut down the VM gracefully or even force it off. I hate trying to troubleshoot problems that I can't reproduce to test.
  12. OK, I've unselected CPU 0/12 from the VM. The crashes are pretty random and don't seem to follow any pattern that I can see. Unrelated: should we also try to prevent Docker from running on 0/12? (One way to do that is sketched below.)
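     (A minimal sketch, assuming you restrict the container with Docker's own --cpuset-cpus flag; the container name "my-container" and the image name are placeholders, and the CPU list matches this 24-thread box.)

       # Keep an existing container off CPUs 0 and 12 by pinning it to every other thread
       docker update --cpuset-cpus="1-11,13-23" my-container

       # Or apply the same restriction when the container is created
       docker run -d --name my-container --cpuset-cpus="1-11,13-23" my-image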
  13. No, I'm not trying to pass it through. I just have my VM storage set to the unassigned device that happens to be that PCIe NVMe drive. In my case, that's /mnt/disks/VirtualMachines/
  14. Update: If I'm reading this correctly:

      root@SERVER1:/sys# lscpu --all --extended
      CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE MAXMHZ    MINMHZ
      0   0    0      0    0:0:0:0       yes    2500.0000 1200.0000
      1   0    0      1    1:1:1:0       yes    2500.0000 1200.0000
      2   0    0      2    2:2:2:0       yes    2500.0000 1200.0000
      3   0    0      3    3:3:3:0       yes    2500.0000 1200.0000
      4   0    0      4    4:4:4:0       yes    2500.0000 1200.0000
      5   0    0      5    5:5:5:0       yes    2500.0000 1200.0000
      6   1    1      6    6:6:6:1       yes    2500.0000 1200.0000
      7   1    1      7    7:7:7:1       yes    2500.0000 1200.0000
      8   1    1      8    8:8:8:1       yes    2500.0000 1200.0000
      9   1    1      9    9:9:9:1       yes    2500.0000 1200.0000
      10  1    1      10   10:10:10:1    yes    2500.0000 1200.0000
      11  1    1      11   11:11:11:1    yes    2500.0000 1200.0000
      12  0    0      0    0:0:0:0       yes    2500.0000 1200.0000
      13  0    0      1    1:1:1:0       yes    2500.0000 1200.0000
      14  0    0      2    2:2:2:0       yes    2500.0000 1200.0000
      15  0    0      3    3:3:3:0       yes    2500.0000 1200.0000
      16  0    0      4    4:4:4:0       yes    2500.0000 1200.0000
      17  0    0      5    5:5:5:0       yes    2500.0000 1200.0000
      18  1    1      6    6:6:6:1       yes    2500.0000 1200.0000
      19  1    1      7    7:7:7:1       yes    2500.0000 1200.0000
      20  1    1      8    8:8:8:1       yes    2500.0000 1200.0000
      21  1    1      9    9:9:9:1       yes    2500.0000 1200.0000
      22  1    1      10   10:10:10:1    yes    2500.0000 1200.0000
      23  1    1      11   11:11:11:1    yes    2500.0000 1200.0000
      root@SERVER1:/sys#

      Then the logical CPU selection corresponds to:
      0,12  1,13  2,14  3,15  4,16  5,17   are physical CPU #1
      6,18  7,19  8,20  9,21  10,22 11,23  are physical CPU #2

      Which makes sense, but shoots a hole in my theory about the PCIe bus.
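     (If it helps to double-check that pairing, the sysfs topology files show which logical CPUs share a physical core and which socket each belongs to; a quick sketch assuming the standard Linux sysfs layout:)

       # For each logical CPU, print its hyperthread siblings (e.g. "0,12" share one core)
       # and its physical package (socket) ID, matching the SOCKET column from lscpu
       for c in /sys/devices/system/cpu/cpu[0-9]*; do
         echo "$(basename $c): siblings $(cat $c/topology/thread_siblings_list), package $(cat $c/topology/physical_package_id)"
       done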
  15. I feel pretty confident that the lockups have to do with the vms. I rebuilt this box and right now only have one vm on it. I see kvm references in the call stack on the crash information. My first thought is that maybe the storage that the vm is on might be plugged into a pcie lane that is connected to different physical cpu maybe? It's on an Intel i750 PCIE NvME drive plugged into PCIE Slot 2, which according to the diagram on page 1-4 of this manual: https://www.supermicro.com/manuals/motherboard/C606_602/MNL-1306.pdf Should be CPU1 In the attached "capture.png", which cpu's might be physical cpu 1 and which might be physical cpu 2?