Sinopsis

Members
  • Posts: 22

Everything posted by Sinopsis

  1. I was able to solve this by starting another container with the version of MySQL shown in the log file, then connecting to that container and shutting MySQL down safely with the following command:
       mysqladmin shutdown -p
     Then restart your other container with the latest tag or whatever tag you normally use.
  2. It's an old Supermicro rackmount server, so that shouldn't be a problem.
  3. So I just stop the array, pull the 2 parity drives, replace them, and start the array?
  4. If I'm OK taking the risk, can I just pull both my parity drives, throw new ones in, and let it rebuild them both at the same time?
  5. My trial is expired. What is the process for moving everything to a new USB before purchasing a license? Will anything be lost?
  6. 2 VMs are currently active. One is Windows Server 2019 and the other is Home Assistant (HassOS).
  7. Was watching system log this time when it crashed... This was in it, and the console is a little different this time:
       Jul 9 23:29:17 SERVER1 kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
       Jul 9 23:29:17 SERVER1 kernel: PGD 0 P4D 0
       Jul 9 23:29:17 SERVER1 kernel: Oops: 0000 [#1] SMP PTI
       Jul 9 23:29:17 SERVER1 kernel: CPU: 5 PID: 3593 Comm: CPU 10/KVM Tainted: G W O 4.19.107-Unraid #1
       Jul 9 23:29:17 SERVER1 kernel: Hardware name: Supermicro X9DRH-7TF/7F/iTF/iF/X9DRH-7TF/7F/iTF/iF, BIOS 3.3 07/13/2018
       Jul 9 23:29:17 SERVER1 kernel: RIP: 0010:drop_spte+0x4b/0x78 [kvm]
       Jul 9 23:29:17 SERVER1 kernel: Code: 4c 01 e0 72 09 ba ff ee 00 00 48 c1 e2 1f 48 01 d0 ba f5 ff 7f 00 4c 89 e6 48 c1 e8 0c 48 c1 e2 29 48 c1 e0 06 48 8b 54 10 28 <48> 2b 72 40 48 89 d7 48 c1 fe 03 e8 63 d6 ff ff 48 89 ef 48 89 c6
       Jul 9 23:29:17 SERVER1 kernel: RSP: 0018:ffffc9000ce53c50 EFLAGS: 00010202
       Jul 9 23:29:17 SERVER1 kernel: RAX: 000000007f20a640 RBX: ffffc900243250e0 RCX: 0000000000000000
       Jul 9 23:29:17 SERVER1 kernel: RDX: 0000000000000000 RSI: ffff889fc8299668 RDI: 7fffc4408733186c
       Jul 9 23:29:17 SERVER1 kernel: RBP: ffffc9000cb14000 R08: 0000000000000001 R09: 0000000000000000
       Jul 9 23:29:17 SERVER1 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff889fc8299668
       Jul 9 23:29:17 SERVER1 kernel: R13: 0000000000000000 R14: ffff8884a1450000 R15: ffff8884a1450008
       Jul 9 23:29:17 SERVER1 kernel: FS: 0000152a383ff700(0000) GS:ffff889fff940000(0000) knlGS:0000000000000000
       Jul 9 23:29:17 SERVER1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       Jul 9 23:29:17 SERVER1 kernel: CR2: 0000000000000040 CR3: 0000000124c1e005 CR4: 00000000000626e0
       Jul 9 23:29:17 SERVER1 kernel: Call Trace:
       Jul 9 23:29:17 SERVER1 kernel: kvm_zap_rmapp+0x3a/0x5e [kvm]
       Jul 9 23:29:17 SERVER1 kernel: ? kvm_io_bus_read+0x43/0xcc [kvm]
       Jul 9 23:29:17 SERVER1 kernel: kvm_unmap_rmapp+0x5/0x9 [kvm]
       Jul 9 23:29:17 SERVER1 kernel: kvm_handle_hva_range+0x11c/0x159 [kvm]
       Jul 9 23:29:17 SERVER1 kernel: ? kvm_zap_rmapp+0x5e/0x5e [kvm]
       Jul 9 23:29:17 SERVER1 kernel: kvm_mmu_notifier_invalidate_range_start+0x49/0x8f [kvm]
       Jul 9 23:29:17 SERVER1 kernel: __mmu_notifier_invalidate_range_start+0x78/0xc9
       Jul 9 23:29:17 SERVER1 kernel: change_protection+0x300/0x879
       Jul 9 23:29:17 SERVER1 kernel: change_prot_numa+0x13/0x22
       Jul 9 23:29:17 SERVER1 kernel: task_numa_work+0x20b/0x2b5
       Jul 9 23:29:17 SERVER1 kernel: task_work_run+0x77/0x88
       Jul 9 23:29:17 SERVER1 kernel: exit_to_usermode_loop+0x4b/0xa2
       Jul 9 23:29:17 SERVER1 kernel: do_syscall_64+0xdf/0xf2
       Jul 9 23:29:17 SERVER1 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
       Jul 9 23:29:17 SERVER1 kernel: RIP: 0033:0x152a3f5e14b7
       Jul 9 23:29:17 SERVER1 kernel: Code: 00 00 90 48 8b 05 d9 29 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a9 29 0d 00 f7 d8 64 89 01 48
       Jul 9 23:29:17 SERVER1 kernel: RSP: 002b:0000152a383fe678 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
       Jul 9 23:29:17 SERVER1 kernel: RAX: 0000000000000000 RBX: 000000000000ae80 RCX: 0000152a3f5e14b7
       Jul 9 23:29:17 SERVER1 kernel: RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000001f
       Jul 9 23:29:17 SERVER1 kernel: RBP: 0000152a3988a2c0 R08: 000055c2583d0770 R09: 000000000000ffff
       Jul 9 23:29:17 SERVER1 kernel: R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
       Jul 9 23:29:17 SERVER1 kernel: R13: 0000152a3dcc0002 R14: 0000000000001072 R15: 0000000000000000
       Jul 9 23:29:17 SERVER1 kernel: Modules linked in: vhost_net tun vhost tap kvm_intel kvm cdc_acm ccp xt_CHECKSUM ipt_REJECT ip6table_mangle ip6table_nat nf_nat_ipv6 iptable_mangle ip6table_filter ip6_tables xt_nat veth macvlan ipt_MASQUERADE iptable_filter iptable_nat nf_nat_ipv4 nf_nat ip_tables xfs md_mod ixgbe(O) sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper isci ipmi_ssif intel_cstate mpt3sas nvme libsas i2c_i801 ahci raid_class pcc_cpufreq scsi_transport_sas intel_uncore i2c_core intel_rapl_perf nvme_core libahci wmi ipmi_si button [last unloaded: tun]
       Jul 9 23:29:17 SERVER1 kernel: CR2: 0000000000000040
       Jul 9 23:29:17 SERVER1 kernel: ---[ end trace 1c4b462ac4b3e0e1 ]---
       Jul 9 23:29:17 SERVER1 kernel: RIP: 0010:drop_spte+0x4b/0x78 [kvm]
       Jul 9 23:29:17 SERVER1 kernel: Code: 4c 01 e0 72 09 ba ff ee 00 00 48 c1 e2 1f 48 01 d0 ba f5 ff 7f 00 4c 89 e6 48 c1 e8 0c 48 c1 e2 29 48 c1 e0 06 48 8b 54 10 28 <48> 2b 72 40 48 89 d7 48 c1 fe 03 e8 63 d6 ff ff 48 89 ef 48 89 c6
       Jul 9 23:29:17 SERVER1 kernel: RSP: 0018:ffffc9000ce53c50 EFLAGS: 00010202
       Jul 9 23:29:17 SERVER1 kernel: RAX: 000000007f20a640 RBX: ffffc900243250e0 RCX: 0000000000000000
       Jul 9 23:29:17 SERVER1 kernel: RDX: 0000000000000000 RSI: ffff889fc8299668 RDI: 7fffc4408733186c
       Jul 9 23:29:17 SERVER1 kernel: RBP: ffffc9000cb14000 R08: 0000000000000001 R09: 0000000000000000
       Jul 9 23:29:17 SERVER1 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff889fc8299668
       Jul 9 23:29:17 SERVER1 kernel: R13: 0000000000000000 R14: ffff8884a1450000 R15: ffff8884a1450008
       Jul 9 23:29:17 SERVER1 kernel: FS: 0000152a383ff700(0000) GS:ffff889fff940000(0000) knlGS:0000000000000000
       Jul 9 23:29:17 SERVER1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       Jul 9 23:29:17 SERVER1 kernel: CR2: 0000000000000040 CR3: 0000000124c1e005 CR4: 00000000000626e0
  8. Not sure if this is somehow related, but twice today while mover was running, I started getting tons of errors like this:
       Jul 9 17:00:59 SERVER1 move: move: create_parent: /mnt/cache/media/Movies/The Fifth Element (1997) (PG-13)/extrafanart error: Read-only file system
  9. For sure... HyperV is rather lacking, although, to be fair, if it had USB pass-through, I probably would have just left it as a Windows box on a RAID10 volume; I'm much more comfortable with M$. No, it has the most current BIOS update, from 7/2017, and I think the only thing that update addressed was the Spectre vulnerability. I'll try moving it off 0,12 and see if it's more stable. If it crashes again, I'll swap the USB and disks to the 2nd box and move that box's components to this box to see if I experience the same behavior. If so, I'll try disabling IOMMU (not familiar with that).
  10. I pulled a pair of these out of our datacenter and brought them home: https://www.supermicro.com/products/motherboard/Xeon/C600/X9DRH-7F.cfm They were rock solid as our HyperV hypervisors for several years with no issues. The only difference I can think of is that I've flashed the onboard LSI 2208 to be a 2308 HBA instead.
  11. I had crashes before with the default path (on the cache mount), but couldn't get the console to come up via IPMI in the previous crashes, so I was unable to see the call stack. This is the first time it's crashed and I was able to not only see the console, but interact with it. I could log in and use the CLI, but had no network connectivity. I couldn't shut down the VM gracefully or even force it to shut down. I hate trying to troubleshoot problems that I can't reproduce to test.
  12. OK, I've unselected CPU 0/12 from the VM. The crashes are pretty random and don't seem to follow any pattern that I can see. Unrelated: should we also try to prevent Docker from running on 0/12?
  13. No, I'm not trying to pass it through. I just have my VM storage set to the unassigned device that happens to be that PCIe NVMe drive. In my case, that's: /mnt/disks/VirtualMachines/
  14. Update: If I'm reading this correctly:
       root@SERVER1:/sys# lscpu --all --extended
       CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE MAXMHZ    MINMHZ
       0   0    0      0    0:0:0:0       yes    2500.0000 1200.0000
       1   0    0      1    1:1:1:0       yes    2500.0000 1200.0000
       2   0    0      2    2:2:2:0       yes    2500.0000 1200.0000
       3   0    0      3    3:3:3:0       yes    2500.0000 1200.0000
       4   0    0      4    4:4:4:0       yes    2500.0000 1200.0000
       5   0    0      5    5:5:5:0       yes    2500.0000 1200.0000
       6   1    1      6    6:6:6:1       yes    2500.0000 1200.0000
       7   1    1      7    7:7:7:1       yes    2500.0000 1200.0000
       8   1    1      8    8:8:8:1       yes    2500.0000 1200.0000
       9   1    1      9    9:9:9:1       yes    2500.0000 1200.0000
       10  1    1      10   10:10:10:1    yes    2500.0000 1200.0000
       11  1    1      11   11:11:11:1    yes    2500.0000 1200.0000
       12  0    0      0    0:0:0:0       yes    2500.0000 1200.0000
       13  0    0      1    1:1:1:0       yes    2500.0000 1200.0000
       14  0    0      2    2:2:2:0       yes    2500.0000 1200.0000
       15  0    0      3    3:3:3:0       yes    2500.0000 1200.0000
       16  0    0      4    4:4:4:0       yes    2500.0000 1200.0000
       17  0    0      5    5:5:5:0       yes    2500.0000 1200.0000
       18  1    1      6    6:6:6:1       yes    2500.0000 1200.0000
       19  1    1      7    7:7:7:1       yes    2500.0000 1200.0000
       20  1    1      8    8:8:8:1       yes    2500.0000 1200.0000
       21  1    1      9    9:9:9:1       yes    2500.0000 1200.0000
       22  1    1      10   10:10:10:1    yes    2500.0000 1200.0000
       23  1    1      11   11:11:11:1    yes    2500.0000 1200.0000
       root@SERVER1:/sys#
     Then the logical cpu selection corresponds to:
       0,12, 1,13, 2,14, 3,15, 4,16, 5,17 are physical cpu #1
       6,18, 7,19, 8,20, 9,21, 10,22, 11,23 are physical cpu #2
     Which makes sense, but shoots a hole in my theory about the pcie bus
  15. I feel pretty confident that the lockups have to do with the VMs. I rebuilt this box and right now only have one VM on it. I see kvm references in the call stack in the crash information. My first thought is that the storage the VM is on might be plugged into a PCIe lane that is connected to a different physical CPU. It's on an Intel i750 PCIe NVMe drive plugged into PCIe slot 2, which, according to the diagram on page 1-4 of this manual: https://www.supermicro.com/manuals/motherboard/C606_602/MNL-1306.pdf should be CPU1. In the attached "capture.png", which CPUs might be physical CPU 1 and which might be physical CPU 2?
  16. I see the benefits, but as someone who primarily deals with enterprise systems, I prefer to have direct support for products I pay for, even if I have to pay more. Maybe I'm being overly critical because I'm frustrated and having so many issues with the system (besides the one I posted about, which is more of an annoyance than anything). Random hard locks: I have to power cycle the server to get it back. I just pulled this server out of our datacenter, where it was one of our primary hypervisors and had been rock solid for years. Active Directory integration seems completely broken. Every time it reboots it shows as "unjoined", and the logs are full of "root: chown: invalid user: 'Domain Admins:Domain Users'" errors when it finally does show joined. I've got about 10 days left on this trial, and at this point I'm considering scrapping it completely and just using Proxmox and the hardware RAID controller. I liked the idea of not having to have all 28 disks (2 different servers, same specs) spun up the majority of the time, which is why I was even looking at this.
  17. Got it, so no support from the actual company that charges for this... Seems like all of you are working for free and they're reaping all the benefits.
  18. Maybe, but that is ALSO NOT WHAT I POSTED ABOUT. I posted asking why the installer is only partitioning half of my flash drive. Is this the level of support I should expect if I decide to purchase a license?
  19. Reformatting it might solve the fat_free_clusters error, but it won't solve the issue I posted about, nor would it explain why I see the same behavior on two different machines with 2 different flash drives.
  20. Attached homeserver-diagnostics-20200626-1419.zip
  21. It's been shown in unassigned devices since the initial install, on both servers. The servers only have USB 2.0 ports (they're older Supermicro servers).
  22. I've been running the trial now for about 2 weeks on a brand new https://amzn.to/2Yz2Amc and I'm getting flash write errors. It's also only showing that it's 16GB in unassigned disks, but the fdisk -l output is below:
       Disk /dev/sda: 28.67 GiB, 30765219840 bytes, 60088320 sectors
       Disk model: Cruzer Fit
       Units: sectors of 1 * 512 = 512 bytes
       Sector size (logical/physical): 512 bytes / 512 bytes
       I/O size (minimum/optimal): 512 bytes / 512 bytes
       Disklabel type: dos
       Disk identifier: 0x00000000

       Device     Boot Start      End  Sectors  Size Id Type
       /dev/sda1  *     2048 60088319 60086272 28.7G  c W95 FAT32 (LBA)
     I bought a 2nd of the same flash drive and am seeing the same thing on a second server.
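
A quick sanity check of the fdisk numbers in post 22, as a minimal Python sketch. The constants are copied from that output; the function name partition_summary and the dictionary keys are only illustrative. If the figures hold up, the partition table covers essentially the whole stick, which would suggest the 16GB reading comes from somewhere other than the partition layout. That last step is an inference, not something confirmed in the thread.

```python
# Figures copied from the fdisk -l output in post 22.
SECTOR_SIZE = 512            # bytes per logical sector
DISK_BYTES = 30765219840     # total size reported for /dev/sda
DISK_SECTORS = 60088320      # total sectors reported for /dev/sda
PART_START = 2048            # first sector of /dev/sda1
PART_END = 60088319          # last sector of /dev/sda1 (inclusive)
PART_SECTORS = 60086272      # sector count reported for /dev/sda1


def partition_summary():
    """Cross-check the fdisk figures and return a small summary dict."""
    # End is inclusive, so the sector count is end - start + 1.
    computed_sectors = PART_END - PART_START + 1
    return {
        "disk_bytes_match": DISK_SECTORS * SECTOR_SIZE == DISK_BYTES,
        "partition_sectors_match": computed_sectors == PART_SECTORS,
        "ends_on_last_sector": PART_END == DISK_SECTORS - 1,
        "unused_sectors_before_partition": PART_START,
        "partition_bytes": PART_SECTORS * SECTOR_SIZE,
        "partition_share_of_disk": PART_SECTORS / DISK_SECTORS,
    }


if __name__ == "__main__":
    for key, value in partition_summary().items():
        print(f"{key}: {value}")
```

The only space outside /dev/sda1 is the 2048-sector gap at the start of the disk, so nothing in this table accounts for half the drive going missing.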