
Sinopsis

Members
  • Posts

    22

Posts posted by Sinopsis

  1. On 8/15/2022 at 9:53 AM, dustyken said:

    Having the same issue as @SockDust.  Anyone know how to proceed?

     

    I was able to solve this by starting another container with the MySQL version shown in the log file, connecting to that container, and shutting MySQL down safely with the following command:

     

    mysqladmin shutdown -p

     

    Then restart your original container with the latest tag (or whichever tag you prefer).
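    Putting it together, the whole recovery looked roughly like the sketch below. The image tag, container name, and data path are examples, not anything official; substitute the exact MySQL version your log file reports and the appdata path your own container uses.

    # Start a throwaway container with the MySQL version the log file reports (8.0.29 here is
    # only an example), pointed at the existing data directory (example path).
    docker run -d --name mysql-recovery \
      -v /mnt/user/appdata/mysql:/var/lib/mysql \
      mysql:8.0.29

    # Shut MySQL down cleanly inside that container (prompts for the root password).
    docker exec -it mysql-recovery mysqladmin shutdown -p

    # Remove the temporary container, then start the original container again with its usual tag.
    docker rm mysql-recovery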

  2. I was watching the system log this time when it crashed. This was in it, and the console output is a little different this time:

     

    Jul 9 23:29:17 SERVER1 kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
    Jul 9 23:29:17 SERVER1 kernel: PGD 0 P4D 0
    Jul 9 23:29:17 SERVER1 kernel: Oops: 0000 [#1] SMP PTI
    Jul 9 23:29:17 SERVER1 kernel: CPU: 5 PID: 3593 Comm: CPU 10/KVM Tainted: G W O 4.19.107-Unraid #1
    Jul 9 23:29:17 SERVER1 kernel: Hardware name: Supermicro X9DRH-7TF/7F/iTF/iF/X9DRH-7TF/7F/iTF/iF, BIOS 3.3 07/13/2018
    Jul 9 23:29:17 SERVER1 kernel: RIP: 0010:drop_spte+0x4b/0x78 [kvm]
    Jul 9 23:29:17 SERVER1 kernel: Code: 4c 01 e0 72 09 ba ff ee 00 00 48 c1 e2 1f 48 01 d0 ba f5 ff 7f 00 4c 89 e6 48 c1 e8 0c 48 c1 e2 29 48 c1 e0 06 48 8b 54 10 28 <48> 2b 72 40 48 89 d7 48 c1 fe 03 e8 63 d6 ff ff 48 89 ef 48 89 c6
    Jul 9 23:29:17 SERVER1 kernel: RSP: 0018:ffffc9000ce53c50 EFLAGS: 00010202
    Jul 9 23:29:17 SERVER1 kernel: RAX: 000000007f20a640 RBX: ffffc900243250e0 RCX: 0000000000000000
    Jul 9 23:29:17 SERVER1 kernel: RDX: 0000000000000000 RSI: ffff889fc8299668 RDI: 7fffc4408733186c
    Jul 9 23:29:17 SERVER1 kernel: RBP: ffffc9000cb14000 R08: 0000000000000001 R09: 0000000000000000
    Jul 9 23:29:17 SERVER1 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff889fc8299668
    Jul 9 23:29:17 SERVER1 kernel: R13: 0000000000000000 R14: ffff8884a1450000 R15: ffff8884a1450008
    Jul 9 23:29:17 SERVER1 kernel: FS: 0000152a383ff700(0000) GS:ffff889fff940000(0000) knlGS:0000000000000000
    Jul 9 23:29:17 SERVER1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Jul 9 23:29:17 SERVER1 kernel: CR2: 0000000000000040 CR3: 0000000124c1e005 CR4: 00000000000626e0
    Jul 9 23:29:17 SERVER1 kernel: Call Trace:
    Jul 9 23:29:17 SERVER1 kernel: kvm_zap_rmapp+0x3a/0x5e [kvm]
    Jul 9 23:29:17 SERVER1 kernel: ? kvm_io_bus_read+0x43/0xcc [kvm]
    Jul 9 23:29:17 SERVER1 kernel: kvm_unmap_rmapp+0x5/0x9 [kvm]
    Jul 9 23:29:17 SERVER1 kernel: kvm_handle_hva_range+0x11c/0x159 [kvm]
    Jul 9 23:29:17 SERVER1 kernel: ? kvm_zap_rmapp+0x5e/0x5e [kvm]
    Jul 9 23:29:17 SERVER1 kernel: kvm_mmu_notifier_invalidate_range_start+0x49/0x8f [kvm]
    Jul 9 23:29:17 SERVER1 kernel: __mmu_notifier_invalidate_range_start+0x78/0xc9
    Jul 9 23:29:17 SERVER1 kernel: change_protection+0x300/0x879
    Jul 9 23:29:17 SERVER1 kernel: change_prot_numa+0x13/0x22
    Jul 9 23:29:17 SERVER1 kernel: task_numa_work+0x20b/0x2b5
    Jul 9 23:29:17 SERVER1 kernel: task_work_run+0x77/0x88
    Jul 9 23:29:17 SERVER1 kernel: exit_to_usermode_loop+0x4b/0xa2
    Jul 9 23:29:17 SERVER1 kernel: do_syscall_64+0xdf/0xf2
    Jul 9 23:29:17 SERVER1 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
    Jul 9 23:29:17 SERVER1 kernel: RIP: 0033:0x152a3f5e14b7
    Jul 9 23:29:17 SERVER1 kernel: Code: 00 00 90 48 8b 05 d9 29 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a9 29 0d 00 f7 d8 64 89 01 48
    Jul 9 23:29:17 SERVER1 kernel: RSP: 002b:0000152a383fe678 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
    Jul 9 23:29:17 SERVER1 kernel: RAX: 0000000000000000 RBX: 000000000000ae80 RCX: 0000152a3f5e14b7
    Jul 9 23:29:17 SERVER1 kernel: RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000001f
    Jul 9 23:29:17 SERVER1 kernel: RBP: 0000152a3988a2c0 R08: 000055c2583d0770 R09: 000000000000ffff
    Jul 9 23:29:17 SERVER1 kernel: R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
    Jul 9 23:29:17 SERVER1 kernel: R13: 0000152a3dcc0002 R14: 0000000000001072 R15: 0000000000000000
    Jul 9 23:29:17 SERVER1 kernel: Modules linked in: vhost_net tun vhost tap kvm_intel kvm cdc_acm ccp xt_CHECKSUM ipt_REJECT ip6table_mangle ip6table_nat nf_nat_ipv6 iptable_mangle ip6table_filter ip6_tables xt_nat veth macvlan ipt_MASQUERADE iptable_filter iptable_nat nf_nat_ipv4 nf_nat ip_tables xfs md_mod ixgbe(O) sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper isci ipmi_ssif intel_cstate mpt3sas nvme libsas i2c_i801 ahci raid_class pcc_cpufreq scsi_transport_sas intel_uncore i2c_core intel_rapl_perf nvme_core libahci wmi ipmi_si button [last unloaded: tun]
    Jul 9 23:29:17 SERVER1 kernel: CR2: 0000000000000040
    Jul 9 23:29:17 SERVER1 kernel: ---[ end trace 1c4b462ac4b3e0e1 ]---
    Jul 9 23:29:17 SERVER1 kernel: RIP: 0010:drop_spte+0x4b/0x78 [kvm]
    Jul 9 23:29:17 SERVER1 kernel: Code: 4c 01 e0 72 09 ba ff ee 00 00 48 c1 e2 1f 48 01 d0 ba f5 ff 7f 00 4c 89 e6 48 c1 e8 0c 48 c1 e2 29 48 c1 e0 06 48 8b 54 10 28 <48> 2b 72 40 48 89 d7 48 c1 fe 03 e8 63 d6 ff ff 48 89 ef 48 89 c6
    Jul 9 23:29:17 SERVER1 kernel: RSP: 0018:ffffc9000ce53c50 EFLAGS: 00010202
    Jul 9 23:29:17 SERVER1 kernel: RAX: 000000007f20a640 RBX: ffffc900243250e0 RCX: 0000000000000000
    Jul 9 23:29:17 SERVER1 kernel: RDX: 0000000000000000 RSI: ffff889fc8299668 RDI: 7fffc4408733186c
    Jul 9 23:29:17 SERVER1 kernel: RBP: ffffc9000cb14000 R08: 0000000000000001 R09: 0000000000000000
    Jul 9 23:29:17 SERVER1 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff889fc8299668
    Jul 9 23:29:17 SERVER1 kernel: R13: 0000000000000000 R14: ffff8884a1450000 R15: ffff8884a1450008
    Jul 9 23:29:17 SERVER1 kernel: FS: 0000152a383ff700(0000) GS:ffff889fff940000(0000) knlGS:0000000000000000
    Jul 9 23:29:17 SERVER1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Jul 9 23:29:17 SERVER1 kernel: CR2: 0000000000000040 CR3: 0000000124c1e005 CR4: 00000000000626e0

     


  3. Not sure if this is somehow related, but twice today while the mover was running, I started getting tons of errors like this:

     

    Jul 9 17:00:59 SERVER1 move: move: create_parent: /mnt/cache/media/Movies/The Fifth Element (1997) (PG-13)/extrafanart error: Read-only file system
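    If anyone else hits this, a quick way I'd check whether the cache mount itself has flipped to read-only (rather than a single share) is below; /mnt/cache is just the usual cache mount point on my box, and the grep patterns are only a starting point.

    # "ro" instead of "rw" in the mount options means the filesystem remounted itself read-only.
    grep ' /mnt/cache ' /proc/mounts

    # Recent kernel messages usually say why it went read-only (I/O errors, filesystem errors, etc.).
    dmesg | grep -iE 'read-only|remount|error' | tail -n 20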

     

     

  4. 1 minute ago, jonp said:

    Ok, to be fair, Hyper-V and KVM are not anywhere close on the spectrum of hypervisors and if other underlying gear changed (including the HBA and storage), that obviously could have an impact.  What about BIOS updates?  Any available?  Another thing you could try would be to disable IOMMU in the BIOS to see if that has any impact.

    For sure, Hyper-V is rather lacking, although, to be fair, if it had USB passthrough I probably would have just left it as a Windows box on a RAID10 volume :)  I'm much more comfortable with M$.

     

    No, it already has the most current BIOS update, from 7/2017, and I think the only thing that update addressed was the Spectre vulnerability.

     

    I'll try moving it off 0,12 and see if it's more stable. If it crashes again, I'll swap the USB stick and disks to the 2nd box and move that box's components to this box to see if I experience the same behavior. If so, I'll try disabling IOMMU (not familiar with that); some notes to myself on that are below.
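    (Notes to self on the IOMMU option: besides the BIOS toggle, it looks like the IOMMU can also be disabled with a kernel boot parameter. On Unraid that would presumably mean editing the append line in the syslinux config on the flash drive; the path and exact layout below are my assumption, not something confirmed in this thread.)

    # /boot/syslinux/syslinux.cfg (assumed location of the Unraid boot menu config)
    label Unraid OS
      menu default
      kernel /bzimage
      append intel_iommu=off initrd=/bzroot
    # intel_iommu=off disables the IOMMU on Intel platforms; remove it again to re-enable.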

  5. 1 minute ago, jonp said:

    Wow, that's pretty concerning.  If there is no hardware pass-through happening and you're getting these kinds of crashes, it leads me to believe there's a buggy BIOS on your hardware.  What is the underlying hardware on this system?

     

    I pulled a pair of these out of our datacenter and brought them home:

     

    https://www.supermicro.com/products/motherboard/Xeon/C600/X9DRH-7F.cfm

     

    They were rock solid as our Hyper-V hypervisors for several years with no issues.

     

    The only difference I can think of is that I've flashed the onboard LSI 2208 to act as a 2308 HBA instead.

  6. 2 minutes ago, jonp said:

    Ok, what happens if you path the storage to something other than that PCIe NVMe Unassigned Device?  Again, the goal here is to narrow down the root cause or what combination is causing it.

     

    Another thing you could try would be changing the Machine Type or the BIOS type to see if that has an effect.

    I had crashes before with the default path (on the cache mount), but couldn't get the console to come up via IPMI in the previous crashes, so I was unable to see the call stack. This is the first time it's crashed where I was able to not only see the console but interact with it: I could log in and use the CLI, but had no network connectivity. I couldn't shut down the VM gracefully or even force-stop it.

     

    I hate trying to troubleshoot problems that I can't reproduce to test :/

  7. 3 minutes ago, Jerky_san said:

    Spin locks, I believe, are when something just constantly sits there waiting on something. I can see you're allocating core 0/12. I would say don't do that, because Unraid will ALWAYS use core 0/12 even when you try isolating it. It just doesn't work, so I'd highly suggest removing that one. I'm still researching, but who knows, it might fix it ^_^ lol

    OK, I've unselected CPUs 0/12 from the VM.  The crashes are pretty random and don't seem to follow any pattern that I can see.

     

    Unrelated, but should we also try to prevent Docker from running on 0/12? Something like the sketch below, maybe?
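    In plain Docker terms (outside of whatever per-container pinning the Unraid UI does), I'm picturing something like this; the container name and image are placeholders, and the CPU list assumes a 24-thread box where 0 and 12 are the pair to avoid.

    # Restrict a container to every CPU except 0 and 12.
    docker run -d --name example-app \
      --cpuset-cpus="1-11,13-23" \
      example/image:latest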

  8. 3 minutes ago, jonp said:

    Hi there,

     

    Are you trying to pass through the NVMe drive to the VM directly?  If so, try not doing that and see if you can reproduce the lockup.  If so, then the issue stems from the underlying hardware/VM configuration.  If the issue goes away, then you know it's isolated to that PCIe device.

    No, I'm not trying to pass it through.  I just have my VM storage set to the unassigned device that happens to be that PCIe NVMe drive.  In my case, that's /mnt/disks/VirtualMachines/

     

     

     

  9. Update:

     

    If I'm reading this correctly:

     

    root@SERVER1:/sys# lscpu --all --extended
    CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE    MAXMHZ    MINMHZ
      0    0      0    0 0:0:0:0          yes 2500.0000 1200.0000
      1    0      0    1 1:1:1:0          yes 2500.0000 1200.0000
      2    0      0    2 2:2:2:0          yes 2500.0000 1200.0000
      3    0      0    3 3:3:3:0          yes 2500.0000 1200.0000
      4    0      0    4 4:4:4:0          yes 2500.0000 1200.0000
      5    0      0    5 5:5:5:0          yes 2500.0000 1200.0000
      6    1      1    6 6:6:6:1          yes 2500.0000 1200.0000
      7    1      1    7 7:7:7:1          yes 2500.0000 1200.0000
      8    1      1    8 8:8:8:1          yes 2500.0000 1200.0000
      9    1      1    9 9:9:9:1          yes 2500.0000 1200.0000
     10    1      1   10 10:10:10:1       yes 2500.0000 1200.0000
     11    1      1   11 11:11:11:1       yes 2500.0000 1200.0000
     12    0      0    0 0:0:0:0          yes 2500.0000 1200.0000
     13    0      0    1 1:1:1:0          yes 2500.0000 1200.0000
     14    0      0    2 2:2:2:0          yes 2500.0000 1200.0000
     15    0      0    3 3:3:3:0          yes 2500.0000 1200.0000
     16    0      0    4 4:4:4:0          yes 2500.0000 1200.0000
     17    0      0    5 5:5:5:0          yes 2500.0000 1200.0000
     18    1      1    6 6:6:6:1          yes 2500.0000 1200.0000
     19    1      1    7 7:7:7:1          yes 2500.0000 1200.0000
     20    1      1    8 8:8:8:1          yes 2500.0000 1200.0000
     21    1      1    9 9:9:9:1          yes 2500.0000 1200.0000
     22    1      1   10 10:10:10:1       yes 2500.0000 1200.0000
     23    1      1   11 11:11:11:1       yes 2500.0000 1200.0000
    root@SERVER1:/sys# 

    Then the logical CPU selection corresponds to:

     

    0,12, 1,13, 2,14, 3,15, 4,16, 5,17 are physical CPU #1

    6,18, 7,19, 8,20, 9,21, 10,22, 11,23 are physical CPU #2

     

    Which makes sense, but shoots a hole in my theory about the PCIe bus :(
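    A quicker way to double-check that pairing than eyeballing the lscpu table is to ask sysfs for each CPU's hyperthread siblings; this is standard kernel sysfs, nothing Unraid-specific:

    # Print each logical CPU alongside its hyperthread sibling(s).
    for cpu in /sys/devices/system/cpu/cpu[0-9]*; do
      echo "$(basename $cpu): $(cat $cpu/topology/thread_siblings_list)"
    done
    # e.g. "cpu0: 0,12" would confirm that 0 and 12 are two threads of the same physical core.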

  10. I feel pretty confident that the lockups have to do with the VMs.  I rebuilt this box and right now have only one VM on it.  I see KVM references in the call stack in the crash information.

     

    My first thought is that the storage the VM is on might be plugged into a PCIe lane that is connected to a different physical CPU.

     

    It's on an Intel 750 PCIe NVMe drive plugged into PCIe slot 2, which, according to the diagram on page 1-4 of this manual: https://www.supermicro.com/manuals/motherboard/C606_602/MNL-1306.pdf  should be CPU1.

     

    In the attached "Capture.PNG", which CPUs might be physical CPU 1 and which might be physical CPU 2? (A sysfs check that might answer this directly is sketched below.)

     


    Capture.PNG
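    For reference, PCI devices expose which NUMA node (i.e. which physical CPU's root complex) they hang off of in sysfs, which may be easier than tracing the slot diagram. The device address below is an example, not the actual address on my board:

    # Find the NVMe drive's PCI address, then ask which NUMA node it is attached to.
    lspci | grep -i 'non-volatile'
    cat /sys/bus/pci/devices/0000:02:00.0/numa_node
    # 0 would mean CPU1, 1 would mean CPU2; -1 means the firmware didn't report it.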

  11. I see the benefits, but as someone who primarily deals with enterprise systems, I prefer to have direct support for products I pay for, even if I have to pay more. 

     

    Maybe I'm being overly critical because I'm frustrated by having so many issues with the system (besides the one I posted about, which is more of an annoyance than anything).

     

    Random hard locks: I have to power-cycle the server to get it back.  I just pulled this server out of our datacenter, where it was one of our primary hypervisors and had been rock solid for years.

     

    Active Directory integration seems completely broken.  Every time it reboots it shows as "unjoined", and the logs are full of "root: chown: invalid user: 'Domain Admins:Domain Users'" errors when it finally does show as joined.

     

    I've got about 10 days left on this trial, and at this point I'm considering scrapping it completely and just using Proxmox with the hardware RAID controller.  I liked the idea of not having all 28 disks (2 different servers, same specs) spun up the majority of the time, which is why I was looking at this in the first place.

  12. 5 hours ago, johnnie.black said:

    That would be better asked on the UD support thread, but no Unraid flash drive should ever appear as an unassigned device.

    Maybe, but that is ALSO NOT WHAT I POSTED ABOUT.  I posted asking why the installer is only partitioning half of my flash drive.  Is this the level of support I should expect if I decide to purchase a license?  

    Reformatting it might solve the fat_free_clusters error, but it won't solve the issue I posted about, nor would it explain why I see the same behavior on two different machines with two different flash drives.

  14. 1 hour ago, trurl said:

    If your boot flash is showing up in Unassigned Devices it has already disconnected. 

     

    Make sure you are booting from USB2 port. 

    It's been showing in Unassigned Devices since the initial install, on both servers.  The servers only have USB 2.0 ports (they're older Supermicro servers).

     

  15. I've been running the trial for about 2 weeks now on a brand new https://amzn.to/2Yz2Amc

     

    I'm getting flash write errors.  It's also showing as only 16GB in Unassigned Devices, but the fdisk -l output is below:

     

    Disk /dev/sda: 28.67 GiB, 30765219840 bytes, 60088320 sectors
    Disk model: Cruzer Fit      
    Units: sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disklabel type: dos
    Disk identifier: 0x00000000
    
    Device     Boot Start      End  Sectors  Size Id Type
    /dev/sda1  *     2048 60088319 60086272 28.7G  c W95 FAT32 (LBA)

    I bought a second of the same flash drive and am seeing the same thing on a second server.
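    If I'm reading the fdisk output right, the partition itself spans the full ~28.7 GiB, so the 16GB figure may be coming from the FAT filesystem inside the partition (or just the UD display) rather than the partition table. A couple of quick checks, assuming the stick is /dev/sda and mounted at /boot as usual:

    blockdev --getsize64 /dev/sda    # raw size of the stick, in bytes
    blockdev --getsize64 /dev/sda1   # size of the partition, in bytes
    df -h /boot                      # size of the FAT filesystem actually created on it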

     

     
