Recurrent CPU-tainted errors with syslog filling rapidly



12 minutes ago, Aceriz said:

But again my understanding was that the stubbing occurred after the boot process had already grabbed a graphics card... hence need for onboard...  

The stubbing is a kernel parameter which is passed when the kernel is loaded. So the kernel shouldn't touch that card at all; it only binds vfio-pci to it so nothing else can use it besides the VM.
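For reference, in Unraid this stub goes on the append line in /boot/syslinux/syslinux.cfg. A minimal sketch (the vfio-pci IDs below are just example vendor:device pairs; use the ones `lspci -nn` reports for your own card):

```
label Unraid OS
  menu default
  kernel /bzimage
  append vfio-pci.ids=10de:1e87,10de:10f8 initrd=/bzroot
```

Because the parameter is parsed before any drivers load, vfio-pci claims the card from the very start of boot.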

20 minutes ago, Aceriz said:

I apologize if this is a stupid question. 

no stupid questions

9 minutes ago, Aceriz said:

unable to load nvidia --- something

It might have been related to when you were using the nvidia build, or it could be from the plugin. If you don't plan on ever using that build again, then you can delete the plugin.

 

11 minutes ago, Aceriz said:

Ya, it is a bit hidden, but it has onboard VGA -- not anything fancy, mind you.... lol

I see - yes, I would never have noticed that!

 

So I had another thought just now: you probably need to update both the BIOS **and** the BMC firmware at the same time for the onboard video to work again. And then hunt through the BIOS menus for that hidden option.

9 minutes ago, civic95man said:

The stubbing is a kernel parameter which is passed when the kernel is loaded. So the kernel shouldn't touch that card at all; it only binds vfio-pci to it so nothing else can use it besides the VM.

no stupid questions

Thanks :)

 

So I rebooted the server to try and capture that weird nvidia note that I saw, and it was the following:

 

"/bin/bash: line 14:  nvidia-smi: command not found"

 

This occurred right after "Starting Samba", and just before the GUI login page pops up.

Again, I had installed the Unraid default 6.8.3 from the nvidia plugin -- I didn't uninstall the plugin, but am happy to do that if needed...

 

1 minute ago, civic95man said:

So I had another thought just now: you probably need to update both the BIOS **and** the BMC firmware at the same time for the onboard video to work again. And then hunt through the BIOS menus for that hidden option.

Okay, I can definitely do a BIOS update... that I have gotten comfortable with. I will have to do a bit of reading to figure out how to go about a firmware update (any general tips would be much appreciated :) ). Is there an order that I should do them in? i.e. BIOS first then firmware, or the other way around?

2 minutes ago, Aceriz said:

"/bin/bash: line 14:  nvidia-smi: command not found"

That's from the nvidia plugin, which calls nvidia-smi to get a listing of the available cards. Since you're on stock, that program is no longer there, hence the error. It's harmless. It will go away if you uninstall the plugin.

Just now, civic95man said:

That's from the nvidia plugin, which calls nvidia-smi to get a listing of the available cards. Since you're on stock, that program is no longer there, hence the error. It's harmless. It will go away if you uninstall the plugin.

I have uninstalled it... at this point I have no plan to need it (the main reason I had installed the plugin was to pass the GPU through to Plex... but realized I'd need Plex Pass lol). Also, I always have my VM on...

2 minutes ago, Aceriz said:

Okay, I can definitely do a BIOS update... that I have gotten comfortable with. I will have to do a bit of reading to figure out how to go about a firmware update (any general tips would be much appreciated :) ). Is there an order that I should do them in? i.e. BIOS first then firmware, or the other way around?

I didn't see anything listed for an order, so I can only assume it doesn't matter. Mine (Supermicro mobo) was accessed via the BMC IP address. Within that, it gave me the option to update both the BIOS and the firmware. It may also include its own installer/updater (haven't checked), but that would most likely require a Windows/DOS environment.

On 7/8/2020 at 2:44 PM, civic95man said:

I didn't see anything listed for an order, so I can only assume it doesn't matter. Mine (Supermicro mobo) was accessed via the BMC IP address. Within that, it gave me the option to update both the BIOS and the firmware. It may also include its own installer/updater (haven't checked), but that would most likely require a Windows/DOS environment.

 

So over the last day I have tried to update the BIOS; unfortunately it didn't work setup-wise (I could not use the GUI on the onboard video with the updated BIOS/firmware while having the VM running), and I continued to have the errors in the syslog. So I reverted back to the older BIOS I was on, with the change to having the GPU stubbed...

 

I have attached the syslog. It doesn't appear to be filling as quickly, but I might need to monitor for a few more days.

 

I might still end up going back to the latest BIOS eventually, as while I was in there I noticed a setting which helped with performance... lol.

I need to figure out how to use the GUI or similar without the need for a second computer (any suggestions would be great). If it helps, I use the GUI mainly to reset VMs if they freeze up, or to do things like close the VM, load Krusader, and make a copy of the VM's vdisk, etc.

 

 

 

 

 

rizznetunraid-syslog-20200710-0117.zip

2 hours ago, Aceriz said:

as well continued to have the errors in the syslog

Do these errors only come up when the VM is active?

 

2 hours ago, Aceriz said:

any suggestions would be great

Maybe a second cheap video card, if you have room on the board.

 

Have you tried the 6.9 beta yet? I would only proceed down that route with caution, as it isn't declared stable yet. Maybe 6.9-beta1 so you aren't messing with pools yet.

 

As a last resort, you could use a phone or tablet, if available, to manage the VMs.

6 months later...
On 7/10/2020 at 1:13 AM, civic95man said:

Do these errors only come up when the VM is active?

 

Maybe a second cheap video card, if you have room on the board.

 

Have you tried the 6.9 beta yet? I would only proceed down that route with caution, as it isn't declared stable yet. Maybe 6.9-beta1 so you aren't messing with pools yet.

 

As a last resort, you could use a phone or tablet, if available, to manage the VMs.

Hey, so I know this is a delayed response. I wasn't able to update to a new BIOS that worked in my config in the summer, due to the newer BIOS at the time not having the option to select the onboard GPU as primary. After much talking with Asus technical service, they released a BIOS that does have the option I need. So I am now on the most up-to-date BIOS.

 

Over the last 4-5 days I have also had the time to really try to isolate the problem. I am able to run the system in safe mode with Docker and the VM off without any of the errors. When I run in safe mode with the VM on and Docker off, I get the errors.

 

I then ran the system for about 48 hours in normal mode with just dockers and plugins and did not get errors. But within about 30 seconds of starting the VM I get the following string of errors in the syslog, with the recurrent tainted errors. I have copied the below from the syslog; it was the first error that occurred after I told the VM to turn on (within, like I said, 20-30 seconds). I'm not sure if this line is significant: "Jan 15 20:05:59 UNRAID kernel: type mismatch for 8c000000,4000000 old: write-back new: write-combining", or this one: "Jan 15 20:07:36 UNRAID kernel: alloc_bts_buffer: BTS buffer allocation failure", as these occur right before the errors.

 

Thanks all for your thoughts on next steps for my troubleshooting.

 

 

 

Jan 15 20:05:20 UNRAID kernel: br0: port 2(vnet0) entered blocking state
Jan 15 20:05:20 UNRAID kernel: br0: port 2(vnet0) entered disabled state
Jan 15 20:05:20 UNRAID kernel: device vnet0 entered promiscuous mode
Jan 15 20:05:20 UNRAID kernel: br0: port 2(vnet0) entered blocking state
Jan 15 20:05:20 UNRAID kernel: br0: port 2(vnet0) entered forwarding state
Jan 15 20:05:42 UNRAID kernel: vfio-pci 0000:65:00.0: enabling device (0100 -> 0103)
Jan 15 20:05:42 UNRAID kernel: vfio_ecap_init: 0000:65:00.0 hiding ecap 0x1e@0x258
Jan 15 20:05:42 UNRAID kernel: vfio_ecap_init: 0000:65:00.0 hiding ecap 0x19@0x900
Jan 15 20:05:43 UNRAID acpid: input device has been disconnected, fd 11
Jan 15 20:05:43 UNRAID acpid: input device has been disconnected, fd 12
Jan 15 20:05:43 UNRAID acpid: input device has been disconnected, fd 13
Jan 15 20:05:43 UNRAID acpid: input device has been disconnected, fd 14
Jan 15 20:05:43 UNRAID acpid: input device has been disconnected, fd 15
Jan 15 20:05:43 UNRAID acpid: input device has been disconnected, fd 16
Jan 15 20:05:58 UNRAID kernel: no MTRR for 8c000000,4000000 found
Jan 15 20:05:58 UNRAID acpid: client 12903[0:0] has disconnected
Jan 15 20:05:58 UNRAID acpid: client connected from 7254[0:0]
Jan 15 20:05:58 UNRAID acpid: 1 client rule loaded
Jan 15 20:05:59 UNRAID kernel: type mismatch for 8c000000,4000000 old: write-back new: write-combining
Jan 15 20:06:22 UNRAID kernel: usb 1-3: reset full-speed USB device number 2 using xhci_hcd
Jan 15 20:06:22 UNRAID kernel: usb 1-7: reset full-speed USB device number 4 using xhci_hcd
Jan 15 20:06:22 UNRAID kernel: usb 1-4: reset full-speed USB device number 3 using xhci_hcd
Jan 15 20:07:36 UNRAID kernel: ------------[ cut here ]------------
Jan 15 20:07:36 UNRAID kernel: alloc_bts_buffer: BTS buffer allocation failure
Jan 15 20:07:36 UNRAID kernel: WARNING: CPU: 14 PID: 6191 at arch/x86/events/intel/ds.c:402 reserve_ds_buffers+0xe4/0x382
Jan 15 20:07:36 UNRAID kernel: Modules linked in: xt_nat xt_CHECKSUM ipt_REJECT ip6table_mangle ip6table_nat nf_nat_ipv6 iptable_mangle ip6table_filter ip6_tables vhost_net tun vhost veth tap ipt_MASQUERADE iptable_filter iptable_nat nf_nat_ipv4 nf_nat ip_tables xfs dm_crypt dm_mod dax md_mod nct6775 hwmon_vid bonding igb(O) skx_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc wmi_bmof mxm_wmi intel_wmi_thunderbolt aesni_intel aes_x86_64 crypto_simd ipmi_ssif cryptd glue_helper intel_cstate mpt3sas pcc_cpufreq raid_class intel_uncore intel_rapl_perf i2c_i801 scsi_transport_sas ahci sr_mod i2c_core libahci wmi ipmi_si cdrom button [last unloaded: igb]
Jan 15 20:07:36 UNRAID kernel: CPU: 14 PID: 6191 Comm: CPU 6/KVM Tainted: G O 4.19.107-Unraid #1
Jan 15 20:07:36 UNRAID kernel: Hardware name: System manufacturer System Product Name/WS C422 PRO_SE, BIOS 3305 11/17/2020
Jan 15 20:07:36 UNRAID kernel: RIP: 0010:reserve_ds_buffers+0xe4/0x382
Jan 15 20:07:36 UNRAID kernel: Code: ff ff 48 85 c0 75 2c 80 3d e7 94 e9 00 00 75 1c 48 c7 c6 e0 22 c0 81 48 c7 c7 0c 10 d3 81 c6 05 d0 94 e9 00 01 e8 2b 42 04 00 <0f> 0b bb 01 00 00 00 eb 58 44 89 e7 49 89 85 40 09 00 00 48 89 04
Jan 15 20:07:36 UNRAID kernel: RSP: 0018:ffffc90004477a70 EFLAGS: 00010282
Jan 15 20:07:36 UNRAID kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000007
Jan 15 20:07:36 UNRAID kernel: RDX: 000000000000075a RSI: 0000000000000002 RDI: ffff88887f9964f0
Jan 15 20:07:36 UNRAID kernel: RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000027000
Jan 15 20:07:36 UNRAID kernel: R10: 0000000000000000 R11: 0000000000000050 R12: 0000000000000015
Jan 15 20:07:36 UNRAID kernel: R13: ffff88887fb4f3e0 R14: fffffe00003a8000 R15: 0000000000000015
Jan 15 20:07:36 UNRAID kernel: FS: 00001496b13ff700(0000) GS:ffff88887f980000(0000) knlGS:0000000000000000
Jan 15 20:07:36 UNRAID kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 15 20:07:36 UNRAID kernel: CR2: 00000209280ff038 CR3: 0000000113140002 CR4: 00000000003626e0
Jan 15 20:07:36 UNRAID kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 15 20:07:36 UNRAID kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jan 15 20:07:36 UNRAID kernel: Call Trace:
Jan 15 20:07:36 UNRAID kernel: ? kvm_dev_ioctl_get_cpuid+0x1d3/0x1d3 [kvm]
Jan 15 20:07:36 UNRAID kernel: x86_reserve_hardware+0x134/0x14f
Jan 15 20:07:36 UNRAID kernel: x86_pmu_event_init+0x3a/0x1d5
Jan 15 20:07:36 UNRAID kernel: ? kvm_dev_ioctl_get_cpuid+0x1d3/0x1d3 [kvm]
Jan 15 20:07:36 UNRAID kernel: perf_try_init_event+0x4f/0x7d
Jan 15 20:07:36 UNRAID kernel: perf_event_alloc+0x46e/0x821
Jan 15 20:07:36 UNRAID kernel: perf_event_create_kernel_counter+0x1a/0xff
Jan 15 20:07:36 UNRAID kernel: pmc_reprogram_counter+0xd9/0x111 [kvm]
Jan 15 20:07:36 UNRAID kernel: reprogram_fixed_counter+0xd8/0xfc [kvm]
Jan 15 20:07:36 UNRAID kernel: ? vmx_vcpu_run+0x6b8/0xa97 [kvm_intel]
Jan 15 20:07:36 UNRAID kernel: ? vmx_vcpu_run+0x6ac/0xa97 [kvm_intel]
Jan 15 20:07:36 UNRAID kernel: intel_pmu_set_msr+0xf4/0x2e4 [kvm_intel]
Jan 15 20:07:36 UNRAID kernel: ? vmx_vcpu_run+0x6ac/0xa97 [kvm_intel]
Jan 15 20:07:36 UNRAID kernel: kvm_set_msr_common+0xc6e/0xd24 [kvm]
Jan 15 20:07:36 UNRAID kernel: ? vmx_vcpu_run+0x6b8/0xa97 [kvm_intel]
Jan 15 20:07:36 UNRAID kernel: ? vmx_vcpu_run+0x6ac/0xa97 [kvm_intel]
Jan 15 20:07:36 UNRAID kernel: ? vmx_vcpu_run+0x6b8/0xa97 [kvm_intel]
Jan 15 20:07:36 UNRAID kernel: ? vmx_vcpu_run+0x6ac/0xa97 [kvm_intel]
Jan 15 20:07:36 UNRAID kernel: ? vmx_vcpu_run+0x6b8/0xa97 [kvm_intel]
Jan 15 20:07:36 UNRAID kernel: handle_wrmsr+0x4b/0x85 [kvm_intel]
Jan 15 20:07:36 UNRAID kernel: kvm_arch_vcpu_ioctl_run+0x10d0/0x1367 [kvm]
Jan 15 20:07:36 UNRAID kernel: kvm_vcpu_ioctl+0x17b/0x4b1 [kvm]
Jan 15 20:07:36 UNRAID kernel: ? __seccomp_filter+0x39/0x1ed
Jan 15 20:07:36 UNRAID kernel: vfs_ioctl+0x19/0x26
Jan 15 20:07:36 UNRAID kernel: do_vfs_ioctl+0x533/0x55d
Jan 15 20:07:36 UNRAID kernel: ksys_ioctl+0x37/0x56
Jan 15 20:07:36 UNRAID kernel: do_syscall_64+0x57/0xf2
Jan 15 20:07:36 UNRAID kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jan 15 20:07:36 UNRAID kernel: RIP: 0033:0x1496b60454b7
Jan 15 20:07:36 UNRAID kernel: Code: 00 00 90 48 8b 05 d9 29 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a9 29 0d 00 f7 d8 64 89 01 48
Jan 15 20:07:36 UNRAID kernel: RSP: 002b:00001496b13fe678 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Jan 15 20:07:36 UNRAID kernel: RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00001496b60454b7
Jan 15 20:07:36 UNRAID kernel: RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000001d
Jan 15 20:07:36 UNRAID kernel: RBP: 00001496b3fac000 R08: 0000564f326e7770 R09: 00000002d55a4cc0
Jan 15 20:07:36 UNRAID kernel: R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
Jan 15 20:07:36 UNRAID kernel: R13: 0000000000000006 R14: 00001496b13ff700 R15: 0000000000000000
Jan 15 20:07:36 UNRAID kernel: ---[ end trace 84ab53b7f22d7e46 ]---
Jan 15 20:07:36 UNRAID kernel: CPU 6/KVM: page allocation failure: order:4, mode:0x6080c0(GFP_KERNEL|__GFP_ZERO), nodemask=(null)
Jan 15 20:07:36 UNRAID kernel: CPU 6/KVM cpuset=vcpu6 mems_allowed=0
On 1/15/2021 at 6:35 PM, Aceriz said:

Over the last 4-5 days I have also had the time to really try to isolate the problem. I am able to run the system in safe mode with Docker and the VM off without any of the errors. When I run in safe mode with the VM on and Docker off, I get the errors.

So it seems to be VM related. It might be good to post diagnostics AFTER this happens again. That snippet of the syslog left out a lot of details, and I saw a reference to another OOM error.

 

Also, I looked into your previous OOM error from the first post one last time and I can *kinda* see how it gave you the error.  If anyone is curious, technically, you ran out of memory on the Normal zone and couldn't assign a contiguous block (order of 4).  I don't know why it didn't use DMA32 zone.  Maybe someone else can answer that.
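For anyone following along, "order of 4" comes from the kernel's buddy allocator: an order-n request is 2^n contiguous pages, so with the usual 4 KiB x86-64 page size an order-4 allocation needs 64 KiB of physically contiguous memory. A quick sketch of the arithmetic (plain Python, nothing Unraid-specific):

```python
PAGE_SIZE = 4096  # default x86-64 page size, in bytes

def order_to_bytes(order: int) -> int:
    """Size of a buddy-allocator block: 2**order contiguous pages."""
    return (2 ** order) * PAGE_SIZE

# The order:4 failure in the syslog above is asking for:
print(order_to_bytes(4))  # 65536 bytes, i.e. 64 KiB
```

Fragmentation means plenty of free memory can still fail such a request if no run of 16 adjacent pages is free.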

 

On 1/15/2021 at 6:35 PM, Aceriz said:

not sure if this line was significant :Jan 15 20:05:59 UNRAID kernel: type mismatch for 8c000000,4000000 old: write-back new: write-combining"

Seems to be related to Intel integrated graphics? For now, just assume it's nothing.

 

I did find a reference to your current issues in an older forum post: 

Their solution was to nuke the offending VM and start over.  I guess you could try that.  You could first try removing the xml but keeping the vdisk (assuming you're using vdisks for the VM).  If that doesn't work then try creating a new vdisk, keeping the old one.

 

If that doesn't work either, then you could always go with the latest 6.9.0 release candidate.  The new kernel might help things out and I think there is a newer release of qemu rolled up in there as well.

4 hours ago, civic95man said:

So it seems to be VM related. It might be good to post diagnostics AFTER this happens again. That snippet of the syslog left out a lot of details, and I saw a reference to another OOM error.

Hey, so please see the attached below, which is the diagnostic I just pulled from the system. As I have had the system running for the last 2 days, the syslog is full of the errors, so you can't see things from the initiation of the system. As such, I will also post a new diagnostic in about the next hour, after a reboot, which will show just the system starting up with plugins and dockers without the error; I will then timestamp in the post when I start the VM, with a highly probable association to the error, as I posted about yesterday.

 

 

4 hours ago, civic95man said:

Their solution was to nuke the offending VM and start over.  I guess you could try that.  You could first try removing the xml but keeping the vdisk (assuming you're using vdisks for the VM).  If that doesn't work then try creating a new vdisk, keeping the old one.

I will do this this weekend. I have no problem nuking the VM and re-doing things :) -- one great thing about Unraid. I will post an update after I have done that. Any particular settings you recommend within the config? I generally just follow SpaceInvader One's video for it...

 

4 hours ago, civic95man said:

latest 6.9.0 release candidate. 

After I try the above: is there a way to be set up so that I could try the latest release candidate, but revert if it doesn't work? Just not sure how to go about that if I cross that bridge.

rizznetunraid-diagnostics-20210118-2245.zip

6 hours ago, civic95man said:

diagnostics AFTER this happens again

Hey, so as I noted above, here is the new diagnostics. In this fresh reboot, I turned the VM on at 12:07 and did not notice an error until about 12 minutes later.

 

I am going to do another test tomorrow night. This time around, I just turned on the VM and didn't do anything with it other than load the syslog page and a few other webpages, hitting refresh for the 12 minutes, and noticed I didn't get the error (as quickly as I had the other day). As it was a memory error, I turned on Plex in a browser and got the error very quickly...

In my other testing last week, though, I had run Plex on this server via another computer on my network and got no errors over a multi-day run of constant video streaming (got through multiple continuous seasons of a show with no errors lol)...

Tomorrow night I am going to run a test where I turn the VM on but stress the memory and system with something other than Plex, to see what happens. Will post after.

 

Thanks again for the help, and again I will try to re-do the VM this weekend.

rizznetunraid-diagnostics-20210119-0017.zip


I looked through your diagnostics (both) and still see the OOM errors. The first diagnostics, from when the system ran for about 2 days, was full of them as you stated. I find it very odd that your memory seems very fragmented, and thus why it can't allocate an order-4 block of contiguous memory - especially after a fresh reboot.

 

Here is a suggestion: have you tried using Hugepages for your VM?  It's typically only needed for very large capacities, or if you are suffering performance issues; however, in this case, it's worth a shot.  Here is a post about how to utilize it:

 

If that doesn't work then I would suggest either trying the 6.9-rc or adding more memory.  The 6.9 series has added pools and changed the format option of the SSD cache, so while it's not a one-way trip, it's not as simple to revert to the prior 6.8 or 6.7 release.  With that said, the 6.9-rc seems very stable and should work fine.

11 hours ago, civic95man said:

Hugepages for your VM?

So I hadn't tried hugepages before. I think I have set it up correctly. My system has 32GB of ECC memory, so in the FLASH section, behind the append, I put in the following: "append hugepagesz=2M hugepages=12096 vfio-pci.ids=1b73:1100,10de:1ad8,10de:1e87,10de:10f8,10de:1ad9, initrd=/bzroot,/bzroot-gui nomodeset"

I chose 12096 as I had tried putting it at the full 16128, but my system would not boot past the initial selection page (for the GUI, safe mode, etc.) - it froze on the countdown - so I reset and put the lower number in.

 

I had the system boot up without the VM running (didn't give it that long, but will test again overnight)... then ran the VM, and within a few minutes got the error again. I have attached the most recent diagnostics, post-hugepages change.

 

I will reset the server and run the system overnight without the VM started, with hugepages, to confirm that has no issues (again, the prior testing had isolated it down to the VM, but I will retest). Then, if anything, I will look this weekend, as I said, at redoing the VM from scratch...

 

 

11 hours ago, civic95man said:

format option of the SSD cache,

Just so I know: is this something that will erase the data off of the SSD cache if I look at updating as part of the problem solving? Sorry if it's a stupid question...

rizznetunraid-diagnostics-20210120-0006.zip

20 hours ago, civic95man said:

especially after a fresh reboot.  

 

So I reset the system last night after setting up the hugepages correctly, I think (see above post)... but this time ran the system overnight without the VM.

Overnight I didn't get any significant errors... 

 

But on logging in to the GUI this morning, I got an order-0 error. Not sure if this was due to me doing hugepages incorrectly or not... I am going to remove the hugepages and retest today.

 

I have attached the diagnostics from overnight and then my login to the GUI this morning...

 

Thanks again for all help

rizznetunraid-diagnostics-20210120-0900.zip

12 hours ago, Aceriz said:

My system has 32GB of ECC memory  so in the FLASH section behind the append I put in the following.   "append hugepagesz=2M hugepages=12096 vfio-pci.ids=1b73:1100,10de:1ad8,10de:1e87,10de:10f8,10de:1ad9, initrd=/bzroot,/bzroot-gui nomodeset"

I chose the 12096 as I had tried putting it at the full 16128   but my system would not boot past the initial selection page for the gui safemode etc... froze on the countdown...  so I reset and put the lower number in. 

I was thinking of trying the 1G option since your processor supports it, but 2M still works. So you basically tell the kernel to set aside XXXX contiguous blocks of memory, each 2M in size (instead of the default 4K page). The kicker here is that some applications besides the VM can and will use the hugepages, so plan accordingly. If you want 16G set aside as hugepages then you would put "hugepages=8192", since 16GB / 2MB = 8192.
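That division can be sketched as a tiny helper (the function name is just illustrative, not part of any Unraid tooling):

```python
HUGEPAGE_MIB = 2  # matches hugepagesz=2M on the append line

def hugepages_needed(reserve_gib: int, page_mib: int = HUGEPAGE_MIB) -> int:
    """How many hugepages to request so reserve_gib of RAM is set aside."""
    return (reserve_gib * 1024) // page_mib

print(hugepages_needed(16))  # 8192, the 16GB / 2MB figure above
```

With the 1G option (hugepagesz=1G) the divisor becomes 1024 MiB, so the same 16 GiB reservation needs only hugepages=16.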

 

I think the misconfigured hugepages were causing the OOM killing spree in those diagnostics.

 

 

12 hours ago, Aceriz said:

Just so I know: is this something that will erase the data off of the SSD cache if I look at updating as part of the problem solving? Sorry if it's a stupid question...

This is in regard to the transition from prior versions of Unraid to 6.9-rc, where there were excessive writes to the SSD and part of the solution was to align the partition to 1MB. This would require repartitioning the SSD - but you would have to manually invoke it. The issue is that, supposedly, 6.8 and earlier don't recognize this layout (but I could be mistaken). In either case, just don't format the SSD in 6.9 and you'll be fine.

 

Storage pools WERE introduced in 6.9, and there was a process to revert back to 6.8 and earlier, but I think those notes were lost in the beta release updates somewhere. As always, back up your flash drive before updating.

 

At this point, I think the 6.9 route would be the best choice since it utilizes a newer kernel which should better support your MB and CPU, and possibly get rid of that BTS buffer allocation failure.

3 hours ago, civic95man said:

At this point, I think the 6.9 route

In updating to 6.9, would you recommend using the built-in update tool? Or should I try to do a "fresh" install of 6.9 - i.e. make a copy of the FLASH drive, a printout of the drive order from the Main page (so I can re-enter it on the new launch), the key file for the flash drive, and the share directories? That way I can just re-install fresh dockers and fresh plugins. Or do you think this won't make much of a difference...

thanks

On 1/18/2021 at 4:56 PM, civic95man said:

Their solution was to nuke the offending VM and start over

So this might possibly be the solution... Over the last 3-4 days I started with a fresh install of a Windows VM, and for about 3 days ran the VM with Plex running without having any errors...

 

Then today I went to start adding my other software within the VM (I hadn't monitored the syslog during this process)... went to do another test and got the tainted error.

I wondered if it might be one of the software applications. Specifically, I am wondering if it is the installed iCUE Corsair software for my H150i CPU pump (used for colour control, control of fans/pump, etc.). I do pass the pump through under the VM management, and had it passed through in the last 3 days without the software...

 

Any thoughts on this ??

 

I am going to run the test again tonight, having uninstalled the iCUE software and removed the pump from the VM management.

If this doesn't work I will follow through with the update to 6.9 rc :) 

 

 

On 1/23/2021 at 5:54 PM, Aceriz said:

I wondered if it might be one of the software applications. Specifically, I am wondering if it is the installed iCUE Corsair software for my H150i CPU pump.

I would be surprised if that was the cause; specifically, the VM is isolated from your Unraid system. That Corsair pump is just another USB peripheral as far as Unraid, the VM, and Windows are concerned (much like a mouse or keyboard). If you want to pursue this route then it can't hurt anything.

12 hours ago, civic95man said:

If you want to pursue this route then it can't hurt anything.

So, the weird thing - even though this makes absolutely no sense to me at all - I think it might be the software thing with the iCUE and the passthrough.

Since I posted on Saturday, I have uninstalled the iCUE and removed the device from the VM selection setup screen (the only change made from before, when there was an error after installing the software and running Plex).

Now, in this post-uninstall state, I have been running a Prime95 blended load test in the VM for almost the last 48 hours, with Plex running on the server as well, without any errors whatsoever...

Again, this makes no sense, as you had said... so I am going to do a test over this coming weekend where the only change I make is re-adding the software and the VM selection for the USB device; if I get the error again, that would seem to isolate it down to that...


 

