bjsmith911

Members
  • Posts

    13
bjsmith911's Achievements

Noob (1/14)

0

Reputation

  1. I keep seeing posts re: intermittent crashing. These folks are lambasted by people saying it isn't a problem, but then why do the posts keep coming? Anyway. I am not a developer, and I have only inconsistently downloaded and examined my syslogs, but they do consistently show BUG entries as the last timestamp before the system goes unresponsive. E.g.:
Nov 11 23:21:42 Vesper kernel: BUG: unable to handle page fault for address: 00000200636d12d0
Nov 11 23:21:42 Vesper kernel: #PF: supervisor read access in kernel mode
Nov 11 23:21:42 Vesper kernel: #PF: error_code(0x0000) - not-present page
Nov 11 23:21:42 Vesper kernel: PGD 0 P4D 0
Nov 11 23:21:42 Vesper kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
Nov 11 23:21:42 Vesper kernel: CPU: 2 PID: 163 Comm: kswapd0 Tainted: P U O 6.1.49-Unraid #1
Nov 11 23:21:42 Vesper kernel: Hardware name: Micro-Star International Co., Ltd. MS-7D06/MPG Z590 GAMING CARBON WIFI (MS-7D06), BIOS 1.B0 06/12/2023
and
Jul 29 3:05:44 Vesper kernel: BUG: kernel NULL pointer dereference, address: 0000000000000081
Jul 29 3:05:44 Vesper kernel: #PF: supervisor read access in kernel mode
Jul 29 3:05:44 Vesper kernel: #PF: error_code(0x0000) - not-present page
Jul 29 3:05:44 Vesper kernel: PGD 15fcd5067 P4D 15fcd5067 PUD 15fcd4067 PMD 0
Jul 29 3:05:44 Vesper kernel: Oops: 0000 [#2] PREEMPT SMP NOPTI
Jul 29 3:05:44 Vesper kernel: CPU: 7 PID: 15899 Comm: shfs Tainted: P UD O 6.1.79-Unraid #1
Jul 29 3:05:44 Vesper kernel: Hardware name: Micro-Star International Co., Ltd. MS-7D06/MPG Z590 GAMING CARBON WIFI (MS-7D06), BIOS 1.B0 06/12/2023
These posts seem to be layered with animosity from both the Unraid faithful and the "victims", but what is getting lost in the dialogue is that there is an issue. I understand it may ultimately be related to some pairing of Linux kernel and specific hardware, and not an Unraid issue, but it is an issue nonetheless.
Examples: https://www.facebook.com/groups/217132562182318/posts/1593856457843248/ https://www.facebook.com/groups/217132562182318/posts/1594972577731636/ https://www.facebook.com/groups/217132562182318/posts/1598660484029512/
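For anyone wanting to check their own logs for the same pattern, here is a minimal sketch that pulls the kernel BUG and Oops lines out of a saved syslog file. The function name find_kernel_bugs and the file path are hypothetical, and the pattern is only an assumption based on the log lines quoted in this post; other kernels may word these messages differently.

```shell
#!/bin/bash
# Hypothetical helper (not part of Unraid): print the kernel BUG / Oops
# lines from a saved syslog file so the last entries before a hang are easy to spot.
find_kernel_bugs() {
    local logfile="$1"
    # -E enables extended regex. The optional space after "kernel:" covers
    # both "kernel: BUG:" and "kernel:BUG:" as seen in pasted logs.
    grep -E 'kernel: ?(BUG:|Oops:)' "$logfile"
}
```

Usage would be something like find_kernel_bugs /path/to/syslog.txt, where the path is wherever your remote syslog server writes the file. No output means no matching lines were found, not necessarily that nothing went wrong; as noted in a later post, a crash can leave nothing in the syslog at all.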
  2. No issues since last boot; took it offline yesterday to run a memtest. I'll upload the full report later (don't have the memtest USB with me) but the memory checks out. (Possibly vulnerable to high frequency row hammer bit flips, but no errors). I've removed several, but not all, plugins, including the Nvidia plugin; also removed the GT1030 that was in there, left over from my quad-monitor workstation days.
  3. Edit 2: Unable to reboot; logs stalled at "Array stopping: unmounting disks" with no further unmounting attempts. Tried "reboot" from the CLI and the GUI; neither rebooted. This is getting ridiculous. No way to force a reboot or force an unmount?!?
Edit 1: Found this one; a corrupt file wouldn't transfer; it hung up doing something and also prevented the Array from unmounting.
Data point: removed nearly all my plugins, all VMs off, only Plex and Homebridge running. Caught the server with random CPUs pinned while idle (I had a sync running from another VM pulling files onto my backup NAS; terminated that and shut down the VM; no change). Not sure if it's related, but it doesn't seem right. No crash yet. Nothing in "Processes" with any significant CPU usage; only 2 above 0.1% (6% & 2%). Not sure how else to determine what is using it.
vesper-diagnostics-20240131-2055.zip
  4. And crashed again today. Running in bareback mode with no VMs/Dockers running (homebridge docker is still running; nice for cameras). I'll try safe-mode next; I'll run a mem-test on the 7th if it's crashed by then. Nothing mentioned about the crash in the syslog at all.
  5. Not resolved. Unassigned Devices is still uninstalled; qBit runs in a VM (I have it shut down now, but didn't before; still, it's in a Windows 10 VM). Two crashes in the past week; one after nearly a month of uptime, the second less than 24h after the first's parity check finished. Same symptoms; machine still running (fans & lights on, HDDs spinning) but no access to Shares/Web/Dockers (Tailscale, Homebridge)/VMs; requires hard shutdown and reboot. I only managed to capture the syslog from crash 2 (syslog server being on my backup OMV NAS that needed some attention). syslog (2).txt vesper-diagnostics-20240129-0831.zip
  6. I moved my VMs and converted my Unassigned Devices NVMe drives to a Cache Pool. Still running qBit, but uninstalled the Unassigned Devices plugin. Current uptime 7 days 10 hours (Since 6.12.6 Upgrade). It appears to be an issue with Unassigned Devices.
  7. Folks. Syslog and diagnostics attached. Symptom: server will run for some number of days or weeks, and then crashes. This has happened maybe 3 times, but I only just moved the syslog to a different machine. Dockers stop running, VMs stop running, webGUI non-responsive, does not respond to ping; machine itself remains powered with fans and lights on. It's on a UPS with auto-shutdown, so I do not think it's a power blip (I had one of those a week ago, where the machine died and did not reboot when power came back). I see the warnings/errors at 23:21:42, but do not know how to interpret them. Server is back online now, hence the diagnostics file. It isn't a major issue, but I would prefer to not have to run parity checks all the time lol. vesper-diagnostics-20231112-1717.zip syslog.txt
  8. Oh. That's what QSV is. I gotcha; looking for the Intel iGPU (which I wasn't using as that's been assigned to Plex). Thank you!
  9. Nvidia Driver:
root@Vesper:~# nvidia-smi
Wed Oct 18 11:46:10 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.113.01             Driver Version: 535.113.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GT 1030         Off | 00000000:01:00.0 Off |                  N/A |
| 44%   31C    P0              N/A /  30W |      0MiB /  2048MiB |      1%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

Homebridge Docker

Homebridge config (cameras are all available in Homekit, works great):
{
  "controllers": [
    {
      "address": "192.168.69.2",
      "password": "***",
      "username": "***"
    }
  ],
  "_bridge": {
    "username": "***",
    "port": 48233
  },
  "platform": "UniFi Protect",
  "options": [
    "Enable.Video.Transcode.Hardware.7483C271B443",
    "Enable.Video.Transcode.7483C271B443",
    "Enable.Video.Transcode.Hardware",
    "Enable.Video.Transcode"
  ]
}

Error from Homebridge on Docker start:
[10/18/2023, 8:00:48 AM] [homebridge-unifi-protect] Hardware-accelerated decoding and encoding using qsv will be unavailable: unable to successfully validate capabilities.

Did I skip a step?
  10. Scratch that; installing W10.
I've read a few topics on slow Windows VM performance, and have taken a few of the 'tuning' / performance improvement steps, but I'm still struggling to get my W11 VM to perform as expected.
Symptoms: nothing quantifiable.
1. 100% Unraid/host CPU usage on VM boot and for "longer than required" once the VM is running/logged in.
2. Higher than expected VM idle CPU usage (10-20%).
3. Even with #2's 80% CPU headroom available, opening applications, even as simple as Windows Explorer, has a noticeable 1-5s pause.
"Expectation" baseline? Well, the W11 VM in question is installed on a WD Black NVMe drive that was the previous primary OS on this box. I followed Space Invader's tutorial to stub the hard drive through to the VM. I have since learned this is the old-fashioned way of doing this, so in my troubleshooting attempts, I have used System Devices to bind the hard drive to VFIO on boot.
System Specs:
Z590 GAMING CARBON WIFI (MS-7D06)
11th Gen Intel® Core™ i5-11600K @ 3.90GHz
64 GiB DDR4
I have followed this post, items #1 & #3. I skipped #2 as I have assigned the VM 10GB vRAM, and it's only using ~3. I started following Space Invader's "Advanced Server Tuning" video tutorials, but they seem pretty focused on the Array and Cache drives. I do not have a cache, and the WD Black is stubbed to the VM, so my understanding is that there is no interaction with Unraid OS itself. Also: Guest Agent is installed. I have a W10Pro VM installed on an Unassigned WD Blue that is performing fine.
Is this a hard drive stubbing issue? Windows 11? PEBCAK? VM config .xml attached. Am I missing something?
11.txt
  11. Okay. That doesn't currently make sense to me but we'll see if it works. NIC = NIC, iGPU = iGPU ... I'm sure you know better than I, but to me, at this point in my life, iGPU != NIC. I guess if the GUI at least loads, then we have a reliable way to troubleshoot further.
Sitrep: BIOS updated to the latest from MSI (note: there is a specific USB port that needs to be used; a CMOS reset was required, then it worked as expected). Disabled fast boot. Disabled secure boot. Enabled virtualization. Boot loading screen loads again. I removed my extra GPU to ensure I could plug both display outputs into monitors and not miss anything. Video signal of the boot screen comes out of HDMI. Boot to GUI does not boot to GUI.
No network active, cannot access webGUI at tower.local. Interface not active; no activity reported in Unifi. Pings to the previously assigned Unifi "Fixed IP" and the initial DHCP IP (in case the lease hasn't expired yet), .28 and .157 respectively, are unreachable. Blinky lights on the port; known good cable (swapped with the one on the machine I am typing on now).
OH IT LOADED ... and it's gone. Online for 30s; tower.local did not accept credentials and load the main screen in that time.... waiting again. It's at .28 now. IT'S BACK.
echo "options i915 enable_dc=0" >> /boot/config/modprobe.d/i915.conf SENT SUCCESSFULLY. Twice. No response given in terminal, just a new line loaded. Okay. MAIN loaded >> Array Stopped. REBOOT sent. How do I know if it's rebooting/rebooted? Going to wait for it to come back and SHUTDOWN to be sure.
<Inside thoughts> man it would be nice if Unraid allowed the integrated WiFi to work...even just a little...
It's been 20 minutes. Hasn't come back up. Clicked the power button. I hear that triggers a shutdown. We'll see. 10 minutes. It doesn't. Hard shutdown, power button 6 seconds. Off. Booting. Keep in mind: no changes to ANYTHING, BIOS or otherwise, other than echo "options i915 enable_dc=0" >> /boot/config/modprobe.d/i915.conf
Black screen. No boot. Single click off, not a good sign. Booting. MSI logo .... black screen. No ping. No tower.local. What next?
EDIT # 2: My bad. Turns out I'm a liar? Secure Boot and Fast Boot were re-enabled. Disabled both. I have half a GUI and the webGUI loading. I will confirm if it was FAST or SECURE or both and then mark as resolved.
EDIT # 3: Re-enabled Secure Boot. Boots, but no GUI. WebGUI works. Reverted to no Fast, no furious, no Secure. No GUI. Again. WebGUI works. Very inconsistent, this Unraid thing.
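A side note on running that echo twice: the >> redirect prints nothing on success and appends a line each time, so the file would now hold the same option twice. That is likely harmless, but a guarded append avoids it. The sketch below is an illustration only; append_once is a hypothetical helper name, not an Unraid or modprobe tool.

```shell
#!/bin/bash
# Hypothetical helper: append a line to a config file only if that exact
# line is not already present, so re-running the command is safe.
append_once() {
    local line="$1" file="$2"
    # Create the directory and file if they do not exist yet.
    mkdir -p "$(dirname "$file")"
    touch "$file"
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string.
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

Called as append_once "options i915 enable_dc=0" /boot/config/modprobe.d/i915.conf, it would write the option once no matter how many times it is run. Running cat on the same file afterwards shows what actually landed in it, which is the confirmation the silent echo never gave.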
  12. Thanks! I'll give that a try later tonight. I currently have a bricked Z590 mobo to sort out (attempted to flash the BIOS to the latest and it uh...didn't). How does the referenced "echo "options i915 enable_dc=0" >> /boot/config/modprobe.d/i915.conf" command relate to the NIC? It sounds like it's iGPU related? Or is it a common driver issue in this chipset? OR will this "only" fix my no-GUI GUI-boot issue? My system doesn't crash per se. It doesn't boot to GUI, and the NIC works about 1% of the time. Is this still relevant? 'Adding "i915.force_probe=4c8b" to "/boot/config/modprobe.d/i915.conf"' This seems sneakily similar to the "echo "options i915 enable_dc=0" >> /boot/config/modprobe.d/i915.conf" mentioned in your link. I will edit this post tonight as I am still an unwelcome visitor and can only post once per day 🙂
  13. I have tried every combination of CSM/UEFI, Secure Boot, EFI/EFI-, Fast Boot disabled, different USB keys, GUI/no GUI, different USB ports ..... yea. Not booting. Box boots from USB; that's how the previous install of W10 was installed. IFF I use UEFI, "EFI", and either boot option "Unraid OS" or "Unraid OS GUI Mode", I get the mass booting text, and then a blank screen. http://tower.local doesn't load. IFF I use any other combination of CSM, I do not get the boot menu. BIOS Setting Screenshot Spam. Mobo = MPG Z590 GAMING CARBON WIFI.
It booted far enough once so I could load http://tower.local. Once. Long enough for me to bind the license...but then I needed to boot back into Windows to log on to my PCI RAID adapter and delete that array (to be used in Unraid, obvs). Now it won't boot again. I even moved the USB stick. Doesn't respond to ping (Fixed IP set in Unifi). No other potential clients come up in Unifi on the interface.
EDIT #2!!!: Pretty sure this is related to the network driver for the Intel® I225V 2.5Gbps LAN controller. I finally remember the W10 install having intermittent connectivity issues on this mobo. This reflects my current symptoms. I have 1-2 minutes of connectivity followed by 5-30 minutes of disconnection. How do I install the drivers? https://www.intel.com/content/www/us/en/download/15084/intel-ethernet-adapter-complete-driver-pack.html
EDIT #3: Where is the diagnostics export? Since the GUI doesn't load (separate issue; since the boot screen does sometimes work on a monitor, this is still a bug) I can only use the web GUI for a few seconds at a time. Are there known working PCI NICs? Preferably dual NIC (for teaming). tower-diagnostics-20230829-2000.zip