Report Comments posted by jbartlett
-
-
Looks like the cause for the IOWAIT is a fstrim being executed on a WD Black 256GB nvme drive, executed by the Trim plugin. The process gets stuck in an uninterruptible sleep.
Not a bug in the Dashboard.
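For anyone wanting to check for the same thing, here is a minimal Python sketch that lists processes in uninterruptible sleep (state "D") by reading /proc/<pid>/stat. The function names are my own, and it assumes a Linux /proc filesystem; it is a rough diagnostic aid, not what the Trim plugin or the Dashboard does.

```python
from pathlib import Path


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' instead of splitting on whitespace.
    head, _, tail = stat_text.rpartition(")")
    pid_text, _, comm = head.partition("(")
    return int(pid_text), comm, tail.split()[0]


def uninterruptible_processes(proc_root="/proc"):
    """Return [(pid, comm)] for every process currently in state 'D'."""
    found = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # not a process directory
        try:
            pid, comm, state = parse_stat((entry / "stat").read_text())
        except (OSError, ValueError, IndexError):
            continue  # process exited while we were reading it
        if state == "D":
            found.append((pid, comm))
    return found
```

Calling uninterruptible_processes() on the server while the bar is pegged should show whether something like fstrim is sitting in state D.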
-
I tried to identify what might be causing that: I stopped my VM's and the array, but the pegged CPU stayed pegged. I found a WD NVMe drive that hadn't unmounted and wouldn't unmount from "Unassigned Devices", and I couldn't pull up an ls of the share either, it just hung. It didn't look like I was actually using it, so I rebooted and unmounted it. If the pegging is not related to that, it'll show up again in a few days.
-
Here ya go. CPU 26 is pegging.
-
53 minutes ago, IamSpartacus said:
I didn't do it manually. By isolating the CPUs in the settings it appended the syslinux file automatically.
Oh neat, I never noticed the CPU Isolation part on the CPU Pinning page (never scrolled down far enough)
-
Other than I didn't know about being able to hyphenate the range, no.
-
I had that same thought. I just built a 2950 system but it hasn't been on long enough at once to see if it shows up. It has not shown up on my Intel build.
-
On your HTOP, the matching CPU from the Dashboard graph is also maxed which isn't the scenario I reported here.
I'd recommend excluding the VM CPU's in the syslinux config to keep the OS away from them.
For example: append isolcpus=12,13,14,15,28,29,30,31 initrd=/bzroot
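Since the CPU list can also be written with hyphenated ranges, here is a small Python sketch that collapses a list of CPU numbers into that form. The helper name to_cpu_list is made up for illustration.

```python
def to_cpu_list(cpus):
    """Collapse CPU numbers into comma-separated values and hyphenated ranges."""
    parts = []
    run_start = prev = None
    for cpu in sorted(set(cpus)):
        if prev is not None and cpu == prev + 1:
            prev = cpu  # still inside the current consecutive run
            continue
        if run_start is not None:
            # close the previous run before starting a new one
            parts.append(str(run_start) if run_start == prev else f"{run_start}-{prev}")
        run_start = prev = cpu
    if run_start is not None:
        parts.append(str(run_start) if run_start == prev else f"{run_start}-{prev}")
    return ",".join(parts)
```

With the CPUs from the example above, to_cpu_list([12, 13, 14, 15, 28, 29, 30, 31]) gives 12-15,28-31, so the line becomes: append isolcpus=12-15,28-31 initrd=/bzroot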
-
Happening in RC7 after about 5 days uptime.
-
Using htop will give you an apples to apples comparison.
-
Addendum: It seems the parity job duration displayed only includes the time it took since it was last resumed from a paused state. My array takes roughly 12 hours to complete start to finish but it's saying it took only 5 hours.
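To illustrate what I'd expect the duration to be, here is a Python sketch that adds up every running segment across pauses. The event names and timestamps are hypothetical; I don't know how unraid actually records them.

```python
def total_run_seconds(events):
    """Sum running time over (timestamp_seconds, action) events.

    action is one of 'start', 'pause', 'resume', 'finish' (hypothetical names).
    """
    total = 0
    running_since = None
    for timestamp, action in events:
        if action in ("start", "resume"):
            running_since = timestamp
        elif action in ("pause", "finish") and running_since is not None:
            total += timestamp - running_since  # close the current segment
            running_since = None
    return total


HOUR = 3600
# Made-up timeline: 7 hours of checking, a 3 hour pause, then 5 more hours.
events = [(0, "start"), (7 * HOUR, "pause"), (10 * HOUR, "resume"), (15 * HOUR, "finish")]
print(total_run_seconds(events) / HOUR)
```

Counting only from the last resume would report 5 hours for this timeline, where summing the segments gives 12.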
-
When pausing a Parity Check, the Dashboard reports "Error code: aborted"
-
My Intel box updated from RC6 to RC7 with no issues. My Threadripper box did not upgrade cleanly from RC5 to RC7.
After it took 3x as long as normal to reboot, the web admin was still offline. I pinged it, got a response, telnetted in and pulled up a tail of the syslog. It looked odd, so I copied the syslog to the flash drive and looked at it, and I realized the system had not rebooted yet because it still had events from days ago. It looked like it had stalled while checking for PIDs left on /dev/md* when the only ones left were for the array drives. I executed the "reboot" command and five minutes later I still had no ping response.
After connecting a monitor and forcing a reboot, the system locked up at the UEFI boot menu with only CTRL-ALT-DEL working. I plugged the flash drive into my Win10 machine; it detected errors and prompted me to scan it, which I did. No errors found. I plugged the USB drive back into the NAS and it rebooted without issue.
Attached are the diagnostics from after the reboot. Also attached are the syslog entries from when I clicked the "Check for Update" button through to the last entry. I could not find the new diagnostics file referenced in the second-to-last line anywhere in the root directory of the flash drive.
-
I don't think it has anything to do with actual CPU usage. My guess is that there's a bug in the data gathering/display formatting process.
-
Try "Inspect Element" on the CPU bar that's pegging 100% to see if it's looping over a fractional number between 99 and 100. If it is, it's related to my issue linked in the first comment.
Animation of the change I see: https://gyazo.com/88ebee4954d9d3b7e9655c6f7e9f2a80
-
Now it's CPU 27 & 28 doing it. New diagnostic file attached. You can ignore the USB errors at the end of the syslog, was testing a failing USB security dongle.
-
Rebooting makes the "stuck bar" go away. I've been really busy over the past few days and haven't been monitoring it closely but now CPU 27 is pegging in the Dashboard which is not represented in htop. Only CPU's 2-15 are in use, the rest are available for unraid's use. The average load includes the pegged VM. This issue has not appeared on my Intel backup system with a hex core CPU.
Side by side of the Dashboard & htop video
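For a third opinion besides the Dashboard and htop, here is a rough Python sketch that works out per-CPU busy and iowait percentages from two snapshots of /proc/stat. The function names are mine, and how the Dashboard itself computes its bars is not something I know; this only reads the kernel counters directly.

```python
def parse_proc_stat(text):
    """Map 'cpuN' -> list of jiffy counters from the contents of /proc/stat."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        # Per-CPU lines look like 'cpu27 user nice system idle iowait ...';
        # the aggregate 'cpu' line is skipped.
        if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
            counters[fields[0]] = [int(value) for value in fields[1:]]
    return counters


def cpu_percentages(before, after):
    """Return {cpu: (busy_pct, iowait_pct)} between two parsed snapshots."""
    result = {}
    for cpu, new in after.items():
        old = before.get(cpu)
        if old is None:
            continue
        delta = [n - o for n, o in zip(new, old)]
        # Only the first 8 counters are summed: guest and guest_nice are
        # already included in user and nice.
        total = sum(delta[:8])
        if total <= 0:
            continue
        idle, iowait = delta[3], delta[4]
        busy = total - idle - iowait
        result[cpu] = (100.0 * busy / total, 100.0 * iowait / total)
    return result
```

To use it, read /proc/stat, sleep a second, read it again, and pass both parsed snapshots to cpu_percentages. A CPU showing near 0% busy but near 100% iowait is waiting on I/O, not doing work.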
-
I did an "Inspect Element" on the bar and it's doing something odd. The other CPU graphs are updating every second and only integers but this one is going from 99% to 100% in about half of that time, going through several decimal stops along the way.
Animation of the change: https://gyazo.com/88ebee4954d9d3b7e9655c6f7e9f2a80
-
Just for my understanding, so correct me if I'm wrong: if the first line doesn't report that the microcode was updated, then unraid left things as they were and the following lines in the syslog are just reported for informational purposes.
Nov 10 22:04:44 NAS kernel: microcode: CPU0: patch_level=0x08001137 (repeated for each hyperthreaded core)
Nov 10 22:04:44 NAS kernel: microcode: Microcode Update Driver: v2.2.
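If you want to confirm every core reports the same patch level, here is a small Python sketch that pulls the values out of syslog text. The regular expression is based only on the line format quoted above, so treat it as an assumption about other kernels' output.

```python
import re

# Matches lines like: kernel: microcode: CPU0: patch_level=0x08001137
PATCH_RE = re.compile(r"microcode: CPU(\d+): patch_level=(0x[0-9a-fA-F]+)")


def patch_levels(log_text):
    """Return {cpu_number: patch_level_string} found in the log text."""
    return {int(m.group(1)): m.group(2) for m in PATCH_RE.finditer(log_text)}


def all_same_level(levels):
    """True if at least one CPU was found and all report the same level."""
    return len(set(levels.values())) == 1
```

Feed it the contents of the syslog; if all_same_level returns False, the cores disagree and that would be worth a closer look.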
-
I wasn't able to duplicate with my VM's nor having a "Windows 10 New" VM.
-
One thing to note about Windows 10 VM's, you can install an instance without a Windows Key - you can bypass entering it during the installation. Create a local User ID if you do this vs using a Microsoft account. You can then simply copy the hard drive of this instance to create additional VM's. This is a typical practice for testing quick installations but if you want to persist them, then you can enter a key for each instance to make it all good.
-
I have multiple Windows VM's running on one of my unraid servers with one conveniently created & named "Windows 10". I'll back up the VM's files tonight and will try to delete "Windows 10 Handbrake 1".
Any other test condition to make sure of prior?
-
I've had this happen a couple times, the cursor on the console will even stop blinking. Seemed to happen whenever I made changes to the hardware such as adding or removing a PCI-E card to test hardware passthrough to a VM. Required a hard reset but never happened again after in my case (until the next mucking about the PCI-E cards).
Ryzen Threadripper, current BIOS.
This is not going to be an easy thing to track down with no means to capture logs or to have the syslog written to the flash drive instead of RAM.
-
I've got two unraid systems, both with two M.2 NVMe drives as well, with VM's running off them and no issues so far.
It wasn't quite clear, but are the two M.2 drives the same model? If so, check out my DiskSpeed docker app and run a benchmark on both systems (assuming you can access both currently) to see if they have similar speed graphs. Just because there are no flags being raised doesn't mean there aren't any problems.
-
Do you think that the m.2 drive failed and it's killing your system? A more descriptive title would be helpful.
[6.7.0-rc5] Dashboard CPU erroneously stuck at 100%
-
I'm guessing bad device. I rebooted to free it up and now the NVMe drive doesn't register.