bungee91

Members
  • Posts

    744
Everything posted by bungee91

  1. (I agree with the above, however) You could PXE boot the install disc and set dad's machine to network boot. How much time do you have? http://lime-technology.com/forum/index.php?topic=31297.0 Don't ask grumpy for help, I don't think he'll answer anymore..
  2. What he said. Also, since a lot of platforms have an IGD (integrated graphics), this isn't a big issue, as either Nvidia or AMD would work in that situation. The X99 chipset has no built-in IGD, but most Haswell/Skylake/Ivy ones (speaking Intel) do. Even a lot of server boards have some very generic (perfectly good for its use case) VGA output that would suffice here. If your board has an old PCI slot, this may also suffice for Nvidia as a workaround. Check eBay for a PCI video card for ~$10.
  3. I'd check to see if there is a newer firmware for your drive. When I was looking at purchasing an SSD ~1 year ago, there was discussion about the Evo getting or needing updates for some issues. I cannot say that this is related, but it is certainly worth checking. I'd assume that there is something wrong with your drive or controller. Is it set to AHCI in the BIOS? I have a 512 Pro Samsung drive and do not have any of these issues, and it is formatted to BTRFS.
  4. I've used this term, the wife disagrees! Use it for troubleshooting, basically: "Sets the IOMMU into passthrough mode for host devices. This reduces the overhead of the IOMMU for host owned devices, but also removes any protection the IOMMU may have provided again errant DMA from devices. If you weren't using the IOMMU before, there's nothing lost. Regardless of passthrough mode, the IOMMU will provide the same degree of isolation for assigned devices." http://vfio.blogspot.com/2015/05/vfio-gpu-how-to-series-part-3-host.html For difficult cards it is recommended to try this, add to your syslinux.cfg as such: From this
       label unRAID OS (GUI)
         menu default
         kernel /bzimage
         append initrd=/bzroot
     change it to this:
       label unRAID OS (GUI)
         menu default
         kernel /bzimage
         append iommu=pt initrd=/bzroot
     Apply and then reboot your system.
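For anyone who would rather script the syslinux.cfg change above than hand-edit it, here is a minimal Python sketch. It only works on the text of the file; the function name `add_kernel_params` and the sample config are made up for illustration, and reading or writing the real file on the flash drive is left out on purpose.

```python
def add_kernel_params(cfg_text, params, label):
    """Return cfg_text with each param added to the append line of one boot entry."""
    out = []
    in_label = False
    for line in cfg_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("label "):
            # Track whether we are inside the boot entry we want to change.
            in_label = stripped[len("label "):].strip() == label
        elif in_label and stripped.startswith("append "):
            tokens = stripped.split()[1:]
            # Skip parameters that are already present so re-running is harmless.
            missing = [p for p in params if p not in tokens]
            indent = line[: len(line) - len(line.lstrip())]
            line = indent + " ".join(["append"] + missing + tokens)
        out.append(line)
    return "\n".join(out) + "\n"


# Sample text modelled on the entry quoted in the post (the Memtest entry is illustrative).
sample = """label unRAID OS (GUI)
  menu default
  kernel /bzimage
  append initrd=/bzroot
label Memtest86+
  kernel /memtest
"""

patched = add_kernel_params(sample, ["iommu=pt"], label="unRAID OS (GUI)")
print(patched)
```

Only the `append` line of the named entry is touched, and a parameter that is already there is not added twice. Keep a copy of the original file before writing anything back.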
  5. Unfortunately not, they have nothing to do with the issue. However I'm surprised that you're having issues with the items within their own IOMMU groups. 6.2 by default has iommu=pt set (if I'm not defining that perfectly, sorry, on phone). That may help your issue, or if not you can toggle that back to off. Which version are you currently using, 6.1.9? You can also enable this in 6.1.9 by adding it to your syslinux.cfg file.
  6. I was just going to start talking about this (the manual part of this equation). Since this is the topic to discuss such a feature, I think this is a LARGE miss by LT for not wanting to implement this feature sooner. I strongly feel that without this feature we're losing out on making this statement truly useful "This enables users to leverage the same hardware providing NAS services to the home as a workstation, where they can do work, play media/games, and be creative with a high-performance computing platform." https://lime-technology.com/unraid-6-press-release-2/ This statement is still true, and not specifically misleading, however it glorifies the situation. Now no one said to me "Jeff, you should sell that other PC you have and virtualize everything into UnRAID, that will be the "cat's meow!"". However I and many others did just that, and the whole "one box to rule them all" concept is very good, and we're very close to being there. I wouldn't want to go back either, as I like everything in one computer, and I truly feel I'm using the processing power that I have (which was not very often prior as I don't game much, and an i5 for surfing the web is complete overkill). I do not run Pfsense or other things that are very much important to others here (even though I like the idea and may play in the future), however it is obnoxious to have to shutdown my primary PC in order to do maintenance to the array. Leading me to always having a netbook near, or use my phone (which works, but is not the best for completing tasks/typing commands). The quick and dirty solution was to add some form of GUI/X environment into 6.2 (which I have not used yet), and then open up the security risk of Firefox with admin rights (as I understand it) directly on the host. This solution still requires you to have an IGD for it to use, or sacrifice a video card for this output. 
This is where the "UnRAID as a guest" is very intriguing, however it is also not the path followed by many here (or they're not very vocal). However this solution still has customers buying UnRAID, so it is not exactly losing LT money. So, with all of that said. For a "primary" VM that we don't want to have shutdown unless the computer is actually rebooted or shutdown, what can we do to have another copy of libvirt/vfio/QEMU? I say "primary" as I have no issue with my other VM's being managed in the way they are now, however I wouldn't be against this being universally an option. Are the needed KVM related things in the kernel still accessible in this condition? (This is not my area of expertise) Could we place a secondary copy of bzimage to use if not, or use Fedora/Arch/whatever as a base image to do what we want here? Thinking out loud, but I think there are a decent amount of people who would benefit/appreciate this option. We should call this "Unassigned VM's"..
  7. I never touch a thing, and it does its thing very reliably. I cannot recall from my 1st run/initial setup, however you can see when/if it's happened on the backend status page of Mythweb. Mine currently lists this: Last mythfilldatabase run started on Thu Mar 31 2016, 10:15 AM and ended on Thu Mar 31 2016, 10:15 AM. Successful. There's guide data until 2016-04-14 02:00:00 (14 day(s)). DataDirect Status: Your subscription expires on Tue Jul 19 2016 10:33 PM
  8. A little more investigation going against the CPU being bad. This guy had the same issue with XMP and his X99-SLI, one board worked great, the other did not (odd). https://hardforum.com/threads/ga-x99-sli-xmp-fail.1892493/ Also, from my research, the TSC timer seems to be broken on some Gigabyte and Asus boards. This may have always been the case (minus the lockup), however I never gave it much thought. Since I had all of these issues I'm much more critical of things looking out of place in the syslog. I think the PCIe errors I received are also related to the XMP setting that I have always used on the other (exact same) MB, not knowing it didn't work quite right with this one. Since disabling it, running Memtest, and then booting up UnRAID (going on 12 hours here) that message has not returned! So I may be out of the woods soon, and my trigger finger may need to be relaxed from picking up the HX850i I have my eye on (currently on a pretty good sale) here: http://www.newegg.com/Product/Product.aspx?Item=N82E16817139083 Edit: Found an old syslog from my previously good working board.. Same exact TSC message, just never noticed. So with that, I think the CPU is fine. Thanks for listening!
     Feb 15 09:42:55 Server kernel: Switched APIC routing to physical flat.
     Feb 15 09:42:55 Server kernel: ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
     Feb 15 09:42:55 Server kernel: TSC deadline timer enabled
     Feb 15 09:42:55 Server kernel: smpboot: CPU0: Intel(R) Core(TM) i7-5930K CPU @ 3.50GHz (fam: 06, model: 3f, stepping: 02)
     Feb 15 09:42:55 Server kernel: Performance Events: PEBS fmt2+, 16-deep LBR, Haswell events, full-width counters, Intel PMU driver.
     Feb 15 09:42:55 Server kernel: ... version: 3
     Feb 15 09:42:55 Server kernel: ... bit width: 48
     Feb 15 09:42:55 Server kernel: ... generic registers: 4
     Feb 15 09:42:55 Server kernel: ... value mask: 0000ffffffffffff
     Feb 15 09:42:55 Server kernel: ... max period: 0000ffffffffffff
     Feb 15 09:42:55 Server kernel: ... fixed-purpose events: 3
     Feb 15 09:42:55 Server kernel: ... event mask: 000000070000000f
     Feb 15 09:42:55 Server kernel: x86: Booting SMP configuration:
     Feb 15 09:42:55 Server kernel: .... node #0, CPUs: #1
     Feb 15 09:42:55 Server kernel: TSC synchronization [CPU#0 -> CPU#1]:
     Feb 15 09:42:55 Server kernel: Measured 228446458923 cycles TSC warp between CPUs, turning off TSC clock.
     Feb 15 09:42:55 Server kernel: tsc: Marking TSC unstable due to check_tsc_sync_source failed
     Feb 15 09:42:55 Server kernel: #2 #3 #4 #5 #6 #7 #8 #9 #10 #11
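One way to see what the kernel settled on after a "Marking TSC unstable" message like the one above is to read the clocksource entries in sysfs. The sketch below assumes a Linux host that exposes the standard `/sys/devices/system/clocksource/clocksource0` directory; the function names are made up for illustration, and it returns empty results instead of failing when the files are not there.

```python
from pathlib import Path

# Standard Linux sysfs location for the active clocksource.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksource(base=CLOCKSOURCE_DIR):
    """Return (current, available) clocksources, or (None, []) if unreadable."""
    try:
        current = (base / "current_clocksource").read_text().strip()
        available = (base / "available_clocksource").read_text().split()
    except OSError:
        return None, []
    return current, available


def tsc_in_use(base=CLOCKSOURCE_DIR):
    """True only when the kernel is currently using the TSC as its clocksource."""
    current, _ = read_clocksource(base)
    return current == "tsc"


current, available = read_clocksource()
print("current:", current, "available:", available)
```

If the TSC was marked unstable at boot, `current` would normally show a different source such as `hpet` and `tsc_in_use()` would return `False`. That matches the syslog message but says nothing by itself about whether the CPU or the board is at fault.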
  9. Normal behavior, mine is the same and works perfectly icon for kill switch is set default config file(s) appear to be in place .Xauthority file appears to in place kill switch is set setup desktop icon is set mythtv folders appear to be set Database(s) exists. Starting MariaDB... Checking whether database(s) are ready waiting..... waiting..... icon for kill switch is set default config file(s) appear to be in place .Xauthority file appears to in place kill switch is set setup desktop icon is set mythtv folders appear to be set Database(s) exists. Starting MariaDB... Checking whether database(s) are ready waiting..... waiting..... waiting..... icon for kill switch is set default config file(s) appear to be in place .Xauthority file appears to in place kill switch is set setup desktop icon is set mythtv folders appear to be set Database(s) exists. Starting MariaDB... Checking whether database(s) are ready waiting..... waiting..... icon for kill switch is set default config file(s) appear to be in place .Xauthority file appears to in place kill switch is set setup desktop icon is set mythtv folders appear to be set Database(s) exists. Starting MariaDB... Checking whether database(s) are ready waiting..... waiting..... icon for kill switch is set default config file(s) appear to be in place .Xauthority file appears to in place kill switch is set setup desktop icon is set mythtv folders appear to be set Database(s) exists. Starting MariaDB... Checking whether database(s) are ready waiting..... waiting.....
  10. I'm eyeing a Corsair HXi or AXi PSU, may not tell the wife and see what happens.
  11. When running these checks (never had to), can I do it with the array active, or does it need to be in maintenance mode? (No comment on the unfortunate circumstances for his wife.. It's also hard to make it laughable in any way (I did try, seemed wrong) ) Does anyone know of a CPU utility to properly test for defective conditions? Kind of like a hard drive long test or S.M.A.R.T. check? Considering all the I/O it does, VT-d, various extensions, memory controllers, it's hard to know for sure if something is just a little off. A dying MB that was left on with a clicking supply (likely its own internal protection) can't be good for a couple million transistors..
  12. Edit: Needed to look further. You should be able to get the 00:14.0 USB controller: Intel Corporation Sunrise Point-H USB 3.0 xHCI Controller (rev 31) or the 03:00.0 USB controller: VIA Technologies, Inc. Device 3483 (rev ff) controller. Are these the ones you're trying? The add-on card and sound card still apply to what I'm talking about below. Without the ACS patch working for Skylake builds (one user says it does work for him, but he's unfortunately not the majority of cases I've read), considering the grouping, there is NO way this is going to work.. (not trying to be mean). The issue is that what you want to pass is all in group 7, all this stuff:
     /sys/kernel/iommu_groups/7/devices/0000:00:1c.0
     /sys/kernel/iommu_groups/7/devices/0000:00:1c.2
     /sys/kernel/iommu_groups/7/devices/0000:00:1c.4
     /sys/kernel/iommu_groups/7/devices/0000:00:1c.6
     /sys/kernel/iommu_groups/7/devices/0000:04:00.0
     /sys/kernel/iommu_groups/7/devices/0000:05:00.0
     /sys/kernel/iommu_groups/7/devices/0000:06:01.0
     /sys/kernel/iommu_groups/7/devices/0000:06:02.0
     /sys/kernel/iommu_groups/7/devices/0000:06:03.0
     /sys/kernel/iommu_groups/7/devices/0000:06:04.0
     /sys/kernel/iommu_groups/7/devices/0000:06:05.0
     /sys/kernel/iommu_groups/7/devices/0000:06:06.0
     /sys/kernel/iommu_groups/7/devices/0000:06:07.0
     /sys/kernel/iommu_groups/7/devices/0000:07:00.0
     /sys/kernel/iommu_groups/7/devices/0000:08:00.0
     /sys/kernel/iommu_groups/7/devices/0000:09:00.0
     /sys/kernel/iommu_groups/7/devices/0000:0c:00.0
     /sys/kernel/iommu_groups/7/devices/0000:0e:00.0
     /sys/kernel/iommu_groups/7/devices/0000:0f:00.0
     That's a LOT of stuff unfortunately, and to properly pass hardware (any way you get the grouping: naturally, or with the ACS override) you need to pass everything in the group to the VM, or stub it (not use it). This doesn't apply to root ports in the grouping. Your sound card that failed was also in the same group (9) as your ethernet card, which I assume you don't want to pass as you need it for UnRAID.
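To check which devices share a group with the one you want to pass through, the sysfs paths in the listing above can be grouped with a few lines of Python. This is only a sketch: the helper names are made up, and `scan_host` assumes the host exposes `/sys/kernel/iommu_groups` in the same layout as the listing.

```python
import re
from collections import defaultdict
from pathlib import Path

# Matches paths such as /sys/kernel/iommu_groups/7/devices/0000:04:00.0
DEVICE_PATH = re.compile(r"/sys/kernel/iommu_groups/(\d+)/devices/([0-9a-fA-F:.]+)$")


def group_devices(paths):
    """Map IOMMU group number -> list of PCI addresses found in paths."""
    groups = defaultdict(list)
    for p in paths:
        m = DEVICE_PATH.search(str(p))
        if m:
            groups[int(m.group(1))].append(m.group(2))
    return dict(groups)


def scan_host(root="/sys/kernel/iommu_groups"):
    """Build the same mapping from a live system (empty if the directory is missing)."""
    return group_devices(Path(root).glob("*/devices/*"))


def companions(groups, address):
    """Other devices in the same group as address, i.e. what must go along with it."""
    for members in groups.values():
        if address in members:
            return [d for d in members if d != address]
    return []


# Three of the group 7 entries quoted in the post.
listing = [
    "/sys/kernel/iommu_groups/7/devices/0000:00:1c.0",
    "/sys/kernel/iommu_groups/7/devices/0000:00:1c.2",
    "/sys/kernel/iommu_groups/7/devices/0000:04:00.0",
]
groups = group_devices(listing)
print(companions(groups, "0000:04:00.0"))
```

An empty result from `companions` means the device sits alone in its group (or was not found), which is the easy case for passthrough. Root ports showing up in the list are the exception mentioned in the post.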
  13. No hating on X99, we have proper ACS and no IOMMU grouping issues (I think you're thinking of Skylake and the Z/H170 chipset). The thread you're mentioning for the rom, etc. is here (I have not attempted this) http://lime-technology.com/forum/index.php?topic=43644.msg452464#msg452464
  14. Thanks! I appreciate the input and thoughts with this. Being that I had some pretty nasty failures, to me everything is suspect. I cannot think of anything else that would be more likely than the CPU for the TSC clock sync error, and that it at times freezes at that one point (maybe this new MB I am using temporarily is partially defective, it would be unlikely, but at this point I don't know). I have completed parity syncs after both hardware failures/lockups without any issues, while still having all other VM's, etc. operating. This is why I don't feel the PSU is the culprit. I ran Memtest over night (~8 hours) completed 3 passes without any issues (this is with XMP set to off). Here's my current plan as I see it. Wait for my MB to come back from RMA, set it all back up and see if the problem exists. Run Memtest for a couple of passes with "Profile 1" XMP set to on and verify no issues (this should work as it always did previously). At that point I'll make a decision as to if I want to replace the PSU (or test with a smaller unit I have), or RMA the CPU. I have since scrubbed my Cache drive and found no errors present. I'd like to run a verification on my array XFS discs for any corruption. What would be the best way to accomplish this? Parity sync finished both times without errors.
  15. So........... It's been a "slice" lately in the world of my server.. The short and skinny (read bold): About a week ago one of my rear USB ports no longer worked at all, just all of a sudden dead. Back story:This was being used by my UPS, so I got a notification it went offline. Checked cables, reseated, no dice. Plugged into another open USB port, all is well. A week later my server is unresponsive, frozen, all VM's, etc... offline. Server was on (fans spinning) when the wife originally let me know. Hours pass, I get home. NIC LED's are flashing, but the computer is now powered off. Turns out my primary GPU is dead, DEAD... No response, MB hates when it's installed (acts odd, won't boot, no POST beeps). Test in another computer, it won't display and I get a POST beep (likely) indicating no video. Buy a new primary GPU, all is well, back in business. A couple of days later. My motherboard died, DEAD, and needed to be RMA'd. Back story: I came home to a server off, with no NIC lights, and a power supply making a soft tick, tick, tick noise. Did all the testing one would do (swap power supplies (same tick), eliminate extra components), it was dead. I purchase the exact make/model motherboard and replace it (Gigabyte X99-SLI) to ensure it is not something else, and it boots right up, on and working (or so I thought). Hindsight (20/20 = a lot of broken shit! ) Now thinking about this all, the dead USB port, the dead video card, and the now dead motherboard I realize it's all related, and the MB while slowly dying was taking things out with it.. So anyway the new MB had some quirks when changing BIOS settings, as it wouldn't save some time, or would cycle on/off as if it couldn't boot up/POST properly (far more than normal). At times this would lead to the settings being set to default. At first I hadn't loaded the newest BIOS (which the original one was super old), so I updated that to the newest (same as my previous MB that worked fine prior to its demise). 
This really didn't fix anything, similar situation. I finally found the reason for what seemed to be a lack of proper power up/resetting of settings, and that was "Profile 1" being set for the XMP memory settings. If I leave it set to off/disabled the settings are retained as they should be, and the power on/off at boot is better (but still not what I recall as normal for this board). My original motherboard (identical in every way) always had XMP profile 1 set, and I had previously run Memtest for 24+ hours verifying it was working properly. I also used this for the last ~3 months without any of the issues I'm describing to you. Anyhow, whatever, boot into UnRAID and sometimes it gets stuck loading at the attached screenshot screen. I let it sit there for a good 5 minutes, and it is stuck. Reboot server (button press) and it will likely load up the next time. Continue on, but have an error in the syslog about the clock or HPET being off, and it complaining (all settings in BIOS are correct, time is correct, not sure of the exact message). I then start to get a couple of these over a short period of time:
     Mar 29 05:20:52 Server kernel: pcieport 0000:00:03.0: AER: Corrected error received: id=0018
     Mar 29 05:20:52 Server kernel: pcieport 0000:00:03.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0018(Transmitter ID)
     Mar 29 05:20:52 Server kernel: pcieport 0000:00:03.0: device [8086:2f08] error status/mask=00001000/00002000
     Mar 29 05:20:52 Server kernel: pcieport 0000:00:03.0: [12] Replay Timer Timeout
     Never seen those before... I research the issue, it seems to be mentioned in some bug reports from as far back as 2011, nothing current, and not something you'd expect. So I figure out which GPU it is, swap it out (same model), and the error is much less frequent, but I still see it once in a while (thought that GPU may have been bad as it was in the slot next to the one I had to RMA, but I don't think that's the case). So...
As of now I am typing in a VM right now (primary computer), have 3 others going (2 of those with GPU's). My server is "working" but it seems flakey at best. No stability issues to report, I've even hammered two of the VM's with multiple performance tests to stress them. I have a Corsair TX650W that has served me well, and I really don't think it is bad in any way. However it's one of my next things to look into further. Now my question: Do you think it is possible my CPU was damaged "lightly" when the MB, or video card died? I have never seen this model MB lose settings, not be able to set the XMP profile to 1 (on), or power on/off at boot until it finally goes. I think (even though it is unlikely) that the MB died from far too much concrete dust that built up a bit in my server when I was removing some of a concrete floor in my basement (yes I'm an idiot, should have been more careful). A link to a diagnostics from the other day (with the PCIerror in syslog), and one from just now. https://app.box.com/s/r1kg2zr4jud3vkiclq0f7go36q5g4erc ---- Also: Yes I know about all the USB related "rounding interval to" and "reset low-speed USB" messages in the syslog. They've been there forever, and only changed as I attempted to disable the XHCI controller, leading to it now listing the "reset low-speed" message, instead of the rounding one.. May need some powered hubs/better USB extensions.
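To put a number on "much less frequent" for the corrected PCIe errors quoted in this post, the syslog can be tallied per day and per root port. The sketch below assumes the classic syslog line layout shown in the post (month, day, time, hostname, then `kernel:`); the function name is made up for illustration.

```python
import re
from collections import Counter

# One "Corrected error received" line is logged per AER event; the follow-up
# detail lines for the same event are deliberately not counted.
AER_LINE = re.compile(
    r"^(\w{3}\s+\d+) \S+ \S+ kernel: pcieport (\S+): AER: Corrected error received"
)


def count_corrected_errors(lines):
    """Count corrected AER events keyed by (day, PCI address of the reporting port)."""
    counts = Counter()
    for line in lines:
        m = AER_LINE.match(line.strip())
        if m:
            counts[(m.group(1), m.group(2))] += 1
    return counts


# The four lines quoted in the post describe a single event.
sample = [
    "Mar 29 05:20:52 Server kernel: pcieport 0000:00:03.0: AER: Corrected error received: id=0018",
    "Mar 29 05:20:52 Server kernel: pcieport 0000:00:03.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0018(Transmitter ID)",
    "Mar 29 05:20:52 Server kernel: pcieport 0000:00:03.0: device [8086:2f08] error status/mask=00001000/00002000",
    "Mar 29 05:20:52 Server kernel: pcieport 0000:00:03.0: [12] Replay Timer Timeout",
]
for (day, port), n in sorted(count_corrected_errors(sample).items()):
    print(day, port, n)
```

Running it against syslogs taken before and after a hardware swap gives a like-for-like count instead of a general impression.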
  16. (thinking about fiber currently) Ok, maybe I thought about this incorrectly then. The point here is to (hopefully) catch some good diagnostic information prior to a lockup, and then be able to use that after the hard reset is performed (as it would have been lost normally), right? So we're hoping whatever is going on is logged prior to it getting to the point of failure. For some reason I thought cron was still operating in some way, and that even though we are locked out of accessing/controlling UnRAID, if we waited, the cron job would run and capture all the relevant information. I guess that's not the case, so disregard.
  17. My question would be, is there a way to invoke this prior to doing a hard shutdown? I understand the powerdown script collects information when used, but this use case is for when we cannot do that. Could we monitor the power button input, for say a double press, which would invoke this?.. Just thinking out loud for a way to trigger it when the system is up, but any normal way to control it has stopped working.
  18. Since you asked this, did you follow the directions from here when doing the install (the order supposedly matters, but I doubt the order would cause your issues!)? http://lime-technology.com/wiki/index.php/UnRAID_Manual_6#Installing_a_Windows_VM I loaded only viostor driver on install and then installed the rest when windows was installed, i used version 112. You should fix this, go into the device manager and install what is likely missing. As for MSI interrupt, use this utility, run as administrator, select the GPU and audio device, save, close, reboot. http://www.mediafire.com/download/hr...i/MSI_util.zip (if the link doesn't work I'll send it to you once I'm home). Edit: Attached to my post here https://lime-technology.com/forum/index.php?topic=46264.msg442915#msg442915 You can also do this the manual, long, pain in the ass way here http://lime-technology.com/wiki/index.php/UnRAID_6/VM_Guest_Support Towards the bottom in the section Enable MSI for Interrupts to Fix HDMI Audio Support
  19. I don't know your hardware, but you can try to add iommu=pt in your syslinux.cfg file, and also vfio_iommu_type1.allow_unsafe_interrupts=1 (I don't think the order matters). It would look like this:
       kernel /bzimage
       append iommu=pt vfio_iommu_type1.allow_unsafe_interrupts=1 initrd=/bzroot
     Try one, or both, and see if it helps.
  20. Since you asked this, did you follow the directions from here when doing the install (the order supposedly matters, but I doubt the order would cause your issues!)? http://lime-technology.com/wiki/index.php/UnRAID_Manual_6#Installing_a_Windows_VM
  21. The effort to switch from SeaBIOS to OVMF/UEFI isn't worth it, unless you want console output (as the console dies when you start a VM with SeaBIOS). This may change with 6.2 with the supposed UnRAID GUI on the host, but we will wait and see what that is all about. "The VGA space itself is a shared resource, so every time the guest tries to access VGA space it gets trapped into QEMU, which forwards the request to VFIO, which negotiates with the VGA arbiter and adjusts chipset routing as necessary. Therefore VGA mode is bearable when there's one consumer, once you start getting contention and the arbiter needs to switch routing on each access, this can certainly become a bottleneck. However, even when using a legacy BIOS with x-vga, those VGA accesses *only* occur during early boot or if you're using non-accelerated drivers, so the window for contention is very small. Once the guest is up and running, access to legacy VGA space ceases, and there's no performance difference between legacy BIOS and UEFI that I'm aware of." https://www.redhat.com/archives/vfio-users/2016-February/msg00036.html As for Virtio, these are the drivers you install for network, storage controller, balloon (RAM), and serial. If it was HDD related, I'd recommend updating the storage controller one. Have you run any HDD benchmarks like Crystal, or similar? I DO think it's CPU related, but I was surprised to see the results I did with latency when doing that test. You can update the ones with the red arrow the normal way you update Windows drivers, just pick the version of Virtio you plan to test from the Fedora site (don't have the link at the moment, the VM add page links to it).
  22. I decided to leave the latency monitor app open while running some tests with PassMark Performance Test. Most of the tests didn't generate a lot of spikes (2D, CPU, Memory), however the All option under Disk made the bars consistently as high as possible until completed (see pic of red bars spiked for a period of time). Have you tried other Virtio drivers? I know there have been complaints with the newest versions on the VFIO group, but not certain the exact issues.
  23. I decided to see what the latency checker says for me, even though I don't feel lag/etc... I moved the windows I had open around, pinned to the left, right. Anyhow, almost every time I did it, I got a bad red spike. However it never felt like anything other than expected, bare metal, etc... My mouse doesn't jerk, jump, nada. Now you have a real issue with a game, and that is certainly not normal, however I think the latency checker results at times can be taken with a grain of salt. I do GPU accelerated playback regularly without issue, and 1080p YouTube plays without any issues either. I don't have a 4K output like you however. Didn't expect the result to look this bad.
  24. "However, unRAID does only seem to use the first number that is entered after the = mark, so in this example unRAID only used core 4 and not the intended 4 and 5. And when I click edit on my VM after I have enabled this using the XML edit, only core 4 is checked. Is there any other way I can make it so my VM uses 2 real cores per virtual core?" While I have done some manual manipulation of the XMLs for UnRAID, I have not played with the vcpu to logical assignments directly. So if you edit the XML manually, change the setting from 1 - 1 (vcpu to logical) to 1 - 2, and save, does your input persist when you go to edit the XML again? I would think it should. You can also use the <emulatorpin cpuset=...> entry for core assignment to force the emulator (440FX) to run on a specific core, and not interfere with interrupts from the host (UnRAID). You can assign it to the same cores as the VM, or an alternate core just for its usage. You'll have to play/experiment as needed, but it should help if latency is the main issue. You'll find this discussion here https://www.redhat.com/archives/vfio-users/2015-September/msg00041.html
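For reference, the pinning discussed in this post lives in the `<cputune>` section of the libvirt domain XML, using `<vcpupin>` and `<emulatorpin>` elements. The Python sketch below just generates such a fragment to paste into the XML editor; the core numbers are illustrative, not a recommendation, and whether the unRAID editor keeps a range such as `4-5` after saving is exactly the open question above.

```python
import xml.etree.ElementTree as ET


def build_cputune(vcpu_pins, emulator_cpuset=None):
    """Build a libvirt <cputune> element.

    vcpu_pins maps a guest vCPU number to a host cpuset string such as "4-5".
    emulator_cpuset, if given, pins the emulator threads to those host CPUs.
    """
    cputune = ET.Element("cputune")
    for vcpu, cpuset in sorted(vcpu_pins.items()):
        ET.SubElement(cputune, "vcpupin", vcpu=str(vcpu), cpuset=cpuset)
    if emulator_cpuset is not None:
        ET.SubElement(cputune, "emulatorpin", cpuset=emulator_cpuset)
    return cputune


# Illustrative layout: each vCPU may run on two host CPUs, emulator kept on CPU 0.
fragment = build_cputune({0: "4-5", 1: "6-7"}, emulator_cpuset="0")
xml_text = ET.tostring(fragment, encoding="unicode")
print(xml_text)
```

A cpuset range on `vcpupin` lets that vCPU float between the listed host CPUs; it does not make one vCPU use two cores at the same time.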
  25. Unfortunately at this point I'm out of solutions to help (sorry, but didn't want to leave you hanging here). The other issue is that the number of people doing this with AMD hardware is lower in comparison to Intel, so there is less input from users who may have a similar situation and can provide solutions. There are certain things you can attempt to add to your syslinux.cfg file that may help to isolate this (not just isolcpus), but I don't have the specific answer to that. If you're adventurous, or determined, I'd highly recommend asking your question on the VFIO mailing list, as there are some incredibly bright people there who will likely give more input than I can. While this is not specific to UnRAID, it is specific to VFIO/QEMU/KVM and will likely get you a quicker response than here. Anyhow the information is below: Sign up https://www.redhat.com/mailman/listinfo/vfio-users Archived threads (a lot of good information in here) https://www.redhat.com/archives/vfio-users/ Hope you get it resolved soon!