harshl

Members
  • Posts

    106
  • Joined

  • Last visited

Converted

  • Gender
    Undisclosed

harshl's Achievements

Apprentice (3/14)

12

Reputation

  1. Thank you @SimonF for taking the time to respond and explain the changes that have indeed taken place in software. I have some hardware work to do on the server in the coming days, so I will have the opportunity to test. Thanks again! -Landon
  2. Thank you @SimonF! I'm assuming I need to remove /boot/config/vfio-pci.cfg completely, or should I just empty the file and leave it? Any difference? Interesting that those things are in there. I am not sure where they would have come from. If memory serves me, this may have been passed down from my desktop unraid setup to a dedicated server, but that was literally years ago. So something has changed in unraid as it relates to this file and autostarting VMs, that, or this file was magically placed on my flash drive somehow, which has been in this server without removal for years. Anyway, I appreciate the guidance and I suspect this will fix my particular issue. Thanks again, -Landon
  3. Oh! I just found this! "VM Autostart disabled due to vfio-bind error" That is clearly the issue. I will start looking at what might be wrong, but I don't believe any hardware has changed so I am not sure what is producing that error. VFIO log looks fine to me?

     Loading config from /boot/config/vfio-pci.cfg
     BIND=0000:01:00.0|10de:1f06 0000:01:00.1|10de:10f9 0000:01:00.2|10de:1ada 0000:01:00.3|10de:1adb 0000:09:00.0|1b73:1100
     ---
     Processing 0000:01:00.0 10de:1f06
     ---
     Processing 0000:01:00.1 10de:10f9
     ---
     Processing 0000:01:00.2 10de:1ada
     ---
     Processing 0000:01:00.3 10de:1adb
     ---
     Processing 0000:09:00.0 1b73:1100
     ---
     Devices listed in /sys/bus/pci/drivers/vfio-pci:
     vfio-pci binding complete

     I don't have any references in the syslinux config like we used to do. The only passthrough I even have is a spinner hard drive I am passing to a BlueIris server, which seems to be working perfectly. No other hardware passthrough going on as far as I can see. Thoughts? -Landon
  4. Hey all, I have 3 VMs on one of my unraid servers and after an upgrade to 6.12.8, none of them will autostart after a reboot. Not sure what is going on or how to begin diagnosing it. Dockers autostart as configured; the VMs, though configured to autostart, do not. Manually starting the VMs after boot works perfectly fine. Diagnostics attached. Let me know if you need other details, I don't even know what to provide for this. Thanks for any guidance you can provide! -Landon
     Attachment: rackserver-diagnostics-20240321-1948.zip
  5. Thank you so much for building this, works great!
  6. I think this is essentially another vote for snapshots. Once the snapshot is taken you can copy that state of the data anywhere you want while everything is online. Although, I suppose this might go a step further to say, let's not only snapshot, but give us a built-in mechanism to copy the snapshot data elsewhere. Regardless, snapshots are key in my mind to enable any of this.
  7. Seems to me that if ZFS is implemented in its entirety, multiple array pools will also essentially be implemented, no? At least they would be if you chose to use that file system. In any case, I would really like to have VM snapshots implemented in the GUI in some way. It is such a powerful feature, and has been available in VMware products forever. Thanks for a great product!
  8. Honestly, I was also grasping at straws. I decided to just start throwing parts at it. Nothing made any difference until I swapped out the power supply. I had a higher wattage one, threw it in, and no more sporadic reboots. So I double checked all of my wattage calculations and purchased another power supply that should safely cover the load and everything has been fine since. If it's locking up, it might be a little different than my situation as mine was instantly rebooting itself. But I suppose it could manifest many different ways. I would recommend throwing a bigger (wattage) one at it for a time to see if it helps. Good luck @jgillman, instability is no fun.
  9. I know it is a super edge case, but I thought I would close the loop on this one. In my case, this issue was power supply related. Not a wattage issue, but a faulty power supply issue. Put in the same wattage and everything is running great again.
  10. Anyone have any thoughts on this at all? Specifically interested if anyone has thoughts around the PCIe ACS Overrides being related to crashes? Am I barking up the wrong tree there? Guess I will move my gaming VMs to be installed directly on the passed through SSD to eliminate the cache as a player in this from a load perspective. Maybe I will buy another power supply also to see if that makes any difference since it was new with the change... What would you try if it was your system?

     I just tried running mprime on the host with test number 2, 2 = Small FFTs (tests L1/L2/L3 caches, maximum power/heat/CPU stress). This ran just fine for a long period of time, so the CPU/CPU temp is not an issue as far as I can tell. I also just ran it with test 3, 3 = Large FFTs (stresses memory controller and RAM). This ran just fine as well, so I don't believe I have a memory problem.

     Well, I just ran mprep.info/gpu from each of the gaming VMs and it has been running the CPU&GPU test for over 30 minutes now without issue. Playing two games at the same time would have for sure crashed it within that time. This is steering me more towards some kind of a storage issue now. Super strange...

     Just finished running another test against the CPU, video card, and passed through Samsung drives on both VMs simultaneously, ran fine. Just the cache left to test now. If that is stable also, then I guess I suspect the USB controllers? That is the only thing that would be heavily utilized while playing a game that hasn't been load tested so far... Still strange...

     Well, I've tested the array drives (C drive in the VMs) along with CPU and video cards and still... all runs fine. I cannot crash this thing without having two people use the machines manually with some at least somewhat significant load, like playing a game... This is the weirdest issue ever... Thanks,
  11. You might also check that power management within the VM didn't get enabled somehow. If Windows is going into standby, I believe that will cause the VM to pause as well, though I haven't tested that recently. Worth a quick look anyway.
  12. Hello all, I have a system that has been running stable for nearly 2 years. I changed things up a bit by building a new server, moving most of the storage, containers and a few server-based VMs to that system. This system is now running only 1 container and 3 VMs. 2 of those VMs are gaming VMs with a dedicated GPU and USB controller for each. This system had heavy load while it was stable as well, but after adding the second dedicated GPU and USB controller it has been exhibiting issues, namely, the host crashes/reboots without warning, generally only when under load (gaming or rendering something). It seems to be GPU load that triggers it, but CPU load comes pretty naturally with GPU load in our activities, so not sure. The CPU did not change though, and in its previous server duties it would be very busy from time to time and was rock solid through it all. I also upgraded the PSU to an 850 watt PSU when adding the second GPU and USB controller. In order to pass the second GPU to the second VM I did have to enable PCIe ACS Override. Currently it is set to 'downstream'.

     I am trying to figure out why this machine might be crashing but am coming up empty. There don't seem to be any meaningful logs leading up to the crash. I am syslogging to the other unraid server, but the logs show nothing of significance leading up to the reboot time. A snippet of those logs before and after the crash is attached. That snippet actually shows two crashes relatively close together. I have also attached diagnostics.

     I have three suspicions:

     1. PCIe ACS Override is causing an issue, but the more research I do, the more it seems that is pretty much required when passing through more than one PCIe GPU. Can anyone confirm or deny if that is true?
     2. The second video card I added is a 1050 Ti that is bus powered and I wonder if the motherboard is unable to power everything with the heavy video card draw? Not sure how likely that might be. Thoughts on that?
     3. If suspicion 1 is correct in that ACS is causing an issue and I can get a motherboard that doesn't need it to pass through two GPUs, then perhaps a better motherboard will solve it?

     I wish I had another video card that wasn't bus powered to try in it, but I don't, and the current market conditions don't make it very conducive to buy one to throw at it for troubleshooting. Anyway, if you have any thoughts on how to troubleshoot this, that would be appreciated.

     Here are the main components of the build for convenience:

     • I9-9900K
     • ASRock Z390 Taichi
     • 64GB of DDR4 memory that is on the QVL, I can't find my order to see what it was right now
     • EVGA SuperNOVA 850 G5 <--New after change
     • 2x Mushkin 1TB drives mirrored for the array, MKNSSDRE1TB (VM OS's live here) <--New after change, but not new drives
     • 2x WD 4TB drive for the array, WD40EFRX-68W <--New after change, but not new drives
     • 2x Samsung 2TB 980 Pros, unassigned, set to passthrough, one each passed through to the gaming VMs (Games and large software for the gaming VMs lives here)
     • 1x Bluray drive, BW-16D1HT
     • RTX2060 SUPER, passed through to one gaming VM
     • GTX1050 Ti, passed through to the other gaming VM <--New after change, but not new card
     • 2x FL1100 controllers, one each passed through to the gaming VMs <--1x new after change

     There are of course some other USB accessories and things, but the above is the majority of the major components. Thanks for any thoughts you might have on how to narrow this down.
     Attachments: cube-diagnostics-20220207-0931.zip log-snippet.txt
  13. Sounds like you might have configuration issues that are eluding you. I recently moved to NextCloud, but prior to that, I ran this ownCloud container on unraid for years public facing without any issues that weren't self created like not keeping it up to date. Never once did I ever have it default back to a setup screen or have any other major failures to function unless I was at fault in some way. Lots of people using this container with success, just sayin'...
  14. This was most definitely not the case in prior releases. I have always set my cache drives to a higher temp to avoid the false notifications. I was unable to do that in 6.9, even on 6.9.2. However, it did seem to be a browser issue for me as well. I just tried it in Edge, was previously using Chrome, and it appears to have updated and stuck in the GUI. I will report back if I still get a temp alarm email notification below what I set next time my drives are working hard.
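
For anyone chasing the vfio-bind error from post 3, here is a minimal Python sketch of one way to cross-check the BIND line from /boot/config/vfio-pci.cfg against what is actually bound. It assumes the usual Linux sysfs layout, where each bound device shows up as an entry named after its PCI address under /sys/bus/pci/drivers/vfio-pci. The function names `parse_bind_line` and `unbound_devices` are made up for illustration and are not part of unraid.

```python
from pathlib import Path


def parse_bind_line(line: str) -> list[tuple[str, str]]:
    """Split 'BIND=addr|vendor:device addr|vendor:device ...' into (addr, ids) pairs."""
    key, _, value = line.strip().partition("=")
    if key != "BIND":
        raise ValueError(f"expected a BIND= line, got: {line!r}")
    pairs = []
    for entry in value.split():
        # Each entry looks like 0000:01:00.0|10de:1f06; ids is empty if there is no '|'.
        address, _, ids = entry.partition("|")
        pairs.append((address, ids))
    return pairs


def unbound_devices(
    pairs: list[tuple[str, str]],
    driver_dir: Path = Path("/sys/bus/pci/drivers/vfio-pci"),
) -> list[str]:
    """Return configured addresses with no matching entry under driver_dir.

    Assumption: a device bound to vfio-pci appears as an entry named after
    its PCI address inside the driver directory.
    """
    return [address for address, _ in pairs if not (driver_dir / address).exists()]
```

Feeding it the BIND line from the log and printing `unbound_devices(...)` on the server would list any configured address that did not end up bound, which might be the kind of mismatch the autostart check complains about. That last part is a guess on my side, not something confirmed in the thread.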