Everything posted by harshl

  1. Thank you @SimonF for taking the time to respond and explain the changes that have indeed taken place in the software. I have some hardware work to do on the server in the coming days, so I will have the opportunity to test. Thanks again! -Landon
  2. Thank you @SimonF! I'm assuming I need to remove /boot/config/vfio-pci.cfg completely, or should I just empty the file and leave it? Any difference? Interesting that those things are in there; I am not sure where they would have come from. If memory serves me, this may have been passed down from my desktop unraid setup to a dedicated server, but that was literally years ago. So something has changed in unraid as it relates to this file and autostarting VMs; either that, or this file was magically placed on my flash drive somehow, which has been in this server without removal for years. Anyway, I appreciate the guidance and I suspect this will fix my particular issue. Thanks again, -Landon
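     For anyone else who lands here with the same question, this is a minimal sketch of what I plan to do, assuming simply deleting the file is the right move (the .bak name is just my own habit, not anything Unraid requires):

        # Keep a copy of the old config on the flash drive, then remove the
        # original so Unraid stops trying to vfio-bind those devices at boot.
        cp /boot/config/vfio-pci.cfg /boot/config/vfio-pci.cfg.bak
        rm /boot/config/vfio-pci.cfg
        # A reboot is needed afterwards, since the binding happens at boot time.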
  3. Oh! I just found this: "VM Autostart disabled due to vfio-bind error". That is clearly the issue. I will start looking at what might be wrong, but I don't believe any hardware has changed, so I am not sure what is producing that error. The VFIO log looks fine to me?

        Loading config from /boot/config/vfio-pci.cfg
        BIND=0000:01:00.0|10de:1f06 0000:01:00.1|10de:10f9 0000:01:00.2|10de:1ada 0000:01:00.3|10de:1adb 0000:09:00.0|1b73:1100
        ---
        Processing 0000:01:00.0 10de:1f06
        ---
        Processing 0000:01:00.1 10de:10f9
        ---
        Processing 0000:01:00.2 10de:1ada
        ---
        Processing 0000:01:00.3 10de:1adb
        ---
        Processing 0000:09:00.0 1b73:1100
        ---
        Devices listed in /sys/bus/pci/drivers/vfio-pci:
        vfio-pci binding complete

     I don't have any references in the syslinux config like we used to do. The only passthrough I even have is a spinner hard drive I am passing to a BlueIris server, which seems to be working perfectly. No other hardware passthrough going on as far as I can see. Thoughts? -Landon
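     For reference, this is the quick look I mean by "no references in the syslinux config"; the path is the standard Unraid one, so adjust if your flash drive is laid out differently:

        # Show any vfio-related kernel parameters left over from the old
        # append-line method of stubbing devices for passthrough.
        grep -i vfio /boot/syslinux/syslinux.cfg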
  4. Hey all, I have 3 VMs on one of my unraid servers, and after an upgrade to 6.12.8 none of them will autostart after a reboot. Not sure what is going on or how to begin diagnosing it. Docker containers autostart as configured; the VMs, though configured to autostart, do not. Manually starting the VMs after boot works perfectly fine. Diagnostics attached. Let me know if you need other details; I don't even know what to provide for this. Thanks for any guidance you can provide! -Landon rackserver-diagnostics-20240321-1948.zip
  5. Thank you so much for building this, works great!
  6. Welcome Justin!! I hope you have great success and an enjoyable time with the Limetech folks!
  7. I think this is essentially another vote for snapshots. Once the snapshot is taken, you can copy that state of the data anywhere you want while everything is online. Although, I suppose this might go a step further to say: let's not only snapshot, but give us a built-in mechanism to copy the snapshot data elsewhere. Regardless, snapshots are key in my mind to enable any of this.
  8. Seems to me that if ZFS is implemented in its entirety, multiple array pools will also essentially be implemented, no? At least they would be if you chose to use that file system. In any case, I would really like to have VM snapshots implemented in the GUI in some way. It is such a powerful feature, and has been available in VMware products forever. Thanks for a great product!
  9. Honestly, I was also grasping at straws. I decided to just start throwing parts at it. Nothing made any difference until I swapped out the power supply. I had a higher wattage one, threw it in, and no more sporadic reboots. So I double checked all of my wattage calculations and purchased another power supply that should safely cover the load and everything has been fine since. If it's locking up, it might be a little different than my situation as mine was instantly rebooting itself. But I suppose it could manifest many different ways. I would recommend throwing a bigger (wattage) one at it for a time to see if it helps. Good luck @jgillman, instability is no fun.
  10. I know it is a super edge case, but I thought I would close the loop on this one. In my case, this issue was power supply related. Not a wattage issue, but a faulty power supply issue. Put in the same wattage and everything is running great again.
  11. Anyone have any thoughts on this at all? Specifically interested if anyone has thoughts around the PCIe ACS Overrides being related to crashes? Am I barking up the wrong tree there? Guess I will move my gaming VMs to be installed directly on the passed-through SSD to eliminate the cache as a player in this from a load perspective. Maybe I will buy another power supply also to see if that makes any difference, since it was new with the change... What would you try if it was your system?

      I just tried running mprime on the host with test number 2, "2 = Small FFTs (tests L1/L2/L3 caches, maximum power/heat/CPU stress)". This ran just fine for a long period of time, so the CPU/CPU temp is not an issue as far as I can tell. I also just ran it with test 3, "3 = Large FFTs (stresses memory controller and RAM)". This ran just fine as well, so I don't believe I have a memory problem.

      Well, I just ran mprep.info/gpu from each of the gaming VMs and it has been running the CPU&GPU test for over 30 minutes now without issue. Playing two games at the same time would have for sure crashed it within that time. This is steering me more towards some kind of a storage issue now. Super strange...

      Just finished running another test against the CPU, video cards, and passed-through Samsung drives on both VMs simultaneously; it ran fine. Just the cache left to test now. If that is stable also, then I guess I suspect the USB controllers? That is the only thing that would be heavily utilized while playing a game that hasn't been load tested so far... Still strange...

      Well, I've tested the array drives (C drive in the VMs) along with the CPU and video cards and still... all runs fine. I cannot crash this thing without having two people use the machines manually with some at least somewhat significant load, like playing a game... This is the weirdest issue ever... Thanks,
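      In case anyone wants to reproduce the host-side stress test, this is roughly how I ran it. mprime's torture test is menu-driven, so the Small/Large FFT choice is made interactively; the watch/sensors part is just my own way of keeping an eye on temperatures and assumes lm-sensors is available:

         # In one shell: launch mprime in menu mode and choose the torture
         # test type (2 = Small FFTs, 3 = Large FFTs) when prompted.
         ./mprime -m

         # In another shell: watch CPU temperatures while the test runs.
         watch -n 5 sensors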
  12. You might also check that power management within the VM didn't get enabled somehow. If Windows is going into standby, I believe that will cause the VM to pause as well, though I haven't tested that recently. Worth a quick look anyway.
  13. Hello all, I have a system that has been running stable for nearly 2 years. I changed things up a bit by building a new server and moving most of the storage, containers, and a few server-based VMs to that system. This system is now running only 1 container and 3 VMs. 2 of those VMs are gaming VMs with a dedicated GPU and USB controller for each. This system had heavy load while it was stable as well, but after adding the second dedicated GPU and USB controller it has been exhibiting issues, namely, the host crashes/reboots without warning, generally only when under load (gaming or rendering something). It seems to be GPU load that triggers it, but CPU load comes pretty naturally with GPU load in our activities, so I am not sure. The CPU did not change, though, and in its previous server duties it would be very busy from time to time and was rock solid through it all. I also upgraded the PSU to an 850 watt PSU when adding the second GPU and USB controller. In order to pass the second GPU to the second VM I did have to enable PCIe ACS Override. Currently it is set to 'downstream'.

      I am trying to figure out why this machine might be crashing but am coming up empty. There don't seem to be any meaningful logs leading up to the crash. I am syslogging to the other unraid server, but those logs show nothing of significance leading up to the reboot time. A snippet of those logs before and after the crash is attached; that snippet actually shows two crashes relatively close together. I have also attached a diagnostics.

      I have three suspicions:
      1. PCIe ACS Override is causing an issue, but the more research I do, the more it seems that is pretty much required when passing through more than one PCIe GPU. Can anyone confirm or deny if that is true?
      2. The second video card I added is a 1050 Ti that is bus powered, and I wonder if the motherboard is unable to power everything with the heavy video card draw? Not sure how likely that might be. Thoughts on that?
      3. If suspicion one is correct, that ACS is causing an issue, and I can get a motherboard that doesn't need it to pass through two GPUs, then perhaps a better motherboard will solve it?

      I wish I had another video card that wasn't bus powered to try in it, but I don't, and the current market conditions don't make it very conducive to buy one to throw at it for troubleshooting. Anyway, if you have any thoughts on how to troubleshoot this, that would be appreciated. Here are the main components of the build for convenience, with an IOMMU-check sketch after the list:

      I9-9900K
      ASRock Z390 Taichi
      64GB of DDR4 memory that is on the QVL, I can't find my order to see what it was right now
      EVGA SuperNOVA 850 G5 <--New after change
      2x Mushkin 1TB drives mirrored for the array, MKNSSDRE1TB (VM OS's live here) <--New after change, but not new drives
      2x WD 4TB drives for the array, WD40EFRX-68W <--New after change, but not new drives
      2x Samsung 2TB 980 Pros, unassigned, set to passthrough, one each passed through to the gaming VMs (Games and large software for the gaming VMs live here)
      1x Bluray drive, BW-16D1HT
      RTX2060 SUPER, passed through to one gaming VM
      GTX1050 Ti, passed through to the other gaming VM <--New after change, but not new card
      2x FL1100 controllers, one each passed through to the gaming VMs <--1x new after change

      There are of course some other USB accessories and things, but the above is the majority of the major components. Thanks for any thoughts you might have on how to narrow this down. cube-diagnostics-20220207-0931.zip log-snippet.txt
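      On the ACS question, this is the kind of check I mean; a quick loop over the standard sysfs paths (nothing unraid-specific) shows whether the two GPUs end up in separate IOMMU groups without the override, which is what determines whether the override is actually required on a given board:

         # Print every IOMMU group and the PCI devices it contains.
         # If both GPUs share a group with other devices, the ACS override
         # (or a different motherboard/slot layout) is what splits them apart.
         for g in /sys/kernel/iommu_groups/*; do
             echo "IOMMU group ${g##*/}:"
             for d in "$g"/devices/*; do
                 echo "    $(lspci -nns "${d##*/}")"
             done
         done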
  14. Sounds like you might have configuration issues that are eluding you. I recently moved to NextCloud, but prior to that, I ran this ownCloud container on unraid for years, public facing, without any issues that weren't self-created, like not keeping it up to date. Never once did I ever have it default back to a setup screen or have any other major failure to function unless I was at fault in some way. Lots of people are using this container with success, just sayin'...
  15. This was most definitely not the case in prior releases. I have always set my cache drives to a higher temp threshold to avoid the false notifications. I was unable to do that in 6.9, even on 6.9.2. However, it did seem to be a browser issue for me as well. I just tried it in Edge, having previously used Chrome, and it appears to have updated and stuck in the GUI. I will report back if I still get a temp alarm email notification below what I set the next time my drives are working hard.
  16. I am having an issue on my brother's server that I just can't figure out. Trying to determine if it is hardware related or not. At this point, it seems it is not, but I am running out of ideas. Diagnostic file attached.

      What is most noticeable is how sluggish his Windows VM feels. Containers are very slow to stop, update, and start. I simulated some disk activity by copying files to and from the cache in this manner:

        /mnt/cache/tmp - dd if=/dev/zero of=loadfile bs=1M count=4096
        4096+0 records in
        4096+0 records out
        4294967296 bytes (4.3 GB, 4.0 GiB) copied, 0.994332 s, 4.3 GB/s
        /mnt/cache/tmp - for i in {1..10}; do cp loadfile loadfile1; done

      This yields impressive speeds and very little IO wait according to top, approximately 8-12 while running, which seems very similar to my system, which is actually running significantly slower SSDs. iotop on my two unraid servers rarely yields an IO percentage on an individual process above single digits. Even when it does, it will only be present for one refresh and then gone and settled back down. Under normal operating circumstances, mine is typically showing sub-1-percent figures on all processes.

      Now on my brother's, as soon as I start containers that start messing with disk access, like say duplicati, or even leave all of that shut down and just start a single Windows VM, iotop goes crazy with huge percentage numbers on each line, typically between 25-99% for all of the top-talking processes. See this screenshot as an example; this is after the system had been booted for some time and basically everything was idle. As you can see, the actual amount of data being transferred is quite low from a throughput/MBps perspective, but the IO percentage is high. And this never changes, it looks similar to this all the time. Inside Windows, it shows 100% disk activity time, with extremely low throughput. This is what it looks like regardless of the Windows VM we try. His long-term desktop he has used for years looks just the same as the screenshot above, which was taken from a fresh install of Windows 10 with nothing installed, no Windows updates happening, nothing, just an idle Windows 10 VM.

      I have tried various disk cache settings for the Windows VMs, mostly testing with 'none' and the default 'writeback'. That doesn't seem to make any noticeable difference. There doesn't seem to be any obvious resource contention going on. Plenty of CPU and memory is available, and as I alluded to above, the disks are capable of FAR more than they are doing, generally speaking. It doesn't feel like a hardware problem because of the sequential speeds I can get, but perhaps these drives are just no good for random I/O... though they claim to be. By the way, the drives in use here are a bit out of the ordinary, they are Eluktronix, PRO-X-1TB-G2. They are formatted with BTRFS and are in a mirror. Anyone have any thoughts on things we could try before throwing different hardware at it? I am running out of ideas. unraid-diagnostics-20210317-2009.zip
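      Since the dd test above is purely sequential, this is the kind of random I/O check I am thinking of trying next; fio isn't part of stock unraid, so the tool and the exact parameters here are just my assumption of a reasonable 4k mixed workload:

         # Hypothetical 60-second 4k random read/write test against the cache pool.
         # --direct=1 bypasses the page cache so the SSDs themselves are measured.
         fio --name=randrw --directory=/mnt/cache/tmp --ioengine=libaio --direct=1 \
             --rw=randrw --rwmixread=70 --bs=4k --size=1G --numjobs=4 --iodepth=32 \
             --runtime=60 --time_based --group_reporting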
  17. Thank you for pointing me to that. I guess in my panic I didn't search very well.
  18. I hit this issue for the first time yesterday. anonymized-diagnostics-20210214-1549.zip
  19. Well, I couldn't see anything obvious in the logs as to a cause, so I went ahead and rebooted. Everything came back up fine, but it does make me uneasy. If anyone has any wise ideas on what might have happened, I would be excited to hear it. Nothing in the environment has changed for months that I can think of. The only things that ran overnight last night, when this seemed to occur, were some SSH backup sessions where the server is the target, and also an SSD TRIM. I don't see much else... Thanks!
  20. I noticed today that two of my containers were down. Tried to fire them back up and received a very generic "execution error" "server error" message. Took a look at Settings > Docker and found a message stating "one or more paths do not exist" (only shown in basic view). So I went to the CLI and found this, which can't be good:

        drwxrwxrwx  1 nobody users 272 Feb 14 15:28 cache/
        drwxrwxrwx 15 nobody users 293 Jan 31 04:30 disk1/
        drwxrwxrwx  9 nobody users 149 Nov 17 16:37 disk2/
        drwxrwxrwx  9 nobody users 158 Feb 14 15:28 disk3/
        drwxrwxrwt  3 nobody users  60 Nov 15 10:33 disks/
        drwxrwxrwt  2 nobody users  40 Dec  9 09:00 remotes/
        d?????????  ? ?      ?       ?            ? user/
        drwxrwxrwx  1 nobody users 293 Feb 14 15:28 user0/

      Looks like the user path has gone away somehow. I am just looking through the diagnostics now, but I don't do that often. If someone is faster and has time, I would appreciate any advice in knowing what happened and what I should do to correct it. I am tempted to just reboot, but will be patient since I am worried that might put me in a hard-down situation. Some things like VMs are still functioning fine; I suppose they reference the cache path rather than user. Diags attached. Thanks! -Landon anonymized-diagnostics-20210214-1549.zip
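      For what it's worth, this is how I was poking at it from the CLI while deciding whether to reboot; just standard commands to see whether the user-share FUSE (shfs) mount is still present, nothing unraid-specific beyond the /mnt/user path:

         # Is the shfs/fuse mount for the user shares still listed?
         mount | grep -i 'shfs\|/mnt/user'

         # Does the mount point itself respond, or does it error out?
         ls -ld /mnt/user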
  21. Seems that I may have misunderstood that the apps-external directory must be created by the user during upgrade. For some reason I thought it unpacked from the zip, but apparently it does not. Then I noticed there was an 'external' folder inside the 'apps' directory, which led me down the rabbit hole. Sorry for the noise, I was just trying to be helpful, but wasn't. Thanks!

      For reference, in case others run into similar confusion, below is what caused mine...

        unzip ./owncloud-complete-20201216.zip
        ls -alh ./owncloud/
        total 416K
        drwxr-xr-x 12 root root  580 Dec 16 16:02 ./
        drwxrwxrwx  3 root root   60 Dec 28 09:44 ../
        -rw-r--r--  1 root root 3.3K Dec 16 16:01 .htaccess
        -rw-r--r--  1 root root  163 Dec 16 16:01 .user.ini
        -rw-r--r--  1 root root 8.7K Dec 16 16:01 AUTHORS
        -rw-r--r--  1 root root 276K Dec 16 16:01 CHANGELOG.md
        -rw-r--r--  1 root root  34K Dec 16 16:01 COPYING
        -rw-r--r--  1 root root 2.2K Dec 16 16:01 README.md
        drwxrwxrwx 51 root root 1020 Dec 16 16:02 apps/
        drwxrwxrwx  2 root root   80 Dec 16 16:01 config/
        -rw-r--r--  1 root root 4.6K Dec 16 16:01 console.php
        --snipped--

      As you can see, there is no apps-external.

        cd apps
        ls -alh
        total 0
        drwxrwxrwx 51 root root 1020 Dec 16 16:02 ./
        drwxr-xr-x 12 root root  580 Dec 16 16:02 ../
        drwxr-xr-x  9 root root  240 Oct 15 23:21 activity/
        drwxr-xr-x  5 root root  180 Jul  8 17:23 admin_audit/
        drwxr-xr-x  9 root root  180 Nov 26  2018 announcementcenter/
        drwxr-xr-x  7 root root  140 Dec 16 16:01 comments/
        drwxr-xr-x  6 root root  180 Apr 16  2019 configreport/
        drwxr-xr-x  9 root root  220 Feb  6  2020 customgroups/
        drwxr-xr-x  6 root root  120 Dec 16 16:01 dav/
        drwxr-xr-x  9 root root  240 Sep  2  2019 encryption/
        drwxr-xr-x  4 root root  120 Jul 23 14:40 enterprise_key/
        drwxr-xr-x 10 root root  300 Apr 16  2019 external/
        drwxr-xr-x  9 root root  180 Dec 16 16:01 federatedfilesharing/
        drwxr-xr-x  9 root root  180 Dec 16 16:01 federation/
        drwxr-xr-x 10 root root  260 Dec 16 16:01 files/
        --snipped--

      Again, no apps-external, but there is an 'external' directory within 'apps'. This is what confused me and led me to believe there had been a change. Move into the backup of the prior version:

        cd /mnt/user/appdata/dlandon-owncloud/www/owncloud-old/
        ls -alh
        total 372K
        drwxr-xr-x 1 root   users  484 Oct 29 18:16 ./
        drwxr-xr-x 1 nobody users  100 Dec 23 09:46 ../
        -rw-r--r-- 1 root   users 3.4K Oct 30 09:53 .htaccess
        -rw-r--r-- 1 root   users  163 Aug  3 09:20 .user.ini
        -rw-r--r-- 1 root   users 8.7K Aug  3 09:20 AUTHORS
        -rw-r--r-- 1 root   users 230K Aug  3 09:20 CHANGELOG.md
        -rw-r--r-- 1 root   users  34K Aug  3 09:20 COPYING
        -rw-r--r-- 1 root   users 2.2K Aug  3 09:20 README.md
        drwxrwxrwx 1 nobody users 1.2K Nov  3 10:41 apps/
        drwxr-xr-x 1 nobody users    0 Oct 29 18:16 apps-external/
        drwxrwxrwx 1 nobody users   98 Dec 23 09:41 config/
        --snipped--

      And it was there. What I should have realized, but didn't, is that in the owncloud-old backup folder, ./apps/external also existed and was nothing new...
  22. Well, it sure was for me. Thanks anyway, Landon
  23. @dlandon Just wanted to let you know that in 10.6 it looks like they have changed the directory structure within 'apps'. The following line in the upgrade post at the beginning of this thread will need a minor change as follows:

      Current - Copy external apps: 'cp -R /mnt/user/appdata/ownCloud/www/owncloud-old/apps-external/ /mnt/user/appdata/ownCloud/www/owncloud/apps-external/'.

      Migration from <10.6 to 10.6 - Copy external apps: 'cp -R /mnt/user/appdata/ownCloud/www/owncloud-old/apps-external/ /mnt/user/appdata/ownCloud/www/owncloud/apps/external/'.

      From 10.6 going forward - Copy external apps: 'cp -R /mnt/user/appdata/ownCloud/www/owncloud-old/apps/external/ /mnt/user/appdata/ownCloud/www/owncloud/apps/external/'.
  24. Looks like someone on Amazon commented specifically that he is using this one with unraid, in case anyone else comes across this later. StarTech.com USB 3.1 PCIE Card - 5 Port - 1x USB-C - 2x USB-A - 1x 2 Port IDC - Internal USB Header Expansion - USB C PCIe Card (PEXUSB312EIC)