Posts posted by dAigo

  1. From what I read, it seems we share a lot of issues.

    I think it's similar enough to put the differences down to human perception or different priorities in testing :) 

     

    I was posting about the performance/load issues in beta18/19, which got fixed in beta 20: (big THANKS to Eric ;) )

    While there were no real performance issues in the VM, thermal throttling or errors due to an overheating CPU were possible:

    A note from the developers

    More bug fixes.  In particular, squashed a bug which resulted in Windows 10 VM's running multi-media applications causing host CPU's to peg at near 100%.  This one was a doozy and we had a -beta20 all ready to go which fixed this issue by reverting back to the linux 4.1.x kernel.  (We figured out the issue got introduced by some change in the kernel 4.3 merge window, but kernel 4.2.x is deprecated.)  Not happy with this compromise and not wanting to wait for kvm developers to acknowledge and fix this issue, our own Eric Schultz took the plunge and started "bisecting" the 4.3-rc1 release to find out what patch was the culprit.  It took something like 16 kernel builds to isolate the problem, and the fix turns out to be a truly 1-line change in a configuration file (/etc/modprobe.d/kvm.conf)!  A big Thank You to Eric for his hard work on this!

     

    - Add halt_poll_ns=0 to kvm.conf - eliminates high cpu overhead in windows 10 [kudos to Eric S. for this!]
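
    For reference, the change really is that small; applied by hand on an affected system it would look roughly like this (beta20 and later already ship it, so this is just for illustration):

    # append the module option mentioned in the release note
    echo "options kvm halt_poll_ns=0" >> /etc/modprobe.d/kvm.conf
    # reload the kvm module (or reboot) so the option takes effect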

     

    So, since beta20 I have been investigating the other issue, which, at least for me, seems to be a complete lockup of unRAID as soon as a VM with a vDisk on the array gets some sort of I/O on that disk.

     

     

    Even after reading all your posts, I am not sure whether your VMs are running on the array, on a cache drive, or outside of the array.

    What are your results if your VM has NO vDisk on any disk that is part of the parity-protected array?

    At least for me, moving all vDisks to the cache removes all issues and makes 6.2 as stable and fast as 6.1.9.

     

    I posted and summarized my experience and observations in the beta21 release thread:

     

    Still can't use VMs that have at least one vDisk on a physical disk on the array.

    Worked up to 6.1.9, broken since the first public beta.

     

    Happens even in Safe Mode (no Plug-ins) with docker disabled and a clean "go" file.

     

    Problem:

    - VMs with at least one vDisk on a physical disk on the array:

        - If the vDisk is the system disk, the VM boots but never gets to the desktop.

        - If the vDisk is a second disk, it boots/works fine until I/O is put on the vDisk, then it becomes unresponsive.

    - Once the VM becomes unresponsive, it can no longer be shut down or even force-stopped ("resource busy").

        - After trying to force-stop the VM, the unRAID WebGUI becomes unresponsive after accessing some pages (VM tab, share details).

    - After starting the VM, I have trouble accessing shares:

        - Explorer hangs with "no response" after opening an SMB share.

        - Even MC over SSH locks up the whole SSH session after trying to access /mnt/user/"share name".

    - More details in my earlier posts regarding the issue (it seems I am not the only one with it):

        - http://lime-technology.com/forum/index.php?topic=47744.msg457766#msg457766

        - http://lime-technology.com/forum/index.php?topic=47875.msg459773#msg459773

     

    How to reproduce:

    - Start a Windows VM with at least one vDisk on a physical disk on the array

    - put some I/O on that vDisk (booting/copying files)

     

    Diagnostics were taken after I tried to force-stop the VM.

     

    I physically removed the NVMe cache (and moved libvirt.img to the array), same issue.

    Eric contacted me via PM last weekend to see if he could get any info from me regarding the issue. I took a very simple VM (no passthrough and a very basic installation) that can reproduce the crash, and he asked me to change the path to the vDisk from /mnt/user to /mnt/disk5.

    What I got was a very strange result that I don't fully understand.

    So I gave it a shot, although I had tried that back in beta 19/20.

    Changed to disk5 and booted into safe mode with docker disabled.
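
    For anyone who wants to repeat the test, the change is just the disk source path in the VM definition, roughly like this (a sketch; the VM and share names are examples, not my actual setup):

    # check the current vDisk path
    virsh dumpxml TestVM | grep "source file"
    #   <source file='/mnt/user/domains/TestVM/vdisk1.img'/>
    # edit the definition and point it at the specific disk instead of the user share
    virsh edit TestVM
    #   <source file='/mnt/disk5/domains/TestVM/vdisk1.img'/>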

     

    TL;DR: Better than before, not really fixed. (maybe more than one Problem)

    Depending on the amount of time you want to spend on this case, you could read on for a more detailed report of my day

     

    Attempt 1:

    The good news is, on the first boot I actually got to the login screen and could log in (I did not really try anything else).

    First thought was, "wait a minute, that can't be it, I already tried that".

    So I shut down the VM, and rebooted the Server.

     

    Attempt 2:

    This time I also got to the login screen, but after logging in, all I got was "Welcome" with a spinning circle.

    "Ok" I thought, "it's progress, but not fixed."

     

    As I mentioned, in that state I could not access some/many shares. So I tested how that behaves after changing the path.

    It seemed that I can't access any folder on disk5 except the one with the vDisk image.

    I could access a folder/share on disk1 that excludes disk5, but another folder/share on disk1 that also has a folder on disk5 did not work.

     

    But since everything hangs after accessing a "wrong" share, I needed to restart the server, so I did.

     

    Attempt 3:

    The third boot looked like the first; I was able to log in and am still confused as to why.

    I have an Android tablet and a laptop for that maintenance stuff; the latter has some shares mounted through SMB.

    Maybe it has something to do with accessing a share before/after the VM gets started... (some race between samba/libvirt?)

     

    Attempt 4:

    Back to the spinning "welcome" circle after logging in...

     

    So, while it's definitely better with "/mnt/disk5" instead of "/mnt/user", 50/50 after 4 attempts seems odd.

     

     

    In the end I had probably between 15 and 20 reboots...

    On a scale from 1-10, the results ranged from:

     

    1) spinning "Welcome" circle

    2) successful login and "no response" after opening the "properties" of drive C:\

    4) successfully opening the "properties" of drive C:\ and hanging after 2-3 seconds of scandisk

    5) successfully running scandisk and even defrag

    6) being able to run a disk benchmark (although with very bad speed: 10 MB/s seq., 0.3 MB/s 4k random writes)

    7) hanging after "Disk Cleanup"

    8) "some time of normal usage" before it stops responding

    9) slow but steady VM

    10) normal

     

     

    The strange thing is, it "gradually" scaled from the "1-2" range over the first 4 tries to "4" after that, while the last 5 attempts always got me to "6"...

    Because a lot of time had gone by today, I thought, "Ok, a slow VM is a good start, let's report back."

    Rebooted into normal mode (plugins&Docker), and even the VM on the array was "working".

     

    While typing this answer (which is probably way too detailed...), the VM became unresponsive while running the native Disk Cleanup tool.

     

    Do you know a way to increase the log/debug output of libvirt? The libvirt log in its current state does not seem very helpful.
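
    (To be clear about what I am after: something along these lines, assuming the usual libvirtd.conf options also apply on unRAID; the restart step may differ:)

    # raise the libvirt daemon log level and write everything to a file
    echo 'log_level = 1' >> /etc/libvirt/libvirtd.conf
    echo 'log_outputs = "1:file:/var/log/libvirt/libvirtd.log"' >> /etc/libvirt/libvirtd.conf
    # then restart libvirt, e.g. by disabling and re-enabling VMs in Settings > VM Manager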

     

     

    To me it seems like libvirt/samba are somehow trying to get exclusive access to some parts of the array, which eventually leads to a lockup.

    SMB 3.0, and therefore the current Samba version, has a lot of new "file locks" and ACLs to prevent unwanted access. For Microsoft, SMB3 is a way to turn it into a shared-storage protocol for Hyper-V. Maybe there is an issue with "local" access while Samba needs more exclusive access to create a share.

    I have just taken the liberty of creating a pull request that should allow me to use 4096-bit keys. I have manually replaced my dhparam file but would rather not do manual tinkering with letsencrypt.

     

    Would you be so kind as to add this option to increase the 2048 value?

     

    Thanks

     

    (find attached the reason for my messing with your setup... the 2048-bit key is the only thing keeping me from a 100/100/100/100 score, unnecessary though that might be)

     

    I went into the container and modified the key generation to 4096-bit (changed /defaults/letsencrypt.sh).

    Still "only" 100/100/90/100, I think the issue ist, that the keychain includes the 2048bit keys from the LE CA itself, but I could be wrong.

     

    *EDIT: never mind, I was missing the "ssl_ecdh_curve" setting for a 384-bit+ curve. Added that and got 100/100/100/100 with 4096-bit keys.
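
    For anyone wanting to replicate this, the tweaks boil down to roughly the following (a sketch; the file locations inside the container are my assumptions from poking around, not official options):

    # 1) regenerate the DH parameters with 4096 bits instead of the default 2048
    openssl dhparam -out /config/nginx/dhparams.pem 4096
    # 2) add "--rsa-key-size 4096" to the letsencrypt-auto call in /defaults/letsencrypt.sh
    # 3) in the nginx ssl config, allow a stronger curve for the ECDHE exchange:
    #    ssl_ecdh_curve secp384r1;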

  3. Still can't use VMs that have at least one vDisk on a physical disk on the array.

    Worked up to 6.1.9, broken since the first public beta.

     

    Happens even in Safe Mode (no Plug-ins) with docker disabled and a clean "go" file.

     

    Problem:

    - VMs with at least one vDisk on a physical disk on the array:

        - If the vDisk is the system disk, the VM boots but never gets to the desktop.

        - If the vDisk is a second disk, it boots/works fine until I/O is put on the vDisk, then it becomes unresponsive.

    - Once the VM becomes unresponsive, it can no longer be shut down or even force-stopped ("resource busy").

        - After trying to force-stop the VM, the unRAID WebGUI becomes unresponsive after accessing some pages (VM tab, share details).

    - After starting the VM, I have trouble accessing shares:

        - Explorer hangs with "no response" after opening an SMB share.

        - Even MC over SSH locks up the whole SSH session after trying to access /mnt/user/"share name".

    - More details in my earlier posts regarding the issue (it seems I am not the only one with it):

        - http://lime-technology.com/forum/index.php?topic=47744.msg457766#msg457766

        - http://lime-technology.com/forum/index.php?topic=47875.msg459773#msg459773

     

    How to reproduce:

    - Start a Windows VM with at least one vDisk on a physical disk on the array

    - put some I/O on that vDisk (booting/copying files)

     

    Diagnostics were taken after I tried to force-stop the VM.

     

    I physically removed the NVMe cache (and moved libvirt.img to the array), same issue.

    unraid-diagnostics-20160410-1036.zip

  4. I need to see the full log

    As I mentioned earlier, I went into the container to see how it looks.

    So as you will notice, a changed the logging to append, even before your update, so the logs go back til march 25.

     

    Because I don't know enough about Docker, I can't rule out that my editing has something to do with it.

    I am under the assumption that any time you publish an update, the whole container gets "synced", so any changes should be cleared.

     

    If not, I guess removing the image from unRAID would be the best idea to make sure I have the correct version?

    Please don't waste your time on errors that may have something to do with my editing, that's my own fault :)

     

     

    The log only contains the cron output from the nightly reruns, doesn't it?

    The error I mentioned is not visible, because it appeared at container start.

    However, it reappeared tonight.

     

    *Note: I did not remove the URL/domains; they just don't show up in the cron log, only at container start.

    cronjob running at Fri Apr 8 02:00:01 CEST 2016
    domains.conf: line 1: -d: command not found
    URL is
    Subdomains are
    updating letsencrypt from the git repo
    error: Your local changes to the following files would be overwritten by merge:
    letsencrypt-auto
    Please, commit your changes or stash them before you can merge.
    Aborting
    Updating 764770f..6a7b4a8

     

    The last update you pushed seems related. I'll remove the container/image and start fresh with the existing /config folder.

     

     

    *UPDATE: Just did it and this time I get: "/defaults/letsencrypt.sh: line 11: ./letsencrypt-auto: No such file or directory"

    *** Running /etc/my_init.d/00_regen_ssh_host_keys.sh...
    *** Running /etc/my_init.d/firstrun.sh...
    Setting the correct time
    
    Current default time zone: 'Europe/Berlin'
    Local time is now: Fri Apr 8 19:07:24 CEST 2016.
    Universal Time is now: Fri Apr 8 17:07:24 UTC 2016.
    
    Using existing nginx.conf
    Using existing nginx-fpm.conf
    Using existing site config
    Using existing landing page
    Using existing jail.local
    Using existing fail2ban filters
    Using existing letsencrypt installation
    rm: cannot remove ‘/etc/letsencrypt’: No such file or directory
    SUBDOMAINS entered, processing
    Sub-domains processed are: -d XXXX.XXXX.XX -d XXXX.XXXX.XX -d XXXX.XXXX.XX -d XXXX.XXXX.XX -d XXXX.XXXX.XX
    Using existing DH parameters
    <------------------------------------------------->
    
    <------------------------------------------------->
    cronjob running at Fri Apr 8 19:07:24 CEST 2016
    domains.conf: line 1: -d: command not found
    URL is XXXX.XX
    Subdomains are XXXX,XXXX,XXXX,XXXX,XXXX
    letting the script update itself; help info may be displayed, you can ignore that :-)
    /defaults/letsencrypt.sh: line 11: ./letsencrypt-auto: No such file or directory
    deciding whether to renew the cert(s)
    Existing certificate is still valid for another 75 day(s); skipping renewal.
    * Starting authentication failure monitor fail2ban
    ...done.
    *** Running /etc/rc.local...
    *** Booting runit daemon...
    *** Runit started as PID 121
    Apr 8 19:07:24 88d529fcdb60 syslog-ng[128]: syslog-ng starting up; version='3.5.3'
    

    letsencrypt.zip

  5. Thanks

    Does the Samba release 4.4.0 fix the error in windows 10 not mounting iso files, and we no longer need to add max protocol = SMB2_02

    Yes, mounting iso files in Windows 10 should work now without overriding the max protocol value.

    I can confirm that, at least for me it does.

    Had to use SMB2_02 in beta20, removed it and upgraded to beta 21, can mount VHDs and ISOs.
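
    (For reference, the beta20 workaround was this single line in the Samba extra configuration; on unRAID that is the "Samba extra configuration" box under Settings > SMB, which as far as I know ends up in /boot/config/smb-extra.conf:)

    max protocol = SMB2_02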

     

    Let's hope the Samba guys didn't break anything important in that version :)

    Nice to see a regularly updated container :)

    I applied your LetsEncrypt Update for the "nightly pulls"

     

    I saw the following in the logs, is that "normal" or did something go wrong?

     

    error: Your local changes to the following files would be overwritten by merge:
    
    letsencrypt-auto
    Please, commit your changes or stash them before you can merge.
    Aborting
    

    Creating virtual environment...
    ./letsencrypt-auto: 460: ./letsencrypt-auto: virtualenv: not found
    

     

    I don't think I saw that in other versions.

    I got the first one in my log last night as well. Funny enough, that's because letsencrypt updated the auto script itself the night before (v0.4.2 to v0.5.0). Not sure if that was a mistake on their part or not. I'll investigate.

     

    For the second one, I'm assuming that your certs were renewed last night, is that correct? If so that error always happens during renewal and it never caused any issues

     

    No, the next line after the second error said 76 days remaining, skipping renewal.

    And I checked, still the "old" one.

     

    But yeah, apart from these messages, I have no issues.

    Nice to see a regularly updated container :)

    I applied your LetsEncrypt Update for the "nightly pulls"

     

    I saw the following in the logs, is that "normal" or did something go wrong?

     

    error: Your local changes to the following files would be overwritten by merge:
    
    letsencrypt-auto
    Please, commit your changes or stash them before you can merge.
    Aborting
    

    Creating virtual environment...
    ./letsencrypt-auto: 460: ./letsencrypt-auto: virtualenv: not found
    

     

    I don't think I saw that in other versions.

  8. The beta looks pretty good for most of my containers (emby is not playing video for me / tvheadend does play video)  my vm's (windows 10, lubuntu) don't start clean and I can't stop them either.  Most of the new features seem to function well.

    If I read your config correctly, at least your Windows VM seems to be on a non-cached share? So it's on the array?

    If so it does sound like this: http://lime-technology.com/forum/index.php?topic=47875.msg459773#msg459773

     

    It could be the reason your containers stop working, because that issue pretty much locks up the array.

     

    But I could be wrong; I am not that familiar with these diagnostics files yet.

    If your answers keep getting as long as my comments, I feel bad for those with limited bandwidth  :'(

     

    If 443 was not forwarded to the container, it should not have been able to validate. Plus, the script does not run the letsencrypt command unless the existing certs are at least 60 days old.

     

    Also, just so you know, there are two logs, the docker log will tell you what happened when the letsencrypt.sh ran at container start. And then there is the letsencrypt.log file in the config folder under log/nginx/ and that one stores the latest cron output. Make sure you check both.

    Yep, I know, but the logs were lost, which is why I won't bother finding the reason.

    But I am pretty sure 443 was not pointing to the container; I had no proxy set up yet and was using it for another webserver.

    And the "old" keys were correctly "archived", so it had to be a renewal that was started during the upgrade process of unRAID.

     

    Well "pretty sure" is what it is. I'll try and test the setup that I think was active at that time. If it does not work, I am just getting old ;)

     

    I'll give it some thought. Although the 4096 dh might take forever to generate.

    Definitely depends on the system; ~10 minutes in my case, and not something that's done every day :) But as I said, "nice to have", not "must have" :)

     

    And I certainly don't want to let the user decide on the frequency, because letsencrypt has a bunch of restrictions on that and they'll block you from further cert creations per domain or per user (this made the testing of this container fairly difficult early on as I kept hitting the limits and now I am getting a ton of e-mails daily about expiring certs that are not being used, and are all duplicates, but were created in the process).

    Can't wait for those mails; I created a few dupes of my own while fiddling around in your container ;) (does that sound weird? ::))

     

    I considered that but then realized that it would never be a turn key solution. The user will always have to figure out how to set it up on their systems. I'd rather have them go and research it so they know what they are doing before attempting it, rather than me providing a partial solution and end up with a ton of support requests because they don't know what they are doing and it is just not working.

    Well I guess it worked as intended in my case  8)

     

    I did post copies of my conf files in the letsencrypt thread, though: https://lime-technology.com/forum/index.php?topic=43696.msg437353#msg437353

    Yeah saw that, loved it and learned a lot :)

    I (or my server) prefer subdomains, so I had to go a different route. Working great.

    Tweaked those security settings a bit to get A+ on ssllabs.com

     

    I don't advertise the 60 day thing because honestly, the user does not even need to know that. The container will take care of it all. As long as it's running, the certs will be kept up-to-date. If it was down for a while, it will renew upon container start.

    Like I said, in my case it's not for "production" but for testing Let's Encrypt itself (basically what you did during coding, I'll just skip the hard work :))

    Example:

    - I renewed the cert 5 days ago. Got it from "Authority X1", like the ones before, and everything was fine.

    - 3 days ago I renewed it again, but got it from "Authority X3". I had one program that claimed it could not verify revocation, while others could.

    Now I need to clarify whether that's a bug in that program or just a reaction to renewing too fast.

     

    So basically, this isn't really a letsencrypt container. It's actually an nginx container with letsencrypt and fail2ban built-in.

    I understood that after looking into it, and that's the reason for my suggestions.

    That combination has so much potential. It feels almost bad to reduce it to letsencrypt with a little side note *proxy...

    Should be the other way around, an awesome webserver and proxy that comes with encryption and security-features :)

    It adds so much flexibility while still being secure. Especially at work, in a Windows environment... try that with IIS...

     

    Anyway, thanks again and keep it up :)

  10. There is 1 more thing i thought of which may cause trouble. My NVMe drive was formatted as XFS before i put it into unraid and it was never reformatted so it is using my old format. It picks up as XFS correctly, obviously this is mounted as my cache drive and has my vms and that on it. As its NVMe im wondering if this could cause system lockups if the drive were to crash since it is a direct pci-e device

     

    Just one to throw out there

    Same thing here: I had the NVMe disk outside of the array with XFS and just added it as the cache in 6.2. No reformatting was done; I assumed everything was fine.

     

    I do not know if it's related to your issue, but I also have some lockups, though only while running VMs with vDisks on the array and only with 6.2.

    There is a third person with NVMe and lockups, see HERE for details.

     

    I was hoping it had something to do with the "Host CPU-Overhead" in beta18 & 19, but that one is fixed.

    My issue still remains.

     

    My "Server 2012R2" VM has a vDisk for backups. With beta20, I moved it back to the cache to see if its better.

    Yesterday at 3:00 am the whole array and the web page became unresponsive; that's when the backup job started writing to the vDisk on the array.

    Today probably the same, but I was sleeping; nothing was working when I woke up.

    In both cases I had to powercycle the whole system.

     

    So, I just came home and started a VM that I only use for home office; it's on an SSD on the array. Within a minute or two, the system hung.

    SSH still worked, so I could create a diagnostics file (attached!).

    At that time, 4 VMs were running. I was using some resources through VPN during the day; everything was fine.

    - Linux (Asterisk) VM, running only on the array since the restart this morning

    - Server 2012R2 VM, running with the system disk on the cache and backup on the array (no backup jobs during the day)

    - Gaming (Win10) VM , running only on the Cache (started around 16:41 UTC)

    - Work (Win10) VM, running only on the array (started around 16:42 UTC)

     

    I noticed the lockup between 16:50 and 16:55. Because I knew I had to power-cycle the system, I tried to shut down as many VMs as possible.

    - The work VM was never reachable through the network; the console I don't know, and the web page did not work.

    - I could still connect through RDP to the Server 2012R2 and shut it down.

    - I shutdown the Gaming VM

    - I "force" Shutdown the Linux VM

     

    I connected to the server by SSH to collect the diagnostics and saw that all the VMs that have a vDisk on the array were still running, but in the state "shutdown in progress" (virsh list), so only the VM that has no vDisk on the array was shut down normally.
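
    (Roughly what that boiled down to over SSH, for anyone who wants to check the same thing; both commands should exist on a stock unRAID 6.x install:)

    virsh list --all   # the array-backed VMs were stuck "in shutdown" here
    diagnostics        # collects the attached zip, it should end up under /boot/logs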

     

    TL;DR:

    - Since 6.2 beta 18, my whole server gets unusable as soon as any VM has some sort of write activity on a vDisk that is placed on the array.

    - I had no Issues with that in 6.1

    - Apart from all the changes in 6.2, I put my old cache (Sata SSD) in the array, and I am using my NVMe disk as a cache device.

    - unRAID itself seems fine, but anything that uses the array (VMs, _some_ Dockers, SMB, Midnight Commander), and the array itself, is unresponsive.

     

    I also had the issue with NO cache device (libvirt.img was also on the array), but I never completely removed the NVMe disk.

    I guess to rule out any issues with NVMe, I need to remove the SATA SSD from the array, put it back as the cache and remove the NVMe device completely... A lot of work. I have no diagnostics from 6.1, but if the issue remains after the removal of the NVMe disk, I would roll back and create some diagnostics. Unfortunately, the "previous" folder probably contains beta 19 and not 6.1 :)

     

    Maybe you'll find something in the logs until I find the time to do more testing.

    unraid-diagnostics-20160329-1903.zip

    The issue is not fixed, just not as bad I guess, or that one VM was not enough to trigger the problem, who knows.

     

    Moved everything back to the array, and as soon as a Windows VM with a system disk on the array boots, or a secondary vDisk is used heavily during a backup, the system locks up.

    But my other issue is fixed, so I can focus on this problem...

     

    Reported it in the 6.2 beta20 announcement with some diagnostics.

    Your options are endless; the problem is, source and destination have so many variables (of which you did not share much ;)) that there is no complete guide that catches all the ifs and whens...

     

    Basic idea is:

    1) create a backup (there are thousands of programs, but it should preferably have a bootable version to start the VM with)

    2) create a VM that is as similar as possible to your PC; every change may result in some form of error that must be solved. *

    3) boot the vm and restore the backup.

    4) If your vm does not boot, the Windows installation disk may help.

     

    *) As long as you have no "virtio" disks, Win10 should have no issues at boot due to drivers. (SATA preferred)

    *) Make sure, that the BIOS type matches that of your physical installation (BIOS/MBR = SeaBIOS, UEFI/GPT = OVMF)

     

    Acronis is a good program, but not the cheapest.

    CloneZilla can basically do the same, just not as comfortably (it's free though).

    You could search for "clonezilla p2v". (physical-to-virtual).

     

    And remember, you might need to re-activate windows, due to the massive hardware change.

    Could be tricky if there were upgrades involved.

     

    Depending on your Installation, starting fresh and re-installing could be faster.

     

    Or... you could also skip creating an image completely.

    If you pass through the whole disk, Win10 should have no issue booting, as long as the BIOS type matches.
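
    A rough sketch of the two shortcuts from the unRAID console (device names and paths are examples; the dd copy is just one way to skip the backup/restore programs entirely, if the old disk can be attached to the server):

    # Option A: copy the old physical disk straight into a vDisk image
    dd if=/dev/sdX of=/mnt/user/domains/Win10/vdisk1.img bs=4M
    # Option B: skip the image and hand the VM the raw device; in the VM's XML
    # (virsh edit Win10) the disk source then points at the block device, e.g.
    #   <source dev='/dev/disk/by-id/ata-YOURDISK'/>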

    I have some issues with the beta and a VM.

    In 6.1.9 I had a Windows 2012 R2 VM (running in BIOS mode, since I could not manage to make partitions in UEFI mode); it was running on one disk image, placed on Disk 1.

     

    [...]

     

    When I tried to access the data drive in the VM now, the VM hung.

    Could not shut it down.. and "Force Stop" did not work either.. Just got this error:

    "Execution error ... Failed to terminate process 16228 with SIGKILL: Device or resource busy"

     

    Is this a bug in the 6.2 beta, or is it because I have one VM running on two different disk images, placed on different disks?

    I will try to re-install it, just using one Disk image, placed on Cache drive.

    Sounds exactly like THIS.

    No solution or reason why, but it seems there are issues while running VMs with disks on the array.

     

    More like a "bug" with vdisks placed on the array, that crashes VMs when the start writing something on it.

    If you place your OS vdisk on the array, you probably wont get far after the login, if you even make it that far.

     

    Cache-only VMs do not show these issues. A workaround would be to copy everything to the cache, run it outside of the array, or roll back.

     

    This also seems to be fixed.

    I can run VMs on the array again. At least one; I have to copy everything back to be sure.

  14. Hi Bob,

     

    Thanks for replying.  Sounds like I am attempting something similar.  I have a the Windows Install ISO mounted and I am booting into the recovery console.  Within this console it asks to point to the image, which I created earlier and it is stored on the network.  It does not allow me to "browse" to the file location and instead asks for the file path, but it fails each time I try to point to the image.  For example, my image is located here:  \\tower\cache\WindowsImageBackUp\UNRAID-Win7\Backup 2016-03-10\ .  After entering the image location it asks for the network credentials, it defaults to an odd domain of MININT-AS426KG, so I use a "\" in my username to get out of the domain.  Then I get an error saying the domain can not be reached. 

     

    So bottom line, I don't think windows is seeing the folder on the network.  Anything special I have to enter to get to the network?

    You should be more specific about your system.

    I can guess it's a Win7 image, but is it x86 or x64? What unRAID version? 6.1 or 6.2 (beta)?

     

    Depending on your system, there may be more than one issue.

    1) The Windows 7 recovery/installation disk has no USB3 (xHCI) support; be sure to set the USB of your VM to USB2 (EHCI) (default in 6.1).

    2) Recovery/Installation Disk does not come with virtio network drivers.

    3) x86 Windows has limited support for UEFI (OVMF)... select SeaBIOS when creating the VM.

     

    If 2) is your only issue, you are probably stuck with loading the drivers.

    Make sure to attach not only the Windows ISO but also the virtio driver ISO. You can add the driver in the recovery wizard (see the linked video, at around 7:00).

    If I remember correctly, you have to connect as "anonymous" (empty PW) to your share if you only have root as a user.

     

    The Installation media has a command line somewhere in the recovery options.

    You could type "ipconfig" to see if it has a network card and/or correct ip.

    Or connect to your unRAID host and type "arp"; your VM should show up with its MAC address.
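
    (A quick sketch of that host-side check; the IP is an example:)

    arp -n                    # the VM's MAC address should show up here once it has a lease
    ping -c 3 192.168.1.50    # then ping whatever IP arp lists for that MAC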

     

     

    *MININT-xxxxx is a random hostname that every installation/recovery environment gets, and every Windows system takes its own hostname/domain as the default for any network authentication. Nothing odd here :)

  15. OK, the SeaBIOS switch worked, I think that's what the original VM was set to. Can it not be switched once a VM is created?

    No.  If you want to switch types you have to rebuild the VM from scratch.

     

    I'm happy to have my VM back up and running, but is this the expected behavior?

    if you switched types then yes this is expected behaviour.

    Not entirely true; you can convert BIOS (SeaBIOS) to UEFI (OVMF).

    I did it 3 years ago on a physical machine (Win8 I think, maybe Win7) and last week on unRAID with a Win10 VM.

     

    1) I copied the vDisk of my VM (as insurance in case anything went wrong).

    2) created a new VM (same specs, just OVMF) and attached the copied vDisk.

    3) booted a gParted .iso and converted the disk from MBR to GPT (see the comment from "Lars Johan Olof Lidström"; a command-line sketch follows this list).

    4) mounted virtio driver .iso and booted with the Windows .iso

    5) start setup and follow it until the driver installation, just choose the storage driver, so that the disk is found.

    6) cancel setup and go to recovery mode. Choose cmd/cli to use diskpart.

    7) followed THIS GUIDE to create a working GPT/EFI Boot environment. Reboot and you are done.
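
    For step 3, the conversion itself is roughly this from a terminal inside the gParted live session (a sketch; the device name is an example, and gdisk/sgdisk should be included on the gParted image):

    # convert the partition table of the vDisk from MBR to GPT in place
    sgdisk --mbrtogpt /dev/sda
    # verify the result
    sgdisk --print /dev/sda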

     

    *) I changed the MAC address to that of the original VM, but that's optional and only necessary if the MAC is used somewhere in your VM/network.

    *) 32-bit Windows may not work at all with EFI/OVMF.

    *) In my case, I never had to re-activate my Windows license, but I can't guarantee it.

     

    It is a lot of steps, and some knowledge of the involved tools is recommended, which is why the usual answer is "No, do a fresh install" and probably why you can't switch this option in unRAID on an existing VM.

     

    So depending on your installation, a fresh start may be simpler/faster, but it's a good learning experience :)

    And if you do it on a copy of your disk, with a new VM, worst case is you waste a lot of time :)

  16. Gents, I have some good news.  We've figured this issue out.  We had to bisect a merge window kernel (NOT FUN) to figure this one out, but we did it.  It will affect all Windows 10 VMs running under QEMU/KVM running on any 4.3.x or 4.4.x branch of the Linux kernel.  Thankfully, we found a way to solve it WITHOUT having to modify the kernel itself.  The next beta will have this resolved.

    Great news and very fast delivery as always!

    Sounds like very time consuming and evil sorcery, so thanks for the effort  ;D

     

    I'll report back if it helped me.

     

    However, squark said he had the issue before upgrading (assuming 6.1.9), and that was a 4.1.18 kernel.

    Could he have been affected earlier due to some specific hardware or config?

     

    Or he had another issue and I just hijacked his thread  :-[

    I have some issues with the beta and a VM.

    In 6.1.9 I had a Windows 2012 R2 VM (running in BIOS mode, since I could not manage to make partitions in UEFI mode); it was running on one disk image, placed on Disk 1.

     

    [...]

     

    When I tried to access the data drive in the VM now, the VM hung.

    Could not shut it down.. and "Force Stop" did not work either.. Just got this error:

    "Execution error ... Failed to terminate process 16228 with SIGKILL: Device or resource busy"

     

    Is this a bug in the 6.2 beta, or is it because I have one VM running on two different disk images, placed on different disks?

    I will try to re-install it, just using one Disk image, placed on Cache drive.

    Sounds exactly like THIS.

    No solution or reason why, but it seems there are issues while running VMs with disks on the array.

     

    More like a "bug" with vdisks placed on the array, that crashes VMs when the start writing something on it.

    If you place your OS vdisk on the array, you probably wont get far after the login, if you even make it that far.

     

    Cache-only VMs do not show these issues. A workaround would be to copy everything to the cache, run it outside of the array, or roll back.

  18. What's up with SAMBA shares in this beta? I am using two VMs (win2012 - Win10) and have a downloads folder located on a SSD shared using 'unassigned devices' plugin.

    If I download and save a file to the downloads folder I have to refresh explorer or close it/reopen it to see the file. I also have a very large (120GB) VHD file located on the same share that cannot be opened/mounted unless I first copy it to my C drive on my windows guest. What's going on?

    You could try some Client Side Fixes and/or some Server Side Fixes to solve the refresh-issue.

    As always, first things first: thanks for the LetsEncrypt container, it's almost too easy now :)

     

    I started using it ~28 days ago, and from what I remember, renewal should be due with around 30 days left.

    So I started the container and saw in the logs that the current certs were only 10 days old and not up for renewal...

     

    The container was mostly offline, but I started it while updating to the 6.2 beta (10 days ago...).

    Port 443 was redirected to another server that is not set up for the validation process, so a new cert could not have been successfully requested/validated.

    And the cert was only 18 days old, so it should not have been renewed by your cronjob.

     

    I'm not sure what happened or why (maybe some date/time issues while upgrading), and it's not really important.

     

    But it got me curious, so I looked through your code, because the container description is reduced to the necessary stuff.

    The code is easy to read; I learned a lot about letsencrypt, and that earned some bonus points :)

     

    Some things I would like to request/suggest, if it's not too much work (to make great things even greater ;)):

    1) Could you add DH and RSA key length as a variable? 2048 is good and definitely enough, but greedy people like me may want 4096+... should be quite easy from what I see. I changed DH in firstrun.sh and added --rsa-key-size in letsencrypt.sh and got my 4096-bit keys. Any container update would probably revert those changes, but the renewal.conf contains the rsa-key option and DH won't be renewed anyway :)

     

    2) In my case, I did not notice the "accidental" but successful renewal. Is there an easy way to add some form of notification on successful/failed creations or renewals?

     

    3) I saw that you are using --renew-by-default. That means the cert will be renewed even if the 60 days are not over, which is probably what happened 10 days ago. You could add a variable for that as well, for those who would like to test the renewal process more frequently; it's still beta after all :)

     

    4) While it's great to generate certs and learn about letsencrypt, some sort of reverse-proxy support out of the box would be a perfect addition. While it's definitely more work, a simple proxy conf could be generated through some variables in the container template (source URL, destination URL), I guess?

     

    5) maybe a NTP option to make sure date/time is correct for renewal?

     

    Maybe you could add some info about the renewal to your readme/description?

    - Default Renewal after 60 days

    - Renewal does not re-validate the domain, as long as the correct cert is found on port 443 (correct me if I am wrong). So after cert creation, you could move the container to another port if you want? For example, if another server needs to run on 443 and a reverse proxy is not working/wanted.

     

    But as I said, it's already a really useful container, even without any additional features, so thanks again :)

  20. I did not report the error or try to investigate, because I have other issues with 6.2.

    So my logs may show stuff that is unrelated. It's usually better to solve one problem at a time.

    Unless you also have the issue of a high host CPU load while watching a video (YT) in the guest (on the NVMe cache), then it may be related :)

     

    I could reproduce the error on my server and post the diagnostics if Lime-Tech thinks it helps.

    Forcefully shutting down unRAID, while the array is running is not something I want to do that often ^^

     

    From the official wiki for 6.0 / 6.1:

    Create User Shares for Virtualization

    So I guess running a VM from the array was supported, but not recommended due to performance.

     

    But 6.2 may look different...

  21. Failed to terminate process 13447 with SIGKILL: Device or resource busy
    

     

    After that error appears, the webUI locks up (I still get SSH access) and I cannot unmount any of the disks to safely shut the machine down, so I have to force restart it

     

    The vdisk for the VM is stored on my array, though it does use the cache layer.

     

    I have the exact same issue. I moved my VMs to the array during the upgrade, to add the cache drive.

    When I started a VM, it showed the symptoms you described, even before adding the NVMe cache drive.

    Some instances of other programs (MC, htop) also froze and had to be restarted in another SSH session.

     

    It may be related to NVMe, because you also have an NVMe disk, but at this point I don't think it is.

     

    Once I added the new cache (NVMe) and moved the "system" share with libvirt.img and all VMs to the cache, they were working again.

     

    To sum it up, there are 2 issues:

    1) A VM (at least Windows) running from a vDisk on the array boots endlessly (without freezing).

    2) Destroying a VM in the state mentioned above will fail ("resource busy") and lock up the WebGUI and other stuff.

     

    I believe running VMs directly from the array was never recommended, but it definitely worked with 6.1.9.

    Some of my VMs had a second disk on the array for backup reasons; not anymore.

     

    Btw, some VMs (Win10) booted to the desktop, but I could not do anything apart from moving the mouse.

    My Server 2012 R2 VM never made it that far.

     

    Maybe some guest-agent issue that begins once it gets loaded, which may happen at different times for different VMs/OSes.

  22. Have you tried the following command to switch to SMB1.0 on the windows box?

     

    Set-SmbServerConfiguration -EnableSMB2Protocol $false

     

    Got it from here:  https://support.microsoft.com/en-us/kb/2696547

     

    Edit: not sure why you would want to do it from the Windows side instead of your workaround but I'm curious if it would work.

    Client side makes sense in a larger environment where some clients/servers need SMB3 and others can't use it.

    Haven't tried it, but I guess it's good to mention that there may be other workarounds.
