Unraid unresponsive after boot. Takes 15 min to autostart VM.



Hi everyone,

 

I'm running a trial version of Unraid and hoping to resolve this issue before committing to a purchase. Unraid is set up on a workstation with one NVMe drive, which hosts my VM (a Windows Server 2019 domain controller), and three 8 TB WD Blue drives configured in raidz1.

 

Upon boot, I watch the screen and see the IP for the UI. I can log into the UI and see that the array (of one disk, the NVMe) is starting; that takes around two minutes. After that the UI freezes and stops responding to reloads. I haven't timed it precisely, but it's roughly 15 minutes before the UI responds again and my VM boots. Once the UI is up and the VM has booted once, I can reboot the VM and it goes down and comes back up quickly. If I shut the VM down, though, the UI becomes unresponsive again for another 15 minutes or so. When it becomes responsive again, I can boot the VM and we're back in business.

 

Any thoughts on what might be going on here? The drives are a month old and have been burned in, the RAM has passed memtest, and the NVMe is also around a month old (a WD 850 2 TB). The rest of the box is in good shape, having been used daily for a few years.

 

I've created a diagnostics zip and checked the logs. I see an issue with time sync, but I have NTP set up with the first server pointing locally at the DC (maybe this is the problem?) and the second, third, and fourth servers pointing externally. I have two Pi-hole DNS servers outside the Unraid box, so the external NTP servers should always be resolvable and reachable.
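In case it's useful for checking the time-sync errors, something like this quick Python sketch should show whether each configured NTP source actually answers (ntplib is a third-party module, and the addresses below are placeholders for my real entries):

```python
# Quick check that each configured NTP source answers (requires: pip install ntplib).
# The addresses below are placeholders for my actual NTP entries.
import ntplib

servers = [
    "192.168.1.10",     # placeholder: local DC
    "0.pool.ntp.org",
    "1.pool.ntp.org",
    "2.pool.ntp.org",
]

client = ntplib.NTPClient()
for host in servers:
    try:
        resp = client.request(host, version=3, timeout=5)
        print(f"{host}: offset {resp.offset:+.3f}s")
    except Exception as exc:
        print(f"{host}: no response ({exc})")
```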

 

EDIT: I deleted the inline log, and attached the full diagnostic zip.

 

Any and all help is greatly appreciated.

 

Thanks,

Cal.

 

 


Hi everyone,

 

I've narrowed it down to the VM service.

 

Upon boot the machine starts up and Unraid works on starting the array. Once the array has started, Unraid freezes up for about 15 minutes. This happens whenever the VM service is enabled, whether or not any VM is set to auto-start. If the VM service is disabled, Unraid starts up fine.

 

If I start the VM service after Unraid successfully boots, it freezes for about 15 minutes.

 

I have three VMs on the machine, with one of them (the domain controller) set to auto-start.

 

The VMs are:

Server 2019 - Domain controller

2x Windows 11 Pro

 

Such a long boot time is a show-stopper for me, so I'm hoping to resolve it. Any thoughts or suggestions on how to track this down?

 

Thanks everyone,

Cal.

 

8 hours ago, calvados said:

If I start the VM service after Unraid successfully boots, it freezes for about 15 minutes.

That's very weird; I don't remember seeing anything similar before. You could try backing up the current libvirt.img and then creating a new one (note that all VMs will be gone from the VM page, not from the server), then see if just starting the service with the new libvirt image behaves normally. If it does, create a new test VM and check again that it still starts normally. You can always restore the old libvirt.img if/when needed.
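If you prefer to do the backup step from the console, a minimal sketch along these lines should work (stop the VM service first; the image path below is the usual default and may differ on your setup):

```python
# Minimal sketch: back up the current libvirt image so the VM service creates a fresh
# one on its next start. Stop the VM service (Settings > VM Manager) before running this.
# The path is the usual Unraid default; adjust it if your share layout differs.
from pathlib import Path

img = Path("/mnt/user/system/libvirt/libvirt.img")
backup = img.with_name("libvirt.img.backup")

if img.exists():
    img.rename(backup)
    print(f"Moved {img} -> {backup}")
else:
    print(f"{img} not found")
```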


Thanks for your reply @JorgeB. I stopped the VM service, renamed libvirt.img to libvirt.img.backup, and started the service again. Upon starting the service a new libvirt.img was created, but the system is frozen and I'm unable to pull up any pages in the UI. I may do a hard reboot shortly if it doesn't become responsive. (EDIT: the UI became responsive after about 15 minutes.)

 

Any other suggestions as to what I can try? My NVMe drive and the three-drive ZFS pool are populated with files that I would rather not have to re-populate from external sources. Is it possible to do a 'factory reset' without losing the data on these drives?

 

Thanks again @JorgeB,

Cal.

Just now, calvados said:

Is it possible to do a 'factory reset' without losing the data on these drives?

Since you only have the NVMe assigned it's quite easy: back up the current flash drive, create a new Unraid install, restore only the key, then re-assign the NVMe as disk1 exactly as it is now. Don't touch the ZFS pool for now or install anything xfs related, and see how the VM service behaves after that.


Hi @JorgeB,

 

Thanks for all your help and advice. With your help I think I've figured out the root of this issue, and have somewhat, though not perfectly, solved it for my use case. Following your advice I backed up the old flash drive and created a new USB key, added the NVMe, then the VMs, and step by step started adding things back.

 

My goal was to have one physical server running Unraid, hosting my Server 2019 domain controller, which would serve all the devices on my network and authenticate access to shares on my ZFS pool via AD/LDAP.

 

Things I have learned:

On Unraid, under Settings > SMB > Active Directory Settings, you need the DC powered up in order to join the domain (obviously). However, Unraid won't let you edit/join AD while the array is running, and a VM can only run while the array is running, which means the DC VM is always powered off at the moment you're able to join. So joining Unraid to a domain whose only DC is a VM hosted on that same Unraid box won't work, since that DC can never be reached while joining.

 

A suggestion was made in an older post to spin up a second DC on another VM host and join to that one. This works to get Unraid joined to the domain, but it comes at a cost: you must keep that external DC up. If you don't, you end up with the roughly 15-minute hang time I was experiencing. With an external DC running, Unraid stays responsive and doesn't exhibit the freezing described above.
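This fits with my understanding that a domain-joined Samba server locates DCs through the domain's DNS SRV records, so whichever DC those records (and your DNS order) point at is the one Unraid will go looking for. A quick sketch to see what the domain advertises (dnspython is a third-party module, and the domain name is a placeholder for mine):

```python
# List the domain controllers the AD domain advertises via DNS SRV records
# (requires: pip install dnspython). "mydomain.local" is a placeholder domain name.
import dns.resolver

answers = dns.resolver.resolve("_ldap._tcp.dc._msdcs.mydomain.local", "SRV")
for srv in answers:
    print(f"{srv.target}:{srv.port}  priority={srv.priority} weight={srv.weight}")
```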

 

For some reason, when:

  • starting or stopping the array,
  • loading the "VM Manager" settings window,
  • loading the VMS page, or
  • starting the DC VM,

Unraid appears to try to reach the DC and (I'm speculating here) waits a very long time for the attempt to time out. I'm unsure why Unraid would need to reach the DC for any of these operations. Even with the only DC being an Unraid VM, once the timeouts expire and the array is up and the DC VM is running, things work without issue. And although Settings > SMB > Active Directory Settings shows the server as not domain joined, users are indeed authenticated against AD with only the one DC VM running. But once you power off the DC VM (I think a reboot of it does this too; I still need to test), you're back to the 15 minutes of hang time. The same thing happens if you try to start or stop the array, start the DC VM, or load the "VM Manager" settings window or the VMS page.
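To sanity-check the waiting-on-the-DC theory, a rough sketch like the one below (run from the Unraid console or any box on the LAN) shows whether the DC answers on the usual AD ports and how long a failed attempt takes to give up; the address is a placeholder for the DC VM's IP:

```python
# Rough check of whether the DC is reachable on the ports AD membership typically uses,
# and how long each attempt takes when it isn't. The DC address is a placeholder.
import socket
import time

DC = "192.168.1.10"                      # placeholder for the DC VM's address
PORTS = {88: "kerberos", 389: "ldap", 445: "smb"}

for port, name in PORTS.items():
    start = time.monotonic()
    try:
        with socket.create_connection((DC, port), timeout=5):
            result = "open"
    except OSError as exc:
        result = f"failed ({exc})"
    print(f"{name} ({port}): {result} after {time.monotonic() - start:.1f}s")
```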

 

Perhaps this could be addressed in a future version by adding a user-adjustable timeout value for AD requests, and by looking into why Unraid needs the DC at all in order to start/stop the array, start a VM, or load those two pages.

 

So it appears my options are:

  1. Run a secondary DC outside of Unraid (and make sure DNS server 1 points to that external DC).
  2. Manage a second set of credentials in Unraid for all DC users.
  3. Run only one DC, as a VM in Unraid, and live with the 15-minute freezes any time I need to bring the DC VM down.
  4. Look for another platform to run my VMs and ZFS pools.

Thoughts?

 

Thanks again @JorgeB,

Cal.

 


Those are some good findings. I don't really have any experience with AD, and there aren't many Unraid users using it, but there have been other issues before. I would suggest you create a bug report with all the great info from your last post to at least bring it to LT's attention; maybe they can at least change the timeout, or possibly come up with a better solution.

