unRAID Server Release 6.0-beta5a-x86_64 Available



I was just trying to pin cores for performance reasons, but pinning cores crashes my server, while it helps others who crash with the default config. I can run stable with the default syslinux.cfg. Right now I just have 4GB allocated to dom0.

 

Running pinned caused you issues because you hadn't excluded the pinned cores from your domU's. This in turn makes the CPU starvation issue more prevalent, as you're then restricting dom0 to only 1-2 cores while still using those cores in your VMs.

It still crashes. Last crash I was running 2 cores pinned and 4-7 CPUs in the VM. I don't have stability issues with the default cfg, so I'm staying with that. FWIW, it's an 8-core Avoton.

 

 

Link to comment

With hyperthreading, not all cores are equal: 1 HT core = 1/2 a real core, and 2 HT cores could equal either 1 real core or 2 half cores. Does this affect the pinning strategy?

It could be that, for me, more than 2 Avoton cores are needed. There's no HT, just 8 real cores.

Link to comment

With hyperthreading, not all cores are equal: 1 HT core = 1/2 a real core, and 2 HT cores could equal either 1 real core or 2 half cores. Does this affect the pinning strategy?

Possibly 8) I would basically give it a couple of cores and see how CPU usage is for dom0 when running something heavy, say a parity check. If it's looking acceptable then you've probably hit your sweet spot. For me, 2 cores worked out OK; 1 core wasn't enough (no HT).
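
For anyone wanting to try that, a rough sketch of one way to do it (the core counts are only an example, and this assumes the stock Xen boot layout of this beta): give dom0 two dedicated cores via the Xen options on the syslinux append line, then watch per-domain CPU usage from the dom0 console while a parity check runs.

append /xen dom0_mem=4096M,max:4096M dom0_max_vcpus=2 dom0_vcpus_pin --- /bzimage --- /bzroot

xentop -d 3    # refresh every 3 seconds; watch the CPU(%) column for Domain-0 during the parity check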

Link to comment

With hyperthreading, not all cores are equal: 1 HT core = 1/2 a real core, and 2 HT cores could equal either 1 real core or 2 half cores. Does this affect the pinning strategy?

 

Great question, and this is a topic for which I need to do a write-up and a little extra research.  bjp999 is 100% correct in how HT works.  Essentially, HT "tricks" your system into thinking it has twice the cores it does, but each HT core comes with 1/2 the benefits of a non-HT core.  There are countless articles about the benefits that hyperthreading provides and the trade-offs that come with it.  Generally speaking, it is best to have hyperthreading turned on.  There are probably a few small niche cases where hyperthreading can hurt application performance, but I would be shocked if anyone here fits into those circumstances.  The simple truth is that most processors nowadays come with so much performance per core that using this "feature" to slice each core in half and present the host with twice as many cores just makes good common sense.
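
If you want to see how the logical CPUs map to physical cores on your own hardware before deciding what to pin where, the kernel exposes the sibling layout in sysfs; from the dom0 console, something like:

cat /sys/devices/system/cpu/cpu*/topology/thread_siblings_list

On a hyperthreaded chip each line lists the logical CPUs that share a physical core (e.g. "0,4"); on a chip with no HT, like the Avoton discussed above, each line is a single number.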

 

All that said, now let's talk about pinning cores to Dom0 vs. VMs.  In my personal testing, I have yet to forcibly assign cores to Dom0.  I do, however, assign Dom0 a minimum memory allocation to prevent my VMs from grabbing too much memory and to guarantee that Dom0 always has what it needs to run properly.  I also have yet to experience a system crash or fault as a result.  NOTE:  THIS DOES NOT MEAN THAT YOU SHOULDN'T PIN A CPU TO DOM0, IT JUST MEANS I HAVEN'T EXPERIENCED A CRASH FROM NOT DOING THAT...YET...

 

My main point here is that, until we can do some further testing, my biggest suggestion to people who have had a dom0 crash while testing VMs is to first edit the syslinux config on their flash device and change the Xen line to be as follows:

 

append /xen dom0_mem=4096M,max:4096M --- /bzimage --- /bzroot

 

You can tweak that line to adjust the exact amount of memory, but if you have issues, you should set it back to 4096M.
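
For anyone unsure where that line lives: it is the append line inside the Xen boot entry of syslinux.cfg in the syslinux folder on your flash. A typical entry looks roughly like the following (label names and defaults vary between betas, so edit the entry you actually boot rather than copying this verbatim):

label unRAID OS (Xen)
  kernel /syslinux/mboot.c32
  append /xen dom0_mem=4096M,max:4096M --- /bzimage --- /bzroot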

 

Host configuration and requirements are a critical element in our planning and development of unRAID 6.  More information will come out over time as we continue to develop and learn.  Thanks, everyone!

Link to comment

One more data point on the domU crash issue that has been reported.  I'm able to recreate this by bumping up the disk configuration values (md_num_stripes, md_write_limit, and md_sync_window) and then kicking off a parity check.  When I go back to the default values (I originally used the tunables script to determine the optimum values), the problem goes away.  So I think that points to a memory issue, but then again I'm no expert.

 

Also, as an FYI, I tried the default syslinux, locking in 4GB of RAM for dom0, and pinning a CPU, none of which made a difference.  In each case I would start seeing the Vector not implemented/CPU stall messages within a couple of minutes of kicking off a parity check.
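
For anyone who wants to put the tunables back without waiting for a reboot, they can also be set on the fly from the console with mdcmd, roughly like this (the values shown are the stock defaults as I remember them for this era, so confirm against your own Disk Settings page; these runtime changes don't persist, the Disk Settings page is what survives a reboot):

mdcmd set md_num_stripes 1280
mdcmd set md_write_limit 768
mdcmd set md_sync_window 384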

Link to comment

Hello.

 

I'm running version 5.0.5 now with no problems. Fairly old hardware with 4GB of RAM. Half my drives are on the mainboard and the others are on a Supermicro add-on card. I have a total of 31TB. Will upgrading to this newer version benefit me in any way? I'm guessing no, but figured I'd ask. I used to read all the new posts on the forum and keep up with things, but I don't have as much time as I used to and life got in the way. I do come back to update and keep things current, though.

 

 

 

Thanks.

 

 

Link to comment

... but each HT core comes with 1/2 the benefits of a non-HT core ...

 

Just thought I'd quibble on this a bit. If one of the 2 "half cores" is idle, then the other one can run at full speed. But if both are equally busy, they each only get 1/2 the speed (or a little less).

 

So there may actually be a benefit to letting unRAID have only 2 half cores. If unRAID is relatively idle and the VM is processing hard, the VM would get the other halves of unRAID's two cores and use all the horsepower the chip has to offer, so long as unRAID isn't starved, which I don't think it would be, since when it starts to use cycles it would get its fair share from the hyperthreading.  But if unRAID got both halves of the same core, that core would sit wasted whenever unRAID was idle, with no way for the VM to use it.

Link to comment

One more data point on the domU crash issue that has been reported.  I'm able to recreate this by bumping up the disk configuration values (md_num_stripes, md_write_limit, and md_sync_window) and then kicking off a parity check.  When I go back to the default values (I originally used the tunables script to determine the optimum values), the problem goes away.  So I think that points to a memory issue, but then again I'm no expert.

 

Also, as an FYI, I tried the default syslinux, locking in 4GB of RAM for dom0, and pinning a CPU, none of which made a difference.  In each case I would start seeing the Vector not implemented/CPU stall messages within a couple of minutes of kicking off a parity check.

 

Have you tried excluding your pinned dom0 cores from your domU's? Works for me.
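
For anyone unsure what that looks like in practice, a minimal sketch, assuming dom0 is pinned to the first two cores and your VMs use the xl-style .cfg files from this beta (core numbers are only an example):

cpus = "2-7"    # in the domU's .cfg - keeps this VM's vcpus off cores 0-1, which dom0 is using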

Link to comment

One more data point on the domU crash issue that has been reported.  I'm able to recreate this by bumping up the disk configuration values (md_num_stripes, md_write_limit, and md_sync_window) and then kicking off a parity check.  When I go back to the default values (I originally used the tunables script to determine the optimum values), the problem goes away.  So I think that points to a memory issue, but then again I'm no expert.

 

Also, as an FYI, I tried the default syslinux, locking in 4GB of RAM for dom0, and pinning a CPU, none of which made a difference.  In each case I would start seeing the Vector not implemented/CPU stall messages within a couple of minutes of kicking off a parity check.

 

Have you tried excluding your pinned dom0 cores from your domU's? Works for me.

 

Yes, I excluded the pinned dom0 cores from the domU's, and I still had the same issue during a parity check.  Once I lowered the tunables, the error went away, even without pinning cores or locking dom0 to 4 GB of RAM.  I'll have to see if I get the error at any other times (I had a couple of occasions where the system crashed without doing a parity check, but those were weeks apart, so only time will tell), but it appears that this resolved the issue during parity checks.

Link to comment

Yes, I excluded the pinned dom0 cores from the domU's, and I still had the same issue during a parity check.  Once I lowered the tunables, the error went away, even without pinning cores or locking dom0 to 4 GB of RAM.  I'll have to see if I get the error at any other times (I had a couple of occasions where the system crashed without doing a parity check, but those were weeks apart, so only time will tell), but it appears that this resolved the issue during parity checks.

 

Hmm, interesting! I too have run the tunables-tester script and modified my tunable values. I wonder if that is linked to the issue, then; I may well set mine back to stock values, unpin, and see what happens.

Link to comment

Hello.

 

I'm running version 5.0.5 now with no problems. Fairly old hardware with 4GB of RAM. Half my drives are on the mainboard and the others are on a Supermicro add-on card. I have a total of 31TB. Will upgrading to this newer version benefit me in any way? I'm guessing no, but figured I'd ask. I used to read all the new posts on the forum and keep up with things, but I don't have as much time as I used to and life got in the way. I do come back to update and keep things current, though.

 

I do think there are some great things in 6.0 for folks who like to do things with their server besides using it as a file server. If you are frustrated by difficult installation and upgrading of add-on packages, 6.0 is going to be a huge step in the right direction.

 

That being said, it is still in a fair amount of transition at this point, and I might suggest returning in a month or two, after beta7 and beyond come out. A major retooling seems to be in the works.

Link to comment
  • 2 weeks later...

One annoyance (no more than that to me, but it will annoy people).

 

I'm adding a disk to extend the array, and while the new disk is being cleared, the array isn't up. Because the array isn't up, the virtual machines aren't up. Which I can understand, but it's hard to explain to the wife and others why they can't use Plex at the moment...

 

If this is the proposed long-term rule, then you probably need to separate clearing the disks from extending the array to speed things up a bit.

Link to comment

One annoyance (no more than that to me, but it will annoy people).

 

I'm adding a disk to extend the array, and while the new disk is being cleared, the array isn't up. Because the array isn't up, the virtual machines aren't up. Which I can understand, but it's hard to explain to the wife and others why they can't use Plex at the moment...

 

If this is the proposed long-term rule, then you probably need to separate clearing the disks from extending the array to speed things up a bit.

 

http://lime-technology.com/forum/index.php?topic=2817.0

 

Link to comment

One annoyance (no more than that to me, but it will annoy people).

 

I'm adding a disk to extend the array, and while the new disk is being cleared, the array isn't up. Because the array isn't up, the virtual machines aren't up. Which I can understand, but it's hard to explain to the wife and others why they can't use Plex at the moment...

 

If this is the proposed long-term rule, then you probably need to separate clearing the disks from extending the array to speed things up a bit.

 

+1

Link to comment

Yes, it's a solution to the problem, but only for 1% of the target audience. And even this member of that 1% forgot about it, as it's been years since I've added a disk to the array.

 

Virtually everyone who uses unRAID preclears disks with this script. You only have to know the drive's device name (e.g., /dev/sdz).

 

The script just has to be copied to the root of your flash (access it from your workstation via "//Tower/flash"). It takes 15 seconds to download.

 

It is advisable, but not required, to run the script from a "screen" session; otherwise it can be run from the server console. You do not want to run it from PuTTY or another telnet-type terminal session, because if that session closes, the preclear would stop.
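
In practice it's only a couple of commands once the script is on the flash; from the server console, something along these lines (the device name is only an example, and preclearing wipes the disk, so triple-check it):

screen    # optional, if you have screen installed
bash /boot/preclear_disk.sh /dev/sdX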

Link to comment

 

Yes, it's a solution to the problem, but only for 1% of the target audience. And even this member of that 1% forgot about it, as it's been years since I've added a disk to the array.

 

Virtually everyone who uses unRAID preclears disks with this script. You only have to know the drive's device name (e.g., /dev/sdz).

 

The script just has to be copied to the root of your flash (access it from your workstation via "//Tower/flash"). It takes 15 seconds to download.

 

It is advisable, but not required, to run the script from a "screen" session; otherwise it can be run from the server console. You do not want to run it from PuTTY or another telnet-type terminal session, because if that session closes, the preclear would stop.

 

You seem to have this idea that I need instructions to run a script. Do you think I would be running a beta with multiple VMs, having moved from an ESXi system, if I wasn't comfortable reading instructions and thinking for myself?

 

My point was a general one. I wasn't asking for advice, but reporting a usability issue.

 

 

Link to comment

 

Yes, it's a solution to the problem, but only for 1% of the target audience. And even this member of that 1% forgot about it, as it's been years since I've added a disk to the array.

 

Virtually everyone who uses unRAID preclears disks with this script. You only have to know the drive's device name (e.g., /dev/sdz).

 

The script just has to be copied to the root of your flash (access it from your workstation via "//Tower/flash"). It takes 15 seconds to download.

 

It is advisable, but not required, to run the script from a "screen" session; otherwise it can be run from the server console. You do not want to run it from PuTTY or another telnet-type terminal session, because if that session closes, the preclear would stop.

 

You seem to have this idea that I need instructions to run a script. Do you think I would be running a beta with multiple VMs, having moved from an ESXi system, if I wasn't comfortable reading instructions and thinking for myself?

 

My point was a general one. I wasn't asking for advice, but reporting a usability issue.

 

I do not know you or your skill level. I just saw a user with 29 posts and thought you were a newbie asking a question.

 

No insult was intended and sorry if I touched a nerve.

 

I would ask that you be respectful in your posts on the forum.

 

The preclear script is a good alternative to the default clearing feature built into the product. Its use is up to you. You can send a note to Limetech expressing your dissatisfaction with the product feature if you would like.

 

Have a good day.

Link to comment

 

The preclear script is a good alternative to the default clearing feature built into the product. Its use is up to you. You can send a note to Limetech expressing your dissatisfaction with the product feature if you would like.

 

Have a good day.

 

Sorry, when I replied I was on my phone and assumed it was FJV posting again.

 

I wasn't trying to attack anyone. My comment was that the system is unfriendly for most users if adding a disk from the front end takes everything down for hours. And the idea of using the console and a script to preclear a disk just isn't a solution if you are targeting the general public, for whom, if it's not on the web front end, it doesn't exist.

 

As for sending a note to Limetech, I'm not dissatisfied. I'm running beta software and merely wished to highlight the only issue I've discovered so far. The fact that there is a workaround is great, but if Limetech's desire is to use VMs to provide functionality, and expanding the array takes those VMs offline for 6+ hours, then it's worth highlighting the issue now while things can still be changed.

Link to comment

^^^ This ... consider it part of the larger discussion about some of the "core" features that need to be included in unRAID 6.0 Final (or 7, whatever).

 

Even if it isn't the full preclear script, the clearing of the drive that occurs right now should happen independently of the array and shouldn't take down VMs.  Then the new drive gets added once it's complete.  If it HAS to take down the array and VMs for a minute at that point, that isn't so bad, though not "ideal".

 

Anyone who wants to do multiple passes, have access to the SMART reports, etc., can still install the script.  Though there is no reason that can't be made into part of a preclear GUI as well.

Link to comment

I've just run into an issue where I lost access to the WebGUI. I rebooted from telnet, several times, and even rebooted it at the machine, but I still can't access the GUI.

I can still access the server via Telnet, and I can see the flash share over the network.

Does anyone know what's happened? Is there a log file somewhere that I should post?

 

EDIT: Access is back now, but only at //tower. For some reason the IP won't work, yet Plex etc. are available at IP:Port.

Link to comment

I've just run into an issue where I lost access to the WebGUI. I rebooted from telnet, several times, and even rebooted it at the machine, but I still can't access the GUI.

I can still access the server via Telnet, and I can see the flash share over the network.

Does anyone know what's happened? Is there a log file somewhere that I should post?

 

EDIT: Access is back now, but only at //tower. For some reason the IP won't work, yet Plex etc. are available at IP:Port.

If you can telnet in but not access the web GUI after a crash, it's likely processing the transaction logs(?) on each drive. Do a tail on the syslog to see what's happening.
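
i.e., from the telnet session, something like:

tail -f /var/log/syslog

and watch for the filesystem journal/replay messages as it works through each drive.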

Link to comment
  • 1 month later...

Ran into a weird issue with this beta (still on this beta since I have Xen VMs that won't work in beta 6).  The server has been stable for the past couple of months; then a couple of days ago (when the server was idle) I lost all of my shares with a log message of "transport endpoint is not connected".  After a reboot the shares came back, but now when I try to stream any movies to my PCH over NFS, I get either an error that it can't find the server, or stuttering/lock-ups in the video.

Everything else seems to be fine, including the VMs.  I can access the drives from Windows at normal speeds, and I can stream video through Plex on my Ubuntu VM, so the problem appears to be NFS-related.  I get the same response with 2 PCHs and a WDTV, so I believe it is the server.  I have tried rebooting, and that doesn't seem to fix anything (including rebooting after removing the plugins in use, which are unmenu/APC/powerdown/cache_dirs).  I have 3 VMs running (Windows XP, Arch, and Ubuntu Server).

I don't see anything in the logs that would indicate an error (the only line I see is an authenticated mount request from the PCH's IP), plus a message about my flash drive needing to have fsck run, which I think is from the previous crash but doesn't appear to have caused any issues.  I'm attaching the logs from the run with everything running in case someone else can see something there.  Any suggestions?

syslog-20140808-104440.zip
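
(A few standard NFS checks from the console might help narrow this down; these are generic NFS/Slackware tools rather than anything unRAID-specific, so treat them as a starting point:)

exportfs -v                  # confirm the user shares are actually being exported
rpcinfo -p localhost         # confirm nfs and mountd are registered with the portmapper
/etc/rc.d/rc.nfsd restart    # restart the NFS daemons without a full reboot, assuming the stock Slackware init script is present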

Link to comment
