unRAID Server Release 6.2.0-beta21 Available

Many of us feel that running two or three preclear cycles will get the drive past the 'infant mortality' portion of the bathtub curve (google for further discussion).  Uncovering an early HD failure before putting that drive into an array is much less stressful than finding a compromised array in the first week after introducing a new drive into the mix. 

 

PS--- I could tell a story about how the concept of infant mortality came into general knowledge to the military during WWII, but that would be completely off topic...

 

I agree that there is value in stress testing the drive and checking to make sure nothing is failing after the first few writes.

 

That said, maybe this signals that a new plugin needs to be made that removes the clearing portion of the plugin and instead focuses entirely on stress testing. Leave the clearing entirely to the OS since that's not an issue anymore.

 

This should allow more cycles of stress testing without that long post-read cycle (which verifies the drive is zeroed), meaning you can do more cycles faster... I think.

 

No. You want to verify the writes and reads produced what you expect, so even if it's not writing zeroes you will always have a post-write read confirmation phase.

 

The existing preclear plugin (rewritten to not use JoeL's script) still suits this purpose perfectly.


 

I think you are missing part of the equation. It is not only the stress introduced by the testing; the elapsed time is an integral part of the entire process.

 

To be clear I am not suggesting that you can stress test a drive in like 5 min. What I am suggesting is that you can do way more in-depth and better testing by cutting out those phases and instead spend that time running better tests... tests that were actually designed with stress testing in mind... (Zeroing and post Zeroing reads were designed to avoid taking the array down for a long time, not designed to really stress test a disk... while that might be a side effect, I'm really suggesting when you don't have to worry about that part anymore, you can focus on designing better tests.)

 

Final Edit:

 

Joe's Pre-Clear script was designed to solve the problem of having to take your array down for a long time during the clearing process for new disks. It's being used for that purpose, but also for a purpose it wasn't originally designed for, which is to stress test disks. The only point I'm making is that now that the original problem isn't there anymore, we should perhaps look at designing a script that sets out to solve the stress-test problem, instead of using a script that can be used for that but wasn't originally designed with that in mind. That's all.

Joe's Pre-Clear script was designed to solve the problem of having to take your array down for a long time during the clearing process for new disks.
There is still a place for the preclear routine, that is having a disk ready to add, format, and start using immediately. The array may no longer be unavailable during the adding process, but the new disk isn't available until it's cleared and formatted. Being able to take a drive off the shelf, add it to a parity protected array and start putting data on it within minutes is still a valid use case.


I just wanted to boot into safe mode so that I could test bare bones, and to my surprise my Dockers started up. I remember this being discussed many moons ago, but surely it would be a good idea for safe mode to disable Dockers & VMs as well as the plugins it currently disables?


What is the procedure to revert to 6.1.9?

 

The correct way should always be to restore your 6.1.9 flash backup and reboot (assuming you haven't made significant changes to the array assignments).

 

The bare-bones way is to copy the bzroot and bzimage files from the desired release to the flash drive and reboot.

 

You can also check for a 'previous' folder on the flash, and restore it.
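As a sketch of that bare-bones method in shell (the function name is my own invention; on a stock install the flash mounts at /boot, and the upgrade leaves the old files in a /boot/previous folder):

```shell
# restore_release: copy a saved bzimage/bzroot pair back onto the flash.
# Both paths are assumptions; adjust to where your flash is mounted and
# where your 6.1.9 files live (the 'previous' folder, or your own backup).
restore_release() {
    flash="$1"; backup="$2"
    for f in bzimage bzroot; do
        if [ ! -f "$backup/$f" ]; then
            echo "missing $backup/$f" >&2
            return 1
        fi
    done
    cp "$backup/bzimage" "$backup/bzroot" "$flash/" && echo "restored; reboot to finish"
}

# On a real server you would run something like:
#   restore_release /boot /boot/previous && reboot
```

The check-before-copy matters: if only one of the two files is present you would otherwise end up booting a mismatched kernel and root filesystem.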


Quick tangential question: Over in the Preclear Plugin conversation someone made the comment that the primary purpose of the script is largely moot now, since the 6.2 beta can zero new drives in the background while the array is active. I thought I was keeping pretty close tabs on the beta, and usually comb through release notes with each new announcement... but this was news to me.

 

Is this a confirmed feature of the 6.2 beta now? A quick search of the forum didn't turn anything up, but is there a discussion somewhere with more details? I don't have a spare drive to add to my test rig or I'd just try it for myself... ;)

 

-A

 

This was talked about in one of the previous betas as a discovery by a user (JohnnieBlack, maybe).

Anyhow, from what I recall this is a big deal, as the array is available while clearing (which was not the case previously); however, it only clears the drive.

Preclear (by default) also does a post-read after the clear to verify the SMART parameters have not changed; a change there is a good indication of a problematic drive.

(someone else will likely have more/better things to add to this clarification)

 

The discussion starts with johnnie.black's discovery here, and ends with Tom's comments here.


I've noticed that after a few minutes my unRAID box will become very slow to respond and the processor is pegged at 100%.  It takes a restart (power off with the power button) to get it to work again.  I've tried it with various combinations of dockers and VMs running, but can't find a docker or VM to blame it on.  Diags attached.  Thanks for the help.

unraid-diagnostics-20160408-0903.zip


(Zeroing and post Zeroing reads were designed to avoid taking the array down for a long time, not designed to really stress test a disk... while that might be a side effect, I'm really suggesting when you don't have to worry about that part anymore, you can focus on designing better tests.)

 

I completely disagree with your assessment. Everyone needs to be sure that what is written can be read as expected on the drives. How else would you do that without explicit post-read phases?


Thanks

Does the Samba 4.4.0 release fix the error with Windows 10 not mounting ISO files, so we no longer need to add max protocol = SMB2_02?

Yes, mounting iso files in Windows 10 should work now without overriding the max protocol value.

I can confirm that, at least for me it does.

Had to use SMB2_02 in beta20; removed it, upgraded to beta21, and can now mount VHDs and ISOs.
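For anyone still on an older build, the override being removed here is a Samba extra-configuration setting. On a stock unRAID box I believe it lives in /boot/config/smb-extra.conf (path is my assumption; the Settings > SMB page edits it), and on beta21 it is no longer needed:

```
# /boot/config/smb-extra.conf - only needed on builds with Samba < 4.4.0
[global]
    max protocol = SMB2_02
```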

 

Let's hope the Samba guys didn't break anything important in that version :)


(Zeroing and post Zeroing reads were designed to avoid taking the array down for a long time, not designed to really stress test a disk... while that might be a side effect, I'm really suggesting when you don't have to worry about that part anymore, you can focus on designing better tests.)

 

I completely disagree with your assessment. Everyone needs to be sure that what is written can be read as expected on the drives. How else would you do that without explicit post-read phases?

 

I don't disagree that "everyone needs to be sure that what is written can be read as expected on the drives"; what I disagree with is that the preclear script is the best way to verify that. There are lots of disk stress and testing tools that exist with the express purpose of verifying the disk surface, checking for mechanical errors, etc.

 

So yes, you should likely do an end-to-end test, but it might be better to run a test that can tell you if you've got slow sectors, or find any of the other issues the preclear script doesn't look for... because it was primarily made to clear drives without taking the array down.
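Purpose-built tools such as `badblocks -wsv` already do this kind of destructive pattern testing. As a minimal illustration of the idea only (the function name and target path are placeholders, and everything on the target is destroyed), a single write-and-verify pass might look like:

```shell
# write_verify: write a byte pattern across a device (or image file) and
# read it back, reporting the first mismatch. A toy stand-in for purpose-
# built surface tests such as `badblocks -wsv`; the target is a placeholder
# and all data on it is destroyed.
write_verify() {
    dev="$1"
    size=$(wc -c < "$dev")                    # real disks: blockdev --getsize64
    pat=$(mktemp)
    head -c "$size" /dev/zero | tr '\0' '\252' > "$pat"    # 0xAA test pattern
    dd if="$pat" of="$dev" bs=1M conv=notrunc 2>/dev/null  # destructive write pass
    if cmp -s "$pat" "$dev"; then out="pattern verified"; rc=0
    else out="mismatch found"; rc=1; fi
    rm -f "$pat"
    echo "$out"; return $rc
}
```

A real tool would repeat this with several patterns (0xAA, 0x55, random) and time each region to spot slow sectors, which a plain zeroing pass never reveals.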


I've noticed that after a few minutes my unRAID box will become very slow to respond and the processor is pegged at 100%.  It takes a restart (power off with the power button) to get it to work again.  I've tried it with various combinations of dockers and VMs running, but can't find a docker or VM to blame it on.  Diags attached.  Thanks for the help.

 

Well, I deleted all my plugins (shutdown, preclear, File Integrity, the master browser one), changed the VMs to i440fx-2.5 and reduced the number of cores they have access to, forced my dockers to update, and now it seems to be working.  Running 1 W10 VM, Crashplan docker, and Plex docker.

 


Been on 6.2 Beta 21 for about 24 hours, ran full parity check, 0 errors.  Updated dockers one by one, really nothing to report.  Everything on the surface is working just as it did under 6.1.9, have not reviewed any logs looking for problems.  All is well.

 

Great job on 6.2.

 

 

What did you have to do to each Docker after the upgrade ?

 

Are you using Plugin's ?  Dockers ?  VM's ?

 

 

I just upgraded from 6.1.9 to 6.2 beta 21 with no trouble at all

 

I use most of the dynamix plugins, 2 VM and 12 Dockers. Biggest pain in the *** was having to update each docker due to the docker API update

 

Nothing other than updating them

Lol, I think he was expecting more given the "biggest pain in the ass" comment.


 

 

PS--- I could tell a story about how the concept of infant mortality came into general knowledge to the military during WWII, but that would be completely off topic...

 

Frank, I'm interested in hearing this.  What about a post in "The Lounge"?

 

Love hearing about historical things....

 

 

 


 

 


Yes, mounting iso files in Windows 10 should work now without overriding the max protocol value.

 

Great! That one fix is enough for me to try the beta this weekend.


Let's hope the Samba guys didn't break anything important in that version :)

NetBIOS name resolution was also broken in Samba 4.4.0, but Tom patched that and sent the fix upstream as well.

 


 

 


Does 6.2 fix the high DPC latency/sound stuttering in VMs when the VM uses a CPU core that unRAID also uses?

Isolcpus fixes it for me, but I only have an i5-6600K without hyperthreading (4 cores), and the game Far Cry: Primal needs 4 cores to even start.

 

Sorry for my English, I am not a native speaker...
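For reference, the isolcpus workaround mentioned above goes on the kernel append line in /boot/syslinux/syslinux.cfg (path and core numbering are assumptions; check /proc/cpuinfo for your own layout). On a 4-core i5 it might look like:

```
# /boot/syslinux/syslinux.cfg - reserve cores 1-3 for VMs,
# leaving core 0 for unRAID itself
label unRAID OS
  menu default
  kernel /bzimage
  append isolcpus=1,2,3 initrd=/bzroot
```

The VM templates then pin vCPUs to the isolated cores, so the host scheduler never runs unRAID tasks on them.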


 

This should allow more cycles of stress testing without that long post-read cycle (which verifies the drive is zeroed), meaning you can do more cycles faster... I think.

 

I think you are missing part of the equation. It is not only the stress introduced by the testing; the elapsed time is an integral part of the entire process.

You are both missing an important part of the equation.

 

Un-readable sectors are ONLY marked as un-readable when they are read.  Therefore, unRAID's writing of zeros to the disk does absolutely nothing to ensure all the sectors on the disk can be read.  (Brand new disks have no sectors marked as un-readable)

 

Sectors marked as un-readable are ONLY re-allocated when they are subsequently written to.    It is the reason the preclear process I wrote first reads the entire disk and then writes zeros to it. (It allows it to identify un-readable sectors, and fix them where possible)

 

The entire reason for the post-read phase is because quite a number of disks failed when subsequently read after being written.

 

If you rely on unRAID to write zeros to the disk and then put it into service, the first time you'll learn of an un-readable sector error is when you go to read the disk after you've put your data on it. (or during a subsequent parity check)

 

The new feature in this release of unRAID will help some users avoid a lengthy unanticipated outage of their server if they had not pre-cleared a disk, and for that it is a great improvement.  This improvement in unRAID 6.2 does not, however, test the disk's reliability in any way, nor identify un-readable sectors (since it only writes them, and does not read them at all).
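The three phases described above can be sketched in shell. This is a toy illustration against an image file or spare device, not Joe L.'s script: the real preclear also compares smartctl output between phases and writes a proper preclear signature, and everything on the target is destroyed.

```shell
# preclear_phases: read the whole target, zero it, then read it back.
# Phase 1 is what makes the drive flag un-readable sectors; phase 2
# (writing) is what lets it remap them; phase 3 confirms every byte
# reads back as zero. Target path is a placeholder; all data is destroyed.
preclear_phases() {
    dev="$1"
    size=$(wc -c < "$dev")                    # real disks: blockdev --getsize64
    # Phase 1: pre-read every sector
    dd if="$dev" of=/dev/null bs=1M 2>/dev/null || { echo "pre-read failed"; return 1; }
    # Phase 2: write zeros across the whole target
    head -c "$size" /dev/zero | dd of="$dev" bs=1M conv=notrunc 2>/dev/null
    # Phase 3: post-read, verifying zeros
    if head -c "$size" /dev/zero | cmp -s - "$dev"; then
        echo "post-read ok: all zeros"
    else
        echo "post-read mismatch"
        return 1
    fi
}
```

The ordering is the point: writing first (as unRAID 6.2's background clear does) skips phase 1 entirely, so pending un-readable sectors are never exercised.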

 

Additional discussion about whether unRAID 6.2's initial zeroing of drives replaces the preclear process should continue in another thread... and not clutter up this thread in the announcement forum.

 

Joe L.

 


I reported an issue with NFS mounts in 6.2 beta20, and I am back to report it in beta21 as well.  I mount a remote SMB share (on another computer) locally using UD and try to share it via NFS on the unRAID server, and I get the following errors in the log:

 

Apr 8 19:14:01 Tower root: exportfs: /mnt/disks/HANDYMANSERVER_Backups does not support NFS export
Apr 8 19:15:01 Tower root: exportfs: /mnt/disks/HANDYMANSERVER_Backups does not support NFS export
Apr 8 19:16:01 Tower root: exportfs: /mnt/disks/HANDYMANSERVER_Backups does not support NFS export
Apr 8 19:17:01 Tower root: exportfs: /mnt/disks/HANDYMANSERVER_Backups does not support NFS export
Apr 8 19:18:01 Tower root: exportfs: /mnt/disks/HANDYMANSERVER_Backups does not support NFS export
Apr 8 19:19:02 Tower root: exportfs: /mnt/disks/HANDYMANSERVER_Backups does not support NFS export
Apr 8 19:19:28 Tower root: exportfs: /mnt/disks/HANDYMANSERVER_Backups does not support NFS export
Apr 8 19:20:01 Tower root: exportfs: /mnt/disks/HANDYMANSERVER_Backups does not support NFS export

 

It does export via NFS though.

 

/etc/exports file:

# See exports(5) for a description.
# This file contains a list of all directories exported to other computers.
# It is used by rpc.nfsd and rpc.mountd.

"/mnt/disks/HANDYMANSERVER_Backups" -async,no_subtree_check,fsid=200 *(sec=sys,rw,insecure,anongid=100,anonuid=99,all_squash)
"/mnt/user/Computer Backups" -async,no_subtree_check,fsid=103 *(sec=sys,rw,insecure,anongid=100,anonuid=99,all_squash)
"/mnt/user/Public" -async,no_subtree_check,fsid=100 *(sec=sys,rw,insecure,anongid=100,anonuid=99,all_squash)
"/mnt/user/iTunes" -async,no_subtree_check,fsid=101 *(sec=sys,rw,insecure,anongid=100,anonuid=99,all_squash)

 

The line I add to the /etc/exports file is:

"/mnt/disks/HANDYMANSERVER_Backups" -async,no_subtree_check,fsid=200 *(sec=sys,rw,insecure,anongid=100,anonuid=99,all_squash)

 

My code reads the /etc/exports file into an array, adds my line to the array, and then writes the array back to /etc/exports.  The line should show up at the end of the file, not in the middle.  It appears that something is altering /etc/exports in the background, causing me to read partial copies of the file at times.
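The read-modify-write race described above is the classic case for an advisory lock. A hedged sketch only: `add_export` and the lock-file name are my own inventions, and locking only helps if every writer of /etc/exports honors the same lock, which the report suggests whatever rewrites the file in the background does not.

```shell
# add_export: append an export line under an exclusive advisory lock,
# skipping it if it is already present. Illustrates serializing the
# read-modify-write; a real plugin would follow with `exportfs -r`.
add_export() {
    exports="$1"; line="$2"
    (
        flock -x 9                             # block until we hold the lock
        grep -qxF "$line" "$exports" || printf '%s\n' "$line" >> "$exports"
    ) 9>>"$exports.lock"                       # lock file on fd 9
}

# usage sketch (quoting keeps the export options as one argument):
# add_export /etc/exports '"/mnt/disks/SHARE" -async,fsid=200 *(rw)'
```

Appending under the lock also makes the call idempotent, so re-running it on every UD mount event cannot duplicate the line.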

 

Please note that the mount point is /mnt/disks/, not /mnt/user/.

 

When I mount an iso file with UD and share it via NFS, I do not see the errors in the log.

 

This did not show up in the early 6.2 beta because NFS was not working, but has shown up in all subsequent beta versions.

 

Diagnostics attached.

tower-diagnostics-20160408-1907.zip


Mover no longer moving.

 

I had a bit of a hiccup with my server. I went to stop a docker and the GUI became unresponsive; it so happened I was copying data to the array at the same time, data that was first going to my cache drive. When I stopped the copy the GUI became responsive again and I was able to stop and restart the docker. I then recommenced the copy; however, the log does not show the mover actually doing anything, yet the GUI says it is moving and the button is greyed out. My issue is that I am copying just over 1TB of data and risk filling up my cache drive if the mover is not working. Of course I can stop the copy job, but I'd like to fix the mover if possible. Diags attached.

 

Oddly enough, as soon as I stopped the data copying the mover came to life, weird. I guess anyone can look at my diags anyway.

tower-diagnostics-20160408-2221.zip


Out of curiosity, has anyone upgraded to Beta21 from 6.1.9 ?  Just curious how the process went for you, trying to get a feel if some of the subsequent betas have smoothed this over or not.

 

Thanks.

 

I actually just did, about 2hrs ago, on my back-up server. The install/update went generally smoothly; nearly everything is working... My docker apps seem to have all disappeared (wasn't much to start with, just Madsonic, Calibre and Plex). What I get instead of the apps is a grayed-out pic saying 'orphaned image'; haven't really figured out the solution yet. Updating (as said in the thread) is not an option - it just doesn't happen, and a new install to the old directory ends in failure. Will work on that later or roll back to 6.1.9... we will see. All else seems to work fine. What interested me most was the 2nd parity disk option - tossed in a drive I had sitting around which fits the specs and got that process going. It is going, but I seem to have a huge performance loss in parity check speed; under 6.1.9 it was bumbling along at around 120-125MB/s, and for the last 2hrs it has been snailing along at about 80-85MB/s. I thought there were supposedly nice improvements. We will see - as it is, I am looking at another (GUI estimated) 9hrs to see if it suddenly 'turbos up'.

Can update that later, if you are interested...

Ohh, btw... that's what I get on the dashboard screen under parity status:

Last checked on Friday, 08-04-2016, 23:25 (today), finding 0 errors.

Duration: unavailable (no parity-check entries logged)

 

Seems not exactly right - at the moment I'll just let it run and see tomorrow PM what happens ;)

 

Cheers, L

 

PS: the verification system s*cks majorly!!! I am not into movies at all! Wonder how ppl would feel if they suddenly had to figure out what the 2nd official live album of Sham 69 was??? I am sure that could be solved in a more common way!


Bug: Stubbed USB 3.0 Controller only working on first VM boot

 

I have noticed this behavior quite early on passing through my PCIe USB 3.0 Controller. Now I have pinpointed the situation where things break.

 

I have currently stubbed 2 PCIe devices (excluding my GPU) to Windows VMs. A minimalistic Windows 7 VM is using GoodSync to automate my Backups/Cloud Uploads and is being passed through a SATA controller for automatic ripping purposes (Blurays, etc.).

 

This is working fine.

 

My Main VM is running Windows 10 x64 and I am passing through my PCIe USB 3.0 controller to achieve hot-plugging.

On my first (and only the first) time booting the Windows 10 VM the USB controller works fine. The attached screenshot shows my C: Drive (50G vdisk) and one USB 2.0 flash drive (e:) and my external 3.0 drive (d:) working.

 

On a second boot the VM becomes stuttery and the devices are no longer passed through. It does not matter whether they stay attached to the ports during the VM shutdown or not. I presume the stubbed controller is not detached from the VM correctly, if that makes sense...

 

Only when I disconnect the devices does the VM run normally again on its next boot (without the controller working, however).

 

To get this controller working again I have to reboot my entire server, which is a major annoyance.

 

Please find attached my diagnostics file (I downloaded inside my Win10 VM after a working boot, stuttery reboot and then booting without the controller working/having devices attached).

passed_through_de_drives.JPG

ninja-diagnostics-20160409-1053.zip


Bug: Stubbed USB 3.0 Controller only working on first VM boot

This might not be related to unRAID at all. It might be that the USB card doesn't support resetting. As a workaround you could try safely removing it in the VM before you shut it down, the same way you would a USB stick.


Bug: Stubbed USB 3.0 Controller only working on first VM boot

This might not be related to unraid at all. It might be that the USB card doesn't support resetting. As a workaround you could try to safely remove it in the VM before you shut it down. You do this the same way as with a USB stick.

 

That actually makes sense. I'll report back right away!

 

Edit: my unRAID server just crashed.

 

After safely disconnecting the USB controller inside the VM I was able to reboot without the stuttering issue. The controller wouldn't come back after that, however...

 

unRAID crashed when I then shut down the VM and booted it fresh (not restarting it).

 

Thanks for pushing me in the right direction... if LimeTech can't do anything about this, do you know where to get a replacement USB controller?


Out of curiosity, has anyone upgraded to Beta21 from 6.1.9 ?

 

It is going, but I seem to have a huge performance loss for the parity check speed; under 6.1.9 it was bumbling along at around 120-125MB/s, and for the last 2hrs it has been snailing along at about 80-85MB/s... I am looking at another (GUI estimated) 9hrs to see if it suddenly 'turbos up'.

 

I quote myself here - I couldn't remember the password for my old account, and don't have the email account it was linked to anymore... yadayadayada... anyway, after being able to look up my own profile I figured it out again, and now I am back as me, LOL.

I will get rid of the 'one post account'.

 

Anyway, I like the new beta so far. Still trying to wrap my head around the docker issue.

Otherwise, the parity check is roughly going as fast as molasses flows uphill on a reaaaaally cold winter day! It did not turbo up, but has nearly halved in speed since my last post - how can that be, considering no hardware changes etc., just upgrading 6.1.9 to 6.2-beta21???

 

Cheers, L

This topic is now closed to further replies.