unRAID Server Release 5.0-beta12 Available



Just a note: some tests require that only the unRAID server and one client are connected to the switch or hub you're using. I use jumbo frames, and that too is something you have to look out for. The quality of the switch does matter: the backplanes, buffers, etc. are not the same quality from one switch to the next. You will notice that on some switches you can only set jumbo frames for the entire switch (all ports), while on others it is per port.

 

Check your latency. High latency on a 0-1 hop home network shows up quite easily and is an indicator that the switch or router is of poor quality. You can get good throughput one minute and not the next. If another device on the same switch starts, or is in the middle of, a large download, you could see poor throughput. If you have two switches connected to a router, with the unRAID server on one switch and your client on the other (or you're daisy-chaining, etc.), check the latency; it could be your router.
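For anyone who wants a number rather than a feeling, here is a minimal sketch that times TCP connects to the server, as a rough stand-in for ping. The function names are mine, and the host name and port in the comment are hypothetical; substitute whatever your server actually answers on.

```python
import socket
import statistics
import time

def tcp_connect_latency_ms(host, port, count=10, timeout=2.0):
    """Time `count` TCP connects to host:port and return the samples in ms."""
    samples = []
    for _ in range(count):
        start = time.perf_counter()
        # Open and immediately close a connection; only the handshake is timed.
        with socket.create_connection((host, port), timeout=timeout):
            pass
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples

def summarize(samples):
    """Reduce the samples to min / median / max so spikes stand out."""
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }

# Example call (hypothetical host name "tower", SMB port 445):
#   print(summarize(tcp_connect_latency_ms("tower", 445)))
```

A large gap between the median and the max, or a median that jumps while another machine is downloading, would fit the poor-switch pattern described above.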

 

I am not saying there is no issue between the betas, but physical NIC/switch/router/NIC driver combos can end up giving different performance.

 

As an example I am able to copy to the unRaid server and max out the gigabit connection to it. (Not via shares, to a cache drive or mounted drive).
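To put a number on "max out the gigabit connection", here is a rough sketch of the best-case TCP payload rate at a standard and a jumbo MTU. It assumes full-size frames and IPv4/TCP headers without options; real transfers will land somewhat below these figures.

```python
# Per-frame Ethernet overhead that is not counted in the MTU:
# 14-byte header + 4-byte FCS + 8-byte preamble + 12-byte inter-frame gap.
ETHERNET_OVERHEAD = 14 + 4 + 8 + 12  # 38 bytes
# IPv4 + TCP headers (no options), carried inside the MTU.
IP_TCP_HEADERS = 20 + 20  # 40 bytes

def payload_efficiency(mtu):
    """Fraction of wire time that carries TCP payload for full-size frames."""
    return (mtu - IP_TCP_HEADERS) / (mtu + ETHERNET_OVERHEAD)

def max_payload_mb_per_s(mtu, link_bits_per_s=1e9):
    """Upper bound on TCP payload throughput in MB/s (1 MB = 10**6 bytes)."""
    return link_bits_per_s / 8 / 1e6 * payload_efficiency(mtu)

for mtu in (1500, 9000):
    print(f"MTU {mtu}: {payload_efficiency(mtu):.1%} efficient, "
          f"about {max_payload_mb_per_s(mtu):.0f} MB/s at most")
```

So on gigabit, jumbo frames buy only a few MB/s on the wire (roughly 119 vs 124 MB/s); any bigger difference you see comes from the NICs, drivers and switch handling fewer frames.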

Link to comment
*limetech*

 

I think this issue is because of a patch I put into the file "drivers/ata/libata-scsi.c" to increase timeouts.  Normally this patch is put into all new kernel updates I do, except somehow it didn't make it into version 3.0.3, which -beta12 uses... releasing a -beta13 soon.

 

 

I have 3 AOC-SASLP-MV8 cards, running 5.0 B12

 

Will this resolve the BLK_EH_NOT_HANDLED problem? I am currently unable to use this software (running a Pro version).

It will hang emhttp / unmenu within 20 mins after rebuilding my parity drive. The web application isn't accessible anymore. The SMB share(s) stay alive. Stopping the rebuild of the array doesn't work. It keeps on trying to unmount the shares. Telnet or SSH reboots don't work.

Only a hard reset gets the system to reboot... but then the same scenario starts all over again. Add-ons are all disabled, except for unmenu.

 

Is Beta13 getting out soon or can I already patch this lib myself?

 

 

 

Link to comment

I'm a new user who discovered unraid just a week ago. I'm still reading since there is so much of it :)

Also going through my spare parts and seeing what I can use as an unraid server. Likewise I have to choose an unraid version to start with. Seeing the above regarding speed measurements, is it wise to start with b12 or should I do b11 instead?

Link to comment

Always use latest betas.

Link to comment

My early Labor Day start gave me time to do some more beta testing on unRAID.  Here are some parity check speeds for the various betas, kernels and drivers.  Tests were performed on the following system:

 

MSI 790FX-GD70

3 x BR10i Controllers (all in x8 slots)

20 x 2TB drives (mix of WD and Hitachi; all 5400rpm)

 

unRAID     Linux kernel   mptsas driver   Speed in MB/s
version    version        version         (after about 5 minutes)

5b8d       2.6.37.6       3.04.17         100+
5b9        2.6.37.6       3.04.17         100+
5b10       2.6.39.3       3.04.18         45+
5b11       2.6.37.6       3.04.17         100+
5b12       3.0.3          3.04.19         70+

 

A quote from the 5b11 release notes:

 

"Also, this release uses the last really stable version (for unRaid) of the linux kernel: 2.6.37.6.  Something changed in the kernel starting with 2.6.38/39 which causes a throttling of I/O to hard drives as soon as you get beyond 6 hard drives accessing in parallel.  Eventually I'll figure this out."

 

Looks like 3.0.3 is better than 2.6.39.3 but still noticeably below that of 2.6.37.6.  In reviewing this thread, those that reported sync speeds close to that of 2.6.37.6 have tested only 6-9 drives in their array.  Anyone else with large arrays seeing the same thing?
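To make the comparison concrete, here is a small sketch that expresses each result as a share of the fastest one. The figures are the ones from the table above; they were reported as lower bounds ("100+", "70+"), so treat the percentages as approximate.

```python
# (unRAID version, Linux kernel, mptsas driver, reported speed in MB/s)
RESULTS = [
    ("5b8d", "2.6.37.6", "3.04.17", 100),
    ("5b9",  "2.6.37.6", "3.04.17", 100),
    ("5b10", "2.6.39.3", "3.04.18", 45),
    ("5b11", "2.6.37.6", "3.04.17", 100),
    ("5b12", "3.0.3",    "3.04.19", 70),
]

def relative_speeds(results):
    """Map each unRAID version to its speed as a fraction of the fastest run."""
    best = max(speed for _, _, _, speed in results)
    return {version: speed / best for version, _, _, speed in results}

for version, share in relative_speeds(RESULTS).items():
    print(f"{version:5} {share:.0%} of the fastest parity check")
```

On these numbers 5b10 runs at under half and 5b12 at about 70% of what the 2.6.37.6-based betas manage.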

 

Regards,  Peter

 

You can experiment with the tunables on the "Settings/Disk settings" page, in particular md_sync_window, which, in essence, says how many 4K i/o's to queue down into the disk drivers when doing parity sync/check.  If you have a lot of memory (say more than 1G), then you can also increase md_num_stripes (maybe double it) - this defines how many total active i/o's can exist, more-or-less (and directly affects how much memory is allocated to the unRaid driver).
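As a rough feel for what md_sync_window means in bytes, here is a tiny sketch that takes the description above at face value (each queued request is one 4K i/o). The window values are hypothetical examples, not defaults, and the post does not say whether the count applies per disk or in total.

```python
IO_SIZE_BYTES = 4 * 1024  # the post describes md_sync_window as a count of 4K i/o's

def queued_kib(md_sync_window):
    """Data queued to the disk drivers during a parity sync/check, in KiB."""
    return md_sync_window * IO_SIZE_BYTES // 1024

# Hypothetical settings to compare; check your own values on the Disk settings page.
for window in (256, 512, 1024):
    print(f"md_sync_window={window}: about {queued_kib(window)} KiB queued")
```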

Link to comment

I'll be posting -beta13 today.  Will it solve the BLK_EH_NOT_HANDLED issue?  Don't know - the only time I see that is when there is bad h/w, in particular, bad PSU.

Link to comment

Always use latest betas.

Thank you. I thought I'd ask because of the much slower reported speeds. I'll go with b12 then.

 

I recommend starting with 4.7, the production stable version, and, like you stated, a lot of reading. Once you see and understand the product and what, if any, extras you want to run, then move on to the betas if required. The learning curve is easier, my 2 cents. But if you want to go straight into the 5.0 betas then, as Tom stated, you should always be testing under the latest he has available.

Link to comment

Thanks. Though I've already taken the plunge with b12. My current server has a RAID5 set with a drive failed. I searched the internet and by accident came across unraid. After some reading I knew unraid was for me. I already had hardware laying around to setup a new server and after some reading for what I am going to eventually do with unraid I see that 5.0 beta's is the way to go. Already preclearing 4x disks through screen right now. Keeping my thumbs up. No linux experience here, just following the configuration tutorial :)

Link to comment

Can't wait to try out beta13  :P  what's your GMT?

Link to comment

I just got 2 new 3TB drives - one for parity, one for storage.  I had thought about waiting up last night to install beta 13 once it was released, and start a parity check.  Glad I didn't wait up :)

 

I think it's good that Tom is cautious.  With every extra hour he puts into it before releasing it, the software can only get better.  (I hope)

Link to comment

OK, so I have been researching this for hours. I felt comfortable enough to finally upgrade to 5.0 b12 from 4.7 after using unRAID for the past few months.  The first time I started unRAID with the newest files, everything went great; I followed the instructions for setting the permissions and set up some plugins (sab, couch, etc).  However, I decided to do a reboot, and since then I cannot get my array to come out of the "starting..." status.  It looks like everything is up, though, as I can access the shares.  I even tried downgrading to 5.0b11 to see if that would help. In between reboots, I have to use the command line to shut down the array before I reboot, since the webgui seems to think it is always starting.

Someone else seemed to have a similar problem; they set their array auto-start option to no and then started it manually, and it seemed to work for them sometimes.  I tried all of that and I can't for the life of me get it to work.  I even scoured the logs and I really can't see anything major (but then again I am still learning a lot).  Is this a bug or is something wrong with my system?

Thanks for any help you can give me.

Syslog.txt

Link to comment

I had this issue. In the end I removed the sab/sick/couch plugins and it seems to have gone away. I also had disk and/or PSU issues, so I can't be sure what the issue was, and I'm not keen to play around with a prod server. But try removing those scripts as a first step and see what happens.

Link to comment

You hit the nail on the head: removing the plugins worked for me as well.  I wonder if that's an issue with the plugins themselves or the plugin architecture.  Either way, thanks!!!

Link to comment

What I am about to say is less than scientific, and more just a quick observation.  With 4.7 on my 20 drive, 1 parity, 1 cache drive build using AOC-SASLP-MV8 cards I am used to getting around 75MB/sec; with 5b12 I am getting closer to 40MB/sec....

 

Yes, with 13 drive + parity (new 3TB) b12 started around 25 MB/s then hit 40ish MB/s at the halfway mark.  Wasn't able to check it before it finished...

Link to comment

Just a quick update on the NFS issues.  I had access to my brother's macbook and setup NFS on that and it also gave the same error message on the same files.

 

I've changed the frontends to use SMB rather than NFS and I don't have any issues anymore.  The SMB share is a little slower (1-2MB/s) in the read but it's still above the minimum bandwidth required for streaming so it works for me.

 

Will try 12a sometime tomorrow.

Link to comment

My parity drive has a red ball next to it. I think the problem started here:

 

Sep  4 16:00:40 Tower kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x90202 action 0xe frozen

Sep  4 16:00:40 Tower kernel: ata1: irq_stat 0x00400000, PHY RDY changed

Sep  4 16:00:40 Tower kernel: ata1: SError: { RecovComm Persist PHYRdyChg 10B8B }

Sep  4 16:00:40 Tower kernel: ata1: hard resetting link

Sep  4 16:00:50 Tower kernel: ata1: softreset failed (1st FIS failed)

Sep  4 16:00:50 Tower kernel: ata1: hard resetting link

Sep  4 16:01:00 Tower kernel: ata1: softreset failed (1st FIS failed)

Sep  4 16:01:00 Tower kernel: ata1: hard resetting link
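For anyone digging through a longer syslog for the same pattern, here is a minimal sketch that counts these events per ATA port. The sample text is the excerpt above; the regular expression and event names are my own choices and only cover the three message types seen here.

```python
import re
from collections import Counter

SAMPLE = """\
Sep  4 16:00:40 Tower kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x90202 action 0xe frozen
Sep  4 16:00:40 Tower kernel: ata1: irq_stat 0x00400000, PHY RDY changed
Sep  4 16:00:40 Tower kernel: ata1: SError: { RecovComm Persist PHYRdyChg 10B8B }
Sep  4 16:00:40 Tower kernel: ata1: hard resetting link
Sep  4 16:00:50 Tower kernel: ata1: softreset failed (1st FIS failed)
Sep  4 16:00:50 Tower kernel: ata1: hard resetting link
Sep  4 16:01:00 Tower kernel: ata1: softreset failed (1st FIS failed)
Sep  4 16:01:00 Tower kernel: ata1: hard resetting link
"""

# Matches "kernel: ata1: <message>" and also "kernel: ata1.00: <message>".
LINE_RE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: (.*)$")

def summarize_ata_events(text):
    """Count exceptions, hard resets and failed soft resets per ATA port."""
    counts = Counter()
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if not match:
            continue
        port, message = match.groups()
        if message.startswith("exception"):
            counts[(port, "exception")] += 1
        elif message.startswith("hard resetting link"):
            counts[(port, "hard_reset")] += 1
        elif message.startswith("softreset failed"):
            counts[(port, "softreset_failed")] += 1
    return counts

for (port, event), n in sorted(summarize_ata_events(SAMPLE).items()):
    print(f"{port} {event}: {n}")
```

Feeding it the full syslog instead of SAMPLE shows whether the resets stay on one port (pointing at that drive, cable or slot) or are spread across several.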

syslog1.zip

Link to comment

I'm transferring large files via NFS between unRAID and an OS X Snow Leopard machine and during this transfer at roughly every half hour OS X will throw up a Server Disconnected message to unRAID, that if I simply ignore, will eventually resolve itself (reconnect).

 

However, this interrupts the file transfer.

 

The unRAID syslog shows periodic Monitor Host (SM_MON) errors that coincide with OS X Server Disconnected errors.

 

This is my first use of NFS on a wide scale, which also happens to be my first use of unRAID 5 (b12), so I'm not sure if this is an existing unRAID NFS issue, a v5 beta issue or an OS X issue.  I will post this issue under b12.

 

There are no problems with SMB transfers/connections.

 

My workaround is to ensure file transfers take less than a half hour to complete.  Since I'm dealing with a video Media Server with BluRay images, these are typically 20-50GB in size, meaning I must only transfer one video file at a time.
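As a quick sanity check on that workaround, this sketch works out the sustained rate needed to finish a file inside the half-hour window. It assumes 1 GB = 10**9 bytes and a steady transfer rate; the function name is just for illustration.

```python
def required_mb_per_s(size_gb, window_minutes=30.0):
    """Sustained MB/s needed to move size_gb within window_minutes."""
    size_mb = size_gb * 1000.0  # 1 GB = 1000 MB (decimal units)
    return size_mb / (window_minutes * 60.0)

for size_gb in (20, 50):
    print(f"{size_gb} GB in 30 min needs about {required_mb_per_s(size_gb):.1f} MB/s")
```

So a 20GB image needs roughly 11 MB/s and a 50GB image roughly 28 MB/s to beat the disconnect, which is why only one file at a time fits.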

 

I keep getting a forum error that it is unable to accept the attachment due to size or time (I've excerpted it, I've shortened the file name and I tried zipping it; all to no avail) so I will post the pertinent snippet:

 

Sep  7 08:32:14 UnRAID rpc.statd[1230]: No canonical hostname found for 10.0.1.200
Sep  7 08:32:14 UnRAID rpc.statd[1230]: STAT_FAIL to UnRAID for SM_MON of 10.0.1.200
Sep  7 08:32:14 UnRAID kernel: lockd: cannot monitor The Matrix

 

This error appears to occur mostly every minute while an unRAID NFS share is mounted on the OS X machine.  The OS X machine is at 10.0.1.200, network name of "The Matrix".

Link to comment
  • 3 weeks later...

I have a different device that I mount via NFS to act as a "control" test, and there is definitely an NFS connectivity bug with unRAID b12, as I have zero issues with the other NFS server.

Link to comment

I've performed this procedure twice so far under b12 and no problems, however, last night I upgraded my 3rd data disk to 3TB and set about unRAID building data onto the new drive.  17 hours later I attempt to check on the progress and the web server is unresponsive.  I've had this happen once before so I try unMENU access, but that, too, is unresponsive (where previously, I was able to access it even when unRAID web server is unresponsive).

 

I can telnet into it, so the machine is still running and isn't locked up.  I can still mount all defined disks and shares via SMB and NFS.

 

I checked on the status this morning, roughly 10 hours ago, and it was at 40%, ~40MB/s and going, with roughly 700 minutes estimated until completion, so if the estimate is accurate, it should be completed by now with the data rebuild.
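That estimate can be cross-checked with a little arithmetic. The sketch below assumes the rebuild covers the full 3TB of the new drive (3 * 10**12 bytes) and that the rate holds at 40MB/s, which is optimistic since drives usually slow down toward the end.

```python
def remaining_minutes(total_bytes, fraction_done, mb_per_s):
    """Minutes left at a constant rate (1 MB = 10**6 bytes)."""
    remaining_bytes = total_bytes * (1.0 - fraction_done)
    return remaining_bytes / (mb_per_s * 1e6) / 60.0

eta = remaining_minutes(3e12, 0.40, 40.0)
print(f"about {eta:.0f} minutes remaining")
```

That comes out at about 750 minutes (12.5 hours), in the same ballpark as the ~700 minutes reported, so after roughly 10 hours the rebuild may well still have had a couple of hours to go.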

 

QUESTION:  Not being sure, should I perform a reboot command from the command line?


EDIT:  I noticed my telnet connections are not staying active after several minutes with something about a "foreign host" error message.  And now I can't even telnet to the unRAID box, though SMB/NFS access is still available.

 

So I use IPMI to see what the "screen" is showing and I am presented with non-stop REISERFS errors scrolling on the screen.  The recurring snippet:

 

REISERFS error.  (device md3): vs-5150 search_by_key: invalid format found in block 281116676. Fsck?
REISERFS error.  (device md3): vs-5150 search_by_key: invalid format found in block 281116678. Fsck?
REISERFS error.  (device md3): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [15 484 0x0 SD]

 

The same errors, the same problematic blocks.

 

Now what?  Forced restart since I can't telnet anymore?

Link to comment

Without any responses and determining there was no other "soft" means of restarting, I did a hardware reset of the server, and it came up with the array offline.

 

I launched unMENU and performed a SMART test on sdg (which, it appears to be the "md3" disk; <rant on>Why can't unRAID stick to ONE disk identification system, hmm? <rant off>) and then the parity drive sda and they both passed.

 

I tried a File System Check via unMENU but it reports back:

 

Sorry, no file system detected on /dev/md3

 

for both the parity drive (sda) and the new drive (sdg aka md3 aka disk3).

 

So I'm starting a parity check with the parity correction disabled and see what that gets me.

Link to comment

Parity does not have a file system type.

 

As for the md3 vs. sdg thing: the Linux kernel controls the sdX assignment, and those can change from boot to boot; it all depends on the order in which pieces and parts are discovered.

 

unRAID then has to find the drives it needs and mount/assign them correctly.  The mdX devices are unRAID's own array devices, numbered by slot, so md3 is disk3 whichever sdX name the kernel hands out.
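If you want to see which physical drive is behind an sdX name on a given boot, here is a minimal sketch that reads the persistent names udev normally creates under /dev/disk/by-id. That this directory exists on your particular unRAID build is an assumption, and the function name is mine.

```python
import os

def kernel_names_by_id(by_id_dir="/dev/disk/by-id"):
    """Map each persistent id (model and serial) to its current kernel name."""
    mapping = {}
    for entry in sorted(os.listdir(by_id_dir)):
        path = os.path.join(by_id_dir, entry)
        if os.path.islink(path):
            # Each entry is a symlink such as ../../sdg; keep only the last part.
            mapping[entry] = os.path.basename(os.path.realpath(path))
    return mapping

if os.path.isdir("/dev/disk/by-id"):
    for disk_id, kernel_name in kernel_names_by_id().items():
        print(f"{kernel_name:8} {disk_id}")
```

The id on the left of each symlink stays with the drive, so it is the safer thing to note down before running SMART tests or file system checks against an sdX name.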

Link to comment
