unRAID Server Release 5.0-beta13 Available


limetech


It is looking like b13 didn't fix the SUPERMICRO AOC-SASLP-MV8 problems.

My system has crashed twice today while doing Parity check. Just like b12a did. Ran great for a while not a single issue, then Parity check, and bam nothing but problems.

 

Echo this.

 

I went from 5.0beta9 to 5.0beta13 and noticed a horribly slow parity check. On beta9 I get about 50Mb/s during a parity check; on beta13 I was seeing about 20Mb/s. I just downgraded back to beta9 and everything seems to be back to normal. 16 drives are on a SuperMicro AOC-SAS-LP-MV8.

Link to comment

May have found a bug,

 

I set up a share as "Use cache disk: Only" and added data to that cache-only share. A day later I went back and changed it to "Use cache disk: Yes", added 6 disks to "Included disk(s):", and set the allocation method to "Most-free" and "Split level: 1". I clicked "Apply", then "Done". I then ran the mover with "Move Now"; no data moved, and tailing the log shows it's skipping that share.

 

I repeated this with another share and got the same behavior. So it seems you cannot switch a share from cache-only to an array share.

 

I checked the cfg files in flash\config\shares; the cfg looks no different than that of a share that was originally set up as an array share. I rebooted unRAID, and it still will not move.
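For anyone who wants to compare two share cfg files key by key rather than by eye, here is a minimal Python sketch. It assumes the files consist of plain key="value" lines; the key names in the sample text are invented for illustration and are not the real unRAID keys.

```python
def parse_cfg(text):
    """Parse simple key="value" lines into a dict, skipping blanks and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings


def diff_cfg(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key whose value differs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Hypothetical contents: these key names are made up for the example.
converted = parse_cfg('useCache="yes"\ninclude="disk1,disk2"\nsplitLevel="1"\n')
original = parse_cfg('useCache="yes"\ninclude="disk1,disk2"\nsplitLevel="1"\n')
print(diff_cfg(converted, original))  # prints {} when nothing differs
```

An empty result would support the observation above that the setting files match, which points at something other than the saved cfg (for example, state kept elsewhere) as the reason the mover skips the share.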

The problem now is that I can't delete the share because it is not empty, so I can't even recreate it as a new array share to get around this bug.

 

Any suggestions?

You could try manually moving the data from the cache to an array drive and deleting the folder on the cache drive once it is empty. Then stop and start the array (or reboot), try using the share again, and see what happens.

 

Yeah, what I came up with was this. Let's say the share was "movies". I went back to the share settings, renamed it to "movies99", and saved the change; that renamed the folder on the cache drive which mover keeps skipping. I then created a new share "movies", since that name was now free, and saved. I copied one file to the new movies share, and it created the folder on the cache drive, as it should, with that one file in it. I then cut and pasted all the files from movies99 into movies (gigs' worth); it took 2 seconds. I kicked off mover and it started moving all the files in the movies share off the cache drive. Once finished, it removed the folder from the cache drive.

 

There seems to be a bug when a share starts out as cache-only and is later changed to an array share.

 

Question: why does the mover remove the folder at the end of a move? Could that be stopped, so the empty directory is left on the cache drive?

 

I was also thinking what a great thing it would be to have an option like "reallocate" with mover. Say you had two 2TB drives with a movie share set to most-free. Each drive gets half full, and you purchase two more 2TB drives and include them in the share. The normal behavior is that new copies go to the two new drives until they are also half full, and only then does mover start spreading files across all four, right? What if we had a button to optionally reallocate the share after adding the two drives, so that it moved data off the original two drives onto the new ones until all four held an equal amount of data (and free space)?
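To put numbers on the "reallocate" idea, here is a small Python sketch of the arithmetic. It is not an existing mover feature; the function name and the choice to equalize the fill fraction of each drive are assumptions made for illustration.

```python
def rebalance_plan(used_tb, capacity_tb):
    """Return per-drive TB to move so every drive ends at the same fill fraction.

    Positive values are data to move off a drive, negative values are data
    the drive should receive.
    """
    target_fraction = sum(used_tb) / sum(capacity_tb)
    return [round(u - target_fraction * c, 3) for u, c in zip(used_tb, capacity_tb)]


# Two half-full 2TB drives plus two new, empty 2TB drives.
plan = rebalance_plan([1.0, 1.0, 0.0, 0.0], [2.0, 2.0, 2.0, 2.0])
print(plan)  # [0.5, 0.5, -0.5, -0.5]
```

In the example from the post, each original drive would give up 0.5TB and each new drive would receive 0.5TB, leaving all four drives a quarter full. With drives of equal size, an equal fill fraction also means equal free space.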

Link to comment

It is looking like b13 didn't fix the SUPERMICRO AOC-SASLP-MV8 problems.

My system has crashed twice today while doing Parity check. Just like b12a did. Ran great for a while not a single issue, then Parity check, and bam nothing but problems.

 

How many SASLP-MV8 controllers are you running?  My hardware is very similar to yours, and I'm not seeing anything yet (knocking on wood).  What symptoms are showing up during your parity check?

 

Supermicro X9SCM-F

Intel i3-2100  and 4 gig RAM

Supermicro AOC-SASLP-MV8 8-Port (only 1)

Norco 4224

mix of 3tb and 2tb drives

Link to comment

I just added an AOC-SASLP-MV8

I am using 2 reverse breakout cables to hook up drives.

 

 

You would need forward breakout cables to connect a SASLP-MV8 to a hard drive's SATA port.

 

I may have the terminology wrong, but the cables are made to go from the controller to the drives where the fanout is supposed to be on the drive side.

Both forward and reverse breakout cables have SATA connectors on one end and a SAS connector on the other, but they are not interchangeable because each is wired for one direction only. Make sure you have the correct cable.

 

As I said, I believe I have the right cable (SAS on the controller side, drives on the other).  Is the cable used supposed to be the cause of the errors?

Link to comment

I am running 2 cards!

Parity ran really well until somewhere above 16.4% (the last time I checked), running at 104.8MB/s. I was watching a TV show from the server when the show froze, so I went to the webGUI and it did not respond. This happened twice. I then tried a third time without doing anything else, not even opening the webGUI. I just checked after seeing this post, and it has crashed again. Time to go back to b11, where I have not had any issues yet.

 


Link to comment
Parity ran really well until somewhere above 16.4% (the last time I checked), running at 104.8MB/s. I was watching a TV show from the server when the show froze, so I went to the webGUI and it did not respond. This happened twice. I then tried a third time without doing anything else, not even opening the webGUI. I just checked after seeing this post, and it has crashed again. Time to go back to b11, where I have not had any issues yet.

 

With b12, running a parity rebuild (and, I believe, a parity check as well) always led to errors if I was accessing the server at the same time, such as watching videos (my unRAID is a media server, as is yours). So it seems b13 still has issues when the server is accessed continuously (e.g. watching videos off it) during parity operations.

Link to comment

Bad news, after fixing my CPU scaling thing... I found this:

 

Nov  5 12:48:54 Tower kernel: r8169 0000:02:00.0: eth0: link up (Network)

 

:(

 

I'm guessing that Realtek bug is back, how do I go back to r8168 on b13?

 

Edit: Syslog attached.

 

Well, under very high network load those lines pop up more and more frequently, when before they didn't. However it's certainly not as bad as it was with previous betas!

 

As Limetech has said, beta13 has an entirely new kernel and new NIC drivers. Continue using what it defaults to and report issues if you see any. Just because it's using r8169 does NOT mean it's an issue.

 

linux - I have been monitoring and testing the 3.1 development.  There are numerous driver changes.  In particular, mvsas seems far more robust.  Also r8169 has many changes, so in this release I have gone back to the kernel driver for Realtek devices. If you are using Realtek NIC's please report whether it still works or not with this release.

http://www.kernel.org/pub/linux/kernel/v3.x/ChangeLog-3.1

Link to comment

I've been up and running with -13 for over a week. It seems to have crashed this morning. I can't get into any web interfaces, and I can't telnet/ssh in. I will try to see if I can get onto the machine directly.

 

//Edit: System was locked up. I had to reboot. Parity check is running now.

Link to comment

I just added an AOC-SASLP-MV8 to my system in order to consolidate several 2-port SATA controllers. I was able to boot and start the array, but the syslog was being FILLED by lines like the ones below.  I'm talking on the order of 1 gig of syslog in the space of 10 minutes.  6 of my drives are on the MB controller and the other 5 were put on the AOC-SASLP-MV8; its BIOS version is 3.1.0.22.   Initially I had a drive red-balled because I didn't have one of the cables seated properly, but the syslog messages were always present.  I am using 2 reverse breakout cables to hook up drives.

 

[pre]Nov 5 02:04:39 Tower kernel: drivers/scsi/mvsas/mv_sas.c 2108:phy 3 ctrl sts=0x00199800.

Nov 5 02:04:39 Tower kernel: drivers/scsi/mvsas/mv_sas.c 2110:phy 3 irq sts = 0x01000000

Nov 5 02:04:39 Tower kernel: drivers/scsi/mvsas/mv_sas.c 2108:phy 3 ctrl sts=0x00199800.

Nov 5 02:04:39 Tower kernel: drivers/scsi/mvsas/mv_sas.c 2110:phy 3 irq sts = 0x01000000

Nov 5 02:04:39 Tower kernel: drivers/scsi/mvsas/mv_sas.c 2108:phy 3 ctrl sts=0x00199800.

Nov 5 02:04:39 Tower kernel: drivers/scsi/mvsas/mv_sas.c 2110:phy 3 irq sts = 0x01000000[/pre]
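To see which phy is producing a flood like this, and how fast, one option is to tally the messages per phy. The following is a rough Python sketch, not an unRAID tool; the regular expression is written only against the sample lines above, and the function name is made up.

```python
import re
from collections import Counter

# Matches lines such as:
# "... kernel: drivers/scsi/mvsas/mv_sas.c 2108:phy 3 ctrl sts=0x00199800."
MVSAS_LINE = re.compile(r"mvsas/mv_sas\.c (\d+):phy (\d+) (ctrl sts|irq sts)")


def count_phy_messages(lines):
    """Count mvsas status messages per (phy number, message kind)."""
    counts = Counter()
    for line in lines:
        match = MVSAS_LINE.search(line)
        if match:
            _source_line, phy, kind = match.groups()
            counts[(int(phy), kind)] += 1
    return counts


sample = [
    "Nov 5 02:04:39 Tower kernel: drivers/scsi/mvsas/mv_sas.c 2108:phy 3 ctrl sts=0x00199800.",
    "Nov 5 02:04:39 Tower kernel: drivers/scsi/mvsas/mv_sas.c 2110:phy 3 irq sts = 0x01000000",
    "Nov 5 02:04:40 Tower kernel: some unrelated message",
]
for (phy, kind), n in sorted(count_phy_messages(sample).items()):
    print(f"phy {phy} {kind}: {n}")
```

The function accepts any iterable of lines, so an open syslog file can be passed in directly. If every message names the same phy, as in the excerpt above where it is always phy 3, that narrows the search to one port and whatever cable and drive are attached to it.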

 

I'm sorry to reply to myself, but the only suggestion I got for this was to check my cables (which I've done).  It's fine if that is the only likely cause for this problem, I'd just like confirmation.

 

Thanks.

Link to comment

I just wanted to provide feedback to Limetech about how well Beta 13 is working for me.

 

Issue #1: Drive spindown: Of the seven, only two drives ever spin down: #3 and #5 (which are part of separate shares).  None of the rest spin down on their own.  I can spin down all the drives manually via the web GUI, but the rest will spin back up on their own.  Spindown times are set to default of 45; no spinup groups.

 

Issue #2: Access to GUI: Occasionally I cannot get to the web GUI from my browser.  The browser says it cannot contact the address.  However, the server and its shares show up in Finder (I use Macs), and I have complete access to them.  I can also access unMenu while this is happening.  I just cannot access the unRAID web GUI.  To restore access, I have to reboot the server.  I have attached a syslog captured while experiencing this problem.

 

UPDATE:  I stopped the array via unMenu, then attempted to log in to the remote management for the main board to force it to power cycle.  My browser locked up and I had to force quit.  After several minutes, I was able to restart my browser and suddenly able to access the unRAID web GUI again.  Very strange.  Restarted the array from the web GUI, completely responsive.  The server never power cycled.  Perhaps there is a problem with the web GUI html?...

 

Broadly, my experience with Beta 13 has been positive.  I have had no problems with AFP and the server is very responsive.  My parity checks are very fast.

 

Hardware and Setup:

NORCO 4220 case

CORSAIR Enthusiast Series TX650 PSU

Supermicro MBD-X8SIL-F-O mainboard

Intel Core I3 processor

4GB RAM

2 x AOC-SASLP-MV8 controllers

2 x WD 2TB EARX (one of which is the parity drive)

4 x WD 2TB EARS

1 x WD 1.5TB EADS

 

Parity and three data drives are wired directly to the main board SATA ports; the other three data drives are on a single AOC-SASLP-MV8 controller.

 

All shares are exported via AFP only.  All discs are exported via SMB only.  I have never accessed the server via SMB other than the Flash drive.

 

I run security on all my shares (Secure mode).

 

I run the following "packages":  

unMenu

apcupsd

bwm-ng

"C" compiler & development tools

mail and ssmtp

unRAID Status Alert

unRAID Power-Down

Clean Powerdown

screen

 

I think 5.0 is a great release.  Between it and Plex, they are making some pretty incredible things possible.  Both have their issues, but both are being actively developed - please keep up the great work!

 

Phil C.

syslog.txt

Link to comment

Seems there may be an issue with going to sleep (which worked well on 12b):  Snip from my log:

 

Nov  6 22:01:26 Tower kernel: PM: Syncing filesystems ... done.
Nov  6 22:01:26 Tower kernel: Freezing user space processes ... 
Nov  6 22:01:26 Tower kernel: Freezing of tasks failed after 20.01 seconds (1 tasks refusing to freeze, wq_busy=0):
Nov  6 22:01:26 Tower kernel: find            D c14a6fc0     0 12024  11992 0x00800004
Nov  6 22:01:26 Tower kernel:  c3fabed8 00000086 c14a6fc0 c14a6fc0 c14a2000 f0644b9c c14a6fc0 f04c0cec
Nov  6 22:01:26 Tower kernel:  00000003 f0644a20 00000003 c3fabe88 c102585c c3fabeac c101f3e6 00000000
Nov  6 22:01:26 Tower kernel:  e8430034 f0644a20 c3fabeb4 c10328c1 00000292 00000292 f0644a20 c3fabecc
Nov  6 22:01:26 Tower kernel: Call Trace:
Nov  6 22:01:26 Tower kernel:  [] ? default_wake_function+0xb/0xd
Nov  6 22:01:26 Tower kernel:  [] ? __wake_up_common+0x34/0x5c
Nov  6 22:01:26 Tower kernel:  [] ? __set_task_blocked+0x66/0x6c
Nov  6 22:01:26 Tower kernel:  [] ? set_current_blocked+0x27/0x38
Nov  6 22:01:26 Tower kernel:  [] schedule+0x48/0x4a
Nov  6 22:01:26 Tower kernel:  [] request_wait_answer+0x11d/0x1b5
Nov  6 22:01:26 Tower kernel:  [] ? wake_up_bit+0x5b/0x5b
Nov  6 22:01:26 Tower kernel:  [] fuse_request_send+0x96/0x9c
Nov  6 22:01:26 Tower kernel:  [] fuse_readdir+0xb9/0x170
Nov  6 22:01:26 Tower kernel:  [] ? generic_block_fiemap+0x43/0x43
Nov  6 22:01:26 Tower kernel:  [] vfs_readdir+0x53/0x7e
Nov  6 22:01:26 Tower kernel:  [] ? generic_block_fiemap+0x43/0x43
Nov  6 22:01:26 Tower kernel:  [] sys_getdents64+0x63/0xa5
Nov  6 22:01:26 Tower kernel:  [] syscall_call+0x7/0xb
Nov  6 22:01:26 Tower kernel: 
Nov  6 22:01:26 Tower kernel: Restarting tasks ... done.
Nov  6 23:01:14 Tower kernel: mdcmd (79): spindown 0
Nov  6 23:01:15 Tower kernel: mdcmd (80): spindown 1
Nov  6 23:01:15 Tower kernel: mdcmd (81): spindown 2
Nov  6 23:01:36 Tower emhttp: shcmd (114): /usr/sbin/hdparm -y /dev/sde &> /dev/null
Nov  6 23:17:27 Tower kernel: PM: Syncing filesystems ... done.
Nov  6 23:17:27 Tower kernel: Freezing user space processes ... 
Nov  6 23:17:27 Tower kernel: Freezing of tasks failed after 20.01 seconds (1 tasks refusing to freeze, wq_busy=0):
Nov  6 23:17:27 Tower kernel: find            D c14a6fc0     0 20568  20540 0x00800004
Nov  6 23:17:27 Tower kernel:  e8673d70 00000086 c14a6fc0 c14a6fc0 c14a2000 efdcc4dc c14a6fc0 f05f104c
Nov  6 23:17:27 Tower kernel:  00000002 efdcc360 00000003 e8673d20 c102585c e8673d44 c101f3e6 00000000
Nov  6 23:17:27 Tower kernel:  e8430034 efdcc360 e8673d4c c10328c1 00000282 00000282 efdcc360 e8673d64
Nov  6 23:17:27 Tower kernel: Call Trace:
Nov  6 23:17:27 Tower kernel:  [] ? default_wake_function+0xb/0xd
Nov  6 23:17:27 Tower kernel:  [] ? __wake_up_common+0x34/0x5c
Nov  6 23:17:27 Tower kernel:  [] ? __set_task_blocked+0x66/0x6c
Nov  6 23:17:27 Tower kernel:  [] ? set_current_blocked+0x27/0x38
Nov  6 23:17:27 Tower kernel:  [] schedule+0x48/0x4a
Nov  6 23:17:27 Tower kernel:  [] request_wait_answer+0x11d/0x1b5
Nov  6 23:17:27 Tower kernel:  [] ? wake_up_bit+0x5b/0x5b
Nov  6 23:17:27 Tower kernel:  [] fuse_request_send+0x96/0x9c
Nov  6 23:17:27 Tower kernel:  [] fuse_do_open+0xd8/0x12f
Nov  6 23:17:27 Tower kernel:  [] ? ns_capable+0x35/0x4d
Nov  6 23:17:27 Tower kernel:  [] fuse_open_common+0x4e/0x69
Nov  6 23:17:27 Tower kernel:  [] ? fuse_dir_release+0x13/0x13
Nov  6 23:17:27 Tower kernel:  [] fuse_dir_open+0xd/0xf
Nov  6 23:17:27 Tower kernel:  [] __dentry_open+0x134/0x208
Nov  6 23:17:27 Tower kernel:  [] ? fuse_permission+0xa3/0x1f2
Nov  6 23:17:27 Tower kernel:  [] ? do_lookup+0x7f/0x287
Nov  6 23:17:27 Tower kernel:  [] nameidata_to_filp+0x45/0x53
Nov  6 23:17:27 Tower kernel:  [] do_last+0x4e4/0x5d5
Nov  6 23:17:27 Tower kernel:  [] path_openat+0x9d/0x2a6
Nov  6 23:17:27 Tower kernel:  [] do_filp_open+0x21/0x60
Nov  6 23:17:27 Tower kernel:  [] ? getname_flags+0x1e/0xa7
Nov  6 23:17:27 Tower kernel:  [] do_sys_open+0xf6/0x174
Nov  6 23:17:27 Tower kernel:  [] sys_open+0x1e/0x26
Nov  6 23:17:27 Tower kernel:  [] syscall_call+0x7/0xb
Nov  6 23:17:27 Tower kernel: 
Nov  6 23:17:27 Tower kernel: Restarting tasks ... done.
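In both traces above the task that refuses to freeze is find, in state D, waiting inside a FUSE request (fuse_readdir in the first, fuse_do_open in the second). To pull that summary out of a longer log, here is a minimal Python sketch. It assumes the task line immediately follows the "Freezing of tasks failed" line and that the two numbers before the flags are the pid and parent pid; both patterns are written only against the excerpt above.

```python
import re

FAILED = re.compile(r"Freezing of tasks failed after ([\d.]+) seconds")
# name, state, address, free stack, pid, parent pid, flags
TASK = re.compile(
    r"kernel: (\S+)\s+([A-Z])\s+[0-9a-f]+\s+\d+\s+(\d+)\s+(\d+)\s+0x[0-9a-f]+"
)


def blocked_tasks(lines):
    """Return (name, state, pid, ppid) for the task listed after each freeze failure."""
    results = []
    expecting_task = False
    for line in lines:
        if FAILED.search(line):
            expecting_task = True
            continue
        if expecting_task:
            match = TASK.search(line)
            if match:
                name, state, pid, ppid = match.groups()
                results.append((name, state, int(pid), int(ppid)))
            expecting_task = False
    return results


sample = [
    "Nov  6 22:01:26 Tower kernel: Freezing of tasks failed after 20.01 seconds (1 tasks refusing to freeze, wq_busy=0):",
    "Nov  6 22:01:26 Tower kernel: find            D c14a6fc0     0 12024  11992 0x00800004",
    "Nov  6 22:01:26 Tower kernel: Call Trace:",
]
print(blocked_tasks(sample))  # [('find', 'D', 12024, 11992)]
```

This sketch only captures the first task after each failure line, which matches the "1 tasks refusing to freeze" case shown here; a log with several refusing tasks would need the loop extended.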

 

Thought I'd pass along.  Full log pasted in also. 

log.txt

Link to comment

I'd like to report an experience I'm having with the current beta software (both b12a and b13) which appears to indicate an issue with NFS.

 

This relates to a test server which is based on the Supermicro X8-SIL motherboard, with 12 data drives, cache and parity, all Hitachi 3TB drives. The parity, cache and four data drives are connected to SATA ports on the motherboard, with the other eight drives connected to a Supermicro AOC-SASLP-MV8 SATA controller.

 

When user shares are accessed from a Duneplayer via NFS, the Duneplayer does not display all the icons for the titles within the selected folder. For example, the following screen shot shows a sample page which should include 18 icons displayed in a 6x3 matrix. As can be seen, there are multiple 'holes' where icons should appear but do not. Additionally, it is not possible to invoke the associated video folders by clicking on the area where the missing icons should be. Effectively, the affected video files are inaccessible via the Duneplayer. This behavior is replicable across multiple Duneplayers connected to the same UnRaid server running 5b12a or 5b13.

 

 

screenshot1vx.th.jpg

 

 

Switching now to SMB (at the Duneplayer end), everything appears normal; all icons are displayed properly and all video files are accessible - see following screenshot with the 'holes' filled in.

 

 

screenshot2qi.th.jpg

 

 

I'd also note that this issue does not occur with my main server, which has similar hardware except that its drives are all 2TB. That server is running version 4.7, and all shares are accessed via NFS from the same Duneplayers without any problem.

 

Link to comment

I'd like to report an experience I'm having with the current beta software (both b12a and b13) which appears to indicate an issue with NFS.

 

If you read back through this thread (and the beta 12/12a thread), you will see that several people are experiencing problems with NFS.  Data transfers will start, but then hang after a short while.  I'm guessing that this has to do with the latest kernel.  I've reverted to beta11, hoping that this issue will soon be fixed in a new release.

Link to comment

Yep, I reported NFS issues both under b12 thread and the general Bug forum.  Primary client machine is OS X, but also have a PCH (Linux based) Media Player.

 

This is interesting! I have issues with a Tvix SLIM S1 (a Linux-based player) using NFS: subtitles sometimes don't work, playback stutters, and not all icons show, like the issue PFT has. Switching to Samba is much better. I also have a high-bitrate test file that did not play properly in B12; with B13 and SMB2 it plays smoothly.

 

I never thought the issue was on the unRAID side, but now that more people have it, it looks like we need Tom to look into this. Maybe there is a bug related to NFS.

 

Is there a new version of NFS in B13? I think I saw something about version 4 in my syslog.

 

//Peter

Link to comment

Linux Kernel 3.1.1 is out and contains a lot of fixes. One in particular is of vital importance for LSI hardware users. I don't know if this is causing the massive headaches in Beta 13 or not, but it can't make things any worse than they are now (completely unusable).

 

http://www.kernel.org/pub/linux/kernel/v3.0/ChangeLog-3.1.1

commit 218782d30177214893625e8d6523191caa5e023b

Author: [email protected] <[email protected]>

Date:  Fri Oct 21 10:06:33 2011 +0530

 

    mpt2sas: Fix for system hang when discovery in progress

   

    commit 0167ac67ff6f35bf2364f7672c8012b0cd40277f upstream.

   

    Fix for issue : While discovery is in progress, hot unplug and hot plug of

    enclosure connected to the controller card is causing system to hang.

   

    When a device is in the process of being detected at driver load time then

    if it is removed, the device that is no longer present will not be added

    to the list. So the code in _scsih_probe_sas() is rearranged as such so

    the devices that failed to be detected are not added to the list.

Link to comment


 

You're right, BRiT, this fix appears to handle those terrible mptsas detection errors. I'm hoping that Tom will update v5 soon.

Link to comment
