unRAID Server Release 5.0-beta14 Available


limetech


From the reports so far, these seem to be the major issues remaining in 5.0:

 

- Disk spin up/down issues.  For those experiencing this issue, I'd like to ask that you run completely "stock", i.e., no add-ons, no plugins, etc.  If you still have issues with this, again please report and include the system log.  One of the logs I looked at seems to show a spin-up issue when trying to access a spun-down drive via AFP (first guess is the afpd timeout is too short - I will look into this).

 

- NFS issues.  Well NFS/fuse has historically been "problematic".  I don't have a Tvix so specific troubleshooting may be difficult. [Probably I should get one of these: a) how does it compare to a Popcorn Hour, and b) which one should I get?].  Peter, please send me an email and we can try to troubleshoot that way instead of via forum: [email protected]

 

Link to comment

I don't have a Tvix so specific troubleshooting may be difficult.

 

I'm not sure that the device is critical ... I experienced problems in beta12 simply when writing a large file from an ubuntu desktop.  The file transfer would hang after about 800kB, requiring the unRAID server to be rebooted before NFS would work again.  I don't believe that there was any pertinent information in the syslog, but if it would be helpful, I could reproduce the problem and document it.

 

 

Edit:

Ooops, my memory is bad (my personal memory, not the computer's!)!

 

I first reported the problem with beta12a (although beta12 fails too), and it was when copying a set of files FROM unRAID, not to unRAID, reported here.

 

In fact, the problem first became apparent when a program was accessing the files - what I was trying to do was to convert a number of flac files to mp3.  Soundconv was hanging part way through the first file, which is why I then tried to copy the files to local storage on the ubuntu desktop machine, only to find that the copy failed in the same place.

 

BTW, I do have a couple of Popcorns, so if it would assist, I can try testing with those.

 

Oh, and I can confirm that this was a 'disk share', since I don't have a bookmark set on my ubuntu machine for the user share where the files reside.  I know that I went in as disk2, and navigated the directory tree.

Link to comment
it's NFS no longer going to be supported as previous kernel. FYI B11 is NFS flawless, but I have some other issue with kernel panic on that build.

 

Like Peter, I find it difficult to believe NFS support would be discontinued for Linux, as it is the de facto standard for file sharing on *nix, so it stands to reason that the NFS issues we are experiencing have to involve unRAID itself.

 

Now, whether unRAID 5 will move forward without NFS support is another question, which I hope does not occur.

Link to comment

it's NFS no longer going to be supported as previous kernel. FYI B11 is NFS flawless, but I have some other issue with kernel panic on that build.

 

Like Peter, I find it difficult to believe NFS support would be discontinued for Linux, as it is the de facto standard for file sharing on *nix, so it stands to reason that the NFS issues we are experiencing have to involve unRAID itself.

 

Now, whether unRAID 5 will move forward without NFS support is another question, which I hope does not occur.

 

If you read my previous post, NFS is OK with disk shares, so we have an issue with user shares over NFS.

 

//Peter

Link to comment

Hi!

 

From the reports so far, these seem to be the major issues remaining in 5.0:

 

- NFS issues.  Well NFS/fuse has historically been "problematic".  I don't have a Tvix so specific troubleshooting may be difficult. [Probably I should get one of these: a) how does it compare to a Popcorn Hour, and b) which one should I get?].  Peter, please send me an email and we can try to troubleshoot that way instead of via forum: [email protected]

 

 

Normally I use SMB to play back movies from my unRAID server and stream them to my Popcornhour C200. I switched to NFS and started a Blu-ray (Avatar, untouched) and have discovered no problems so far. I would guess that it might not be a general problem with NFS but with the streaming device used.

 

Bye.

Link to comment
If you read my previous post, NFS is OK with disk shares, so we have an issue with user shares over NFS.

 

//Peter

 

Well, I've been experiencing issues with both user and disk shares. When I engage in prolonged file access (transfers, jukebox indexing via YAMJ, or watching videos), the access constantly "pauses" just enough to corrupt the file operation, but usually not enough to actually stop the operation.  The net result is incomplete YAMJ indexes with huge amounts of data missing or incomplete, stuttering videos, and corrupted file transfers.

 

Now there is one operation that I've tried only on user shares, and that's an actual file copy via OS X Finder: I always get an error that the operation is prohibited, yet unRAID still creates an empty file (0k) in the destination directory, but I can freely modify (edit in whichever application can open it) and delete files on demand.  The weird thing is, YAMJ on the same OS X machine can transfer all the thousands of jukebox files to the NFS share, under what I assume is the same UID or permissions in the background, without that type of error (it still comes across the corrupting "pauses", which essentially make the jukebox useless; once in a while it will index and transfer the jukebox error free).

 

UPDATE: Actually, I DID try copying files to a disk share via OS X Finder under B12 and got the prohibited error.  I will try again with b14.

Link to comment

If you read my previous post, NFS is OK with disk shares, so we have an issue with user shares over NFS.

 

//Peter

 

Well, I've been experiencing issues with both user and disk shares. When I engage in prolonged file access (transfers, jukebox indexing via YAMJ, or watching videos), the access constantly "pauses" just enough to corrupt the file operation, but usually not enough to actually stop the operation.  The net result is incomplete YAMJ indexes with huge amounts of data missing or incomplete, stuttering videos, and corrupted file transfers.

I'm going to run a longer test with a disk share. What I can see so far is that subtitles that don't work on a user share do work on a disk share. I'm going to start a full BD movie and see if it plays all of it.

 

//Peter

Link to comment

- Disk spin up/down issues.  For those experiencing this issue, I'd like to ask that you run completely "stock", i.e., no add-ons, no plugins, etc.  If you still have issues with this, again please report and include the system log.  One of the logs I looked at seems to show a spin-up issue when trying to access a spun-down drive via AFP (first guess is the afpd timeout is too short - I will look into this).

 

I disabled all user-installed packages via unmenu, disabled unmenu, rebooted.  Ejected the user shares from all computers in the house.  Via webgui, I spun down all drives - webgui confirmed all drives were spun down.  Spin down command was sent @ 0529 (see attached syslog...I'm in the Navy, early riser...what can I say?).  From 0529, I can absolutely confirm there was no user directed or 3rd party software background service write or read access to the server.  At 0554, according to the log, a spin down command was sent to drives 2 and 6.  As of 0556, the webgui shows drives 2, 3, 5 and 6 as spun down (see attached screenshot) (Never mind, keep exceeding file size limitations for the post - can't make the screenshot clear enough to read).  All the computers in my house run Mac OS X Lion, all user shares are exported via AFP only, while disc shares are exported via SMB only.   See attached spin down settings file to see user shares/spin down settings by disc.

 

The one thing I cannot confirm is whether or not Lion is accessing the drives to cause the problem; i.e., part of the AFP or SMB background process is accessing drives in order to maintain the connection.  I kinda doubt it, because I always have one or two drives that show as spun down.  If it is any help, discs 3 and 5 are completely empty (no folders, no files).  Disc 6 has only a single folder (no files).  There may be something there...it is usually discs 3 and 5 that behave appropriately (spin down).

 

I'm willing to downgrade to 4.7.1 on my server to see if the drives spin down properly under that version.  That would eliminate/identify hardware issues.  I also have another unRAID server running 4.7.1 beautifully, all drives spin down appropriately (known good configuration), am willing to upgrade it to 5b14 to test spin down as well.  Just want to know how much of a one-way trip each of those is:  how much configuration/data do I lose going each way?  Momma won't tolerate a prolonged absence of our networked media...

 

If there is anything else I need to change in my config to run the tests, let me know and I'll do it.

syslog-2011-11-27.txt

unRAID_webgui.pdf

unRAID_spin_down_settings.txt

Link to comment

I'm still having the same NFS issues that I had under b13.

http://lime-technology.com/forum/index.php?topic=16125.msg153667#msg153667

 

Split second of video/sound and then it stops (under xbmc). User shares through smb seem fine, and the syslog has no information at all.

 

I mounted and navigated the nfs share using my linux box, and I get this interesting error on it:

ls: reading directory .: Stale NFS file handle

 

One thing I'd like to point out, but not sure if it has an effect, was the name of the directory in nfs. "Doctor Who (2005)"

Other shows seem to have no issue, so I wondered if the parenthesis was causing nfs a problem.

Link to comment

I am seeing the messages below every ten seconds.  I attached a syslog, and it is full of these.

 

Nov 27 08:37:35 UnRaid kernel: mdcmd (44712): spindown 0

Nov 27 08:37:35 UnRaid kernel: mdcmd (44713): spindown 1

Nov 27 08:37:35 UnRaid kernel: mdcmd (44714): spindown 2

Nov 27 08:37:35 UnRaid kernel: mdcmd (44715): spindown 3

Nov 27 08:37:35 UnRaid kernel: mdcmd (44716): spindown 4

Syslog.txt.zip
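
For anyone wanting to quantify this from their own syslog, here is a minimal sketch that counts the `mdcmd ... spindown N` lines per disk and measures the gap between bursts. The regular expression is modelled only on the five lines quoted above; the function name `summarize_spindowns` and the `year` argument are assumptions for illustration (syslog timestamps carry no year), not anything shipped with unRAID.

```python
import re
from collections import Counter
from datetime import datetime

# Modelled on lines such as:
#   Nov 27 08:37:35 UnRaid kernel: mdcmd (44712): spindown 0
MDCMD_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+kernel:\s+"
    r"mdcmd \((?P<seq>\d+)\):\s+spindown\s+(?P<disk>\d+)"
)

# The five lines quoted in the post above.
SAMPLE = """\
Nov 27 08:37:35 UnRaid kernel: mdcmd (44712): spindown 0
Nov 27 08:37:35 UnRaid kernel: mdcmd (44713): spindown 1
Nov 27 08:37:35 UnRaid kernel: mdcmd (44714): spindown 2
Nov 27 08:37:35 UnRaid kernel: mdcmd (44715): spindown 3
Nov 27 08:37:35 UnRaid kernel: mdcmd (44716): spindown 4
"""


def summarize_spindowns(lines, year=2011):
    """Return (spindown count per disk, seconds between distinct bursts)."""
    per_disk = Counter()
    stamps = set()
    for line in lines:
        m = MDCMD_RE.match(line.strip())
        if m is None:
            continue  # not a spindown command line
        per_disk[int(m["disk"])] += 1
        # Syslog omits the year, so the caller supplies one (assumption).
        stamps.add(datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S"))
    bursts = sorted(stamps)
    gaps = [(b - a).total_seconds() for a, b in zip(bursts, bursts[1:])]
    return per_disk, gaps


if __name__ == "__main__":
    counts, gaps = summarize_spindowns(SAMPLE.splitlines())
    print("spindown commands per disk:", dict(counts))
    print("seconds between bursts:", gaps)
```

Run against a full syslog (for example `summarize_spindowns(open("syslog.txt"))`), a list of gaps that is consistently about 10 seconds would back up the "every ten seconds" observation with numbers.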

Link to comment

Got a write error on a spun down disk ("Device not ready"). There was no special load on the machine at that time.

 

After that error things got confused, with all disks labelled wrong etc.. After a reboot and downgrade to beta13, I could rebuild the drive, although there was a weird 'resizing' message on another disk. After another reboot I had to reassign the last drive and rebuild it as well. Things seem to have settled now.

 

Controller is an Intel SASUC8I, which is LSI 1068e based. Disks 9 & 11 were configured for a spindown time 15 mins, other drives were default = 45 mins; disk9 == /dev/sdj.

syslog-20111126-140811.txt.zip

Link to comment

No surprise here. Parity is 22-25MB/s still. Same on Beta13. 70MB/s on Beta12a.

 

Posted a more detailed post about it in the Beta13 thread, and received no feedback. Not a very happy camper, back to 12a it is...

 

From the reports so far, these seem to be the major issues remaining in 5.0:

 

- Disk spin up/down issues.  For those experiencing this issue, I'd like to ask that you run completely "stock", i.e., no add-ons, no plugins, etc.  If you still have issues with this, again please report and include the system log.  One of the logs I looked at seems to show a spin-up issue when trying to access a spun-down drive via AFP (first guess is the afpd timeout is too short - I will look into this).

 

- NFS issues.  Well NFS/fuse has historically been "problematic".  I don't have a Tvix so specific troubleshooting may be difficult. [Probably I should get one of these: a) how does it compare to a Popcorn Hour, and b) which one should I get?].  Peter, please send me an email and we can try to troubleshoot that way instead of via forum: [email protected]

 

 

Not quite. http://lime-technology.com/forum/index.php?topic=16125.msg149318#msg149318

 

This is still happening, and makes my parity checks take over a day on 3TB, compared to ~8 hours. I don't see how this could possibly be related to hardware or my system, as it works fine on 12a and any earlier beta. I even tried a completely fresh install of b14, with no addons/mods and it's the same thing.

syslog.txt
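
As a rough sanity check on the numbers in this post, the relationship between parity size, sustained rate and duration is simple arithmetic. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a constant rate over the whole check, which real drives do not deliver, so treat the results as ballpark figures only. The function names are made up for illustration.

```python
def parity_hours(size_tb, rate_mb_s):
    """Hours to read size_tb terabytes at a constant rate_mb_s (decimal units)."""
    return size_tb * 1e12 / (rate_mb_s * 1e6) / 3600


def implied_rate_mb_s(size_tb, hours):
    """Average MB/s implied by finishing size_tb terabytes in the given hours."""
    return size_tb * 1e12 / (hours * 3600) / 1e6


if __name__ == "__main__":
    for rate in (22, 25, 70):
        print(f"3 TB at {rate} MB/s: {parity_hours(3, rate):.1f} h")
    for hours in (8, 28):
        print(f"3 TB in {hours} h: {implied_rate_mb_s(3, hours):.1f} MB/s average")
```

At a constant 22 to 25 MB/s a 3TB check would indeed run well over a day. An 8 hour check of 3TB implies an average above the 70MB/s quoted for beta12a, so the quoted speeds are presumably point-in-time readings rather than averages.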

Link to comment

Got a write error on a spun down disk ("Device not ready"). There was no special load on the machine at that time.

After that error things got confused, with all disks labelled wrong etc..

Controller is an Intel SASUC8I, which is LSI 1068e based. Disks 9 & 11 were configured for a spindown time 15 mins, other drives were default = 45 mins; disk9 == /dev/sdj.

 

You should downgrade to beta12a/beta12 or earlier since Beta13 was the start of the nasty LSI Controller issues. Unfortunately it seems Beta14 has the same nasty LSI Controller issues.

 

From your syslog you had at least 3 more drives go off the deep-end after the first false failed-disk error. Pretty much all drives on the controller gave up the ghost after the first issue. As you posted, everything was fine after a reboot.

 

Nov 26 12:02:49 Tower kernel: mdcmd (52): spindown 9

Nov 26 12:28:34 Tower kernel: sd 8:0:4:0: [sdj] Device not ready

Nov 26 12:28:34 Tower kernel: sd 8:0:4:0: [sdj]  Result: hostbyte=0x00 driverbyte=0x08

Nov 26 12:28:34 Tower kernel: sd 8:0:4:0: [sdj]  Sense Key : 0x2 [current]

Nov 26 12:28:34 Tower kernel: sd 8:0:4:0: [sdj]  ASC=0x4 ASCQ=0x2

Nov 26 12:28:34 Tower kernel: sd 8:0:4:0: [sdj] CDB: cdb[0]=0x28: 28 00 78 5e f6 38 00 00 08 00

Nov 26 12:28:34 Tower kernel: end_request: I/O error, dev sdj, sector 2019489336

Nov 26 12:28:34 Tower kernel: md: disk9 read error

Nov 26 12:28:34 Tower kernel: handle_stripe read error: 2019489272/9, count: 1

Nov 26 12:28:34 Tower kernel: sd 8:0:4:0: [sdj] Device not ready

Nov 26 12:28:34 Tower kernel: sd 8:0:4:0: [sdj]  Result: hostbyte=0x00 driverbyte=0x08

Nov 26 12:28:34 Tower kernel: sd 8:0:4:0: [sdj]  Sense Key : 0x2 [current]

Nov 26 12:28:34 Tower kernel: sd 8:0:4:0: [sdj]  ASC=0x4 ASCQ=0x2

Nov 26 12:28:34 Tower kernel: sd 8:0:4:0: [sdj] CDB: cdb[0]=0x2a: 2a 00 78 5e f6 38 00 00 08 00

Nov 26 12:28:34 Tower kernel: end_request: I/O error, dev sdj, sector 2019489336

Nov 26 12:28:34 Tower kernel: md: disk9 write error

Nov 26 12:28:34 Tower kernel: handle_stripe write error: 2019489272/9, count: 1

...

Nov 26 13:56:11 Tower kernel: sd 8:0:10:0: [sdp] Device not ready

Nov 26 13:56:11 Tower kernel: sd 8:0:10:0: [sdp]  Result: hostbyte=0x00 driverbyte=0x08

Nov 26 13:56:11 Tower kernel: sd 8:0:10:0: [sdp]  Sense Key : 0x2 [current]

Nov 26 13:56:11 Tower kernel: sd 8:0:10:0: [sdp]  ASC=0x4 ASCQ=0x2

Nov 26 13:56:11 Tower kernel: sd 8:0:10:0: [sdp] CDB: cdb[0]=0x28: 28 00 00 00 97 47 00 00 08 00

Nov 26 13:56:11 Tower kernel: end_request: I/O error, dev sdp, sector 38727

...

Nov 26 13:56:15 Tower kernel: sd 8:0:11:0: [sdq] Device not ready

Nov 26 13:56:15 Tower kernel: sd 8:0:11:0: [sdq]  Result: hostbyte=0x00 driverbyte=0x08

Nov 26 13:56:15 Tower kernel: sd 8:0:11:0: [sdq]  Sense Key : 0x2 [current]

Nov 26 13:56:15 Tower kernel: sd 8:0:11:0: [sdq]  ASC=0x4 ASCQ=0x2

Nov 26 13:56:15 Tower kernel: sd 8:0:11:0: [sdq] CDB: cdb[0]=0x28: 28 00 00 00 97 47 00 00 08 00

Nov 26 13:56:15 Tower kernel: end_request: I/O error, dev sdq, sector 38727

Nov 26 13:56:15 Tower kernel: md: disk0 read error

Nov 26 13:56:15 Tower kernel: handle_stripe read error: 38664/0, count: 1

Nov 26 13:56:15 Tower kernel: md: disk2 read error

Nov 26 13:56:15 Tower kernel: handle_stripe read error: 38664/2, count: 1

...

Nov 26 13:56:16 Tower kernel: sd 8:0:0:0: [sdf] Device not ready

Nov 26 13:56:16 Tower kernel: sd 8:0:0:0: [sdf]  Result: hostbyte=0x00 driverbyte=0x08

Nov 26 13:56:16 Tower kernel: sd 8:0:0:0: [sdf]  Sense Key : 0x2 [current]

Nov 26 13:56:16 Tower kernel: sd 8:0:0:0: [sdf]  ASC=0x4 ASCQ=0x2

Nov 26 13:56:16 Tower kernel: sd 8:0:0:0: [sdf] CDB: cdb[0]=0x28: 28 00 00 00 1a 57 00 00 10 00

...

Nov 26 13:56:16 Tower kernel: sd 8:0:8:0: [sdn] Device not ready

Nov 26 13:56:16 Tower kernel: sd 8:0:8:0: [sdn]  Result: hostbyte=0x00 driverbyte=0x08

Nov 26 13:56:16 Tower kernel: sd 8:0:8:0: [sdn]  Sense Key : 0x2 [current]

Nov 26 13:56:16 Tower kernel: sd 8:0:8:0: [sdn]  ASC=0x4 ASCQ=0x2

Nov 26 13:56:16 Tower kernel: sd 8:0:8:0: [sdn] CDB: cdb[0]=0x28: 28 00 00 00 b4 2f 00 00 08 00

Nov 26 13:56:16 Tower kernel: end_request: I/O error, dev sdn, sector 46127

...

Nov 26 13:56:16 Tower kernel: sd 8:0:7:0: [sdm] Device not ready

Nov 26 13:56:16 Tower kernel: sd 8:0:7:0: [sdm]  Result: hostbyte=0x00 driverbyte=0x08

Nov 26 13:56:16 Tower kernel: sd 8:0:7:0: [sdm]  Sense Key : 0x2 [current]

Nov 26 13:56:16 Tower kernel: sd 8:0:7:0: [sdm]  handle_stripe read error: 46088/0, count: 1

Nov 26 13:56:16 Tower kernel: md: disk3 read error

Nov 26 13:56:16 Tower kernel: ASC=0x4 ASCQ=0x2handle_stripe read error: 46088/3, count: 1
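
For readers who want to decode these entries themselves, here is a small sketch that interprets the 10-byte CDBs and the sense data shown in the excerpt. The lookup tables cover only the values that appear in this log (opcodes 0x28 and 0x2a, sense key 0x2, ASC/ASCQ 0x04/0x02) as defined in the SCSI standards; the helper names `decode_cdb10` and `describe_sense` are made up for illustration.

```python
# Only the codes that occur in the excerpt above are listed here.
OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}
SENSE_KEYS = {0x2: "NOT READY"}
ASC_ASCQ = {
    (0x04, 0x02): "LOGICAL UNIT NOT READY, INITIALIZING COMMAND REQUIRED",
}


def decode_cdb10(hex_bytes):
    """Decode a 10-byte READ(10)/WRITE(10) CDB given as space-separated hex."""
    cdb = bytes.fromhex(hex_bytes)
    if len(cdb) != 10:
        raise ValueError("expected a 10-byte CDB")
    return {
        "op": OPCODES.get(cdb[0], hex(cdb[0])),
        "lba": int.from_bytes(cdb[2:6], "big"),     # bytes 2-5: logical block address
        "blocks": int.from_bytes(cdb[7:9], "big"),  # bytes 7-8: transfer length
    }


def describe_sense(key, asc, ascq):
    """Turn a sense key plus ASC/ASCQ pair into a readable string."""
    key_text = SENSE_KEYS.get(key, f"sense key {key:#x}")
    detail = ASC_ASCQ.get((asc, ascq), f"ASC={asc:#x} ASCQ={ascq:#x}")
    return f"{key_text}: {detail}"


if __name__ == "__main__":
    # First failing command on sdj, copied from the log above.
    print(decode_cdb10("28 00 78 5e f6 38 00 00 08 00"))
    print(describe_sense(0x2, 0x04, 0x02))
```

The decoded block address of the first command is 2019489336, the same sector the kernel reports in the `end_request: I/O error` line. "Initializing command required" generally means the drive wants an explicit start (spin-up) command before it will service I/O, which would be consistent with these reads and writes hitting spun-down disks, though the log alone does not prove where the fault lies.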

Link to comment

 

I have not seen it yet on b14, but it happened 3 times in 4 days with b13, and the only way to get the drive to not show resizing was to reboot the server. How is that normal?

 

although there was a weird 'resizing' message on another disk

 

Now that you mention it, I have seen that too.

 

The 'resizing' messages and state is completely normal.

Link to comment

 

I understand that. Normally, when starting the server and bringing the drives online, they will show either resizing or mounting. What I was referring to was that one or more drives would show resizing when the web browser was refreshed, making those drives unavailable until a reboot. (What limetech stated was that it is normal to show resizing! I just pointed out my experience, which is not normal; if it is, something is wrong. But that was b13, not b14, so it doesn't belong here.)

 

unRAID has shown 'resizing' when they mount upon starting the array. It should disappear once completely started.

Link to comment

You should downgrade to beta12a/beta12 or earlier since Beta13 was the start of the nasty LSI Controller issues.

 

Good advice. I can confirm that beta13 and beta14 exhibit the "not ready" problem on my system whereas beta12 and 12a are fine. So it seems that the Linux LSI driver no longer waits for a drive to spin up?

Link to comment

If you are getting "resizing" on a disk, make sure that it's at least 20% full. I had this issue for awhile, started on beta11 or so. My friend who has unRAID also has the issue. It only happens on empty or near empty drives. I've tested and confirmed it on both systems. After we transferred data to the drives, the issue disappeared on both systems. I gave a detailed post about it in one of the threads, and would consider this a critical bug.

 

I've given up on reporting bugs because the only thing that ever happens is "that's normal", "user error", "it's your system", or no response whatsoever. My report was never acknowledged by anyone. I believe everyone suffers from this issue; the problem is that not many people stop/start their array and/or have near-empty drives in their system, so it never gets noticed. It only happens for me when I start the array with <20% full drives, and even then it's only about a 20% chance. I have had 2 PMs since I posted the "fix" for this resizing issue, and both users told me it also fixed it for them, so I would consider it widespread. The bad part is that if the resizing bug happens, it freezes my system completely if I attempt to stop the array.

 

I've also reported another issue 3 times so far, and every time I get no feedback:

http://lime-technology.com/forum/index.php?topic=16840.msg153987#msg153987

 

I really hope that when 5.0 releases I'm not forced to remain on 12a because of 8 hour vs 28 hour parity syncs.

Link to comment

If you are getting "resizing" on a disk, make sure that it's at least 20% full. I had this issue for awhile, started on beta11 or so. My friend who has unRAID also has the issue. It only happens on empty or near empty drives. I've tested and confirmed it on both systems. After we transferred data to the drives, the issue disappeared on both systems. I gave a detailed post about it in one of the threads, and would consider this a critical bug.

 

I've given up on reporting bugs because the only thing that ever happens is "that's normal", "user error", "it's your system", or no response whatsoever. My report was never acknowledged by anyone. I believe everyone suffers from this issue; the problem is that not many people stop/start their array and/or have near-empty drives in their system, so it never gets noticed. It only happens for me when I start the array with <20% full drives, and even then it's only about a 20% chance. I have had 2 PMs since I posted the "fix" for this resizing issue, and both users told me it also fixed it for them, so I would consider it widespread. The bad part is that if the resizing bug happens, it freezes my system completely if I attempt to stop the array.

 

If you don't mind, please post a link to the post describing this problem [I'm out of town until late tomorrow and have limited time to monitor and search through the forum.]  What the code does when mounting a disk is, upon successful 'mount', immediately execute a 'resize' operation on the disk (via a remount operation), to handle the (rare) case where a new larger drive has been plugged in replacing an existing drive - this serves to expand the file system.  If it's the same drive, then 'resize' doesn't do anything.  This is just a simplification in the code, but lately the 'resizing' operation seems to take longer than in the past, so I'll rework the code to make this a bit smarter.

 

I've also reported another issue 3 times so far, and every time I get no feedback:

http://lime-technology.com/forum/index.php?topic=16840.msg153987#msg153987

 

I really hope that when 5.0 releases I'm not forced to remain on 12a because of 8 hour vs 28 hour parity syncs.

 

One thing to try here, since you have a large array (21 drives), is go to the 'Settings/Disk Settings' page and increase the 'md_sync_window' tunable, perhaps double it (but don't let it get higher than 'md_num_stripes' - perhaps double that one too).  Then see if this helps your parity sync rate.  Why would this help in -beta14 and seems to run differently in -beta12a?  Good question, but -beta13/14 include a kernel upgrade, and it's possible something changed in the i/o subsystem that would account for this if I was able to find it.  If it doesn't do anything, then probably it's a driver change causing this.
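
The adjustment rule described above (double 'md_sync_window', keep it no higher than 'md_num_stripes', and double that one too if needed) can be sketched as a small helper. This only computes candidate values to type into the 'Settings/Disk Settings' page; it does not touch the server. The function name and the starting numbers in the example are hypothetical, not unRAID defaults.

```python
def propose_sync_tunables(md_sync_window, md_num_stripes, factor=2):
    """Suggest new (md_sync_window, md_num_stripes) values.

    Scales md_sync_window by `factor`, and scales md_num_stripes as well
    only when the new window would otherwise exceed it.
    """
    new_window = md_sync_window * factor
    new_stripes = md_num_stripes
    if new_window > new_stripes:
        new_stripes = md_num_stripes * factor
    if new_window > new_stripes:
        # Can only happen if the starting values already broke the rule.
        raise ValueError("md_sync_window would still exceed md_num_stripes")
    return new_window, new_stripes


if __name__ == "__main__":
    # Hypothetical starting values, for illustration only.
    print(propose_sync_tunables(400, 1000))
    print(propose_sync_tunables(600, 1000))
```

In the first call the doubled window still fits under the existing stripe count, so only the window changes; in the second, both values are doubled.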

 

So this is something a bit maddening about linux: if you want latest drivers you have to upgrade the entire kernel.  But of course you also get all the other kernel changes as well.  The problem is that I would guess that not too many kernel developers have huge disk arrays to test with and all this I/O is going to stress the kernel quite a bit, perhaps revealing flaws/bugs in I/O subsystem changes.

Link to comment
