You guys want a -rc16b?




Ideally V5 stable will just be a renamed RC with zero changes.

 

That's exactly what it should be.  I DO think Tom's getting very close.  Not ALL minor issues need to be resolved ... that's what "Release Notes" are for  :)    But clearly there should be NO known data loss issues (such as in RC15a) or other significant issues.    I'd think an RC17 or possibly RC18 will meet the ready-to-release criteria and could be re-badged as v5.0.

 

My personal standard is that it has to be as solid as v4.7 has been.    [The reality is that if 4.7 supported > 2TB drives a LOT of folks would never bother to change.]

 

Link to comment

Hot on the trail of the "transport endpoint not connected" issue, but very difficult to debug.

 

Can you briefly share what you have seen/detected with the "transport endpoint not connected"?  Thanks.

The error message is just a symptom that the 'shfs' process has terminated.  I have to have plex running a scan and then run a 'find /mnt/user' in a couple windows and once in a while it will fail.  But there is nothing output - no syslog messages, no error messages, nothing.  Latest run had 'valgrind' watching the process and it has made it through a couple plex scans without failing, and when I terminate the process valgrind shows no memory leaks whatsoever.  Another user gets it to fail running rsync, so that's next...
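Something along these lines, in a couple of shells, is roughly what I mean -- just a rough sketch, with arbitrary loop counts, and a Plex scan needs to be running at the same time:

  # hammer the user share from several shells at once
  for i in 1 2 3 4; do
    ( while true; do find /mnt/user -type f > /dev/null 2>&1; done ) &
  done

  # poll the share; "Transport endpoint is not connected" shows up once the
  # shfs process behind /mnt/user has gone away
  while ls /mnt/user > /dev/null 2>&1; do sleep 10; done
  echo "shfs appears to have died: $(date)"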

 

 

This almost sounds like a malloc failure rather than a leak. In the past I've experienced issues where a massive rsync across a huge filesystem would cause OOM errors.  Perhaps you need a filesystem with tons and tons of files. Also consider whether the end user is running the dir_cache tool, which continually traverses the filesystems with a find.
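If it is an allocation failure, one way to stack the deck might be to manufacture a huge file count and then watch memory while the rsync/find stress runs. A minimal sketch, assuming disk1 exists and has space (the counts and paths are only examples):

  # create roughly 100,000 small files for the user-share code to walk
  TARGET=/mnt/disk1/stress
  mkdir -p "$TARGET"
  for d in $(seq 1 200); do
    mkdir -p "$TARGET/dir$d"
    for f in $(seq 1 500); do
      echo filler > "$TARGET/dir$d/file$f"
    done
  done

  # while the stress runs, keep an eye on memory and look for OOM-killer entries
  free -m
  dmesg | grep -i -E 'oom|out of memory'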

Link to comment

Like most I voted for 16b.

 

However I think there should be consideration given to staying with the reiserfs patch, even for 5.0 final. The patch has been very well tested. Changing to the official fix would require rigorous testing and, who knows, might not even work. So for 5.0 I would suggest staying with what has been tested and works.

Link to comment

Perhaps you need a filesystem with tons and tons of files.

 

+1, that is a must. Support for 24 empty drives isn't the point, so I hope he has at least one 24-drive test bed loaded to the gills when performing all tests.

 

Also consider if the end user is running the dir_cache tool, which continually traverses the filesystems with a find.

 

I have never run the cache_dirs script before, but I started thinking this may be why some see certain behaviours and others do not, so I'm starting some tests with and without cache_dirs. This will also be different for those with large versus small arrays, I'm sure.
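My comparison will be crude: time a full directory walk with the caches dropped and no cache_dirs running, then start cache_dirs and repeat, comparing the wall times. Dropping the caches needs root, and the paths are just the obvious ones:

  sync; echo 3 > /proc/sys/vm/drop_caches
  time find /mnt/user -type d > /dev/null
  # repeat the same two lines with cache_dirs running and compare the 'real' times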

Link to comment

where could one find RC16a or 16b to try....?

 

I understand that it is only available directly from Tom.  You have to request it via e-mail.  (I am not sure he is even sending it at this point.) 

 

 

Edit:  This is one of the reasons that I think that -rc16b needs to be released...

Link to comment

I've stuck with official releases, so I'm running 4.7 and I'm out of space. I've been waiting for 5 so that I can bring several 3 and 4 TB drives online; the wait has been miserable. At this point I need 5.0, but if the RC is really so close then maybe I'll just move forward with that.

 

It seems like there will always be one more issue; seemingly the past several release candidates have all promised to be the last. It's been very frustrating from a user's perspective (at least this user's, but based on the results of the prior poll many others probably agree).

 

I've got a 4TB drive going through preclear that will finish in a few hours for my new parity drive. I guess if you decide to officially release 5, I'll install that. Otherwise I'll just install whatever the latest RC is.

 

Having always run v5 on my server (close to two years now), I can say there is absolutely no reason why you should prevent yourself from using 5rc16b.

 

You're causing yourself problems/misery by not upgrading.

 

The version of the product and if it has rc or final has no bearing on the capability of the software.

 

It always confuses me that people are willing to use code labelled 'Final' but not '5rc16b', even if it is the same code!!

Link to comment

Having always run v5 on my server (close to two years now), I can say there is absolutely no reason why you should prevent yourself from using 5rc16b.

 

You're causing yourself problems/misery by not upgrading.

 

That could well be. I probably will install 5RC16b if it is released. My new 4TB parity drive is ready to go.

 

The version of the product and if it has rc or final has no bearing on the capability of the software.

 

It always confuses me that people are willing to use code labelled 'Final' but not '5rc16b', even if it is the same code!!

 

Hmmm, really? The first RC was released over 16 months ago! There have been 16 numbered RCs, more if you include A's and B's. RC or Final may have no bearing on the "capability" of the software, but it does on the stability of the software, and that is what I am most concerned about. Particularly when I have 20TB in my array and for a very large portion of that my only backup is going back and re-ripping the DVDs/BRs. What if I had decided to make the switch at 5RC15a when a data loss issue was introduced?

Link to comment

...for a very large portion of that my only backup is going back and re-ripping the DVDs/BRs.

 

Not a good backup plan.    With the low cost of drives these days, there's no reason not to have an offline copy of all of your static data.  I'd certainly think $30-40/TB is well worth the cost for a good backup that would save MANY hours of re-constructing the data !!  My backup disks cost me a LOT more than that, but I'm still very glad I have everything backed up.

 

Link to comment

...for a very large portion of that my only backup is going back and re-ripping the DVDs/BRs.

 

Not a good backup plan.    With the low cost of drives these days, there's no reason not to have an offline copy of all of your static data.  I'd certainly think $30-40/TB is well worth the cost for a good backup that would save MANY hours of re-constructing the data !!  My backup disks cost me a LOT more than that, but I'm still very glad I have everything backed up.

 

All of my essential data is backed up in multiple locations including offsite. Having my DVDs and BRs ripped (particularly older ones we've already watched) is only a nice convenience. One that I've paid for in purchasing drives for unRAID and spending the time ripping them and managing them.

 

Spending thousands more (I understand that today I could get five 4TB drives to back it all up [still $1000], but an offline solution would have evolved with my unRAID over the years, starting with 500GB drives, so it would have a similar makeup to the unRAID, meaning $1000s in drives) plus MANY hours managing the offline backup - well, I'm comfortable with my choice. Short of a catastrophic failure like a fire that destroys the entire array, I'd probably only lose a small portion of my online collection and I'm willing to let that go. Even with a fully catastrophic failure, I'd probably just walk away from the setup - I've got two young kids now and different priorities than when I started.

 

So I understand your point of view, but I also understand the choice that I've made.

 

Edit: Not that that means I take it lightly, which is why I'm very careful with my array and I've stuck with 4.7 for so long.

Link to comment

Some points to consider.

 

If you already have the physical media, it may not be worth the cost of extra backup drives since you can always go to the backup media.

 

If you have children, then you most certainly want to put the most-used data on a hard drive. How your family handles missing files is up to you.

 

If you are going to back up hard drives for which you already have the physical media, and the backup is going to be in the same location, it's a waste.  A disaster will trash it all.

 

From my point of view, the only time the expenditure on duplicate hard drives is worth making is for:

1. Irreplaceable data that you save off site (if you have that ability).

2. A running archive of specific replaceable or irreplaceable data that you want close by.

3. Replaceable data that you save off site for which you do not have the physical media (borrows).

4. Replaceable data that you save locally, cannot/do not want to be without, and/or cannot/do not want to spend the time to rebuild.

 

If I had the physical media, I wouldn't duplicate those movies to another hard drive, unless I had a set of children's/wife's favorites and they were going to be placed offsite, or even duplicated offsite.

Link to comment

Edit: Not that that means I take it lightly, which is why I'm very careful with my array and I've stuck with 4.7 for so long.

 

I totally understand this. I did not throw my hat in the ring until RC12.

I was all ready to upgrade to RC15 until the "potential" data loss situation was discovered.

I lost all confidence after that. I'll never upgrade blindly again. I'll have to test the shit out of it even if it takes weeks.

While I could automate backups to another unRAID server, I could also be backing up bad files.

I'll probably spend the summer building some kind of checksum catalog now.
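The rough shape I have in mind is nothing fancy -- md5sum over each share into a catalog file I can re-check later. This is only a sketch; the catalog location is an example and md5 is chosen purely for speed:

  mkdir -p /boot/checksums
  for share in /mnt/user/*/; do
    name=$(basename "$share")
    find "$share" -type f -print0 | xargs -0 -r md5sum > "/boot/checksums/$name.md5"
  done
  # later, verify a share (e.g. a hypothetical share named Movies) with:
  #   md5sum -c /boot/checksums/Movies.md5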

 

Link to comment

Hot on the trail of the "transport endpoint not connected" issue, but very difficult to debug.

 

Can you briefly share what you have seen/detected with the "transport endpoint not connected"?  Thanks.

The error message is just a symptom that the 'shfs' process has terminated.  I have to have plex running a scan and then run a 'find /mnt/user' in a couple windows and once in a while it will fail.  But there is nothing output - no syslog messages, no error messages, nothing.  Latest run had 'valgrind' watching the process and it has made it through a couple plex scans without failing, and when I terminate the process valgrind shows no memory leaks whatsoever.  Another user gets it to fail running rsync, so that's next...

 

 

This almost sounds like a malloc failure rather than a leak. In the past I've experienced issues where a massive rsync across a huge filesystem would cause OOM errors.  Perhaps you need a filesystem with tons and tons of files. Also consider whether the end user is running the dir_cache tool, which continually traverses the filesystems with a find.

 

Correct. I run into this issue consistently and have sent many emails to Tom in the past regarding this problem. Currently an unRAID server is sitting in the corner collecting dust; I would love to see it up and running again for its intended purpose (nearline archive).

Link to comment

Edit: Not that that means I take it lightly, which is why I'm very careful with my array and I've stuck with 4.7 for so long.

 

I totally understand this. I did not throw my hat in the ring until RC12.

I was all ready to upgrade to RC15 until the "potential" data loss situation was discovered.

I lost all confidence after that. I'll never upgrade blindly again. I'll have to test the shit out of it even if it takes weeks.

While I could automate backups to another unRAID server, I could also be backing up bad files.

I'll probably spend the summer building some kind of checksum catalog now.

Well this was a pretty extraordinary case.  The 'stock' 3.9.x kernel (actually kernels after 3.5) does not lose data as long as a reset is not done with active disk i/o in progress.  The problem was the 'continuous write', which was actually just re-writing the superblock and a couple of journal blocks with the same data.  Probably everyone outside the unRaid community who still uses reiserfs didn't notice this because all their disks are always spun up.  It was just the "quick patch" created by the maintainer that caused this new problem.

Link to comment


 

It always confuses me that people are willing to use code labelled 'Final' but not '5rc16b', even if it is the same code!!

 

I find this humorous, as I have wondered the same thing.  I have been through about five 5RCx upgrades, and that is nothing compared to many of you.  I have not had a single significant problem, only perhaps one version seemingly writing or running a parity check faster than another.

 

It works, I am happy, and the rest (16a/b/17) is semantics.  I do believe good business practice is to solve the major issues and release a final with release notes for any minor ones.  As most have said, in a perfect world there is a fully operational beta that transitions unchanged to the final, but that doesn't mean it's unacceptable for the final to ship with release notes covering minor issues.

 

I think there would be better sales if there were an official V5 stable release, because the average person will get hung up on the title.  Personally, I would like to see two-drive parity sooner rather than later (after V5 stable); I would buy another parity drive tomorrow if we had it.  I'm guessing a parity check will run slower when that arrives, though.

Link to comment
It always confuses me that people are willing to use code labelled 'Final' but not '5rc16b', even if it is the same code!!

 

I'm puzzled that people will stick with 4.7 just because it has the tag 'final', when it has at least three known faults which can be prejudicial to the security of their data.  I moved from 4.6 (I don't recall whether I ever loaded 4.7) to 5.0b4 on my one and only live system.

 

I suspect that there will be some issues reported when 5.0 does go final just because users moving from v4 to v5 will exercise some new usage patterns.

Link to comment

Every system has faults. The questions are:

Do you know them?

Do you know how to manage them?

Can you live with them?

 

We've all seen and learned how tough it is to manage them. Thank you Tom, for including us in the process.

As they are discovered we can expand our test suites to satisfy our confidence level.

 

I don't fault anyone for waiting to upgrade a large production array. There's been at least 16 releases to work through various issues. It's a moving target.

 

It shows how hard Tom works, and how intricate the process is.

Link to comment

Every system has faults. The questions are:

Do you know them?

Do you know how to manage them?

Can you live with them?

 

We've all seen and learned how tough it is to manage them. Thank you Tom, for including us in the process.

As they are discovered we can expand our test suites to satisfy our confidence level.

 

I don't fault anyone for waiting to upgrade a large production array. There's been at least 16 releases to work through various issues. It's a moving target.

 

It shows how hard Tom works, and how intricate the process is.

 

Agree.

 

It's true that v4.7 does have 3 known faults, but they're pretty rare and require events that aren't very likely before they can cause significant data loss.

 

FWIW, the known 4.7 faults are:  writing to an array during a disk rebuild can result in a lost write;  a failed USB flash drive where you lose the super.dat file and don't have a backup;  and an obscure file renaming bug that could cause the loss of that one file.    If you simply don't write to the array during disk rebuilds and keep a backup of your flash drive, you can easily avoid the first two.    Avoiding the third requires using a copy utility that doesn't create multiple copies of temp files on physically different disks.

 

If v5.0 is equally reliable and doesn't have any faults worse than those (the 3rd one is the worst ... but was fixed quite a while ago in one of the v5 Betas, so I'm sure it's not in v5),  I think we'll all be very happy  :)
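As an aside, the flash backup mentioned above is trivial to script. A minimal sketch, assuming the flash is mounted at /boot as usual (so config/super.dat comes along for the ride), with an array disk as the destination purely as an example:

  DEST=/mnt/disk1/flash-backup/$(date +%Y%m%d)
  mkdir -p "$DEST"
  cp -r /boot/* "$DEST"/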

Link to comment