• 6.6.1 Crashes


    rjstott
    • Retest Minor

    In the last 48 hours 6.6.1 has crashed twice during periods of no user activity. I don't have a Ryzen (mine is an i3-3240) and I don't run VMs, just a Plex Docker and a couple of other minor Dockers. It seems I'm not alone, and I wonder why the release hasn't been pulled and why testing wasn't better. This is a mature 'paid for' product that deserves improved support and shouldn't be shipping releases that fail.

     

    I've reverted to 6.5.3 so I hope that's the end for now. It might be a good idea to flag releases that have recorded major bugs so that your customers are better informed!







    What do you mean by "crashed?"  That can describe lots of situations. What happens on your system when it crashes? Can you provide any screenshots with useful information?

     

    If you want Limetech (or anyone else) to be able to look at the cause, you should include full diagnostics if it is possible to get them when the problem occurs.  If the GUI locks up and you can't get diagnostics that way, can you get to the CLI and get them there?

     

    With zero useful information, what you have submitted is an undocumented complaint rather than a useful bug report that would enable the problem to be diagnosed and duplicated so it can be corrected.

     

    Two common "crashing" problems with 6.6.x are related to NFS shares and Realtek NICs.  Both of these are due to recently-introduced Linux kernel bugs.  Do you have either in your system?

    Edited by Hoopster
    Link to comment

    FWIW we run the latest software on all our servers.  I'm writing this from a Windows 10 VM running on an Unraid OS host running the upcoming 6.6.2, on a Supermicro X10SRM-F with a Xeon E5-1650 v4.  I have a triple monitor setup and routinely back up to an Atom-based Unraid server.  Never a single crash like those being described in these bug reports.  Can't even get NFS to fail.  I will admit we avoid Realtek.  Eric has a Threadripper, again, no crashes.

     

    Of course we try and put out a quality product that also keeps up-to-date with bleeding edge of almost all components such as the kernel.  This is necessary to continue to support the latest h/w.

     

    Sorry your server is crashing for some unknown reason, but there are many many servers out there that are not crashing.

     

    In these situations where you think it might crash and you can't get meaningful diags, you can have a log window open which might capture the last bits of messages before crash.  You can also open a telnet/ssh session and be tailing the syslog:

     

    tail -f /var/log/syslog
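A terminal window can itself be lost when the machine hangs, so one variation on the command above is to append the followed log to a file at the same time. The following is only a minimal sketch: `capture_log` and both paths are hypothetical names chosen for illustration, and it assumes GNU `tail` and a destination that survives a reboot (for example a share or another machine, rather than the RAM-backed /var/log).

```shell
#!/bin/bash
# Sketch only: follow a log file and append every new line to a second file.
# capture_log, SRC and DEST are illustrative names, not part of Unraid.
capture_log() {
    local src="$1" dest="$2"
    # -n 0 : start with lines written from now on, not the existing contents
    # -F   : keep following by name, even if the log is rotated or recreated
    tail -n 0 -F "$src" >> "$dest" 2>/dev/null &
    echo $!    # print the PID of the background tail so it can be stopped later
}

# Example (hypothetical destination path):
#   pid=$(capture_log /var/log/syslog /mnt/user/logs/syslog-capture.txt)
#   ... later ...
#   kill "$pid"
```

After a crash, the last lines of the destination file show what was logged just before the system stopped responding, provided the destination itself was still writable at that moment.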

    • Like 1
    • Upvote 1
    Link to comment

    I hear what you both say and I understand that we are dealing with a complex product.

     

    I do not see a warning in the 6.6.1 announcement that there are any problems to be aware of, and yes, I do have a Realtek NIC: 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 09). If I had seen that warning I would not have upgraded.
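For anyone else checking whether their server has an affected NIC, the device line quoted above has the format printed by `lspci`. A small sketch for filtering that output follows; `list_realtek_nics` is a hypothetical helper name, and it assumes `lspci` is available on the server.

```shell
#!/bin/bash
# Sketch only: read lspci-style text on stdin and print the lines that
# describe Realtek Ethernet controllers. The function name is illustrative.
list_realtek_nics() {
    grep -i 'ethernet controller' | grep -i 'realtek'
}

# Typical use on the server itself (requires lspci):
#   lspci | list_realtek_nics
```

No output means no Realtek Ethernet controller was listed; it does not by itself say anything about whether a given controller is affected by the kernel bugs mentioned earlier in the thread.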

     

    I worry when we quote 'bleeding edge' because your product is too mature to consider such risks IMHO. You have a large established user base that pays money for quality AND maturity.

     

    So please instigate a warning and remove it when the issue is resolved.

     

    Finally, a crash is a crash: for me that means non-responsive, dead and useless. If I were running a mission-critical system I would not upgrade without my own testing, but I am not, so I try to stay a bit behind 'bleeding edge'. However, without clear warnings of known problems I am blind to the consequences. 6.6.1 is not a major release, is it? So it should not have major issues.

    Link to comment
    13 hours ago, rjstott said:

    So it should not have major issues.

    I told Apple and Microsoft the same, but they didn't listen 😉

    Link to comment

    I did some thinking about bonienl's comment. Limetech isn't Apple or Microsoft, and I do hope they listen.

     

    We all know that the history of Unraid is many user built servers, all different and some vintage examples. Perhaps it is time to define some Limetech recommended builds that would be tested and supported. A deal could be done with either a manufacturer or supplier to offer packaged deals (built or not) and such builds would be tried and tested. The product itself could also have two distinct streams Docker or Storage. Personally I wouldn't compromise a data server with Dockers and I have removed most of mine to the Raspberry world and there I try to keep apps to their own Pi or quarantine dodgy ones. In an ideal environment perhaps a data server without Docker and a Docker server which also acts as a data backup could be a way to go?

     

    I have two data servers one for backup and the majority of my storage is Media and not critical as there are copies elsewhere. I would certainly upgrade one of my servers for a supported build.

     

    Perhaps Limetech would consider asking their customers for input, which could define how many builds are necessary. Perhaps alignment with the existing licensing is enough?

    Link to comment
    2 hours ago, rjstott said:

    I would certainly upgrade one of my servers for a supported build.

    While having a current list of known working hardware combinations would be ideal, calling out anything not on that list as totally unsupported would be suicidal for the company.

     

    What you are proposing sounds suspiciously like Apple, where they control everything from hardware to software, which allows them to focus their development to the detriment of universal compatibility. Apple is big enough to do that, Limetech isn't.

     

    As long as Limetech uses open source software as their base, compatibility will to a large extent be defined by the whims of that open source community, and the willingness of hardware manufacturers to contribute to said community. Realtek doesn't seem to want to bother with open source; the vast majority of their market share is Microsoft-driven, so that's where they spend their efforts.

     

    Trust me, if limetech knew before they released it that this version would have these issues, they would have done things differently. They don't have the resources to test on thousands of different hardware combinations, so they release beta (RC) software publicly to attempt to catch the worst offenders. I don't recall a huge outcry with the beta, so they went forward with the release. In the future, I'll bet this current experience will cause a longer delay between RC and release. Certain vocal minorities get crabby when releases don't happen quickly, so the pressure is on to accelerate things.

     

    Coming full circle, my opinion is that @limetech should keep a current page on the website where they list the EXACT components in each of their development rigs. Not as a "recommended hardware" page, but on the "about us" where they list the team members. Each development rig could be another "team member".

    Link to comment

    I specifically wasn't proposing that anything not on the list would not be supported. What I would like to see are builds that can be tried and tested, thereby (hopefully) offering customers a better chance of surviving an upgrade. IMHO it is currently a lottery as to whether an upgrade will go smoothly or not, and despite the RC system there is no record (there being too many variations) of what does and doesn't work. Nor any record of what might actually have been tested!

     

    I would not wish to prevent users from deploying what they will (I run a Hackintosh which I rarely upgrade and an iMac that is on Mojave) but I believe there are benefits from tested, optimised solutions.

     

    Perhaps if there was more information about the pros and cons of the new releases we might be better informed? More importantly, there is (as far as I can see) still no warning in the update announcement. I cannot forgive people who won't admit an error (the error being a failure to post a warning!!!)

    Link to comment

    Often the issues are not purely hardware-related; by far most arise from how people have (mis)configured their system or from incompatible plugins or packages. These only become apparent when upgrading to a newer Unraid version. Limetech tests with a 'basic' installation; they cannot and will not test any 3rd party plugins or packages, nor any 'custom' configuration set up by end users.

    I know for sure that *IF* the current issues had been seen during internal testing or during RC testing, a stable version would not have been released until these issues were resolved.

    Link to comment

    So by inference, if I had a standard hardware build that was just a server I would not have had a problem. Additionally, *IF* there had been a warning and advice once these issues were flagged, then I would not have had a problem either. I do not take issue with the fault itself, just with the methods used to mitigate what happens once a problem is acknowledged, and I am making suggestions as to how users might be better served (no pun intended).

     

    As I have said I believe a file server is such a key functionality that the advice should be not to compromise it with dubious or indeed any addons and I would remove my only remaining Plex Docker if I did not think it at least was ubiquitous enough to deserve some limited support.

    Edited by rjstott
    Link to comment

    Agreed, some advisory notes can be added to the release notes to make it more visible. Keep in mind though, issues are with a specific setup (NFS) or specific hardware (Realtek). A large number of users are unaffected.

     

    Honestly spoken, even with a dozen 'alarm' bells, people don't read release notes and upgrade no matter what, just to complain afterwards. 😥

    Link to comment
    1 hour ago, bonienl said:

    Honestly spoken, even with a dozen 'alarm' bells, people don't read release notes and upgrade no matter what, just to complain afterwards.

    Agreed and such is life but where important systems are concerned neither the supplier nor the client can afford to ignore valuable information. Does this mean we will see warnings now and in future 'cos I'd like to see a positive outcome?

    Link to comment

    So I see that a new release has *hit the fan and there are *NO* warnings about known problems. Sorry guys, I just don't get this mentality of leading users onto broken systems. It just looks like a 'Microsoft' trick? Am I the only one that has been burned? If there are other people that agree with my view, could I get some support? Only then might things get better!

    Link to comment
    1 hour ago, rjstott said:

    So I see that a new release has *hit the fan and there are *NO* warnings about known problems. Sorry guys, I just don't get this mentality of leading users onto broken systems. It just looks like a 'Microsoft' trick? Am I the only one that has been burned? If there are other people that agree with my view, could I get some support? Only then might things get better!

    There were several Realtek patches in the Linux kernel for 6.6.2 vs 6.6.1. Does your Realtek NIC still not work?

    Link to comment
    1 hour ago, rjstott said:

    So I see that a new release has *hit the fan and there are *NO* warnings about known problems.

    Have you seen the bug list?  There are plenty of "known issues" for which underlying cause and possible fix are "unknown".

    Link to comment

    So yet again you justify releasing a product full of 'known issues' with no known fix. Where is the list of fixes from the previous release? Where is the list of known issues except in a long list of other 'bug reports'. Does 'Solved' mean fixed in 6.6.2? Why isn't there a category 'acknowledged'?

     

    I just don't get it. How would I have known about the several Realtek fixes? Are they expected to fix the problem that was a major issue? Have they fixed it? Only when the answers are forthcoming could you expect users to enter the realm of 'try it', as only with the right information would I risk another failure. Indeed what should I test to prove the fix works for me?

    Edited by rjstott
    Link to comment
    21 minutes ago, rjstott said:

    Indeed what should I test to prove the fix works for me?

    Install the latest release and see if the kernel crash occurs again.

    Link to comment
    4 hours ago, rjstott said:

    How would I have known about the several Realtek fixes?

    We did mark this bug as "Retest" as soon as 6.6.2 was published, which means "Please retest in latest release."

    Link to comment

    @rjstott still around?  Has latest release fixed your Realtek NIC issues?

     

    You make a good point about not having complete documentation re: known issues, bug fixes, workarounds, etc.  However, we are like 1/10000 the size of Microsoft, so maybe you can cut us some slack?  Besides, how many Microsoft engineers, who directly work on the actual software, are you able to communicate with directly?  I bet we outnumber that count.

    Link to comment

    I only brought up Microsoft because of their deceptive practices. In all respects I would hope you strive to be better. I still think you have two products that should be separated, as that would reduce the update frequency, but they could both be built on the same foundation. IMHO fileservers should be sound and stable (as should a Docker host), but the world of Dockers is rapidly changing, and new developments such as Portainer and the like are creating a different support and management world, as well as an eager user community.

     

    OK, you 'browbeat' me into trying 6.6.2 and all I can say is 'all's well' after 1 day 13 hours. But that is all I can tell you. And if rolling back hadn't resurrected a problem for me with Plex I would have left things alone.

    Link to comment

    I'll bear that in mind but without any evidence that this would fix something that a roll-back wouldn't do I prefer the rollback?

     

    I have these errors in my log:

    Oct 20 19:26:39 Media nginx: 2018/10/20 19:26:39 [error] 13954#13954: *10747 readv() failed (104: Connection reset by peer) while reading upstream, client: 192.168.1.11, server: , request: "POST /webGui/include/DashUpdate.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock:", host: "192.168.1.103", referrer: "http://192.168.1.103/Dashboard"
    Oct 20 20:22:44 Media nginx: 2018/10/20 20:22:44 [error] 13954#13954: *20244 readv() failed (104: Connection reset by peer) while reading upstream, client: 192.168.1.11, server: , request: "POST /webGui/include/Notify.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock:", host: "192.168.1.103", referrer: "http://192.168.1.103/Dashboard"

     

    192.168.1.11 is my iMac, usually displaying the Unraid Dashboard in Firefox, whilst 192.168.1.103 is the server.  Any ideas?
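Errors of this form ("Connection reset by peer ... while reading upstream") are logged by nginx when the backend it is proxying to, here the PHP FastCGI socket shown in the log lines, closes the connection before the response is complete; the lines alone do not say why. To see which web GUI requests are affected and how often, the syslog can be summarised. This is a rough sketch: `count_upstream_resets` is a hypothetical helper name, and it assumes the log lines have the same layout as the two quoted above.

```shell
#!/bin/bash
# Sketch only: read syslog text on stdin and count nginx
# "Connection reset by peer" errors per requested URL path.
# The function name is illustrative, not an Unraid or nginx tool.
count_upstream_resets() {
    grep 'nginx' | grep 'Connection reset by peer' \
        | sed -n 's/.*request: "[A-Z]* \([^ ]*\) HTTP[^"]*".*/\1/p' \
        | sort | uniq -c | sort -rn \
        | awk '{print $1, $2}'    # normalise to "<count> <path>"
}

# Typical use on the server:
#   count_upstream_resets < /var/log/syslog
```

Each output line is a count followed by the request path, so a single endpoint that dominates the list is a reasonable first thing to mention when asking for help.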

     

    As for now I'll leave the server on 6.6.2 unless crashes become more frequent. I could with some effort move my drives across to a spare system with a completely different motherboard. But that is a can of new worms too!

    Link to comment

    Second crash, as before. Diagnostics attached. I have reverted to 6.5.3 in the meantime to check that it is still functioning reliably. I have a network card somewhere that may not be Realtek and I will put this into the server. How would I disable the Realtek card, or do I just not connect Ethernet to it?

    media-diagnostics-20181023-1036.zip

    Link to comment





  • Status Definitions

     

    Open = Under consideration.

     

    Solved = The issue has been resolved.

     

    Solved version = The issue has been resolved in the indicated release version.

     

    Closed = Feedback or opinion better posted on our forum for discussion. Also for reports we cannot reproduce or need more information. In this case just add a comment and we will review it again.

     

    Retest = Please retest in latest release.


    Priority Definitions

     

    Minor = Something not working correctly.

     

    Urgent = Server crash, data loss, or other showstopper.

     

    Annoyance = Doesn't affect functionality but should be fixed.

     

    Other = Announcement or other non-issue.