• unRAID 6.6.5: total system lockup; unresponsive from console, SSH, and network; parity check stops; no disk activity at all; no VMs working; nothing functioning whatsoever.


    Rudder2
    • Solved Urgent

    Total system lockup: unresponsive from console, SSH, and network; parity check stops; no disk activity at all; no VMs working; nothing functioning whatsoever.  I get this lockup whenever I install 6.6.0 or later.  I thought it was because of the RealTek NIC problem, so I downgraded to 6.5.3, then updated to 6.6.3, had worse network issues, and downgraded to 6.5.3 again.  Then I upgraded to 6.6.5; the RealTek NIC error has been fixed, but the lockup problem remains.

     

    The system will run for about 5 or 6 days and lock up, then run 6 hours and lock up, then 3 hours and lock up, then 30 minutes and lock up.  I can't get system logs from the time of the lockup because they are not stored on flash.  I have already downgraded to the stable 6.5.3 image.  Diagnostics taken on the first boot after the lockups are attached, from a day with 4 lockups: the first set was taken after the first lockup and the last after the 4th.  I don't know if they will help, because they were taken after a reset button press to get the system back up.
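    Since the syslog lives in RAM and is lost on reset, one way to keep the last lines before a freeze is to mirror it onto the flash drive. The following is only a minimal sketch, assuming Python is installed on the server (it is not part of a stock unRAID install) and that the flash drive is mounted at /boot; the function name `mirror_log` and the destination file name are made up for illustration.

```python
import os
import time


def mirror_log(src_path, dst_path, poll_seconds=1.0, max_polls=None):
    """Append everything written to src_path onto dst_path.

    Each batch is flushed and fsync'ed so it is on disk before a hard
    lockup can lose it. Log rotation is NOT handled, and every start
    copies the source from its beginning again (a sketch, not a daemon).
    max_polls=None means run until interrupted.
    """
    polls = 0
    with open(src_path, "r", errors="replace") as src, open(dst_path, "a") as dst:
        while max_polls is None or polls < max_polls:
            chunk = src.read()          # whatever has been appended since last read
            if chunk:
                dst.write(chunk)
                dst.flush()
                os.fsync(dst.fileno())  # force the data out to the flash device
            else:
                time.sleep(poll_seconds)
            polls += 1
```

    It could be started with something like `mirror_log("/var/log/syslog", "/boot/logs/syslog-mirror.txt")`, where the destination path is an assumption. Frequent fsync calls add write wear to the flash drive, so this is better suited to a short debugging period than to permanent use.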

     

    I updated the BIOS to the latest version for my motherboard after the second lockup because I saw a hardware error about microcode in log file 1923.  That same error is in log file 2357.

    rudder2-server-diagnostics-20181114-1923.zip

    rudder2-server-diagnostics-20181114-2357.zip




    User Feedback

    Recommended Comments



    The OP has his system running stable when no VMs are used (see his statement above).

    When everything freezes including console, it usually indicates a hardware issue.

    Have you tried to run your system in safe mode and without Docker and VM services started?

    To further examine an issue we always need diagnostics.

     

    Link to comment
    50 minutes ago, bonienl said:

    The OP has his system running stable when no VMs are used (see his statement above).

    When everything freezes including console, it usually indicates a hardware issue.

    Have you tried to run your system in safe mode and without Docker and VM services started?

    To further examine an issue we always need diagnostics.

     

    Are you saying I might have a hardware problem if it only happens when my Windows 10 VM is running?  It doesn't happen on 6.5.0.  Or are you telling the last poster to do more diagnostics?

    Link to comment
    7 hours ago, Rudder2 said:

    Are you saying I might have a hardware problem if it only happens when my Windows 10 VM is running?  It doesn't happen on 6.5.0.  Or are you telling the last poster to do more diagnostics?

    My answer was to @cogliostro

    Link to comment

    @bonienl Sure, Dockers are running, but no VMs are active when this happens. It also looks like it often happens when the array is doing a parity check and gets additional load from a Docker app. The main problem is that I can't even save a log because the system is completely unresponsive. I have added my diagnostics.

    supermicro-diagnostics-20181214-1951.zip

    Edited by cogliostro
    Link to comment

    So are there any more ideas on how I might figure out why my MicroShaft WinBlow$ 10 VM with Asus nVidia GTX 960 SC passthrough is locking up my entire unRAID system on 6.6.x unRAID upgrades?  This VM is VERY important to my education, so any help in fixing this would be great.

    Link to comment

    Hello all.  Since it has been crickets here, I assume no one has any idea what to try.  I disabled the passthrough of my Asus GTX 960 OC and the VM stopped locking up the host.

     

    I found a bug report on Launchpad relating to this very problem.  People are reporting success fixing the problem by enabling MSI interrupts in the Windows guest for all hardware that supports them.  Their testing shows that Linux uses MSI interrupts by default, and their theory is that M$ WinBlow$ not using them by default is causing conflicts.  I'm going to let my VM run a couple of hours without the passthrough to make sure it's stable, then start trying the fixes from the Launchpad bug report for QEMU.

     

    It's looking like this is a QEMU problem and not a problem with unRAID at all.  The problem is happening across Linux flavors.  I'm starting to think that the QEMU update between unRAID 6.5.0 and 6.6.x is the real problem here.  I will keep y'all posted with my results.  It's so nice to be in between classes and have the time to research and try things properly.

    Link to comment
    1 hour ago, Rudder2 said:

    It's so nice to be in-between classes and have the time to research and try things properly.

    Very much appreciated and you are to be commended for diligence 😎

    These kinds of problems are very difficult to track down because there are so many moving parts and we don't have your exact h/w config, which makes it more challenging.  Be assured, we are monitoring the reports, looking for additional clues provided by others who chime in.

     

    1 hour ago, Rudder2 said:

    I found a bug report on Launchpad relating to this very problem.

    Please post link.

    Link to comment

    I will leave this open and let my system run for a week without touching her, because my server has run up to 6 days before freezing, but I think the problem is resolved.  Settings in WinBlow$ were causing the problem.  With my VM running gaming benchmarks for over an hour, the host, unRAID, didn't lock up!

     

    The things I did while diagnosing that I didn't reverse:

    1. Upgraded the Windows 10 VM from SeaBIOS to OVMF using this solution

    2. Upgraded the machine type from i440fx-2.10 to i440fx-3.0

    3. Changed the USB controller from 3.0 (nec XHCI) to 3.0 (qemu xhci)

    4. Upgraded all the VirtIO drivers to the latest downloadable from unRAID (virtio-win-0.1.160-1)

    The above didn't seem to fix the problem.

    5. Used MSI_util_v2, found at http://www.mediafire.com/file/2kkkvko7e75opce/MSI_util_v2.zip, to turn on MSI interrupts for all hardware in the util's list.

     

    #5 is what fixed the lock up issue as far as I can tell.  I will know when I update you next week.
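
    For anyone who wants to see what such a utility changes: Windows records the per-device MSI setting in the registry value `MSISupported`, under the device's `Interrupt Management\MessageSignaledInterruptProperties` key. The following read-only sketch, to be run inside the Windows guest, lists that value for every PCI device. It rests on the assumption that MSI_util_v2 toggles this same value, and the helper names are made up for illustration.

```python
ENUM_PCI = r"SYSTEM\CurrentControlSet\Enum\PCI"
MSI_SUBKEY = (r"Device Parameters\Interrupt Management"
              r"\MessageSignaledInterruptProperties")


def msi_key_path(hardware_id, instance_id):
    """Registry path (below HKEY_LOCAL_MACHINE) holding MSISupported."""
    return "\\".join([ENUM_PCI, hardware_id, instance_id, MSI_SUBKEY])


def describe(value):
    """Turn the raw MSISupported value into a readable label."""
    if value is None:
        return "MSISupported not set"
    return "MSI enabled" if value == 1 else "MSI disabled"


def _subkeys(winreg, key):
    """Yield the names of all subkeys of an open registry key."""
    index = 0
    while True:
        try:
            yield winreg.EnumKey(key, index)
        except OSError:  # raised once there are no more subkeys
            return
        index += 1


def list_msi_state():
    """Return (hardware_id, instance_id, value_or_None) per PCI device."""
    import winreg  # Windows-only standard library module

    results = []
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, ENUM_PCI) as pci:
        for hw in list(_subkeys(winreg, pci)):
            with winreg.OpenKey(pci, hw) as hw_key:
                for inst in list(_subkeys(winreg, hw_key)):
                    try:
                        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE,
                                            msi_key_path(hw, inst)) as key:
                            value, _ = winreg.QueryValueEx(key, "MSISupported")
                    except OSError:  # key or value missing for this device
                        value = None
                    results.append((hw, inst, value))
    return results
```

    Printing `describe(value)` for each tuple returned by `list_msi_state()` gives a quick overview of which devices have MSI turned on. The sketch deliberately does not write to the registry; a change to `MSISupported` needs administrator rights and only takes effect after the guest is rebooted.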

    Link to comment
    1 hour ago, limetech said:

    I found a bug report on Launchpad relating to this very problem.

    https://bugs.launchpad.net/qemu/+bug/1580459

     

    It's an old bug report, so I don't know why I only just hit the problem.  Maybe it was made worse when QEMU was upgraded to 3.0, or by a combination with the new kernel.  Who knows, but this is the bug report that ultimately led me to the fix.  I suggest adding a note to the Windows VM help page telling users to make sure they turn on MSI interrupts.

     

     

    Link to comment

    As promised, I am writing to tell you the outcome.  unRAID no longer locks up.  Changing all interrupts to MSI interrupts has stopped the original problem I opened this thread about.

     

    I've been researching and haven't found a definitive answer yet, but I'm starting to think that the kernel update in 6.6.0 and/or the QEMU update removed backward compatibility between the old interrupt system, which Micro$haft still uses, and the newer MSI interrupt system, which Linux adopted a long time ago in technology time.  The original complaint of this thread has been cured by enabling MSI interrupts.

     

    Thank you everyone for your help! 

    Link to comment

    ***WARNING WARNING WARNING!!!***  Do not install the nVidia 2018-12-12 driver.  The problem comes back and cannot be removed by re-enabling MSI interrupts!!  I can't even get it working again by rolling back the driver.

    Link to comment

    OK, so the once-and-for-all fix for this problem was to create a new Windows 10 VM using all the steps I followed above.  I hated having to configure a whole new VM, but I did.  The lesson learned here is to back up your vDisk once you have everything installed and running properly, so all you have to redo is updates instead of reconfiguring everything from scratch.

    Link to comment
    On 12/31/2018 at 7:50 PM, Rudder2 said:

    OK, so the once-and-for-all fix for this problem was to create a new Windows 10 VM using all the steps I followed above.  I hated having to configure a whole new VM, but I did.  The lesson learned here is to back up your vDisk once you have everything installed and running properly, so all you have to redo is updates instead of reconfiguring everything from scratch.

    Have you tried the newest driver version yet?

    Version: 417.71  WHQL

    Release date: 2019.1.15

    Operating system: Windows 10 64-bit

    Language: German

    File size: 545.03 MB

     

    I ask because I had the same issue as you and am currently running an older

    NVIDIA Driver

    [411.63]

    517.11 MB

    2018/09/20 for my GeForce RTX™ 2080 GAMING OC 8G.

    Link to comment
    1 hour ago, Technikte said:

    Have you tried the newest driver version yet?

    Version: 417.71  WHQL

    Release date: 2019.1.15

    Operating system: Windows 10 64-bit

    Language: German

    File size: 545.03 MB

     

    I ask because I had the same issue as you and am currently running an older

    NVIDIA Driver

    [411.63]

    517.11 MB

    2018/09/20 for my GeForce RTX™ 2080 GAMING OC 8G.

    After reinstalling the Windows 10 VM I'm now on the latest driver without issues.  On my old VM I just can't use GPU passthrough anymore.  I wish I could figure out what changed, but I don't have the time to compare the two Windows installs.

    Link to comment

    Also, make sure that the MSI interrupts are on.  I had the system lock up again: when Windows 10 upgraded, it turned the MSI interrupts off again and the problem returned.  I turned the MSI interrupts back on and the problem was cured again.  It is now 100% confirmed that the root problem is the Windows 10 VM causing the system to hang when it's not using MSI interrupts.

     

    I just wanted to post this for completeness, to help the next person reading it, because I had the problem again this weekend when Windows 10 updated.
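
    Because a Windows update can silently switch a device back, it can help to check from the unRAID host which interrupt mode a passed-through device is using while the VM is running. In /proc/interrupts, the vfio-pci driver labels its entries `vfio-intx(...)` for line-based interrupts and `vfio-msi[n](...)` or `vfio-msix[n](...)` for message-signaled ones. The sketch below parses that text; it assumes the devices are bound to vfio-pci and that Python is available on the host, and the function names are made up for illustration.

```python
import re

# Matches e.g. "vfio-intx(0000:01:00.0)" or "vfio-msi[0](0000:01:00.0)".
VFIO_IRQ = re.compile(r"vfio-(intx|msix|msi)(?:\[\d+\])?\(([0-9a-fA-F:.]+)\)")


def vfio_interrupt_modes(proc_interrupts_text):
    """Map each passed-through PCI address to the set of modes seen for it."""
    modes = {}
    for kind, address in VFIO_IRQ.findall(proc_interrupts_text):
        modes.setdefault(address, set()).add(kind)
    return modes


def devices_on_intx(proc_interrupts_text):
    """Sorted PCI addresses that currently use line-based (INTx) interrupts."""
    modes = vfio_interrupt_modes(proc_interrupts_text)
    return sorted(addr for addr, kinds in modes.items() if "intx" in kinds)
```

    Reading /proc/interrupts on the host and passing its contents to `devices_on_intx` lists the devices worth re-checking inside the guest. An empty list only means that no vfio device is on line-based interrupts at that moment, so the check is meaningful only while the VM is up.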

    Link to comment




