• Kernel panic and server lockup on VM restart 6.8.0-rc3


    glennv
    • Retest Minor

    AMD reset bug seems to have made my situation worse instead of better with the new 6.8 rc release

    Now the Unraid server kernel panics and freezes, forcing a hard reset, when I restart my (in itself fine running) OSX VM with a passed-through MSI Radeon 56.

    Before (up to 6.7.2) it just froze the VM upon a restart. So with the 6.8 rc it is way worse!

    I also tested the VM with the newer (4.1) QEMU BIOS to rule that out as the source of the errors, since I saw it had been updated. When I tested rc1 without doing that, the server crashed immediately on a graceful shutdown of the VM. Now on rc3 it only dies on a restart or on consecutive starts.

    Still not usable, so I will revert.

    Attached diags and screenshots:

     

    syslog_during_start_and_then_restart_of_VM.png
    just_before_server_lockup.png
    boom_dead.jpg

    tach-unraid-diagnostics-20191018-0950.zip
    tach-unraid-diagnostics-20191018-0954.zip




    User Feedback

    Recommended Comments

    And how would that make sense in this case? The machine runs 24x7 and has never crashed for anything in years. It crashes only, and repeatably, when I run the new rc, and only when I stop or restart my VM with the AMD card passed through.

    Without that card, or on any older release, no crashes. Ever.

    I run multiple ZFS pools, a large btrfs cache, etc. Zero issues. (They apparently even survive these crashes, so this is an unintended but good test of my redundancy setup.)

    So although memory issues can cause random crashes, that would not be my first thought here: the crashes are not random, they do not happen at normal runtime, and there is a very clear history and relation with the AMD card not resetting properly and now shooting the server in the head.

    If I leave the VM (used for heavy DaVinci Resolve rendering, Nuke/Houdini render farm duties, etc.) running, it runs indefinitely without issues.

    I went back for now to the older release, where the same VM just freezes on a restart (in 90% of cases; some rare restarts work).

    P.S. The last full 24-hour memory test was about 4 months ago.

    Edited by glennv
    Link to comment

    Running an extensive mem test now anyway, to preempt the inevitable questions of whether I ran a test and whether I tried unplugging and replugging the server 😝

    Won't hurt anyway.

    Link to comment

    P.S. FYI, I also took all the other PCIe devices (USB and 10G) out of the VM XML to make sure, and confirmed that it is GPU related and not coming from any of those devices.
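
    For anyone repeating that check, here is a minimal sketch of how to list which host PCI devices a libvirt domain XML still passes through. It assumes the standard libvirt `<hostdev type='pci'>` layout; the function name and the example XML (including the PCI addresses) are hypothetical and are not taken from the diagnostics attached to this report.

```python
import xml.etree.ElementTree as ET
from typing import List


def passthrough_pci_addresses(domain_xml: str) -> List[str]:
    """Return host PCI addresses (dddd:bb:ss.f) for every <hostdev type='pci'> entry."""
    root = ET.fromstring(domain_xml)
    addresses = []
    for hostdev in root.findall("./devices/hostdev[@type='pci']"):
        # The host-side address lives under <source>, not the guest-side <address>.
        addr = hostdev.find("./source/address")
        if addr is None:
            continue
        domain = int(addr.get("domain", "0x0000"), 16)
        bus = int(addr.get("bus"), 16)
        slot = int(addr.get("slot"), 16)
        function = int(addr.get("function"), 16)
        addresses.append(f"{domain:04x}:{bus:02x}:{slot:02x}.{function:x}")
    return addresses


# Hypothetical example: a GPU and its audio function, nothing else passed through.
EXAMPLE_XML = """
<domain type='kvm'>
  <name>example-vm</name>
  <devices>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
      </source>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x00' function='0x1'/>
      </source>
    </hostdev>
  </devices>
</domain>
"""

if __name__ == "__main__":
    print(passthrough_pci_addresses(EXAMPLE_XML))
```

    If only the GPU and its audio function show up in the output, the USB and network controllers are no longer part of the passthrough set.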

    Link to comment
    16 hours ago, glennv said:

    Running an extensive mem test now anyway, to preempt the inevitable questions of whether I ran a test and whether I tried unplugging and replugging the server 😝

    Won't hurt anyway.

    About 16 hours later, after hammering the memory sticks with 24 cores (of 48) using the latest downloaded memtest: as totally expected, zero issues (other than a large energy bill).

    Edited by glennv
    Link to comment

    Were any changes made that require a retest (as I see you changed the status)? I am now running 6.7.2 but am willing to retest if something changed in the code. I did not see anything related in the rc4 change log.

    Edited by glennv
    Link to comment
    Quote

    AMD reset bug seems to have made my situation worse instead of better with the new 6.8 rc release

     

    What do you mean here by "AMD reset bug"?

    Link to comment
    2 minutes ago, limetech said:

     

    What do you mean here by "AMD reset bug"?

    I thought that was clear by now. (Or maybe I should have written "the effects of the bug".)

    The bug whereby a passed-through AMD card causes a VM (at least in my case, my OSX VM with a Vega 56) to be startable only once. It will run fine indefinitely and everything works, but when you restart the VM it hangs, as "apparently" the AMD card does not get properly reset.

    That was the case for me on everything before 6.8 rcX.

    In 6.8 rc a patch was introduced to tackle this bug, according to the change logs. For me it had the side effect of not just hanging the VM but panicking and hanging the complete Unraid server (see logs, screenshots, etc.). Consistently and repeatably.

    Reverting to 6.7.2 returns it to the normal buggy behavior of the aforementioned AMD reset bug.

    Link to comment

    I have the same failure with an AMD RX 480 + RX 580. Windows works fine with GPU reset, but no Linux OS (Ubuntu, Arch Linux, ...) does.

    Link to comment



  • Status Definitions

    Open = Under consideration.
    Solved = The issue has been resolved.
    Solved version = The issue has been resolved in the indicated release version.
    Closed = Feedback or opinion better posted on our forum for discussion. Also for reports we cannot reproduce or need more information. In this case just add a comment and we will review it again.
    Retest = Please retest in latest release.

    Priority Definitions

    Minor = Something not working correctly.
    Urgent = Server crash, data loss, or other showstopper.
    Annoyance = Doesn't affect functionality but should be fixed.
    Other = Announcement or other non-issue.