• 6.9.0-beta25 BUG: unable to handle page fault


    BurntOC
    • Solved

    I'd created a post in the support forum, but after more searching I think I may actually be seeing a bug here.  There is some background info here, but basically the server had been rock solid for weeks with no problems.  Then I started using a Windows VM converted from a physical machine and passing my GTX 1050 Ti through to it.  That all appeared to work fine, but now my Unraid server becomes non-responsive.  Sometimes it happens late in the day, sometimes the next morning, but it is pretty consistent: I can't reach the web GUI or ssh, and usually not even the local display.  I've also run at least one pass of memtest, which finished with 0 errors.

     

    I've attached a diagnostics zip here as well, but here's what caught my eye today:

     

    Quote

    Aug 29 09:53:42 unraid kernel: BUG: unable to handle page fault for address: fffff8ef62b6de88
    Aug 29 09:53:42 unraid kernel: #PF: supervisor read access in kernel mode
    Aug 29 09:53:42 unraid kernel: #PF: error_code(0x0000) - not-present page
    Aug 29 09:53:42 unraid kernel: PGD 0 P4D 0
    Aug 29 09:53:42 unraid kernel: Oops: 0000 [#1] SMP PTI
    Aug 29 09:53:42 unraid kernel: CPU: 1 PID: 13583 Comm: php-fpm Tainted: G        W         5.7.8-Unraid #1
    Aug 29 09:53:42 unraid kernel: Hardware name: Hewlett-Packard HP Z230 Tower Workstation/1905, BIOS L51 v01.63 04/22/2020
    Aug 29 09:53:42 unraid kernel: RIP: 0010:compound_head+0x0/0x11
    Aug 29 09:53:42 unraid kernel: Code: 89 06 31 c0 c3 48 8b 17 b8 01 00 00 00 48 f7 c2 9f ff ff ff 74 13 48 b8 98 0f 00 00 00 00 f0 7f 48 85 c2 0f 95 c0 0f b6 c0 c3 <48> 8b 57 08 48 89 f8 f6 c2 01 74 04 48 8d 42 ff c3 65 48 8b 04 25
    Aug 29 09:53:42 unraid kernel: RSP: 0018:ffffc9000031fcc8 EFLAGS: 00010282
    Aug 29 09:53:42 unraid kernel: RAX: 0000000000000001 RBX: 000014a12306c000 RCX: ffff888000000000
    Aug 29 09:53:42 unraid kernel: RDX: 0000003bbd8adb7a RSI: ffffc9000031fcc4 RDI: fffff8ef62b6de80
    Aug 29 09:53:42 unraid kernel: RBP: ffffc9000031fdf8 R08: fffff8ef62b6de80 R09: 7c00003bbd8adb7a
    Aug 29 09:53:42 unraid kernel: R10: ffff8885262bf148 R11: ffffc9000031fd50 R12: ffff8884ea490af8
    Aug 29 09:53:42 unraid kernel: R13: 00000004fcc5d025 R14: ffff888465212000 R15: ffff8884652d4420
    Aug 29 09:53:42 unraid kernel: FS:  0000000000000000(0000) GS:ffff88852ba80000(0000) knlGS:0000000000000000
    Aug 29 09:53:42 unraid kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Aug 29 09:53:42 unraid kernel: CR2: fffff8ef62b6de88 CR3: 000000000200a002 CR4: 00000000001606e0
    Aug 29 09:53:42 unraid kernel: Call Trace:
    Aug 29 09:53:42 unraid kernel: migration_entry_to_page+0x19/0x2e
    Aug 29 09:53:42 unraid kernel: unmap_page_range+0x444/0x65c
    Aug 29 09:53:42 unraid kernel: unmap_vmas+0x6c/0x9a
    Aug 29 09:53:42 unraid kernel: exit_mmap+0xb2/0x14d
    Aug 29 09:53:42 unraid kernel: __mmput+0x3b/0xd6
    Aug 29 09:53:42 unraid kernel: do_exit+0x3ae/0x8f3
    Aug 29 09:53:42 unraid kernel: ? handle_mm_fault+0x117/0x165
    Aug 29 09:53:42 unraid kernel: do_group_exit+0x8e/0x8e
    Aug 29 09:53:42 unraid kernel: __x64_sys_exit_group+0xf/0xf
    Aug 29 09:53:42 unraid kernel: do_syscall_64+0x7a/0x87
    Aug 29 09:53:42 unraid kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
    Aug 29 09:53:42 unraid kernel: RIP: 0033:0x14a127414b46
    Aug 29 09:53:42 unraid kernel: Code: Bad RIP value.
    Aug 29 09:53:42 unraid kernel: RSP: 002b:00007ffce070cc88 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
    Aug 29 09:53:42 unraid kernel: RAX: ffffffffffffffda RBX: 000014a127513820 RCX: 000014a127414b46
    Aug 29 09:53:42 unraid kernel: RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
    Aug 29 09:53:42 unraid kernel: RBP: 0000000000000000 R08: 00000000000000e7 R09: ffffffffffffff78
    Aug 29 09:53:42 unraid kernel: R10: fffffffffffff8f5 R11: 0000000000000246 R12: 000014a127513820
    Aug 29 09:53:42 unraid kernel: R13: 000000000000000b R14: 000014a12751c328 R15: 0000000000000000
    Aug 29 09:53:42 unraid kernel: Modules linked in: xt_nat xt_CHECKSUM ipt_REJECT ip6table_mangle ip6table_nat iptable_mangle ip6table_filter ip6_tables vhost_net tun vhost vhost_iotlb tap macvlan xt_MASQUERADE iptable_filter iptable_nat nf_nat ip_tables xfs dm_crypt dm_mod dax md_mod e1000e igb i2c_algo_bit x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd wmi_bmof cryptd glue_helper intel_cstate intel_uncore intel_rapl_perf ahci libahci i2c_i801 i2c_core input_leds led_class wmi button thermal fan [last unloaded: md_mod]
    Aug 29 09:53:42 unraid kernel: CR2: fffff8ef62b6de88
    Aug 29 09:53:42 unraid kernel: ---[ end trace 7835d68bf5662ab2 ]---
    Aug 29 09:53:42 unraid kernel: RIP: 0010:compound_head+0x0/0x11
    Aug 29 09:53:42 unraid kernel: Code: 89 06 31 c0 c3 48 8b 17 b8 01 00 00 00 48 f7 c2 9f ff ff ff 74 13 48 b8 98 0f 00 00 00 00 f0 7f 48 85 c2 0f 95 c0 0f b6 c0 c3 <48> 8b 57 08 48 89 f8 f6 c2 01 74 04 48 8d 42 ff c3 65 48 8b 04 25
    Aug 29 09:53:42 unraid kernel: RSP: 0018:ffffc9000031fcc8 EFLAGS: 00010282
    Aug 29 09:53:42 unraid kernel: RAX: 0000000000000001 RBX: 000014a12306c000 RCX: ffff888000000000
    Aug 29 09:53:42 unraid kernel: RDX: 0000003bbd8adb7a RSI: ffffc9000031fcc4 RDI: fffff8ef62b6de80
    Aug 29 09:53:42 unraid kernel: RBP: ffffc9000031fdf8 R08: fffff8ef62b6de80 R09: 7c00003bbd8adb7a
    Aug 29 09:53:42 unraid kernel: R10: ffff8885262bf148 R11: ffffc9000031fd50 R12: ffff8884ea490af8
    Aug 29 09:53:42 unraid kernel: R13: 00000004fcc5d025 R14: ffff888465212000 R15: ffff8884652d4420
    Aug 29 09:53:42 unraid kernel: FS:  0000000000000000(0000) GS:ffff88852ba80000(0000) knlGS:0000000000000000
    Aug 29 09:53:42 unraid kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Aug 29 09:53:42 unraid kernel: CR2: 000014a127414b1c CR3: 000000000200a002 CR4: 00000000001606e0
    Aug 29 09:53:42 unraid kernel: Fixing recursive fault but reboot is needed!

     

     

     

    unraid-diagnostics-20200830-0953.zip
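
For anyone comparing their own syslog against the oops quoted above, here is a minimal sketch that pulls the faulting address, the crashing process, the kernel RIP, and the call trace out of pasted log text. The function name `summarize_oops` and the two regexes are my own illustration, not part of Unraid or the kernel, and the sketch assumes the same "date host kernel:" prefix seen in this log. Frames marked with `?` are skipped because the kernel flags those as unreliable.

```python
import re

# Matches the syslog prefix, e.g. "Aug 29 09:53:42 unraid kernel: " (assumed format).
PREFIX = re.compile(r"^\s*\w{3}\s+\d+\s+[\d:]+\s+\S+\s+kernel:\s?")
# Matches one stack frame, e.g. "do_exit+0x3ae/0x8f3" or "? handle_mm_fault+0x117/0x165".
FRAME = re.compile(r"^(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+$")


def summarize_oops(log_text):
    """Return the key facts from the first page-fault oops found in log_text."""
    fault_addr = None
    comm = None
    rip = None
    frames = []
    in_trace = False
    for raw in log_text.splitlines():
        line = PREFIX.sub("", raw).strip()
        if line.startswith("BUG: unable to handle page fault for address:"):
            fault_addr = line.rsplit(" ", 1)[-1]
        elif rip is None and line.startswith("RIP: 0010:"):
            # Segment 0010 is the kernel code segment; keep only the first hit.
            rip = line[len("RIP: 0010:"):]
        elif comm is None and "Comm:" in line:
            comm = line.split("Comm:")[1].split()[0]
        elif line == "Call Trace:":
            in_trace = True
        elif in_trace:
            match = FRAME.match(line)
            if match is None:
                in_trace = False  # first non-frame line ends the trace
            elif not match.group(1):
                frames.append(match.group(2))  # skip unreliable "?" frames
    return {
        "fault_address": fault_addr,
        "process": comm,
        "rip": rip,
        "call_trace": frames,
    }
```

Called on the log above, the `call_trace` list starts at `migration_entry_to_page` and runs down through `exit_mmap` and `do_exit`, which shows the fault hit while the kernel was tearing down the address space of an exiting `php-fpm` process.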




    User Feedback

    Recommended Comments

    So it ran for about 40 hours or so with the GTX out of the server and I thought it might be working without it at least.  Then I checked it this afternoon and it is completely non-responsive and I had to hard power off.  This was showing on the screen, in case it is of any help. 

    IMG_20200903_154013.jpg


    Changed Status to Solved

     

     

    Final update:  2 threads (this one and the original support request) and 6 days here with no response.  I do really, really like Unraid and I don't regret buying a license.  Kinda regretting buying the 2nd one now though, but maybe I'll luck out and not run into any major issues and the point will be moot.  Hope so.  In any case, I want to add this update for posterity.

     

    Current uptime is 3 days, 17+ hours, well beyond what it had been lately.  I believe the solution was pulling the 1 stick of RAM that didn't match the others.  The one memtest run I did passed, and maybe subsequent runs would have failed, but that stick has performed perfectly fine in other multifunction server installs.  In the config I had here, it seems the mismatch alone was enough to cause the errors.

    Edited by BurntOC
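
Since the fix here came down to one mismatched stick, a quick way to spot that kind of mismatch is to group the installed DIMMs by their reported properties. The sketch below is only an illustration: it assumes you have saved the text output of `dmidecode --type memory` (run as root), that the output uses the usual "Memory Device" blocks with `Size`, `Manufacturer`, `Part Number`, `Speed`, and `Locator` fields, and the function name `group_dimms` is made up for this example. It was not part of the original troubleshooting.

```python
from collections import defaultdict


def group_dimms(dmidecode_text):
    """Group populated DIMM slots by (manufacturer, part number, size, speed)."""
    dimms = []
    current = None
    for raw in dmidecode_text.splitlines():
        line = raw.strip()
        if line == "Memory Device":
            current = {}            # start of a new DIMM record
            dimms.append(current)
        elif not line:
            current = None          # a blank line ends the record
        elif current is not None and ": " in line:
            key, value = line.split(": ", 1)
            current[key] = value

    groups = defaultdict(list)
    for dimm in dimms:
        # Empty slots report "No Module Installed" as their size.
        if dimm.get("Size", "No Module Installed") == "No Module Installed":
            continue
        signature = (
            dimm.get("Manufacturer"),
            dimm.get("Part Number"),
            dimm.get("Size"),
            dimm.get("Speed"),
        )
        groups[signature].append(dimm.get("Locator", "?"))
    return dict(groups)
```

If the returned dictionary has more than one key, the installed sticks are not all identical, and the slot names in the smaller group point at the odd one out. Matching labels do not prove the memory is good, and a mismatch does not prove it is bad; it just tells you which stick to pull first when testing.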



