• 6.6.0 Bug: M.2 NVMe Issues


    Jerky_san
    • Minor

    I don't know if this is an AMD X399 Zenith board bug or an Unraid bug, so I'm posting this to verify. I was using my machine and got the error below. I force-stopped the VM and it killed Unraid completely: 100% unresponsive, so I had to hard reset. This isn't the exact message it logged, but it's very similar, and it was the only thing in the syslog before it occurred.

    Sep 26 15:08:57 Tower kernel: iommu ivhd2: AMD-Vi: Event logged [
    Sep 26 15:08:57 Tower kernel: iommu ivhd2: INVALID_DEVICE_REQUEST device=00:00.0 pasid=0x00000 address=0xfffffffdf8000000 flags=0x0a00]
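
    In case it helps anyone catch this live, watching the syslog for AMD-Vi events from the Unraid console should work with standard Linux tools (a minimal sketch; I'm assuming the default /var/log/syslog location):

    # follow the syslog and print only AMD-Vi IOMMU events as they arrive
    tail -f /var/log/syslog | grep --line-buffered 'AMD-Vi'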

    tower-syslog-20180926-1518.zip





    Recommended Comments

    Had it happen again. Unraid survived this time, so I was able to pull logs. It appears my M.2 drive became fully unresponsive and was dropped from the system. I have two M.2 NVMe drives. One wouldn't work with Unraid, period; it simply wouldn't boot any VMs. That one was an HP. The other is a Samsung, and since Space Invaders seemed to have good luck with the Samsung, I tried that one and it worked. I used it throughout the beta without issue, but now for some reason it has started to hiccup.
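
    A quick way to confirm the drive really fell off the PCIe bus is to compare lspci output before and after a hang (0000:41:00.0 is my device's address from the log below; yours will differ):

    # list NVMe controllers currently visible on the bus
    lspci -nn | grep -i 'non-volatile'
    # check link and device status for one specific device
    lspci -vv -s 41:00.0 | grep -iE 'lnksta|devsta'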

     

    Sep 27 11:53:23 Tower kernel: vfio-pci 0000:41:00.0: timed out waiting for pending transaction; performing function level reset anyway
    Sep 27 11:53:25 Tower kernel: vfio-pci 0000:41:00.0: not ready 1023ms after FLR; waiting
    Sep 27 11:53:26 Tower kernel: vfio-pci 0000:41:00.0: not ready 2047ms after FLR; waiting
    Sep 27 11:53:28 Tower kernel: vfio-pci 0000:41:00.0: not ready 4095ms after FLR; waiting
    Sep 27 11:53:32 Tower kernel: vfio-pci 0000:41:00.0: not ready 8191ms after FLR; waiting
    Sep 27 11:53:41 Tower kernel: vfio-pci 0000:41:00.0: not ready 16383ms after FLR; waiting
    Sep 27 11:53:58 Tower kernel: vfio-pci 0000:41:00.0: not ready 32767ms after FLR; waiting
    Sep 27 11:54:34 Tower kernel: vfio-pci 0000:41:00.0: not ready 65535ms after FLR; giving up
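
    Next time, instead of a hard reset, I might try removing the stuck device from the bus and rescanning. This is just a sketch of the standard sysfs method, and there's no guarantee it helps if the controller itself is hung:

    # detach the wedged device from the PCI bus (address from the log above)
    echo 1 > /sys/bus/pci/devices/0000:41:00.0/remove
    # ask the kernel to re-enumerate the bus
    echo 1 > /sys/bus/pci/rescan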

    tower-diagnostics-20180927-1155.zip

    6 hours ago, jbartlett said:

    Do you think that the M.2 drive failed and it's killing your system? A more descriptive title would be helpful.

    Doubtful, to be honest. It's not impossible, but I used it in my old system for 6 months without issue. SMART comes back 100% good, and it only has 3 TB of writes and 3.3 TB of reads. No flags are being set to indicate a problem.
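
    For anyone wanting to check their own drive the same way, the NVMe SMART data can be pulled with either smartctl or nvme-cli (the /dev/nvme0 path is just an example; use whatever your drive enumerates as):

    # full SMART/health readout via smartmontools
    smartctl -a /dev/nvme0
    # equivalent health log via nvme-cli
    nvme smart-log /dev/nvme0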


    I've got two Unraid systems, both with two M.2 NVMe drives as well, VMs running off them, and no issues so far.

     

    It wasn't quite clear, but are the two M.2 drives the same model? If so, check out my DiskSpeed docker app and run a benchmark on both systems (assuming you can access both currently) to see if they have similar speed graphs. Just because no flags are being raised doesn't mean there aren't any problems.
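
    If you can't get the app running on one of the systems, even a crude sequential read from the console would expose a gross speed difference. This only reads, so it won't touch data, and the device path is just an example:

    # read 4 GiB sequentially, bypassing the page cache, and report throughput
    dd if=/dev/nvme0n1 of=/dev/null bs=1M count=4096 iflag=direct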

     

     

6 hours ago, jbartlett said:

    It wasn't quite clear, but are the two M.2 drives the same model? If so, check out my DiskSpeed docker app and run a benchmark on both systems...


    One is an HP EX920 1TB and the other is a Samsung 950 Pro. The HP simply refuses to boot in a virtual machine, throwing a strange error (shown below) that I reported a while ago, but no one really responded on why. Searching around, I found this link: https://lists.gnu.org/archive/html/qemu-devel/2016-09/msg02385.html

     

    Internal error: qemu unexpectedly closed the monitor: 2018-06-04T23:48:57.302131Z qemu-system-x86_64: -device vfio-pci,host=01:00.0,id=hostdev2,bus=pci.0,addr=0x8: vfio error: 0000:01:00.0: failed to add PCI capability 0x11[0x50]@0xb0: table & pba overlap, or they don't fit in BARs, or don't align
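
    From that thread, the error means the device's MSI-X table and PBA share space in a BAR in a way QEMU won't accept. If I understand the later developments correctly, newer QEMU builds (2.12 and up, if memory serves) added an experimental vfio-pci option to relocate the MSI-X table, which might be worth a try; I haven't verified it against the EX920:

    # experimental: ask QEMU to move the MSI-X table to its own BAR
    -device vfio-pci,host=01:00.0,x-msix-relocation=auto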

     

    The Samsung was working fine for at least a month before this issue cropped up. The complicating factor is that this is an X399 Zenith board with a 2990WX. I've read about others here having issues with different NVMe drives on X399 as well, so I wonder whether it's really an Unraid problem or an X399 platform problem.

     

    As for the speed test: I already mirrored the NVMe onto an SSD and passed that through. It locked up twice yesterday, and the second time it took Unraid down with it again. Previously, when I ran speed tests with CrystalDiskMark, it ran at the speed a 950 Pro would be expected to.




