Stu


  1. Adding a "me too" to this issue, which came about following a power outage. Unraid 6.8.3 diags attached. I am running a UPS, and as far as I can tell the server shut down cleanly (parity was still valid when it came back up). Everything seemed normal, shares and Docker all OK, except that my VM didn't autostart. When I started it manually, it flagged an error saying that my second vdisk image, which I keep on an array-only share (for games), was missing. The vdisk that Windows is installed on is on the cache. I commented that second vdisk out of the XML and the VM booted fine, though Windows did have to do some disk checking. Let me know if I can provide any more info. tower-diagnostics-20200430-2128.zip
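A check like the following might catch a missing vdisk before the VM is started. It is only a sketch: it assumes the VM definition has been dumped to a file with `virsh dumpxml`, that vdisks appear as `<source file='...'/>` elements, and the function names and the VM name in the usage note are made up for illustration.

```bash
#!/bin/bash
# Sketch only: the function names are hypothetical, not part of Unraid or libvirt.

# Print every file-backed disk path found in a libvirt domain XML file.
list_vdisk_paths() {
    grep -o "<source file=['\"][^'\"]*['\"]" "$1" |
        sed -E "s/^<source file=['\"]//; s/['\"]\$//"
}

# Print only the vdisk paths that do not currently exist on disk.
missing_vdisks() {
    list_vdisk_paths "$1" | while IFS= read -r path; do
        [ -e "$path" ] || echo "$path"
    done
}
```

Possible usage, with "Windows 10" standing in for the real VM name: `virsh dumpxml "Windows 10" > /tmp/vm.xml`, then `missing_vdisks /tmp/vm.xml`. Empty output would mean every listed vdisk was found.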
  2. Thanks for pointing that out. I have since found a couple of other threads with similar issues that you've assisted with. The BIOS is already the latest, 2019-12-03, and the issue seems to have existed for a while, so I'll just keep going as I am. Hopefully it doesn't occur too frequently under normal usage, otherwise I might have to do away with my virtualisation plans. Thanks again.
  3. I managed to trigger the VM instability / crash. This time the array was impacted and not the cache disk. I've attached the syslog, which I had to truncate as it was >300 MB due to "Disk read errors". It occurs immediately after I run the user script to resolve the AMD Reset bug.

     The syslog snippet:

     Feb 27 15:50:07 Tower kernel: iommu ivhd0: Event logged [IOTLB_INV_TIMEOUT device=29:00.0 address=0x000000040e06b4f0]
     Feb 27 15:50:07 Tower kernel: iommu: Removing device 0000:29:00.0 from group 15
     Feb 27 15:50:07 Tower kernel: iommu: Removing device 0000:29:00.1 from group 15
     Feb 27 15:50:07 Tower kernel: PM: suspend entry (deep)
     Feb 27 15:50:17 Tower kernel: PM: Syncing filesystems ...
     Feb 27 15:50:17 Tower kernel: ata2.00: exception Emask 0x0 SAct 0x7800000f SErr 0x0 action 0x6 frozen
     Feb 27 15:50:17 Tower kernel: ata2.00: failed command: READ FPDMA QUEUED
     Feb 27 15:50:17 Tower kernel: ata2.00: cmd 60/70:00:28:76:b2/01:00:99:00:00/40 tag 0 ncq dma 188416 in
     Feb 27 15:50:17 Tower kernel: res 40/00:ff:ff:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
     Feb 27 15:50:17 Tower kernel: ata2.00: status: { DRDY }

     And then the errors cascade from there (view attached syslog).

     The script:

     #!/bin/bash
     #
     echo "disconnecting amd graphics"
     echo "1" | tee -a /sys/bus/pci/devices/0000\:29\:00.0/remove
     echo "disconnecting amd sound counterpart"
     echo "1" | tee -a /sys/bus/pci/devices/0000\:29\:00.1/remove
     echo "entered suspended state press power button to continue"
     echo -n mem > /sys/power/state
     echo "reconnecting amd gpu and sound counterpart"
     echo "1" | tee -a /sys/bus/pci/rescan
     echo "AMD graphics card sucessfully reset"

     After rebooting, disk 3 was marked as having issues. I removed it from the array and put it back in before kicking off a parity check, which rebuilt it and completed without error. Current diagnostics attached too. For now I'm going to avoid using that script and accept that a full reboot is necessary when that bug occurs.
syslog.zip tower-diagnostics-20200228-1303.zip
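Since the full syslog was too large to attach, a small helper along these lines might make it easier to pull out just the part around the suspend. It is a rough sketch: the function names are invented here, and the patterns are taken from the snippet quoted in the post rather than from any kernel documentation.

```bash
#!/bin/bash
# Sketch only: hypothetical helpers; patterns are copied from the quoted log lines.

# Print the first "PM: suspend entry" line plus the N lines that follow it.
# Usage: suspend_window <syslog-file> [lines-after, default 50]
suspend_window() {
    local file=$1 after=${2:-50}
    awk -v n="$after" '
        /PM: suspend entry/ && !found { found = 1; left = n + 1 }
        left > 0 { print; left-- }
    ' "$file"
}

# Count "failed command" entries per ATA device, printed as "<device> <count>".
failed_ata_commands() {
    grep -o 'ata[0-9]*\.[0-9]*: failed command' "$1" |
        cut -d: -f1 | sort | uniq -c | awk '{ print $2, $1 }'
}
```

For example, `suspend_window /var/log/syslog 200 > suspend-snippet.txt` would keep the suspend entry and the 200 lines after it, which is usually far smaller than the whole log.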
  4. Thanks for replying! Unfortunately I don't, although a larger cache drive has been on my wishlist for a little while and could be on the cards in the next month or two. That said, I'm not convinced there is a fault with the SSD, as I have not experienced any problem with it under any other circumstance. My hope is that I'm misconfiguring something that might be identifiable from the diagnostics attached to my original post. My only other thought is that maybe my PSU is not keeping up, as it is only a cheap 550 W unit, but I'm not sure how to check for that. If nothing else, I may have to try to set up logging so that it survives the crash, and see if those results shed any light.
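One way to get a log that survives a crash could be to copy new syslog lines to persistent storage at short intervals. The sketch below assumes the live log is a plain text file that only grows (it does not handle log rotation), and the function name and paths are assumptions for illustration, not an Unraid feature.

```bash
#!/bin/bash
# Sketch only: sync_syslog is a hypothetical helper.
# Appends any lines of <source> not yet present in <dest>, then flushes to disk.
# Assumes <source> only grows; log rotation is not handled.
sync_syslog() {
    local src=$1 dest=$2 saved=0
    [ -f "$dest" ] && saved=$(wc -l < "$dest")
    tail -n "+$((saved + 1))" "$src" >> "$dest"
    sync
}
```

It could be run every minute from a scheduled job, for example `sync_syslog /var/log/syslog /boot/logs/syslog-persistent.txt`, where both paths are assumptions about the system. Lines written between the last run and the crash would still be lost, and frequent writes add wear to a flash drive.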
  5. Hi all, Having recently found the time to dip my toes into the world of VMs and GPU passthrough, I've had some success but have also come across a rather serious issue. Twice now I have had the server crash, requiring a physical reboot. Once Unraid has booted, it reports that the cache disk is unmountable, something to do with a bad partition layout. The only way I've been able to recover is by formatting the cache disk and restoring from what backups I had. The first crash occurred with my Win 10 VM, while running Geekbench 5. The second crash occurred with a Linux VM, ElementaryOS, that I was trying to force shut down to validate that the AMD reset bug scripts I had set up worked. Diagnostics attached, which I pulled after rebooting following the second crash. Unfortunately, with the crashes I did lose the VM XML templates. I'm wary of proceeding further as each crash takes me a while to recover from. Appreciate any insight! tower-diagnostics-20200224-2220.zip