Hi,
I appreciate this is a long ramble without a lot of specific info, but I'm hoping to use this thread to keep track of further issues as they occur and then hopefully I can get to the bottom of what is wrong.
I've recently been trying out Unraid after hearing about it several years ago. I started with an Intel Z170 board and a crappy dual-core Celeron as a test, and once I was happy I moved my install over to new hardware.
I now have a Ryzen 2700 on an ASRock X370 Taichi board, with 32GB of Crucial memory, and an RX 580.
I have 4x 4TB drives in the array, and 2x 800GB Intel SSDs as cache.
I've been having lots of issues, and every time I fix one I think everything's good, only to be hit with another problem. I haven't been keeping good notes, so I apologise for not having better information. It feels like I've been having multiple separate issues, and I think I have fixed quite a few of them.
Many of the issues revolve around my Windows 10 VM. I couldn't get the GPU working at all without upgrading to the latest X370 BIOS. This seems to have been a known issue, and I was relieved to find a simple solution. I still had issues installing Windows, but after switching to Q35 and SeaBIOS, I was able to get a working VM. I've still had some issues with stability, usually when first starting the VM. I was running my RAM at 3200 (which it's rated for), but I have reset it to defaults to see if that helps with stability.
My cache drives have been sporadically reporting a small number of CRC errors. I thought this might be the cables, but replacing them does not appear to have helped. The other likely culprit is the Icy Dock hot swap bay they're in. I'm going to remove that and see if that fixes it. It's possible that I first saw the CRC errors when the drives were connected without the Icy Dock, but I can't be sure. A few days ago I woke to find my cache read-only, along with some associated warnings. I think there may have been some warnings regarding the array as well. I wasn't able to remount the cache pool, but I was able to mount both disks manually, copy all the data to the array, reformat, and copy it back. They've behaved OK since, apart from a small number of CRC errors.
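Since I haven't been keeping good notes, something like the rough sketch below could log the CRC counter for each cache drive over time. It assumes that `smartctl -A` reports the counter as SMART attribute 199 in the usual ten-column table (the attribute name varies by vendor, so it matches on the ID only). The function names, the device path, and the log path are placeholders of mine, not anything Unraid provides.

```python
import subprocess
from datetime import datetime
from typing import Optional


def parse_crc_count(smartctl_output: str) -> Optional[int]:
    """Return the raw value of SMART attribute 199, or None if it is absent.

    Assumes the standard `smartctl -A` table layout, where the raw value
    is the tenth whitespace-separated column of the attribute row.
    """
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with the numeric ID; 199 is the CRC counter.
        if len(fields) >= 10 and fields[0] == "199":
            try:
                return int(fields[9])
            except ValueError:
                return None
    return None


def log_crc(device: str, log_path: str) -> None:
    """Append a timestamped CRC count for `device` to `log_path`."""
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True
    )
    count = parse_crc_count(result.stdout)
    with open(log_path, "a") as fh:
        fh.write(f"{datetime.now().isoformat()} {device} crc={count}\n")
```

Calling `log_crc("/dev/sdb", "/boot/crc-log.txt")` on a schedule (both paths are examples only) would leave a record that survives a reboot, so a jump in the count could be matched against cable or dock changes.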
Sometimes when I start my Windows VM, I get an error on screen:
amd-vi completion-wait loop timed out
A reboot of the system seems to solve that, but it will often return on subsequent boots of the VM until I reboot the whole system. I added a BIOS ROM for my card, but this does not appear to have helped. When testing this last night, I got the error and noticed high CPU usage on some cores whilst the VM was stuck in this state. I killed the VM and tried again. I got the same error again, but after waiting, the screen went blank and was eventually replaced by a Windows startup error complaining 'an unexpected IO error has occurred'. I looked in the Unraid UI and expected to see issues with some disks, but everything looked good. I killed the VM and went to bed, leaving my CrashPlan backup running in Docker, backing up array files. The VM disk is on the cache pool.
I woke up today to find two of my array drives reporting exactly 3 errors each. One was the parity drive. Stupidly, I've since rebooted, which reset the error count. I've had no errors since, and I'm 20% through a parity check that I started afterwards. No errors there yet either.
I'm completely lost as to what is going on. Last night before the issues, I used my Windows VM for several hours with GPU passthrough without a single glitch or issue. No disk errors, CRC or otherwise. When it's working, it seems rock solid... Until suddenly it isn't.
tower-diagnostics-20200225-0913.zip