Hi All,
For the last few weeks I've been battling random hardware issues, which I think I have now finally figured out.
In summary, I was looking to add some additional hardware to the server in the form of a P1000 GPU and a separate NIC. At the same time I was going to add an additional drive (unassigned) to be used for recording IP camera streams using Blue Iris in a Windows VM.
On doing so, one of my parity drives and one of the array drives became disabled. Both disks appeared to pass their SMART tests, although I was seeing a large number of read errors and other issues that I couldn't understand. In the process of diagnosing the faults, I thought my HBA card had gone bad, as the server wouldn't boot or connect any drives, so I purchased a replacement. I also replaced all the SATA cables in the system and bought two new replacement drives (Seagate IronWolf 4TB).
At that point the server crashed and reported a series of BTRFS errors (Errno=-5 IO Failure), which I believe were caused by too many PCIe devices being connected at once. After pulling the NIC out of the system, I have been able to rebuild the array drive and the parity drive successfully.
I think that the chipset (X570) was disabling the two SATA ports. In the system I have 1x P600 GPU, 1x P1000 GPU, a 9211-8i HBA, and 2x NVMe cache drives, so I'm figuring that the full 20 lanes of PCIe were used.
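As a rough sanity check on that lane theory, here is a minimal Python sketch that adds up the lanes these devices could ask for. The link widths are assumptions (the nominal maximum width for each type of device), not values read from this system, and the 20-lane budget is simply the figure quoted above. Devices can negotiate a narrower link, and some slots may be wired through the chipset rather than the CPU, so treat the result as an upper bound on demand, not a measurement.

```python
# Nominal maximum PCIe link widths per device.
# ASSUMPTION: these are typical maximum widths, not values measured on this server.
ASSUMED_LINK_WIDTHS = {
    "P600 GPU": 16,
    "P1000 GPU": 16,
    "9211-8i HBA": 8,
    "NVMe cache drive 1": 4,
    "NVMe cache drive 2": 4,
}

# ASSUMPTION: the lane budget is the 20-lane figure quoted in the post.
LANE_BUDGET = 20


def lanes_requested(widths):
    """Total lanes requested if every device ran at its nominal width."""
    return sum(widths.values())


def oversubscribed_by(widths, budget):
    """Number of lanes requested beyond the budget (0 if within budget)."""
    return max(0, lanes_requested(widths) - budget)


if __name__ == "__main__":
    total = lanes_requested(ASSUMED_LINK_WIDTHS)
    excess = oversubscribed_by(ASSUMED_LINK_WIDTHS, LANE_BUDGET)
    print(f"Requested at nominal width: {total} lanes (budget {LANE_BUDGET})")
    print(f"Over budget by: {excess} lanes")
```

If the total comes out above the budget, that only means some devices must be running at reduced width or sharing chipset bandwidth. It does not by itself prove that ports are being disabled.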
The last battle I am now facing is that whenever I enable VMs, Disk 1 gets disabled.
Disk 1 contains both the libvirt.img file and the docker.img file.
Any ideas as to why this is happening and suggestions on how to diagnose and repair this issue?
Diagnostics posted.
Thanks for any assistance!
tower-diagnostics-20230913-0936.zip