Hello,
Longtime lurker, first time asking for assistance. My system is constantly crashing hard and I am running out of steps to troubleshoot by myself.
Gigabyte UD3 / 4970 / 32GB
500W psu
LSI 9200-8i
2x 4TB + 2x 2TB drives in xfs on array (off mobo)
8x 250GB ssd in RAID10 btrfs on cache (off hba)
1x 500GB ssd (off mobo)
*4TB parity disk (currently removed from array)
*Radeon 7870 (currently physically removed)
It started with errors off disk1 (xfs corruption), which was a 1-year-old 4TB drive. Attempts to rebuild resulted in the system locking up, and I was becoming unsure of the integrity of the parity disk as well. I removed the parity disk from the array and ran an xfs repair on disk1, which came up ok.
Various errors made me doubt the integrity of the sata cables, so these were changed. I also moved all the cache ssds to the hba instead of the mobo (it was a 6/2 mix prior).
Other changes/tests to try to isolate the problem:
-Removed graphics card
-Moved HBA to different slot (was on pcie2.0x4 slot, moved to pcie3.0x8)
-Ran memtest - full test, 1 pass - all ok
-Diskcheck in maintenance mode - all ok
-VMs and Docker turned off
-Leaving idle in maintenance mode
-Ran extended SMART test on a few drives - all ok
While idling in maintenance mode it might run for 4-8 hours before locking up. I've been observing this all week from work. I am currently trying to preclear the parity disk to use it again; however, running the preclear seems to bring about an error much earlier. I managed to pull a diagnostic very close to the most recent crash.
Any help greatly appreciated
Cheers,
Andy
ap-ur01-diagnostics-20200208-1413.zip syslog.txt