Greetings from Bavaria!
I started building and running an Unraid server about 1 to 2 months ago and noticed that one of two drives keeps failing whenever a parity check is started. Specifically, drives 3 and/or 4 show errors in the Main menu and become disabled/emulated. I checked the SAS-to-SATA connections several times and even replaced the hard drives, but have had no luck so far. Moreover, it appears that irrespective of which drive I plug into that physical tray/slot, it is likely to fail.
The strange thing is that whenever I rebuild a failed disk (i.e., by following this procedure: stop array > deselect e.g. disk 4 > start array in maintenance mode > stop array > select disk 4 again > start array), the disk rebuilds completely fine and without any errors over several hours (17 h for a 10 TB disk). Yet only minutes after starting a parity check, disk 3 or 4 fails.
Hence my guess is that the HBA PCIe card connected to the 8 drives is overheating under the heavy load of a parity check. Yet I would have guessed that the load is just as heavy (or worse) during a full disk rebuild. The card I am using is a Fujitsu D2607-A21 RAID controller flashed to IT mode. It does get hot, so I channeled air to flow across its heatsink (which looks pretty tiny in my opinion, hence supporting my assumption). According to the table on the hardware compatibility page, the Fujitsu D2607 should work and is even recommended, but I'm not sure whether the "should" implies it will work stably (also, maybe the guy who previously flashed it to IT mode made an oopsie).
Since I have little to no experience with Unraid, I wanted to ask for help and whether my assumption seems plausible: that it is not the drive(s) but the HBA card that is failing. If that appears to be the case, do you think the best course of action would be better cooling (louder fans) or a different/better (dedicated) HBA card? Or is my assumption incorrect and the problem lies somewhere else entirely?
Many thanks in advance!
EDIT:
There are two SATA cache drives connected directly to the mainboard, making the system a 10-drive system (2x parity, 2x cache and 6x data).
zerver-diagnostics-20230227-1722.zip
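In case it helps anyone looking through the diagnostics, here is a minimal sketch of how the syslog could be scanned for lines that might point at the controller or at a specific disk. The keyword list and the function name `find_suspect_lines` are my own assumptions, not a definitive list of what this HBA or Unraid actually logs, and the path to the syslog inside the zip has to be supplied by hand.

```python
import re
import sys

# Assumed keywords: guesses at what a controller or link problem might log.
# They are illustrative only and not an authoritative list for this HBA.
PATTERNS = re.compile(
    r"(mpt2sas|I/O error|read error|write error|disabled|timeout)",
    re.IGNORECASE,
)


def find_suspect_lines(lines, device_hint=None):
    """Return (line_number, text) pairs for lines matching PATTERNS.

    If device_hint is given (e.g. a device name such as "sde"),
    only lines that also contain that string are kept.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        if not PATTERNS.search(line):
            continue
        if device_hint and device_hint not in line:
            continue
        hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_syslog.py <path to extracted syslog> [device hint]
    hint = sys.argv[2] if len(sys.argv) > 2 else None
    with open(sys.argv[1], errors="replace") as handle:
        for number, text in find_suspect_lines(handle, hint):
            print(f"{number}: {text}")
```

Comparing the timestamps of any hits with the moment the parity check was started might show whether the errors begin at the controller or at one disk, but that is only a rough first filter.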