After the last incident I decided not to replace my cables (I've already done that a few times during this process) and instead removed the Dynamix SSD TRIM plugin, which I suspected was causing the issue by running the SET MANAGEMENT command. Unfortunately that looks like a red herring (I believe Unraid supports TRIM natively nowadays, so removing the plugin was never going to change anything). To nobody's surprise, both drives dropped off again today after I updated my Docker images.
What's frustrating is that Unraid doesn't actually mark these drives as offline, even though the logs show the kernel has disabled them:
Oct 24 08:41:39 Hathor kernel: ata5: COMRESET failed (errno=-16)
Oct 24 08:41:39 Hathor kernel: ata5: hard resetting link
Oct 24 08:41:44 Hathor kernel: ata5: link is slow to respond, please be patient (ready=0)
Oct 24 08:42:08 Hathor kernel: ata3: COMRESET failed (errno=-16)
Oct 24 08:42:08 Hathor kernel: ata3: limiting SATA link speed to 3.0 Gbps
Oct 24 08:42:08 Hathor kernel: ata3: hard resetting link
Oct 24 08:42:13 Hathor kernel: ata3: COMRESET failed (errno=-16)
Oct 24 08:42:13 Hathor kernel: ata3: reset failed, giving up
Oct 24 08:42:13 Hathor kernel: ata3.00: disabled
Oct 24 08:42:13 Hathor kernel: ata3: EH complete
Oct 24 08:42:13 Hathor kernel: sd 3:0:0:0: [sdb] tag#16 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=DRIVER_OK cmd_age=90s
Oct 24 08:42:13 Hathor kernel: sd 3:0:0:0: [sdb] tag#16 CDB: opcode=0x2a 2a 00 00 1a 80 80 00 00 40 00
See the attached image: the temperature showing as an asterisk (*) is the only indication that something is wrong.
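Since the GUI only shows an asterisk, one stopgap is to watch the syslog for the kernel's own libata messages. Below is a minimal Python sketch, assuming the log lines look like the excerpt above and that the syslog lives at /var/log/syslog (an assumption; adjust the path for your setup). The function names are hypothetical and not part of Unraid or any plugin.

```python
import re
from typing import Iterable

# Patterns assumed from the excerpt above, e.g.
#   "... kernel: ata3.00: disabled"
#   "... kernel: ata3: reset failed, giving up"
DISABLED_RE = re.compile(r"kernel: (ata\d+\.\d+): disabled$")
GAVE_UP_RE = re.compile(r"kernel: (ata\d+): reset failed, giving up$")


def find_dropped_devices(lines: Iterable[str]) -> dict[str, list[str]]:
    """Collect ATA devices the kernel disabled and ports whose reset gave up.

    Each name is recorded once, in the order it first appears.
    """
    disabled: list[str] = []
    gave_up: list[str] = []
    for raw in lines:
        line = raw.rstrip()  # drop newline/trailing whitespace so "$" matches
        m = DISABLED_RE.search(line)
        if m:
            if m.group(1) not in disabled:
                disabled.append(m.group(1))
            continue
        m = GAVE_UP_RE.search(line)
        if m and m.group(1) not in gave_up:
            gave_up.append(m.group(1))
    return {"disabled": disabled, "gave_up": gave_up}


def scan_syslog(path: str = "/var/log/syslog") -> dict[str, list[str]]:
    """Run find_dropped_devices over a log file (default path is an assumption)."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return find_dropped_devices(fh)


def format_warnings(result: dict[str, list[str]]) -> list[str]:
    """Turn the scan result into human-readable warning lines."""
    warnings = [f"{dev} was disabled by the kernel" for dev in result["disabled"]]
    warnings += [f"{port} reset failed and the kernel gave up" for port in result["gave_up"]]
    return warnings
```

Calling `format_warnings(scan_syslog())` from a scheduled script (for example via the User Scripts plugin) and sending a notification when the list is non-empty would surface a drop long before the asterisk gets noticed. Note that mapping an `ataN` port to its `sdX` name isn't reliable from these lines alone, so the sketch only reports the port.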
I'm going to swap the cables again as requested and move the SSDs to my PCIe HBA instead of the motherboard controller. I'm running out of ideas, and I'm reasonably convinced that either there's a latent bug in the MX500 series firmware or the drivers Unraid uses to control these drives are faulty.
..1 day later..
After swapping my SATA cables, controllers, and even power cables, I've yet again had one of my cache drives drop offline. I connected my safe_cache drives with one on the motherboard controller and the other on my SAS2008 HBA, and the motherboard-connected drive is the one that dropped. I'm going to swap the SATA cable on that one again and see what happens. I have a hard time believing this issue is cable-related, as I've already gone through about six different SATA cables for these drives, but I have more I can try.