June 8, 2024
Hi everyone, I'm pulling my hair out over this one and would love some help.

My server had been stable for months with no issues, so I decided to add an old GPU I had lying around to help with media transcoding. To do so, I moved a couple of SATA connectors to other ports. A couple of days later, my parity drive started going offline with read errors and was subsequently disabled by Unraid. Strange, I thought, so I rebooted. After the reboot, disk 3 appeared to be missing, even though no physical changes had been made (I was away for work). After a couple more reboots, it reappeared. I booted the server and let parity rebuild, but after a couple of hours the same thing happened again.

I resigned myself to leaving it offline for a few days until I got home and could remove the GPU, since that was the only thing that had changed, and return the SATA connectors to their original ports. That worked great, for about a week. Then this morning I woke up to all the same errors: millions of parity read errors and a few hundred read errors on disks 1-4. I'm at a loss as to what would cause this.

My specs are below:
Rosewill Rack Mount Chassis
MSI B450-A PRO MAX
AMD Ryzen 7 3700X
PCI SATA Expansion Card (4 ports)
2x 8TB Seagate
1x 4TB Seagate
1x 4TB WD
2x 3TB WD

2 drives are plugged into the motherboard SATA ports, and 4 are plugged into the PCI expansion card. My diagnostics are attached.
mccoyserver-diagnostics-20240608-1135.zip
June 8, 2024 (Author)
3 hours ago, itimpi said: Are you sure your PSU is up to the load?
Yes, it ran for months without issue before I added the GPU. I have removed the GPU to lower the power draw.
June 8, 2024
All 4 disks connected to the Marvell controller dropped offline at the same time, suggesting a controller issue, and these Marvell controllers are known to do this; multiple users have run into it.
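The reasoning here is that several disks failing in the same moment points at something they share (a controller, cable, or power rail) rather than at the disks themselves. A minimal Python sketch of that check is below, assuming syslog lines in the format quoted later in this thread ("Jun 9 02:27:18 McCoyServer kernel: ataN.00: disable device"). The function name `simultaneous_drops` and the 60-second window are hypothetical choices, and which ataN ports belong to which controller still has to be confirmed separately, for example from the diagnostics.

```python
import re
from datetime import datetime

# Matches kernel lines such as:
# "Jun 9 02:27:18 McCoyServer kernel: ata15.00: disable device"
DISABLE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"(?P<port>ata\d+)(?:\.\d+)?: disable device"
)


def simultaneous_drops(lines, window_s=60, year=2024):
    """Return lists of ata ports disabled within window_s seconds of the
    first event in their group (hypothetical helper, not an Unraid tool)."""
    events = []
    for line in lines:
        m = DISABLE_RE.match(line)
        if m:
            # Syslog timestamps omit the year, so one is supplied (assumption).
            ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
            events.append((ts, m["port"]))
    events.sort()

    groups, current = [], []
    for ts, port in events:
        # Start a new group once an event falls outside the window.
        if current and (ts - current[0][0]).total_seconds() > window_s:
            groups.append(current)
            current = []
        current.append((ts, port))
    if current:
        groups.append(current)

    # Only groups involving more than one distinct port hint at a shared cause.
    return [
        sorted({p for _, p in g})
        for g in groups
        if len({p for _, p in g}) > 1
    ]
```

A group containing every port on one add-in card would be consistent with the controller explanation; a lone port dropping, as in the later parity-disk log, would not.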
June 8, 2024 (Author)
1 minute ago, JorgeB said: All 4 disks connected to the Marvell controller dropped offline at the same time, suggesting a controller issue, and these Marvell controllers are known to do this; multiple users have run into it.
Okay, good to have a place to start. I'll replace it and test. I wonder why it started with the GPU addition; do you think it was something PCI-related?
June 8, 2024 (Author)
1 hour ago, JorgeB said: It could be.
I have connected my M.2 drive through a PCI adapter, which lets me use all 6 SATA ports on my motherboard for the remaining drives, and I removed the SATA PCI card. I'll order an HBA so I can move the M.2 cache drive back to its normal slot, and I'll report back once I know for sure whether it's solved. Parity is rebuilding now.
June 9, 2024 (Author)
15 hours ago, JorgeB said: All 4 disks connected to the Marvell controller dropped offline at the same time, suggesting a controller issue, and these Marvell controllers are known to do this; multiple users have run into it.
Unfortunately, even with the Marvell controller removed, my parity drive is still throwing read errors and then getting disabled. I have attached the new diagnostics.
mccoyserver-diagnostics-20240609-1008.zip
June 9, 2024
Jun 9 02:26:48 McCoyServer kernel: ata15: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jun 9 02:27:18 McCoyServer kernel: ata15.00: qc timeout (cmd 0xec)
Jun 9 02:27:18 McCoyServer kernel: ata15.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jun 9 02:27:18 McCoyServer kernel: ata15.00: revalidation failed (errno=-5)
Jun 9 02:27:18 McCoyServer kernel: ata15.00: disable device

The parity disk dropped offline; this is most often a power or connection issue.
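When the syslog is long, a small Python sketch like the one below can pull out the same per-port evidence as the excerpt above: the negotiated link speed and the failure messages that led to the device being disabled. The function name `summarize_ports` and the four messages it matches are assumptions taken from this one excerpt; they are not an exhaustive list of libata errors.

```python
import re
from collections import defaultdict

# Patterns based only on the log excerpt above (assumption: same syslog format).
LINK_RE = re.compile(r"kernel: (ata\d+): SATA link up ([\d.]+) Gbps")
FAIL_RE = re.compile(
    r"kernel: (ata\d+)\.\d+: "
    r"(qc timeout|failed to IDENTIFY|revalidation failed|disable device)"
)


def summarize_ports(lines):
    """Collect link speeds and failure messages per ata port (hypothetical helper)."""
    summary = defaultdict(lambda: {"link_gbps": [], "errors": []})
    for line in lines:
        if m := LINK_RE.search(line):
            summary[m[1]]["link_gbps"].append(float(m[2]))
        elif m := FAIL_RE.search(line):
            summary[m[1]]["errors"].append(m[2])
    return dict(summary)
```

Run over the five lines quoted above, it reports `ata15` with a 1.5 Gbps link followed by a qc timeout, a failed IDENTIFY, a failed revalidation and the device being disabled, with no other port involved.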