January 31, 2025

Hi all,

Over my time using Unraid (~2 years) I've had a recurring issue where a drive (usually only one at any given time) starts to regularly log the following errors:

kernel: ata4: COMRESET failed (errno=-16)
kernel: ata4: hard resetting link

Each time I've tried multiple SATA cables and power cables, and eventually chalked it up to drive degradation (as I have purchased manufacturer-refurbished drives in the past). A couple of things, though: this most commonly happens on the newer drives (not the oldest in the array), and it has occurred on multiple motherboards and their respective SATA controllers.

It started happening again about 36 hours ago on one of my drives, and I thought it was time to see if someone else could spot something I'm not seeing. I'm posting the diagnostics, and I'd appreciate any input on:

1.) Whether the drive (or any of the drives) looks bad and needs to be replaced
2.) What else may be occurring that I should look into remediating

Please note: I am running this instance of Unraid as a VM on Proxmox, but I am passing through the onboard SATA controller directly. Also, this was occurring even a few months ago when I had Unraid running on bare metal.

-W

citizenur-diagnostics-20250131-1302.zip
January 31, 2025 (Community Expert)

It's not logged as a disk problem, but if you have already replaced the cables... Does it happen only with the white label drives?
January 31, 2025 (Author)

Unfortunately no. I think this has happened to 3-4 drives over the last two years. I believe two of them were white label, but the other was a manufacturer-branded WD drive.
February 1, 2025 (Community Expert, Solution)

If you have a spare, try a new PSU, assuming it's still the same.
February 1, 2025 (Author)

7 hours ago, JorgeB said: "If you have a spare, try a new PSU, assuming it's still the same."

Ok, I'm going to give this a try. I don't have a spare one lying around, but I have one arriving tomorrow that I'll try, and I'll report back.
February 3, 2025 (Author)

On 2/1/2025 at 2:42 AM, JorgeB said: "If you have a spare, try a new PSU, assuming it's still the same."

I'm not fully declaring victory yet, but a new PSU seems to have either fixed things or at the very least made things much better. I'm nearly 17 hours (68%) into a parity check, and a few things are worth noting:

1.) There have been no COMRESET errors (or any logged errors at all) for any of the drives
2.) The parity check is identifying and correcting quite a few sync errors (presumably created by the disk / communication issues during the parity rebuild that was running when I was seeing a high number of COMRESET errors)

I'll report back when the parity check is complete and after I've had time to add one of the other drives I gave up on due to this issue in the past. At that point I'm hopeful I'll be able to mark the new PSU as the solution to this thread.
February 5, 2025 (Author)

On 2/1/2025 at 2:42 AM, JorgeB said: "If you have a spare, try a new PSU, assuming it's still the same."

Ok, reporting back: I can definitively confirm this was the solution. I have confirmed it in two ways:

1.) The drive whose errors led me to start this thread was the parity disk in the array. I have completed a full parity rebuild on it; the rebuild corrected ~25k parity errors, and the disk threw no errors during the whole rebuild (and hasn't thrown any since)
2.) I put another white label drive, which had persistently thrown COMRESET errors the last time it was in service (and which I had given up on), back in the server and successfully completed a full pre-clear cycle (pre-read, zeroing, and post-read) without it throwing any errors.

Thank you for the suggestion. Looks like I fell victim to a cheap PSU purchase.

Edited February 5, 2025 by citizen_y
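For anyone comparing error rates before and after a hardware change like this, here is a minimal sketch of tallying COMRESET failures per ATA port from syslog text. It assumes the kernel lines look like the ones quoted in the first post; the function name, the sample timestamps, the hostname `tower`, and the `ata2` entry are made up for illustration and are not from the poster's diagnostics.

```python
import re
from collections import Counter

# Matches lines such as: "kernel: ata4: COMRESET failed (errno=-16)"
COMRESET_RE = re.compile(r"kernel: (ata\d+): COMRESET failed \(errno=-?\d+\)")


def count_comreset(lines):
    """Return a Counter mapping ATA port name to number of COMRESET failures."""
    counts = Counter()
    for line in lines:
        match = COMRESET_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Illustrative sample only (hypothetical timestamps, hostname, and ata2 line).
sample = [
    "Jan 31 12:00:01 tower kernel: ata4: COMRESET failed (errno=-16)",
    "Jan 31 12:00:01 tower kernel: ata4: hard resetting link",
    "Jan 31 12:05:09 tower kernel: ata4: COMRESET failed (errno=-16)",
    "Jan 31 12:07:30 tower kernel: ata2: COMRESET failed (errno=-16)",
]

if __name__ == "__main__":
    for port, n in sorted(count_comreset(sample).items()):
        print(f"{port}: {n} COMRESET failure(s)")
```

To use it on a real log, pass an open file object instead of `sample`, for example the syslog extracted from a diagnostics zip. A count that drops to zero across a full parity check, as reported above, is the kind of signal to look for.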