AlexB108 Posted September 17, 2022

I am having a weird problem where my 14 TB WD disk (WDC_WD140EDFZ-11A0VA0_9KH23VWL in the diagnostics) becomes unmountable while the array is up. It has happened a couple of times. Each time, I fix it by stopping the array, starting it in maintenance mode, and running xfs_repair -L -v on the drive. That works for a while, and then the problem comes back.

I thought it was a bad SATA cable (the problem was happening every day or so), so I replaced it. I assumed that had fixed it, because the problem went away for a few weeks, but it has started happening again this week. I have run an extended SMART test and got no errors.

The warranty for the drive runs out in January, so I am trying to work out whether it is a drive problem or whether something else I am running is causing it. Since the last occurrence I have run the above command and moved all data off the drive, just in case there is a drive problem. (I also know that I should be running a parity drive, but I set the array up without one during testing and never got round to installing one. I may repurpose this drive for that if it isn't on its way out.)

Any help would be appreciated!

Alex

Attachment: tower-diagnostics-20220917-1045.zip
JorgeB Posted September 17, 2022

Sep 17 02:42:32 Tower kernel: ata4.00: status: { DRDY }
Sep 17 02:42:32 Tower kernel: ata4.00: failed command: READ FPDMA QUEUED
Sep 17 02:42:32 Tower kernel: ata4.00: cmd 60/08:e0:f8:ff:ff/00:00:7f:04:00/40 tag 28 ncq dma 4096 in
Sep 17 02:42:32 Tower kernel: res 40/00:00:60:81:01/00:00:00:06:00/40 Emask 0x50 (ATA bus error)
Sep 17 02:42:32 Tower kernel: ata4.00: status: { DRDY }
Sep 17 02:42:32 Tower kernel: ata4.00: failed command: READ FPDMA QUEUED
Sep 17 02:42:32 Tower kernel: ata4.00: cmd 60/08:e8:f0:ff:ff/00:00:ff:04:00/40 tag 29 ncq dma 4096 in
Sep 17 02:42:32 Tower kernel: res 40/00:00:60:81:01/00:00:00:06:00/40 Emask 0x50 (ATA bus error)
Sep 17 02:42:32 Tower kernel: ata4.00: status: { DRDY }
Sep 17 02:42:32 Tower kernel: ata4: hard resetting link
Sep 17 02:42:38 Tower kernel: ata4: link is slow to respond, please be patient (ready=0)
Sep 17 02:42:42 Tower kernel: ata4: COMRESET failed (errno=-16)
Sep 17 02:42:42 Tower kernel: ata4: hard resetting link
Sep 17 02:42:43 Tower kernel: ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 320)

The disk looks healthy. The errors above suggest a power/connection problem. Swap the cables (both power and SATA) with those of a different disk, and if you see similar errors again, check whether they followed the disk or stayed with the cables.
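One way to compare before and after the cable swap is to count error lines per ATA port in a saved syslog. Below is a minimal sketch that assumes the log lines look like the excerpt above (`... kernel: ataN[.NN]: message`). The function name `count_ata_errors` and the list of error markers are illustrative assumptions. They are not part of Unraid or the kernel. The "res ..." continuation lines carry no ataN prefix, so the sketch skips them. Note that the ataN number identifies the controller port the cable is plugged into, not the disk itself. After the swap you still need to check which port each disk ended up on. The diagnostics should show this.

```python
import re
import sys
from collections import Counter

# Matches the libata prefix in lines such as
# "Sep 17 02:42:32 Tower kernel: ata4.00: failed command: READ FPDMA QUEUED"
# and captures the port number ("4") and the message text. The ".00" device
# suffix is optional because link-level messages ("ata4: hard resetting link")
# omit it.
ATA_PREFIX = re.compile(r"kernel: ata(\d+)(?:\.\d+)?: (.*)")

# Illustrative markers taken from the excerpt above (an assumption, not an
# exhaustive list of libata error messages).
ERROR_MARKERS = ("failed command", "hard resetting link", "COMRESET failed")


def count_ata_errors(lines):
    """Return a Counter mapping ATA port number -> number of error lines."""
    counts = Counter()
    for line in lines:
        match = ATA_PREFIX.search(line)
        if match is None:
            continue  # not a libata line (or a "res ..." continuation line)
        port, message = int(match.group(1)), match.group(2)
        if any(marker in message for marker in ERROR_MARKERS):
            counts[port] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_ata_errors.py <path-to-saved-syslog>
    with open(sys.argv[1], errors="replace") as log:
        for port, n in sorted(count_ata_errors(log).items()):
            print(f"ata{port}: {n} error lines")
```

Run it once on a log from before the swap and once on a log from after it. If the counts move to the port the suspect disk now uses, the problem followed the disk. If they stay on the old port, suspect the cable, power connector or controller port.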
AlexB108 Posted September 17, 2022 (Author)

Fantastic, thanks for the tip! I will swap them over and see if I get the same issue.
trurl Posted September 17, 2022

Since you've been repeatedly running xfs_repair on disk2, have you checked your lost+found share?
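xfs_repair moves files and directories it cannot reconnect into a lost+found directory at the root of the repaired filesystem, typically named by inode number. Below is a minimal sketch for taking an inventory of what ended up there, largest files first. The function name `list_lost_found` is an illustrative assumption. The example path /mnt/disk2/lost+found is also hypothetical, so check where lost+found actually appears on your system.

```python
import os
import sys


def list_lost_found(root):
    """Walk `root` (e.g. a lost+found directory) and return (path, size) pairs,
    largest first. Symlinks are listed with size 0 and are not followed.
    A missing `root` simply yields an empty list."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            size = 0 if os.path.islink(path) else os.path.getsize(path)
            entries.append((path, size))
    entries.sort(key=lambda item: item[1], reverse=True)
    return entries


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical path): python list_lost_found.py /mnt/disk2/lost+found
    for path, size in list_lost_found(sys.argv[1]):
        print(f"{size:>15,}  {path}")
```

Because the recovered entries usually lose their original names, sorting by size is a quick way to spot the larger files worth inspecting first.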