geofbennett Posted August 16, 2022
About a month ago my server warned 4 times over 3 hours of “[5] reallocated sector ct” and once of “[187] reported uncorrect”. At the same time it was running its parity check, at the end of which it also reported parity errors. (I have it run a parity check w/ corrections weekly, and this was the first time it EVER reported parity errors.)
Understanding that the Data drive is probably on the way out the door, I acknowledged the errors and restarted the server to clear the warning and see how long it would take for it to report more errors. That disk has not reported any more errors since then. However, the parity check has reported errors each week since.
Then yesterday, before the parity check was finished, it sent me the following:
“Event: Unraid array errors
Subject: Warning [THE-DARK-TOWER] - array has errors
Description: Array has 1 disk with read errors
Importance: warning
Parity disk - ST4000VN008-2DR166_ZDHAF8R2 (sdg) (errors 134)”
Running extended SMART tests on all Data disks (including the one that reported errors) returns “Completed without error”. However, the SMART test on the Parity drive ends with “Interrupted (host reset)”.
All disks are 4TB Seagate Ironwolf NAS. The drive that gave the initial errors was put into service in May 2018 (currently shows 37245 power on hours). The Parity drive was only installed last November (currently shows 6715 hours, and is thankfully still under warranty). (Also note that the Parity drive has a green thumbs up and says “Healthy” on the dashboard.)
Am I looking at replacing both drives? Any opinions on whether it was errors on the Data drive or problems with the Parity drive that have been causing the parity errors? Any suggestions on further tests of the Parity drive to aid in a warranty claim?
Thanks for your help
JorgeB Posted August 16, 2022
Please post the diagnostics.
trurl Posted August 16, 2022
29 minutes ago, geofbennett said: "I have it run a parity check w/ corrections weekly"
Scheduled checks should be noncorrecting.
29 minutes ago, geofbennett said: "SMART test on the Parity drive ends with “Interrupted (host reset)”"
You have to disable spindown on the disk to get extended self-test to complete.
11 minutes ago, JorgeB said: "Please post the diagnostics."
Attach to your NEXT post.
geofbennett Posted August 16, 2022
2 hours ago, trurl said: "You have to disable spindown on the disk to get extended self-test to complete."
/Settings/DiskSettings - Default spin down delay is already set to "Never". Is there another way to disable spindown?
I've set the parity check scheduler "Write corrections to parity disk:" to "No", but the check mark next to "Write corrections to parity" on Main does not go away. I suspect that check mark only applies if I tell it to check parity outside the schedule, correct?
Diagnostics attached: the-dark-tower-diagnostics-20220816-1520.zip
trurl Posted August 16, 2022
1 hour ago, geofbennett said: "that check mark only applies if I tell it to check parity outside the schedule, correct?"
Correct.
4 hours ago, geofbennett said: "the Data drive is probably on the way out"
You didn't actually tell us which drive that was, so I had to open them all. I would replace disk3.
geofbennett Posted August 16, 2022
15 minutes ago, trurl said: "you didn't actually tell us which drive that was so I had to open them all"
Sorry about that... Yes, Disk 3 is the one that reported errors last month.
Any ideas about the error report for the parity drive? Or should I just not worry about it? That's what's kind of concerning to me. I'm cool with waiting a bit to see if more errors appear if it is only one disk, but since there are 2 and one of them is Parity, I'm getting a little nervous.
JorgeB Posted August 17, 2022
Log has a lot of spam, but there are ATA errors with parity, check/replace both cables.
trurl Posted August 17, 2022
13 hours ago, geofbennett said: "cool with waiting a bit to see if more errors appear if it is only one disk, but being there are 2 and one of them is Parity I'm getting a little nervous."
Not sure why you would be cool with only one disk with errors. And parity isn't any more important than any other disk; arguably it is the least important since it contains none of your data.
geofbennett Posted August 17, 2022
48 minutes ago, trurl said: "Not sure why you would be cool with only one disk with errors. And parity isn't any more important than any other disk, arguably it is the least important since it contains none of your data."
It was my understanding that the Parity drive is what enables you to rebuild a Data drive if it should fail, but if you only have a single parity drive and multiple drives fail at the same time then you will lose data. Is that not true?
trurl Posted August 17, 2022
If you have more failed drives than you have parity drives you can't rebuild anything. But if the only failed drive is parity then all of your data is OK since parity contains none of your data. So parity is less important than data drives.
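The rebuild rule described above can be illustrated with a small sketch. It assumes single parity is a plain XOR across the data disks, and it shrinks each disk to a 4-byte block; the block contents and the `xor_blocks` helper are made up for illustration and are not taken from Unraid itself.

```python
from functools import reduce
from operator import xor


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


# Three hypothetical data disks, each reduced to a 4-byte block.
data_disks = [b"\x10\x20\x30\x40", b"\x0f\x0e\x0d\x0c", b"\xaa\xbb\xcc\xdd"]

# Single parity (assumed here): the XOR of every data disk.
parity = xor_blocks(data_disks)

# One data disk fails: XOR the survivors with parity to rebuild it.
failed = 1
survivors = [d for i, d in enumerate(data_disks) if i != failed]
rebuilt = xor_blocks(survivors + [parity])
print(rebuilt == data_disks[failed])

# Only parity fails: no data is lost, parity is simply recomputed.
print(xor_blocks(data_disks) == parity)

# Two data disks fail: parity plus the one survivor yields only the XOR
# of the two missing blocks, which cannot be split back into either one.
leftover = xor_blocks([data_disks[0], parity])
print(leftover == xor_blocks([data_disks[1], data_disks[2]]))
```

All three checks hold, which matches the point made in the post: a lone parity failure costs no data, while two simultaneous failures leave single parity with nothing it can rebuild.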
geofbennett Posted August 17, 2022
Which is why I'm kinda cool with only 1 drive having errors, but I have 2 drives with errors (one of them the Parity drive), which makes me nervous. I understand my data is not on the parity drive, but it is not the only drive showing errors.
trurl Posted August 17, 2022
15 hours ago, trurl said: "I would replace disk3"
but not until you get parity problems fixed.
5 hours ago, JorgeB said: "Log has a lot of spam, but there are ATA errors with parity, check/replace both cables."
Post new diagnostics after.
geofbennett Posted August 17, 2022
Thanks. Just so I'm understanding correctly:
1. Replace Cables
2. Restart and run Parity Check (with corrections?)
3. Post Diagnostics
trurl Posted August 17, 2022
2 minutes ago, geofbennett said: "Thanks. Just so I'm understanding correctly, 1. Replace Cables 2. Restart and run Parity Check (with corrections?) 3. Post Diagnostics"
2. Start the array, don't do anything else until we look at new diagnostics.
geofbennett Posted August 17, 2022
I'm glad I asked.
Diagnostics attached: the-dark-tower-diagnostics-20220817-1108.zip
JorgeB Posted August 17, 2022
Still ATA errors, did you replace both cables or just check/reconnect?
geofbennett Posted August 17, 2022
Replaced both with the last 2 cables I had.
Disk 3 is in an ICY DOCK FatCage MB155SP-B.
Parity is connected directly to the motherboard.
Would it help to swap Disk 3 from its current slot in the cage into a different slot to see if the ATA errors follow it or remain on that slot?
JorgeB Posted August 17, 2022
23 minutes ago, geofbennett said: "Would it help to swap Disk 3 from its current slot in the cage into a different slot to see if the ATA errors follow it or remain on that slot?"
Swap parity with disk3 to see where the problem goes.
UhClem Posted August 17, 2022
I don't think it's a cable problem. There are (suggestive) indications, in the syslog, that your problem with your Disk3 is due to a flaky SATA port (ata6) on your motherboard (chipset).
I would swap connections at the motherboard between Disk3 and another DiskN. If the problems DO "transfer" to DiskN (and stay on ata6), that does eliminate Disk3 and its cable, and nails it to the board. If not, ...
Disclaimer: not an Unraid user (just like fun problems)
Edited August 17, 2022 by UhClem
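The reasoning behind a swap test like the one suggested above can be written down as a small decision helper. This is only a sketch: `interpret_swap`, the port labels, and the example observations are hypothetical, and a real diagnosis would read the errors from the syslog rather than from hand-entered flags.

```python
def interpret_swap(before, after):
    """Say whether errors followed the device or stayed with the port.

    before/after map a port label to (device label, errors seen?).
    """
    bad_before = [(port, dev) for port, (dev, bad) in before.items() if bad]
    bad_after = [(port, dev) for port, (dev, bad) in after.items() if bad]
    # The test only isolates a fault when exactly one path misbehaves.
    if len(bad_before) != 1 or len(bad_after) != 1:
        return "inconclusive"
    port_b, dev_b = bad_before[0]
    port_a, dev_a = bad_after[0]
    if dev_a == dev_b and port_a != port_b:
        return "device"          # errors moved with the drive
    if port_a == port_b and dev_a != dev_b:
        return "port or cable"   # errors stayed on the same connection
    return "inconclusive"


# Hypothetical observations: the faulty drive is moved from port_A to port_B.
before = {"port_A": ("parity", True), "port_B": ("disk3", False)}
after = {"port_A": ("disk3", False), "port_B": ("parity", True)}
print(interpret_swap(before, after))
```

Whether a cable counts as part of the port or part of the device depends on which end was swapped, so the "port or cable" answer only holds when the cable stays plugged into the same motherboard port.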
geofbennett Posted August 17, 2022
Swapped Disk 3 and Parity. I used the new cables, but couldn't switch the cables at the motherboard; I had to switch the cables at the drives because one of the plugs has a 90deg bend and the other port on the board is obstructed.
For my own edification, which log and which details are we looking at for the ATA errors? (If you can explain without too much effort, that is. I'm so grateful for the help, and I want to learn more, but I don't want to put you guys out any more than I have to.)
the-dark-tower-diagnostics-20220817-1409.zip
JorgeB Posted August 17, 2022
ATA errors are in the syslog, like these:
Aug 17 14:06:48 The-Dark-Tower kernel: ata4: link is slow to respond, please be patient (ready=0)
Aug 17 14:06:50 The-Dark-Tower kernel: ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Aug 17 14:06:50 The-Dark-Tower kernel: ata4.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
Aug 17 14:06:50 The-Dark-Tower kernel: ata4.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Aug 17 14:06:50 The-Dark-Tower kernel: ata4.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
Aug 17 14:06:50 The-Dark-Tower kernel: ata4.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
Aug 17 14:06:50 The-Dark-Tower kernel: ata4.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Aug 17 14:06:50 The-Dark-Tower kernel: ata4.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
Aug 17 14:06:50 The-Dark-Tower kernel: ata4.00: configured for UDMA/133
Fewer than before so far, but they followed the parity disk, so that suggests a device problem.
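For anyone who wants to pick lines like these out of a noisy syslog, a rough sketch follows. It groups kernel `ataN` messages by port and counts the ones containing a warning phrase. The `summarize_ata` helper and the keyword list are assumptions for illustration: only "slow to respond" comes from the lines quoted above, and the list would need extending for whatever other messages a given log contains.

```python
import re
from collections import Counter

# Matches kernel messages such as "kernel: ata4: ..." or "kernel: ata4.00: ...".
ATA_LINE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: (.*)")

# Assumed keyword list; only this phrase appears in the quoted log lines.
KEYWORDS = ("slow to respond",)


def summarize_ata(lines, keywords=KEYWORDS):
    """Count ATA messages per port, and how many contain a warning keyword."""
    total, flagged = Counter(), Counter()
    for line in lines:
        match = ATA_LINE.search(line)
        if not match:
            continue
        port, message = match.groups()
        total[port] += 1
        if any(keyword in message for keyword in keywords):
            flagged[port] += 1
    return total, flagged


# Three of the syslog lines quoted in the post above.
sample = [
    "Aug 17 14:06:48 The-Dark-Tower kernel: ata4: link is slow to respond, please be patient (ready=0)",
    "Aug 17 14:06:50 The-Dark-Tower kernel: ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)",
    "Aug 17 14:06:50 The-Dark-Tower kernel: ata4.00: configured for UDMA/133",
]
total, flagged = summarize_ata(sample)
print(dict(total), dict(flagged))
```

Comparing the flagged counts per port before and after a cable or drive swap gives the same "where did the problem go" picture discussed in the thread, without reading every line by hand.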
geofbennett Posted August 17, 2022
Thanks again. After resetting everything back the way it was before (including original cables) I see that ATA6 is showing the "slow to respond" message as well as some other messages that do not appear for any of the other ports.
Ok, new drive should be here Friday. Once it is installed, should I check the diagnostics before or after rebuilding parity? Or both?
the-dark-tower-diagnostics-20220817-1704.zip
JorgeB Posted August 18, 2022
After the rebuild should be enough, unless you see any issues.
geofbennett Posted September 22, 2022
Just as a follow up and to close this out in case anybody has the same or similar problem in the future...
I replaced the parity drive, restarted, and rebuilt parity over 30 days ago. Every weekly parity check since has turned up zero errors.
I also reset the data drive that had reported errors and it has not reported any problems since (though I have a feeling it will before long).
The attached diagnostics no longer show any of the ata links as slow to respond.
Thanks again to everybody for their help.
the-dark-tower-diagnostics-20220922-1035.zip