propman07 Posted January 11 (edited)

Hi-

I'm running version 6.9.2, and the parity check that I have scheduled to run every 90 days reported the following:

Last check completed on Sat 06 Jan 2024 07:08:41 PM PST (four days ago)
Finding 37756 errors
Duration: 18 hours, 8 minutes, 40 seconds. Average speed: 153.1 MB/sec

I have a single 10TB parity drive and four 10TB drives in the array, which is at 64.6% capacity. What would the next steps be in order to preserve all of the data on the array? I'm not sure how to figure out which drive(s) are failing / throwing the errors. I checked the individual drives' SMART reports, and they all show as completed without errors. No power outages to report, as the server is connected to a UPS device.

Thanks.

Edited to add some parity check history for reference. No recent hardware changes.

Date                  Duration              Speed       Status  Errors
2024-01-06, 19:08:41  18 hr, 8 min, 40 sec  153.1 MB/s  OK      37756
2023-10-07, 19:04:22  18 hr, 4 min, 20 sec  153.7 MB/s  OK      0
2023-07-01, 19:04:26  18 hr, 4 min, 25 sec  153.7 MB/s  OK      0
2023-04-01, 19:07:37  18 hr, 7 min, 36 sec  153.3 MB/s  OK      0
2023-01-07, 19:05:34  18 hr, 5 min, 33 sec  153.5 MB/s  OK      0
2022-10-01, 19:04:53  18 hr, 4 min, 52 sec  153.6 MB/s  OK      0

Edited January 11 by propman07: Added parity check history
propman07 (Author) Posted January 12

Thanks. See attached.

tower-diagnostics-20240111-1823.zip
JorgeB Posted January 12 (Solution)

There are ATA errors for the parity drive. They are not necessarily what caused the issue, but they should still be fixed: replace the cables and see if more errors are found on the next check.
propman07 (Author) Posted January 12 (edited)

JorgeB-

Got it. I'll work on ordering cables and run a parity check once I get them installed. If you wouldn't mind, could you point out what led you to believe that there were ATA errors in the log file? I'd like to learn more about troubleshooting these types of issues. Thanks.

When I do perform the next parity check, should I select the box to write the corrections or leave it blank?

Edited January 12 by propman07: edited to add question about writing
JorgeB Posted January 13

10 hours ago, propman07 said: could you point out what led you to believe that there were ATA errors in the log file?

They look like this:

Oct 26 22:00:25 DLVTOWER kernel: ata3.00: exception Emask 0x10 SAct 0xfa3863f9 SErr 0x90202 action 0xe frozen
Oct 26 22:00:25 DLVTOWER kernel: ata3.00: irq_stat 0x00400000, PHY RDY changed
Oct 26 22:00:25 DLVTOWER kernel: ata3: SError: { RecovComm Persist PHYRdyChg 10B8B }
Oct 26 22:00:25 DLVTOWER kernel: ata3.00: failed command: READ FPDMA QUEUED
Oct 26 22:00:25 DLVTOWER kernel: ata3.00: cmd 60/20:00:28:34:15/00:00:80:03:00/40 tag 0 ncq dma 16384 in
Oct 26 22:00:25 DLVTOWER kernel: res 40/00:00:08:d7:0e/00:00:80:02:00/40 Emask 0x10 (ATA bus error)
Oct 26 22:00:25 DLVTOWER kernel: ata3.00: status: { DRDY }
Oct 26 22:00:25 DLVTOWER kernel: ata3.00: failed command: READ FPDMA QUEUED
Oct 26 22:00:25 DLVTOWER kernel: ata3.00: cmd 60/20:18:f8:be:f5/00:00:9d:00:00/40 tag 3 ncq dma 16384 in
Oct 26 22:00:25 DLVTOWER kernel: res 40/00:00:08:d7:0e/00:00:80:02:00/40 Emask 0x10 (ATA bus error)
Oct 26 22:00:25 DLVTOWER kernel: ata3.00: status: { DRDY }
Oct 26 22:00:25 DLVTOWER kernel: ata3.00: failed command: READ FPDMA QUEUED
Oct 26 22:00:25 DLVTOWER kernel: ata3.00: cmd 60/20:20:18:fc:c7/00:00:b0:00:00/40 tag 4 ncq dma 16384 in
Oct 26 22:00:25 DLVTOWER kernel: res 40/00:00:08:d7:0e/00:00:80:02:00/40 Emask 0x10 (ATA bus error)
Oct 26 22:00:25 DLVTOWER kernel: ata3.00: status: { DRDY }

Run a non-correcting check first.
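For anyone who wants to look for this kind of entry in their own syslog, here is a minimal Python sketch, not an official Unraid tool. It assumes the log lines follow the same wording as the excerpt in the post above; the function name `count_ata_errors` and the list of marker strings are made up for illustration, and other kernel versions may phrase these messages differently. It only counts suspicious lines per ATA port; mapping a port such as `ata3` back to a physical drive is not attempted here.

```python
import re
from collections import Counter

# Line shape taken from the excerpt in this thread (assumption: your syslog matches it).
ATA_LINE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: (.*)")

# Hypothetical marker list: phrases that appear in the error lines quoted above.
MARKERS = ("exception Emask", "failed command", "SError")

def count_ata_errors(lines):
    """Count suspicious libata lines per ATA port (for example 'ata3')."""
    counts = Counter()
    for line in lines:
        match = ATA_LINE.search(line)
        if match and any(marker in match.group(2) for marker in MARKERS):
            counts[match.group(1)] += 1
    return counts

sample = [
    "Oct 26 22:00:25 DLVTOWER kernel: ata3.00: exception Emask 0x10 SAct 0xfa3863f9 SErr 0x90202 action 0xe frozen",
    "Oct 26 22:00:25 DLVTOWER kernel: ata3: SError: { RecovComm Persist PHYRdyChg 10B8B }",
    "Oct 26 22:00:25 DLVTOWER kernel: ata3.00: failed command: READ FPDMA QUEUED",
    "Oct 26 22:00:25 DLVTOWER kernel: ata3.00: status: { DRDY }",
]

# The status line carries no marker, so only three of the four lines are counted.
print(count_ata_errors(sample))
```

To run it against a real log, pass an open file instead of `sample`, for example the syslog text file from the diagnostics zip. A port that shows up repeatedly is a hint to check that drive's cabling first, as suggested above.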
propman07 (Author) Posted January 25

Sorry for the late reply. I'll run a non-correcting check after I get the SATA cables swapped out. Thanks again for the help.
propman07 (Author) Posted March 10

Posting an update. I was finally able to get some new SATA cables installed. I ran a non-correcting parity check and it returned 6 errors. Diagnostic log attached. How would I go about determining what the 6 sync errors are? Thanks.

Date                  Duration               Speed        Status    Errors
2024-03-10, 14:15:09  17 hr, 59 min, 14 sec  154.4 MB/s   OK        6
2024-02-25, 23:02:12  52 sec                 Unavailable  Canceled  4
2024-01-06, 19:08:41  18 hr, 8 min, 40 sec   153.1 MB/s   OK        37756

dlvtower-diagnostics-20240310-1417.zip
trurl Posted March 10

Much better.

Mar 9 19:15:55 DLVTOWER kernel: md: recovery thread: P incorrect, sector=0
Mar 9 19:15:55 DLVTOWER kernel: md: recovery thread: P incorrect, sector=8
Mar 9 19:15:55 DLVTOWER kernel: md: recovery thread: P incorrect, sector=16
Mar 9 19:15:55 DLVTOWER kernel: md: recovery thread: P incorrect, sector=32
Mar 9 21:07:40 DLVTOWER kernel: md: recovery thread: P incorrect, sector=2576730632
Mar 9 21:07:40 DLVTOWER kernel: md: recovery thread: P incorrect, sector=2576730640

Run a correcting parity check. Then post new diagnostics so we can see if it finds exactly those same few sectors and corrects them.
propman07 (Author) Posted March 10

Thanks for the reply. I'll run the correcting parity check as recommended. Is it a good idea to shut down all Docker containers first, or should I just let the parity check run while they are running? Thanks.
trurl Posted March 10

Disk access will affect parity check speed, and parity check will affect disk access speed. But file access, even writing files, will not affect parity results. If you and your dockers don't access the disks a lot during the parity check, then it won't make a lot of difference.
propman07 (Author) Posted March 10

Makes sense. Thanks for the info. Running a correcting parity check now; I will post the results when it completes.
propman07 (Author) Posted March 12

Parity check completed.

Last check completed on Mon 11 Mar 2024 10:59:41 AM PDT (today)
Duration: 19 hours, 39 minutes, 22 seconds. Average speed: 141.3 MB/s
Finding 6 errors

Diagnostics attached. Thanks.

dlvtower-diagnostics-20240311-2237.zip
itimpi Posted March 12

That looks good. A correcting check reports each sector it corrects as an 'error'. The next check should report 0 errors.
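To illustrate why a correcting check still reports errors while fixing them, here is a toy Python sketch. It assumes single parity behaves like plain XOR parity across the data disks, and it works on a few bytes rather than real sectors; `xor_parity` and `parity_check` are hypothetical names for this example, not Unraid functions.

```python
from functools import reduce

def xor_parity(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def parity_check(data_blocks, parity, correct=False):
    """Return (sync_errors, parity_after_check) for one toy stripe."""
    expected = xor_parity(data_blocks)
    errors = sum(1 for e, p in zip(expected, parity) if e != p)
    # A correcting run rewrites parity; a non-correcting run leaves it untouched.
    return errors, (expected if correct else parity)

# Three toy "data disks" and a parity block with one position out of sync.
data = [b"\x01\x02\x03", b"\x10\x20\x30", b"\xff\x00\x0f"]
stale = bytearray(xor_parity(data))
stale[1] ^= 0xFF  # simulate a single bad parity position
stale = bytes(stale)

errors_non_correcting, _ = parity_check(data, stale)
errors_correcting, fixed = parity_check(data, stale, correct=True)
errors_next, _ = parity_check(data, fixed)

print(errors_non_correcting, errors_correcting, errors_next)
```

In this toy model the non-correcting and correcting runs both count the same mismatch, and only the check after the correction comes back clean, which matches the sequence described in this thread.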
trurl Posted March 12

Mar 10 15:20:20 DLVTOWER kernel: md: recovery thread: P corrected, sector=0
Mar 10 15:20:20 DLVTOWER kernel: md: recovery thread: P corrected, sector=8
Mar 10 15:20:20 DLVTOWER kernel: md: recovery thread: P corrected, sector=16
Mar 10 15:20:20 DLVTOWER kernel: md: recovery thread: P corrected, sector=32
Mar 10 17:16:41 DLVTOWER kernel: md: recovery thread: P corrected, sector=2576730632
Mar 10 17:16:41 DLVTOWER kernel: md: recovery thread: P corrected, sector=2576730640

It found exactly the same parity errors, and this time it corrected them.

4 hours ago, itimpi said: The next check should report 0 errors.
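The comparison made above (same sectors flagged as incorrect, then reported as corrected) can also be done mechanically. The following Python sketch assumes the syslog wording quoted in this thread; `sectors_by_kind` is a hypothetical helper, and the sample lines are rebuilt from the sector numbers posted here with simplified timestamps.

```python
import re

# Pattern copied from the md driver lines quoted in this thread.
SECTOR_LINE = re.compile(r"md: recovery thread: P (incorrect|corrected), sector=(\d+)")

def sectors_by_kind(lines):
    """Group reported sector numbers under 'incorrect' or 'corrected'."""
    found = {"incorrect": set(), "corrected": set()}
    for line in lines:
        match = SECTOR_LINE.search(line)
        if match:
            found[match.group(1)].add(int(match.group(2)))
    return found

reported = [0, 8, 16, 32, 2576730632, 2576730640]

# Sample lines rebuilt from the posts; timestamps are simplified placeholders.
non_correcting = [
    f"Mar 9 19:15:55 DLVTOWER kernel: md: recovery thread: P incorrect, sector={s}"
    for s in reported
]
correcting = [
    f"Mar 10 15:20:20 DLVTOWER kernel: md: recovery thread: P corrected, sector={s}"
    for s in reported
]

before = sectors_by_kind(non_correcting)["incorrect"]
after = sectors_by_kind(correcting)["corrected"]

print(sorted(before))
print("same sectors:", before == after)
```

If the two sets differ between runs, that would suggest the errors are moving around rather than being a fixed set of stale parity sectors, which would be worth reporting with fresh diagnostics.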
propman07 (Author) Posted March 12

trurl / itimpi-

Thanks for the replies. Next time, I'll know how to proceed. Thanks again for the help.
propman07 (Author) Posted April 7

Final update-

I ran my scheduled parity check, and it completed with no errors. Thanks to all on the thread for your help.

Action        Date                             Size   Duration               Speed       Status  Errors
Parity-Check  2024-04-06, 20:21:09 (Saturday)  10 TB  19 hr, 21 min, 8 sec   143.5 MB/s  OK      0
Parity-Check  2024-03-11, 10:59:41 (Monday)    10 TB  19 hr, 39 min, 22 sec  141.3 MB/s  OK      6