
Parity check completed, finding 37756 errors... next steps?


Solved by JorgeB


Hi-

I'm running version 6.9.2, and the parity check I have scheduled to run every 90 days reported the following:

Last check completed on Sat 06 Jan 2024 07:08:41 PM PST (four days ago)
Finding 37756 errors
Duration: 18 hours, 8 minutes, 40 seconds. Average speed: 153.1 MB/sec

I have a single 10TB parity drive and four 10TB data drives in the array, which is at 64.6% capacity. What should my next steps be to preserve all of the data on the array? I'm not sure how to figure out which drive(s) are failing or throwing the errors. I checked each drive's SMART report, and they all show as completed without errors.

There have been no power outages to report, as the server is connected to a UPS.

Thanks.

Edited to add some parity check history for reference. There have been no recent hardware changes.

Date                   Duration                Speed        Status   Errors
2024-01-06, 19:08:41   18 hr, 8 min, 40 sec    153.1 MB/s   OK       37756
2023-10-07, 19:04:22   18 hr, 4 min, 20 sec    153.7 MB/s   OK       0
2023-07-01, 19:04:26   18 hr, 4 min, 25 sec    153.7 MB/s   OK       0
2023-04-01, 19:07:37   18 hr, 7 min, 36 sec    153.3 MB/s   OK       0
2023-01-07, 19:05:34   18 hr, 5 min, 33 sec    153.5 MB/s   OK       0
2022-10-01, 19:04:53   18 hr, 4 min, 52 sec    153.6 MB/s   OK       0

Edited by propman07

JorgeB, got it. I'll order new cables and run a parity check once they're installed. If you don't mind, could you point out what led you to believe that there were ATA errors in the log file? I'd like to learn more about troubleshooting these types of issues.

Thanks.

When I do perform the next parity check, should I check the box to write corrections to parity, or leave it unchecked?

Edited by propman07
10 hours ago, propman07 said:

could you point out what led you to believe that there were ATA errors in the log file?

They look like this:

Oct 26 22:00:25 DLVTOWER kernel: ata3.00: exception Emask 0x10 SAct 0xfa3863f9 SErr 0x90202 action 0xe frozen
Oct 26 22:00:25 DLVTOWER kernel: ata3.00: irq_stat 0x00400000, PHY RDY changed
Oct 26 22:00:25 DLVTOWER kernel: ata3: SError: { RecovComm Persist PHYRdyChg 10B8B }
Oct 26 22:00:25 DLVTOWER kernel: ata3.00: failed command: READ FPDMA QUEUED
Oct 26 22:00:25 DLVTOWER kernel: ata3.00: cmd 60/20:00:28:34:15/00:00:80:03:00/40 tag 0 ncq dma 16384 in
Oct 26 22:00:25 DLVTOWER kernel:         res 40/00:00:08:d7:0e/00:00:80:02:00/40 Emask 0x10 (ATA bus error)
Oct 26 22:00:25 DLVTOWER kernel: ata3.00: status: { DRDY }
Oct 26 22:00:25 DLVTOWER kernel: ata3.00: failed command: READ FPDMA QUEUED
Oct 26 22:00:25 DLVTOWER kernel: ata3.00: cmd 60/20:18:f8:be:f5/00:00:9d:00:00/40 tag 3 ncq dma 16384 in
Oct 26 22:00:25 DLVTOWER kernel:         res 40/00:00:08:d7:0e/00:00:80:02:00/40 Emask 0x10 (ATA bus error)
Oct 26 22:00:25 DLVTOWER kernel: ata3.00: status: { DRDY }
Oct 26 22:00:25 DLVTOWER kernel: ata3.00: failed command: READ FPDMA QUEUED
Oct 26 22:00:25 DLVTOWER kernel: ata3.00: cmd 60/20:20:18:fc:c7/00:00:b0:00:00/40 tag 4 ncq dma 16384 in
Oct 26 22:00:25 DLVTOWER kernel:         res 40/00:00:08:d7:0e/00:00:80:02:00/40 Emask 0x10 (ATA bus error)
Oct 26 22:00:25 DLVTOWER kernel: ata3.00: status: { DRDY }
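For anyone who wants to spot these in a long syslog, a small script can tally the `(ATA bus error)` lines per libata port. This is a minimal sketch: the regular expression is written only against the lines quoted above, and `count_ata_bus_errors` is a hypothetical helper, not an Unraid tool. It tells you which port (here `ata3`) is logging errors; matching that port to a physical disk still means finding the same port's device identification lines elsewhere in the syslog.

```python
import re
from collections import Counter

# "kernel: ata3.00:" or "kernel: ata3:" -> captures "ata3"
PORT_RE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?:")

def count_ata_bus_errors(lines):
    """Count '(ATA bus error)' lines per libata port, e.g. {'ata3': 3}.

    The indented 'res ...' lines that carry the '(ATA bus error)' text do not
    name a port, so each one is credited to the most recent port seen above it.
    This assumes messages from different ports are not interleaved.
    """
    counts = Counter()
    last_port = None
    for line in lines:
        match = PORT_RE.search(line)
        if match:
            last_port = match.group(1)
        if "(ATA bus error)" in line and last_port is not None:
            counts[last_port] += 1
    return counts

if __name__ == "__main__":
    # A few lines copied from the log excerpt quoted above.
    sample = [
        "Oct 26 22:00:25 DLVTOWER kernel: ata3.00: exception Emask 0x10 SAct 0xfa3863f9 SErr 0x90202 action 0xe frozen",
        "Oct 26 22:00:25 DLVTOWER kernel: ata3.00: failed command: READ FPDMA QUEUED",
        "Oct 26 22:00:25 DLVTOWER kernel:         res 40/00:00:08:d7:0e/00:00:80:02:00/40 Emask 0x10 (ATA bus error)",
        "Oct 26 22:00:25 DLVTOWER kernel: ata3.00: failed command: READ FPDMA QUEUED",
        "Oct 26 22:00:25 DLVTOWER kernel:         res 40/00:00:08:d7:0e/00:00:80:02:00/40 Emask 0x10 (ATA bus error)",
    ]
    print(count_ata_bus_errors(sample))  # Counter({'ata3': 2})
```

The function accepts any iterable of lines, so an open syslog file can be passed to it directly.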

 

Run a non-correcting check first.
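A toy model shows why the non-correcting (read-only) check is the safer first step: it only reports stripes where stored parity disagrees with the data disks, while a correcting check rewrites parity to match whatever the data disks returned. This is a sketch under simplifying assumptions (single XOR parity over small integer "blocks"); `parity_of` and `parity_check` are hypothetical names, not the actual md driver code. It also shows why the error count alone could not point to a failing drive: with a single parity disk, a mismatch says a stripe is inconsistent, not which disk holds the bad value.

```python
from functools import reduce
from operator import xor

def parity_of(blocks):
    """Single (P) parity for one stripe: XOR of all data blocks."""
    return reduce(xor, blocks, 0)

def parity_check(data_disks, parity, correct=False):
    """Recompute P for every stripe and compare it with the stored parity.

    data_disks: list of disks, each a list of ints (one int per stripe).
    parity:     list of ints holding the stored parity; rewritten in place
                only when correct=True.
    Returns the indexes of stripes whose stored parity did not match.
    """
    mismatches = []
    for stripe, stored in enumerate(parity):
        expected = parity_of(disk[stripe] for disk in data_disks)
        if stored != expected:
            mismatches.append(stripe)
            if correct:
                # A correcting check trusts the data disks and rewrites parity.
                parity[stripe] = expected
    return mismatches

if __name__ == "__main__":
    disks = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]  # four data disks
    parity = [parity_of(d[i] for d in disks) for i in range(3)]
    parity[1] ^= 0xFF  # simulate one stale parity block

    print(parity_check(disks, parity))                # [1]  (report only)
    print(parity_check(disks, parity, correct=True))  # [1]  (report and fix)
    print(parity_check(disks, parity))                # []   (clean follow-up)
```

If the mismatch had been caused by a bad read from a data disk (for example over a faulty cable), the correcting pass would have written parity computed from bad data; the read-only pass leaves parity untouched.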

  • 2 weeks later...
  • 1 month later...

Posting an update.

I finally got the new SATA cables installed. I ran a non-correcting parity check, and it returned 6 errors. Diagnostics are attached.

How would I go about determining what the 6 sync errors are?

Thanks.

Date                   Duration                 Speed         Status     Errors
2024-03-10, 14:15:09   17 hr, 59 min, 14 sec    154.4 MB/s    OK         6
2024-02-25, 23:02:12   52 sec                   Unavailable   Canceled   4
2024-01-06, 19:08:41   18 hr, 8 min, 40 sec     153.1 MB/s    OK         37756

dlvtower-diagnostics-20240310-1417.zip


Much better.

Mar  9 19:15:55 DLVTOWER kernel: md: recovery thread: P incorrect, sector=0
Mar  9 19:15:55 DLVTOWER kernel: md: recovery thread: P incorrect, sector=8
Mar  9 19:15:55 DLVTOWER kernel: md: recovery thread: P incorrect, sector=16
Mar  9 19:15:55 DLVTOWER kernel: md: recovery thread: P incorrect, sector=32
Mar  9 21:07:40 DLVTOWER kernel: md: recovery thread: P incorrect, sector=2576730632
Mar  9 21:07:40 DLVTOWER kernel: md: recovery thread: P incorrect, sector=2576730640

Run a correcting parity check, then post new diagnostics so we can see whether it finds exactly the same few sectors and corrects them.
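Each "P incorrect" line names the position (`sector=`) where parity disagreed, so the six sync errors can be listed straight from the syslog. The sketch below pulls those numbers out and converts them to approximate byte offsets. It assumes the md driver reports sectors in 512-byte units, as Linux block devices conventionally do; `parity_sectors` is a hypothetical helper. Under that assumption, four mismatches sit at the very start of the disks and the other two roughly 1.3 TB in.

```python
import re

SECTOR_BYTES = 512  # assumption: md "sector=" values are 512-byte units
MISMATCH_RE = re.compile(r"md: recovery thread: P (incorrect|corrected), sector=(\d+)")

def parity_sectors(lines, kind="incorrect"):
    """Return the sector numbers from 'P incorrect' (or 'P corrected') lines."""
    sectors = []
    for line in lines:
        match = MISMATCH_RE.search(line)
        if match and match.group(1) == kind:
            sectors.append(int(match.group(2)))
    return sectors

if __name__ == "__main__":
    # The six lines quoted above from the non-correcting check.
    log = [
        "Mar  9 19:15:55 DLVTOWER kernel: md: recovery thread: P incorrect, sector=0",
        "Mar  9 19:15:55 DLVTOWER kernel: md: recovery thread: P incorrect, sector=8",
        "Mar  9 19:15:55 DLVTOWER kernel: md: recovery thread: P incorrect, sector=16",
        "Mar  9 19:15:55 DLVTOWER kernel: md: recovery thread: P incorrect, sector=32",
        "Mar  9 21:07:40 DLVTOWER kernel: md: recovery thread: P incorrect, sector=2576730632",
        "Mar  9 21:07:40 DLVTOWER kernel: md: recovery thread: P incorrect, sector=2576730640",
    ]
    for sector in parity_sectors(log):
        offset = sector * SECTOR_BYTES
        print(f"sector={sector:<12} ~{offset / 1e12:.3f} TB from the start of the disk")
```

Passing `kind="corrected"` extracts the sectors from a correcting run instead, which is what the follow-up diagnostics are meant to show.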


Disk access will slow down the parity check, and the parity check will slow down disk access.

But file access, even writing files, will not affect the parity check results.

If you and your Docker containers don't access the disks much during the parity check, it won't make much difference.
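Writes don't disturb the parity result because parity is updated as part of every array write. In a simplified single-parity model (an assumption here, not Unraid's actual code path), a read/modify/write update derives the new parity block from the old data, the new data and the old parity, so the stripe is consistent again as soon as the write completes. `stripe_parity` and `write_block` are hypothetical names used only for illustration.

```python
from functools import reduce
from operator import xor

def stripe_parity(data_disks, stripe):
    """Recompute single (P) parity for one stripe from the data disks."""
    return reduce(xor, (disk[stripe] for disk in data_disks), 0)

def write_block(data_disks, parity, disk, stripe, new_value):
    """Write one block and keep P parity in sync (read/modify/write).

    new_parity = old_parity XOR old_data XOR new_data, so only the target
    data disk and the parity disk need to be read and written.
    """
    old_value = data_disks[disk][stripe]      # read old data
    parity[stripe] ^= old_value ^ new_value   # fold the change into parity
    data_disks[disk][stripe] = new_value      # write new data

if __name__ == "__main__":
    disks = [[1, 2], [3, 4], [5, 6], [7, 8]]
    parity = [stripe_parity(disks, s) for s in range(2)]
    write_block(disks, parity, disk=2, stripe=0, new_value=99)
    # The stripe is consistent as soon as the write returns.
    print(stripe_parity(disks, 0) == parity[0])  # True
```

A check running concurrently only sees stripes that are consistent before or after each write, which is why concurrent writes cost speed but not correctness in this model.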

Mar 10 15:20:20 DLVTOWER kernel: md: recovery thread: P corrected, sector=0
Mar 10 15:20:20 DLVTOWER kernel: md: recovery thread: P corrected, sector=8
Mar 10 15:20:20 DLVTOWER kernel: md: recovery thread: P corrected, sector=16
Mar 10 15:20:20 DLVTOWER kernel: md: recovery thread: P corrected, sector=32
Mar 10 17:16:41 DLVTOWER kernel: md: recovery thread: P corrected, sector=2576730632
Mar 10 17:16:41 DLVTOWER kernel: md: recovery thread: P corrected, sector=2576730640

It found exactly the same parity errors and this time it corrected them.
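Confirming "exactly the same parity errors" amounts to comparing the sector sets from the two runs. This sketch uses sector lists typed in from the Mar 9 ("P incorrect") and Mar 10 ("P corrected") excerpts in this thread; `compare_runs` is a hypothetical helper, and on a real system the lists would be extracted from each run's syslog. If the sets had differed, sectors appearing in only one run would suggest mismatches were still being generated rather than being stale leftovers from an earlier event.

```python
def compare_runs(reported, corrected):
    """Compare sectors a non-correcting check reported with those a correcting check fixed."""
    reported, corrected = set(reported), set(corrected)
    return {
        "same": reported == corrected,
        "only_in_first_check": sorted(reported - corrected),
        "only_in_second_check": sorted(corrected - reported),
    }

if __name__ == "__main__":
    mar9 = [0, 8, 16, 32, 2576730632, 2576730640]   # "P incorrect" (non-correcting run)
    mar10 = [0, 8, 16, 32, 2576730632, 2576730640]  # "P corrected" (correcting run)
    print(compare_runs(mar9, mar10))
```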

4 hours ago, itimpi said:

The next check should report 0 errors.

 

  • 4 weeks later...

Final update:

I ran my scheduled parity check, and it completed with no errors. Thanks to everyone on the thread for your help.

Action         Date                              Size    Duration                Speed        Status   Errors
Parity-Check   2024-04-06, 20:21:09 (Saturday)   10 TB   19 hr, 21 min, 8 sec    143.5 MB/s   OK       0
Parity-Check   2024-03-11, 10:59:41 (Monday)     10 TB   19 hr, 39 min, 22 sec   141.3 MB/s   OK       6

