
Parity-Check errors after failing drive and rebuilding array from parity - what now?



Hi.

 

I had a failing drive a few days ago. The 24 errors in the write column indicate that the disk went offline immediately, so fortunately it did not have much time to corrupt the data.

After seeing this I swapped the disk for a new one and rebuilt the data from parity. Note that I did use the array while this process was going on. There were no errors whatsoever during the rebuild, so I thought I was fine, but out of curiosity I started a parity check.

 

The parity check finished today and threw 1024 errors. Now I don't know whether the corrupt data is on my freshly inserted disk or on the parity disk.

 

What should I do now? Since I am using only two drives in the array (one data, one parity), it is theoretically possible to see which files are affected. How can I see which ones?
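
To illustrate the idea: single parity is a bytewise XOR across the data disks, so with only one data disk the parity contents should effectively mirror the data disk. The following is a minimal sketch of locating mismatching blocks, assuming two equally sized byte buffers stand in for the two disks. The function name `mismatched_blocks` and the 4096-byte block size are illustrative choices, not anything Unraid provides.

```python
def mismatched_blocks(data: bytes, parity: bytes, block_size: int = 4096) -> list[int]:
    """Return the indexes of the blocks where the two buffers differ."""
    if len(data) != len(parity):
        raise ValueError("buffers must have the same length")
    bad = []
    for offset in range(0, len(data), block_size):
        if data[offset:offset + block_size] != parity[offset:offset + block_size]:
            bad.append(offset // block_size)
    return bad


# Toy demonstration: four zeroed blocks, with one byte flipped in block 2
# of the "parity" copy.
data = bytes(4096 * 4)
parity = bytearray(data)
parity[4096 * 2 + 10] ^= 0xFF
print(mismatched_blocks(data, bytes(parity)))
```

A mismatch only says that the two copies disagree, not which one is correct, and turning a block index into a file name would still need filesystem-specific tools.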

 

 

server-diagnostics-20230404-1254.zip

Edited by Greyberry
Link to comment

Thank you for your reply.

 

I also see them in the syslog, but how do I know that this is in fact disk1? (Not that I don't believe you, but I would like to be able to debug it myself in the future.)

 

If disk1 (data-disk) is the problem, wouldn't it be better to remove disk1 from the array and start a new sync/repair process from parity once the issues are resolved, instead of doing a correcting parity check, which would write the corrupt data from the data-disk to parity?

Link to comment
5 minutes ago, Greyberry said:

but how do I know that this is in fact disk1?

On the main GUI page click on the disk icon for that disk to see the related log.

 

5 minutes ago, Greyberry said:

If disk1 (data-disk) is the problem, wouldn't it be better to remove disk1 from the array and start a new sync/repair process from parity once the issues are resolved?

Doesn't look like a disk problem, more a power/connection problem, but if errors persist after replacing the cables (both power and SATA) it could be the disk.

Link to comment
4 minutes ago, JorgeB said:

On the main GUI page click on the disk icon for that disk to see the related log.

You saw the ATA errors in the syslog. What I wanted to know is: how do you know that these are related to disk1 (and not the parity disk)?

Apr  3 19:21:28 SERVER kernel: ata3.00: failed command: READ FPDMA QUEUED
Apr  3 19:21:28 SERVER kernel: ata3.00: cmd 60/28:58:10:12:1e/02:00:e3:00:00/40 tag 11 ncq dma 282624 in
Apr  3 19:21:28 SERVER kernel:         res 40/00:60:38:14:1e/00:00:e3:00:00/40 Emask 0x10 (ATA bus error)
Apr  3 19:21:28 SERVER kernel: ata3.00: status: { DRDY }
Apr  3 19:21:28 SERVER kernel: ata3.00: failed command: READ FPDMA QUEUED
Apr  3 19:21:28 SERVER kernel: ata3.00: cmd 60/d0:60:38:14:1e/02:00:e3:00:00/40 tag 12 ncq dma 368640 in
Apr  3 19:21:28 SERVER kernel:         res 40/00:60:38:14:1e/00:00:e3:00:00/40 Emask 0x10 (ATA bus error)
Apr  3 19:21:28 SERVER kernel: ata3.00: status: { DRDY }
Apr  3 19:21:28 SERVER kernel: ata3: hard resetting link
Apr  3 19:21:29 SERVER kernel: ata3: SATA link down (SStatus 0 SControl 300)
Apr  3 19:21:34 SERVER kernel: ata3: hard resetting link
Apr  3 19:21:39 SERVER kernel: ata3: link is slow to respond, please be patient (ready=0)
Apr  3 19:21:42 SERVER kernel: ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Apr  3 19:21:42 SERVER kernel: ata3.00: ACPI cmd f5/00:00:00:00:00:00(SECURITY FREEZE LOCK) filtered out
Apr  3 19:21:42 SERVER kernel: ata3.00: ACPI cmd b1/c1:00:00:00:00:00(DEVICE CONFIGURATION OVERLAY) filtered out
Apr  3 19:21:42 SERVER kernel: ata3.00: ACPI cmd f5/00:00:00:00:00:00(SECURITY FREEZE LOCK) filtered out
Apr  3 19:21:42 SERVER kernel: ata3.00: ACPI cmd b1/c1:00:00:00:00:00(DEVICE CONFIGURATION OVERLAY) filtered out
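
As a side note, the ATA port named in lines like these can be tallied with a small script. This is a rough sketch, assuming syslog lines in the format shown above; the function name `summarize_ata_events` is made up for illustration, and it only reports the `ataN` port, not which array disk sits on that port.

```python
import re
from collections import Counter

# Matches e.g. "kernel: ata3.00: failed command: ..." or "kernel: ata3: hard resetting link"
ATA_LINE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: (.*)")


def summarize_ata_events(lines):
    """Count failed commands and hard link resets per ATA port."""
    failed = Counter()
    resets = Counter()
    for line in lines:
        match = ATA_LINE.search(line)
        if not match:
            continue  # not an ATA port message (e.g. the indented "res ..." lines)
        port, message = match.groups()
        if message.startswith("failed command"):
            failed[port] += 1
        elif "hard resetting link" in message:
            resets[port] += 1
    return failed, resets


# A few lines copied from the excerpt above.
sample = [
    "Apr  3 19:21:28 SERVER kernel: ata3.00: failed command: READ FPDMA QUEUED",
    "Apr  3 19:21:28 SERVER kernel:         res 40/00:60:38:14:1e/00:00:e3:00:00/40 Emask 0x10 (ATA bus error)",
    "Apr  3 19:21:28 SERVER kernel: ata3.00: failed command: READ FPDMA QUEUED",
    "Apr  3 19:21:28 SERVER kernel: ata3: hard resetting link",
    "Apr  3 19:21:34 SERVER kernel: ata3: hard resetting link",
]
failed, resets = summarize_ata_events(sample)
print(dict(failed), dict(resets))
```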

 

 

 

4 minutes ago, JorgeB said:

Doesn't look like a disk problem, more a power/connection problem, but if errors persist after replacing the cables (both power and SATA) it could be the disk.

Yes I know. But:

Quote

check/replace those cables and run a correcting check.

Doesn't it make more sense to do a DISK-REBUILD (parity --> data) instead of a correcting parity check (data --> parity)?

Because in this case it is more likely that the data is corrupt and the parity is intact.

Link to comment
3 minutes ago, Greyberry said:

What I wanted to know is, how do you know that these are related to disk1?

Click here:

imagem.png

 

7 minutes ago, Greyberry said:

Doesn't it make more sense to do a DISK-REBUILD (parity --> data) instead of a correcting parity check (data --> parity)?

Because in this case it is more likely that the data is corrupt and the parity is intact.

You can do that, but unless you have checksums or were using btrfs/zfs there is no way to know for certain.
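
By checksums I mean something along these lines: a manifest of file hashes taken while the data was known to be good, which can later be compared against a fresh one. This is only a minimal sketch using Python's standard `hashlib`; the function names (`file_sha256`, `build_manifest`, `changed_files`) are hypothetical and not part of Unraid or any plugin.

```python
import hashlib
import os


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file path (relative to root) to its SHA-256 hex digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            manifest[os.path.relpath(full_path, root)] = file_sha256(full_path)
    return manifest


def changed_files(old, new):
    """Return paths present in both manifests whose digests differ."""
    return sorted(path for path in old if path in new and old[path] != new[path])
```

Comparing a manifest stored before the incident with one built afterwards would list the files whose contents changed, which is the information a parity mismatch alone cannot give. It cannot distinguish corruption from a legitimate edit, so it is most useful for files that are not supposed to change.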

Link to comment
39 minutes ago, JorgeB said:

Click here:

imagem.png

You couldn't do that on my machine, could you?

I wanted to know how you knew FROM THE LOGS the errors were from disk1.

 

39 minutes ago, JorgeB said:

You can do that, but unless you have checksums or were using btrfs/zfs there is no way to know for certain.

Yeah, disk1 (data-disk) is corrupt, so I think it is better to rebuild the data from parity.

Link to comment
52 minutes ago, Greyberry said:

You couldn't do that on my machine, could you?

Not sure what you mean, why not? Misread as you couldn't do that.

 

I see which disk it is based on the full diags, depending on which controller it is using, in this case using lsscsi.txt, but for you it's easier to just click that.
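
If you prefer a terminal, one alternative to reading lsscsi.txt is sysfs: on Linux the resolved path of `/sys/block/<device>` normally contains the libata port as a path component (for example `ata3`) when the disk hangs off a SATA/AHCI controller. A minimal sketch under that assumption follows; the helper names are hypothetical and the example path is invented for illustration, not taken from the diagnostics.

```python
import os
import re


def ata_port_from_sysfs_path(resolved_path):
    """Extract the 'ataN' component from a resolved sysfs path, or None if absent."""
    match = re.search(r"/(ata\d+)/", resolved_path)
    return match.group(1) if match else None


def ata_port_of(device):
    """Look up the ATA port for a block device name such as 'sdc' (Linux only)."""
    return ata_port_from_sysfs_path(os.path.realpath(f"/sys/block/{device}"))


# Invented example of what a resolved path can look like.
example = "/sys/devices/pci0000:00/0000:00:17.0/ata3/host2/target2:0:0/2:0:0:0/block/sdc"
print(ata_port_from_sysfs_path(example))
```

Devices that are not on a libata port (NVMe, USB, or disks behind some HBAs) have no such component, so the helper returns `None` for them; that is the "depending on which controller" part.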

Link to comment
2 hours ago, JorgeB said:

Not sure what you mean, why not? Misread as you couldn't do that.

 

I see which disk it is based on the full diags, depending on which controller it is using, in this case using lsscsi.txt, but for you it's easier to just click that.

Thank you! 🙂 Now I know. Sometimes it is faster to do it via terminal or look into the diagnostics when you have that opened anyway.

Link to comment
