parity check triggered on reboot

June 23, 2023

I was getting CRC error count warnings for one of my disks (sdi in the attached diagnostics), so I moved it around to different drive bays (so as to use different SATA ports and cables), but somehow it's always the same disk that shows CRC errors. Nothing was wrong with data or filesystems during this, until the latest reboot. After the reboot, a parity check started automatically.

1. Can I know what caused this? I assume an unclean shutdown, with a timeout on something exceeded, but I can't see what.

2. The troubled disk is in a btrfs pool of 2 HDDs. Why did the array get affected?

3. Is there anything else I am missing? Is there another underlying issue?

I upgraded from 6.11.5 to 6.12.1 yesterday, and it went smoothly, including importing a ZFS pool. I don't believe this is the culprit, but it is still worth a mention.

The log is full of disk errors, but I cannot pinpoint which disk they belong to. I am assuming sdi, based on the creeping CRC errors.
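One way to narrow down which disk the errors belong to is to count error-looking syslog lines per device name. This is only a minimal sketch, assuming the syslog has been extracted from the diagnostics zip as plain text. The sample lines, the keyword list, and the helper name `count_errors_by_device` are illustrative assumptions, not taken from the attached logs.

```python
import re
from collections import Counter

# Device tokens: SCSI disk names (sdi, sdab), optionally followed by a
# partition number, and libata port names (ata7). Heuristic only.
DEVICE_RE = re.compile(r"\b(sd[a-z]+|ata\d+)\d*\b")
# Keywords that usually mark a problem line. An assumption, tune for your logs.
ERROR_RE = re.compile(r"error|failed|timeout|reset", re.IGNORECASE)


def count_errors_by_device(lines):
    """Return a Counter mapping device name -> number of error-like lines."""
    counts = Counter()
    for line in lines:
        if not ERROR_RE.search(line):
            continue
        # Count each device at most once per line; sdi1 is counted as sdi.
        for dev in set(DEVICE_RE.findall(line)):
            counts[dev] += 1
    return counts


# Hypothetical sample lines, for illustration only.
SAMPLE = [
    "kernel: ata7: hard resetting link",
    "kernel: I/O error, dev sdi, sector 2048",
    "kernel: BTRFS error (device sdi1): bdev /dev/sdi1 errs: wr 0, rd 5",
    "kernel: sdb: sdb1",
]

if __name__ == "__main__":
    for dev, n in count_errors_by_device(SAMPLE).most_common():
        print(dev, n)
```

Note that libata port names (ataN) and disk names (sdX) are separate namespaces, so an ataN entry still has to be matched to a disk by looking at the neighbouring log lines.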

diagnostics attached

godaam-diagnostics-20230623-1312.zip

Solved by JorgeB

June 23, 2023

2 hours ago, apandey said:

1. can I know what caused this? I assume unclean shutdown

Correct, there should be diags saved in the flash drive /logs folder from that shutdown that might show the issue.

June 23, 2023

Author

6 hours ago, JorgeB said:

Correct, there should be diags saved in the flash drive /logs folder from that shutdown that might show the issue.

Nice, I didn't know diagnostics are saved automatically in such a case. I've attached the one from the shutdown. It seems the troubled btrfs pool drive had problems unmounting.

diags attached

godaam-diagnostics-20230623-1255.zip

I am still not clear why this should mark the array dirty. It's a bit scary if issues with a cache pool can affect array operations.

So what should my next steps be? I am not very familiar with btrfs recovery. I ran a scrub before the reboot and it seemed OK, so I'm not sure how to see the same issue that the logs show. If the disk needs replacing, how do I replace it?

June 23, 2023

Solution

The issues with the tsdb2 disk start right from boot and are constantly spamming the log. This looks more like a connection issue; replace the cables for that disk or swap it to a different controller.

June 23, 2023

Author

13 minutes ago, JorgeB said:

The issues with the tsdb2 disk start right from boot and are constantly spamming the log. This looks more like a connection issue; replace the cables for that disk or swap it to a different controller.

OK, I will swap things out once the current parity check finishes. I have also increased the shutdown timeout in the disk settings for now, to avoid this on the next shutdown.

I did move the drive from motherboard SATA to my LSI controller when the CRC errors first showed up. The drive bays use different cables too. I will also examine the drive-side connectors this time when I swap it. I have one more spare slot where I can try the disk.

If this continues, is there a way I can downgrade tsdb to a single-drive pool temporarily? I would like to take the disk out and test it outside the system if swapping cables etc. doesn't work.

June 23, 2023

27 minutes ago, apandey said:

I did move the drive from motherboard SATA to my LSI controller

It's not using the LSI controller; it's connected to the Intel SCU controller, which is the secondary Intel SATA/SAS controller.
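A quick way to confirm which controller a disk is really attached to is to resolve its sysfs link and read the PCI address out of the path. This is a minimal sketch, assuming a standard Linux sysfs layout. The example path and its PCI address are hypothetical, and `controller_pci_address` and `controller_for_disk` are illustrative helper names.

```python
import os
import re

# PCI addresses look like 0000:00:1f.2 (domain:bus:device.function).
PCI_ADDR_RE = re.compile(r"[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]")


def controller_pci_address(sysfs_path):
    """Return the last PCI address found in a resolved sysfs path, or None.

    The last address in the path is the PCI device closest to the disk,
    which is the storage controller (earlier ones are bridges).
    """
    matches = PCI_ADDR_RE.findall(sysfs_path)
    return matches[-1] if matches else None


def controller_for_disk(name):
    """Resolve /sys/block/<name> on a live Linux system and extract the address."""
    resolved = os.path.realpath(os.path.join("/sys/block", name))
    return controller_pci_address(resolved)


# Hypothetical resolved path, for illustration only.
EXAMPLE = "/sys/devices/pci0000:00/0000:00:11.4/ata5/host4/target4:0:0/4:0:0:0/block/sdi"

if __name__ == "__main__":
    print(controller_pci_address(EXAMPLE))
```

The returned address can then be compared against the controller list from `lspci` to see whether the disk sits on the onboard ports or on the add-in card.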

June 24, 2023

Author

10 hours ago, JorgeB said:

It's not using the LSI controller; it's connected to the Intel SCU controller, which is the secondary Intel SATA/SAS controller.

Arrrrgggh. Sorry, and thanks for spotting this. I mistakenly moved the other 2TB drive to the LSI controller. No wonder the CRC errors didn't stop.

I have a spare port on the motherboard SATA controller, so I will rewire to that and report back.

June 24, 2023

Author

I have switched the affected drive to the other motherboard SATA controller. So far so good.
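Because the CRC error count is cumulative (on most SATA drives it is SMART attribute 199, usually named UDMA_CRC_Error_Count), "so far so good" can be checked by comparing the raw value before and after the rewiring. This is a minimal sketch that parses saved `smartctl -A` output. The sample tables and their values are hypothetical, and the helper names are illustrative.

```python
def crc_error_count(smartctl_output):
    """Return the raw value of SMART attribute 199 from `smartctl -A` text, or None."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with the numeric ID; the raw value is the 10th column.
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[9])
    return None


def crc_errors_added(before_text, after_text):
    """Return how many CRC errors were logged between two readings."""
    before = crc_error_count(before_text)
    after = crc_error_count(after_text)
    if before is None or after is None:
        raise ValueError("attribute 199 not found in one of the readings")
    return after - before


# Hypothetical readings, for illustration only.
BEFORE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   071   071   000    Old_age   Always       -       21500
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       37
"""
AFTER = BEFORE.replace("21500", "21524")

if __name__ == "__main__":
    print("new CRC errors:", crc_errors_added(BEFORE, AFTER))
```

A difference of zero after some hours of disk activity suggests the new port and cable are fine; the old count itself never goes back down.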

I started a scrub on the tsdb pool, and it's reporting some errors. I did not tick the fix errors checkbox.

Scrub started:    Sat Jun 24 11:12:36 2023
Status:           running
Duration:         0:13:38
Time left:        3:06:22
ETA:              Sat Jun 24 14:32:36 2023
Total to scrub:   2.48TiB
Bytes scrubbed:   173.21GiB  (6.82%)
Rate:             216.83MiB/s
Error summary:    verify=2 csum=120266
  Corrected:      0
  Uncorrectable:  0
  Unverified:     0
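The status above shows errors detected (verify and csum) but nothing corrected, which is what one would expect from a scrub run without the fix option. For anyone who wants to track these counters across runs, this is a minimal sketch that parses the text shown above. The function names are illustrative, and the parser assumes the output layout stays as in this post.

```python
import re


def parse_error_summary(status_text):
    """Extract key=value counters from the 'Error summary:' line."""
    for line in status_text.splitlines():
        if line.strip().startswith("Error summary:"):
            return {k: int(v) for k, v in re.findall(r"(\w+)=(\d+)", line)}
    return {}


def parse_result_counts(status_text):
    """Extract the Corrected / Uncorrectable / Unverified counters."""
    counts = {}
    for line in status_text.splitlines():
        m = re.match(r"\s*(Corrected|Uncorrectable|Unverified):\s+(\d+)", line)
        if m:
            counts[m.group(1).lower()] = int(m.group(2))
    return counts


# The relevant lines from the scrub status in this post.
STATUS = """\
Error summary:    verify=2 csum=120266
  Corrected:      0
  Uncorrectable:  0
  Unverified:     0
"""

if __name__ == "__main__":
    found = sum(parse_error_summary(STATUS).values())
    fixed = parse_result_counts(STATUS)["corrected"]
    print(f"errors found: {found}, corrected: {fixed}")
```

A summary line without any key=value pairs simply yields an empty dict, so a clean scrub reads as zero errors found.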

How do I go about fixing the errors? Should I run it again with the fix errors checkbox ticked? Do I need to wait for the initial scrub to finish, or can I just cancel it and do the fix steps?

EDIT: started a correcting scrub

Edited June 24, 2023 by apandey
updated on scrub

June 24, 2023

4 hours ago, apandey said:

EDIT: started a correcting scrub

Yep

June 24, 2023

Author

The scrub finished and corrected all verify and csum errors. Thanks for the pointers.
