BTRFS error (device sdaa1): bdev /dev/sdaa1 errs: wr 106017132, rd 82906, flush 0, corrupt 0, gen 0

May 6, 2023

I started getting these notifications a few weeks ago: "fstrim: /mnt/mediacache: the discard operation is not supported". I went to poke around and checked my logs, only to find a wall of this:

BTRFS error (device sdaa1): bdev /dev/sdaa1 errs: wr 106017132, rd 82906, flush 0, corrupt 0, gen 0

What causes this, and what should I do to remedy the situation?

Shut down, reseat cables, start back up?

I've uploaded my diagnostics:

skynet-diagnostics-20230506-0817.zip

Edited May 7, 2023 by Neldonado

Solved by JorgeB

May 15, 2023

May 6, 2023

Author

Neldonado changed the title to BTRFS error (device sdaa1): bdev /dev/sdaa1 errs: wr 106017132, rd 82906, flush 0, corrupt 0, gen 0

May 7, 2023

Author

I’ve got roughly 1.2 TB of data on this cache pool. Would it be safe to run the mover and get the data off?

May 7, 2023

Community Expert

The syslog rotated, so I cannot see the beginning of the problem, but it looks like this device dropped offline a few days ago:

May  4 04:43:00 Skynet kernel: BTRFS error (device sdaa1): bdev /dev/sdaa1 errs: wr 67401451, rd 44447, flush 0, corrupt 0, gen 0

Reboot and post new diags after array start.
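
For reference, the counters in that log line are cumulative per-device totals: wr (write I/O errors), rd (read I/O errors), flush (flush errors), corrupt (corruption errors such as checksum mismatches) and gen (generation errors). A minimal sketch for pulling them out of a syslog so two snapshots can be compared; the function name btrfs_err_counters is made up for illustration, and the sample line is the one quoted above.

```shell
# Extract the five btrfs per-device error counters from kernel log lines.
# Reads log lines on stdin; prints "device wr rd flush corrupt gen" per match.
# (btrfs_err_counters is a hypothetical helper name, not a btrfs tool.)
btrfs_err_counters() {
  sed -n 's/.*bdev \([^ ]*\) errs: wr \([0-9]*\), rd \([0-9]*\), flush \([0-9]*\), corrupt \([0-9]*\), gen \([0-9]*\).*/\1 \2 \3 \4 \5 \6/p'
}

# Sample line copied from this thread.
line='May  4 04:43:00 Skynet kernel: BTRFS error (device sdaa1): bdev /dev/sdaa1 errs: wr 67401451, rd 44447, flush 0, corrupt 0, gen 0'
printf '%s\n' "$line" | btrfs_err_counters
# prints: /dev/sdaa1 67401451 44447 0 0 0
```

Comparing this with the line in the first post (wr 106017132, rd 82906) shows both counters still climbing after May 4, which would fit a device that kept failing I/O while offline.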

May 7, 2023

Author

new diagnostics

skynet-diagnostics-20230507-0831.zip

May 7, 2023

Author

Looks like the drive is throwing the same errors. I imagine it'll drop offline any minute.

May 8, 2023

Community Expert

No device errors so far; the ones you see logged are btrfs bringing that device back in sync. SMART looks good. You should now run a scrub, and if it happens again, replace the cables. Also take a look here for better pool monitoring, so you're notified if there's a problem.
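
One way to act on the monitoring advice, sketched under assumptions: `btrfs device stats` prints one counter per line, and a small filter can flag any that are non-zero. The function name nonzero_btrfs_stats is hypothetical, and the sample input is illustrative (built from the counters in this thread's log lines, not copied from real command output).

```shell
# Print only the non-zero counters from `btrfs device stats` output (stdin).
# Expected input lines look like: [/dev/sdaa1].write_io_errs    106017132
# Exit status is 1 if any non-zero counter was seen, 0 otherwise.
# (nonzero_btrfs_stats is a hypothetical helper name.)
nonzero_btrfs_stats() {
  awk '$2 ~ /^[0-9]+$/ && $2 + 0 != 0 { print $1, $2; found = 1 }
       END { exit found ? 1 : 0 }'
}

# Illustrative input in the device-stats format (values assumed from the thread).
sample='[/dev/sdaa1].write_io_errs    106017132
[/dev/sdaa1].read_io_errs     82906
[/dev/sdaa1].flush_io_errs    0
[/dev/sdaa1].corruption_errs  0
[/dev/sdaa1].generation_errs  0'

printf '%s\n' "$sample" | nonzero_btrfs_stats || echo "pool has logged device errors"
```

On a live system the input would come from `btrfs device stats /mnt/mediacache` (assuming that is this pool's mount point). Running `btrfs device stats -z` on the same path prints the counters and then resets them to zero, so any later errors stand out.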

May 8, 2023

Author

Are these the same errors (see picture)? sdx is the other drive in this cache pool.

BTRFS error (device sdl: state EA): parent transid verify failed on 3163349286912 wanted 36459 found 36425

Uploading diagnostics again. Something weird is going on: I replaced all my cables a month or two ago, and I just started noticing these errors out of nowhere.

skynet-diagnostics-20230508-0436.zip
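
As background on the message above: in a "parent transid verify failed" error, "wanted" is the generation (transaction ID) the parent node expects and "found" is the generation actually stored in the block that was read. A rough sketch for extracting the two and showing how far behind the block is; transid_gap is a made-up helper name, and the block address in the sample is assumed to be the two number fragments from the post joined together.

```shell
# Pull the "wanted" and "found" generations out of transid errors on stdin
# and print how many generations the on-disk block is behind.
# (transid_gap is a hypothetical helper name.)
transid_gap() {
  sed -n 's/.*wanted \([0-9]*\) found \([0-9]*\).*/\1 \2/p' |
  while read -r wanted found; do
    echo "wanted=$wanted found=$found behind=$((wanted - found))"
  done
}

# Message from this thread (address joined from two fragments: an assumption).
msg='BTRFS error (device sdl: state EA): parent transid verify failed on 3163349286912 wanted 36459 found 36425'
printf '%s\n' "$msg" | transid_gap
# prints: wanted=36459 found=36425 behind=34
```

A copy that is a few dozen generations old would be consistent with a pool member that missed writes while it was offline, though that reading is an inference, not something the diagnostics in this thread confirm.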

May 8, 2023

Community Expert

20 minutes ago, Neldonado said:

you should now run a scrub

Make sure all errors are corrected.

May 8, 2023

Author

So is this what I want to do?

btrfs scrub start -B -d -r /dev/sdaa1 

and 

btrfs scrub start -B -d -r /dev/sdx1

May 8, 2023

Community Expert

You can use the GUI, click on the first pool member and scroll down to the scrub section.
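
For anyone who prefers the command line: the -r flag in the commands above makes the scrub read-only, so it would report errors without repairing them. Pointing the scrub at the mounted pool instead of a single device also covers every member in one run. A hedged sketch follows; scrub_pool and DRY_RUN are names invented here, and /mnt/mediacache is assumed to be this pool's mount point (it is the path from the fstrim message in the first post).

```shell
# Build (and optionally run) a repairing scrub for a mounted btrfs pool.
# DRY_RUN=1 (the default in this sketch) only prints the command.
# (scrub_pool and DRY_RUN are hypothetical names.)
scrub_pool() {
  pool_path="$1"
  # -B: stay in the foreground, -d: per-device statistics.
  # No -r here: -r would make the scrub read-only (no repairs).
  set -- btrfs scrub start -B -d "$pool_path"
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

scrub_pool /mnt/mediacache
# prints: btrfs scrub start -B -d /mnt/mediacache
```

Setting DRY_RUN=0 would actually start the scrub, which needs root and can take hours on a pool of this size.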

May 8, 2023

Author

So I do that and it refreshes and says aborted?

UUID: xxxx

Scrub started: Mon May 8 05:57:58 2023
Status: aborted
Duration: 0:00:00
Total to scrub: 3.04TiB
Rate: 0.00B/s
Error summary: no errors found
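
Since "no errors found" next to an aborted status can be misleading (nothing was scanned in 0:00:00), a script that checks scrub results should probably look at the Status field first. A small sketch, assuming the field layout shown in the output above; scrub_state is a hypothetical helper name.

```shell
# Print the value of the "Status:" line from `btrfs scrub status` output (stdin).
# (scrub_state is a hypothetical helper name.)
scrub_state() {
  awk -F': *' '$1 == "Status" { print $2 }'
}

# Sample text taken from the scrub output posted in this thread.
sample='UUID: xxxx
Scrub started: Mon May 8 05:57:58 2023
Status: aborted
Duration: 0:00:00
Total to scrub: 3.04TiB
Rate: 0.00B/s
Error summary: no errors found'

state=$(printf '%s\n' "$sample" | scrub_state)
[ "$state" = "finished" ] || echo "scrub did not finish (status: $state)"
# prints: scrub did not finish (status: aborted)
```

On a live system the input would come from `btrfs scrub status` run against the pool's mount point.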

May 8, 2023

Community Expert

Post new diags to see if there's something there.

May 8, 2023

Author

New diagnostics

skynet-diagnostics-20230508-0655.zip

May 8, 2023

Community Expert

I see that the scrub is aborting, but not why. Reboot and try again; if the issue persists, it's best to back up and recreate the pool.

May 8, 2023

Author

Tons of errors being corrected… this is all good I hope? Looks like I’ve got some downtime before it’s finished.

May 8, 2023

Community Expert

As long as they are all corrected it's good; the disk that dropped offline before needs to be synced up.

May 10, 2023

Author

Finished scrubbing; diagnostics taken after the scrub are attached.

I also noticed my log is full. What should the next steps be?

skynet-diagnostics-20230510-1535.zip

May 11, 2023

Community Expert

The pool should be fixed. Reboot to clear the log, and see the link I posted above to reset the pool stats and keep monitoring.

May 15, 2023

Author

OK, it's been almost 4 days and I started getting errors again.

Diagnostics attached.

skynet-diagnostics-20230515-0334.zip

May 15, 2023

Community Expert

The disk dropped offline again. Check/replace the cables, or swap them with a different disk's.

May 15, 2023

Author

These cables are relatively new. Is there any way to tell whether it’s just a bad disk or a power issue?

May 15, 2023

Community Expert
Solution

Swap both the SATA and power cables with a different disk's, then see whether the problem follows the disk or stays with the cables.

3 weeks later...

May 30, 2023

Author

Calling this closed for now. I swapped the drives around and haven't noticed anything since.
