BTRFS Error: csum mismatch on free space cache



I don't know exactly what happened, but it appears that either my main cache SSD is broken outright or the motherboard has some sort of issue. I started getting emails with the following text:

 

fstrim: /mnt/cache: FITRIM ioctl failed: Input/output error
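For what it's worth, that failing call can be re-run by hand to see whether it's reproducible. A sketch only, assuming the cache is still mounted at /mnt/cache as in the error message:

```shell
# Re-run the trim manually; a persistent Input/output error here points
# at the device or its link rather than at the scheduled fstrim job.
if ! fstrim -v /mnt/cache; then
    # the kernel log usually names the underlying ATA/SCSI error
    dmesg | tail -n 20
fi
```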

 

After restarting the server, the drive was not detected anymore.

So I popped the SSD into an external enclosure and plugged it into another PC to see if the drive was indeed bad, but it got detected.

Back in the server it was detected by the BIOS again, and after booting unRAID back up everything seemed fine at first.

I noticed that my Emby Docker was acting up, and after a short search I found out that the error it gave is drive-related; a look at the unRAID log revealed the error mentioned in the title.

 

So what now?

iduna-diagnostics-20171028-2118.zip

4 minutes ago, johnnie.black said:

The error in the title is just a warning and it can usually be ignored.

 

 

This is more serious; can you post the output of:

 


btrfs dev stats /mnt/cache

 

root@Iduna:~# btrfs dev stats /mnt/cache
[/dev/sdb1].write_io_errs   8699682
[/dev/sdb1].read_io_errs    9130479
[/dev/sdb1].flush_io_errs   10864
[/dev/sdb1].corruption_errs 433
[/dev/sdb1].generation_errs 1
[/dev/sdc1].write_io_errs   0
[/dev/sdc1].read_io_errs    0
[/dev/sdc1].flush_io_errs   0
[/dev/sdc1].corruption_errs 0
[/dev/sdc1].generation_errs 0
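For anyone skimming output like this, a small filter makes the bad device stand out. A sketch, fed the exact output posted above via a here-doc so it can run anywhere; on a live system you would pipe `btrfs dev stats /mnt/cache` into the same awk instead:

```shell
# Print only the non-zero error counters from `btrfs dev stats` output.
awk '$2 > 0' <<'EOF'
[/dev/sdb1].write_io_errs   8699682
[/dev/sdb1].read_io_errs    9130479
[/dev/sdb1].flush_io_errs   10864
[/dev/sdb1].corruption_errs 433
[/dev/sdb1].generation_errs 1
[/dev/sdc1].write_io_errs   0
[/dev/sdc1].read_io_errs    0
[/dev/sdc1].flush_io_errs   0
[/dev/sdc1].corruption_errs 0
[/dev/sdc1].generation_errs 0
EOF
```

Here only the five sdb1 lines survive the filter, matching the diagnosis that sdb is the failing device.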

Thanks for the quick reply.

 

Edit: since the corruption errors match the scrub I did after reseating the SSD, that data might be old; and since I wasn't at home, the pool ran degraded for about 4-5 hours.

Edited by Napper198

As you can see there are lots of errors on the sdb SSD (cache1). These are usually the result of a bad cable: replace both cables for that SSD, run a correcting scrub, make sure there are no uncorrectable errors, then reset the stats with:

 

btrfs dev stats -z /mnt/cache

Check the stats again after 1 or 2 days to see if they remain at 0.
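The steps above can be sketched as one sequence. A sketch only, using `/mnt/cache` as elsewhere in the thread; run it after the cables have been replaced:

```shell
btrfs scrub start -B /mnt/cache   # correcting scrub; -B waits in the foreground
btrfs scrub status /mnt/cache     # should report 0 uncorrectable errors
btrfs dev stats -z /mnt/cache     # print the counters, then zero them
```

With the counters zeroed, any value that reappears over the next day or two points at a problem that the new cables did not fix.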

8 minutes ago, johnnie.black said:

As you can see lots of errors on the sdb SSD (cache1), these are usually the result of a bad cable [...]

Will try and report later. Thanks :)

2 minutes ago, johnnie.black said:

That's your docker image; it's corrupt, which is kind of expected with so many errors. You'll need to delete and recreate it.

That is the system.

I changed the cable and the port on the motherboard.

The drive labels have changed now; unRAID picked sdc as Parity 1 (which should be fine, since that is the good drive).

Should I try to reformat the bad drive?

