BTRFS Error: csum mismatch on free space cache

Napper198 · October 28, 2017

I don't know exactly what happened, but it appears that either my main cache SSD is straight up broken or the motherboard has some sort of issue. I started getting emails with the following text:

fstrim: /mnt/cache: FITRIM ioctl failed: Input/output error

After restarting the server once the drive was not detected anymore.

So I popped the SSD into an external enclosure and plugged it into another PC to see if the drive was indeed bad, but it got detected.

Back into the server it got detected by the BIOS and after booting unRAID back up everything seemed fine at first.

I noticed that my Emby Docker was acting up, and a short search showed that the error it gave is drive related. A look at the unRAID log revealed the error mentioned in the title.

So what now?

iduna-diagnostics-20171028-2118.zip

JorgeB · October 28, 2017

The error in the title is just a warning and it can usually be ignored.

5 minutes ago, Napper198 said:

After restarting the server once the drive was not detected anymore.

This is more serious, can you post the output of:

btrfs dev stats /mnt/cache

Napper198 · October 28, 2017

4 minutes ago, johnnie.black said:
The error in the title is just a warning and it can usually be ignored.

This is more serious, can you post the output of:
btrfs dev stats /mnt/cache

root@Iduna:~# btrfs dev stats /mnt/cache
[/dev/sdb1].write_io_errs   8699682
[/dev/sdb1].read_io_errs    9130479
[/dev/sdb1].flush_io_errs   10864
[/dev/sdb1].corruption_errs 433
[/dev/sdb1].generation_errs 1
[/dev/sdc1].write_io_errs   0
[/dev/sdc1].read_io_errs    0
[/dev/sdc1].flush_io_errs   0
[/dev/sdc1].corruption_errs 0
[/dev/sdc1].generation_errs 0

thanks for the quick reply

Edit: since the corruption errors match the scrub I did after reseating the SSD, that data might be old. Because I wasn't at home, the degraded state went on for about 4-5h.

Edited October 28, 2017 by Napper198

JorgeB · October 28, 2017

As you can see lots of errors on the sdb SSD (cache1), these are usually the result of a bad cable, replace both cables for that SSD, run a correcting scrub, make sure there are no uncorrectable errors, then reset the stats with:

btrfs dev stats -z /mnt/cache

Check stats again after 1 or 2 days to see they remain at 0.
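For anyone who wants to automate that follow-up check, here is a minimal Python sketch that parses `btrfs dev stats` output and lists only the devices with non-zero counters. The function names `parse_dev_stats` and `nonzero_counters` are made up for illustration, and the sample text is the output posted earlier in this thread; it assumes the `[device].counter value` line format shown there.

```python
import re

# Matches lines such as "[/dev/sdb1].write_io_errs   8699682"
STAT_LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$")


def parse_dev_stats(text):
    """Return {device: {counter: value}} parsed from `btrfs dev stats` output."""
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match:
            device = stats.setdefault(match["dev"], {})
            device[match["counter"]] = int(match["value"])
    return stats


def nonzero_counters(stats):
    """Keep only the devices that have at least one counter above zero."""
    result = {}
    for dev, counters in stats.items():
        bad = {name: value for name, value in counters.items() if value > 0}
        if bad:
            result[dev] = bad
    return result


# Sample input: the stats posted earlier in this thread.
SAMPLE = """\
[/dev/sdb1].write_io_errs   8699682
[/dev/sdb1].read_io_errs    9130479
[/dev/sdb1].flush_io_errs   10864
[/dev/sdb1].corruption_errs 433
[/dev/sdb1].generation_errs 1
[/dev/sdc1].write_io_errs   0
[/dev/sdc1].read_io_errs    0
[/dev/sdc1].flush_io_errs   0
[/dev/sdc1].corruption_errs 0
[/dev/sdc1].generation_errs 0
"""

if __name__ == "__main__":
    # Print one line per device that still reports errors.
    for dev, bad in nonzero_counters(parse_dev_stats(SAMPLE)).items():
        print(dev, bad)
```

To run it against a live pool, one option is to feed it the text from `subprocess.run(["btrfs", "dev", "stats", "/mnt/cache"], capture_output=True, text=True).stdout`. After the stats have been reset and the cables replaced, an empty result a day or two later would match the "remain at 0" check described above.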

Napper198 · October 28, 2017

8 minutes ago, johnnie.black said:
As you can see lots of errors on the sdb SSD (cache1), these are usually the result of a bad cable, replace both cables for that SSD, run a correcting scrub, make sure there are no uncorrectable errors, then reset the stats with:
btrfs dev stats -z /mnt/cache
Check stats again after 1 or 2 days to see they remain at 0.

will try and report later. Thanks

Napper198 · October 28, 2017

Uh, that doesn't look too good:

Edit: keeps scrolling as well

Edited October 28, 2017 by Napper198

JorgeB · October 28, 2017

That's your docker image, it's corrupt and it's kind of expected with so many errors, you'll need to delete and recreate.

Napper198 · October 28, 2017

2 minutes ago, johnnie.black said:

That's your docker image, it's corrupt and it's kind of expected with so many errors, you'll need to delete and recreate.

that is the system.

I changed the cable and the port on the motherboard.

The drive labels have changed now. Unraid picked sdc as Parity 1 (which should be fine since this is the good drive)

Should I try to reformat the bad drive?

JorgeB · October 28, 2017

1 minute ago, Napper198 said:

that is the system.

Not sure what you mean, loop0 is the docker image, you just need to delete and recreate:

JorgeB · October 28, 2017

3 minutes ago, Napper198 said:

Should I try to reformat the bad drive?

Run a correcting scrub, only if there are uncorrectable errors do you need to take more drastic measures, like re-formatting the pool.

Napper198 · October 28, 2017

2 minutes ago, johnnie.black said:

Not sure what you mean, loop0 is the docker image, you just need to delete and recreate:

Oh, I see. A scrub on the docker image returned no results, so I guess I'll try to delete it now.

Napper198 · October 28, 2017

After deleting and rebuilding the docker file it seems mostly fine, but I may need to delete some files from the Emby docker config folder as they appear to be corrupted.

If nothing comes up I'll mark this as solved on Monday. Thanks for the quick help again.
