6b10a BTRFS error

October 7, 2014

Hi, I am running 6b10a and I have an SSD used for cache only. The SSD is formatted as BTRFS, and in the last few weeks I have been seeing errors reported on the main TFT connected to unRAID. I am also struggling with KVM, and I think these errors may be the cause of my lack of success in starting my VM.

Today I started the array, and without doing anything else this is what I get (EDIT: I only edited my .cfg file located in the array):

I hope someone can confirm whether this is an error, and also how to solve it.

Rgds.

October 7, 2014

Since migrating my cache to btrfs a few weeks ago, I've been getting lots of indecipherable errors as well. Twice my Xen VMs (located on the cache drive) have crashed, forcing me to reboot the system. Most recently, the VMs went into read-only mode, causing all sorts of problems.

Just last night I re-formatted the cache as XFS and restored my VMs from backup. Time will tell, but I've certainly lost confidence in btrfs and will not be using it on any of my data drives.

Peter

October 7, 2014

Author

Thanks for your feedback; I ran into the same or similar errors too. It is too annoying for me right now, as I don't have much time to try unstable releases.

I did the same thing but formatted the cache SSD as ReiserFS. I never had a problem with it before, while running 5.0-beta11 for a long time. Time will tell.

Let us know how XFS works if used as cache.

Rgds.

October 7, 2014

OK, for those having problems on btrfs and using KVM virtual machines, are you using RAW images for your image types or QCOW/QCOW2?

October 8, 2014

Author

Hi Jon, I am using qcow2 images.

Rgds.

October 8, 2014

Author

Hi Jon,

Today I started the array using these syslinux lines:

label KVM unRAID OS 6b10a
  menu default
  kernel /bzimage
  append amd_iommu=on vfio_iommu_type1.allow_unsafe_interrupts=1 pcie_acs_override=downstream initrd=/bzroot

I connected via putty and entered:

chmod 755 vfio-bind
vfio-bind 0000:03:00.0 0000:00:14.2 0000:07:00.0 0000:08:00.0 0000:00:11.0

...and this is what I get:

Tower2 login: REISERFS abort (device sdd1): Journal write error in flush_commit_list

...I then stop the array and I get:

REISERFS abort (device md1): Journal write error in flush_commit_list.

This is also what I get:

.

Yesterday I formatted my cache-only SSD as ReiserFS, but as you can see I still have one of the problems that also arose while using Btrfs. Let me know if you need my syslog.

Rgds.

October 8, 2014

Author

Hi again, sorry for my last post. I checked it in depth, and the error I reported today was caused by passing through one SATA port (0000:00:11.0) with the vfio-bind command. I removed it from the .xml too, and the problem was solved.

I must assume that the reported error yesterday was caused by the same issue.

Sorry jon.

Rgds.
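
For anyone hitting the same thing: a rough sketch of a pre-check before running vfio-bind. It relies on the PCI class code that Linux exposes in /sys/bus/pci/devices/<address>/class, where a value starting with 0x01 marks a mass-storage controller (for example 0x010601 for AHCI SATA). The function names are made up for illustration and are not part of unRAID or of the vfio-bind script.

```shell
# Succeed if a PCI class code (e.g. 0x010601) denotes a mass-storage controller.
is_storage_class() {
    case $1 in
        0x01*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Warn about every PCI address given on the command line whose device
# is a storage controller. Addresses with no sysfs entry are skipped.
warn_storage_devices() {
    for addr in "$@"; do
        class_file=/sys/bus/pci/devices/$addr/class
        [ -r "$class_file" ] || continue
        if is_storage_class "$(cat "$class_file")"; then
            echo "warning: $addr is a storage controller; disks behind it may disappear from the host if it is bound to vfio"
        fi
    done
}
```

Something like `warn_storage_devices 0000:03:00.0 0000:00:14.2 0000:00:11.0` before the vfio-bind call would then flag a SATA controller in the list, assuming it reports a storage class code.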

November 19, 2014

I just migrated my cache drive from a single spinner to 3x SSDs. I am now seeing these errors:

Nov 19 11:31:22 unRAID kernel: BTRFS error (device sdb1): csum failed ino 2000 off 8048746496 csum 1267341460 expected csum 2413356607
Nov 19 11:31:22 unRAID kernel: BTRFS error (device sdb1): csum failed ino 2000 off 17655549952 csum 4046818684 expected csum 3672383649

To answer your question Jon, my VMs are all qcow2 running from /mnt/cache/VMs. My cache drive is not being used for cache.

Is there a command I should run to check/fix these errors?

John

November 19, 2014

I ran a scrub on sdb1 (I don't know what that really does). Here is the output:

btrfs scrub start /dev/sdb1 -B -R -d -r 2>&1
scrub device /dev/sdb1 (id 1) done
scrub started at Wed Nov 19 11:59:28 2014 and finished after 262 seconds
data_extents_scrubbed: 880339
tree_extents_scrubbed: 8273
data_bytes_scrubbed: 56042057728
tree_bytes_scrubbed: 135544832
read_errors: 0
csum_errors: 5
verify_errors: 0
no_csum: 5056
csum_discards: 7522
super_errors: 0
malloc_errors: 0
uncorrectable_errors: 0
unverified_errors: 0
corrected_errors: 0
last_physical: 66592964608

Once it was completed, I see this in the syslog:

Nov 19 11:56:54 unRAID kernel: BTRFS error (device sdb1): csum failed ino 2000 off 8273084416 csum 948789339 expected csum 1351011486
Nov 19 12:03:40 unRAID kernel: BTRFS: bdev /dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
Nov 19 12:03:43 unRAID kernel: BTRFS: bdev /dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
Nov 19 12:03:46 unRAID kernel: BTRFS: bdev /dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
Nov 19 12:03:46 unRAID kernel: BTRFS: bdev /dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 4, gen 0
Nov 19 12:03:49 unRAID kernel: BTRFS: bdev /dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 5, gen 0
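
A small sketch for reading raw scrub output like the above: it prints only the counters whose name ends in _errors and whose value is not zero, so the interesting lines stand out. The function name is hypothetical, and it assumes the "name: value" layout shown in this post.

```shell
# Read `btrfs scrub ... -R` output on stdin and print the non-zero *_errors counters.
scrub_nonzero_errors() {
    awk -F': *' '/_errors:/ && $2 + 0 > 0 {
        gsub(/^[ \t]+/, "", $1)   # drop any leading indentation
        print $1 "=" $2
    }'
}
```

Usage would be along the lines of `btrfs scrub start /dev/sdb1 -B -R -d -r 2>&1 | scrub_nonzero_errors`. For the output in this post, the only non-zero counter is csum_errors.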

November 19, 2014

Wherever the cache is mounted, you have to run find on the inode number to locate the suspect file.

find (directory of cache) -ino 2000 -ls

I don't run a cache drive, so I do not know where /dev/sdb1 is mounted; fill it in above.

It would not be /dev/sdb1 because that's the device, you need the mount point.

Probably something like /mnt/cache. Check the unraid interface.

Once you know the suspect file you can cat it to /dev/null and confirm the message again

cat (suspect file) > /dev/null

Check syslog

grep 'csum failed ' /var/log/syslog

November 19, 2014

root@unRAID:/# find /mnt/cache -ino 2000 -ls
find: unknown predicate `-ino'

November 19, 2014

OK...I found the culprit. It was one of my qcow2 files that was corrupt. I deleted it and will rebuild the VM.

Question is if this indicates that I have a drive issue or it was just that the file was buggered? I did copy that qcow2 file from my old cache drive which was a health-questionable drive.

John

November 19, 2014

My mistake, -inum

It will take a long time; it has to search the whole filesystem.

 -inum n File has inode number n.
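
Putting the steps from this thread together, here is a minimal sketch: pull the inode numbers out of the "csum failed ino N" syslog lines, then look each one up with find -inum. The function names are made up for illustration, and the paths /var/log/syslog and /mnt/cache are taken from the earlier posts and may differ on other systems.

```shell
# Print the unique inode numbers named in "csum failed ino N" kernel messages (stdin).
csum_failed_inodes() {
    sed -n 's/.*csum failed ino \([0-9][0-9]*\) .*/\1/p' | sort -un
}

# Read inode numbers on stdin and print the matching paths under the given mount point.
resolve_inodes() {
    mount_point=$1
    while read -r ino; do
        find "$mount_point" -inum "$ino" -print
    done
}
```

Usage would be something like `csum_failed_inodes < /var/log/syslog | resolve_inodes /mnt/cache`. Each lookup walks the whole tree, so it is slow on a large filesystem. Inode numbers on btrfs are unique only within a subvolume, so more than one path can come back if the cache has several subvolumes.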

November 19, 2014

OK...I found the culprit. It was one of my qcow2 files that was corrupt. I deleted it and will rebuild the VM.

Question is if this indicates that I have a drive issue or it was just that the file was buggered? I did copy that qcow2 file from my old cache drive which was a health-questionable drive.

John

My guess is that if the file was already 'buggered' before you copied it, you would not know: btrfs would have checksummed the bad data as it was written, so it would still verify.

The issue happened after the file was written.

November 19, 2014

Well, I deleted the suspect file and started to rebuild the VM using a newly created qcow2 image and started seeing this a minute ago:

Nov 19 16:55:10 unRAID kernel: BTRFS error (device sdd1): csum failed ino 2087 off 10579087360 csum 3369183537 expected csum 1033459909
Nov 19 16:55:11 unRAID kernel: BTRFS error (device sdd1): csum failed ino 2087 off 10626752512 csum 3863425221 expected csum 1461438144

I just created this file 30 mins ago or so...

root@unRAID:/mnt/cache/VMs# find /mnt/cache -inum 2087 -ls
  2087 15692100 -rw-r--r--   1 root     users    16069820416 Nov 19 16:57 /mnt/cache/VMs/TVPVR.qcow2

November 19, 2014

I don't have an answer for you on this one. It could be hardware, or it could be BTRFS itself. It is not yet considered 100% mainstream.

If it were me, I might try something simpler and go with 1 SSD first: scrub it, add in the next, scrub the fs, and then add the third to see what happens.

Unless someone knows how to identify where this new file exists on the 3 SSD array.

I wonder if there's some bug with trim support?

I suppose you could try XFS on the cache drive too.
