October 7, 2014
Hi, I am running 6b10a and I have an SSD used for cache only. The SSD is formatted as BTRFS, and in the last few weeks I have been encountering some errors reported on the main TFT connected to unRAID. I am struggling with KVM, and I think these errors may be the cause of my lack of success starting my VM. Today I started the array, and without doing anything else this is what I get (EDIT: I only edited my .cfg file located in the array): I hope someone can confirm this is an error, and also how to solve it. Rgds.
October 7, 2014
Since migrating my cache to btrfs a few weeks ago, I've been getting lots of indecipherable errors as well. Twice my Xen VMs (located on the cache drive) have crashed, forcing me to reboot the system. Most recently, the VMs went into read-only mode, causing all sorts of problems. Just last night I re-formatted the cache as XFS and restored my VMs from backup. Time will tell, but I've certainly lost confidence in btrfs and will not be using it on any of my data drives. Peter
October 7, 2014 Author
Thx for your feedback, I ran into the same or similar errors too. It is too annoying for me right now, as I don't have much time to try unstable releases. I did the same thing but formatted the cache SSD as ReiserFS... I never had a problem with it before, running 5.0-beta11 for a long time... time will tell. Let us know how XFS works when used as cache. Rgds.
October 7, 2014
OK, for those having problems on btrfs and using KVM virtual machines, are you using RAW images for your image types or QCOW/QCOW2?
October 8, 2014 Author
Hi Jon, today I started the array using these syslinux lines:
label KVM unRAID OS 6b10a
  menu default
  kernel /bzimage
  append amd_iommu=on vfio_iommu_type1.allow_unsafe_interrupts=1 pcie_acs_override=downstream initrd=/bzroot
I connected via PuTTY and entered:
chmod 755 vfio-bind
vfio-bind 0000:03:00.0 0000:00:14.2 0000:07:00.0 0000:08:00.0 0000:00:11.0
...and this is what I get:
Tower2 login: REISERFS abort (device sdd1): Journal write error in flush_commit_list
...I then stop the array and I get:
REISERFS abort (device md1): Journal write error in flush_commit_list.
This is also what I get: .
Yesterday I formatted my cache-only SSD as ReiserFS, but as you can see I have one of the problems that also arose while using Btrfs. Let me know if you need my syslog. Rgds.
October 8, 2014 Author
Hi again, sorry for my last post. I checked it in depth, and the error I was reporting today was caused by my passing through one SATA port (0000:00:11.0) using the vfio-bind command. I removed it from the .xml too and the problem was solved. I must assume that the error reported yesterday was caused by the same issue. Sorry Jon. Rgds.
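For anyone hitting the same thing: one way to catch this before binding is to look at each device's PCI class in sysfs, since class codes starting with 0x01 are mass storage controllers (0x0106 is SATA). The following is only a rough sketch, not part of vfio-bind; the helper names `is_storage_class` and `check_before_bind` are made up for illustration.

```shell
# Sketch only: is_storage_class and check_before_bind are hypothetical helpers.

# PCI class codes that begin with 0x01 are mass storage controllers
# (for example, 0x0106xx is a SATA controller).
is_storage_class() {
    case "$1" in
        0x01*) return 0 ;;
        *)     return 1 ;;
    esac
}

# For each PCI address given, read its class code from sysfs and warn
# if it is a storage controller.
check_before_bind() {
    for addr in "$@"; do
        class_file="/sys/bus/pci/devices/$addr/class"
        if [ ! -r "$class_file" ]; then
            echo "$addr: not found in sysfs"
            continue
        fi
        if is_storage_class "$(cat "$class_file")"; then
            echo "$addr: storage controller, check that no array or cache disk hangs off it"
        else
            echo "$addr: ok"
        fi
    done
}

# Usage, with the addresses from the post above:
# check_before_bind 0000:03:00.0 0000:00:14.2 0000:07:00.0 0000:08:00.0 0000:00:11.0
```

A warning here does not mean the device can never be passed through, only that any disks attached to it will disappear from the host once it is bound.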
November 19, 2014
I just migrated my cache drive from a single spinner to 3x SSDs. I am now seeing these errors:
Nov 19 11:31:22 unRAID kernel: BTRFS error (device sdb1): csum failed ino 2000 off 8048746496 csum 1267341460 expected csum 2413356607
Nov 19 11:31:22 unRAID kernel: BTRFS error (device sdb1): csum failed ino 2000 off 17655549952 csum 4046818684 expected csum 3672383649
To answer your question Jon, my VMs are all qcow2 running from /mnt/cache/VMs. My cache drive is not being used for cache. Is there a command I should run to check/fix these errors? John
November 19, 2014
I ran a scrub on sdb1 (I don't know what that really does). Here is the output:
btrfs scrub start /dev/sdb1 -B -R -d -r 2>&1
scrub device /dev/sdb1 (id 1) done
scrub started at Wed Nov 19 11:59:28 2014 and finished after 262 seconds
data_extents_scrubbed: 880339
tree_extents_scrubbed: 8273
data_bytes_scrubbed: 56042057728
tree_bytes_scrubbed: 135544832
read_errors: 0
csum_errors: 5
verify_errors: 0
no_csum: 5056
csum_discards: 7522
super_errors: 0
malloc_errors: 0
uncorrectable_errors: 0
unverified_errors: 0
corrected_errors: 0
last_physical: 66592964608
Once it was completed, I see this in the syslog:
Nov 19 11:56:54 unRAID kernel: BTRFS error (device sdb1): csum failed ino 2000 off 8273084416 csum 948789339 expected csum 1351011486
Nov 19 12:03:40 unRAID kernel: BTRFS: bdev /dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
Nov 19 12:03:43 unRAID kernel: BTRFS: bdev /dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
Nov 19 12:03:46 unRAID kernel: BTRFS: bdev /dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
Nov 19 12:03:46 unRAID kernel: BTRFS: bdev /dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 4, gen 0
Nov 19 12:03:49 unRAID kernel: BTRFS: bdev /dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 5, gen 0
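The raw scrub output above is long, and only the non-zero error counters matter for a quick check. A small filter along these lines might help; it is a sketch that assumes the `name: value` layout shown in the post, and `nonzero_scrub_errors` is a made-up helper name.

```shell
# Sketch only: nonzero_scrub_errors is a hypothetical helper.
# Reads raw scrub statistics on stdin (one "name: value" pair per line) and
# prints only the counters whose name ends in "_errors" and whose value is > 0.
nonzero_scrub_errors() {
    awk '$1 ~ /_errors:$/ && $2 > 0 { print $1, $2 }'
}

# Usage, with the same read-only scrub flags as in the post above:
# btrfs scrub start -B -R -d -r /dev/sdb1 2>&1 | nonzero_scrub_errors
```

Fed the output above, it would print just `csum_errors: 5`, matching the five "corrupt" increments that show up in the syslog afterwards.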
November 19, 2014
Wherever the cache is mounted, you have to do a find on the inode to locate the suspect file:
find (directory of cache) -ino 2000 -ls
I don't run a cache drive, so I do not know where /dev/sdb1 is mounted; fill it in above. It would not be /dev/sdb1, because that's the device and you need the mount point. It is probably something like /mnt/cache. Check the unRAID interface. Once you know the suspect file, you can cat it to /dev/null and confirm the message again:
cat (suspect file) > /dev/null
Then check the syslog:
grep 'csum failed ' /var/log/syslog
November 19, 2014
OK...I found the culprit. It was one of my qcow2 files that was corrupt. I deleted it and will rebuild the VM. The question is whether this indicates that I have a drive issue or whether it was just that the file was buggered. I did copy that qcow2 file from my old cache drive, which was a health-questionable drive. John
November 19, 2014
My mistake, the option is -inum, not -ino. It will take a long time, since it has to search the whole filesystem. From the find man page:
-inum n  File has inode number n.
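If there are several failing inodes, the grep and find steps can be chained. This is a minimal sketch, assuming the syslog is at /var/log/syslog and the messages look like the "csum failed ino N" lines quoted in this thread; `csum_inodes` and `find_csum_files` are made-up names.

```shell
# Sketch only: csum_inodes and find_csum_files are hypothetical helpers.

# Read kernel log lines on stdin and print each inode number that follows
# "csum failed ino", once, in numeric order.
csum_inodes() {
    sed -n 's/.*csum failed ino \([0-9][0-9]*\) .*/\1/p' | sort -n -u
}

# find_csum_files MOUNTPOINT: look up every reported inode under the mount point.
# Each find walks the whole filesystem, so this can be slow.
find_csum_files() {
    mount_point="$1"
    grep 'csum failed' /var/log/syslog | csum_inodes | while read -r ino; do
        find "$mount_point" -inum "$ino" -ls
    done
}

# Usage (the mount point is an assumption, check the unRAID interface):
# find_csum_files /mnt/cache
```

Note that inode numbers on btrfs are only unique within a subvolume, so if the cache has subvolumes, more than one file may match the same number.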
November 19, 2014
"OK...I found the culprit. It was one of my qcow2 files that was corrupt. I deleted it and will rebuild the VM. Question is if this indicates that I have a drive issue or it was just that the file was buggered? I did copy that qcow2 file from my old cache drive which was a health-questionable drive. John"
My guess is that if the file was already 'buggered' before you copied it, you would not know, because btrfs computes its checksums when the data is written. A csum failure means the issue happened after the file was written.
November 19, 2014
Well, I deleted the suspect file and started to rebuild the VM using a newly created qcow2 image, and started seeing this a minute ago:
Nov 19 16:55:10 unRAID kernel: BTRFS error (device sdd1): csum failed ino 2087 off 10579087360 csum 3369183537 expected csum 1033459909
Nov 19 16:55:11 unRAID kernel: BTRFS error (device sdd1): csum failed ino 2087 off 10626752512 csum 3863425221 expected csum 1461438144
I just created this file 30 mins ago or so...
root@unRAID:/mnt/cache/VMs# find /mnt/cache -inum 2087 -ls
2087 15692100 -rw-r--r-- 1 root users 16069820416 Nov 19 16:57 /mnt/cache/VMs/TVPVR.qcow2
November 19, 2014
I don't have an answer for you on this one. It could be hardware, or it could be BTRFS itself, which is still not considered 100% mainstream. If it were me, I might try something simpler: go with one SSD first and scrub it, add the next and scrub the filesystem, and then add the third to see what happens. That is, unless someone knows how to identify where this new file lives on the 3-SSD array. I wonder if there's some bug with TRIM support? I suppose you could try XFS on the cache drive too.