Calling All Xen Users...Feedback Needed...


jonp


I can't answer your first question, but no, you don't need to do this for b15. It's very likely that you will need to do this for future releases, though.

 

Yeah, I might as well do it now if Xen is being dropped.

 

Were you passing any PCI devices through to your Ubuntu VM, or just using it as a traditional headless VM? Check your domain cfg file for the Xen VM and see whether it contains builder = "hvm". If not, then you have a paravirtualized Ubuntu VM, which will be far more difficult to convert. Here's a thread on the Ubuntu forums about this very topic: http://ubuntuforums.org/showthread.php?t=1668809
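A quick way to check that from the shell, as a minimal sketch: the helper name `xen_guest_type` is hypothetical, and it assumes the cfg uses the `builder = "hvm"` line mentioned above. As an assumption, it also accepts `type = "hvm"`, which is how xl-style configs usually spell it.

```shell
# Hypothetical helper: print "hvm" if a Xen domain cfg declares an HVM guest,
# otherwise "pv" (paravirtualized). Commented-out lines are not matched.
xen_guest_type() {
    cfg=$1
    # Accept builder = "hvm" / builder='hvm' and (assumed) xl-style type = "hvm".
    if grep -Eq '^[[:space:]]*(builder|type)[[:space:]]*=[[:space:]]*.?hvm' "$cfg"; then
        echo hvm
    else
        echo pv
    fi
}
```

Usage: `xen_guest_type /path/to/ubuntu.cfg` (placeholder path). A `pv` result means you are in the harder paravirtualized case described above.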

 

If you have HVM-based guests, converting should be a breeze.  Just fire up beta 15, create a new VM, point the primary vdisk to your Xen VM vdisk, and done.  Ubuntu already has all the virtio drivers and whatnot in their build, so it should just work.  That said, make a backup of the vdisk first...just to be safe.
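For the backup step, something like this sketch works for a raw image, assuming GNU coreutils; `backup_vdisk` and the example paths are hypothetical. `--sparse=always` keeps a truncate-created image from ballooning to its full size on the destination, and `cmp` confirms the copy matches byte for byte before you touch the original.

```shell
# Hypothetical helper: copy a raw vdisk and verify the copy.
backup_vdisk() {
    src=$1
    dst=$2
    # Preserve holes in sparse images instead of writing them out as zeros.
    cp --sparse=always "$src" "$dst" || return 1
    # cmp -s is silent and exits non-zero if any byte differs.
    if cmp -s "$src" "$dst"; then
        echo "backup OK: $dst"
    else
        echo "backup MISMATCH: $dst" >&2
        return 1
    fi
}
```

Usage: `backup_vdisk /mnt/cache/domains/ubuntu/ubuntu.img /mnt/disk1/ubuntu.img.bak` (placeholder paths).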

OK, looks like mine aren't HVM... That thread just says to rebuild them under HVM, and that solution I could have come up with myself :)


So I followed your guide and converted my two Windows 7 domains from Xen to KVM. Thanks for the guide; I wouldn't have attempted it without that ;)

 

One of them runs SageTV and passes through a quad HD tuner card, and everything seems OK so far...

 

The only thing I saw that I've never seen before in my logs:

 

May 1 08:42:29 Tower kernel: BTRFS info (device sdh1): csum failed ino 263 off 11020079104 csum 1150147365 expected csum 864986921

May 1 08:42:30 Tower kernel: BTRFS info (device sdh1): csum failed ino 263 off 10756849664 csum 2130822108 expected csum 1739488230

May 1 08:42:31 Tower kernel: BTRFS info (device sdh1): csum failed ino 263 off 10756849664 csum 2130822108 expected csum 1739488230

May 1 08:42:32 Tower kernel: BTRFS info (device sdh1): csum failed ino 263 off 10756849664 csum 2130822108 expected csum 1739488230

May 1 08:42:33 Tower kernel: BTRFS info (device sdh1): csum failed ino 263 off 10756849664 csum 2130822108 expected csum 1739488230

 

That's for the SSD cache drive that the domains reside on.

 

I ran a parity check which came back with zero errors.

 

Is this something to worry about? I'll keep an eye on it...

 

 


Click on your cache device on the Main tab. Click the "scrub" button, let it run, and report back the results (I would post this in General Support though, as this isn't a KVM or Xen thing; it's BTRFS).

 

Also, if you can log in to your server via telnet or SSH, type the following command:

 

lsattr /mnt/cache/path/to/vdisk.img

 

Replace the part after /mnt/cache with the actual path to your vdisk.  Do this for each vdisk you have and report back the command output.
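If you have several vdisks, a small filter can read the `lsattr` output and say which ones already carry the No_COW attribute (uppercase `C`; lowercase `c` is compression). This is a sketch: `report_nocow` is a hypothetical name, and it assumes the usual `<flags> <path>` layout that `lsattr` prints.

```shell
# Hypothetical filter: read lsattr-style lines on stdin and report COW/NOCOW.
report_nocow() {
    # First field is the flag string; the rest of the line is the path.
    while read -r flags path; do
        case $flags in
            *C*) echo "NOCOW $path" ;;
            *)   echo "COW   $path" ;;
        esac
    done
}
```

Usage: `lsattr /mnt/cache/domains/*/*.img | report_nocow` (the glob is a placeholder for wherever your vdisks live).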

 

Lastly, are these virtual disks raw or QCOW?  Can you remember how you created them?
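If you're not sure, `qemu-img info <file>` reports the format. Without QEMU tools, a rough check is the file header: QCOW images begin with the bytes `QFI` followed by 0xFB, while a raw image has no header at all. A sketch, with a hypothetical `vdisk_format` helper that assumes anything without the QCOW magic is raw:

```shell
# Hypothetical helper: guess "qcow" or "raw" from the first four bytes.
vdisk_format() {
    # od prints the bytes as hex; strip spaces/newlines to get e.g. 514649fb.
    magic=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
    if [ "$magic" = "514649fb" ]; then
        echo qcow
    else
        echo raw
    fi
}
```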

 


scrub status for fa87e6cd-43f5-444d-8544-e0e8adda79bb

scrub started at Fri May  1 13:49:03 2015 and finished after 506 seconds

total bytes scrubbed: 191.00GiB with 68 errors

error details: csum=68

corrected errors: 0, uncorrectable errors: 0, unverified errors: 0
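To compare scrub runs at a glance, the counters can be pulled out of the status text. A sketch with a hypothetical `scrub_summary` helper, assuming the line layout shown above. Keep in mind that scrub repairs data by reading a good copy from elsewhere; if, as appears to be the case here, the cache is a single SSD with a single data profile, there is no second copy, so checksum errors can be detected but not corrected.

```shell
# Hypothetical helper: summarize `btrfs scrub status` text read on stdin.
scrub_summary() {
    awk '
        # "error details: csum=68" -> csum count
        /error details:/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^csum=/) { split($i, a, "="); csum = a[2] }
        }
        # "corrected errors: 0, uncorrectable errors: 37, ..." -> both counts
        /corrected errors:/ {
            gsub(",", "")
            for (i = 1; i <= NF; i++) {
                if ($i == "corrected" && $(i+1) == "errors:") corrected = $(i+2)
                if ($i == "uncorrectable") uncorrectable = $(i+2)
            }
        }
        END { printf "csum=%d corrected=%d uncorrectable=%d\n", csum, corrected, uncorrectable }
    '
}
```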

 

 

root@Tower:/mnt/cache/domains# lsattr dlna/dlna.img

---------------- dlna/dlna.img

root@Tower:/mnt/cache/domains# lsattr

dlna/    newsage/

root@Tower:/mnt/cache/domains# lsattr newsage/newsage.img

---------------- newsage/newsage.img

root@Tower:/mnt/cache/domains#

 

 

I'm sure they are raw; originally they were created via truncate, e.g.:

 

truncate -s10G /mnt/disk1/VM/winxp.img
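Images created this way are sparse: the apparent size is 10G, but blocks are only allocated as the guest writes to them. A sketch that shows the difference, assuming GNU `stat`; `sparse_report` is a hypothetical name.

```shell
# Hypothetical helper: print "<apparent bytes> <allocated bytes> sparse|dense".
sparse_report() {
    # %s = apparent size, %b = allocated blocks, %B = bytes per block (GNU stat)
    set -- $(stat -c '%s %b %B' "$1")
    apparent=$1
    allocated=$(( $2 * $3 ))
    if [ "$allocated" -lt "$apparent" ]; then kind=sparse; else kind=dense; fi
    echo "$apparent $allocated $kind"
}
```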

 

Note: when they were created, my cache drive was an HDD, not my SSD, and I believe it was reiserfs, not BTRFS like now.

 

 

I also backed them up with cp before starting the switch to KVM, e.g.:

cp dlna.img dlna.img_xen

 

 

 

Here are the logs grepped for BTRFS:

 

May  1 13:50:48 Tower kernel: BTRFS: checksum error at logical 49871953920 on dev /dev/sdh1, sector 97406160, root 5, inode 263, offset 25505734656, length 4096, links 1 (path: domains/dlna/dlna.img)

May  1 13:50:48 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0

May  1 13:50:48 Tower kernel: BTRFS: checksum error at logical 49871511552 on dev /dev/sdh1, sector 97405296, root 5, inode 263, offset 25505292288, length 4096, links 1 (path: domains/dlna/dlna.img)

May  1 13:50:48 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0

May  1 13:50:55 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0

May  1 13:50:55 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 4, gen 0

May  1 13:50:56 Tower kernel: BTRFS: checksum error at logical 56038289408 on dev /dev/sdh1, sector 109449784, root 5, inode 263, offset 10173288448, length 4096, links 1 (path: domains/dlna/dlna.img)

May  1 13:50:56 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 5, gen 0

May  1 13:51:52 Tower kernel: BTRFS: checksum error at logical 79497785344 on dev /dev/sdh1, sector 155269112, root 5, inode 274, offset 30994079744, length 4096, links 1 (path: domains/newsage/newsage.img)

May  1 13:51:52 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 6, gen 0

May  1 13:51:52 Tower kernel: BTRFS: checksum error at logical 79497961472 on dev /dev/sdh1, sector 155269456, root 5, inode 274, offset 30994255872, length 4096, links 1 (path: domains/newsage/newsage.img)

May  1 13:51:52 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 7, gen 0

May  1 13:51:52 Tower kernel: BTRFS: checksum error at logical 79502553088 on dev /dev/sdh1, sector 155278424, root 5, inode 274, offset 30998847488, length 4096, links 1 (path: domains/newsage/newsage.img)

May  1 13:51:52 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 8, gen 0

May  1 13:51:52 Tower kernel: BTRFS: checksum error at logical 79498006528 on dev /dev/sdh1, sector 155269544, root 5, inode 274, offset 30994300928, length 4096, links 1 (path: domains/newsage/newsage.img)

May  1 13:51:52 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 9, gen 0

May  1 13:51:52 Tower kernel: BTRFS: checksum error at logical 79508230144 on dev /dev/sdh1, sector 155289512, root 5, inode 274, offset 31004348416, length 4096, links 1 (path: domains/newsage/newsage.img)

May  1 13:51:52 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 10, gen 0

May  1 13:51:52 Tower kernel: BTRFS: checksum error at logical 79508238336 on dev /dev/sdh1, sector 155289528, root 5, inode 274, offset 31004356608, length 4096, links 1 (path: domains/newsage/newsage.img)

May  1 13:51:52 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 11, gen 0

May  1 13:51:53 Tower kernel: BTRFS: checksum error at logical 79918530560 on dev /dev/sdh1, sector 156090880, root 5, inode 274, offset 31063384064, length 4096, links 1 (path: domains/newsage/newsage.img)

May  1 13:51:53 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 12, gen 0

May  1 13:51:53 Tower kernel: BTRFS: checksum error at logical 79918534656 on dev /dev/sdh1, sector 156090888, root 5, inode 274, offset 31063388160, length 4096, links 1 (path: domains/newsage/newsage.img)

May  1 13:51:53 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 13, gen 0

May  1 13:52:11 Tower kernel: BTRFS: checksum error at logical 87918108672 on dev /dev/sdh1, sector 171715056, root 5, inode 274, offset 31070003200, length 4096, links 1 (path: domains/newsage/newsage.img)

May  1 13:52:11 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 14, gen 0

May  1 13:52:11 Tower kernel: BTRFS: checksum error at logical 87918116864 on dev /dev/sdh1, sector 171715072, root 5, inode 274, offset 31070011392, length 4096, links 1 (path: domains/newsage/newsage.img)

May  1 13:52:11 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 15, gen 0

May  1 13:52:11 Tower kernel: BTRFS: checksum error at logical 87918120960 on dev /dev/sdh1, sector 171715080, root 5, inode 274, offset 31070015488, length 4096, links 1 (path: domains/newsage/newsage.img)

May  1 13:52:11 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 16, gen 0

May  1 13:52:11 Tower kernel: BTRFS: checksum error at logical 87954554880 on dev /dev/sdh1, sector 171786240, root 5, inode 274, offset 31070687232, length 4096, links 1 (path: domains/newsage/newsage.img)

May  1 13:52:11 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 17, gen 0

May  1 13:52:46 Tower kernel: BTRFS: checksum error at logical 102552776704 on dev /dev/sdh1, sector 200298392, root 5, inode 274, offset 31173648384, length 4096, links 1 (path: domains/newsage/newsage.img)

May  1 13:52:46 Tower kernel: BTRFS: checksum error at logical 102552883200 on dev /dev/sdh1, sector 200298600, root 5, inode 274, offset 31173754880, length 4096, links 1 (path: domains/newsage/newsage.img)

May  1 13:52:46 Tower kernel: BTRFS: checksum error at logical 102552428544 on dev /dev/sdh1, sector 200297712, root 5, inode 274, offset 31173300224, length 4096, links 1 (path: domains/newsage/newsage.img)

May  1 13:52:46 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 18, gen 0

May  1 13:52:46 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 19, gen 0

May  1 13:52:46 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 20, gen 0

May  1 13:52:46 Tower kernel: BTRFS: checksum error at logical 102552780800 on dev /dev/sdh1, sector 200298400, root 5, inode 274, offset 31173652480, length 4096, links 1 (path: domains/newsage/newsage.img)

May  1 13:52:46 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 21, gen 0

May  1 13:52:46 Tower kernel: BTRFS: checksum error at logical 102557917184 on dev /dev/sdh1, sector 200308432, root 5, inode 274, offset 31174463488, length 4096, links 1 (path: domains/newsage/newsage.img)

May  1 13:52:46 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 22, gen 0

May  1 13:52:46 Tower kernel: BTRFS: checksum error at logical 102558081024 on dev /dev/sdh1, sector 200308752, root 5, inode 274, offset 31174627328, length 4096, links 1 (path: domains/newsage/newsage.img)

May  1 13:52:46 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 23, gen 0

May  1 13:52:46 Tower kernel: BTRFS: checksum error at logical 102552793088 on dev /dev/sdh1, sector 200298424, root 5, inode 274, offset 31173664768, length 4096, links 1 (path: domains/newsage/newsage.img)

May  1 13:52:46 Tower kernel: BTRFS: checksum error at logical 102558511104 on dev /dev/sdh1, sector 200309592, root 5, inode 274, offset 31175057408, length 4096, links 1 (path: domains/newsage/newsage.img)

May  1 13:52:46 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 24, gen 0

May  1 13:52:46 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 25, gen 0

May  1 13:52:46 Tower kernel: BTRFS: checksum error at logical 102560301056 on dev /dev/sdh1, sector 200313088, root 5, inode 274, offset 31176847360, length 4096, links 1 (path: domains/newsage/newsage.img)

May  1 13:52:46 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 26, gen 0

May  1 13:52:46 Tower kernel: BTRFS: checksum error at logical 102560374784 on dev /dev/sdh1, sector 200313232, root 5, inode 274, offset 31176921088, length 4096, links 1 (path: domains/newsage/newsage.img)

May  1 13:52:46 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 27, gen 0

May  1 13:52:52 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 37, gen 0

May  1 13:52:52 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 38, gen 0

May  1 13:52:58 Tower kernel: BTRFS: checksum error at logical 107721375744 on dev /dev/sdh1, sector 210393312, root 5, inode 274, offset 31299059712, length 4096, links 1 (path: domains/newsage/newsage.img)

May  1 13:52:58 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 39, gen 0

May  1 13:53:15 Tower kernel: BTRFS: checksum error at logical 114866774016 on dev /dev/sdh1, sector 224349168, root 5, inode 274, offset 31323684864, length 4096, links 1 (path: domains/newsage/newsage.img)

May  1 13:53:15 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 40, gen 0

May  1 13:53:15 Tower kernel: BTRFS: checksum error at logical 114866778112 on dev /dev/sdh1, sector 224349176, root 5, inode 274, offset 31323688960, length 4096, links 1 (path: domains/newsage/newsage.img)

May  1 13:53:15 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 41, gen 0

May  1 13:53:15 Tower kernel: BTRFS: checksum error at logical 114866782208 on dev /dev/sdh1, sector 224349184, root 5, inode 274, offset 31323693056, length 4096, links 1 (path: domains/newsage/newsage.img)

May  1 13:53:15 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 42, gen 0

May  1 13:53:18 Tower kernel: BTRFS: checksum error at logical 115909296128 on dev /dev/sdh1, sector 226385344, root 5, inode 274, offset 31349510144, length 4096, links 1 (path: domains/newsage/newsage.img)

May  1 13:53:18 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 43, gen 0

May  1 13:53:18 Tower kernel: BTRFS: checksum error at logical 115909300224 on dev /dev/sdh1, sector 226385352, root 5, inode 274, offset 31349514240, length 4096, links 1 (path: domains/newsage/newsage.img)

May  1 13:53:18 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 44, gen 0

May  1 13:54:44 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 45, gen 0

May  1 13:55:28 Tower kernel: BTRFS: checksum error at logical 178478096384 on dev /dev/sdh1, sector 348590032, root 5, inode 263, offset 10473365504, length 4096, links 1 (path: domains/dlna/dlna.img)

May  1 13:55:28 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 46, gen 0

May  1 13:55:28 Tower kernel: BTRFS: checksum error at logical 178478104576 on dev /dev/sdh1, sector 348590048, root 5, inode 263, offset 10473373696, length 4096, links 1 (path: domains/dlna/dlna.img)

May  1 13:55:28 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 47, gen 0

May  1 13:55:29 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 48, gen 0

May  1 13:55:29 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 49, gen 0

May  1 13:55:29 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 50, gen 0

May  1 13:55:29 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 51, gen 0

May  1 13:55:29 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 52, gen 0

May  1 13:55:43 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 53, gen 0

May  1 13:55:55 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 54, gen 0

May  1 13:55:55 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 55, gen 0

May  1 13:55:55 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 56, gen 0

May  1 13:55:58 Tower kernel: BTRFS: checksum error at logical 192787910656 on dev /dev/sdh1, sector 376538888, root 5, inode 263, offset 28981272576, length 4096, links 1 (path: domains/dlna/dlna.img)

May  1 13:55:58 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 57, gen 0

May  1 13:55:58 Tower kernel: BTRFS: checksum error at logical 192856936448 on dev /dev/sdh1, sector 376673704, root 5, inode 263, offset 28981587968, length 4096, links 1 (path: domains/dlna/dlna.img)

May  1 13:55:58 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 58, gen 0

May  1 13:56:00 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 59, gen 0

May  1 13:56:01 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 60, gen 0

May  1 13:56:01 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 61, gen 0

May  1 13:56:05 Tower kernel: BTRFS: checksum error at logical 196031877120 on dev /dev/sdh1, sector 382874760, root 5, inode 263, offset 31789404160, length 4096, links 1 (path: domains/dlna/dlna.img)

May  1 13:56:05 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 62, gen 0

May  1 13:56:05 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 63, gen 0

May  1 13:56:12 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 64, gen 0

May  1 13:56:12 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 65, gen 0

May  1 13:56:13 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 66, gen 0

May  1 13:57:04 Tower kernel: BTRFS: checksum error at logical 226817155072 on dev /dev/sdh1, sector 443002256, root 5, inode 263, offset 10427338752, length 4096, links 1 (path: domains/dlna/dlna.img)

May  1 13:57:04 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 67, gen 0

May  1 13:57:04 Tower kernel: BTRFS: checksum error at logical 226886148096 on dev /dev/sdh1, sector 443137008, root 5, inode 263, offset 10095783936, length 4096, links 1 (path: domains/dlna/dlna.img)

May  1 13:57:04 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 68, gen 0
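With this many lines, it helps to tally the checksum errors per file. A sketch with a hypothetical `csum_errors_by_path` helper, assuming the `checksum error ... (path: ...)` message format shown above; in this log every path is either domains/dlna/dlna.img or domains/newsage/newsage.img.

```shell
# Hypothetical helper: count BTRFS checksum-error messages per file path.
csum_errors_by_path() {
    # Keep only the path inside "(path: ...)", then count and sort descending.
    sed -n 's/.*BTRFS: checksum error at .*(path: \(.*\))$/\1/p' | sort | uniq -c | sort -rn
}
```

Usage: `grep BTRFS /var/log/syslog | csum_errors_by_path`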

 

 


OK, please copy your new KVM vdisks off your cache to a non-BTRFS device (this is temporary). Next, run the following command on the directory storing your VMs:

 

chattr +C -R /mnt/cache/domains

 

Lastly, copy your vdisks back to your cache under the domains directory (or any subdirectory thereof) and delete your original vdisks (the ones from before the copy back and forth).

 

This disables copy-on-write (COW) for your vdisks, which is recommended for vdisks on BTRFS.
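The same round trip as a sketch, assuming GNU cp and a hypothetical `copy_into_nocow` helper: set +C on the empty target directory first, then copy (not move) each vdisk in so the new file inherits the attribute. The chattr(1) man page notes that on btrfs the C flag should be set on new or empty files, which is why a `mv` within the cache itself, which only renames the existing file, would not help.

```shell
# Hypothetical helper: copy_into_nocow <target_dir> <vdisk>...
copy_into_nocow() {
    dst=$1; shift
    mkdir -p "$dst" || return 1
    # On a non-BTRFS directory this may fail; the copy still proceeds.
    chattr +C "$dst" 2>/dev/null || echo "warning: could not set +C on $dst" >&2
    for src in "$@"; do
        # New files created in $dst inherit +C on BTRFS.
        cp --sparse=always "$src" "$dst/" || return 1
        cmp -s "$src" "$dst/$(basename "$src")" || { echo "mismatch: $src" >&2; return 1; }
    done
    echo "copied $# file(s) into $dst"
}
```

Afterwards, `lsattr` on the copied files should show the `C` flag.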


Looks like it's broken; I tried to copy it off:

 

root@Tower:/mnt/cache/domains# cp ./dlna/dlna.img /mnt/disk1/

cp: error reading ‘./dlna/dlna.img’: Input/output error

cp: failed to extend ‘/mnt/disk1/dlna.img’: Input/output error

root@Tower:/mnt/cache/domains#

 

From the logs at the same time:

 

May 1 23:36:55 Tower kernel: BTRFS info (device sdh1): csum failed ino 263 off 10095783936 csum 3350473770 expected csum 3508421241

May 1 23:36:55 Tower kernel: BTRFS info (device sdh1): csum failed ino 263 off 10095783936 csum 3350473770 expected csum 3508421241

May 1 23:36:55 Tower kernel: BTRFS info (device sdh1): csum failed ino 263 off 10095783936 csum 3350473770 expected csum 3508421241

 

 

 

 

Should I try a scrub again without the '-r' (read-only) option, i.e. let it try to correct the errors?


It failed  :(

 

scrub status for fa87e6cd-43f5-444d-8544-e0e8adda79bb

scrub started at Sat May  2 00:04:04 2015 and finished after 312 seconds

total bytes scrubbed: 117.59GiB with 37 errors

error details: csum=37

corrected errors: 0, uncorrectable errors: 37, unverified errors: 0

 

Logs full of:

 

May 2 00:08:50 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 100, gen 0

May 2 00:08:50 Tower kernel: BTRFS: unable to fixup (regular) error at logical 209647181824 on dev /dev/sdh1

May 2 00:08:50 Tower kernel: BTRFS: checksum error at logical 209647222784 on dev /dev/sdh1, sector 409467232, root 5, inode 274, offset 31501836288, length 4096, links 1 (path: domains/newsage/newsage.img)

May 2 00:08:50 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 101, gen 0

May 2 00:08:50 Tower kernel: BTRFS: unable to fixup (regular) error at logical 209647222784 on dev /dev/sdh1

May 2 00:08:50 Tower kernel: BTRFS: checksum error at logical 209716936704 on dev /dev/sdh1, sector 409603392, root 5, inode 274, offset 31757787136, length 4096, links 1 (path: domains/newsage/newsage.img)

May 2 00:08:50 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 102, gen 0

May 2 00:08:50 Tower kernel: BTRFS: unable to fixup (regular) error at logical 209716936704 on dev /dev/sdh1

May 2 00:09:04 Tower kernel: BTRFS: checksum error at logical 226817155072 on dev /dev/sdh1, sector 443002256, root 5, inode 263, offset 10427338752, length 4096, links 1 (path: domains/dlna/dlna.img)

May 2 00:09:04 Tower kernel: BTRFS: bdev /dev/sdh1 errs: wr 0, rd 0, flush 0, corrupt 103, ge

 

 

etc.

 

 

 

I can only assume KVM does not like COW files and Xen doesn't mind, as I've been running fine on this BTRFS cache disk with Xen for quite some time?

 

 

Should I go back to Xen, follow your guide for removing the GPLPV drivers, copy the .img files to disk1 (reiserfs), boot into KVM, then run chattr +C -R /mnt/cache/domains and copy them back?

 

 

 

 

Edit: or can I easily go back to reiserfs for the cache disk? Would that help?

 

 


One more thing you can try. With the array stopped, note the device letter for your cache drive on the Main page (sda, sdb, sdc, etc.). Drop to the command line again and try this:

 

btrfs check --repair /dev/sdX1

Replace X with your drive letter, and point it at the cache partition (e.g. /dev/sdh1), not the whole disk. Let me know if that works for you.


root@Tower:~# btrfs check --repair /dev/sdh

enabling repair mode

No valid Btrfs found on /dev/sdh

Couldn't open file system

 

root@Tower:~# btrfs check --repair /dev/sdh1

enabling repair mode

Checking filesystem on /dev/sdh1

UUID: fa87e6cd-43f5-444d-8544-e0e8adda79bb

checking extents

Fixed 0 roots.

checking free space cache

cache and super generation don't match, space cache will be invalidated

checking fs roots

checking csums

checking root refs

found 126201766112 bytes used err is 0

total csum bytes: 122979400

total tree bytes: 270860288

total fs tree bytes: 52756480

total extent tree bytes: 54525952

btree space waste bytes: 54221809

file data blocks allocated: 4001811857408

referenced 96003809280

btrfs-progs v3.19.1

root@Tower:~#
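As the two runs above show, `btrfs check` has to be pointed at the partition (/dev/sdh1), not the whole disk (/dev/sdh); `lsblk -f` lists which partition carries the btrfs filesystem. A tiny sketch of the kernel's partition-naming rule, using a hypothetical `first_partition` helper that assumes the first partition is the one you want:

```shell
# Hypothetical helper: whole-disk name -> first partition name.
first_partition() {
    case $1 in
        # Disk names ending in a digit use a "p" separator (nvme0n1 -> nvme0n1p1).
        *[0-9]) echo "${1}p1" ;;
        *)      echo "${1}1" ;;
    esac
}
```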

 

Did this have any impact on things?  No change in your IO errors?

 

Can you check the SMART data for your cache device?
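For reference, `smartctl -H /dev/sdX` prints an overall verdict and `smartctl -a /dev/sdX` dumps attributes and self-test logs. A sketch that extracts the verdict, using a hypothetical `smart_verdict` helper; it assumes the ATA-style wording and prints `unknown` for anything else. A PASSED verdict only means no SMART threshold has tripped.

```shell
# Hypothetical helper: read `smartctl -H` output on stdin, print the verdict.
smart_verdict() {
    # grep . succeeds only if sed extracted a non-empty verdict.
    sed -n 's/^SMART overall-health self-assessment test result: *//p' | head -n 1 | grep . || echo unknown
}
```

Usage: `smartctl -H /dev/sdh | smart_verdict`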


Off to bed here; I've restarted my VMs and will see what syslog produces overnight.

 

The command finished almost instantly, so I doubt it did anything... I don't know anything about this filesystem, but maybe it has a clean-unmount flag (like UFS, perhaps), in which case it just exits unless you force it to run regardless?

 

The SMART health and disk test logs all report fine for the cache device.


Because I'm about to be away from the forums for a bit: the plan in your previous post, redoing the process and converting the VMs again, would be fine. I wouldn't reformat your cache as reiserfs just yet, and if you were going to change it, I'd go with XFS.


OK, I just need to know how to proceed.

 

Are the .img files I have now likely corrupt?

If so, I want to start from something else: either rebuild my VMs from scratch with a fresh install (I'd rather not, as it's a lot of work) OR switch back to Xen and my previously backed-up .img files?

If I switch back to Xen, can I somehow validate my .img files to give me confidence? E.g., would a btrfs scrub be sufficient?

And then if I switch to a different filesystem for my cache drive and go back to KVM, would that be OK? Is it likely BTRFS is the root cause here because my .img files are sparse?

 

Thanks!

When you were running Xen, were your virtual disks on a BTRFS-formatted device?


Yes, ever since I got an SSD.

 

Previously they were on reiserfs when I had an HDD. I then bought an SSD and put it under unRAID control, at which point it was formatted as BTRFS (it must be the default, as I didn't specifically choose it).

Hmm, OK. Is it possible you were getting these errors while in Xen mode but just weren't aware because you weren't looking at the log? It really shouldn't make a difference between boot modes as far as BTRFS errors are concerned.

 

Side note: are your VMs experiencing any odd behavior?


No, I looked at the logs most days and never saw those messages before I switched to KVM.

 

VMs seem to be OK so far; I guess I could run an NTFS filesystem check.

This is just really odd. Maybe try the conversion process again, but with the vdisks starting on a non-BTRFS disk: do the conversion, then put them into a NOCOW folder on BTRFS and try again?


OK, I will:

1. Switch back to Xen

2. Out of interest, run another btrfs scrub to confirm it still reports the errors I see in KVM mode

3. Since I copied the VMs to an RFS (reiserfs) disk as a backup before switching to KVM, try setting NOCOW, copying them back to the BTRFS cache, and running another scrub

If it passes, I'll then try switching back to KVM.

If it fails, not sure...

Either way, definitely report back. I need to know whether this is consistently repeatable for you.


OK,

 

After steps 1 and 2:

 

scrub status for fa87e6cd-43f5-444d-8544-e0e8adda79bb

scrub started at Sun May  3 00:49:31 2015 and finished after 287 seconds

total bytes scrubbed: 107.82GiB with 31 errors

error details: csum=31

corrected errors: 0, uncorrectable errors: 0, unverified errors: 0

 

After step 3:

 

root@Tower:/mnt/cache# mkdir nocow

root@Tower:/mnt/cache# chattr +C -R nocow

root@Tower:/mnt/cache# cd nocow

root@Tower:/mnt/cache/nocow# cp /mnt/user/VM\ Backups/xen_backups_prior_to_kvm/* .

cp: skipping file ‘/mnt/user/VM Backups/xen_backups_prior_to_kvm/newsage.img_xen’, as it was replaced while being copied

root@Tower:/mnt/cache/nocow# ls -lh

total 40G

-rw-rw-rw- 1 root root 40G May  3 01:24 dlna.img_xen

 

 

Not sure why that fails; it's a backup and hasn't been replaced or changed.....

 

scrub status for fa87e6cd-43f5-444d-8544-e0e8adda79bb

scrub started at Sun May  3 01:29:39 2015 and finished after 391 seconds

total bytes scrubbed: 147.82GiB with 31 errors

error details: csum=31

corrected errors: 0, uncorrectable errors: 0, unverified errors: 0

 

Checking syslog, BTRFS doesn't report anything for the nocow directory, and the scrub shows the same number of errors, so maybe my dlna.img_xen is OK....

 

root@Tower:/mnt/cache/nocow# cd /var/log/

root@Tower:/var/log# grep -i nocow syslog

root@Tower:/var/log#

 

 

So I just have to think about the Sage VM now.

OK, maybe we should try wipefs on the cache device to clear it completely, then initialize it with BTRFS again. That's probably an extreme measure, but it would help show whether the csum errors come back or not.

 

Alternatively, you could try copying your Sage VM from the backup directory it's in now to another directory on the array (non-BTRFS) and trying the conversion process from there. Obviously performance for VMs on the array isn't going to be stellar, but it should be possible to at least work through the conversion process on a non-BTRFS device; then we could try a copy again.
