December 14, 2016

Supermicro X10SRA-F MB
4 x 8TB Seagate Archive drives
2 x 1TB Samsung 850 EVO as cache pool
1 x 2TB USB HDD as an unassigned device
64 GB RAM
Dockers and VMs running from the cache pool.

Last night all my VMs suspended. I checked the syslog and noticed some "ata6.00: failed command: WRITE FPDMA QUEUED" errors last night for cache drive 2. These have been happening since the last boot 30 days ago. Cache 1 seemed fine. I replaced both SATA cables to the cache drives and rebooted. Boot seemed OK. I did a BTRFS scrub, with the following results:

scrub status for ef0cbf24-f3c9-4e0d-90a2-6533b7751f4a
    scrub started at Wed Dec 14 03:46:08 2016 and finished after 00:53:27
    total bytes scrubbed: 1013.86GiB with 626 errors
    error details: csum=626
    corrected errors: 0, uncorrectable errors: 0, unverified errors: 0

This morning all my VMs (except 1) are suspended again.

What should I do? Should I run scrub with "correct"? Should I take Cache 2 out and replace it? I've attached diagnostics.zip.

Here is an excerpt from the logs. After this there were several entries similar to the last one.

Dec 14 03:46:08 Tower php: /usr/local/emhttp/plugins/dynamix/scripts/btrfs_scrub 'start' '/mnt/cache' '-r' Dec 14 04:01:03 Tower emhttp: cmd: /usr/local/emhttp/plugins/dynamix/scripts/tail_log syslog Dec 14 04:13:42 Tower emhttp: cmd: /usr/local/emhttp/plugins/dynamix/scripts/tail_log syslog Dec 14 04:29:47 Tower kernel: BTRFS warning (device sdd1): checksum error at logical 837556240384 on dev /dev/sdc1, sector 1621171968, root 5, inode 3200621, offset 2008371200, length 4096, links 1 (path: BlueIrisNew/New/DoorCam.20160908_120000.bvr) Dec 14 04:29:47 Tower kernel: BTRFS error (device sdd1): bdev /dev/sdc1 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0 Dec 14 04:29:47 Tower kernel: BTRFS warning (device sdd1): checksum error at logical 837586173952 on dev /dev/sdc1, sector 1621230432, root 5, inode 3200621, offset 2015199232, length 4096, links 1 (path: BlueIrisNew/New/DoorCam.20160908_120000.bvr) Dec 14 04:29:47 Tower kernel: BTRFS error (device sdd1): bdev /dev/sdc1 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0 Dec 14 04:29:47 Tower kernel: BTRFS warning (device sdd1): checksum error at logical 837587066880 on dev /dev/sdc1, sector 1621232176, root 5, inode 3200621, offset 2016092160, length 4096, links 1 (path: BlueIrisNew/New/DoorCam.20160908_120000.bvr) Dec 14 04:29:47 Tower kernel: BTRFS error (device sdd1): bdev /dev/sdc1 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0 Dec 14 04:29:47 Tower kernel: BTRFS warning (device sdd1): checksum error at logical 837585190912 on dev /dev/sdc1, sector 1621228512, root 5, inode 3200621, offset 2014216192, length 4096, links 1 (path: BlueIrisNew/New/DoorCam.20160908_120000.bvr) Dec 14 04:29:47 Tower kernel: BTRFS error (device sdd1): bdev /dev/sdc1 errs: wr 0, rd 0, flush 0, corrupt 4, gen 0 Dec 14 04:29:47 Tower kernel: BTRFS warning (device sdd1): checksum error at logical 837587156992 on dev /dev/sdc1, sector 1621232352, root 5, inode 3200621, offset 2016182272, length 4096, links 1 (path: 
BlueIrisNew/New/DoorCam.20160908_120000.bvr) Dec 14 04:29:47 Tower kernel: BTRFS error (device sdd1): bdev /dev/sdc1 errs: wr 0, rd 0, flush 0, corrupt 5, gen 0 Dec 14 04:29:47 Tower kernel: BTRFS warning (device sdd1): checksum error at logical 837588115456 on dev /dev/sdc1, sector 1621234224, root 5, inode 3200621, offset 2017140736, length 4096, links 1 (path: BlueIrisNew/New/DoorCam.20160908_120000.bvr) Dec 14 04:29:47 Tower kernel: BTRFS error (device sdd1): bdev /dev/sdc1 errs: wr 0, rd 0, flush 0, corrupt 6, gen 0 Dec 14 04:29:47 Tower kernel: BTRFS warning (device sdd1): checksum error at logical 837588049920 on dev /dev/sdc1, sector 1621234096, root 5, inode 3200621, offset 2017075200, length 4096, links 1 (path: BlueIrisNew/New/DoorCam.20160908_120000.bvr) Dec 14 04:29:47 Tower kernel: BTRFS error (device sdd1): bdev /dev/sdc1 errs: wr 0, rd 0, flush 0, corrupt 7, gen 0 Dec 14 04:29:47 Tower kernel: BTRFS warning (device sdd1): checksum error at logical 837588402176 on dev /dev/sdc1, sector 1621234784, root 5, inode 3200621, offset 2017427456, length 4096, links 1 (path: BlueIrisNew/New/DoorCam.20160908_120000.bvr) Dec 14 04:29:47 Tower kernel: BTRFS error (device sdd1): bdev /dev/sdc1 errs: wr 0, rd 0, flush 0, corrupt 8, gen 0 Dec 14 04:29:47 Tower kernel: BTRFS warning (device sdd1): checksum error at logical 837588951040 on dev /dev/sdc1, sector 1621235856, root 5, inode 3200621, offset 2017976320, length 4096, links 1 (path: BlueIrisNew/New/DoorCam.20160908_120000.bvr) Dec 14 04:29:47 Tower kernel: BTRFS error (device sdd1): bdev /dev/sdc1 errs: wr 0, rd 0, flush 0, corrupt 9, gen 0 Dec 14 04:29:48 Tower kernel: BTRFS error (device sdd1): bdev /dev/sdc1 errs: wr 0, rd 0, flush 0, corrupt 10, gen 0 Dec 14 04:29:48 Tower kernel: BTRFS warning (device sdd1): checksum error at logical 837589934080 on dev /dev/sdc1, sector 1621237776, root 5, inode 3200621, offset 2018959360, length 4096, links 1 (path: BlueIrisNew/New/DoorCam.20160908_120000.bvr) Dec 14 
06:07:14 Tower kernel: ata9.00: exception Emask 0x10 SAct 0x0 SErr 0x280100 action 0x6 frozen Dec 14 06:07:14 Tower kernel: ata9.00: irq_stat 0x08000000, interface fatal error Dec 14 06:07:14 Tower kernel: ata9: SError: { UnrecovData 10B8B BadCRC } Dec 14 06:07:14 Tower kernel: ata9.00: failed command: READ DMA EXT Dec 14 06:07:14 Tower kernel: ata9.00: cmd 25/00:40:c0:6e:dc/00:05:a9:00:00/e0 tag 10 dma 688128 in Dec 14 06:07:14 Tower kernel: res 50/00:00:bf:6e:dc/00:00:a9:00:00/e0 Emask 0x10 (ATA bus error) Dec 14 06:07:14 Tower kernel: ata9.00: status: { DRDY } Dec 14 06:07:14 Tower kernel: ata9: hard resetting link Dec 14 06:07:15 Tower kernel: ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 300) Dec 14 06:07:15 Tower kernel: ata9.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded Dec 14 06:07:15 Tower kernel: ata9.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out Dec 14 06:07:15 Tower kernel: ata9.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out Dec 14 06:07:15 Tower kernel: ata9.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded Dec 14 06:07:15 Tower kernel: ata9.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out Dec 14 06:07:15 Tower kernel: ata9.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out Dec 14 06:07:15 Tower kernel: ata9.00: configured for UDMA/133 Dec 14 06:07:15 Tower kernel: ata9: EH complete Dec 14 06:41:55 Tower shfs/user: err: shfs_write: write: (28) No space left on device Dec 14 07:20:06 Tower shfs/user: err: shfs_rename: rename: /mnt/cache/Al/Documents/.sync/drive_test_1481700006_117712644 /mnt/cache/Al/Documents/.sync/.fuse_hidden00132d9b00000169 (28) No space left on device Dec 14 07:20:38 Tower shfs/user: err: shfs_rename: rename: /mnt/cache/Al/Documents/.sync/drive_test_1481700038_3151085547 /mnt/cache/Al/Documents/.sync/.fuse_hidden00132f8200000172 (28) No space left on device Dec 14 07:20:41 Tower shfs/user: err: 
shfs_rename: rename: /mnt/cache/Al/Documents/.sync/drive_test_1481700041_2478112417 /mnt/cache/Al/Documents/.sync/.fuse_hidden00132f9e00000173 (28) No space left on device Dec 14 08:04:17 Tower shfs/user: err: shfs_rename: rename: /mnt/cache/Al/Documents/.sync/drive_test_1481702657_1918056504 /mnt/cache/Al/Documents/.sync/.fuse_hidden00134542000001d8 (28) No space left on device Dec 14 08:04:20 Tower shfs/user: err: shfs_rename: rename: /mnt/cache/Al/Documents/.sync/drive_test_1481702660_1848523143 /mnt/cache/Al/Documents/.sync/.fuse_hidden0013454c000001d9 (28) No space left on device Dec 14 08:04:32 Tower shfs/user: err: shfs_rename: rename: /mnt/cache/Al/Documents/.sync/drive_test_1481702672_921145909 /mnt/cache/Al/Documents/.sync/.fuse_hidden001345f8000001dd (28) No space left on device Dec 14 08:04:43 Tower shfs/user: err: shfs_rename: rename: /mnt/cache/Al/Documents/.sync/drive_test_1481702682_622597922 /mnt/cache/Al/Documents/.sync/.fuse_hidden001346ac000001e3 (28) No space left on device Dec 14 08:05:11 Tower shfs/user: err: shfs_rename: rename: /mnt/cache/Al/Documents/.sync/drive_test_1481702711_2301762711 /mnt/cache/Al/Documents/.sync/.fuse_hidden00134872000001e8 (28) No space left on device Dec 14 08:05:17 Tower shfs/user: err: shfs_rename: rename: /mnt/cache/Downloads/.sync/drive_test_1481702716_4134684967 /mnt/cache/Downloads/.sync/.fuse_hidden001348c6000001ea (28) No space left on device Dec 14 08:05:29 Tower shfs/user: err: shfs_rename: rename: /mnt/cache/Al/Documents/.sync/drive_test_1481702729_3624365405 /mnt/cache/Al/Documents/.sync/.fuse_hidden001349be000001ed (28) No space left on device Dec 14 08:05:53 Tower shfs/user: err: shfs_rename: rename: /mnt/cache/Al/Documents/.sync/drive_test_1481702753_1548168528 /mnt/cache/Al/Documents/.sync/.fuse_hidden00134af3000001f4 (28) No space left on device Dec 14 08:06:23 Tower shfs/user: err: shfs_rename: rename: /mnt/cache/Al/Documents/.sync/drive_test_1481702783_2737779621 
/mnt/cache/Al/Documents/.sync/.fuse_hidden00134cd6000001fe (28) No space left on device Dec 14 08:06:46 Tower shfs/user: err: shfs_rename: rename: /mnt/cache/Al/Documents/.sync/drive_test_1481702805_3122568511 /mnt/cache/Al/Documents/.sync/.fuse_hidden00134e3000000204 (28) No space left on device Dec 14 08:06:55 Tower shfs/user: err: shfs_rename: rename: /mnt/cache/Al/Documents/.sync/drive_test_1481702815_283113713 /mnt/cache/Al/Documents/.sync/.fuse_hidden00134ea200000206 (28) No space left on device Dec 14 08:13:12 Tower kernel: loop: Write error at byte offset 4160462848, length 4096. Dec 14 08:13:12 Tower kernel: blk_update_request: I/O error, dev loop0, sector 8125904 Dec 14 08:13:12 Tower kernel: btrfs_dev_stat_print_on_error: 616 callbacks suppressed Dec 14 08:13:12 Tower kernel: BTRFS error (device loop0): bdev /dev/loop0 errs: wr 3, rd 0, flush 0, corrupt 0, gen 0 Dec 14 08:15:22 Tower shfs/user: err: shfs_rename: rename: /mnt/cache/Downloads/.sync/drive_test_1481703322_2860276855 /mnt/cache/Downloads/.sync/.fuse_hidden001a0b0500000207 (28) No space left on device Dec 14 08:16:02 Tower kernel: loop: Write error at byte offset 4649369600, length 4096. Dec 14 08:16:02 Tower kernel: blk_update_request: I/O error, dev loop0, sector 9080792 Dec 14 08:16:02 Tower kernel: BTRFS error (device loop0): bdev /dev/loop0 errs: wr 4, rd 0, flush 0, corrupt 0, gen 0 Dec 14 08:16:03 Tower shfs/user: err: shfs_rename: rename: /mnt/cache/Downloads/.sync/drive_test_1481703363_608247545 /mnt/cache/Downloads/.sync/.fuse_hidden001a0bb700000208 (28) No space left on device tower-diagnostics-20161214-1003.zip
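A rough way to see which files and which device the checksum warnings in the excerpt above point at is to tally them. This is only a sketch: the regular expression is written against the exact line format shown in this post, the function name is made up for illustration, and other kernel versions may word the message differently.

```python
import re
from collections import Counter

# Pattern written against the "checksum error at logical ..." lines quoted
# in this thread; it is an assumption that other kernels use the same wording.
CSUM_RE = re.compile(
    r"BTRFS warning \(device (?P<fs>[^)]+)\): checksum error at logical (?P<logical>\d+) "
    r"on dev (?P<dev>[^\s,]+), sector (?P<sector>\d+),.*?\(path: (?P<path>[^)]+)\)"
)


def summarize_checksum_errors(lines):
    """Count checksum-error warnings per (device, file path)."""
    counts = Counter()
    for line in lines:
        match = CSUM_RE.search(line)
        if match:
            counts[(match.group("dev"), match.group("path"))] += 1
    return counts


# Two lines copied from the excerpt: one checksum warning, one stats line.
sample = [
    "Dec 14 04:29:47 Tower kernel: BTRFS warning (device sdd1): checksum error at "
    "logical 837556240384 on dev /dev/sdc1, sector 1621171968, root 5, inode 3200621, "
    "offset 2008371200, length 4096, links 1 (path: BlueIrisNew/New/DoorCam.20160908_120000.bvr)",
    "Dec 14 04:29:47 Tower kernel: BTRFS error (device sdd1): bdev /dev/sdc1 errs: "
    "wr 0, rd 0, flush 0, corrupt 1, gen 0",
]

if __name__ == "__main__":
    for (dev, path), n in summarize_checksum_errors(sample).items():
        print(dev, path, n)
```

Fed the full syslog (for example via `open("syslog").readlines()`), this should show whether the errors are confined to one device and a handful of files or spread across the pool.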
December 14, 2016
Try a correcting scrub first, then run another one to see if all errors were fixed.
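To compare the two scrub runs, it can help to pull the counters out of the status text rather than eyeball them. The sketch below parses the summary format quoted in the first post; the function names are hypothetical, and the "needs attention" rule (any uncorrectable error, or more errors found than corrected) is an assumption about what counts as a clean result, not an official criterion.

```python
import re


def parse_scrub_status(text):
    """Extract the error counters from scrub status text like the one quoted in this thread."""
    fields = {}
    total = re.search(r"with (\d+) errors", text)
    if total:
        fields["total errors"] = int(total.group(1))
    for name in ("corrected errors", "uncorrectable errors", "unverified errors"):
        match = re.search(rf"\b{name}:\s*(\d+)", text)
        if match:
            fields[name] = int(match.group(1))
    return fields


def needs_attention(fields):
    """Assumed rule: flag the run if anything was uncorrectable or left uncorrected."""
    uncorrectable = fields.get("uncorrectable errors", 0)
    total = fields.get("total errors", 0)
    corrected = fields.get("corrected errors", 0)
    return uncorrectable > 0 or total > corrected


# Summary copied from the read-only scrub in the first post.
first_scrub = (
    "total bytes scrubbed: 1013.86GiB with 626 errors error details: csum=626 "
    "corrected errors: 0, uncorrectable errors: 0, unverified errors: 0"
)

if __name__ == "__main__":
    counters = parse_scrub_status(first_scrub)
    print(counters, needs_attention(counters))
```

A read-only scrub never corrects anything, so the first run is expected to be flagged; the interesting comparison is the follow-up run after the correcting scrub.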
December 14, 2016 Author
Tried that. Lots of uncorrectable errors. Not sure what to do next. Is the Cache2 SSD failing? Should I take it out?

scrub status for ef0cbf24-f3c9-4e0d-90a2-6533b7751f4a
    scrub started at Wed Dec 14 11:18:26 2016 and finished after 00:29:20
    total bytes scrubbed: 1021.69GiB with 626 errors
    error details: csum=626
    corrected errors: 0, uncorrectable errors: 626, unverified errors: 0

Dec 14 11:18:26 Tower php: /usr/local/emhttp/plugins/dynamix/scripts/btrfs_scrub 'start' '/mnt/cache' ''
Dec 14 11:18:46 Tower shfs/user: err: shfs_rename: rename: /mnt/cache/Al/Documents/.sync/drive_test_1481714326_94606292 /mnt/cache/Al/Documents/.sync/.fuse_hidden001a5b1500000343 (28) No space left on device
Dec 14 11:19:08 Tower kernel: loop: Write error at byte offset 5259964416, length 4096.
Dec 14 11:19:08 Tower kernel: blk_update_request: I/O error, dev loop0, sector 10273136
Dec 14 11:19:57 Tower kernel: loop: Write error at byte offset 4649746432, length 4096.
Dec 14 11:19:57 Tower kernel: blk_update_request: I/O error, dev loop0, sector 9081536
Dec 14 11:19:57 Tower kernel: BTRFS error (device loop0): bdev /dev/loop0 errs: wr 20, rd 0, flush 0, corrupt 0, gen 0
Dec 14 11:20:23 Tower kernel: loop: Write error at byte offset 5258887168, length 4096.
Dec 14 11:20:23 Tower kernel: blk_update_request: I/O error, dev loop0, sector 10271080
Dec 14 11:20:23 Tower kernel: loop: Write error at byte offset 5258923520, length 512.
Dec 14 11:20:23 Tower kernel: blk_update_request: I/O error, dev loop0, sector 10271335
Dec 14 11:20:24 Tower kernel: loop: Write error at byte offset 5259054080, length 1024.
Dec 14 11:20:24 Tower kernel: blk_update_request: I/O error, dev loop0, sector 10271590
Dec 14 11:20:24 Tower kernel: BTRFS error (device loop0): bdev /dev/loop0 errs: wr 21, rd 0, flush 0, corrupt 0, gen 0
Dec 14 11:20:44 Tower kernel: loop: Write error at byte offset 5858996224, length 4096.
Dec 14 11:20:44 Tower kernel: blk_update_request: I/O error, dev loop0, sector 11443279 Dec 14 11:20:44 Tower kernel: BTRFS error (device loop0): bdev /dev/loop0 errs: wr 22, rd 0, flush 0, corrupt 0, gen 0 Dec 14 11:21:00 Tower kernel: loop: Write error at byte offset 926781440, length 4096. Dec 14 11:21:00 Tower kernel: blk_update_request: I/O error, dev loop0, sector 1810080 Dec 14 11:21:00 Tower kernel: BTRFS error (device loop0): bdev /dev/loop0 errs: wr 23, rd 0, flush 0, corrupt 0, gen 0 Dec 14 11:21:07 Tower kernel: loop: Write error at byte offset 926879744, length 4096. Dec 14 11:21:07 Tower kernel: blk_update_request: I/O error, dev loop0, sector 1810304 Dec 14 11:21:07 Tower kernel: BTRFS error (device loop0): bdev /dev/loop0 errs: wr 24, rd 0, flush 0, corrupt 0, gen 0 Dec 14 11:21:17 Tower kernel: loop: Write error at byte offset 2000596992, length 4096. Dec 14 11:21:17 Tower kernel: blk_update_request: I/O error, dev loop0, sector 3907328 Dec 14 11:21:17 Tower kernel: BTRFS error (device loop0): bdev /dev/loop0 errs: wr 25, rd 0, flush 0, corrupt 0, gen 0 Dec 14 11:21:29 Tower kernel: loop: Write error at byte offset 927072256, length 4096. Dec 14 11:21:29 Tower kernel: blk_update_request: I/O error, dev loop0, sector 1810688 Dec 14 11:21:29 Tower kernel: BTRFS error (device loop0): bdev /dev/loop0 errs: wr 26, rd 0, flush 0, corrupt 0, gen 0 Dec 14 11:21:47 Tower kernel: loop: Write error at byte offset 927350784, length 4096. Dec 14 11:21:47 Tower kernel: blk_update_request: I/O error, dev loop0, sector 1811200 Dec 14 11:21:47 Tower kernel: BTRFS error (device loop0): bdev /dev/loop0 errs: wr 27, rd 0, flush 0, corrupt 0, gen 0 Dec 14 11:22:02 Tower kernel: loop: Write error at byte offset 927649792, length 4096. 
Dec 14 11:22:02 Tower kernel: blk_update_request: I/O error, dev loop0, sector 1811712 Dec 14 11:22:02 Tower kernel: BTRFS error (device loop0): bdev /dev/loop0 errs: wr 28, rd 0, flush 0, corrupt 0, gen 0 Dec 14 11:22:24 Tower kernel: loop: Write error at byte offset 927916032, length 4096. Dec 14 11:22:24 Tower kernel: blk_update_request: I/O error, dev loop0, sector 1812224 Dec 14 11:22:24 Tower kernel: BTRFS error (device loop0): bdev /dev/loop0 errs: wr 29, rd 0, flush 0, corrupt 0, gen 0 Dec 14 11:22:26 Tower shfs/user: err: shfs_create: open: /mnt/cache/Al/Documents/.sync/drive_test_1481714545_2784813900 (28) No space left on device Dec 14 11:22:33 Tower kernel: loop: Write error at byte offset 2001309696, length 4096. Dec 14 11:22:33 Tower kernel: blk_update_request: I/O error, dev loop0, sector 3908736 Dec 14 11:22:33 Tower kernel: BTRFS error (device loop0): bdev /dev/loop0 errs: wr 30, rd 0, flush 0, corrupt 0, gen 0 Dec 14 11:22:35 Tower kernel: loop: Write error at byte offset 5259141120, length 4096. Dec 14 11:22:35 Tower kernel: blk_update_request: I/O error, dev loop0, sector 10271632 Dec 14 11:22:37 Tower kernel: BTRFS error (device loop0): bdev /dev/loop0 errs: wr 31, rd 0, flush 0, corrupt 0, gen 0 Dec 14 11:22:46 Tower kernel: loop: Write error at byte offset 10033967104, length 4096. Dec 14 11:22:46 Tower kernel: blk_update_request: I/O error, dev loop0, sector 19597592 Dec 14 11:22:46 Tower kernel: BTRFS error (device loop0): bdev /dev/loop0 errs: wr 32, rd 0, flush 0, corrupt 0, gen 0 Dec 14 11:23:03 Tower shfs/user: err: shfs_rename: rename: /mnt/cache/Al/Documents/.sync/drive_test_1481714583_1423207648 /mnt/cache/Al/Documents/.sync/.fuse_hidden001a5b1900000344 (28) No space left on device Dec 14 11:23:06 Tower kernel: loop: Write error at byte offset 4649340928, length 4096. 
Dec 14 11:23:06 Tower kernel: blk_update_request: I/O error, dev loop0, sector 9080744 Dec 14 11:23:06 Tower kernel: BTRFS error (device loop0): bdev /dev/loop0 errs: wr 33, rd 0, flush 0, corrupt 0, gen 0 Dec 14 11:23:08 Tower kernel: loop: Write error at byte offset 5259857920, length 4096. Dec 14 11:23:08 Tower kernel: blk_update_request: I/O error, dev loop0, sector 10273136 Dec 14 11:23:11 Tower shfs/user: err: shfs_create: open: /mnt/cache/Downloads/.sync/drive_test_1481714590_2888190931 (28) No space left on device Dec 14 11:23:15 Tower shfs/user: err: shfs_create: open: /mnt/cache/Al/Documents/.sync/drive_test_1481714595_1529987331 (28) No space left on device Dec 14 11:23:21 Tower shfs/user: err: shfs_create: open: /mnt/cache/Downloads/.sync/drive_test_1481714600_2781946006 (28) No space left on device Dec 14 11:23:21 Tower kernel: ------------[ cut here ]------------ Dec 14 11:23:21 Tower kernel: WARNING: CPU: 3 PID: 30475 at fs/btrfs/extent-tree.c:4180 btrfs_free_reserved_data_space_noquota+0x5b/0x7b() Dec 14 11:23:21 Tower kernel: Modules linked in: xt_CHECKSUM iptable_mangle ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables vhost_net tun vhost macvtap macvlan xt_nat veth ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_nat_ipv4 iptable_filter ip_tables nf_nat md_mod bonding mlx4_en mlx4_core vxlan udp_tunnel igb ptp pps_core fbcon bitblit fbcon_rotate fbcon_ccw fbcon_ud fbcon_cw softcursor font ast drm_kms_helper cfbfillrect cfbimgblt cfbcopyarea ttm drm agpgart syscopyarea sysfillrect x86_pkg_temp_thermal sysimgblt fb_sys_fops coretemp kvm_intel kvm ahci ftdi_sio i2c_i801 fb i2c_algo_bit pl2303 fbdev i2c_core cdc_acm usbserial libahci wmi ipmi_si [last unloaded: md_mod] Dec 14 11:23:21 Tower kernel: CPU: 3 PID: 30475 Comm: docker Not tainted 4.4.30-unRAID #2 Dec 14 11:23:21 Tower kernel: Hardware name: Supermicro X10SRA-F/X10SRA-F, BIOS 2.0a 06/23/2016 Dec 14 11:23:21 Tower kernel: 0000000000000000 ffff8801066d7c68 
ffffffff8136f79f 0000000000000000 Dec 14 11:23:21 Tower kernel: 0000000000001054 ffff8801066d7ca0 ffffffff8104a4ab ffffffff812ada13 Dec 14 11:23:21 Tower kernel: 0000000000002000 ffff880fe89d7200 0000000000001000 ffff8801066d7d80 Dec 14 11:23:21 Tower kernel: Call Trace: Dec 14 11:23:21 Tower kernel: [<ffffffff8136f79f>] dump_stack+0x61/0x7e Dec 14 11:23:21 Tower kernel: [<ffffffff8104a4ab>] warn_slowpath_common+0x8f/0xa8 Dec 14 11:23:21 Tower kernel: [<ffffffff812ada13>] ? btrfs_free_reserved_data_space_noquota+0x5b/0x7b Dec 14 11:23:21 Tower kernel: [<ffffffff8104a568>] warn_slowpath_null+0x15/0x17 Dec 14 11:23:21 Tower kernel: [<ffffffff812ada13>] btrfs_free_reserved_data_space_noquota+0x5b/0x7b Dec 14 11:23:21 Tower kernel: [<ffffffff812ada4a>] btrfs_free_reserved_data_space+0x17/0x2c Dec 14 11:23:21 Tower kernel: [<ffffffff812ade41>] btrfs_delalloc_release_space+0x29/0x2f Dec 14 11:23:21 Tower kernel: [<ffffffff812d309b>] __btrfs_buffered_write.isra.5+0x426/0x4a7 Dec 14 11:23:21 Tower kernel: [<ffffffff810b958d>] ? 
generic_perform_write+0x156/0x17e Dec 14 11:23:21 Tower kernel: [<ffffffff812d5fca>] btrfs_file_write_iter+0x2f1/0x402 Dec 14 11:23:21 Tower kernel: [<ffffffff8110a4e2>] __vfs_write+0x90/0xb9 Dec 14 11:23:21 Tower kernel: [<ffffffff8110aa6d>] vfs_write+0xbc/0x160 Dec 14 11:23:21 Tower kernel: [<ffffffff8110b1ba>] SyS_write+0x49/0x84 Dec 14 11:23:21 Tower kernel: [<ffffffff81629c2e>] entry_SYSCALL_64_fastpath+0x12/0x6d Dec 14 11:23:21 Tower kernel: ---[ end trace d4017952aa40921a ]--- Dec 14 11:23:21 Tower kernel: ------------[ cut here ]------------ Dec 14 11:23:21 Tower kernel: WARNING: CPU: 3 PID: 30475 at fs/btrfs/extent-tree.c:4180 btrfs_free_reserved_data_space_noquota+0x5b/0x7b() Dec 14 11:23:21 Tower kernel: Modules linked in: xt_CHECKSUM iptable_mangle ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables vhost_net tun vhost macvtap macvlan xt_nat veth ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_nat_ipv4 iptable_filter ip_tables nf_nat md_mod bonding mlx4_en mlx4_core vxlan udp_tunnel igb ptp pps_core fbcon bitblit fbcon_rotate fbcon_ccw fbcon_ud fbcon_cw softcursor font ast drm_kms_helper cfbfillrect cfbimgblt cfbcopyarea ttm drm agpgart syscopyarea sysfillrect x86_pkg_temp_thermal sysimgblt fb_sys_fops coretemp kvm_intel kvm ahci ftdi_sio i2c_i801 fb i2c_algo_bit pl2303 fbdev i2c_core cdc_acm usbserial libahci wmi ipmi_si [last unloaded: md_mod] Dec 14 11:23:21 Tower kernel: CPU: 3 PID: 30475 Comm: docker Tainted: G W 4.4.30-unRAID #2 Dec 14 11:23:21 Tower kernel: Hardware name: Supermicro X10SRA-F/X10SRA-F, BIOS 2.0a 06/23/2016 Dec 14 11:23:21 Tower kernel: 0000000000000000 ffff8801066d7c68 ffffffff8136f79f 0000000000000000 Dec 14 11:23:21 Tower kernel: 0000000000001054 ffff8801066d7ca0 ffffffff8104a4ab ffffffff812ada13 Dec 14 11:23:21 Tower kernel: 0000000000002000 ffff880fe89d7200 0000000000001000 ffff8801066d7d80 Dec 14 11:23:21 Tower kernel: Call Trace: Dec 14 11:23:21 Tower kernel: [<ffffffff8136f79f>] dump_stack+0x61/0x7e Dec 
14 11:23:21 Tower kernel: [<ffffffff8104a4ab>] warn_slowpath_common+0x8f/0xa8 Dec 14 11:23:21 Tower kernel: [<ffffffff812ada13>] ? btrfs_free_reserved_data_space_noquota+0x5b/0x7b Dec 14 11:23:21 Tower kernel: [<ffffffff8104a568>] warn_slowpath_null+0x15/0x17 Dec 14 11:23:21 Tower kernel: [<ffffffff812ada13>] btrfs_free_reserved_data_space_noquota+0x5b/0x7b Dec 14 11:23:21 Tower kernel: [<ffffffff812ada4a>] btrfs_free_reserved_data_space+0x17/0x2c Dec 14 11:23:21 Tower kernel: [<ffffffff812ade41>] btrfs_delalloc_release_space+0x29/0x2f Dec 14 11:23:21 Tower kernel: [<ffffffff812d309b>] __btrfs_buffered_write.isra.5+0x426/0x4a7 Dec 14 11:23:21 Tower kernel: [<ffffffff810b958d>] ? generic_perform_write+0x156/0x17e Dec 14 11:23:21 Tower kernel: [<ffffffff812d5fca>] btrfs_file_write_iter+0x2f1/0x402 Dec 14 11:23:21 Tower kernel: [<ffffffff8110a4e2>] __vfs_write+0x90/0xb9 Dec 14 11:23:21 Tower kernel: [<ffffffff8110aa6d>] vfs_write+0xbc/0x160 Dec 14 11:23:21 Tower kernel: [<ffffffff8110b1ba>] SyS_write+0x49/0x84 Dec 14 11:23:21 Tower kernel: [<ffffffff81629c2e>] entry_SYSCALL_64_fastpath+0x12/0x6d Dec 14 11:23:21 Tower kernel: ---[ end trace d4017952aa40921b ]--- Dec 14 11:23:21 Tower kernel: ------------[ cut here ]------------ Dec 14 11:23:21 Tower kernel: WARNING: CPU: 3 PID: 30475 at fs/btrfs/extent-tree.c:4180 btrfs_free_reserved_data_space_noquota+0x5b/0x7b() Dec 14 11:23:21 Tower kernel: Modules linked in: xt_CHECKSUM iptable_mangle ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables vhost_net tun vhost macvtap macvlan xt_nat veth ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_nat_ipv4 iptable_filter ip_tables nf_nat md_mod bonding mlx4_en mlx4_core vxlan udp_tunnel igb ptp pps_core fbcon bitblit fbcon_rotate fbcon_ccw fbcon_ud fbcon_cw softcursor font ast drm_kms_helper cfbfillrect cfbimgblt cfbcopyarea ttm drm agpgart syscopyarea sysfillrect x86_pkg_temp_thermal sysimgblt fb_sys_fops coretemp kvm_intel kvm ahci ftdi_sio i2c_i801 fb 
i2c_algo_bit pl2303 fbdev i2c_core cdc_acm usbserial libahci wmi ipmi_si [last unloaded: md_mod] Dec 14 11:23:21 Tower kernel: CPU: 3 PID: 30475 Comm: docker Tainted: G W 4.4.30-unRAID #2 Dec 14 11:23:21 Tower kernel: Hardware name: Supermicro X10SRA-F/X10SRA-F, BIOS 2.0a 06/23/2016 Dec 14 11:23:21 Tower kernel: 0000000000000000 ffff8801066d7c68 ffffffff8136f79f 0000000000000000 Dec 14 11:23:21 Tower kernel: 0000000000001054 ffff8801066d7ca0 ffffffff8104a4ab ffffffff812ada13 Dec 14 11:23:21 Tower kernel: 0000000000002000 ffff880fe89d7200 0000000000001000 ffff8801066d7d80 Dec 14 11:23:21 Tower kernel: Call Trace: Dec 14 11:23:21 Tower kernel: [<ffffffff8136f79f>] dump_stack+0x61/0x7e Dec 14 11:23:21 Tower kernel: [<ffffffff8104a4ab>] warn_slowpath_common+0x8f/0xa8 Dec 14 11:23:21 Tower kernel: [<ffffffff812ada13>] ? btrfs_free_reserved_data_space_noquota+0x5b/0x7b Dec 14 11:23:21 Tower kernel: [<ffffffff8104a568>] warn_slowpath_null+0x15/0x17 Dec 14 11:23:21 Tower kernel: [<ffffffff812ada13>] btrfs_free_reserved_data_space_noquota+0x5b/0x7b Dec 14 11:23:21 Tower kernel: [<ffffffff812ada4a>] btrfs_free_reserved_data_space+0x17/0x2c Dec 14 11:23:21 Tower kernel: [<ffffffff812ade41>] btrfs_delalloc_release_space+0x29/0x2f Dec 14 11:23:21 Tower kernel: [<ffffffff812d309b>] __btrfs_buffered_write.isra.5+0x426/0x4a7 Dec 14 11:23:21 Tower kernel: [<ffffffff810b958d>] ? 
generic_perform_write+0x156/0x17e Dec 14 11:23:21 Tower kernel: [<ffffffff812d5fca>] btrfs_file_write_iter+0x2f1/0x402 Dec 14 11:23:21 Tower kernel: [<ffffffff8110a4e2>] __vfs_write+0x90/0xb9 Dec 14 11:23:21 Tower kernel: [<ffffffff8110aa6d>] vfs_write+0xbc/0x160 Dec 14 11:23:21 Tower kernel: [<ffffffff8110b1ba>] SyS_write+0x49/0x84 Dec 14 11:23:21 Tower kernel: [<ffffffff81629c2e>] entry_SYSCALL_64_fastpath+0x12/0x6d Dec 14 11:23:21 Tower kernel: ---[ end trace d4017952aa40921c ]--- Dec 14 11:23:21 Tower kernel: ------------[ cut here ]------------ Dec 14 11:23:21 Tower kernel: WARNING: CPU: 3 PID: 30475 at fs/btrfs/extent-tree.c:4180 btrfs_free_reserved_data_space_noquota+0x5b/0x7b() Dec 14 11:23:21 Tower kernel: Modules linked in: xt_CHECKSUM iptable_mangle ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables vhost_net tun vhost macvtap macvlan xt_nat veth ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_nat_ipv4 iptable_filter ip_tables nf_nat md_mod bonding mlx4_en mlx4_core vxlan udp_tunnel igb ptp pps_core fbcon bitblit fbcon_rotate fbcon_ccw fbcon_ud fbcon_cw softcursor font ast drm_kms_helper cfbfillrect cfbimgblt cfbcopyarea ttm drm agpgart syscopyarea sysfillrect x86_pkg_temp_thermal sysimgblt fb_sys_fops coretemp kvm_intel kvm ahci ftdi_sio i2c_i801 fb i2c_algo_bit pl2303 fbdev i2c_core cdc_acm usbserial libahci wmi ipmi_si [last unloaded: md_mod] Dec 14 11:23:21 Tower kernel: CPU: 3 PID: 30475 Comm: docker Tainted: G W 4.4.30-unRAID #2 Dec 14 11:23:21 Tower kernel: Hardware name: Supermicro X10SRA-F/X10SRA-F, BIOS 2.0a 06/23/2016 Dec 14 11:23:21 Tower kernel: 0000000000000000 ffff8801066d7c68 ffffffff8136f79f 0000000000000000 Dec 14 11:23:21 Tower kernel: 0000000000001054 ffff8801066d7ca0 ffffffff8104a4ab ffffffff812ada13 Dec 14 11:23:21 Tower kernel: 0000000000002000 ffff880fe89d7200 0000000000001000 ffff8801066d7d80 Dec 14 11:23:21 Tower kernel: Call Trace: Dec 14 11:23:21 Tower kernel: [<ffffffff8136f79f>] dump_stack+0x61/0x7e Dec 
14 11:23:21 Tower kernel: [<ffffffff8104a4ab>] warn_slowpath_common+0x8f/0xa8 Dec 14 11:23:21 Tower kernel: [<ffffffff812ada13>] ? btrfs_free_reserved_data_space_noquota+0x5b/0x7b Dec 14 11:23:21 Tower kernel: [<ffffffff8104a568>] warn_slowpath_null+0x15/0x17 Dec 14 11:23:21 Tower kernel: [<ffffffff812ada13>] btrfs_free_reserved_data_space_noquota+0x5b/0x7b Dec 14 11:23:21 Tower kernel: [<ffffffff812ada4a>] btrfs_free_reserved_data_space+0x17/0x2c Dec 14 11:23:21 Tower kernel: [<ffffffff812ade41>] btrfs_delalloc_release_space+0x29/0x2f Dec 14 11:23:21 Tower kernel: [<ffffffff812d309b>] __btrfs_buffered_write.isra.5+0x426/0x4a7 Dec 14 11:23:21 Tower kernel: [<ffffffff810b958d>] ? generic_perform_write+0x156/0x17e Dec 14 11:23:21 Tower kernel: [<ffffffff812d5fca>] btrfs_file_write_iter+0x2f1/0x402 Dec 14 11:23:21 Tower kernel: [<ffffffff8110a4e2>] __vfs_write+0x90/0xb9 Dec 14 11:23:21 Tower kernel: [<ffffffff8110aa6d>] vfs_write+0xbc/0x160 Dec 14 11:23:21 Tower kernel: [<ffffffff8110b1ba>] SyS_write+0x49/0x84 Dec 14 11:23:21 Tower kernel: [<ffffffff81629c2e>] entry_SYSCALL_64_fastpath+0x12/0x6d Dec 14 11:23:21 Tower kernel: ---[ end trace d4017952aa40921d ]--- Dec 14 11:23:21 Tower kernel: ------------[ cut here ]------------
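The logs in this thread mix two different kinds of trouble: per-device BTRFS error counters ("bdev ... errs: wr N, rd N, ...") and "No space left on device" failures. A small sketch like the following can separate them. It assumes the counter line format seen in this thread, and the function names are invented for illustration.

```python
import re

# Pattern written against the "bdev ... errs:" lines quoted in this thread.
STATS_RE = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)
COUNTERS = ("wr", "rd", "flush", "corrupt", "gen")


def latest_device_stats(lines):
    """Return the most recent error counters logged for each device."""
    stats = {}
    for line in lines:
        match = STATS_RE.search(line)
        if match:
            stats[match.group("dev")] = {k: int(match.group(k)) for k in COUNTERS}
    return stats


def count_enospc(lines):
    """Count log lines that report 'No space left on device'."""
    return sum("No space left on device" in line for line in lines)


# Lines copied from the excerpts in this thread.
sample = [
    "Dec 14 04:29:48 Tower kernel: BTRFS error (device sdd1): bdev /dev/sdc1 errs: wr 0, rd 0, flush 0, corrupt 10, gen 0",
    "Dec 14 11:22:46 Tower kernel: BTRFS error (device loop0): bdev /dev/loop0 errs: wr 32, rd 0, flush 0, corrupt 0, gen 0",
    "Dec 14 11:23:06 Tower kernel: BTRFS error (device loop0): bdev /dev/loop0 errs: wr 33, rd 0, flush 0, corrupt 0, gen 0",
    "Dec 14 11:23:15 Tower shfs/user: err: shfs_create: open: /mnt/cache/Al/Documents/.sync/drive_test_1481714595_1529987331 (28) No space left on device",
]

if __name__ == "__main__":
    print(latest_device_stats(sample))
    print(count_enospc(sample))
```

In these excerpts the `corrupt` counter rises only on `/dev/sdc1`, while `/dev/loop0` shows only write errors, and those appear alongside the out-of-space messages. That suggests checking free space on the pool in addition to the SSD itself, though the logs alone do not prove which is the cause.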
December 14, 2016
You can try removing the 2nd device and then adding it back again, or completely format both devices and start over using the replace cache procedure.
December 14, 2016 Author
I have another 1TB Samsung 850 I can put in as a replacement for cache 2. I'll do that now. Will it be able to rebuild from cache 1? I don't want to lose the VMs on cache 1.

Thanks,
Al
December 14, 2016
If all goes well it will let you remove one device and add the other while keeping all your data, but it's always good to make a backup first just in case.
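For the backup step, one option is a copy that also reports which files could not be read or did not copy cleanly, since files hit by uncorrectable checksum errors may fail with an I/O error on read. This is a minimal sketch, not an Unraid tool: the function names are invented, and the source and destination paths in the usage note are hypothetical and should be replaced with the real share locations.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large VM images do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_tree(src, dst):
    """Copy every file under src into dst; return relative paths that failed.

    A file counts as failed if reading or writing raises OSError, or if the
    copy's hash differs from the source's hash.
    """
    src, dst = Path(src), Path(dst)
    failed = []
    for file in sorted(p for p in src.rglob("*") if p.is_file()):
        rel = file.relative_to(src)
        target = dst / rel
        try:
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(file, target)
            if sha256_of(file) != sha256_of(target):
                failed.append(str(rel))
        except OSError:
            # Unreadable source data or a full destination both land here.
            failed.append(str(rel))
    return failed
```

Usage would look something like `backup_tree("/mnt/cache/domains", "/mnt/disk1/cache_backup")`, where both paths are placeholders. An empty return list means every file copied and verified; anything listed is worth restoring from an older backup rather than trusting.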
December 14, 2016 Author
I've removed the old cache2 drive, plugged in my new SSD, and am ready to start the array. Should I start it with "no device" showing for cache, or should I assign my replacement SSD to the cache2 slot and then restart? Will it auto-rebuild onto cache2?

This thread https://lime-technology.com/forum/index.php?topic=45711.0 talks about adding the extra drive into the pool before removing the old one. I haven't done that.
December 14, 2016 Author
Thanks JohnnieBlack. I've assigned the replacement SSD to Cache2 and pressed start. The cache page is showing as per the FAQ, and has been like this for about 20 minutes. How long should it take? On the main page, reads are at 611 and writes at 7. Cache 1 is showing as "unmountable".

Here is some of the syslog:

warning, device 1 is missing
Label: none uuid: ef0cbf24-f3c9-4e0d-90a2-6533b7751f4a
    Total devices 2 FS bytes used 717.29GiB
    devid 2 size 931.51GiB used 395.52GiB path /dev/sdd1
    *** Some devices missing

Dec 14 13:47:50 Tower emhttp: Mounting disks...
Dec 14 13:47:50 Tower emhttp: shcmd (1093): /sbin/btrfs device scan |& logger
Dec 14 13:47:50 Tower root: Scanning for Btrfs filesystems
Dec 14 13:47:50 Tower emhttp: shcmd (1094): mkdir -p /mnt/disk1
Dec 14 13:47:50 Tower emhttp: shcmd (1095): set -o pipefail ; mount -t xfs -o noatime,nodiratime /dev/md1 /mnt/disk1 |& logger
Dec 14 13:47:50 Tower kernel: XFS (md1): Mounting V5 Filesystem
Dec 14 13:47:50 Tower kernel: XFS (md1): Ending clean mount
Dec 14 13:47:50 Tower emhttp: shcmd (1096): xfs_growfs /mnt/disk1 |& logger
Dec 14 13:47:50 Tower root: meta-data=/dev/md1 isize=512 agcount=8, agsize=268435455 blks
Dec 14 13:47:50 Tower root: = sectsz=512 attr=2, projid32bit=1
Dec 14 13:47:50 Tower root: = crc=1 finobt=1 spinodes=0
Dec 14 13:47:50 Tower root: data = bsize=4096 blocks=1953506633, imaxpct=5
Dec 14 13:47:50 Tower root: = sunit=0 swidth=0 blks
Dec 14 13:47:50 Tower root: naming =version 2 bsize=4096 ascii-ci=0 ftype=1
Dec 14 13:47:50 Tower root: log =internal bsize=4096 blocks=521728, version=2
Dec 14 13:47:50 Tower root: = sectsz=512 sunit=0 blks, lazy-count=1
Dec 14 13:47:50 Tower root: realtime =none extsz=4096 blocks=0, rtextents=0
Dec 14 13:47:50 Tower emhttp: shcmd (1097): mkdir -p /mnt/disk2
Dec 14 13:47:50 Tower emhttp: shcmd (1098): set -o pipefail ; mount -t xfs -o noatime,nodiratime /dev/md2 /mnt/disk2 |& logger
Dec 14 13:47:50 Tower kernel: XFS (md2): Mounting V5 Filesystem
Dec 14 13:47:50 Tower kernel: XFS (md2): Ending clean mount Dec 14 13:47:50 Tower emhttp: shcmd (1099): xfs_growfs /mnt/disk2 |& logger Dec 14 13:47:50 Tower root: meta-data=/dev/md2 isize=512 agcount=8, agsize=268435455 blks Dec 14 13:47:50 Tower root: = sectsz=512 attr=2, projid32bit=1 Dec 14 13:47:50 Tower root: = crc=1 finobt=1 spinodes=0 Dec 14 13:47:50 Tower root: data = bsize=4096 blocks=1953506633, imaxpct=5 Dec 14 13:47:50 Tower root: = sunit=0 swidth=0 blks Dec 14 13:47:50 Tower root: naming =version 2 bsize=4096 ascii-ci=0 ftype=1 Dec 14 13:47:50 Tower root: log =internal bsize=4096 blocks=521728, version=2 Dec 14 13:47:50 Tower root: = sectsz=512 sunit=0 blks, lazy-count=1 Dec 14 13:47:50 Tower root: realtime =none extsz=4096 blocks=0, rtextents=0 Dec 14 13:47:50 Tower emhttp: shcmd (1100): mkdir -p /mnt/disk3 Dec 14 13:47:50 Tower emhttp: shcmd (1101): set -o pipefail ; mount -t xfs -o noatime,nodiratime /dev/md3 /mnt/disk3 |& logger Dec 14 13:47:50 Tower kernel: XFS (md3): Mounting V5 Filesystem Dec 14 13:47:50 Tower emhttp: shcmd (1102): xfs_growfs /mnt/disk3 |& logger Dec 14 13:47:50 Tower kernel: XFS (md3): Ending clean mount Dec 14 13:47:50 Tower root: meta-data=/dev/md3 isize=512 agcount=8, agsize=268435455 blks Dec 14 13:47:50 Tower root: = sectsz=512 attr=2, projid32bit=1 Dec 14 13:47:50 Tower root: = crc=1 finobt=1 spinodes=0 Dec 14 13:47:50 Tower root: data = bsize=4096 blocks=1953506633, imaxpct=5 Dec 14 13:47:50 Tower root: = sunit=0 swidth=0 blks Dec 14 13:47:50 Tower root: naming =version 2 bsize=4096 ascii-ci=0 ftype=1 Dec 14 13:47:50 Tower root: log =internal bsize=4096 blocks=521728, version=2 Dec 14 13:47:50 Tower root: = sectsz=512 sunit=0 blks, lazy-count=1 Dec 14 13:47:50 Tower root: realtime =none extsz=4096 blocks=0, rtextents=0 Dec 14 13:47:50 Tower emhttp: shcmd (1103): mkdir -p /mnt/cache Dec 14 13:47:50 Tower emhttp: shcmd (1104): set -o pipefail ; mount -t btrfs -o noatime,nodiratime,degraded -U 
ef0cbf24-f3c9-4e0d-90a2-6533b7751f4a /mnt/cache |& logger Dec 14 13:47:50 Tower kernel: BTRFS info (device sdd1): allowing degraded mounts Dec 14 13:47:50 Tower kernel: BTRFS info (device sdd1): disk space caching is enabled Dec 14 13:47:50 Tower kernel: BTRFS: has skinny extents Dec 14 13:47:50 Tower kernel: BTRFS warning (device sdd1): devid 1 uuid 41e61b50-03bc-49b9-adbf-b2b6c0accbd7 is missing Dec 14 13:47:50 Tower kernel: BTRFS: failed to read chunk tree on sdd1 Dec 14 13:47:50 Tower root: mount: wrong fs type, bad option, bad superblock on /dev/sdd1, Dec 14 13:47:50 Tower root: missing codepage or helper program, or other error Dec 14 13:47:50 Tower root: Dec 14 13:47:50 Tower root: In some cases useful info is found in syslog - try Dec 14 13:47:50 Tower root: dmesg | tail or so. Dec 14 13:47:50 Tower emhttp: err: shcmd: shcmd (1104): exit status: 32 Dec 14 13:47:50 Tower emhttp: mount error: No file system (32) Dec 14 13:47:50 Tower emhttp: shcmd (1105): umount /mnt/cache |& logger Dec 14 13:47:50 Tower kernel: BTRFS: open_ctree failed Dec 14 13:47:50 Tower root: umount: /mnt/cache: not mounted Dec 14 13:47:50 Tower emhttp: shcmd (1106): rmdir /mnt/cache Dec 14 13:47:50 Tower emhttp: shcmd (1107): sync Dec 14 13:47:53 Tower emhttp: shcmd (1108): mkdir /mnt/user0 Dec 14 13:47:53 Tower emhttp: shcmd (1109): /usr/local/sbin/shfs /mnt/user0 -disks 14 -o noatime,big_writes,allow_other,use_ino |& logger Dec 14 13:47:53 Tower emhttp: shcmd (1110): mkdir /mnt/user Dec 14 13:47:53 Tower emhttp: shcmd (1111): /usr/local/sbin/shfs /mnt/user -disks 15 51200000000 -o noatime,big_writes,allow_other,use_ino -o remember=330 |& logger Dec 14 13:47:53 Tower emhttp: shcmd (1112): cat - > /boot/config/plugins/dynamix/mover.cron <<< "# Generated mover schedule:#01240 3 * * * /usr/local/sbin/mover |& logger#012" Dec 14 13:47:53 Tower emhttp: shcmd (1113): /usr/local/sbin/update_cron &> /dev/null Dec 14 13:47:53 Tower cache_dirs: ============================================== Dec 
14 13:47:53 Tower cache_dirs: Starting cache_dirs: Dec 14 13:47:53 Tower cache_dirs: Arguments= Dec 14 13:47:53 Tower cache_dirs: Cache Pressure=10 Dec 14 13:47:53 Tower cache_dirs: Max Scan Secs=10, Min Scan Secs=1 Dec 14 13:47:53 Tower cache_dirs: Scan Type=adaptive Dec 14 13:47:53 Tower cache_dirs: Max Scan Depth=none Dec 14 13:47:53 Tower cache_dirs: Use Command='find -noleaf' Dec 14 13:47:53 Tower cache_dirs: Version=2.1.1 Dec 14 13:47:53 Tower cache_dirs: ---------- Caching Directories --------------- Dec 14 13:47:53 Tower cache_dirs: .Recycle.Bin Dec 14 13:47:53 Tower cache_dirs: Al Dec 14 13:47:53 Tower cache_dirs: BlueIrisArchive Dec 14 13:47:53 Tower cache_dirs: Crashplan Dec 14 13:47:53 Tower cache_dirs: Downloads Dec 14 13:47:53 Tower cache_dirs: Jenny Dec 14 13:47:53 Tower cache_dirs: Movies Dec 14 13:47:53 Tower cache_dirs: Music Dec 14 13:47:53 Tower cache_dirs: Photos Dec 14 13:47:53 Tower cache_dirs: SageTV Dec 14 13:47:53 Tower cache_dirs: SageTV2 Dec 14 13:47:53 Tower cache_dirs: Share Dec 14 13:47:53 Tower cache_dirs: Squidbait-among Dec 14 13:47:53 Tower cache_dirs: Squidbait-been Dec 14 13:47:53 Tower cache_dirs: Squidbait-carriage Dec 14 13:47:53 Tower cache_dirs: Squidbait-casualties Dec 14 13:47:53 Tower cache_dirs: Squidbait-certainly Dec 14 13:47:53 Tower cache_dirs: Squidbait-chill Dec 14 13:47:53 Tower cache_dirs: Squidbait-further Dec 14 13:47:53 Tower cache_dirs: Squidbait-glorious Dec 14 13:47:53 Tower cache_dirs: Squidbait-herr Dec 14 13:47:53 Tower cache_dirs: Squidbait-lamps Dec 14 13:47:53 Tower cache_dirs: Squidbait-order Dec 14 13:47:53 Tower cache_dirs: Squidbait-reins Dec 14 13:47:53 Tower cache_dirs: Squidbait-some Dec 14 13:47:53 Tower cache_dirs: Squidbait-them Dec 14 13:47:53 Tower cache_dirs: Squidbait-through Dec 14 13:47:53 Tower cache_dirs: Squidbait-valleys Dec 14 13:47:53 Tower cache_dirs: Squidbait-were Dec 14 13:47:53 Tower cache_dirs: Squidbait-whip Dec 14 13:47:53 Tower cache_dirs: Squidbait-whose Dec 14 
13:47:53 Tower cache_dirs: TV Dec 14 13:47:53 Tower cache_dirs: system Dec 14 13:47:53 Tower cache_dirs: ---------------------------------------------- Dec 14 13:47:53 Tower cache_dirs: cache_dirs process ID 20181 started Dec 14 13:47:53 Tower root: Fix Common Problems Version 2016.12.10 Dec 14 13:47:55 Tower root: Fix Common Problems: Error: cache (Samsung_SSD_850_EVO_1TB_S21DNXAGB02821Z) has file system errors (No file system (32)) Dec 14 13:47:58 Tower sSMTP[20911]: Creating SSL connection to host Dec 14 13:47:58 Tower sSMTP[20911]: SSL connection using ECDHE-RSA-AES256-GCM-SHA384 Dec 14 13:47:59 Tower sSMTP[20911]: Sent mail for [email protected] (221 b-painless.mh.aa.net.uk closing connection) uid=0 username=root outbytes=713 Dec 14 13:47:59 Tower root: ransomware protection:ransomware protection service not running Dec 14 13:47:59 Tower root: ransomware protection:ransomware deletion process not running Dec 14 13:47:59 Tower root: ransomware protection:ransomware share deletion process not running Dec 14 13:47:59 Tower root: ransomware protection:ransomeware bait share creation process not running Dec 14 13:47:59 Tower root: ransomware protection:ransomware bait share count process not running Dec 14 13:47:59 Tower root: ransomware protection:ransomware bait share monitor process not running Dec 14 13:47:59 Tower recycle.bin: Starting Recycle Bin Dec 14 13:47:59 Tower emhttp: Starting Recycle Bin... Dec 14 13:47:59 Tower root: ransomware protection:Gathering Inventory Of Old Bait Files Dec 14 13:47:59 Tower root: ransomware protection:It appears previous bait shares still exist on array. Exiting Dec 14 13:47:59 Tower root: ransomware protection:Starting Background Monitoring of Baitshares Dec 14 13:47:59 Tower root[21102]: Setting up watches. Beware: since -r was given, this may take a while! Dec 14 13:48:01 Tower unassigned.devices: Mounting Devices... 
Dec 14 13:48:01 Tower emhttp: Dec 14 13:48:01 Tower emhttp: Dec 14 13:48:01 Tower kernel: XFS (sda1): Mounting V4 Filesystem Dec 14 13:48:02 Tower kernel: XFS (sda1): Ending clean mount Dec 14 13:48:02 Tower emhttp: Starting services... Dec 14 13:48:02 Tower emhttp: nothing to sync Dec 14 13:48:19 Tower root[21102]: Watches established. Dec 14 13:50:05 Tower root: ransomware protection:Found 50508 previous bait files. Dec 14 13:50:05 Tower root: ransomware protection:Starting Background Monitoring Of Bait Files Dec 14 13:50:05 Tower root[27164]: Setting up watches. Dec 14 13:50:06 Tower root[27164]: Watches established. Dec 14 13:51:15 Tower emhttp: cmd: /usr/local/emhttp/plugins/dynamix/scripts/tail_log syslog Dec 14 13:58:59 Tower emhttp: cmd: /usr/local/emhttp/plugins/dynamix/scripts/tail_log syslog Dec 14 14:05:09 Tower kernel: mdcmd (40): spindown 2 Dec 14 14:05:10 Tower kernel: mdcmd (41): spindown 3 DownloadDone
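Most of the syslog excerpt above is routine XFS and plugin startup noise; the relevant entries are the kernel BTRFS lines around the failed cache mount. As a rough sketch (not part of the original post), the Python below pulls only those lines out of a syslog dump. The regular expression is an assumption based on the line shapes visible in this thread, and the names `BTRFS_LINE` and `btrfs_messages` are hypothetical.

```python
import re

# Assumed line shapes, taken from the excerpt above, e.g.:
#   "... kernel: BTRFS warning (device sdd1): devid 1 uuid ... is missing"
#   "... kernel: BTRFS: failed to read chunk tree on sdd1"
BTRFS_LINE = re.compile(
    r"kernel: BTRFS(?: (?P<level>info|warning|error|critical))?"
    r"(?: \(device (?P<device>[^)]+)\))?: (?P<message>.*)"
)


def btrfs_messages(lines):
    """Yield (level, device, message) for each kernel BTRFS syslog line."""
    for line in lines:
        match = BTRFS_LINE.search(line)
        if match:
            yield (
                match.group("level") or "unknown",  # some lines carry no level
                match.group("device"),              # None when no device is named
                match.group("message").strip(),
            )


# A few lines copied from the excerpt above.
sample = [
    "Dec 14 13:47:50 Tower kernel: XFS (md1): Mounting V5 Filesystem",
    "Dec 14 13:47:50 Tower kernel: BTRFS info (device sdd1): allowing degraded mounts",
    "Dec 14 13:47:50 Tower kernel: BTRFS warning (device sdd1): devid 1 uuid 41e61b50-03bc-49b9-adbf-b2b6c0accbd7 is missing",
    "Dec 14 13:47:50 Tower kernel: BTRFS: failed to read chunk tree on sdd1",
]

for level, device, message in btrfs_messages(sample):
    print(level, device, message)
```

Filtering this way makes the sequence easier to see: the degraded mount is allowed, devid 1 is reported missing, and the chunk tree read fails before emhttp reports "No file system (32)".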
December 14, 20169 yr You didn't remove the old device before adding the new one following the procedure I linked above: http://lime-technology.com/forum/index.php?topic=48508.msg484479#msg484479
December 14, 20169 yr Author No, I just followed Method 2 from the FAQ. Sorry... What should I do now? Should I shut down the array, disconnect the replacement SSD, reconnect the old failing drive, power up, assign the old drive back into slot 2, start the array, back up the SSD, and then follow the remove procedure?
December 14, 20169 yr Without cleanly removing the old device, the cache pool won't mount with it connected and unassigned. You can try starting the array after physically disconnecting the old SSD from the server.
December 14, 20169 yr Author OK, let me clarify. I have followed Method 2 in this post http://lime-technology.com/forum/index.php?topic=48508.msg484480#msg484480
if enabled, disable array auto start
-Disabled
shutdown server
-Done
replace the cache device (old device has to be physically disconnected, or cleared, starting the array with a previously used pool device unassigned will result in unmountable cache)
-Removed old cache 2 SSD from the PC. Left the good Cache drive 1 where it is
-Installed a new 1TB SSD that has never been in this PC before. It has an NTFS file system on it at the moment
power up, assign new cache device (if device was ever used in a cache pool in the past preclear it before adding to pool, if not cache can be corrupted)
-Powered up - assigned the new SSD to be the cache2 drive
start array, there will be read/write activity on the pool, WebGUI cache page "btrfs filesystem show" will show "***some devices missing", wait and after some time it will stop showing that and a balance will begin
-Yes it did say "Some Devices Missing", but cache disk 1 is showing as unmountable, no file system. Cache disk 1 has all my data on it.
this can take some time depending on how much data is on the pool and how fast your devices are, don't stop the array until it's done
when cache pool reads/writes stop and balance is done check that on the cache page "btrfs filesystem show" total devices are correct and it's not displaying "***some devices missing", e.g., this is how a 2 disk pool should look:
What do I do now? Can I recover the data on disk 1?
December 14, 20169 yr Using that procedure, cache should never be unmountable. If it is, the remaining cache device is not working correctly and probably has some BTRFS corruption. There's no use waiting, it won't mount. You can try the other device.
December 14, 20169 yr Author Ok, I've put both original cache drives back in their original places and powered back up. The array started, and the cache pool and all my data are available. So now I am copying the data to an external USB hard disk. Then I will be back to my starting point. Should I then retry Method 2? Or should I try to "remove a cache pool disk" as here http://lime-technology.com/forum/index.php?topic=48508.msg484479#msg484479? I could just format the 2 cache drives and start over from my backup, but I'd like to try to recover the data from the 2 cache drives, as this is the point of having the 2nd drive.
December 14, 20169 yr When there are filesystem corruption issues it's difficult to guess what's going to work or not. The first step would be to back up your cache. Then you could try removing one of the devices using this procedure and, if successful, add the new SSD, but there could be a problem if you remove the wrong device. If it were me I'd format the cache pool and restore data from backups; in my experience btrfs pools are not very good at recovering from corruption.
December 14, 20169 yr Author Ok, I'll try that once the backup finishes. Thanks for sticking with me! Genuine question - what are BTRFS cache pools good at recovering from? Maybe I should go back to XFS and forget the cache pool idea.
December 14, 20169 yr Genuine question - what are BTRFS cache pools good at recovering from? Maybe I should go back to XFS and forget the cache pool idea.
Single catastrophic device failure. If a drive is totally gone, it copes ok. It is not at all tolerant of intermittent communication errors, or of shutting down without cleanly unmounting. If you have a hard server crash for whatever reason, expect to deal with some BTRFS issues.
December 14, 20169 yr Author Does XFS cope with unclean shutdowns better? If I have an unclean shutdown with BTRFS, should I always run a scrub afterwards? Is there an equivalent with XFS? With an unclean shutdown, is there a higher risk of data loss with BTRFS than XFS?
December 14, 20169 yr Author Latest update. My backup finished, with the exception of 1 file, which was just a CCTV video recording file from BlueIris. I re-ran scrub, which came up with the same 634 uncorrectable errors. I deleted the BlueIris file and re-ran scrub. This was successful with no errors.
scrub status for ef0cbf24-f3c9-4e0d-90a2-6533b7751f4a
scrub started at Wed Dec 14 19:00:51 2016 and finished after 00:23:27
total bytes scrubbed: 1009.41GiB with 0 errors
The filesystem is showing this:
Label: none uuid: ef0cbf24-f3c9-4e0d-90a2-6533b7751f4a
Total devices 2 FS bytes used 711.02GiB
devid 1 size 931.51GiB used 931.51GiB path /dev/sdc1
devid 2 size 931.51GiB used 395.49GiB path /dev/sdd1
Then I ran Balance. This ran for only a few seconds and showed:
No balance found on '/mnt/cache'
Is it correct to have different amounts of data used on each drive? If not, how do I correct it? I am not seeing any ATA disconnects in the logs any more. All is now working again, but I am not sure why I had the problem, and whether there is still a problem with the balancing. I'll keep an eye on the syslog.
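For anyone comparing scrub runs like the ones in this thread, a small parser for the summary line can help. This is only a sketch: it assumes the older `btrfs scrub status` wording quoted above ("total bytes scrubbed: ... with N errors"), which newer btrfs-progs releases may format differently, and the names `SCRUB_SUMMARY` and `scrub_summary` are hypothetical.

```python
import re

# Assumed summary format, copied from the scrub output quoted in this thread:
#   "total bytes scrubbed: 1009.41GiB with 0 errors"
SCRUB_SUMMARY = re.compile(
    r"total bytes scrubbed:\s*(?P<scrubbed>[\d.]+)(?P<unit>[KMGT]iB)"
    r"\s+with\s+(?P<errors>\d+)\s+errors"
)


def scrub_summary(status_output):
    """Return the scrubbed amount, its unit and the error count, or None."""
    match = SCRUB_SUMMARY.search(status_output)
    if match is None:
        return None
    return {
        "scrubbed": float(match.group("scrubbed")),
        "unit": match.group("unit"),
        "errors": int(match.group("errors")),
    }


# The two scrub results reported in this thread.
first_run = "total bytes scrubbed: 1013.86GiB with 626 errors"
second_run = "total bytes scrubbed: 1009.41GiB with 0 errors"

for label, text in (("first", first_run), ("second", second_run)):
    print(label, scrub_summary(text))
```

A zero error count only says that the data still present checksums correctly; it does not by itself explain why the errors appeared in the first place.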
December 14, 20169 yr Is it correct to have different amounts of data used on each drive? If not, how do I correct it?
Try running balance again after array stop/restart. If it doesn't correctly balance it's not a very good sign.
December 14, 20169 yr Author I tried that, and rebooting, but no change. Here are the log entries from the balance:
Dec 14 20:27:37 Tower php: /usr/local/emhttp/plugins/dynamix/scripts/btrfs_balance 'start' '/mnt/cache' '-dconvert=raid1 -mconvert=raid1'
Dec 14 20:27:39 Tower kernel: BTRFS info (device sdd1): relocating block group 1451176361984 flags 17
I have never run a balance on these drives. Originally it was just a single cache drive, and I added another cache drive at a later date. Looking at this:
Label: none uuid: ef0cbf24-f3c9-4e0d-90a2-6533b7751f4a
Total devices 2 FS bytes used 711.02GiB
devid 1 size 931.51GiB used 931.51GiB path /dev/sdc1
devid 2 size 931.51GiB used 395.49GiB path /dev/sdd1
Is the problem here that I've filled up the 1st cache drive, because the balancing isn't working? Drive 1 seems to show size 931.51GiB, used 931.51GiB.
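One way to read the output above: in `btrfs filesystem show`, the per-device "used" figure is space allocated to chunks on that device, not the amount of file data stored, so a device whose "used" equals its "size" has no unallocated space left. The sketch below computes unallocated space per device from that output. It is illustrative only, assumes all figures are in GiB as they are in this thread, and uses hypothetical names (`DEVICE_LINE`, `unallocated_per_device`).

```python
import re

# Assumed device line format, as quoted in this thread:
#   "devid 1 size 931.51GiB used 931.51GiB path /dev/sdc1"
DEVICE_LINE = re.compile(
    r"devid\s+(?P<devid>\d+)\s+size\s+(?P<size>[\d.]+)GiB\s+"
    r"used\s+(?P<used>[\d.]+)GiB\s+path\s+(?P<path>\S+)"
)


def unallocated_per_device(show_output):
    """Map each device path to size minus allocated space, in GiB."""
    result = {}
    for match in DEVICE_LINE.finditer(show_output):
        size = float(match.group("size"))
        used = float(match.group("used"))
        result[match.group("path")] = round(size - used, 2)
    return result


# Output copied from the post above.
show_output = """Label: none uuid: ef0cbf24-f3c9-4e0d-90a2-6533b7751f4a
Total devices 2 FS bytes used 711.02GiB
devid 1 size 931.51GiB used 931.51GiB path /dev/sdc1
devid 2 size 931.51GiB used 395.49GiB path /dev/sdd1"""

for path, free_gib in unallocated_per_device(show_output).items():
    print(path, free_gib)
```

On these numbers /dev/sdc1 has nothing unallocated while /dev/sdd1 has several hundred GiB, which is consistent with the question being asked; whether that is what stops the balance here is not something the output alone can confirm.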