Ymetro Posted July 21, 2020

One of the two SSDs in my cache pool appears to have BTRFS errors, which is forcing the filesystem into read-only mode and crashing my Docker containers. The diagnostics zip is attached. I cannot seem to fix it with btrfs check --repair /dev/sdi1 -p; it gives a few errors (which I cannot recall at the moment) during the first of the seven steps ([1/7]). If needed, I can put the array into maintenance mode to run a check; please let me know if that is necessary.

The SSD log says:

Quote
Jul 21 21:46:10 PCUS kernel: ata8: SATA max UDMA/133 abar m2048@0xf33ff000 port 0xf33ff180 irq 26
Jul 21 21:46:10 PCUS kernel: ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Jul 21 21:46:10 PCUS kernel: ata8.00: supports DRM functions and may not be fully accessible
Jul 21 21:46:10 PCUS kernel: ata8.00: disabling queued TRIM support
Jul 21 21:46:10 PCUS kernel: ata8.00: ATA-9: Samsung SSD 850 EVO 1TB, S2RFNX0H502874J, EMT02B6Q, max UDMA/133
Jul 21 21:46:10 PCUS kernel: ata8.00: 1953525168 sectors, multi 1: LBA48 NCQ (depth 32), AA
Jul 21 21:46:10 PCUS kernel: ata8.00: supports DRM functions and may not be fully accessible
Jul 21 21:46:10 PCUS kernel: ata8.00: disabling queued TRIM support
Jul 21 21:46:10 PCUS kernel: ata8.00: configured for UDMA/133
Jul 21 21:46:10 PCUS kernel: sd 8:0:0:0: [sdi] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
Jul 21 21:46:10 PCUS kernel: sd 8:0:0:0: [sdi] Write Protect is off
Jul 21 21:46:10 PCUS kernel: sd 8:0:0:0: [sdi] Mode Sense: 00 3a 00 00
Jul 21 21:46:10 PCUS kernel: sd 8:0:0:0: [sdi] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jul 21 21:46:10 PCUS kernel: sdi: sdi1
Jul 21 21:46:10 PCUS kernel: sd 8:0:0:0: [sdi] Attached SCSI removable disk
Jul 21 21:46:10 PCUS kernel: BTRFS: device fsid ea8b613a-3208-4cd9-a512-80eb1fb736c5 devid 1 transid 14672899 /dev/sdi1
Jul 21 21:46:53 PCUS emhttpd: Samsung_SSD_850_EVO_1TB_S2RFNX0H502874J (sdi) 512 1953525168
Jul 21 21:46:53 PCUS emhttpd: import 30 cache device: (sdi) Samsung_SSD_850_EVO_1TB_S2RFNX0H502874J
Jul 21 21:47:04 PCUS kernel: BTRFS info (device sdi1): disk space caching is enabled
Jul 21 21:47:04 PCUS kernel: BTRFS info (device sdi1): has skinny extents
Jul 21 21:47:04 PCUS kernel: BTRFS info (device sdi1): enabling ssd optimizations
Jul 21 21:47:04 PCUS kernel: BTRFS info (device sdi1): start tree-log replay
Jul 21 21:47:04 PCUS kernel: BTRFS warning (device sdi1): block group 3243245568 has wrong amount of free space
Jul 21 21:47:04 PCUS kernel: BTRFS warning (device sdi1): failed to load free space cache for block group 3243245568, rebuilding it now
Jul 21 21:47:04 PCUS kernel: BTRFS info (device sdi1): checking UUID tree
Jul 21 21:47:04 PCUS kernel: BTRFS info (device sdi1): resizing devid 1
Jul 21 21:47:04 PCUS kernel: BTRFS info (device sdi1): new size for /dev/sdi1 is 1000204853248
Jul 21 21:47:04 PCUS kernel: BTRFS info (device sdi1): resizing devid 2
Jul 21 21:47:04 PCUS kernel: BTRFS info (device sdi1): new size for /dev/sdh1 is 1000204853248
Jul 21 21:47:05 PCUS s3_sleep: included disks=sdd sde sdf sdg sdh sdi sdj sdk
Jul 21 21:48:48 PCUS kernel: BTRFS critical (device sdi1): corrupt leaf: root=2 block=3032002707456 slot=167, unexpected item end, have 929783252 expect 7432
Jul 21 21:48:48 PCUS kernel: BTRFS: error (device sdi1) in __btrfs_free_extent:6805: errno=-5 IO failure
Jul 21 21:48:48 PCUS kernel: BTRFS critical (device sdi1): corrupt leaf: root=2 block=3032002707456 slot=167, unexpected item end, have 929783252 expect 7432
Jul 21 21:48:48 PCUS kernel: BTRFS info (device sdi1): forced readonly
Jul 21 21:48:48 PCUS kernel: BTRFS: error (device sdi1) in btrfs_run_delayed_refs:2935: errno=-5 IO failure
Jul 21 21:48:48 PCUS kernel: BTRFS: error (device sdi1) in __btrfs_free_extent:6805: errno=-5 IO failure
Jul 21 21:48:48 PCUS kernel: BTRFS: error (device sdi1) in btrfs_run_delayed_refs:2935: errno=-5 IO failure
Jul 21 21:48:48 PCUS kernel: BTRFS error (device sdi1): pending csums is 27144192

And the syslog shows this almost everywhere, highlighted in red:

Quote
Jul 21 22:05:54 PCUS kernel: print_req_error: I/O error, dev loop2, sector 1348832
Jul 21 22:05:54 PCUS kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 347, rd 0, flush 0, corrupt 0, gen 0
Jul 21 22:06:25 PCUS kernel: loop: Write error at byte offset 686866432, length 4096.
Jul 21 22:06:25 PCUS kernel: print_req_error: I/O error, dev loop2, sector 1341536
Jul 21 22:06:25 PCUS kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 348, rd 0, flush 0, corrupt 0, gen 0
Jul 21 22:06:25 PCUS kernel: loop: Write error at byte offset 690601984, length 4096.
Jul 21 22:06:25 PCUS kernel: print_req_error: I/O error, dev loop2, sector 1348832
Jul 21 22:06:25 PCUS kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 349, rd 0, flush 0, corrupt 0, gen 0
Jul 21 22:06:56 PCUS kernel: loop: Write error at byte offset 686866432, length 4096.
Jul 21 22:06:56 PCUS kernel: print_req_error: I/O error, dev loop2, sector 1341536
Jul 21 22:06:56 PCUS kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 350, rd 0, flush 0, corrupt 0, gen 0
Jul 21 22:06:56 PCUS kernel: loop: Write error at byte offset 690601984, length 4096.
Jul 21 22:06:56 PCUS kernel: print_req_error: I/O error, dev loop2, sector 1348832
Jul 21 22:06:56 PCUS kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 351, rd 0, flush 0, corrupt 0, gen 0
Jul 21 22:07:26 PCUS kernel: loop: Write error at byte offset 686866432, length 4096.
Jul 21 22:07:26 PCUS kernel: print_req_error: I/O error, dev loop2, sector 1341536
Jul 21 22:07:26 PCUS kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 352, rd 0, flush 0, corrupt 0, gen 0
Jul 21 22:07:26 PCUS kernel: loop: Write error at byte offset 690601984, length 4096.
Jul 21 22:07:26 PCUS kernel: print_req_error: I/O error, dev loop2, sector 1348832
Jul 21 22:07:26 PCUS kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 353, rd 0, flush 0, corrupt 0, gen 0
Jul 21 22:07:57 PCUS kernel: loop: Write error at byte offset 686866432, length 4096.
Jul 21 22:07:57 PCUS kernel: print_req_error: I/O error, dev loop2, sector 1341536
... etc.

I noticed Nextcloud would not respond and found the MariaDB Docker container missing from the Docker tab in the WebGUI. I did notice 2 CRC error counts, which may be related to an earlier bad cable connection, but the connectors seem fine now. I might replace the cable if it turns out to be the cause of this. Could the SSD in the cache pool be faulty? The errors seem to show up when I re-add the disappeared MariaDB Docker from the "Previously Installed" list in Community Applications.

I already had to replace a hard drive (a spinning one) that gave so many errors it was disabled and emulated. I should RMA it, as it is still within its warranty period. I also added two disks of the same capacity to replace smaller ones. Could some corruption have moved from the array to the cache disks, or are those isolated from each other?

I really need some system stability, as it currently seems I have to reboot the system every other day. Can someone help?

pcus-diagnostics-20200721-2201.zip
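For reference, a read-only way to gauge the damage before trying --repair again (just a sketch; it assumes the pool still mounts at /mnt/cache and uses the device name from the log above, so double-check before running anything):

# per-device btrfs error counters (write/read/flush/corruption/generation)
btrfs device stats /mnt/cache

# read-only consistency check; needs the pool unmounted, e.g. array in maintenance mode
# (unlike --repair, this does not write anything to the disk)
btrfs check --readonly /dev/sdi1

# if the pool still mounts, a scrub verifies data and metadata checksums
btrfs scrub start /mnt/cache
btrfs scrub status /mnt/cache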
Ymetro Posted July 21, 2020 (Author)

I am not sure the topic title matches the content; I am not very good at that. I am open to suggestions.
JorgeB Posted July 22, 2020

Best bet is to back up the cache data and re-format the pool.
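For anyone following along, the rough shape of that procedure on Unraid might look like this (a sketch only, not exact instructions; the device names are the ones from the log above, and the wipefs step assumes the data has already been copied off, see the rsync sketch a bit further down):

# 1. copy everything off the pool first
# 2. stop the array, then clear the old filesystem signatures on both pool members
wipefs -a /dev/sdi1
wipefs -a /dev/sdh1
# 3. start the array again; Unraid will show the pool as unmountable and offer to
#    format it from the GUI, recreating the btrfs raid1 pool; then restore the data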
Ymetro Posted July 25, 2020 (Author, edited)

Thanks for the reply. I stopped all the VMs and Docker containers and also ran the mover, although the latter did not seem to do much. I then copied the contents of /mnt/cache/ to my backup array share /mnt/user0/backup/cache backup/cache/ in Midnight Commander. There were some file errors. I will try again once I put the array in maintenance mode.

Before I get this finished and reformat the SSDs: is BTRFS still the most reliable option for an SSD cache pool nowadays?

Edited July 25, 2020 by Ymetro (added MC and mover info)
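In case it helps, rsync tends to cope better with a half-readable source than a plain copy, since it keeps going past files it cannot read and reports them at the end. A minimal sketch (the destination is only an example path on the backup share; adjust to the actual one):

# copy the cache contents to the array, logging any files that fail to read
rsync -avh --progress /mnt/cache/ /mnt/user0/backup/cache/ 2>/tmp/cache-copy-errors.log

# review what could not be copied from the failing pool
cat /tmp/cache-copy-errors.log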
Ymetro Posted July 25, 2020 (Author)

Maintenance mode didn't help for the copy, because the disks aren't mounted in that mode. I just restarted the server with Docker and Virtual Machines disabled and started another run of copying files with MC to the backup folder.
JorgeB Posted July 25, 2020

1 hour ago, Ymetro said:
Is BTRFS still the most reliable option for an SSD cache pool nowadays?

It's currently the only option for a multi-device pool; for a single cache device you can also use XFS. There are some btrfs recovery options here.
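The recovery options linked there generally come down to something along these lines (a sketch, not the exact contents of that link; the mount point and restore destination are made-up examples):

# try mounting read-only from an older tree root to copy data off a damaged pool
mkdir -p /x
mount -o ro,usebackuproot /dev/sdi1 /x

# if it will not mount at all, btrfs restore can pull files out without mounting
btrfs restore -v /dev/sdi1 /mnt/user0/backup/restore/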
Ymetro Posted August 24, 2020 (Author)

On 7/25/2020 at 12:53 PM, johnnie.black said:
It's currently the only option for a multi-device pool; for a single cache device you can also use XFS. There are some btrfs recovery options here.

Thank you for the info, I appreciate it. I even got the 10GbE MTU set to 9000 on both the server and the main PC NIC, and transfer speeds seem faster. The cache pool has been reformatted, the backup has been restored to it, and it seems to run fine.

Now I am a bit worried about my parity disk, as its Reported Uncorrect count went up to 17. It seems I either got a bad batch of two Seagate IronWolf 14TB (non-Pro) drives, and/or my PSU is not up to powering them; the latter is something I read elsewhere on this forum. I do wonder, though: two of the three non-Pro IronWolfs came from Amazon. I hope they are kind to HDDs... Maybe I should put this in a new topic?
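A quick way to keep an eye on that attribute (a sketch; /dev/sdX stands for whichever device the parity disk currently is):

# full SMART attribute table, including Reported_Uncorrect (attribute 187)
smartctl -A /dev/sdX

# overall SMART health verdict
smartctl -H /dev/sdX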