DieFalse Posted June 18, 2020

Hello,

I have recently had some CRC errors pop back up, and I am beginning to think it's drive or motherboard controller related. I have replaced all the SATA cables (the only drives doing it are the ones on SATA to the motherboard, not the ones on my expander cards), the enclosure (5.25" to 6x2.5" hot-swap bays), and the trays. I will be ordering an 8-port PCI card and trying it soon.

I wanted to check some things here, as I am getting errors I do not fully understand.

fstrim: /var/lib/docker: FITRIM ioctl failed: Input/output error
Jun 18 10:33:30 Arcanine kernel: print_req_error: I/O error, dev loop2, sector 21048920
Jun 18 10:33:30 Arcanine kernel: BTRFS warning (device loop2): failed to trim 30 block group(s), last error -5
Jun 18 10:33:30 Arcanine kernel: BTRFS warning (device loop2): failed to trim 1 device(s), last error -5

I have my array started, but all dockers and VMs are off at the moment. I think I may have some corruption going on.

I am also still getting other cron job errors I can't figure out. I have uninstalled and reinstalled TinC, and the error persists with or without TinC:

error: stat of /var/log/tinc.* failed: No such file or directory

Do I just need to make this directory?

An alternative to moving the 6x2.5" SSDs to a PCI controller is moving cache to 2x8TB available drives I have on the existing 42-bay enclosure.

Any ideas? Diags attached.

arcanine-diagnostics-20200618-1041.zip
JorgeB Posted June 18, 2020

Cache device dropped offline:

Jun 17 14:43:14 Arcanine kernel: ata1.00: failed to set xfermode (err_mask=0x40)
Jun 17 14:43:14 Arcanine kernel: ata1.00: disabled

That resulted in the following errors, both on the cache filesystem and on the docker image, since the image was stored there.

Jun 17 14:43:14 Arcanine kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 3, flush 0, corrupt 0, gen 0
Jun 17 14:43:14 Arcanine kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 4, flush 0, corrupt 0, gen 0
Jun 17 14:43:14 Arcanine kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 5, flush 0, corrupt 0, gen 0
Jun 17 14:43:14 Arcanine kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 6, flush 0, corrupt 0, gen 0
Jun 17 14:43:14 Arcanine kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 7, flush 0, corrupt 0, gen 0
Jun 17 14:43:14 Arcanine kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 8, flush 0, corrupt 0, gen 0
Jun 17 14:43:14 Arcanine kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 9, flush 0, corrupt 0, gen 0
Jun 17 14:43:14 Arcanine kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 10, flush 0, corrupt 0, gen 0
Jun 17 14:43:17 Arcanine kernel: BTRFS info (device loop2): no csum found for inode 56871 start 5624741888
Jun 17 14:43:18 Arcanine kernel: XFS (sdb1): writeback error on sector 244714144
Jun 17 14:43:18 Arcanine kernel: XFS (sdb1): writeback error on sector 432260600
Jun 17 14:43:18 Arcanine kernel: XFS (sdb1): writeback error on sector 433854440
Jun 17 14:43:18 Arcanine kernel: XFS (sdb1): writeback error on sector 123460544
Jun 17 14:43:18 Arcanine kernel: XFS (sdb1): writeback error on sector 32063328
Jun 17 14:43:18 Arcanine kernel: XFS (sdb1): writeback error on sector 32065344
Jun 17 14:43:18 Arcanine kernel: XFS (sdb1): writeback error on sector 34159616
Jun 17 14:43:18 Arcanine kernel: XFS (sdb1): writeback error on sector 34159712
Jun 17 14:43:18 Arcanine kernel: XFS (sdb1): writeback error on sector 34159904
DieFalse (Author) Posted June 18, 2020

Hi Johnnie.

This seems to happen every time my cron calls for fstrim (hourly) or mover. This has been occurring for about 7 days now. I have replaced all the components listed above while trying to track down the root cause.
JorgeB Posted June 18, 2020

6 minutes ago, fmp4m said: This seems to happen every time my cron calls for fstrim (hourly) or mover.

The ATA errors are constant and suggest a SATA cable problem:

Jun 17 13:02:18 Arcanine kernel: ata1: SError: { UnrecovData BadCRC Handshk }

And it's not your cache as I assumed; it's an unassigned device:

Jun 17 13:01:18 Arcanine kernel: ata1.00: ATA-11: SanDisk SDSSDH3250G, 181085804720, X61110RL, max UDMA/133
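To see how constant the ATA errors are and which port they come from, the syslog can be tallied by port and SError flag. The following is only a minimal sketch: the function name `count_serror_flags` and the regular expression are illustrative assumptions based on the single log line quoted above, not part of any Unraid tool.

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpt in this thread:
#   "... kernel: ata1: SError: { UnrecovData BadCRC Handshk }"
SERROR_RE = re.compile(r"kernel: (ata\d+): SError: \{ ([^}]*)\}")


def count_serror_flags(lines):
    """Count SError flags per ATA port, keyed by (port, flag)."""
    counts = Counter()
    for line in lines:
        match = SERROR_RE.search(line)
        if match is None:
            continue  # not an SError line
        port = match.group(1)
        for flag in match.group(2).split():
            counts[(port, flag)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Jun 17 13:02:18 Arcanine kernel: ata1: SError: { UnrecovData BadCRC Handshk }",
        "Jun 17 14:43:14 Arcanine kernel: ata1.00: disabled",
    ]
    for (port, flag), n in sorted(count_serror_flags(sample).items()):
        print(port, flag, n)
```

A high BadCRC count on a single port points at the link (cable, backplane, or port) for that one device, which is why swapping the cable and the port is the usual first step.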
DieFalse (Author) Posted June 18, 2020 (edited)

Hi Johnnie,

That is an unassigned drive. I'm wondering if it's the drive itself. I use that drive for SQL databases and imgs. It is a Marvell-based SSD. I know previous versions had issues trimming Marvell-based drives, but I thought that was resolved. The SATA cable, housing, and backplane are all new. The only things remaining are the drive and the SATA port itself on the motherboard.

Jun 18 11:23:35 Arcanine unassigned.devices: Mount of '/dev/sdo1' failed. Error message: mount: /mnt/disks/250GB_BAY2: wrong fs type, bad option, bad superblock on /dev/sdo1, missing codepage or helper program, or other error.

Edited June 18, 2020 by fmp4m
JorgeB Posted June 18, 2020

4 minutes ago, fmp4m said: Im wondering if its the drive itself.

Not impossible, but unlikely. I would try a new SATA cable first, even if it was already replaced once. Also try a different SATA port; swap with another device if needed.
DieFalse (Author) Posted June 18, 2020

I will try new cables this evening (good thing I ordered extras). I will pull a drive that is not erroring off another port and put it on this port with a new cable.

As for the BTRFS, the only btrfs that I can recall in my system is the cache drives, so something is occurring with those as well as with the unassigned drive.
JorgeB Posted June 18, 2020

15 minutes ago, fmp4m said: As for the BTRFS,

Docker image is always btrfs, even if it's on an XFS device.
DieFalse (Author) Posted June 18, 2020 (edited)

I wanted to rule out everything, so I swapped to a PCI controller with all new cables and shifted the drives in the chassis. I am still getting errors, and have gained a new one. I guess my PCI controller doesn't handle trim fully.

fstrim: /mnt/disks/512SSD-TOP: the discard operation is not supported

arcanine-diagnostics-20200618-1448.zip

Edited June 18, 2020 by fmp4m
JorgeB Posted June 19, 2020

10 hours ago, fmp4m said: Guess my PCI controller doesnt handle trim fully.

LSI SAS3 HBAs, like the SAS3008 models you have, can trim if they are in IT mode, but only SSDs with deterministic trim support. SAS2 models currently can't trim any SSD.
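Whether the kernel will accept discard requests for a given device behind a controller can be checked before fstrim is run. The sketch below reads the standard Linux sysfs attribute `queue/discard_max_bytes`, where a value of 0 is generally taken to mean discard is unavailable. The function name `discard_supported` and the `sysfs_root` parameter are illustrative assumptions, and a missing attribute is treated as "not supported" as a conservative guess. It reports only what the kernel exposes; it does not tell you whether the SSD implements deterministic trim.

```python
from pathlib import Path


def discard_supported(device, sysfs_root="/sys/block"):
    """Return True if the kernel advertises discard (TRIM) for `device`.

    `device` is a whole-disk name such as "sdb", not a partition.
    """
    attr = Path(sysfs_root) / device / "queue" / "discard_max_bytes"
    try:
        return int(attr.read_text().strip()) > 0
    except (FileNotFoundError, ValueError):
        # Assumption: no readable attribute means no usable discard.
        return False


if __name__ == "__main__":
    root = Path("/sys/block")
    if root.is_dir():
        for entry in sorted(root.iterdir()):
            print(entry.name, discard_supported(entry.name))
```

A device that shows False here would be expected to produce "the discard operation is not supported" from fstrim, as in the post above.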
DieFalse (Author) Posted June 19, 2020 (edited)

Hi Johnnie,

Thanks for confirming that - it is odd that all my SSDs on that controller are trimming except 512SSD-TOP, which I believe is a different model of SSD.

I think there is corruption on my cache pool after all of this.

Jun 19 12:47:35 Arcanine root: mount: /var/lib/docker: mount(2) system call failed: File exists.
Jun 19 12:47:35 Arcanine root: mount error
Jun 19 12:47:35 Arcanine emhttpd: shcmd (478): exit status: 1
Jun 19 12:47:35 Arcanine kernel: BTRFS warning (device loop2): duplicate device fsid:devid for 5a56f8e9-9eec-4ee0-9bb4-9d88a7c04293:1 old:/dev/loop2 new:/dev/loop3
Jun 19 12:47:35 Arcanine kernel: BTRFS warning (device loop2): duplicate device fsid:devid for 5a56f8e9-9eec-4ee0-9bb4-9d88a7c04293:1 old:/dev/loop2 new:/dev/loop3
Jun 19 13:00:01 Arcanine speedtest: Internet bandwidth test started
Jun 19 13:00:01 Arcanine speedtest: Host:
Jun 19 13:00:01 Arcanine speedtest:
Jun 19 13:00:01 Arcanine speedtest: Internet bandwidth test completed
Jun 19 13:08:45 Arcanine kernel: BTRFS warning (device sdah1): csum failed root 5 ino 22976309 off 48177983488 csum 0x98f94189 expected csum 0x3fe1c9c2 mirror 1
Jun 19 13:08:45 Arcanine kernel: BTRFS warning (device sdah1): csum failed root 5 ino 22976309 off 48177983488 csum 0x98f94189 expected csum 0x3fe1c9c2 mirror 1
Jun 19 13:08:45 Arcanine kernel: BTRFS warning (device sdah1): csum failed root 5 ino 22976309 off 48177983488 csum 0x98f94189 expected csum 0x3fe1c9c2 mirror 1
Jun 19 13:08:50 Arcanine kernel: BTRFS warning (device sdah1): csum failed root 5 ino 22976366 off 6451195904 csum 0x98f94189 expected csum 0xce9bfe79 mirror 1
Jun 19 13:08:50 Arcanine kernel: BTRFS warning (device sdah1): csum failed root 5 ino 22976366 off 6451195904 csum 0x98f94189 expected csum 0xce9bfe79 mirror 1
Jun 19 13:08:50 Arcanine kernel: BTRFS warning (device sdah1): csum failed root 5 ino 22976366 off 6451195904 csum 0x98f94189 expected csum 0xce9bfe79 mirror 1
Jun 19 13:08:50 Arcanine kernel: BTRFS warning (device sdah1): csum failed root 5 ino 22976366 off 6451195904 csum 0x98f94189 expected csum 0xce9bfe79 mirror 1
Jun 19 13:09:36 Arcanine kernel: BTRFS warning (device sdah1): csum failed root 5 ino 22976083 off 24481468416 csum 0x98f94189 expected csum 0xa7fe654f mirror 1
Jun 19 13:09:36 Arcanine kernel: BTRFS warning (device sdah1): csum failed root 5 ino 22976083 off 24481468416 csum 0x98f94189 expected csum 0xa7fe654f mirror 1
Jun 19 13:09:36 Arcanine kernel: BTRFS warning (device sdah1): csum failed root 5 ino 22976083 off 24481468416 csum 0x98f94189 expected csum 0xa7fe654f mirror 1
Jun 19 13:09:38 Arcanine crond[2763]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null

Edited June 19, 2020 by fmp4m
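The csum warnings above repeat for only a handful of inodes, so grouping them shows how many distinct files are affected. This is a minimal sketch: the function name `csum_failures_by_inode` and the regular expression are illustrative assumptions modeled on the log lines in this post.

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpt in this thread:
#   "BTRFS warning (device sdah1): csum failed root 5 ino 22976309 off ..."
CSUM_RE = re.compile(
    r"BTRFS warning \(device ([^)]+)\): csum failed root (\d+) ino (\d+) off (\d+)"
)


def csum_failures_by_inode(lines):
    """Count csum failures keyed by (device, root, inode)."""
    counts = Counter()
    for line in lines:
        match = CSUM_RE.search(line)
        if match is None:
            continue  # not a csum failure line
        device, root, inode, _offset = match.groups()
        counts[(device, int(root), int(inode))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Jun 19 13:08:45 Arcanine kernel: BTRFS warning (device sdah1): csum failed "
        "root 5 ino 22976309 off 48177983488 csum 0x98f94189 expected csum 0x3fe1c9c2 mirror 1",
        "Jun 19 13:09:38 Arcanine crond[2763]: exit status 1 from user root "
        "/usr/local/sbin/mover &> /dev/null",
    ]
    for key, n in sorted(csum_failures_by_inode(sample).items()):
        print(key, n)
```

Once the inode numbers are known, `btrfs inspect-internal inode-resolve <ino> <mountpoint>` can usually map each one back to a file path, so the damaged files can be restored from backup or deleted.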
DieFalse (Author) Posted June 20, 2020

Well, I'm in a world of hurt now. At 3 am something started writing to my docker.img and kept going until it was full, corrupting it. So I have to rebuild all my dockers (thank you, CA Previous Apps, for helping make this easier!).

I am now also getting these in my logs:

Jun 20 14:15:46 Arcanine root: error: /update.php: missing csrf_token
Jun 20 14:15:46 Arcanine root: error: /update.php: missing csrf_token

**** Docker image file is getting full (currently 100 % used) ****
**** Unable to write to Docker Image ****

Jun 20 14:16:26 Arcanine emhttpd: shcmd (259): /usr/local/sbin/mount_image '/mnt/user/docker/docker.img' /var/lib/docker 600
Jun 20 14:16:26 Arcanine root: /mnt/user/docker/docker.img is in-use, cannot mount
trurl Posted June 20, 2020

7 minutes ago, fmp4m said:
Jun 20 14:15:46 Arcanine root: error: /update.php: missing csrf_token
Jun 20 14:15:46 Arcanine root: error: /update.php: missing csrf_token

Why do you have a 100G docker image anyway? Very rarely would anyone need more than 20G.

Any time a user has a docker image larger than 20G, I suspect they have one or more of their applications misconfigured. An application will write into the docker image if it writes to a path that isn't mapped to the host.

Common mistakes are application paths that don't match the mapped container path in upper/lower case, or application paths that are relative (what are they relative to?).
trurl Posted June 20, 2020

2 hours ago, trurl said: Very rarely would anyone need more than 20G

For example, I am running 16 dockers, and they only use 39% of my 20G docker image.

The docker image should basically contain only the executable code of your dockers; everything else should be in appdata or in other user shares.
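The two common mistakes described above (a case mismatch against the mapped container path, and a relative path) can be checked mechanically. The sketch below is illustrative only: `find_mapping`, `diagnose`, and the example paths are hypothetical names, and the list of container mount points is assumed to be copied by hand from the container's settings.

```python
import posixpath


def find_mapping(app_path, container_mounts):
    """Return the longest container mount point covering app_path, or None."""
    if not app_path.startswith("/"):
        return None  # relative paths are never covered by a mapping here
    app_path = posixpath.normpath(app_path)
    best = None
    for mount in container_mounts:
        mount = posixpath.normpath(mount)
        prefix = mount.rstrip("/") + "/"
        if app_path == mount or app_path.startswith(prefix):
            if best is None or len(mount) > len(best):
                best = mount
    return best


def diagnose(app_path, container_mounts):
    """Classify where a write to app_path would end up."""
    if not app_path.startswith("/"):
        # Relative to the process working directory inside the container.
        return "relative path: likely written inside the docker image"
    if find_mapping(app_path, container_mounts) is not None:
        return "mapped"
    lowered = [mount.lower() for mount in container_mounts]
    if find_mapping(app_path.lower(), lowered) is not None:
        return "case mismatch: written inside the docker image"
    return "unmapped: written inside the docker image"


if __name__ == "__main__":
    mounts = ["/config", "/downloads"]  # hypothetical container paths
    for path in ["/downloads/a.iso", "/Downloads/a.iso", "/data/a.iso", "downloads/a.iso"]:
        print(path, "->", diagnose(path, mounts))
```

Anything that does not come back as "mapped" is a candidate for whatever is filling the docker image.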
DieFalse (Author) Posted June 20, 2020 (edited)

I set the 100G back when I had one docker doing verbose logging and incorrectly writing to the docker image. Once that was fixed, I left the image at 100G and never downsized it. I had the extra space, so I left it that way.

Nothing was supposed to be writing anything to the image itself, only to /mnt/user and /appdata, so I am unsure what was writing incorrectly. Unless I missed something a while back, I don't recall anything writing to the docker image at all in over a year.

I will dig into the CSRF error later, after the rest is settled.

I am now also getting segfaults:

Jun 20 17:23:31 Arcanine kernel: vnstati[15525]: segfault at 20 ip 0000000000407f7a sp 00007ffddc1149d0 error 4 in vnstati[400000+16000]

Edited June 20, 2020 by fmp4m
DieFalse (Author) Posted June 20, 2020

I found the culprit that was writing to docker.img. Apparently an update to Deluge set downloads to /home/nobody/ instead of /mnt/downloads/.

Now to sort out the corruption on cache and the segfaults.