Errors in Cron Jobs - FSTrim - and CRC Errors


Hello,

 

I have recently had some CRC errors pop back up, and I am beginning to think it's drive or motherboard controller related. I have replaced all the SATA cables (the only drives showing errors are connected by SATA to the motherboard, not to my expander cards), the enclosure (5.25" to 6x2.5" hot-swap bays), and the trays. I will be ordering an 8-port PCI card and trying it soon. I wanted to check some things here, as I am getting errors I do not fully understand.

 

fstrim: /var/lib/docker: FITRIM ioctl failed: Input/output error

Jun 18 10:33:30 Arcanine kernel: print_req_error: I/O error, dev loop2, sector 21048920
Jun 18 10:33:30 Arcanine kernel: BTRFS warning (device loop2): failed to trim 30 block group(s), last error -5
Jun 18 10:33:30 Arcanine kernel: BTRFS warning (device loop2): failed to trim 1 device(s), last error -5
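
For anyone reading along: the "last error -5" in the kernel lines and the "Input/output error" from fstrim are the same errno. A minimal Python sketch that decodes it (the function name `decode_last_error` is just made up for illustration, and the sample line is copied from the log above with the timestamp stripped):

```python
import errno
import os
import re

# Sample taken from the syslog excerpt above (timestamp and host removed).
LINE = "BTRFS warning (device loop2): failed to trim 30 block group(s), last error -5"


def decode_last_error(line):
    """Return (errno name, message) for a 'last error -N' kernel line, or None."""
    match = re.search(r"last error -(\d+)", line)
    if match is None:
        return None
    code = int(match.group(1))
    # errno.errorcode maps the numeric code to its symbolic name, e.g. 5 -> 'EIO'.
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


print(decode_last_error(LINE))
```

So the trim did not fail because of a BTRFS-specific problem as such; the device underneath returned a generic I/O error.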

 

I have my array started, but all Dockers and VMs are off at the moment. I think I may have some corruption going on.

 

I am also still getting other cron job errors I can't figure out. I have uninstalled and reinstalled TinC, and the error persists with or without TinC:

error: stat of /var/log/tinc.* failed: No such file or directory

Do I just need to make this directory?

 

An alternative to moving the 6x2.5" SSDs to a PCI controller is moving the cache to 2x8TB drives I have available in the existing 42-bay enclosure.

 

Any ideas?

 

Diags attached.

arcanine-diagnostics-20200618-1041.zip

Link to comment

Cache device dropped offline:

 

Jun 17 14:43:14 Arcanine kernel: ata1.00: failed to set xfermode (err_mask=0x40)
Jun 17 14:43:14 Arcanine kernel: ata1.00: disabled

 

This resulted in the following errors, both on the cache filesystem and on the docker image, since the image was stored there.

 

Jun 17 14:43:14 Arcanine kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 3, flush 0, corrupt 0, gen 0
Jun 17 14:43:14 Arcanine kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 4, flush 0, corrupt 0, gen 0
Jun 17 14:43:14 Arcanine kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 5, flush 0, corrupt 0, gen 0
Jun 17 14:43:14 Arcanine kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 6, flush 0, corrupt 0, gen 0
Jun 17 14:43:14 Arcanine kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 7, flush 0, corrupt 0, gen 0
Jun 17 14:43:14 Arcanine kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 8, flush 0, corrupt 0, gen 0
Jun 17 14:43:14 Arcanine kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 9, flush 0, corrupt 0, gen 0
Jun 17 14:43:14 Arcanine kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 10, flush 0, corrupt 0, gen 0
Jun 17 14:43:17 Arcanine kernel: BTRFS info (device loop2): no csum found for inode 56871 start 5624741888
Jun 17 14:43:18 Arcanine kernel: XFS (sdb1): writeback error on sector 244714144
Jun 17 14:43:18 Arcanine kernel: XFS (sdb1): writeback error on sector 432260600
Jun 17 14:43:18 Arcanine kernel: XFS (sdb1): writeback error on sector 433854440
Jun 17 14:43:18 Arcanine kernel: XFS (sdb1): writeback error on sector 123460544
Jun 17 14:43:18 Arcanine kernel: XFS (sdb1): writeback error on sector 32063328
Jun 17 14:43:18 Arcanine kernel: XFS (sdb1): writeback error on sector 32065344
Jun 17 14:43:18 Arcanine kernel: XFS (sdb1): writeback error on sector 34159616
Jun 17 14:43:18 Arcanine kernel: XFS (sdb1): writeback error on sector 34159712
Jun 17 14:43:18 Arcanine kernel: XFS (sdb1): writeback error on sector 34159904
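
Since the BTRFS "errs:" lines repeat with a climbing read counter, it can help to pull out just the latest counters per device. A rough Python sketch, assuming the syslog lines keep exactly the format shown above (the names `ERRS_RE` and `latest_counters` are illustrative, not from any tool):

```python
import re

# Matches the per-device error counters BTRFS prints in the lines above.
ERRS_RE = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def latest_counters(lines):
    """Return {device: {counter: value}} using the last matching line per device."""
    latest = {}
    for line in lines:
        match = ERRS_RE.search(line)
        if match:
            fields = match.groupdict()
            dev = fields.pop("dev")
            latest[dev] = {name: int(value) for name, value in fields.items()}
    return latest


SAMPLE = [
    "Jun 17 14:43:14 Arcanine kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 3, flush 0, corrupt 0, gen 0",
    "Jun 17 14:43:14 Arcanine kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 10, flush 0, corrupt 0, gen 0",
    "Jun 17 14:43:18 Arcanine kernel: XFS (sdb1): writeback error on sector 244714144",
]

print(latest_counters(SAMPLE))
```

Only read errors are counted in this excerpt (wr, flush, corrupt and gen all stay at 0), which fits a device that stopped responding rather than one returning bad data.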

 

Link to comment
6 minutes ago, fmp4m said:

This seems to happen every time my cron calls for fstrim (hourly) or mover. 

The ATA errors are constant and suggest a SATA cable problem:

 

Jun 17 13:02:18 Arcanine kernel: ata1: SError: { UnrecovData BadCRC Handshk }
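
To see which port the CRC errors are coming from, one option is to count the SError lines that carry the BadCRC flag per ATA port. This is only a sketch that assumes the line format shown above; `SERROR_RE` and `bad_crc_by_port` are made-up names:

```python
import re
from collections import Counter

# Captures the port (ata1, ata2, ...) and the flag list inside the braces.
SERROR_RE = re.compile(r"(ata\d+)(?:\.\d+)?: SError: \{([^}]*)\}")


def bad_crc_by_port(lines):
    """Count SError lines that include the BadCRC flag, keyed by ATA port."""
    counts = Counter()
    for line in lines:
        match = SERROR_RE.search(line)
        if match and "BadCRC" in match.group(2).split():
            counts[match.group(1)] += 1
    return counts


SAMPLE = [
    "Jun 17 13:02:18 Arcanine kernel: ata1: SError: { UnrecovData BadCRC Handshk }",
]

print(bad_crc_by_port(SAMPLE))
```

Feeding it the full syslog from the diagnostics zip would show whether anything other than ata1 is affected.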

 

And it's not your cache like I assumed, it's an unassigned device:

 

Jun 17 13:01:18 Arcanine kernel: ata1.00: ATA-11: SanDisk SDSSDH3250G, 181085804720, X61110RL, max UDMA/133

 

 

 

Link to comment

Hi Johnnie,  

 

That is an unassigned drive. I'm wondering if it's the drive itself. I use that drive for SQL databases and imgs.

It is a Marvell-based SSD. I know previous versions had issues trimming Marvell-based drives, but I thought that was resolved. The SATA cable, housing, and backplane are all new. The only remaining suspects are the drive and the SATA port itself on the motherboard.

 

Jun 18 11:23:35 Arcanine unassigned.devices: Mount of '/dev/sdo1' failed. Error message: mount: /mnt/disks/250GB_BAY2: wrong fs type, bad option, bad superblock on /dev/sdo1, missing codepage or helper program, or other error.

Link to comment

I will try new cables this evening (good thing I ordered extras). I will pull one off another drive that is not erroring and put that drive on this port with a new cable.

 

As for the BTRFS errors: the only BTRFS filesystem I can recall in my system is on the cache drives, so something is occurring with those as well as with the unassigned drive.

 

Link to comment

Hi Johnnie,

 

Thanks for confirming that. It is odd that all my SSDs on that controller are trimming except 512SSD-TOP, which I believe is a different model of SSD.

 

I think there is corruption on my Cache pool after all of this.   

 

Jun 19 12:47:35 Arcanine root: mount: /var/lib/docker: mount(2) system call failed: File exists.
Jun 19 12:47:35 Arcanine root: mount error
Jun 19 12:47:35 Arcanine emhttpd: shcmd (478): exit status: 1
Jun 19 12:47:35 Arcanine kernel: BTRFS warning (device loop2): duplicate device fsid:devid for 5a56f8e9-9eec-4ee0-9bb4-9d88a7c04293:1 old:/dev/loop2 new:/dev/loop3
Jun 19 12:47:35 Arcanine kernel: BTRFS warning (device loop2): duplicate device fsid:devid for 5a56f8e9-9eec-4ee0-9bb4-9d88a7c04293:1 old:/dev/loop2 new:/dev/loop3
Jun 19 13:00:01 Arcanine speedtest: Internet bandwidth test started
Jun 19 13:00:01 Arcanine speedtest: Host:
Jun 19 13:00:01 Arcanine speedtest:
Jun 19 13:00:01 Arcanine speedtest: Internet bandwidth test completed
Jun 19 13:08:45 Arcanine kernel: BTRFS warning (device sdah1): csum failed root 5 ino 22976309 off 48177983488 csum 0x98f94189 expected csum 0x3fe1c9c2 mirror 1
Jun 19 13:08:45 Arcanine kernel: BTRFS warning (device sdah1): csum failed root 5 ino 22976309 off 48177983488 csum 0x98f94189 expected csum 0x3fe1c9c2 mirror 1
Jun 19 13:08:45 Arcanine kernel: BTRFS warning (device sdah1): csum failed root 5 ino 22976309 off 48177983488 csum 0x98f94189 expected csum 0x3fe1c9c2 mirror 1
Jun 19 13:08:50 Arcanine kernel: BTRFS warning (device sdah1): csum failed root 5 ino 22976366 off 6451195904 csum 0x98f94189 expected csum 0xce9bfe79 mirror 1
Jun 19 13:08:50 Arcanine kernel: BTRFS warning (device sdah1): csum failed root 5 ino 22976366 off 6451195904 csum 0x98f94189 expected csum 0xce9bfe79 mirror 1
Jun 19 13:08:50 Arcanine kernel: BTRFS warning (device sdah1): csum failed root 5 ino 22976366 off 6451195904 csum 0x98f94189 expected csum 0xce9bfe79 mirror 1
Jun 19 13:08:50 Arcanine kernel: BTRFS warning (device sdah1): csum failed root 5 ino 22976366 off 6451195904 csum 0x98f94189 expected csum 0xce9bfe79 mirror 1
Jun 19 13:09:36 Arcanine kernel: BTRFS warning (device sdah1): csum failed root 5 ino 22976083 off 24481468416 csum 0x98f94189 expected csum 0xa7fe654f mirror 1
Jun 19 13:09:36 Arcanine kernel: BTRFS warning (device sdah1): csum failed root 5 ino 22976083 off 24481468416 csum 0x98f94189 expected csum 0xa7fe654f mirror 1
Jun 19 13:09:36 Arcanine kernel: BTRFS warning (device sdah1): csum failed root 5 ino 22976083 off 24481468416 csum 0x98f94189 expected csum 0xa7fe654f mirror 1
Jun 19 13:09:38 Arcanine crond[2763]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
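
The csum warnings repeat a lot, so grouping them by inode makes it clearer how many files are actually involved. A small Python sketch under the assumption that the lines look exactly like the ones above (`CSUM_RE` and `summarize` are illustrative names):

```python
import re
from collections import defaultdict

# Matches the BTRFS checksum-failure warnings shown in the log above.
CSUM_RE = re.compile(
    r"BTRFS warning \(device (?P<dev>\w+)\): csum failed root (?P<root>\d+) "
    r"ino (?P<ino>\d+) off (?P<off>\d+) csum (?P<got>0x[0-9a-f]+) "
    r"expected csum (?P<want>0x[0-9a-f]+)"
)


def summarize(lines):
    """Return ({(device, inode): set of offsets}, set of computed checksums)."""
    by_inode = defaultdict(set)
    computed = set()
    for line in lines:
        match = CSUM_RE.search(line)
        if match:
            key = (match.group("dev"), int(match.group("ino")))
            by_inode[key].add(int(match.group("off")))
            computed.add(match.group("got"))
    return dict(by_inode), computed


SAMPLE = [
    "Jun 19 13:08:45 Arcanine kernel: BTRFS warning (device sdah1): csum failed root 5 ino 22976309 off 48177983488 csum 0x98f94189 expected csum 0x3fe1c9c2 mirror 1",
    "Jun 19 13:08:50 Arcanine kernel: BTRFS warning (device sdah1): csum failed root 5 ino 22976366 off 6451195904 csum 0x98f94189 expected csum 0xce9bfe79 mirror 1",
    "Jun 19 13:09:36 Arcanine kernel: BTRFS warning (device sdah1): csum failed root 5 ino 22976083 off 24481468416 csum 0x98f94189 expected csum 0xa7fe654f mirror 1",
]

inodes, computed = summarize(SAMPLE)
print(sorted(inodes))
print(computed)
```

In this excerpt that comes out to three inodes on sdah1, and every failing block has the same computed checksum (0x98f94189) while the expected values differ. To turn an inode number into a file name, `btrfs inspect-internal inode-resolve <ino> <path>` should work, where the path is wherever that pool is mounted (I am assuming the cache mount point here, adjust as needed).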

 

Link to comment

Well, I'm in a world of hurt now. At 3am something wrote to my docker.img until it was full, corrupting it, so I have to rebuild all my Dockers. (Thank you, CA Previous Apps, for helping make this easier!)

 

I am now also getting these in my logs:

 

Jun 20 14:15:46 Arcanine root: error: /update.php: missing csrf_token
Jun 20 14:15:46 Arcanine root: error: /update.php: missing csrf_token

 

 

**** Docker image file is getting full (currently 100 % used) ****   **** Unable to write to Docker Image ****

 

Jun 20 14:16:26 Arcanine emhttpd: shcmd (259): /usr/local/sbin/mount_image '/mnt/user/docker/docker.img' /var/lib/docker 600
Jun 20 14:16:26 Arcanine root: /mnt/user/docker/docker.img is in-use, cannot mount

 

 

Link to comment
7 minutes ago, fmp4m said:

Jun 20 14:15:46 Arcanine root: error: /update.php: missing csrf_token
Jun 20 14:15:46 Arcanine root: error: /update.php: missing csrf_token

 

Why do you have a 100G docker image anyway? Very rarely would anyone need more than 20G. Any time a user has a docker image larger than 20G, I suspect they have one or more of their applications misconfigured.

 

An application will write into the docker image if it writes to a path that isn't mapped to the host. Common mistakes are application paths that don't match the mapped container path in upper/lower case, or application paths that are relative (what are they relative to?)
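
To make those two mistakes concrete, here is a minimal Python sketch that checks whether a path configured inside an application would fall under one of the container paths you have mapped. The function name `lands_in_image` and the example paths are hypothetical, and treating every relative path as unmapped is a deliberately cautious assumption, since what it resolves against depends on the container's working directory:

```python
from pathlib import PurePosixPath


def lands_in_image(app_path, container_mounts):
    """Return True if app_path is not covered by any mapped container path.

    Comparison is case-sensitive, as paths are inside a Linux container.
    """
    path = PurePosixPath(app_path)
    if not path.is_absolute():
        # Assumption: a relative path is treated as suspect because it is
        # resolved against the container's working directory, which may
        # not be mapped to the host.
        return True
    for mount in container_mounts:
        mount_path = PurePosixPath(mount)
        if path == mount_path or mount_path in path.parents:
            return False
    return True


mounts = ["/config", "/downloads"]  # hypothetical container-side mappings
print(lands_in_image("/config/logs", mounts))     # mapped to the host
print(lands_in_image("/Config/logs", mounts))     # case mismatch
print(lands_in_image("downloads/movie", mounts))  # relative path
```

Anything this flags would be written inside docker.img instead of appdata or a user share, which is how an image slowly (or quickly) fills up.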

 

 

Link to comment
2 hours ago, trurl said:

Very rarely would anyone need more than 20G

For example, I am running 16 dockers, and they only use 39% of my 20G docker image. Docker image basically should only contain the executable code of your dockers, and everything else should be in appdata or in other user shares.

Link to comment

I had set it to 100G back when I had one docker logging verbosely and incorrectly writing into the docker image. Once that was fixed, I left the image at 100G and never downsized it, since I had the extra space. Nothing was supposed to be writing to the image itself, only to /mnt/user and /appdata, so I am unsure what was writing incorrectly. Unless I missed something a while back, I don't recall anything writing to the docker image at all in over a year.

 

 

I will dig into the CSRF error later after the rest is settled.

 

I am now also getting segfaults :(

 

 

Jun 20 17:23:31 Arcanine kernel: vnstati[15525]: segfault at 20 ip 0000000000407f7a sp 00007ffddc1149d0 error 4 in vnstati[400000+16000]

Link to comment
