hermy65 Posted April 18
I had an SSD go bad in my cache pool last week, so I swapped in a new drive and I think I got everything fixed. Today I'm getting BTRFS errors and I'm not sure exactly what to do. Diagnostics attached: storage-diagnostics-20240418-1054.zip
JorgeB Posted April 18
The syslog rotated, so we cannot see the beginning of the problem. Reboot and post new diags after array start.
hermy65 Posted April 18
@JorgeB Here are fresh diagnostics after a reboot. One thing I noticed is that my entire cache pool now says unmountable, when it was working fine last night. Any ideas? storage-diagnostics-20240418-1501.zip
JorgeB Posted April 19
Type:
btrfs rescue zero-log /dev/sdc1
Then restart the array and post new diags.
hermy65 Posted April 19
@JorgeB I ran the command in the terminal and received this response:
Clearing log on /dev/sdc1, previous log_root 4906986635264, level 0
I stopped and then restarted the array, and it looks like my cache pool has come back online. Here are the new diagnostics: storage-diagnostics-20240419-0838.zip
JorgeB Posted April 19
Run a correcting scrub on the pool and post the results.
P.S. You should change the Docker network to ipvlan, since there are macvlan call traces.
hermy65 Posted April 19
@JorgeB Here are the results from the correcting scrub:
UUID: bbc56f07-1a5f-4d7b-b019-a515d7eb35aa
Scrub started: Fri Apr 19 08:48:42 2024
Status: finished
Duration: 0:39:21
Total to scrub: 1.26TiB
Rate: 563.20MiB/s
Error summary: csum=8
Corrected: 0
Uncorrectable: 8
Unverified: 0
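For anyone who wants to script the check, here is a minimal Python sketch (not part of the thread) that reads the counters from a scrub summary like the one posted above and flags uncorrectable errors. The function name `parse_scrub_summary` is hypothetical, and the parser assumes the field labels appear exactly as in the posted output.

```python
import re

def parse_scrub_summary(text: str) -> dict:
    """Pull the Corrected/Uncorrectable/Unverified counters out of a scrub summary."""
    counters = {}
    for key in ("Corrected", "Uncorrectable", "Unverified"):
        # Field labels are assumed to match the output quoted in the thread.
        match = re.search(rf"{key}:\s*(\d+)", text)
        if match:
            counters[key.lower()] = int(match.group(1))
    return counters

# Values copied from the scrub results posted in the thread.
report = """\
Error summary: csum=8
Corrected: 0
Uncorrectable: 8
Unverified: 0
"""

summary = parse_scrub_summary(report)
# Any uncorrectable error means at least one file has no good copy left.
needs_attention = summary.get("uncorrectable", 0) > 0
print(summary, needs_attention)
```

A nonzero uncorrectable count, as here, means the scrub could not repair the data from either device, which is why the affected files have to be identified next.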
JorgeB Posted April 19
Look in the syslog for a list of corrupt file(s); those should be deleted or restored from a backup. Then re-run a scrub to confirm there aren't any more errors.
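Finding the affected files by eye gets tedious when the same file is reported once per block and per device. A minimal Python sketch of the lookup, assuming one kernel message per line (as in a real syslog file) and the `checksum error ... (path: ...)` format seen in this thread; `corrupt_paths` is a hypothetical helper name:

```python
import re

# Matches BTRFS checksum warnings and captures the file path at the end of the line.
PATH_RE = re.compile(
    r"BTRFS warning \(device \S+\): checksum error at .*\(path: (.+)\)\s*$"
)

def corrupt_paths(syslog_lines):
    """Return each corrupt file path once, in order of first appearance."""
    seen = []
    for line in syslog_lines:
        match = PATH_RE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen

# Sample lines taken from the syslog excerpt posted in this thread.
sample = [
    "Apr 19 08:49:34 Storage kernel: BTRFS warning (device sdc1): checksum error at logical 19965161472 on dev /dev/sdc1, physical 10267930624, root 5, inode 686348049, offset 4096, length 4096, links 1 (path: appdata/jellyfin/data/metadata/People/E/Eamon Sheehan/folder.jpg)",
    "Apr 19 08:49:34 Storage kernel: BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 335783, rd 191311, flush 1, corrupt 29971, gen 0",
    "Apr 19 08:49:34 Storage kernel: BTRFS warning (device sdc1): checksum error at logical 19965161472 on dev /dev/sdb1, physical 9206771712, root 5, inode 686348049, offset 4096, length 4096, links 1 (path: appdata/jellyfin/data/metadata/People/E/Eamon Sheehan/folder.jpg)",
    "Apr 19 08:48:46 Storage kernel: BTRFS warning (device sdc1): checksum error at logical 2367467520 on dev /dev/sdc1, physical 1293725696, root 5, inode 812538915, offset 20480, length 4096, links 1 (path: appdata/PlexMediaServer/Library/Application Support/Plex Media Server/Media/localhost/8/4b1112dba0e382f5a87080425e1a7ac0d711dec.bundle/Contents/GoP-0.xml)",
]

for path in corrupt_paths(sample):
    print(path)
```

The paths are relative to the pool's root, so on a real system they would need the pool's mount point prepended before deleting or restoring anything.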
hermy65 Posted April 19
@JorgeB Looks like it's a couple of pieces of Plex and Jellyfin metadata, unless this is the wrong area to look at:
Apr 19 08:49:34 Storage kernel: BTRFS warning (device sdc1): checksum error at logical 19965161472 on dev /dev/sdc1, physical 10267930624, root 5, inode 686348049, offset 4096, length 4096, links 1 (path: appdata/jellyfin/data/metadata/People/E/Eamon Sheehan/folder.jpg)
Apr 19 08:49:34 Storage kernel: BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 335783, rd 191311, flush 1, corrupt 29971, gen 0
Apr 19 08:49:34 Storage kernel: BTRFS error (device sdc1): unable to fixup (regular) error at logical 19965161472 on dev /dev/sdc1
Apr 19 08:49:34 Storage kernel: BTRFS warning (device sdc1): checksum error at logical 19965161472 on dev /dev/sdb1, physical 9206771712, root 5, inode 686348049, offset 4096, length 4096, links 1 (path: appdata/jellyfin/data/metadata/People/E/Eamon Sheehan/folder.jpg)
Apr 19 08:49:34 Storage kernel: BTRFS error (device sdc1): bdev /dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 6, gen 0
Apr 19 08:49:34 Storage kernel: BTRFS error (device sdc1): unable to fixup (regular) error at logical 19965161472 on dev /dev/sdb1
Apr 19 08:49:34 Storage kernel: BTRFS warning (device sdc1): checksum error at logical 19965165568 on dev /dev/sdc1, physical 10267934720, root 5, inode 686348049, offset 8192, length 4096, links 1 (path: appdata/jellyfin/data/metadata/People/E/Eamon Sheehan/folder.jpg)
Apr 19 08:49:34 Storage kernel: BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 335783, rd 191311, flush 1, corrupt 29972, gen 0
Apr 19 08:49:34 Storage kernel: BTRFS error (device sdc1): unable to fixup (regular) error at logical 19965165568 on dev /dev/sdc1
Apr 19 08:49:34 Storage kernel: BTRFS warning (device sdc1): checksum error at logical 19965165568 on dev /dev/sdb1, physical 9206775808, root 5, inode 686348049, offset 8192, length 4096, links 1 (path: appdata/jellyfin/data/metadata/People/E/Eamon Sheehan/folder.jpg)
Apr 19 08:49:34 Storage kernel: BTRFS error (device sdc1): bdev /dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 7, gen 0
Apr 19 08:49:34 Storage kernel: BTRFS error (device sdc1): unable to fixup (regular) error at logical 19965165568 on dev /dev/sdb1
Apr 19 08:48:46 Storage kernel: BTRFS warning (device sdc1): checksum error at logical 2367467520 on dev /dev/sdc1, physical 1293725696, root 5, inode 812538915, offset 20480, length 4096, links 1 (path: appdata/PlexMediaServer/Library/Application Support/Plex Media Server/Media/localhost/8/4b1112dba0e382f5a87080425e1a7ac0d711dec.bundle/Contents/GoP-0.xml)
Apr 19 08:48:46 Storage kernel: BTRFS warning (device sdc1): checksum error at logical 2367467520 on dev /dev/sdb1, physical 199012352, root 5, inode 812538915, offset 20480, length 4096, links 1 (path: appdata/PlexMediaServer/Library/Application Support/Plex Media Server/Media/localhost/8/4b1112dba0e382f5a87080425e1a7ac0d711dec.bundle/Contents/GoP-0.xml)
Apr 19 08:48:46 Storage kernel: BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 335783, rd 191311, flush 1, corrupt 29969, gen 0
Apr 19 08:48:46 Storage kernel: BTRFS error (device sdc1): bdev /dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 4, gen 0
Apr 19 08:48:46 Storage kernel: BTRFS error (device sdc1): unable to fixup (regular) error at logical 2367467520 on dev /dev/sdc1
Apr 19 08:48:46 Storage kernel: BTRFS error (device sdc1): unable to fixup (regular) error at logical 2367467520 on dev /dev/sdb1
Apr 19 08:48:46 Storage kernel: BTRFS warning (device sdc1): checksum error at logical 2367471616 on dev /dev/sdc1, physical 1293729792, root 5, inode 812538915, offset 24576, length 4096, links 1 (path: appdata/PlexMediaServer/Library/Application Support/Plex Media Server/Media/localhost/8/4b1112dba0e382f5a87080425e1a7ac0d711dec.bundle/Contents/GoP-0.xml)
Apr 19 08:48:46 Storage kernel: BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 335783, rd 191311, flush 1, corrupt 29970, gen 0
Apr 19 08:48:46 Storage kernel: BTRFS warning (device sdc1): checksum error at logical 2367471616 on dev /dev/sdb1, physical 199016448, root 5, inode 812538915, offset 24576, length 4096, links 1 (path: appdata/PlexMediaServer/Library/Application Support/Plex Media Server/Media/localhost/8/4b1112dba0e382f5a87080425e1a7ac0d711dec.bundle/Contents/GoP-0.xml)
Apr 19 08:48:46 Storage kernel: BTRFS error (device sdc1): bdev /dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 5, gen 0
JorgeB Posted April 19
You should delete those files if possible, though I don't know if it will affect Plex.
hermy65 Posted April 19
@JorgeB I deleted those files, re-ran a repair scrub, and it came back clean. I also moved Docker to ipvlan. Anything else that I need to do or watch for? Appreciate the help as always!
JorgeB Posted April 19
I would recommend resetting the current stats and monitoring the pool for future issues, since it can be much harder to resolve a problem if a bad or dropped device goes undetected for some time.
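One way to keep an eye on the pool is to watch the per-device error counters that the kernel prints alongside each BTRFS error. The following is a minimal Python sketch under that assumption: it parses the `bdev ... errs:` lines in the format quoted earlier in this thread and reports which devices have any nonzero counter. The helper names are hypothetical; after a stats reset, any device that shows up again would be worth investigating.

```python
import re

ERRS_RE = re.compile(
    r"bdev (\S+) errs: wr (\d+), rd (\d+), flush (\d+), corrupt (\d+), gen (\d+)"
)
FIELDS = ("wr", "rd", "flush", "corrupt", "gen")

def latest_error_counters(syslog_lines):
    """Map each device to the most recent counters seen in the log."""
    latest = {}
    for line in syslog_lines:
        match = ERRS_RE.search(line)
        if match:
            values = [int(v) for v in match.groups()[1:]]
            latest[match.group(1)] = dict(zip(FIELDS, values))
    return latest

def devices_with_errors(counters):
    """Return the devices that have at least one nonzero counter."""
    return sorted(dev for dev, c in counters.items() if any(c.values()))

# Sample lines taken from the syslog excerpt posted in this thread.
sample = [
    "Apr 19 08:49:34 Storage kernel: BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 335783, rd 191311, flush 1, corrupt 29971, gen 0",
    "Apr 19 08:49:34 Storage kernel: BTRFS error (device sdc1): bdev /dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 6, gen 0",
    "Apr 19 08:49:34 Storage kernel: BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 335783, rd 191311, flush 1, corrupt 29972, gen 0",
]

counters = latest_error_counters(sample)
print(devices_with_errors(counters))
print(counters["/dev/sdc1"]["corrupt"])
```

Keeping only the latest line per device matters because the counters are cumulative: the same device is reported repeatedly as its totals grow.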
hermy65 Posted April 20
@JorgeB I woke up this morning to some more BTRFS errors. I've attached a new set of diagnostics: storage-diagnostics-20240420-0758.zip
JorgeB Posted April 21
The filesystem went read-only, which suggests that it still has other issues. Rebooting should make it read/write again; after that, I suggest backing up and re-creating the filesystem to make sure it doesn't happen again.
hermy65 Posted April 21
@JorgeB Does that mean I should format the cache drives and start over, or how does a person recreate the filesystem?
JorgeB Posted April 22
18 hours ago, hermy65 said: or how does a person recreate the filesystem?
You can format the pool using the GUI: change the filesystem to a different one, and you can then format. Of course, you need to back up the pool first.
hermy65 Posted April 22
@JorgeB Thanks. Any reason to replace sdc1, since that's the one that keeps having errors?
JorgeB Posted April 22
I would replace/swap the cables, if not done yet, but for now I don't see a reason to suspect a device problem.
hermy65 Posted April 23 (edited April 23 by hermy65)
@JorgeB I got the cache pool re-formatted, etc. I just noticed some more BTRFS errors; my diagnostics are attached: storage-diagnostics-20240423-1113.zip
Also, not sure if it's related, but after rebuilding my cache pool my VMs tab no longer works. It doesn't load any of the VMs I had or let me create VMs; it's just a blank page.
JorgeB Posted April 23
There appears to be a problem with the libvirt.img, which is kind of strange since it looks like it's new. Also, no idea what this is about:
Apr 22 21:25:42 Storage root: initializing /etc/libvirt
Apr 22 21:25:42 Storage kernel: loop3: detected capacity change from 0 to 6000
Apr 22 21:25:42 Storage kernel: EXT4-fs (loop3): mounted filesystem with ordered data mode. Quota mode: disabled.
Apr 22 21:25:42 Storage kernel: ext4 filesystem being mounted at /etc/libvirt- supports timestamps until 2038 (0x7fffffff)
Apr 22 21:25:42 Storage kernel: EXT4-fs (loop3): unmounting filesystem.
Apr 22 21:25:42 Storage emhttpd: shcmd (1368): /etc/rc.d/rc.libvirt start
I've never seen libvirt trying to mount an ext4 loop device after mounting the btrfs loop device, so I have no idea what's going on there, but if the image is new, try deleting and recreating it.
hermy65 Posted April 23
@jorgeb OK, I deleted libvirt.img and then stopped/started, and the VM page works and one of my VMs came back, so I should be good with that one. Were the BTRFS errors all pointing at the libvirt issue, or was that something else I need to take care of?
JorgeB Posted April 23
14 minutes ago, hermy65 said: Were the BTRFS errors all pointing at the libvirt issue
Yes.
hermy65 Posted April 23
@jorgeb Thanks for all of the assistance here, hopefully I am finally past all of these problems!!