March 31
Hi, I removed one cache drive because it was reporting errors. I tried to add two drives to replace it, but instead the pool seems to be stuck in some unhappy state. I can't start the array with them and I can't remove the drives; setting them to 'no device' pops up the same 'Wrong Pool State, cache - invalid expansion' message.
kuutio-diagnostics-20260331-2013.zip

March 31, Author
Here you go:
root@kuutio:~# btrfs fi show
warning, device 1 is missing
Label: none  uuid: 88139840-0e20-48a6-b8dc-b69dcea7bed9
    Total devices 3 FS bytes used 196.12GiB
    devid    2 size 894.25GiB used 73.00GiB path /dev/sdg1
    devid    3 size 953.87GiB used 132.03GiB path /dev/nvme0n1p1
    *** Some devices missing
Side note: isn't that in btrfs-usage.txt?

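For anyone who wants to check this kind of output from a script, here is a minimal Python sketch that reads `btrfs fi show` text and reports how many devices the filesystem expects versus how many are listed. The function name `parse_fi_show` is made up for illustration, and the parser assumes the output describes a single filesystem in the layout shown in the post above; it is not part of btrfs-progs or Unraid.

```python
import re

def parse_fi_show(text):
    """Parse `btrfs fi show` text for ONE filesystem (assumed layout).

    Returns (total_devices, present, missing_count), where `present`
    is a list of (devid, path) tuples.
    """
    total = None
    present = []
    for line in text.splitlines():
        m = re.search(r"Total devices\s+(\d+)", line)
        if m:
            total = int(m.group(1))
        m = re.search(r"devid\s+(\d+)\s+size\s+\S+\s+used\s+\S+\s+path\s+(\S+)", line)
        if m:
            present.append((int(m.group(1)), m.group(2)))
    missing_count = None if total is None else total - len(present)
    return total, present, missing_count

# Sample text copied from the post above.
sample = """warning, device 1 is missing
Label: none  uuid: 88139840-0e20-48a6-b8dc-b69dcea7bed9
    Total devices 3 FS bytes used 196.12GiB
    devid    2 size 894.25GiB used 73.00GiB path /dev/sdg1
    devid    3 size 953.87GiB used 132.03GiB path /dev/nvme0n1p1
    *** Some devices missing"""

total, present, missing_count = parse_fi_show(sample)
print(total, present, missing_count)
```

With the output from this thread, the filesystem expects three devices but only two are listed, which matches the "device 1 is missing" warning.
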
April 1, Community Expert
It is, but that's not typically the case, so I didn't check.
Try reimporting the pool with just those two devices, assuming it was redundant:
- On Main, click on the first device for that pool and then "remove pool".
- Back on Main, create a new pool with the same name and 2 slots.
- Assign just those two devices; leave the filesystem set to auto.
- Start the array to import the pool and post new diags.

April 1, Author
kuutio-diagnostics-20260401-1249.zip
edit: Huh, message disappeared. That let the array start.
Edited April 1 by korhojoa

April 1, Community Expert
With the array running, type:
btrfs dev remove missing /mnt/cache
Then stop the array and reimport the pool once more with the two devices. After that, you can leave the pool as is with those two devices or add/replace them.

April 1, Author
kuutio-diagnostics-20260401-1650.zip
I'm attaching one more diagnostics archive. While btrfs dev remove missing /mnt/cache was running (it was close to finishing and exited soon after, with exit code 0), I was watching the syslog and started seeing link drops, timeouts and btrfs errors from one of the cache drives. Am I still OK to reimport (remove, recreate with same name and devices)?

April 1, Community Expert
2 hours ago, korhojoa said:
> Am I still OK to reimport (remove, recreate with same name and devices)?
Yep, missing device is gone.

April 1, Author
Okay, tried it.
kuutio-diagnostics-20260401-1919.zip
After the reimport and starting the array, the cache pool now says:
Unmountable: unsupported or no filesystem

April 1, Community Expert
sdg dropped offline, check/replace its cables and post new diags after array start:
Apr 1 16:45:51 kuutio kernel: ata5: softreset failed (device not ready)
Apr 1 16:45:51 kuutio kernel: ata5: hard resetting link
Apr 1 16:45:57 kuutio kernel: ata5: link is slow to respond, please be patient (ready=0)
Apr 1 16:46:02 kuutio kernel: ata5: softreset failed (device not ready)
Apr 1 16:46:02 kuutio kernel: ata5: hard resetting link
Apr 1 16:46:07 kuutio kernel: ata5: link is slow to respond, please be patient (ready=0)
### [PREVIOUS LINE REPEATED 1 TIMES] ###
Apr 1 16:46:37 kuutio kernel: ata5: softreset failed (device not ready)
Apr 1 16:46:37 kuutio kernel: ata5: limiting SATA link speed to 1.5 Gbps
Apr 1 16:46:37 kuutio kernel: ata5: hard resetting link
Apr 1 16:46:42 kuutio kernel: ata5: softreset failed (device not ready)
Apr 1 16:46:42 kuutio kernel: ata5: softreset failed
Apr 1 16:46:42 kuutio kernel: ata5: reset failed, giving up
Apr 1 16:46:42 kuutio kernel: ata5.00: disable device
Apr 1 16:46:42 kuutio kernel: ata5: EH complete

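The pattern in the excerpt above (repeated "softreset failed" followed by "reset failed, giving up") can be picked out of a syslog programmatically. The following is only a rough Python sketch: `summarize_ata_errors` is a hypothetical helper, and the regular expression assumes kernel lines shaped exactly like the ones quoted in this thread.

```python
import re

def summarize_ata_errors(log_text):
    """Count soft-reset failures per ATA port and flag ports the kernel gave up on.

    Assumes lines of the form '... kernel: ataN: message' or
    '... kernel: ataN.MM: message', as in the excerpt above.
    """
    summary = {}
    for line in log_text.splitlines():
        m = re.search(r"kernel: (ata\d+)(?:\.\d+)?: (.*)$", line)
        if not m:
            continue
        port, msg = m.groups()
        entry = summary.setdefault(port, {"softreset_failed": 0, "gave_up": False})
        if msg.startswith("softreset failed"):
            entry["softreset_failed"] += 1
        elif msg.startswith("reset failed, giving up"):
            entry["gave_up"] = True
    return summary

# A few lines copied from the excerpt above.
sample_log = """Apr 1 16:46:37 kuutio kernel: ata5: softreset failed (device not ready)
Apr 1 16:46:37 kuutio kernel: ata5: limiting SATA link speed to 1.5 Gbps
Apr 1 16:46:42 kuutio kernel: ata5: softreset failed
Apr 1 16:46:42 kuutio kernel: ata5: reset failed, giving up
Apr 1 16:46:42 kuutio kernel: ata5.00: disable device"""

result = summarize_ata_errors(sample_log)
print(result)
```

A port with `gave_up` set to true is one where the kernel stopped retrying the link, which is consistent with the device dropping offline as described in the post.
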
April 2, Author
Checked the cables. The SATA cable was slightly unplugged, even though it has a locking connector. Here you go.
kuutio-diagnostics-20260402-1650.zip

April 2, Author
Scrub complete:
UUID:             88139840-0e20-48a6-b8dc-b69dcea7bed9
Scrub started:    Thu Apr 2 18:42:40 2026
Status:           finished
Duration:         0:13:48
Total to scrub:   392.35GiB
Rate:             485.23MiB/s
Error summary:    verify=93024 csum=791170
  Corrected:      884194
  Uncorrectable:  0
  Unverified:     0

April 2, Community Expert
All errors were corrected, so that's good. Recommend resetting the stats and monitoring for any further issues.

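As an illustration of how the "all errors were corrected" reading follows from the scrub summary, here is a small Python sketch that extracts the counters and checks that nothing was left uncorrectable. The names `parse_scrub_summary` and `all_corrected` are hypothetical, and the parser only assumes the field labels seen in the scrub output posted above.

```python
import re

def parse_scrub_summary(text):
    """Extract error counters from btrfs scrub status text (assumed field labels)."""
    counts = {}
    m = re.search(r"Error summary:\s*(.*)", text)
    if m:
        # key=value pairs such as verify=93024 csum=791170
        for key, value in re.findall(r"(\w+)=(\d+)", m.group(1)):
            counts[key] = int(value)
    for field in ("Corrected", "Uncorrectable", "Unverified"):
        m = re.search(rf"{field}:\s*(\d+)", text)
        if m:
            counts[field.lower()] = int(m.group(1))
    return counts

def all_corrected(counts):
    """True when the scrub reported zero uncorrectable errors."""
    return counts.get("uncorrectable") == 0

# Values copied from the scrub output posted above.
scrub_text = """Error summary:    verify=93024 csum=791170
  Corrected:      884194
  Uncorrectable:  0
  Unverified:     0"""

counts = parse_scrub_summary(scrub_text)
print(counts, all_corrected(counts))
```

In this thread's output the verify and csum counts (93024 + 791170) add up to the 884194 corrected errors, and the uncorrectable count is 0.
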
April 3, Author
kuutio-diagnostics-20260403-1329.zip
edit: I don't know what happens to the messages I type.
It looks like the scrub operation fixed the 'errors' by writing the corrupt data from 'sdg1' (the disk with the loose cable) to 'nvme0n1p1', essentially corrupting the valid data.
VM disks and docker image are corrupt, so they won't run. Is there a way to get back at least that data from the drive I removed from the pool earlier (Samsung_SSD_870_QVO_2TB_S5RPNF0T615547D)?
Edited April 3 by korhojoa

April 3, Author
After attaching a file, the input box sometimes seems to lose any text I type afterwards. I edited the text in soon after posting:
It looks like the scrub operation fixed the 'errors' by writing the corrupt data from 'sdg1' (the disk with the loose cable) to 'nvme0n1p1', essentially corrupting the valid data.
VM disks and docker image are corrupt, so they won't run. Is there a way to get back at least that data from the drive I removed from the pool earlier (Samsung_SSD_870_QVO_2TB_S5RPNF0T615547D)?

April 3, Community Expert
2 hours ago, korhojoa said:
> It looks like the scrub operation fixed the 'errors' by writing the corrupt data from 'sdg1' (the disk with the loose cable) to 'nvme0n1p1', essentially corrupting the valid data.
btrfs always knows which device has the correct data, based on the checksum and transid, and it uses that to update the stale device; it cannot do it on the wrong device.
2 hours ago, korhojoa said:
> VM disks and docker image are corrupt, so they won't run.
If these are corrupt, either there were other issues, or the domains share is set to NOCOW, which disables the checksums. In that case btrfs cannot correct that data, and when you try to access it, it will read from any of the disks, basically at random, so it can return stale data.

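One way to see whether a directory has the NOCOW attribute is to look for the `C` flag in `lsattr -d` output. The Python sketch below only parses a line of that output; the helper name `has_nocow`, the sample flag strings and the path /mnt/cache/domains are assumptions for illustration, not taken from the diagnostics in this thread.

```python
def has_nocow(lsattr_line):
    """Return True if an `lsattr` output line shows the 'C' (No_COW) flag.

    Assumes the usual 'FLAGS PATH' layout, e.g.
    '---------------C------ /mnt/cache/domains'.
    """
    parts = lsattr_line.split(None, 1)
    if not parts:
        return False
    flags = parts[0]
    return "C" in flags

# Hypothetical sample lines; real output would come from something like
#   lsattr -d /mnt/cache/domains
nocow_line = "---------------C------ /mnt/cache/domains"
cow_line = "---------------------- /mnt/cache/appdata"

print(has_nocow(nocow_line), has_nocow(cow_line))
```

Only the flags column is inspected, so a capital C elsewhere in the path does not produce a false positive.
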