Everything posted by JorgeB

  1. The errors you're seeing now are because both devices are online and one of them has old data that is being corrected as it's read; run a scrub and check that there are no uncorrectable errors, also see here for better pool monitoring.
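The scrub can be started and checked from the console; the mount point below assumes the pool is the default Unraid cache at /mnt/cache (adjust for your pool):

```shell
# Start a scrub on the pool
btrfs scrub start /mnt/cache

# Watch progress; when it finishes, the summary should report 0 uncorrectable errors
btrfs scrub status /mnt/cache
```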
  2. Very bad idea to use USB drives for the array, both for reliability and performance reasons.
  3. Cache device dropped offline:

     Feb 17 11:21:47 Tower kernel: ata8: COMRESET failed (errno=-16)
     Feb 17 11:21:47 Tower kernel: ata8: limiting SATA link speed to 3.0 Gbps
     Feb 17 11:21:47 Tower kernel: ata8: hard resetting link
     Feb 17 11:21:52 Tower kernel: ata8: COMRESET failed (errno=-16)
     Feb 17 11:21:52 Tower kernel: ata8: reset failed, giving up
     Feb 17 11:21:52 Tower kernel: ata8.00: disabled

     You should never use the 4 Marvell ports on that board (the first 4 white SATA ports); they are known to constantly drop drives.
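If you're not sure which disks sit on the Marvell ports, the sysfs path of each block device shows the PCI address of its controller; sdb and the PCI address below are only placeholders:

```shell
# The controller's PCI address (e.g. 0000:03:00.0) appears in the resolved path
readlink -f /sys/block/sdb

# Look up the controller at that address; drives on the problem ports
# will show a Marvell chip here
lspci -s 03:00.0
```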
  4. Please post the diagnostics: Tools -> Diagnostics
  5. That's not a good sign, and I was afraid of that since, like mentioned in the invalid slot instructions post, the disk was already unmountable before the first rebuild attempt because no superblock was found, which suggests parity wasn't 100% valid. Still, wait for xfs_repair to finish searching the disk for a valid superblock, but I wouldn't keep my hopes up.
  6. Stop the array, click on disk8 and change the filesystem from auto to xfs, that should do it; if it doesn't, report back and I'll post instructions to run it from the CLI.
  7. Yep, but if there are vdisks they should be copied with the --sparse=always flag to remain sparse, especially when copying back to cache.
  8. Since the filesystem has gone read-only those errors are normal, just make sure everything important was copied to the array, the pool will need to be re-formatted.
  9. Enable mover logging and post a syslog after it runs.
  10. One of the cache devices (cache2) has issues, likely it's dropping offline:

      Feb 24 08:52:03 chenbro-svr kernel: BTRFS info (device sdr1): bdev /dev/sds1 errs: wr 3714141, rd 2077762, flush 152140, corrupt 0, gen 0

      See here for more info. For the mover you need to disable the docker and VM services for those files to move; note that if any of the files exist on the array it won't overwrite them, but no need to worry about the docker image, delete and then re-create it.
  11. First call trace is macvlan related, that usually happens when using dockers with custom IP addresses; after that there are out of memory errors, so also check your RAM allocation/usage.
  12. Delete or rename network.cfg and network-rules.cfg, both in the config folder on the flash drive, then reboot and reconfigure the network.
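The rename step can be sketched as a small helper; the function name is made up for this example, and /boot/config is where Unraid keeps the flash config folder:

```shell
# Hypothetical helper (name and structure invented for this sketch)
reset_network_config() {
    dir="$1"
    for f in network.cfg network-rules.cfg; do
        # Rename rather than delete, so the old settings can still be recovered
        if [ -f "$dir/$f" ]; then
            mv "$dir/$f" "$dir/$f.bak"
            echo "renamed $f"
        fi
    done
}

# On the server, then reboot and reconfigure:
# reset_network_config /boot/config
```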
  13. GPT support was added on late 4.x releases, all v5 and v6 support it.
  14. These errors look like a connection/power issue; see if those disks share a power cable/splitter, or if possible try a different PSU.
  15. It won't; you'll need to run a filesystem check. Remove -n or nothing will be done, and if it asks for -L, use it.
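The sequence looks like this; /dev/md1 (array disk1) is only an example device, use the one matching your disk, and the array must be started in maintenance mode:

```shell
# Dry run first: -n reports problems but changes nothing
xfs_repair -n /dev/md1

# Actual repair (the -n flag removed)
xfs_repair /dev/md1

# Only if xfs_repair refuses to run and asks for it, zero the log:
xfs_repair -L /dev/md1
```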
  16. There is, but the drive is not suspect, like mentioned it's not a disk problem, if you still want to do it look for the unbalance plugin, then see here to shrink the array.
  17. You need to reset the errors (it's in the link above):

      btrfs dev stats -z /mnt/cache

      Also you still have dual data profiles, convert with:

      btrfs balance start -dconvert=raid1 /mnt/cache

      Then, as long as the scrub didn't find any uncorrectable errors, you're fine.
  18. Yes, midnight commander won't keep them sparse; you can use cp:

      cp --sparse=always /path/to/source/vdisk.img /path/to/dest/vdisk.img
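You can confirm a copy stayed sparse by comparing allocated blocks with the apparent size; this sketch uses temporary files instead of a real vdisk:

```shell
# Create a 100 MiB sparse file: full apparent size, almost no blocks allocated
src=$(mktemp)
dst=$(mktemp)
truncate -s 100M "$src"

# --sparse=always re-creates the holes in the destination
cp --sparse=always "$src" "$dst"

# %s = apparent size in bytes, %b = 512-byte blocks actually allocated;
# a sparse copy keeps %b far below the apparent size
stat -c '%n: %s bytes, %b blocks' "$src" "$dst"
```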
  19. There are problems writing to one of the cache devices (cache2); this is a hardware issue:

      Feb 23 22:48:58 Tower kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme1n1p1 errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
      Feb 23 22:48:58 Tower kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme1n1p1 errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
      Feb 23 22:48:58 Tower kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme1n1p1 errs: wr 3, rd 0, flush 0, corrupt 0, gen 0
      Feb 23 22:48:58 Tower kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme1n1p1 errs: wr 4, rd 0, flush 0, corrupt 0, gen 0
      Feb 23 22:48:58 Tower kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme1n1p1 errs: wr 5, rd 0, flush 0, corrupt 0, gen 0
      Feb 23 22:48:58 Tower kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme1n1p1 errs: wr 6, rd 0, flush 0, corrupt 0, gen 0
      Feb 23 22:48:58 Tower kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme1n1p1 errs: wr 7, rd 0, flush 0, corrupt 0, gen 0
      Feb 23 22:48:58 Tower kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme1n1p1 errs: wr 8, rd 0, flush 0, corrupt 0, gen 0
      Feb 23 22:48:58 Tower kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme1n1p1 errs: wr 9, rd 0, flush 0, corrupt 0, gen 0
      Feb 23 22:48:58 Tower kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme1n1p1 errs: wr 10, rd 0, flush 0, corrupt 0, gen 0
      Feb 23 22:48:58 Tower kernel: BTRFS warning (device nvme0n1p1): lost page write due to IO error on /dev/nvme1n1p1
  20. Cache2 dropped offline, and your cache pool is not redundant:

      Feb 24 14:18:04 Tower kernel: ata3: hard resetting link
      Feb 24 14:18:10 Tower kernel: ata3: link is slow to respond, please be patient (ready=0)
      Feb 24 14:18:39 Tower kernel: ata3: COMRESET failed (errno=-16)
      Feb 24 14:18:39 Tower kernel: ata3: limiting SATA link speed to 3.0 Gbps
      Feb 24 14:18:39 Tower kernel: ata3: hard resetting link
      Feb 24 14:18:44 Tower kernel: ata3: COMRESET failed (errno=-16)
      Feb 24 14:18:44 Tower kernel: ata3: reset failed, giving up
      Feb 24 14:18:44 Tower kernel: ata3.00: disabled

      Check cables to see if it comes back online and then run a scrub. Also, even if you want to use the data in single profile you should make the metadata raid1, and see here for better pool monitoring.
  21. Pool was likely created on v6.7.x and, because of a bug, it's not redundant: metadata isn't raid1. There are other issues too, since data is using two profiles, single and raid1. Start by running a scrub; if all errors are corrected, balance the pool to raid1 (both data and metadata). If the corruption is on the single data profile or on metadata it can't be fixed, and you'll need to delete the affected files (if it's data) or the entire pool (if it's the metadata).
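Assuming the pool is mounted at the default /mnt/cache, the conversion after a clean scrub would be:

```shell
# Convert data and metadata to raid1 (system chunks follow the metadata target)
btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/cache

# Verify that only raid1 profiles remain for data, metadata and system
btrfs filesystem usage /mnt/cache
```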
  22. Thanks for reporting back, you can tag it solved.