Everything posted by JorgeB

  1. Not familiar with that particular model, but zeros are highly compressible and some SSDs are much faster when writing only those. If you have decent-performing non-SMR disks in the array, enable turbo write and test writing directly to the array; with modern disks you can sustain around 200MB/s, and if that's the case it will confirm a device issue. See the example below.
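     A rough sketch of such a test (the disk path and file name are placeholders only, pick any array disk with free space and delete the test file afterwards):
     dd if=/dev/zero of=/mnt/disk1/speedtest.bin bs=1M count=10240 oflag=direct status=progress
     rm /mnt/disk1/speedtest.bin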
  2. Pending sectors usually can't be ignored, unless they are false positives, and they don't appear to be since the SMART test failed. You can do a full disk write to see if they return to zero and don't show up again soon after.
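     One way to do the full write from the console (just a sketch, sdX is a placeholder, and this is destructive, so only run it on a disk that is not assigned to the array):
     dd if=/dev/zero of=/dev/sdX bs=1M status=progress
     smartctl -A /dev/sdX    # re-check Current_Pending_Sector afterwards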
  3. Forgot to mention: since disk errors are much more likely on reads, if the sync finishes successfully then run a non-correcting parity check (or an extended SMART test on both parity drives).
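     If you go the SMART route, something like this (sdX/sdY are placeholders for the two parity devices):
     smartctl -t long /dev/sdX
     smartctl -t long /dev/sdY
     smartctl -a /dev/sdX    # check the self-test log once the test completes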
  4. That suggests everything is OK with the LAN but the device can't keep up, what model NVMe?
  5. Diags are after rebooting so we can't see what happened, but parity1 does appear to be failing, parity2 looks fine, wait for the sync to finish and post new diags if there are errors.
  6. That was for an HP Microserver, which usually comes with write cache disabled; regular desktop boards usually don't have that setting.
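     If you want to check or change the drive write cache setting manually (sdX is a placeholder), hdparm can do it:
     hdparm -W /dev/sdX     # show current write cache setting
     hdparm -W1 /dev/sdX    # enable write cache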
  7. Checksum errors mean btrfs is detecting data corruption:
     Mar 7 16:08:38 Anton kernel: BTRFS info (device sdc1): bdev /dev/sdc1 errs: wr 0, rd 0, flush 0, corrupt 288, gen 0
     Mar 7 16:08:38 Anton kernel: BTRFS info (device sdc1): bdev /dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 818, gen 0
     Mar 7 16:08:38 Anton kernel: BTRFS info (device sdc1): bdev /dev/sde1 errs: wr 0, rd 0, flush 0, corrupt 910, gen 0
     Mar 7 16:08:38 Anton kernel: BTRFS info (device sdc1): bdev /dev/sdaf1 errs: wr 0, rd 0, flush 0, corrupt 796, gen 0
     They show on all four devices, and since you're using ECC RAM it's unlikely that's the problem; the other pool is also fine, suggesting the issue might be device related or something else. Run a scrub, delete/restore any corrupt files listed in the syslog, then reset the stats and keep monitoring the pool.
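     The commands, assuming the pool is mounted at /mnt/cache (adjust to your pool's actual mountpoint):
     btrfs scrub start /mnt/cache
     btrfs scrub status /mnt/cache       # progress and error count
     btrfs device stats -z /mnt/cache    # print and reset the error counters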
  8. Diags would be better, but the screenshot suggests the drives were formatted non-encrypted; assuming there's no data there, insert the key you want to use, start the array and re-format the drives.
  9. Depends on the problem: if it doesn't appear in the VM page you can restore libvirt.img, assuming there's a backup; if not, re-create the VM and point it to the same vdisk.
  10. You need to adjust mover settings or increase capacity to avoid the pool getting completely full.
  11. Powering off is usually hardware related and most likely there won't be anything logged, but you can enable the syslog server and post that after the next time. In the meantime check temps and that the fans are working; it could also be a PSU/UPS issue.
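     A quick way to check drive temps from the console (sdX is a placeholder, repeat per drive):
     smartctl -A /dev/sdX | grep -i temperature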
  12. Docker is not working because the pool is completely full:
     Free (statfs, df): 260.00KiB
     Just move/delete some data and restart the service.
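     You can confirm the pool usage from the console, assuming it's mounted at /mnt/cache:
     btrfs filesystem usage /mnt/cache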
  13. After you fix the RAM issue run a scrub on the pools; any corrupt files will be listed in the syslog, delete those or replace them from backups.
  14. Moving the data seems unnecessary to me, though you should already have backups of anything important; I would just rebuild. You can also use a spare disk if you have one and keep the old one intact in case something goes wrong. New permissions won't hurt as long as there's no appdata on those disks; if there is, run Docker Safe New Perms from the FCP plugin.
  15. Disk looks OK, this looks more like a connection/power problem. It's also a good idea to update the LSI firmware since it's very old; after that check/replace the cables and rebuild on top.
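     To see the firmware currently on the controller (a sketch only, sas2flash applies to SAS2 HBAs; the exact tool depends on the model):
     sas2flash -list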
  16. The main difference is that the 9206 and 9207 are PCIe 3.0, the others are 2.0; either one will work for what you want, just make sure it's running the latest IT mode firmware.
  17. Click on the current mountpoint name.
  18. Multiple disks dropped offline at the same time (then reconnected), this suggests a power/connection problem, see if they share anything in common, then reboot and post new diags after array start.
  19. That's good news, I assume you mean you don't get the xfs_growfs error and/or the experimental warning, but did you get it with rc2? Still not sure if the error always displays with any 1412TB or larger disk and rc2, but from the diags I've seen so far it looks like it does.
  20. The corrupted file on cache was likely the result of the same RAM issue; if that's fixed there shouldn't be any more.
  21. You just need to copy anything important to the array or another pool/unassigned device, you can use your favorite tool for that, e.g. Midnight Commander (mc on the console). When that's done stop the array and wipe the pool devices with:
     blkdiscard /dev/sdX
     then start the array and you'll have the option to format the pool.
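     If you prefer rsync over mc for the copy, something like this works (source and destination paths are examples only, adjust to your setup):
     rsync -avh --progress /mnt/cache/ /mnt/disk1/cache_backup/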
  22. In my mind the main question is why the "xfs_growfs: XFS_IOC_FSGROWFSDATA xfsctl failed: No space left on device" error started to appear with 1412TB or larger drives; I would bet that if that error went away so would the experimental warning.
  23. For rc2 it's 5.13.00, maybe despite xfs_growfs being part of xfsprogs the warning after the error comes from the kernel? Just a wild guess...
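     You can confirm which xfsprogs version a release ships with from the console:
     xfs_growfs -V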