Everything posted by JorgeB

  1. I've seen issues before with Ironwolf drives and LSI controllers; there's even a firmware update for some models, though IIRC only for the 10TB ones. I would still try connecting the disk to a different controller if possible to test.
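     As a quick check before hunting for an update, the drive's current firmware revision can be read with smartctl (the device path below is just an example, adjust to the Ironwolf in question):

     # show the model and firmware revision the drive reports
     smartctl -i /dev/sdX | grep -iE "model|firmware"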
  2. Try transferring directly to the array; if speeds are normal, the problem is the NVMe device.
  3. If it were a driver problem it would most likely affect everyone using these cards, and that's not the case; I have multiple ConnectX-3 cards working at or close to line speed.
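     To rule out the rest of the setup, an iperf3 test between the two machines is a simple sketch (iperf3 must be installed on both ends; the IP address is a placeholder):

     # on one end, start the server
     iperf3 -s
     # on the other end, test against it with a few parallel streams
     iperf3 -c 192.168.1.100 -P 4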
  4. Yes, or run a correcting check; any check after that should always find 0 errors.
  5. The log is filled with errors like these for multiple devices:

     Jul 12 23:52:13 Tower kernel: sd 7:0:18:0: Power-on or device reset occurred
     Jul 12 23:52:18 Tower kernel: mpt2sas_cm0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
     ### [PREVIOUS LINE REPEATED 1 TIMES] ###
     Jul 12 23:52:19 Tower kernel: sd 7:0:18:0: Power-on or device reset occurred
     Jul 12 23:52:19 Tower kernel: mpt2sas_cm0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
     ### [PREVIOUS LINE REPEATED 1 TIMES] ###
     Jul 12 23:52:20 Tower kernel: sd 7:0:18:0: Power-on or device reset occurred
     Jul 12 23:52:22 Tower kernel: mpt2sas_cm0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
     ### [PREVIOUS LINE REPEATED 2 TIMES] ###
     Jul 12 23:52:22 Tower kernel: sd 7:0:1:0: Power-on or device reset occurred

     This suggests a power/connection problem on that HBA; check all cables and/or try a different PSU. After all those errors one of the cache devices ended up dropping offline:

     Jul 12 23:53:07 Tower kernel: sd 7:0:15:0: Power-on or device reset occurred
     Jul 12 23:53:07 Tower kernel: mpt2sas_cm0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
     Jul 12 23:53:08 Tower kernel: sd 7:0:16:0: Power-on or device reset occurred
     Jul 12 23:53:08 Tower kernel: sd 7:0:20:0: device_unblock and setting to running, handle(0x001c)
     Jul 12 23:53:08 Tower kernel: sd 7:0:20:0: [sdj] tag#6827 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
     Jul 12 23:53:08 Tower kernel: sd 7:0:20:0: [sdj] tag#6827 CDB: opcode=0x28 28 00 0a 3c 3a a0 00 00 20 00
     Jul 12 23:53:08 Tower kernel: print_req_error: I/O error, dev sdj, sector 171719328
     Jul 12 23:53:08 Tower kernel: sd 7:0:20:0: [sdj] tag#6822 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
     Jul 12 23:53:08 Tower kernel: BTRFS error (device sdb1): bdev /dev/sdj1 errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
     Jul 12 23:53:08 Tower kernel: sd 7:0:20:0: [sdj] tag#6822 CDB: opcode=0x2a 2a 00 0d 21 8d c0 00 09 80 00
     Jul 12 23:53:08 Tower kernel: print_req_error: I/O error, dev sdj, sector 220302784
     Jul 12 23:53:08 Tower kernel: BTRFS error (device sdb1): bdev /dev/sdj1 errs: wr 1, rd 1, flush 0, corrupt 0, gen 0
     Jul 12 23:53:08 Tower kernel: sd 7:0:20:0: [sdj] tag#6828 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
     Jul 12 23:53:08 Tower kernel: BTRFS error (device sdb1): bdev /dev/sdj1 errs: wr 2, rd 1, flush 0, corrupt 0, gen 0
     Jul 12 23:53:08 Tower kernel: sd 7:0:20:0: [sdj] tag#6828 CDB: opcode=0x28 28 00 08 f6 5b 08 00 00 18 00
     Jul 12 23:53:08 Tower kernel: print_req_error: I/O error, dev sdj, sector 150362888

     See here to help with the cache issue, but you need to fix the HBA reset problems first.
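     As a quick way to confirm the resets have stopped after reseating the cables or swapping the PSU, you can count those events in the current syslog (the search string is taken from the log lines above):

     # count HBA reset events so far; the number should stop growing once the problem is fixed
     grep -c "Power-on or device reset occurred" /var/log/syslog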
  6. The syslog doesn't show the beginning of the problem, but it does show that one of your cache devices (sdm) dropped offline:

     Jul 11 04:40:37 Tower kernel: BTRFS error (device sdk1): bdev /dev/sdm1 errs: wr 279461563, rd 207240495, flush 384145, corrupt 0, gen 0

     Judging by the number of errors it happened some time ago, or has happened multiple times; see here for more info.
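     Those per-device error counters can also be read (and reset once the underlying problem is fixed) with btrfs itself; /mnt/cache is assumed to be the pool mount point:

     # show accumulated write/read/flush/corruption/generation errors for each pool member
     btrfs device stats /mnt/cache
     # after fixing the connection and scrubbing, reset the counters so any new errors stand out
     btrfs device stats -z /mnt/cache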
  7. In the diags posted I only see one LSI, and it's detecting 8 disks; were those diags taken with the problem configuration?
  8. Just to be clear, I was referring to when trim support is added for SSDs in the array; any device without deterministic trim support would still work, it just probably wouldn't be trimmed.
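     For reference, a SATA SSD usually reports whether its trim is deterministic in the identify data (device path is an example; the exact wording varies between drives):

     # look for "Data Set Management TRIM supported" and a "Deterministic read ... after TRIM" line
     hdparm -I /dev/sdX | grep -i trim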
  9. Not without the full diagnostics: Tools -> Diagnostics, please.
  10. The instructions should include the unmount command; if not, reboot.
  11. You'd need to copy with the --sparse-always flag, assuming the NAS filesystem supports that. QCOW2 images are compressed, so they compress the zeros, i.e., all the blank space.
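      As an example with cp, which spells the option --sparse=always (rsync has a similar --sparse option); the paths below are placeholders:

      # copy the image re-creating holes for the zeroed blocks
      cp --sparse=always /mnt/user/domains/vdisk1.img /mnt/remotes/nas/vdisk1.img
      # compare the apparent size with the space actually used on the destination
      du -h --apparent-size /mnt/remotes/nas/vdisk1.img
      du -h /mnt/remotes/nas/vdisk1.img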
  12. Like Jonathanm mentioned, fakes don't usually have this: They also look different (left is fake)
  13. When a drive gets disabled you usually replace it with a new one, or rebuild on top of it if it looks like a connection issue; in either case there's no need to remove it from the array, as all data will still be emulated by parity and accessible during the rebuild.
  14. Changed Status to Closed. Changed Priority to Other.
  15. You need to check the filesystem on disk10, but note that a new config isn't usually the way to fix a disabled disk, though in this case it should be OK since the disk looks healthy.
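      If the disk is xfs, a read-only check can be run from the console with the array started in maintenance mode; /dev/md10 is assumed here for disk10, adjust to your setup (the GUI check option on the disk page does the same thing):

      # read-only check first (makes no changes); drop -n to actually repair
      xfs_repair -n /dev/md10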
  16. It is, but as mentioned it's what a recent xfs filesystem requires, mostly for reflink support.
  17. Nothing obvious; I would recommend updating to the latest stable release. If there are still issues, try booting in safe mode with everything off, and if it's then OK start enabling dockers/plugins/services one at a time.
  18. That's the allocated size, not the actual space used; you can check with ls:

      root@Tower1:~# ls -lskh /mnt/cache/VMs/Win10/
      total 27G
      27G -rw-rw-rw- 1 root users 60G Jul 13 15:52 vdisk1.img

      27GB used, 60GB allocated
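      Another way to see both numbers at once is qemu-img, which ships with the VM support (the path is the one from the ls output above):

      # "virtual size" is what the guest sees, "disk size" is what's actually used on the host
      qemu-img info /mnt/cache/VMs/Win10/vdisk1.img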
  19. Repartitioning any device requires a complete wipe.
  20. That's really up to you; also note that redundancy is not the same as a backup. I have multiple pools and they are almost all redundant, but I still do regular backups of anything important (usually to another pool on a different server). Again, it really depends on your use case; I use them mostly to keep things separate, and also so that when one pool is being heavily used it won't affect the performance of the others. Good chance, if it's really a failing disk; intermittent errors where disks drop offline and come back online can cause issues with btrfs pools. If it fails you can just replace the disk; if there are more serious issues, restore from backup.
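      For reference, on a mounted btrfs pool a failing member can also be swapped from the command line with btrfs replace, though on Unraid the usual way is to stop the array and change the device in the GUI (device names below are examples):

      # replace the failing device (sdX1) with the new one (sdY1) while the pool stays mounted
      btrfs replace start /dev/sdX1 /dev/sdY1 /mnt/cache
      # check progress
      btrfs replace status /mnt/cache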
  21. Mover will continue to work normally once it gets fixed.
  22. If everything is backed up, yes. Filesystem corruption will cause that; they will come back once it's fixed, though it might require a reboot.