JorgeB

Moderators
  • Posts: 67,492
  • Days Won: 706

Everything posted by JorgeB

  1. Because of this:
     Jul 14 12:58:04 Interocitor kernel: Command line: BOOT_IMAGE=/bzimage xen-pciback.hide=(07:00.0)(08:00.0)(09:00.0)(0a:00.0) initrd=/bzroot
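The quoted kernel command line carries a `xen-pciback.hide=` parameter listing four PCI addresses. As a minimal sketch (the function name is hypothetical, not part of any tool), the addresses can be pulled out of such a line like this:

```python
import re

def hidden_pci_devices(cmdline: str) -> list[str]:
    """Return the PCI addresses listed in a xen-pciback.hide= parameter."""
    match = re.search(r"xen-pciback\.hide=(\S+)", cmdline)
    if match is None:
        return []
    # Each address is wrapped in its own pair of parentheses.
    return re.findall(r"\(([^)]+)\)", match.group(1))

# Command line copied from the syslog excerpt above.
cmdline = (
    "BOOT_IMAGE=/bzimage "
    "xen-pciback.hide=(07:00.0)(08:00.0)(09:00.0)(0a:00.0) "
    "initrd=/bzroot"
)
print(hidden_pci_devices(cmdline))
```

Comparing that list against the addresses of the affected devices shows quickly whether the boot parameter is what hides them.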
  2. Run a scrub; any corrupt files will be identified in the syslog while it runs.
  3. Not directly, but anyone can use it if advantageous, I use a lot of reflinking on my btrfs pools.
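For readers unfamiliar with reflinking, here is a rough sketch of what a reflink copy looks like from Python on Linux. It assumes the `FICLONE` ioctl from `linux/fs.h`; the helper name and the fallback to a plain copy are illustrative choices, not something from the post:

```python
import fcntl
import shutil

# Linux ioctl request number for FICLONE (assumption: taken from linux/fs.h).
FICLONE = 0x40049409

def reflink_or_copy(src: str, dst: str) -> bool:
    """Try to reflink src to dst; fall back to a normal copy.

    Returns True if the destination shares extents with the source,
    False if a regular byte-for-byte copy was made instead.
    """
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        try:
            fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
            return True
        except OSError:
            # Filesystem without reflink support, or src and dst on
            # different filesystems.
            pass
    shutil.copyfile(src, dst)
    return False
```

On a btrfs pool the first branch should succeed and the copy is near-instant, since no data blocks are duplicated until one of the files is modified.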
  4. Completely understand, can't blame a guy for asking. That is indeed what I plan to use as soon as everything is working correctly with the multiple pools. P.S. I sent some beer money your way yesterday to show my appreciation for all the work you have done and continue to do on this, and I encourage anyone who relies on UD to do the same.
  5. Can't download the diags, please attach again on a new post.
  6. Going to move this to the KVM forum, might get more help there.
  7. I've seen some issues before with IronWolf drives on LSI controllers; there's even a firmware update for some models, though only for the 10TB IIRC. I'd still try connecting it to a different controller, if possible, to test.
  8. Try transferring directly to the array, if speeds are normal the problem is the NVMe device.
  9. If it was a driver problem it would most likely affect everyone using them, and that's not the case, I have multiple ConnectX-3 working at or close to line speed.
  10. Yes, or run a correcting check and any check after that should always find 0 errors.
  11. The log is filled with errors like these for multiple devices:
     Jul 12 23:52:13 Tower kernel: sd 7:0:18:0: Power-on or device reset occurred
     Jul 12 23:52:18 Tower kernel: mpt2sas_cm0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
     ### [PREVIOUS LINE REPEATED 1 TIMES] ###
     Jul 12 23:52:19 Tower kernel: sd 7:0:18:0: Power-on or device reset occurred
     Jul 12 23:52:19 Tower kernel: mpt2sas_cm0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
     ### [PREVIOUS LINE REPEATED 1 TIMES] ###
     Jul 12 23:52:20 Tower kernel: sd 7:0:18:0: Power-on or device reset occurred
     Jul 12 23:52:22 Tower kernel: mpt2sas_cm0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
     ### [PREVIOUS LINE REPEATED 2 TIMES] ###
     Jul 12 23:52:22 Tower kernel: sd 7:0:1:0: Power-on or device reset occurred
     This suggests a power/connection problem on that HBA; check all cables and/or try a different PSU. After all those errors, one of the cache devices ended up dropping offline:
     Jul 12 23:53:07 Tower kernel: sd 7:0:15:0: Power-on or device reset occurred
     Jul 12 23:53:07 Tower kernel: mpt2sas_cm0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
     Jul 12 23:53:08 Tower kernel: sd 7:0:16:0: Power-on or device reset occurred
     Jul 12 23:53:08 Tower kernel: sd 7:0:20:0: device_unblock and setting to running, handle(0x001c)
     Jul 12 23:53:08 Tower kernel: sd 7:0:20:0: [sdj] tag#6827 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
     Jul 12 23:53:08 Tower kernel: sd 7:0:20:0: [sdj] tag#6827 CDB: opcode=0x28 28 00 0a 3c 3a a0 00 00 20 00
     Jul 12 23:53:08 Tower kernel: print_req_error: I/O error, dev sdj, sector 171719328
     Jul 12 23:53:08 Tower kernel: sd 7:0:20:0: [sdj] tag#6822 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
     Jul 12 23:53:08 Tower kernel: BTRFS error (device sdb1): bdev /dev/sdj1 errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
     Jul 12 23:53:08 Tower kernel: sd 7:0:20:0: [sdj] tag#6822 CDB: opcode=0x2a 2a 00 0d 21 8d c0 00 09 80 00
     Jul 12 23:53:08 Tower kernel: print_req_error: I/O error, dev sdj, sector 220302784
     Jul 12 23:53:08 Tower kernel: BTRFS error (device sdb1): bdev /dev/sdj1 errs: wr 1, rd 1, flush 0, corrupt 0, gen 0
     Jul 12 23:53:08 Tower kernel: sd 7:0:20:0: [sdj] tag#6828 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
     Jul 12 23:53:08 Tower kernel: BTRFS error (device sdb1): bdev /dev/sdj1 errs: wr 2, rd 1, flush 0, corrupt 0, gen 0
     Jul 12 23:53:08 Tower kernel: sd 7:0:20:0: [sdj] tag#6828 CDB: opcode=0x28 28 00 08 f6 5b 08 00 00 18 00
     Jul 12 23:53:08 Tower kernel: print_req_error: I/O error, dev sdj, sector 150362888
     See here for help with the cache issue, but you need to fix the HBA reset problems first.
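The `CDB:` lines in the excerpt above can be tied to the `print_req_error` sectors that follow them. The sketch below assumes the standard 10-byte SCSI READ(10)/WRITE(10) layout (opcode, flags, 4-byte big-endian LBA, group, 2-byte transfer length, control); the function name is hypothetical:

```python
import re

# Only the two opcodes that appear in the excerpt are named here.
OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}

def decode_cdb10(line: str):
    """Decode a 10-byte CDB from a kernel 'CDB:' log line.

    Returns (operation, lba, blocks) or None if the line has no 10-byte CDB.
    """
    match = re.search(r"CDB: opcode=0x[0-9a-f]{2}((?: [0-9a-f]{2}){10})", line)
    if match is None:
        return None
    cdb = bytes.fromhex(match.group(1).replace(" ", ""))
    operation = OPCODES.get(cdb[0], hex(cdb[0]))
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length
    return operation, lba, blocks

line = "sd 7:0:20:0: [sdj] tag#6827 CDB: opcode=0x28 28 00 0a 3c 3a a0 00 00 20 00"
print(decode_cdb10(line))  # ('READ(10)', 171719328, 32)
```

The decoded LBA matches the sector reported on the next log line, which confirms that the failed read and the I/O error refer to the same request on sdj.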
  12. The syslog doesn't show the beginning of the problem, but it shows that one of your cache devices (sdm) dropped offline:
     Jul 11 04:40:37 Tower kernel: BTRFS error (device sdk1): bdev /dev/sdm1 errs: wr 279461563, rd 207240495, flush 384145, corrupt 0, gen 0
     Judging by the number of errors, it happened some time ago or multiple times; see here for more info.
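The per-device counters in that BTRFS line are what indicate how long the device has been misbehaving. A small sketch for extracting them from a syslog line, with a hypothetical helper name and a regex based only on the format shown in the excerpt:

```python
import re

# Pattern derived from the "bdev ... errs:" lines quoted above.
ERRS_RE = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)

def btrfs_error_counters(line: str):
    """Return (device, counters) from a BTRFS error line, or None."""
    match = ERRS_RE.search(line)
    if match is None:
        return None
    counters = {
        name: int(value)
        for name, value in match.groupdict().items()
        if name != "dev"
    }
    return match.group("dev"), counters

line = (
    "Jul 11 04:40:37 Tower kernel: BTRFS error (device sdk1): "
    "bdev /dev/sdm1 errs: wr 279461563, rd 207240495, flush 384145, "
    "corrupt 0, gen 0"
)
device, counters = btrfs_error_counters(line)
print(device, counters)
```

Hundreds of millions of write and read errors with zero corruption errors fit a device that stopped responding rather than one returning bad data.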
  13. On the diags posted I only see one LSI, and it's detecting 8 disks. Were those diags taken with the problem configuration?
  14. Just to be clear, I was referring to when trim support is added to SSDs on the array: any device without deterministic trim support would still work, it just probably wouldn't be trimmed.
  15. Not without the full diagnostics: Tools -> Diagnostics, please.
  16. The instructions should have the unmount command; if not, reboot.
  17. You'd need to copy with the --sparse=always option (as in GNU cp), assuming the NAS filesystem supports sparse files. QCOW2 images are compressed, so they compress the zeros, i.e., all the blank space.
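To check whether a copied image kept its holes, one can compare the file's apparent size with the space actually allocated on disk. This is a minimal sketch that assumes a POSIX system where `st_blocks` counts 512-byte units; the function names are illustrative:

```python
import os

def apparent_and_allocated(path: str) -> tuple[int, int]:
    """Return (apparent size, allocated size) of a file, both in bytes."""
    st = os.stat(path)
    # st_blocks is reported in 512-byte units on POSIX systems.
    return st.st_size, st.st_blocks * 512

def looks_sparse(path: str) -> bool:
    """True if the file occupies less disk space than its apparent size."""
    apparent, allocated = apparent_and_allocated(path)
    return allocated < apparent
```

Running `looks_sparse()` on the source image and on the copy shows whether the copy expanded the blank space into real zero-filled blocks.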
  18. Like Jonathanm mentioned, fakes don't usually have this: [image] They also look different (left is fake): [image]
  19. When a drive gets disabled you usually replace it with a new one, or rebuild on top of the same drive if it looks like a connection issue. In either case there's no need to remove it from the array; all data will still be emulated by parity and accessible during the rebuild.
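How a missing disk can be "emulated by parity" is easiest to see with a toy example. The sketch below assumes single XOR parity over equally sized blocks, which is a simplification and not taken from the post; all names and byte values are made up for illustration:

```python
from functools import reduce

def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equally sized blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three toy "data disks" holding three bytes each.
data = [b"\x01\x02\x03", b"\x10\x20\x30", b"\xaa\xbb\xcc"]
parity = xor_blocks(data)

# Pretend disk 1 is disabled: its contents are recomputed on the fly
# from parity plus the surviving disks.
emulated = xor_blocks([parity, data[0], data[2]])
print(emulated == data[1])  # True
```

A rebuild writes that same recomputed content to the replacement drive, which is why the data stays accessible throughout.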
  20. Changed Status to Closed. Changed Priority to Other.
  21. You need to check the filesystem on disk10, but note that a new config is not usually the way to fix a disabled disk, though in this case it should be OK since the disk looks healthy.