Everything posted by JorgeB

  1. If that's correct then it's a good option.
  2. You should use an HBA, see here for some recommendations:
  3. That looks like a MegaRAID clone; it might not be flashable to IT mode, but I have no personal experience with that model, so you can google it.
  4. There's no good reason not to use the recommended ports, unless there was some cable length/management related issue, but any port should work.
  5. The logged call traces look more hardware related, not like the macvlan/nf_nat 6.9.x related crashes.
  6. Do you mean you stopped the server and it rebooted on its own? If yes, that's a hardware problem, and I do see some hardware issues logged:
     Jun 9 02:27:34 Vault001 kernel: mce: [Hardware Error]: Machine check events logged
     Jun 9 02:27:34 Vault001 kernel: [Hardware Error]: Corrected error, no action required.
     Jun 9 02:27:34 Vault001 kernel: [Hardware Error]: CPU:0 (17:71:0) MC27_STATUS[-|CE|MiscV|-|-|-|SyndV|-|-|-]: 0x982000000002080b
     Jun 9 02:27:34 Vault001 kernel: [Hardware Error]: IPID: 0x0001002e00000500, Syndrome: 0x000000005a020001
     Jun 9 02:27:34 Vault001 kernel: [Hardware Error]: Power, Interrupts, etc. Ext. Error Code: 2, Link Error.
     Jun 9 02:27:34 Vault001 kernel: [Hardware Error]: cache level: L3/GEN, mem/io: IO, mem-tx: GEN, part-proc: SRC (no timeout)
     Looks like they are CPU cache related.
  7. Assuming you copied the data directly to the disk, i.e., without using a specific folder or share, and it's now mixed together with the data that already existed on that disk, I would remove the disk from the array, then run rsync again, but now from the unassigned disk to the array (sketched below). You'll need to do it multiple times if there are multiple shares on the source, but it will pick up where it left off, i.e., only copy whatever is missing, and since the array is now the destination there are no issues with space.
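     For illustration, a minimal sketch of one rsync pass, assuming the removed disk is mounted via Unassigned Devices at /mnt/disks/olddisk and the share is called Media (both names are just examples):
     # copy one share from the unassigned disk into the array, preserving attributes;
     # re-running it only transfers whatever is still missing on the destination
     rsync -av /mnt/disks/olddisk/Media/ /mnt/user/Media/
     Repeat the same command for each share on the source disk.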
  8. Try deleting/renaming vfio-pci.cfg on the flash drive to see if it boots.
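     A minimal sketch of the rename, assuming the file is in the usual /boot/config location on the flash drive:
     # park the bind list so no devices get stubbed on the next boot; rename it back to undo
     mv /boot/config/vfio-pci.cfg /boot/config/vfio-pci.cfg.bak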
  9. If you installed mcelog, any further errors should be logged in the syslog.
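     A quick way to check, assuming the stock syslog location:
     # look for new machine check entries since the last boot
     grep -i "machine check\|hardware error" /var/log/syslog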
  10. Corruption was detected on the cache filesystem and it went read-only:
      Jun 8 07:29:47 tower1 kernel: BTRFS error (device dm-3): block=1041472405504 write time tree block corruption detected
      Jun 8 07:29:47 tower1 kernel: BTRFS: error (device dm-3) in btrfs_commit_transaction:2377: errno=-5 IO failure (Error while writing out transaction)
      Jun 8 07:29:47 tower1 kernel: BTRFS info (device dm-3): forced readonly
      You should back up and restore the data; the docker image should then be recreated.
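      Before recreating the pool you can also check the per-device error counters, assuming it is mounted at /mnt/cache:
      # non-destructive; shows accumulated read/write/corruption error counts per device
      btrfs dev stats /mnt/cache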
  11. There are read errors with multiple disks; the first thing to do is to update the LSI firmware, since all P20 releases except the latest one (20.00.07.00) have known issues.
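      To confirm which release the controller is currently running, something like this should work if the LSI sas2flash utility is available on the system:
      # lists all detected SAS2 controllers with their firmware/BIOS versions
      sas2flash -listall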
  12. Reboot and post new diags as soon as it happens again.
  13. The NVMe device dropped; this can sometimes help with that. Some NVMe devices have issues with power states on Linux, so try this: on the main GUI page click on flash, scroll down to "Syslinux Configuration", make sure it's set to "menu view" (on the top right) and add this to your default boot option, after "append initrd=/bzroot":
      nvme_core.default_ps_max_latency_us=0
      e.g.:
      append initrd=/bzroot nvme_core.default_ps_max_latency_us=0
      Reboot and see if it makes a difference; it's also a good idea to look for a board BIOS update.
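      For reference, the complete default stanza in syslinux.cfg would then look roughly like this (the exact label and menu lines vary per install, so treat this as a sketch):
      label Unraid OS
        menu default
        kernel /bzimage
        append initrd=/bzroot nvme_core.default_ps_max_latency_us=0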
  14. Diags taken after rebooting are not much help for this; you can try enabling the syslog mirror to flash and then post that log after a crash.
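      With the mirror enabled (Settings -> Syslog Server -> "Mirror syslog to flash") the copy should survive the crash; assuming the stock path on the flash drive, it can be read back with:
      # the mirrored log persists across reboots
      cat /boot/logs/syslog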
  15. Then you likely have some plugin or other issue keeping the drives from spinning down. Note that a frequent low number of reads is normal while the drives are spun up, since SMART/temps are polled, but those reads shouldn't prevent them from spinning down.
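      A couple of ways to see what is touching a disk that should be idle, assuming disk1 as an example and that the tools are installed:
      # list processes with files open on that filesystem
      fuser -vm /mnt/disk1
      # or watch for new accesses as they happen (inotifywait comes from inotify-tools)
      inotifywait -m -r /mnt/disk1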
  16. That's a good option, and that model was originally IT mode only, so no flashing is needed unless someone converted it to IR mode. Should be a good combo, and good for roughly 220MB/s per drive with 20 drives on the expander. Yep.
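      (That estimate assumes the expander is dual-linked to the HBA: 8 lanes at roughly 550MB/s usable each is about 4400MB/s total, and 4400 / 20 ≈ 220MB/s per drive; a single link would give about half that.)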
  17. Yep, that does it:
      ndk_sh B t u v w x y z aa
      /dev/sdt: 434.79 MB/s
      /dev/sdu: 434.79 MB/s
      /dev/sdv: 433.82 MB/s
      /dev/sdw: 435.06 MB/s
      /dev/sdx: 434.84 MB/s
      /dev/sdy: 434.54 MB/s
      /dev/sdz: 434.61 MB/s
      /dev/sdaa: 434.80 MB/s
      Total = 3477.25 MB/s
      And results were more consistent with multiple runs.
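      If you want to cross-check an individual drive outside that script, a rough buffered sequential read test (sdt is just an example device):
      # read-only benchmark over ~3 seconds, safe to run on a live array
      hdparm -t /dev/sdt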
  18. Unfortunately parity can't help with filesystem corruption; it would just rebuild the disks as they are now. It's not normal to have severe corruption on two disks at the same time, though, so you might have a hardware issue, like bad RAM.
  19. There are some known issues with v6.9.2 and spin down, you can try downgrading to v6.9.1 and see if it helps.
  20. Usually any expander port can be used as in or out, it just depends on what they are connected to, it's like that with the older HP 6G expander.
  21. Those diags were not created by an unclean shutdown; next time one happens, check the logs folder on the flash drive for the diags with a matching time.
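      Assuming the stock flash layout, the saved diags can be listed with:
      # newest files first; look for the diagnostics zip whose timestamp matches the unclean shutdown
      ls -lt /boot/logs/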
  22. Are those diags after a crash? Don't see anything out of the ordinary logged.
  23. Yes, just stop the array, unassign parity2, and start the array; after that the disk can be re-used as a replacement.