JorgeB

Moderators
  • Posts

    67,696
  • Joined

  • Last visited

  • Days Won

    707

Everything posted by JorgeB

  1. RAID mode is fine with Intel fakeRAID, it still uses the AHCI driver. I would run memtest: multiple filesystem corruptions without an apparent reason can be the result of bad RAM.
  2. Because you're using a NetApp enclosure, the SMART report is pretty much useless. I believe there's a way of getting a correct report manually, but you'll need to search the forum since I never used one of those enclosures.
  3. Disk17 dropped offline, so there's no SMART report. Check the connections and post new diags after that, but most likely the disk is fine. Disk15, on the other hand, appears to be failing, and that could be a problem since you don't have dual parity.
  4. Unfortunately there's nothing relevant logged before the crash, which usually indicates a hardware problem. There are also some unrelated ATA errors you should check, likely a power/connection issue.
  5. Why not? You should be able to back up to the array, or for example to an unassigned device mounted with UD plugin.
  6. No, multipath is for redundancy, not extra bandwidth, usually you connect two HBAs to the same expander for that.
  7. As mentioned in the link above, they are under "legacy products".
  8. It's not always easy to find errors with memtest when there are so few of them. Since you have 4 DIMMs, I would suggest removing a couple and running two correcting parity checks; if the issue persists with both sets, it's likely not a RAM problem. Note that the first check after the problem is fixed might still find errors, but the next ones should always find 0.
  9. You should update the LSI firmware, replace cables on disk7 and post new diags.
  10. For this you just need to back up the cache pool, though you should already have backups of anything important.
  11. Not really, if the first time it was wrongly corrected due to a RAM bit flip, the second time it would be corrected again, returning to the original state.
  12. New firmware wasn't going to make the disks appear, it was just in case they dropped because of that. Try power cycling the server (not just rebooting); if they don't come back online, replace/swap cables on those disks.
  13. Better to re-post there, there's some danger with this forum when attempting to merge threads.
  14. No, they just show data corruption, bad RAM is the #1 reason for that, but there could be other reasons, or the problem is not being detected by memtest, it's not always
  15. If it happens again grab diags before rebooting, for now you can swap cables/slot with another disk, to rule that out if it happens again to the same disk.
  16. You can monitor a btrfs filesystem for corruption (and other) errors; if data corruption is found, a scrub will list the affected files in the syslog. It would be possible if you knew corruption happened on the data device and parity wasn't corrected, like after actual bit rot, but most corruptions are caused by other factors, and recovering from those using parity is not easy or even possible. Since each array drive is a single-device filesystem without redundancy, it also can't be fixed by btrfs; that's just one of many reasons why backups of anything important are still needed.
  17. This, usually it's plug and play as far as the array is concerned, unless using RAID controllers, VMs might require some changes.
  18. Unraid only disables a drive after a write error, posting the diagnostics, if you didn't reboot yet, might give more clues.
  19. You need a cooler with narrow ILM LGA 2011 support, e.g.: https://noctua.at/en/nh-u12dx-i4 Or a cheaper but still good cooler: https://store.supermicro.com/4u-active-cpu-cooler-snk-p0050ap4.html
  20. Those types of errors are usually caused by bad power/connection, what have you replaced so far?
  21. They are logged as disk errors, though they can be intermittent. As mentioned, an extended SMART test is what you should run.
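
As a rough illustration of the btrfs monitoring mentioned in item 16, here is a minimal Python sketch that scans text in the per-device counter format printed by `btrfs device stats` and reports any non-zero counters. The function name `nonzero_btrfs_stats` and the sample text are made up for this example, and the sketch assumes each line looks like `[device].counter  value`; it is not part of Unraid or of any plugin.

```python
import re

# Matches lines such as "[/dev/sdb1].corruption_errs  3"
STAT_LINE = re.compile(
    r"^\[(?P<device>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$"
)


def nonzero_btrfs_stats(stats_text):
    """Return {device: {counter: value}} for every non-zero counter."""
    problems = {}
    for line in stats_text.splitlines():
        match = STAT_LINE.match(line.strip())
        if not match:
            continue  # skip blank or unrecognised lines
        value = int(match.group("value"))
        if value:
            device = match.group("device")
            problems.setdefault(device, {})[match.group("counter")] = value
    return problems


# Hypothetical sample text, invented for illustration only.
SAMPLE = """\
[/dev/sdb1].write_io_errs    0
[/dev/sdb1].read_io_errs     0
[/dev/sdb1].flush_io_errs    0
[/dev/sdb1].corruption_errs  3
[/dev/sdb1].generation_errs  0
"""

if __name__ == "__main__":
    print(nonzero_btrfs_stats(SAMPLE))
```

On a live server the text would come from running `btrfs device stats` against the pool's mount point (for example via `subprocess.run`, with a path such as /mnt/cache assumed here). A non-zero corruption counter would be the cue to run a scrub and check the syslog for the affected files.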