JorgeB

Moderators | Posts: 67,704 | Days Won: 708

Everything posted by JorgeB

  1. This is for bug reports only, please. See below for how to install the test branch, in the other thread you posted. https://forums.unraid.net/topic/113904-tpm-for-kvm-please/?do=findComment&comment=1047771
  2. Those call traces are usually the result of having Docker containers with a custom IP address. Upgrading to v6.10 and switching to ipvlan might fix it (Settings -> Docker Settings -> Docker custom network type -> ipvlan; advanced view must be enabled, top right), or see below for more info. https://forums.unraid.net/topic/70529-650-call-traces-when-assigning-ip-address-to-docker-containers/ See also here: https://forums.unraid.net/bug-reports/stable-releases/690691-kernel-panic-due-to-netfilter-nf_nat_setup_info-docker-static-ip-macvlan-r1356/
  3. Run a scrub and delete/replace any corrupt files (they will be listed in the syslog), then reset the btrfs stats and check whether any more corruption is detected. If it is, run memtest.
  4. The RAM is running above spec (check the link above); that's known to corrupt data with some AMD hardware.
  5. Run a scrub; any corrupt files will be listed in the syslog, so delete those. Note, however, that with bad RAM some files might have been corrupted during writes, in which case the checksum was computed on the already-corrupt data and will match it, so there could still be undetected corruption. The same goes for any array XFS data written while the RAM was failing.
  6. If it happens again, please get the diagnostics before rebooting, or at least save the syslog.
  7. Btrfs is detecting data corruption. This is usually bad RAM or, since you're using Ryzen, RAM running above the officially supported speed. Since you didn't post the complete diagnostics, we can't check whether it's within spec or not.
  8. Unrelated, but there's a known issue with LSI HBAs and the firmware you're using, FWVersion(20.00.00.00). Update to the latest (20.00.07.00).
  9. I'm not familiar with that specific model, but the usual problem with RAID cards in JBOD mode is that you'll need to do a new config if you change controllers. Also, some create non-standard partitions, resulting in unmountable disks if the controller is changed.
  10. Ideally 24 hours, but if there's a findable problem it will usually take a couple of hours at most.
  11. Memtest only works with CSM/legacy boot; it won't work with UEFI boot.
  12. Btrfs is detecting data corruption; you should start by running memtest.
  13. The flash drive is likely dropping offline. See if you can get the diagnostics, and make sure it's plugged into a USB 2.0 port, not 3.0.
  14. Where do you see this error? Screenshot please.
  15. You already posted the diags; in this case it's mostly to see the hardware used. Just before the crash there are some issues with the NVMe device, though I'm not sure these caused it:
      Oct 19 13:50:55 Tower kernel: nvme nvme0: frozen state error detected, reset controller
      Oct 19 13:50:56 Tower kernel: pcieport 0000:00:03.0: AER: Root Port link has been reset
      Oct 19 13:50:56 Tower kernel: pcieport 0000:00:03.0: AER: device recovery successful
      Looks like the board doesn't have an M.2 slot, so try moving the NVMe adapter to a different PCIe slot and see whether it stops generating errors like these (and the above):
      Oct 19 13:50:55 Tower kernel: pcieport 0000:00:03.0: AER: Multiple Corrected error received: 0000:00:03.0
      Oct 19 13:50:55 Tower kernel: pcieport 0000:00:03.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Transmitter ID)
      Oct 19 13:50:55 Tower kernel: pcieport 0000:00:03.0: device [8086:3c08] error status/mask=00003101/00002000
      Oct 19 13:50:55 Tower kernel: pcieport 0000:00:03.0: [ 0] RxErr
      Oct 19 13:50:55 Tower kernel: pcieport 0000:00:03.0: [ 8] Rollover
      Oct 19 13:50:55 Tower kernel: pcieport 0000:00:03.0: [12] Timeout
      Oct 19 13:50:55 Tower kernel: pcieport 0000:00:03.0: AER: Multiple Corrected error received: 0000:00:03.0
      Oct 19 13:50:55 Tower kernel: pcieport 0000:00:03.0: AER: can't find device of ID0018
      Oct 19 13:50:55 Tower kernel: pcieport 0000:00:03.0: AER: Multiple Corrected error received: 0000:00:03.0
      Oct 19 13:50:55 Tower kernel: pcieport 0000:00:03.0: AER: can't find device of ID0018
      Oct 19 13:50:55 Tower kernel: pcieport 0000:00:03.0: AER: Multiple Corrected error received: 0000:00:03.0
      Oct 19 13:50:55 Tower kernel: pcieport 0000:00:03.0: AER: can't find device of ID0018
      Oct 19 13:50:55 Tower kernel: pcieport 0000:00:03.0: AER: Multiple Corrected error received: 0000:00:03.0
      Oct 19 13:50:55 Tower kernel: pcieport 0000:00:03.0: AER: can't find device of ID0018
      Oct 19 13:50:55 Tower kernel: pcieport 0000:00:03.0: AER: Multiple Uncorrected (Fatal) error received: 0000:00:03.0
  16. Try either or both, and not just the power cable; it could also be a PSU issue, as mentioned. Either way, it's not something we can help with remotely: there's clearly some hardware issue, or very bad luck with the disks.
  17. For some reason the user started a new thread; continued there:
  18. As already mentioned in your other thread, there's a problem with the cache device; try a different one.
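Replies 3, 5, 7 and 12 all come down to "reset the btrfs stats, then see if corruption shows up again". A minimal sketch of that check, assuming the counters come from `btrfs device stats <mountpoint>` (reset with `btrfs device stats -z <mountpoint>`) and that each output line has the usual `[/dev/xxx].counter_name  N` shape. The function name is hypothetical, and `/mnt/cache` in the usage note is only an example pool path.

```python
import re

# Assumed line shape from `btrfs device stats`, e.g. "[/dev/sdb1].corruption_errs  3"
STAT_LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$")


def nonzero_btrfs_errors(stats_output: str) -> dict[tuple[str, str], int]:
    """Return {(device, counter): value} for every btrfs error counter above zero."""
    errors: dict[tuple[str, str], int] = {}
    for line in stats_output.splitlines():
        match = STAT_LINE.match(line.strip())
        if match and int(match["value"]) > 0:
            errors[(match["dev"], match["counter"])] = int(match["value"])
    return errors
```

Usage idea: after resetting the stats and using the pool for a while, feed the text printed by `btrfs device stats /mnt/cache` into `nonzero_btrfs_errors`. A non-empty result, especially for `corruption_errs`, is the point where memtest comes next.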
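Replies 3 and 5 say the corrupt files found by a scrub are listed in the syslog. A rough sketch for pulling those paths out, assuming the kernel's scrub warnings end in `(path: <file>)`, as in `BTRFS warning (device sdb1): checksum error at logical ... (path: appdata/x.db)`. That message shape is an assumption and may differ between kernel versions. Lines without a path, such as the per-device error summaries, are simply skipped. `corrupt_files` is a hypothetical helper name.

```python
import re

# Assumed shape of a scrub warning; the file path sits in the trailing "(path: ...)".
PATH_RE = re.compile(r"BTRFS \w+ \(device [^)]+\):.*\(path: (?P<path>.*)\)\s*$")


def corrupt_files(syslog_text: str) -> list[str]:
    """Collect the unique file paths btrfs reported during a scrub, sorted."""
    paths: set[str] = set()
    for line in syslog_text.splitlines():
        match = PATH_RE.search(line)
        if match:
            paths.add(match["path"])
    return sorted(paths)
```

The same file is often reported several times (once per bad block), so deduplicating gives a list that is easier to work through when deleting or restoring files.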
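For reply 8, a small sketch that finds the HBA firmware version in a syslog and compares it with 20.00.07.00, the release the reply recommends. It only assumes the version appears as a `FWVersion(xx.xx.xx.xx)` token, as quoted in the reply. Both function names are hypothetical. Versions are compared as integer tuples so that, for example, `20.00.07.00` sorts after `20.00.00.00`.

```python
import re

# Token as quoted in the reply, e.g. "FWVersion(20.00.00.00)"
FW_RE = re.compile(r"FWVersion\((\d+(?:\.\d+)*)\)")
RECOMMENDED = (20, 0, 7, 0)  # 20.00.07.00, the version named in reply 8


def firmware_versions(syslog_text: str) -> list[tuple[int, ...]]:
    """Return every FWVersion found in the log as a tuple of ints."""
    return [tuple(int(part) for part in m.group(1).split(".")) for m in FW_RE.finditer(syslog_text)]


def needs_update(version: tuple[int, ...], recommended: tuple[int, ...] = RECOMMENDED) -> bool:
    # Tuple comparison orders versions component by component.
    return version < recommended
```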
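Reply 11 notes that memtest won't run under UEFI boot. On Linux, the kernel only creates `/sys/firmware/efi` when it was started by UEFI firmware, so a quick sketch to see how the server is currently booting could look like this. The `root` parameter is a hypothetical addition so the check can be pointed at another directory for testing. If it reports UEFI, the BIOS would need to be switched to CSM/legacy boot before selecting memtest.

```python
from pathlib import Path


def booted_via_uefi(root: str = "/") -> bool:
    """True if the running kernel was booted through UEFI firmware.

    /sys/firmware/efi only exists after a UEFI boot; its absence means CSM/legacy.
    """
    return (Path(root) / "sys" / "firmware" / "efi").is_dir()


if __name__ == "__main__":
    mode = "UEFI" if booted_via_uefi() else "CSM/legacy"
    print(f"Current boot mode: {mode}")
```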
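The log in reply 15 is easier to compare before and after moving the NVMe adapter if the AER events are counted rather than read one by one. A sketch, assuming the "error received" lines keep the wording quoted in the reply (`AER: [Multiple ]<severity> error received`), where the severity is `Corrected`, `Uncorrected (Non-Fatal)` or `Uncorrected (Fatal)`. `aer_summary` is a hypothetical name. It counts events per (root port, severity) and ignores the detail lines such as `RxErr` or `can't find device`.

```python
import re
from collections import Counter

# Matches e.g. "pcieport 0000:00:03.0: AER: Multiple Corrected error received"
AER_RE = re.compile(
    r"pcieport (?P<port>[0-9a-fA-F:.]+): AER: (?:Multiple )?"
    r"(?P<severity>Corrected|Uncorrected \((?:Non-)?Fatal\)) error received"
)


def aer_summary(syslog_text: str) -> Counter:
    """Count AER 'error received' events per (root port, severity)."""
    counts: Counter = Counter()
    for match in AER_RE.finditer(syslog_text):
        counts[(match["port"], match["severity"])] += 1
    return counts
```

If the adapter is fine in the new slot, the counts for the new root port should stay at or near zero over a comparable period.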