Everything posted by JorgeB

  1. The only risk is assigning a data disk to a parity slot. If you were using single parity, it will remain valid even if the data disks end up in different slots.
  2. You can't change the UUID because both devices are still part of the same pool. Do this: stop the array; if the Docker/VM services are using the cache pool, disable them; unassign all pool devices (from both pools); start the array to make Unraid "forget" the current pool config; stop the array; reassign both devices to the same pool (there can't be an "All existing data on this device will be OVERWRITTEN when array is Started" warning for any pool device); start the array. The pool should mount normally, and you can re-enable the Docker/VM services if you stopped them before. Now see here to remove one of the devices from the pool; when that's done you can re-add it to a different pool (it will need to be formatted).
  3. This is a problem with the onboard SATA controller and is quite common with some Ryzen boards. Look for a BIOS update; if that doesn't help, the best bet, other than using different hardware, is to use an add-on controller. This doesn't explain the freezing, though.
  4. Diags only show one parity check, and it was non correcting, reboot the server to clear the logs, disable mover logging, run a correcting check then a non correcting one without rebooting, and post new diags after it's done.
  5. See if this applies to you: https://forums.unraid.net/topic/70529-650-call-traces-when-assigning-ip-address-to-docker-containers/ See also here: https://forums.unraid.net/bug-reports/stable-releases/690691-kernel-panic-due-to-netfilter-nf_nat_setup_info-docker-static-ip-macvlan-r1356/
  6. Next time please post the complete diags; they would include a SMART report for disk9. It most likely dropped offline, in which case it should come back up after a power cycle (not just a reboot). You're using a SASLP controller and those are known to drop disks without a reason, so disk9 could be OK. Parity looks mostly fine. Disk3 is showing a pending sector, but WD disks are known to show false positives; an extended SMART test will confirm whether the disk is good or not.
  7. May 1 19:40:30 Tower kernel: sky2 0000:06:00.0: error interrupt status=0x80000000
     May 1 19:40:30 Tower kernel: sky2 0000:06:00.0: PCI hardware error (0x2010)
     Looks like a NIC problem, I would try a different one, Intel preferred. Also lots of ATA errors:
     Apr 30 02:22:04 Tower kernel: ata5.00: exception Emask 0x10 SAct 0xf000 SErr 0x190002 action 0xe frozen
     Apr 30 02:22:04 Tower kernel: ata5.00: irq_stat 0x80400000, PHY RDY changed
     Apr 30 02:22:04 Tower kernel: ata5: SError: { RecovComm PHYRdyChg 10B8B Dispar }
     Apr 30 02:22:04 Tower kernel: ata5.00: failed command: READ FPDMA QUEUED
     Apr 30 02:22:04 Tower kernel: ata5.00: cmd 60/20:60:c0:b5:6f/00:00:83:00:00/40 tag 12 ncq dma 16384 in
     Apr 30 02:22:04 Tower kernel: res 40/00:70:40:b6:6f/00:00:83:00:00/40 Emask 0x10 (ATA bus error)
     Apr 30 02:22:04 Tower kernel: ata5.00: status: { DRDY }
     Apr 30 02:22:04 Tower kernel: ata5.00: failed command: READ FPDMA QUEUED
     This is parity, could be a cable issue, could be the Marvell controller.
     Apr 30 02:52:30 Tower kernel: ata8.01: status: { DRDY }
     Apr 30 02:52:30 Tower kernel: ata8.01: failed command: WRITE FPDMA QUEUED
     Apr 30 02:52:30 Tower kernel: ata8.01: cmd 61/20:70:20:9d:76/00:00:83:00:00/40 tag 14 ncq dma 16384 out
     Apr 30 02:52:30 Tower kernel: res 40/00:90:80:91:78/00:00:83:00:00/40 Emask 0x4 (timeout)
     Apr 30 02:52:30 Tower kernel: ata8.01: status: { DRDY }
     Apr 30 02:52:30 Tower kernel: ata8.01: failed command: WRITE FPDMA QUEUED
     Apr 30 02:52:30 Tower kernel: ata8.01: cmd 61/20:78:e0:9d:76/00:00:83:00:00/40 tag 15 ncq dma 16384 out
     Apr 30 02:52:30 Tower kernel: res 40/00:90:80:91:78/00:00:83:00:00/40 Emask 0x4 (timeout)
     Apr 30 02:52:30 Tower kernel: ata8.01: status: { DRDY }
     Apr 30 02:52:30 Tower kernel: ata8.01: failed command: WRITE FPDMA QUEUED
     Apr 30 02:52:30 Tower kernel: ata8.01: cmd 61/20:80:80:ab:76/00:00:83:00:00/40 tag 16 ncq dma 16384 out
     Apr 30 02:52:30 Tower kernel: res 40/00:90:80:91:78/00:00:83:00:00/40 Emask 0x4 (timeout)
     This is disk2, same Marvell controller but with a SATA port multiplier, could also be a cable issue but Marvell controllers and port multipliers should not be used with Unraid. Intel SATA controller is set to IDE mode and only has one disk connected, set it to AHCI and use those ports before any add-on controller.
  8. shareUserInclude="disk1,disk2,disk3,disk4,disk5,disk6" You're not including disk7 in the user shares. You should select include="all", so that any disk you add in the future is automatically part of the user shares.
  9. Back up and recreate the flash drive. From the backup, restore just your key and the disk assignments (super.dat), then reconfigure the server or restore the rest of the config files one at a time.
  10. Try using a share for the image without a space in the name.
  11. This is usually a flash drive problem, try re-creating it.
  12. They are, but I've never heard of unexplained btrfs data corruption. Were there never any parity sync errors detected since those files were first written?
  13. These are known to sometimes drop drives without a reason, best bet would be to replace them with LSI HBAs.
  14. Looks like a controller problem to me, the Intel SCU controller is not as robust as the normal Intel SATA controller, if it's an option use the regular SATA ports instead plus an additional controller for the remaining devices.
  15. The 6/10 port Asmedia controller you're using is in fact a 2 port controller with SATA port multipliers, they are known to be very problematic, you should replace it with one of the recommended controllers:
  16. Reboot and if cache mounts run a scrub, if uncorrectable errors are found check the syslog for the affected files, those need to be deleted or replaced, if it doesn't mount there are some recovery options here.
  17. Also see here, Ryzen with overclocked RAM is known to sometimes corrupt data, one of the ways this shows up in Unraid is in sync errors.
  18. Yep, I don't see anything logged about that. One more thing you can try is to boot the server in safe mode with all Docker/VMs disabled and let it run as a basic NAS for a few days. If it still crashes it's likely a hardware problem; if it doesn't, start turning on the other services one by one.
  19. I would wait so you can get the array to a protected state.
  20. Syslog rotated so can't see the beginning of the problem, reboot, run the server for a few minutes and post new diags.
  21. According to the log, these are the only two devices with a valid btrfs filesystem, and each has a different fsid, so they weren't part of the same pool:
      Apr 29 21:00:11 Tower kernel: BTRFS: device fsid bc84f358-49ca-4bfe-abf3-eaed0c9d7c53 devid 1 transid 298261 /dev/sdn1 scanned by udevd (1843)
      Apr 29 21:00:11 Tower kernel: BTRFS: device fsid 402fa753-15e2-450f-a062-090043d62a5c devid 1 transid 142621 /dev/sdh1 scanned by udevd (1843)
      You assigned these 3:
      Apr 29 23:43:10 Tower emhttpd: import 30 cache device: (sdn) PERC_H800_00fed4e00dcc545625005f1746b0ad4b_36a4badb046175f00255654cc0de0d4fe
      Apr 29 23:43:10 Tower emhttpd: import 31 cache device: (sdu) PERC_H800_00f516110cc70000ff005f1746b0ad4b_36a4badb046175f00ff0000c70c1116f5
      Apr 29 23:43:10 Tower emhttpd: import 32 cache device: (sdk) PERC_H800_001dff5c100b8d1d28005f1746b0ad4b_36a4badb046175f00281d8d0b105cff1d
      So the other two don't have a valid btrfs filesystem. You can try to mount sdn on its own with UD, then post new diags if it fails.
  22. Then most likely you didn't follow the instructions correctly, it still works.
  23. https://forums.unraid.net/topic/51703-vm-faq/?do=findComment&comment=557606
  24. So I went back to v6.3.5 and the behavior is the same, so it looks like it was always like this. Maybe I got confused with parity2 being checked (and corrected) during a disk rebuild, which still works. It still seems strange that parity1 is read but not checked during a parity2 sync, but since it was always like that I'm not going to create a bug report; it most likely wouldn't get an answer anyway.
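
To illustrate the point in item 1 about single parity staying valid when data disks change slots, here is a toy sketch. It assumes single parity behaves like a plain byte-wise XOR across the data disks; the three tiny "disks" and the helper name `xor_parity` are made up for the example and are not Unraid code.

```python
from functools import reduce
from itertools import permutations


def xor_parity(disks):
    """Byte-wise XOR across equally sized disk images (toy model of single parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*disks))


# Hypothetical 3-byte "disks", only for illustration.
disks = [b"\x01\x02\x03", b"\x10\x20\x30", b"\xaa\xbb\xcc"]

baseline = xor_parity(disks)

# XOR is commutative and associative, so every slot ordering gives the same parity.
same_for_all_orders = all(
    xor_parity(list(order)) == baseline for order in permutations(disks)
)
print(same_for_all_orders)  # True
```

This only models the order-independence of XOR. It says nothing about what happens when a data disk is placed in a parity slot, which is the risk the post warns about.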
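
For the ATA errors quoted in item 7, a quick way to see which port is failing, and on what kind of command, is to count the "failed command" lines per device. The sketch below is a rough helper under stated assumptions: the regex only matches the line format shown in that post, the function name `count_failed_commands` is made up, and the sample is a subset of the quoted lines.

```python
import re
from collections import Counter

# Matches only the "ataN.MM: failed command: ..." format seen in the quoted syslog.
FAILED_CMD = re.compile(r"kernel: (ata\d+\.\d+): failed command: (.+)$")


def count_failed_commands(lines):
    """Count failed ATA commands per (device, command) pair."""
    counts = Counter()
    for line in lines:
        match = FAILED_CMD.search(line.strip())
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# Subset of the lines quoted in item 7.
sample = [
    "Apr 30 02:22:04 Tower kernel: ata5.00: exception Emask 0x10 SAct 0xf000 SErr 0x190002 action 0xe frozen",
    "Apr 30 02:22:04 Tower kernel: ata5.00: failed command: READ FPDMA QUEUED",
    "Apr 30 02:22:04 Tower kernel: ata5.00: status: { DRDY }",
    "Apr 30 02:22:04 Tower kernel: ata5.00: failed command: READ FPDMA QUEUED",
    "Apr 30 02:52:30 Tower kernel: ata8.01: failed command: WRITE FPDMA QUEUED",
    "Apr 30 02:52:30 Tower kernel: res 40/00:90:80:91:78/00:00:83:00:00/40 Emask 0x4 (timeout)",
    "Apr 30 02:52:30 Tower kernel: ata8.01: failed command: WRITE FPDMA QUEUED",
    "Apr 30 02:52:30 Tower kernel: ata8.01: failed command: WRITE FPDMA QUEUED",
]

failed = count_failed_commands(sample)
for (device, command), n in sorted(failed.items()):
    print(device, command, n)
```

Mapping an `ataN` port back to a disk (parity, disk2) still has to be done from the rest of the diagnostics; this only tallies the errors.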
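
The check described in item 8 (a disk left out of the user share include list) can be sketched as a simple set comparison. The setting value is copied from that post; the list of array disks (disk1 through disk7) and the function name are assumptions made for the example.

```python
def disks_missing_from_include(include_value, array_disks):
    """Return array disks that are not named in a comma-separated include list."""
    included = {name.strip() for name in include_value.split(",") if name.strip()}
    return [disk for disk in array_disks if disk not in included]


# Value quoted in item 8.
share_user_include = "disk1,disk2,disk3,disk4,disk5,disk6"

# Assumption: the array has data disks disk1..disk7, as the post implies.
array_disks = [f"disk{n}" for n in range(1, 8)]

missing = disks_missing_from_include(share_user_include, array_disks)
print(missing)  # ['disk7']
```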
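
The reasoning in item 21 (two devices with different fsids, so not members of one pool) can be reproduced by grouping the kernel's btrfs scan lines by fsid. This is a minimal sketch: the regex matches only the line format quoted in that post (other kernel versions may print it differently), and `group_devices_by_fsid` is a made-up helper name.

```python
import re
from collections import defaultdict

# Matches only the scan-line format quoted in item 21.
BTRFS_SCAN = re.compile(
    r"BTRFS: device fsid (?P<fsid>[0-9a-f-]+) devid (?P<devid>\d+) "
    r"transid (?P<transid>\d+) (?P<dev>/dev/\S+)"
)


def group_devices_by_fsid(lines):
    """Map each btrfs fsid to the device nodes that reported it."""
    groups = defaultdict(list)
    for line in lines:
        match = BTRFS_SCAN.search(line)
        if match:
            groups[match.group("fsid")].append(match.group("dev"))
    return dict(groups)


# The two scan lines quoted in item 21.
sample = [
    "Apr 29 21:00:11 Tower kernel: BTRFS: device fsid bc84f358-49ca-4bfe-abf3-eaed0c9d7c53 devid 1 transid 298261 /dev/sdn1 scanned by udevd (1843)",
    "Apr 29 21:00:11 Tower kernel: BTRFS: device fsid 402fa753-15e2-450f-a062-090043d62a5c devid 1 transid 142621 /dev/sdh1 scanned by udevd (1843)",
]

pools = group_devices_by_fsid(sample)
for fsid, devices in pools.items():
    print(fsid, devices)
```

Devices belonging to the same multi-device filesystem would share one fsid and show up in the same list; here each fsid has a single device.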