JorgeB

Moderators
  • Posts

    67,411
  • Joined

  • Last visited

  • Days Won

    705

Everything posted by JorgeB

  1. That's basically it, you can remove disk5 before or after the parity sync.
  2. It appears to be a smartctl issue: newer Unraid releases use smartctl v7.1 instead of v7.0, and for some reason v7.1 is not getting the temps for those drives. It's not an Unraid problem; it would need to be fixed in smartctl, and likely will be in an upcoming release.
  3. You could for example get an LSI 9305-24i, but it could be much cheaper to use an 8 port LSI with a SAS expander.
  4. The dual controllers in each enclosure are for redundancy/multipath, and since Unraid doesn't support SAS multipath you basically have two connection options: 1) a single cable from the HBA to the first controller on the 1st enclosure, then daisy chain to the 2nd enclosure; 2) the best option for performance, one cable from each HBA port to each enclosure.
  5. IIRC those controllers have performance issues with Unraid, one of the recommended LSI HBAs would be preferred.
  6. You can replace the devices or ignore it for now, but it's a hardware problem, unrelated to your config or Unraid.
  7. Both enclosures are being detected correctly:
     Mar 8 04:54:20 Tower kernel: mpt2sas_cm0: LSISAS2008: FWVersion(05.00.13.00), ChipRevision(0x03), BiosVersion(07.05.04.00)
     Mar 8 04:54:20 Tower kernel: mpt2sas_cm0: Protocol=(
     Mar 8 04:54:20 Tower kernel: Initiator
     Mar 8 04:54:20 Tower kernel: ,Target
     Mar 8 04:54:20 Tower kernel: ),
     Mar 8 04:54:20 Tower kernel: Capabilities=(
     Mar 8 04:54:20 Tower kernel: TLR
     Mar 8 04:54:20 Tower kernel: ,EEDP
     Mar 8 04:54:20 Tower kernel: ,Snapshot Buffer
     Mar 8 04:54:20 Tower kernel: ,Diag Trace Buffer
     Mar 8 04:54:20 Tower kernel: ,Task Set Full
     Mar 8 04:54:20 Tower kernel: ,NCQ
     Mar 8 04:54:20 Tower kernel: )
     Mar 8 04:54:20 Tower kernel: scsi host1: Fusion MPT SAS Host
     Mar 8 04:54:20 Tower kernel: mpt2sas_cm0: sending port enable !!
     Mar 8 04:54:20 Tower kernel: mpt2sas_cm0: host_add: handle(0x0001), sas_addr(0x500605b0055d6fa0), phys(8)
     Mar 8 04:54:20 Tower kernel: mpt2sas_cm0: port enable: SUCCESS
     Mar 8 04:54:20 Tower kernel: scsi 1:0:0:0: Enclosure HP P2000 G3 SAS T200 PQ: 0 ANSI: 5
     Mar 8 04:54:20 Tower kernel: scsi 1:0:0:0: SSP: handle(0x0009), sas_addr(0x500c0ff1233b4000), phy(4), device_name(0x0000000000000000)
     Mar 8 04:54:20 Tower kernel: scsi 1:0:0:0: enclosure logical id (0x500605b0055d6fa0), slot(4)
     Mar 8 04:54:20 Tower kernel: scsi 1:0:0:0: Power-on or device reset occurred
     Mar 8 04:54:20 Tower kernel: scsi 1:0:0:0: Attached scsi generic sg2 type 13
     Mar 8 04:54:20 Tower kernel: mpt2sas_cm1: LSISAS2008: FWVersion(05.00.13.00), ChipRevision(0x03), BiosVersion(07.05.04.00)
     Mar 8 04:54:20 Tower kernel: mpt2sas_cm1: Protocol=(
     Mar 8 04:54:20 Tower kernel: Initiator
     Mar 8 04:54:20 Tower kernel: ,Target
     Mar 8 04:54:20 Tower kernel: ),
     Mar 8 04:54:20 Tower kernel: Capabilities=(
     Mar 8 04:54:20 Tower kernel: TLR
     Mar 8 04:54:20 Tower kernel: ,EEDP
     Mar 8 04:54:20 Tower kernel: ,Snapshot Buffer
     Mar 8 04:54:20 Tower kernel: ,Diag Trace Buffer
     Mar 8 04:54:20 Tower kernel: ,Task Set Full
     Mar 8 04:54:20 Tower kernel: ,NCQ
     Mar 8 04:54:20 Tower kernel: )
     Mar 8 04:54:20 Tower kernel: scsi host3: Fusion MPT SAS Host
     Mar 8 04:54:20 Tower kernel: mpt2sas_cm1: sending port enable !!
     Mar 8 04:54:20 Tower kernel: mpt2sas_cm1: host_add: handle(0x0001), sas_addr(0x500605b0029eafe0), phys(8)
     Mar 8 04:54:20 Tower kernel: mpt2sas_cm1: port enable: SUCCESS
     Mar 8 04:54:20 Tower kernel: scsi 3:0:0:0: Enclosure HP P2000 G3 SAS T200 PQ: 0 ANSI: 5
     Mar 8 04:54:20 Tower kernel: scsi 3:0:0:0: SSP: handle(0x0009), sas_addr(0x500c0ff1233b4400), phy(0), device_name(0x0000000000000000)
     Mar 8 04:54:20 Tower kernel: scsi 3:0:0:0: enclosure logical id (0x500605b0029eafe0), slot(0)
     Mar 8 04:54:20 Tower kernel: scsi 3:0:0:0: Power-on or device reset occurred
     Mar 8 04:54:20 Tower kernel: scsi 3:0:0:0: Attached scsi generic sg3 type 13
     This suggests the connections are good but the enclosure is not detecting the disk, though you should update the firmware on both LSIs since it is ancient; the current release is 20.00.07.00.
  8. We need the diags, most useful if grabbed after running a few days until it happens again, and before rebooting.
  9. It doesn't look like an Unraid problem, though it's strange that it was working before. Those disks are not reporting temp, e.g., disk2:
     === START OF READ SMART DATA SECTION ===
     SMART Health Status: OK
     Current Drive Temperature: 0 C
     Drive Trip Temperature: 0 C
     The other disks are, e.g., disk 10:
     === START OF READ SMART DATA SECTION ===
     SMART Health Status: OK
     Percentage used endurance indicator: 6%
     Current Drive Temperature: 30 C
     Drive Trip Temperature: 63 C
     That comes from the SMART data, and I can't see how the upgrade could affect it. Can you downgrade back to v6.8.2 to confirm? Please post new diags if it is working.
  10. We can't see what happened, but it was likely a controller error. When errors on multiple drives happen at the same time, Unraid will disable as many drives as there are parity devices, and which drives get disabled is luck of the draw. It's unlikely to be upgrade related, especially since, if I understood correctly, the problem was before rebooting, and the new release would only be loaded after the reboot, when the disks were already disabled. Just re-sync parity, and try to get the diags before rebuilding if it happens again.
  11. That's OK if copying from the cache device to a different share, or use /mnt/user0/USER. As for the cache, it is showing only checksum/corruption errors, so you should run a memtest.
  12. Which disk is that one? All array devices appear to be mounting correctly, though there are still ATA errors on at least two devices.
  13. You also have an overheating CPU, check cooling:
      Mar 5 08:11:15 Valyria kernel: CPU4: Core temperature above threshold, cpu clock throttled (total events = 1)
      Mar 5 08:11:15 Valyria kernel: CPU12: Core temperature above threshold, cpu clock throttled (total events = 1)
      Mar 5 08:11:15 Valyria kernel: CPU0: Package temperature above threshold, cpu clock throttled (total events = 1)
      Mar 5 08:11:15 Valyria kernel: CPU1: Package temperature above threshold, cpu clock throttled (total events = 1)
      Mar 5 08:11:15 Valyria kernel: CPU5: Package temperature above threshold, cpu clock throttled (total events = 1)
      Mar 5 08:11:15 Valyria kernel: CPU14: Package temperature above threshold, cpu clock throttled (total events = 1)
      Mar 5 08:11:15 Valyria kernel: CPU13: Package temperature above threshold, cpu clock throttled (total events = 1)
      Mar 5 08:11:15 Valyria kernel: CPU7: Package temperature above threshold, cpu clock throttled (total events = 1)
      Mar 5 08:11:15 Valyria kernel: CPU15: Package temperature above threshold, cpu clock throttled (total events = 1)
      Mar 5 08:11:15 Valyria kernel: CPU6: Package temperature above threshold, cpu clock throttled (total events = 1)
      Mar 5 08:11:15 Valyria kernel: CPU9: Package temperature above threshold, cpu clock throttled (total events = 1)
      Mar 5 08:11:15 Valyria kernel: CPU8: Package temperature above threshold, cpu clock throttled (total events = 1)
      Mar 5 08:11:15 Valyria kernel: CPU10: Package temperature above threshold, cpu clock throttled (total events = 1)
      Mar 5 08:11:15 Valyria kernel: CPU2: Package temperature above threshold, cpu clock throttled (total events = 1)
      Mar 5 08:11:15 Valyria kernel: CPU3: Package temperature above threshold, cpu clock throttled (total events = 1)
      Mar 5 08:11:15 Valyria kernel: CPU11: Package temperature above threshold, cpu clock throttled (total events = 1)
      Mar 5 08:11:15 Valyria kernel: CPU12: Package temperature above threshold, cpu clock throttled (total events = 1)
      Mar 5 08:11:15 Valyria kernel: CPU4: Package temperature above threshold, cpu clock throttled (total events = 1)
  14. I would guess disk3 is the most likely culprit, but it can get better at any point since these are usually bad areas, though it can also end up giving some read errors. Unrelated to this, some filesystems are complaining of low space:
      Mar 6 20:13:58 UNRAID kernel: Filesystem "md11": reserve blocks depleted! Consider increasing reserve pool size.
      Mar 6 20:13:58 UNRAID kernel: XFS (md11): Per-AG reservation for AG 0 failed. Filesystem may run out of space.
      Mar 6 20:13:58 UNRAID kernel: XFS (md11): Per-AG reservation for AG 1 failed. Filesystem may run out of space.
      Mar 6 20:13:58 UNRAID kernel: XFS (md11): Per-AG reservation for AG 2 failed. Filesystem may run out of space.
      There are various disks with less than 50MB free. You might have issues if you need, for example, to run a filesystem check on any of those, so you should leave a few GB free, like 10/20GB.
      Filesystem      Size  Used Avail Use% Mounted on
      /dev/md2        2.8T  2.8T  4.4M 100% /mnt/disk2
      /dev/md6        2.8T  2.8T   45M 100% /mnt/disk6
      /dev/md7        2.8T  2.8T   46M 100% /mnt/disk7
      /dev/md9        2.8T  2.8T   48M 100% /mnt/disk9
      /dev/md11       2.8T  2.8T   40M 100% /mnt/disk11
      /dev/md12       3.7T  3.7T   38M 100% /mnt/disk12
      /dev/md16       3.7T  3.7T   35M 100% /mnt/disk16
  15. Those ATA errors on disk2 are a hardware problem, likely a connection issue; check/replace both cables.
  16. With just the NVMe device I would use it as cache for now, so you could also cache some writes, assuming the available space is enough. But if you plan to add a cache pool soon, it's probably best to use it as an unassigned device from the start. Also, yes to the second question.
  17. Don't see any clues here, but the diags are from right after rebooting, and whatever happened likely happened before this. Assuming that was before the pool was re-formatted, I agree there's likely some hardware issue; look for a firmware update for the NVMe device.
  18. For some time now Unraid has kept the current profile when adding new devices, so it won't revert back to raid1. The pool is correctly configured for raid10, but as mentioned the usable space reported in the GUI will be wrong; this is a known btrfs issue when using different size devices.
  19. None. Yes, mostly because of the read errors on disk4 during initial parity sync.
  20. mdcmd status | egrep "mdResync"
      It will display some values, including:
      mdResync=total parity size
      mdResyncPos=current position
  21. Possibly, try a different one if available.
  22. You should create a bug report, my guess is that it's not converting TB to GB, and anyone with >= 1TB of RAM will have this issue.
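
For item 20, the two values can be turned into a progress percentage. The following is only a minimal sketch in Python: it assumes the `mdcmd status` output consists of plain `key=value` lines and that `mdResync` and `mdResyncPos` use the same unit, as the post implies. The function names are made up for illustration and are not part of Unraid.

```python
from typing import Dict, Optional


def parse_mdcmd_status(text: str) -> Dict[str, str]:
    """Parse key=value lines into a dict; lines without '=' are skipped.

    Assumption: `mdcmd status` prints one key=value pair per line.
    """
    values = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep:
            values[key] = value
    return values


def resync_percent(text: str) -> Optional[float]:
    """Return sync progress in percent, or None if it cannot be computed.

    mdResync is taken as the total size and mdResyncPos as the current
    position, as described in the post.
    """
    values = parse_mdcmd_status(text)
    try:
        total = int(values["mdResync"])
        pos = int(values["mdResyncPos"])
    except (KeyError, ValueError):
        return None
    if total <= 0:
        # Non-positive total: nothing to divide by, so no progress to report.
        return None
    return 100.0 * pos / total


# Made-up sample values, not real output from a server.
sample = "mdResync=1000\nmdResyncPos=250\n"
print(resync_percent(sample))
```

On a server, the text passed in would be the saved output of the command from item 20; the sample string above is invented purely to show the calculation.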