JorgeB

Moderators
  • Posts

    67,783
  • Joined

  • Last visited

  • Days Won

    708

Everything posted by JorgeB

  1. Unraid doesn't officially support sleep; it's very difficult to do with all the different types of hardware in use. There's a sleep plugin, and you can report any issues in the existing plugin support thread.
  2. If it's a Dell server see this, and still worth a try even if it isn't: https://forums.unraid.net/topic/119502-bzimage-checksum-error/?do=findComment&comment=1108354
  3. If you have a monitor connected or IPMI, please post a photo/screenshot of what's showing there.
  4. Yes, a BIOS update might also help, but first post new diags after this happens again so we can confirm if that is really the problem.
  5. Yes, start by updating the firmware and improving cooling on the HBA. These are designed for servers with very good cooling; when used in desktop cases they might need some active cooling, or at least a case with very good airflow.
  6. Diags are from after rebooting, so we can't see what happened. Main suspect would be the SATA controller; it's quite common to have issues with some Ryzen boards, especially under heavy load.
  7. Please post the diagnostics and the share name you're using.
  8. You should run a correcting check now; you can then run a non-correcting one after a couple of days to confirm there are no more issues.
  9. Disk dropped offline, that's why there's no SMART, check connections and/or power cycle the server and post new diags.
  10. Yes, without this step the replacement won't work, and the old device will still be wiped. If the old device was wiped by Unraid but nothing more was done, you should be able to recover the fs by running:

      btrfs-select-super -s 1 /dev/sdX1

      Replace X with the correct letter, and note the 1 at the end. If the command completes OK, assign that device and start the array; if it mounts, you can now try the replacement.
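The "note the 1 at the end" part is easy to get wrong, so here is a minimal Python sketch that only builds the argument list for the command above and refuses anything that isn't a first partition. The helper name `build_select_super_command` and the `/dev/sdX1` naming pattern are assumptions for illustration (NVMe-style names are not handled), and the function does not run anything.

```python
import re


def build_select_super_command(partition: str) -> list[str]:
    """Return the argv for the recovery command quoted in the post.

    Hypothetical helper: it only validates the path and builds the list,
    it never executes the command.
    """
    # Assumption: SATA/SAS style names, e.g. /dev/sdb1 or /dev/sdaa1.
    # The trailing "1" (first partition) is required, a bare /dev/sdb is rejected.
    if not re.fullmatch(r"/dev/sd[a-z]+1", partition):
        raise ValueError(
            f"expected a first partition such as /dev/sdb1, got {partition!r}"
        )
    return ["btrfs-select-super", "-s", "1", partition]
```

For example, `build_select_super_command("/dev/sdb1")` gives the argument list for that device, while `build_select_super_command("/dev/sdb")` raises `ValueError`.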
  11. Not sure why you are asking me? I can't help with this since I don't use Macs, but your issue might be different from the one here. You should create a new bug report with a complete description of the problem you're having, and don't forget the diagnostics.
  12. The problem is that you're using very old firmware on the LSI HBA. Those ancient firmware versions used the long name for SATA devices; this was fixed a long time ago, and it has now also been fixed in the driver for people still using old firmware. You just need to do a new config and re-assign all the devices, then check "parity is already valid" before starting the array. I would also recommend updating the LSI firmware to the latest version; note that if you update the firmware first you'll see the same issue with -rc4 that you're seeing now with v6.10.1.
  13. Click on cache on the main page, then scroll down to the scrub section.
  14. Without pre-existing checksums for the files there's not much you can do other than correct parity. Any files that are corrupt in the pool will be listed in the syslog after the scrub.
  15. You should now also run a correcting scrub on the pool.
  16. Now it's disabled. There are likely two different settings in the BIOS; the one you disabled is for VT-x. You can re-enable that and still run VMs as long as no hardware is being passed through (for that you need VT-d).
  17. It was still enabled in the last diags posted.
  18. It might be called something different, like Intel Virtualization Technology or similar. Alternatively, add intel_iommu=off to the syslinux.cfg append line. In either case, check after booting that IOMMU is really disabled: click on system info, top right of the GUI. You should not use the server until that's done.
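For anyone who prefers to see what the syslinux.cfg change looks like, here is a rough Python sketch that appends `intel_iommu=off` to the append line of the default boot entry. The sample config layout, the `menu default` marker used to find the default entry, and the function name `add_kernel_option` are assumptions for illustration; check them against your own file before relying on this.

```python
# Assumed sample layout; a real syslinux.cfg may differ.
SAMPLE = """\
label Unraid OS
  menu default
  kernel /bzimage
  append initrd=/bzroot
label Unraid OS GUI Mode
  kernel /bzimage
  append initrd=/bzroot,/bzroot-gui
"""


def add_kernel_option(cfg_text: str, option: str = "intel_iommu=off") -> str:
    """Add `option` to the append line of the label block marked 'menu default'."""
    lines = cfg_text.splitlines()
    # Line indices where each "label" block starts, plus an end sentinel.
    starts = [i for i, l in enumerate(lines) if l.strip().lower().startswith("label ")]
    starts.append(len(lines))
    for begin, end in zip(starts, starts[1:]):
        block = range(begin, end)
        # Only touch the block flagged as the default boot entry.
        if not any(lines[i].strip().lower() == "menu default" for i in block):
            continue
        for i in block:
            words = lines[i].split()
            # Skip the line if the option is already present (idempotent).
            if words and words[0].lower() == "append" and option not in words[1:]:
                lines[i] = lines[i].rstrip() + " " + option
    return "\n".join(lines) + "\n"
```

Calling `add_kernel_option(SAMPLE)` changes only the append line of the first entry and leaves the GUI Mode entry alone. Running it twice gives the same result as running it once.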
  19. TL;DR: I would recommend only running v6.10.x on a server with a Broadcom NIC that uses the tg3 driver if VT-d/IOMMU is disabled, or it might in some cases cause serious stability issues, including possible filesystem corruption.

      Another update since this is an important issue: there's a new case with an IBM/Lenovo X3100 M5 server. This server uses the same NIC driver as the HP, so this appears to confirm the problem is the NIC/NIC driver when IOMMU is enabled.

      Known problematic NICs:

      HP Microserver Gen8:

        03:00.0 Ethernet controller [0200]: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe [14e4:165f]
          DeviceName: NIC Port 1
          Subsystem: Hewlett-Packard Company NC332i Adapter [103c:2133]
          Kernel driver in use: tg3

      IBM/Lenovo X3100 M5:

        06:00.0 Ethernet controller [0200]: Broadcom Inc. and subsidiaries NetXtreme BCM5717 Gigabit Ethernet PCIe [14e4:1655] (rev 10)
          DeviceName: Broadcom 5717
          Subsystem: IBM NetXtreme BCM5717 Gigabit Ethernet PCIe [1014:0490]
          Kernel driver in use: tg3

      HP ProLiant ML350p Gen8:

        02:00.0 Ethernet controller [0200]: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe [14e4:1657] (rev 01)
          DeviceName: NIC Port 1
          Subsystem: Hewlett-Packard Company NetXtreme BCM5719 Gigabit Ethernet PCIe [103c:3372]
          Kernel driver in use: tg3

      This driver supports many different NICs. It's unclear for now if all are affected or just some, and also unclear if AMD-based servers with AMD-Vi/IOMMU enabled are affected, but for now I would recommend only running v6.10.x on a server with a Broadcom NIC that uses this driver if VT-d/IOMMU is disabled, or it might in some cases cause serious stability issues, including possible filesystem corruption.

      When there is a problem with one of these NICs and VT-d, you should see multiple errors similar to the ones below in the logs not long after booting, usually before a couple of hours of uptime:

        May 21 15:53:05 Tower kernel: DMAR: ERROR: DMA PTE for vPFN 0xb0780 already set (to b0780003 not 28dc74801)
        May 21 15:53:05 Tower kernel: ------------[ cut here ]------------
        May 21 15:53:05 Tower kernel: WARNING: CPU: 1 PID: 557 at drivers/iommu/intel/iommu.c:2408 __domain_mapping+0x2e5/0x390

      If you see that, stop using the server and disable VT-d/IOMMU ASAP. There's no need to disable VT-x/HVM, i.e., you can still run VMs (but without VT-d/IOMMU you can't pass through any device to one).

      For Intel CPUs, VT-d can usually be disabled in the BIOS. Alternatively, you can add intel_iommu=off to the syslinux.cfg append line: on the main GUI page click on flash and scroll down to "Syslinux Configuration", then add it to the default boot option (the one in green).

      In either case, confirm it's really disabled; you can do that by clicking on "system information", top right of the GUI.

      Original post here: https://forums.unraid.net/topic/123620-unraid-os-version-6100-available/?do=findComment&comment=1128822
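If you'd rather not eyeball the log, a small Python sketch like the one below can pick out the DMAR error lines quoted above. The pattern is based only on the sample log line in this post, and the function names and the `/var/log/syslog` path mentioned afterwards are assumptions, so treat it as a starting point rather than a complete detector.

```python
import re

# Pattern derived from the sample line in the post; other DMAR messages
# may be worded differently and would not match.
DMAR_ERROR = re.compile(r"kernel: DMAR: ERROR: DMA PTE for vPFN \S+ already set")


def find_dmar_errors(log_lines):
    """Return the log lines that look like the DMAR error from the post."""
    return [line.rstrip("\n") for line in log_lines if DMAR_ERROR.search(line)]


def scan_file(path):
    """Convenience wrapper: scan a log file on disk (path is up to the caller)."""
    with open(path, errors="replace") as handle:
        return find_dmar_errors(handle)
```

Something like `scan_file("/var/log/syslog")` could then be run on the server (assuming that is where the syslog lives). Any non-empty result means the advice above applies: stop using the server and disable VT-d/IOMMU.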
  20. You should disable Intel VT-d ASAP, looks like the same issue that affects the HP Microserver Gen8, likely because they use the same NIC driver, more info below: https://forums.unraid.net/topic/123620-unraid-os-version-6100-available/page/8/#comment-1129501
  21. Run it again without -n or nothing will be done.
  22. Disable VT-d in the BIOS. That error has likely been there since updating to v6.10, and it can cause other issues; more info below: https://forums.unraid.net/topic/123620-unraid-os-version-6100-available/?do=findComment&comment=1128822
  23. Thanks, diags confirm that the Samsung device names changed due to a change starting with -rc8: there's now just one underscore before the serial, where before there were two. You can correct this by doing the following:
      - unassign all devices from those two pools
      - start the array to make Unraid "forget" the old device names
      - stop the array
      - re-assign the devices to both pools, double-checking that you're assigning the original devices to each pool
      - start the array; existing pools will be imported and the new names saved
  24. One more update: @RikStigter is helping me confirm whether, as suspected, having IOMMU enabled on these servers is the source of the problem. Preliminary results look positive, since the usual errors logged on them after updating to v6.10 are gone with VT-d disabled; he will now use the server normally for a few days so we can confirm it remains all good. The issue is possibly caused by the onboard NICs when VT-d is enabled. I can't tell you if it's an HP problem or some Linux issue with the new kernel, and certainly nothing suggests an Unraid problem, but hopefully disabling VT-d fixes this for now. Again, servers with a Pentium or i3 CPU shouldn't have this issue since they don't support VT-d, though I would still recommend disabling it in the BIOS, since apparently it's enabled by default; that way, if they are later upgraded to a Xeon and this issue still exists, there won't be a problem.
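Since the common factor in these reports is the NIC driver, it can help to check whether any device on a server is bound to tg3. The following is a minimal Python sketch that parses `lspci -nnk` style text (an unindented device line followed by indented detail lines) and returns the devices whose "Kernel driver in use" matches. The function name is hypothetical, and the parsing assumes the usual lspci layout.

```python
def devices_using_driver(lspci_text: str, driver: str = "tg3") -> list[str]:
    """Return device lines whose 'Kernel driver in use' entry equals `driver`.

    Assumes lspci-style text: each device starts on an unindented line and
    its details follow on indented lines.
    """
    matches = []
    current = None  # most recent unindented device line
    for line in lspci_text.splitlines():
        if not line.strip():
            continue  # blank separator between devices
        if not line[0].isspace():
            current = line.strip()
        elif current and line.strip() == f"Kernel driver in use: {driver}":
            matches.append(current)
    return matches
```

On a live system the input could come from something like `subprocess.run(["lspci", "-nnk"], capture_output=True, text=True).stdout`. If the returned list is non-empty on an Intel server running v6.10.x, that's a good reason to confirm VT-d is disabled.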