Everything posted by JorgeB
-
[6.10.1] - After wakeup from sleep: NO GUI, NO SSH, NO PING
JorgeB commented on Zonediver's report in Stable Releases
Unraid doesn't officially support sleep; it's very difficult to do with all the different types of hardware used. There's a sleep plugin, and you can report any issues in the existing plugin support thread.
-
If it's a Dell server see this, and still worth a try even if it isn't: https://forums.unraid.net/topic/119502-bzimage-checksum-error/?do=findComment&comment=1108354
-
If you have a monitor connected or IPMI, please post a photo/screenshot of what's showing there.
-
[6.10.1] failed to connect to the hypervisor after 6.10 upgrade
JorgeB commented on gemino's report in Stable Releases
Is it the same if you boot in safe mode?
-
Disk errors on all drives except NVME at random
JorgeB replied to jettrainz's topic in General Support
Yes, a BIOS update might also help, but first post new diags after this happens again so we can confirm whether that is really the problem.
-
Yes, start by updating the firmware and improving cooling on the HBA. These are designed for servers with very good cooling; when used in desktop cases they might need some active cooling, or at least a case with very good airflow.
-
Disk errors on all drives except NVME at random
JorgeB replied to jettrainz's topic in General Support
Diags are from after rebooting, so we can't see what happened. The main suspect would be the SATA controller; it's quite common to have issues with some Ryzen boards, especially under heavy load.
-
Share Allocation, Fill-up is not working as I expected.
JorgeB replied to JKY's topic in General Support
Please post the diagnostics and the share name you're using.
-
You should run a correcting check now; you can then run a non-correcting one after a couple of days to confirm there are no more issues.
-
The disk dropped offline, which is why there's no SMART report. Check the connections and/or power cycle the server, then post new diags.
-
[SOLVED] Unmountable disk when trying to replace a cache pool disk
JorgeB replied to Jaybau's topic in General Support
Yes, without this step the replacement won't work, and the old device will still be wiped. If the old device was wiped by Unraid but nothing more was done, you should be able to recover the fs by running:
btrfs-select-super -s 1 /dev/sdX1
Replace X with the correct letter, and note the 1 at the end. If the command completes OK, assign that device and start the array; if it mounts, you can now try the replacement.
-
The problem is that you're using very old firmware for the LSI HBA. Those ancient firmware versions used the long name for SATA devices. This was fixed a long time ago, and now it has also been fixed in the driver for people still using old firmware. You just need to do a new config and re-assign all the devices, then check "parity is already valid" before starting the array. I would also recommend updating the LSI firmware to the latest version; note that if you update the firmware first, you'll see the same issue with -rc4 that you're seeing now with v6.10.1.
-
Click on cache on the main page, then scroll down to the scrub section.
-
Without pre-existing checksums for the files there's not much you can do other than correct parity. Any files that are corrupt in the pool will be listed in the syslog after the scrub.
-
You should now also run a correcting scrub on the pool.
-
Now it's disabled. There are likely two different settings in the BIOS, and the one you disabled is for VT-x. You can re-enable that and still run VMs as long as no hardware is being passed through (for that you need VT-d).
-
It was still enabled in the last diags posted.
-
It might be called something different, like Intel Virtualization Technology or similar. Alternatively, add intel_iommu=off to the syslinux.cfg append line. In either case, check after booting that IOMMU is really disabled by clicking on system info, top right of the GUI. You should not use the server until that's done.
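The syslinux.cfg change mentioned above can be sketched in Python. This is only an illustration, not an Unraid tool: the function name `add_kernel_param` and the sample config text are assumptions, and the sample assumes the usual layout where each boot entry is a `label` block and the default entry contains a `menu default` line. On a real server the same edit can be made by hand.

```python
def add_kernel_param(cfg_text: str, param: str = "intel_iommu=off") -> str:
    """Return cfg_text with `param` added to the append line of the default entry."""
    lines = cfg_text.splitlines()
    # Line indices where each "label" block starts, plus an end sentinel.
    starts = [i for i, l in enumerate(lines) if l.strip().lower().startswith("label ")]
    bounds = starts + [len(lines)]
    for start, end in zip(bounds, bounds[1:]):
        block = range(start, end)
        # Only touch the block that is marked as the default boot option.
        if not any(lines[i].strip().lower() == "menu default" for i in block):
            continue
        for i in block:
            tokens = lines[i].split()
            if tokens and tokens[0].lower() == "append" and param not in tokens:
                lines[i] = lines[i].rstrip() + " " + param
        break
    return "\n".join(lines) + "\n"


# Hypothetical sample config, for illustration only.
SAMPLE_CFG = """default menu.c32
prompt 0
timeout 50
label Unraid OS
  menu default
  kernel /bzimage
  append initrd=/bzroot
label Unraid OS Safe Mode (no plugins, no GUI)
  kernel /bzimage
  append initrd=/bzroot unraidsafemode
"""

if __name__ == "__main__":
    print(add_kernel_param(SAMPLE_CFG))
```

Only the default entry is changed, and running the function twice does not add the parameter a second time.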
-
TL;DR: I would recommend only running v6.10.x on a server with a Broadcom NIC that uses the tg3 driver if VT-d/IOMMU is disabled, or it might in some cases cause serious stability issues, including possible filesystem corruption.

Another update, since this is an important issue: there's a new case with an IBM/Lenovo X3100 M5 server. This server uses the same NIC driver as the HP, so this appears to confirm the problem is the NIC/NIC driver when IOMMU is enabled.

Known problematic NICs:

HP Microserver Gen8:
03:00.0 Ethernet controller [0200]: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe [14e4:165f]
DeviceName: NIC Port 1
Subsystem: Hewlett-Packard Company NC332i Adapter [103c:2133]
Kernel driver in use: tg3

IBM/Lenovo X3100 M5:
06:00.0 Ethernet controller [0200]: Broadcom Inc. and subsidiaries NetXtreme BCM5717 Gigabit Ethernet PCIe [14e4:1655] (rev 10)
DeviceName: Broadcom 5717
Subsystem: IBM NetXtreme BCM5717 Gigabit Ethernet PCIe [1014:0490]
Kernel driver in use: tg3

HP ProLiant ML350p Gen8:
02:00.0 Ethernet controller [0200]: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe [14e4:1657] (rev 01)
DeviceName: NIC Port 1
Subsystem: Hewlett-Packard Company NetXtreme BCM5719 Gigabit Ethernet PCIe [103c:3372]
Kernel driver in use: tg3

This driver supports many different NICs, and it's unclear for now whether all are affected or just some. It's also unclear whether AMD-based servers with AMD-Vi/IOMMU enabled are affected, but for now I would recommend only running v6.10.x on a server with a Broadcom NIC that uses this driver if VT-d/IOMMU is disabled, or it might in some cases cause serious stability issues, including possible filesystem corruption.

When there is a problem with one of these NICs and VT-d, you should see multiple errors similar to the ones below in the logs not long after booting, usually before a couple of hours of uptime:

May 21 15:53:05 Tower kernel: DMAR: ERROR: DMA PTE for vPFN 0xb0780 already set (to b0780003 not 28dc74801)
May 21 15:53:05 Tower kernel: ------------[ cut here ]------------
May 21 15:53:05 Tower kernel: WARNING: CPU: 1 PID: 557 at drivers/iommu/intel/iommu.c:2408 __domain_mapping+0x2e5/0x390

If you see that, stop using the server and disable VT-d/IOMMU ASAP. There's no need to disable VT-x/HVM, i.e., you can still run VMs (but without VT-d/IOMMU you can't pass through any device to one).

For Intel CPUs, VT-d can usually be disabled in the BIOS. Alternatively, you can add intel_iommu=off to the syslinux.cfg append line: on the main GUI page click on flash and scroll down to "Syslinux Configuration", then add it to the default boot option (the one in green).

In either case confirm it's really disabled; you can do that by clicking on "system information", top right of the GUI.

Original post here: https://forums.unraid.net/topic/123620-unraid-os-version-6100-available/?do=findComment&comment=1128822
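To check whether a server has a device bound to this driver, the kind of lspci listing quoted above can be filtered on its "Kernel driver in use" line. The following is a minimal Python sketch under stated assumptions: the function name `devices_using_driver` is made up for illustration, the input is assumed to be saved lspci text where each device starts on an unindented line and its details are indented, and the second device in the sample is hypothetical.

```python
def devices_using_driver(lspci_text: str, driver: str = "tg3") -> list[str]:
    """Return the header line of every device block bound to `driver`."""
    matches = []
    current = None  # header line of the device block currently being read
    for line in lspci_text.splitlines():
        if line and not line[0].isspace():
            current = line.strip()  # unindented line starts a new device
        elif current and line.strip() == f"Kernel driver in use: {driver}":
            matches.append(current)
    return matches


# First block is taken from the post; the second device is hypothetical.
SAMPLE_LSPCI = (
    "03:00.0 Ethernet controller [0200]: Broadcom Inc. and subsidiaries "
    "NetXtreme BCM5720 Gigabit Ethernet PCIe [14e4:165f]\n"
    "\tDeviceName: NIC Port 1\n"
    "\tSubsystem: Hewlett-Packard Company NC332i Adapter [103c:2133]\n"
    "\tKernel driver in use: tg3\n"
    "00:1f.2 SATA controller [0106]: Example Vendor Example Device [0000:0000]\n"
    "\tKernel driver in use: ahci\n"
)

if __name__ == "__main__":
    for header in devices_using_driver(SAMPLE_LSPCI):
        print(header)
```

A match only shows that the driver is in use; as noted above, it is not yet clear whether every NIC supported by tg3 is affected.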
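A saved syslog can be checked for the DMAR error quoted above with a short script. This is a hedged Python sketch, not part of Unraid: the name `find_dmar_errors` and the regular expression are assumptions based only on the log lines in this post, so other variants of the message would not be matched.

```python
import re

# Pattern derived from the single error line quoted in the post (an assumption).
DMAR_ERROR = re.compile(
    r"kernel: DMAR: ERROR: DMA PTE for vPFN (0x[0-9a-f]+) already set"
)


def find_dmar_errors(log_text: str) -> list[str]:
    """Return the vPFN value from each DMAR PTE error found in log_text."""
    return [m.group(1) for m in DMAR_ERROR.finditer(log_text)]


# Log lines copied from the post.
SAMPLE_LOG = """May 21 15:53:05 Tower kernel: DMAR: ERROR: DMA PTE for vPFN 0xb0780 already set (to b0780003 not 28dc74801)
May 21 15:53:05 Tower kernel: ------------[ cut here ]------------
May 21 15:53:05 Tower kernel: WARNING: CPU: 1 PID: 557 at drivers/iommu/intel/iommu.c:2408 __domain_mapping+0x2e5/0x390
"""

if __name__ == "__main__":
    hits = find_dmar_errors(SAMPLE_LOG)
    print(f"{len(hits)} DMAR error(s) found")
```

Any hit is a reason to stop using the server and disable VT-d/IOMMU as described above.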
-
You should disable Intel VT-d ASAP. This looks like the same issue that affects the HP Microserver Gen8, likely because they use the same NIC driver. More info below: https://forums.unraid.net/topic/123620-unraid-os-version-6100-available/page/8/#comment-1129501
-
Run it again without -n or nothing will be done.
-
Disable VT-d in the BIOS. That error has likely been there since updating to v6.10, and it can cause other issues. More info below: https://forums.unraid.net/topic/123620-unraid-os-version-6100-available/?do=findComment&comment=1128822
-
Thanks, diags confirm that the Samsung device names changed due to a change starting with -rc8: there's now just one underscore before the serial, where before there were two. You can correct this by doing the following:
- unassign all devices from those two pools
- start the array to make Unraid "forget" the old device names
- stop the array
- re-assign the devices to both pools, double-checking that you're assigning the original devices to each pool
- start the array; existing pools will be imported and the new names saved
-
One more update: @RikStigter is helping me confirm whether, as suspected, having IOMMU enabled on these servers is the source of the problem. Preliminary results look positive, since the usual errors logged on them after updating to v6.10 are gone with VT-d disabled. He will now use the server normally for a few days so we can confirm that it remains all good. The issue is possibly caused by the onboard NICs when VT-d is enabled. I can't tell you if it's an HP problem or some Linux issue with the new kernel, and certainly nothing suggests an Unraid problem, but hopefully disabling VT-d fixes this for now. Again, servers with a Pentium or i3 CPU shouldn't have this issue since they don't support VT-d, though I would still recommend disabling it in the BIOS, since apparently it's enabled by default; that way, if they are later upgraded to a Xeon and this issue still exists, there won't be a problem.