Unraid OS version 6.10.2 available



49 minutes ago, John_M said:

 

Watching from the sideline because my Gen8 still has its original non VTd-capable Celeron processor, but wouldn't a better solution be to disable VT-d automatically via syslinux.cfg when the problematic configuration is detected instead of disabling the NIC? It would still take some users by surprise, of course, but at least they'd still be able to connect to their servers.

AFAIK it's not possible to programmatically disable VT-d.  The way the kernel initializes is based on whether VT-d is enabled or not.

 

The current approach was taken in an abundance of caution.  Going into a 3-day holiday here in the US I decided it's better for users to lose network connection (which I agree sucks) than to suffer data loss, when we know about possible data loss (that would suck even more).

 

I've just added some code to the downloaded 'unRAIDServer.plg' file that will detect the combination of the 'tg3' module loaded and VT-d enabled, and will bail out of the upgrade unless the ./config/modprobe.d/tg3.conf file exists.  This should greatly help those upgrading, but new users on an affected platform will still see no ethernet.

 

It is going to take us some time to get this fixed; we will probably have to go purchase a known-affected platform.  The issue is acknowledged here:

https://support.hpe.com/hpesc/public/docDisplay?docId=emr_na-c04565693

 

Why this has suddenly happened is a mystery.

 

 

  • Like 1
  • Thanks 4
Link to comment
11 minutes ago, limetech said:

I've just added some code to the downloaded 'unRAIDServer.plg' file that will detect the combination of the 'tg3' module loaded and VT-d enabled, and will bail out of the upgrade unless the ./config/modprobe.d/tg3.conf file exists.

Be sure to provide some way for those of us who are trying to provide support by changing the version number (6.10.2a or 6.10.3).  There is already enough confusion over this issue. 

Link to comment
4 hours ago, Frank1940 said:

Be sure to provide some way for those of us who are trying to provide support by changing the version number (6.10.2a or 6.10.3).  There is already enough confusion over this issue. 

 

When you click 'Check for Updates' it downloads the 'unRAIDServer.plg' file from our download server.  When this file is 'executed' and detects tg3 present and IOMMU enabled, it does this:

 

      echo "NOTE: combination of NIC using tg3 driver and Intel VT-d enabled may cause DATA CORRUPTION on some platforms."
      echo "Please disable VT-d in BIOS or pass 'intel_iommu=off' on syslinux kernel append line."
      echo "Alternately create 'config/modprobe.d/tg3.conf' file:"
      echo "  touch /boot/config/modprobe.d/tg3.conf  # if your platform is not affected"
      echo "or"
      echo "  echo 'blacklist tg3' > /boot/config/modprobe.d/tg3.conf  # to blacklist the tg3 driver"
      echo
      exit 1

 

The script only checks for the existence of the modprobe.d/tg3.conf file, not its content.  Hence the user can choose to blacklist or not.
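
For anyone who wants to run the same kind of check by hand before upgrading, here is a minimal sketch of the logic described above. It is not the code from 'unRAIDServer.plg'. The function name tg3_upgrade_check and its arguments are made up for illustration, and it assumes that a loaded module is listed in /proc/modules and that an active IOMMU shows up as entries under /sys/class/iommu.

```shell
#!/bin/sh
# Hypothetical pre-upgrade check; NOT the code from unRAIDServer.plg.
# Arguments (the defaults are assumptions about a running system):
#   $1  module list file       (default /proc/modules)
#   $2  IOMMU sysfs directory  (default /sys/class/iommu)
#   $3  override file          (default /boot/config/modprobe.d/tg3.conf)
# Returns 0 if the upgrade may proceed, 1 if it should bail out.
tg3_upgrade_check() {
  modules="${1:-/proc/modules}"
  iommu_dir="${2:-/sys/class/iommu}"
  conf="${3:-/boot/config/modprobe.d/tg3.conf}"

  # tg3 not loaded: nothing to check.
  grep -q '^tg3 ' "$modules" 2>/dev/null || return 0

  # No IOMMU entries: treated here as VT-d being off.
  [ -n "$(ls -A "$iommu_dir" 2>/dev/null)" ] || return 0

  # Only the existence of the file matters, not its content.
  [ -e "$conf" ] && return 0

  echo "tg3 loaded and IOMMU active, but $conf is missing" >&2
  return 1
}
```

Calling tg3_upgrade_check with no arguments uses the default paths; a non-zero return corresponds to the 'exit 1' case shown above.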

  • Like 2
  • Thanks 1
Link to comment
17 minutes ago, limetech said:

 

Thanks for that. The document suggests, as an alternative to disabling IOMMU,

 

Quote

Disable HP Shared Memory in the network adapter Option ROM

 

and gives instructions on how to do it. Maybe someone with affected hardware and who is prepared to take a risk could give that a try?

 

  • Thanks 1
Link to comment

Well, wait a minute.

I had blanked out the tg3.conf on my Dell R710, which uses the bnx2 module. Before blanking it out, I was getting lots (screenfuls) of DMAR errors. Now I get 2 on boot but then none (so far) after that.

Shouldn't I have had NO errors? Is there something else going on here?
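
One way to put a number on "lots" versus "2 on boot" is to count the matching lines in the system log. This is only a sketch: the helper name count_dmar_errors is hypothetical, /var/log/syslog is assumed to be where the log lives, and the "DMAR: ERROR" text is taken from the error message quoted elsewhere in this thread, so other DMAR fault messages would not be counted.

```shell
#!/bin/sh
# Hypothetical helper: count "DMAR: ERROR" lines in a log file.
# $1 is the log to scan (default /var/log/syslog, an assumption).
count_dmar_errors() {
  log="${1:-/var/log/syslog}"
  # grep -c prints the number of matching lines. It exits non-zero
  # when that number is 0, so the exit status is ignored here.
  grep -c 'DMAR: ERROR' "$log" 2>/dev/null || true
}
```

Running it once right after boot and again some hours later would show whether the count is still growing.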

Link to comment
1 hour ago, limetech said:
      echo "NOTE: combination of NIC using tg3 driver and Intel VT-d enabled may cause DATA CORRUPTION"
      echo "Please disable VT-d in BIOS or pass 'intel_iommu=off' on syslinux kernel append line."
      echo "Alternately create 'config/modprobe.d/tg3.conf' file:"
      echo "  touch /boot/config/modprobe.d/tg3.conf  # if your platform is not affected"
      echo "or"
      echo "  echo 'blacklist tg3' > /boot/config/modprobe.d/tg3.conf  # to blacklist the tg3 driver"
      echo
      exit 1

 

Am I correct in assuming that the upgrade box will remain open with the text displayed and the actual upgrade process terminated (the normal expectation for a script with an exit status of '1')?

Link to comment

After reading this forum post I am now too scared to update Unraid. It is very worrying if the choice after an update is between no network and possible data loss. The update procedure is already risky at times: cloning a USB stick means the original is blacklisted, so reverting is not easy (the cloned stick is useless), and now we are heading into territory where specific hardware can cause catastrophic service impact from something as simple as a NIC.

 

Obviously I am not trying to say anyone is to blame, nor to suggest I have a better solution to the problem, but as a user of Unraid I can see how this would cause the reactions I am seeing on the forums. It is very much a product that encourages consolidating many services onto one dedicated box, so when that box is broken, many different parts of the network can be affected: NAS, DHCP, DNS, web services, etc. Not everyone is a forum user, and I feel this could be better communicated, or something could be built into the update procedure, like a "known issues, tick this box to proceed with the upgrade" step. When you update it would say something like "some users may lose connectivity; click I agree here to accept this, continue the update, and confirm you have read what to do if this affects you (link to instructions here)". (Not claiming this is the best solution, just saying what came into my head upon reading this.)

Edited by PeteAsking
  • Upvote 1
Link to comment

When editing a docker container, Unraid does not remember that I picked the "Advanced View" anymore.

Instead it always shows me the basic view.

Is this intentional?
 

It's a bit cumbersome having to click this every time, as I regularly use/modify the fields that are hidden in basic view.

 


Link to comment
1 hour ago, PeteAsking said:

After reading this forum post I am now too scared to update unraid. Very worrying if we are between no network or possible data loss after an update.

If you're already on an identified hardware setup and on 6.10 or 6.10.1 you're already in a possible data loss situation. The data loss is not tied to 6.10.2 upgrade.

  • Like 1
Link to comment
58 minutes ago, bonienl said:

 

Yes, configuration is always opened in basic mode.

 

 

Ah I realized that template authoring mode got disabled for some reason.

Probably had to do with me restoring a bunch of settings from backup due to fixing another issue.

 

Thanks for the reply nonetheless!

Link to comment
2 hours ago, PeteAsking said:

After reading this forum post I am now too scared to update unraid.

Right there with you. I just went back to 6.9.2 until this all gets sorted out. Can't say I was comfortable with the thought that I might have data corruption. I also had a problem with My Servers, an unraid-api error, that is resolved now that I am back on 6.9.2.

 

Thanks and good luck to the team investigating this DMAR error situation!

  • Like 2
Link to comment

(Not sure if we can just be candid and post our thoughts, but assuming it's OK.)

 

I've been thinking, and I am unsure if my worries are unfounded, but if people avoid upgrading because the process is considered a possible risk, then it could be that not many people actually try the RC versions of releases. This might make Limetech's job difficult when releasing a new version. Maybe we as members of the Unraid community should try to form a beta testing task force that could assist Limetech, working within constraints they provide.
 

At the moment my impression (which could be incorrect) is that it's more of a passive testing. As in, the RC release is posted to the forum and anyone who feels like it can try it out and provide feedback. This passive testing might not be as effective as, say, a committed group of around 100 people who pledge to update and email their logs, along with relevant hardware info, for Limetech to review even if no real issues are detected at all. That would provide many different real systems, both with and without issues, for them to compare and look at. The other problem is that many of us (like me) might not know what errors to really look for, even if things appear to be working in the short term, but a once-over from Limetech might uncover inconsistencies in logs we don't fully appreciate and provide a more active 'search and identify' type of beta testing path.
 

If something like that was desirable, then I'm sure a bunch of us could get together and step up to commit to testing RC versions and giving any info and logs to Limetech to review. I'm pretty sure that, as long as there was a semi-reasonable way to revert in case of a non-bootable or unusable system, a lot of people could pledge to commit to testing. It might be a fun thing for a group of us to get together and do. Not sure what other people think; I'm just saying we all hang around the forums anyway. If everyone disagrees that's OK too, I was just saying what came into my head. I'm not telling anyone what to do.

  • Like 1
Link to comment
10 minutes ago, trurl said:

Reverting is easy and cloning flash is unnecessary 

If the system does not boot, reverting is not as easy as unplugging a stick and plugging in the cloned stick, regardless of how easy it is claimed to be. Unplugging one thing and plugging in another is literally the easiest conceivable option in a disaster situation where your entire network is down with no internet because the single device that hosts everything is down. Just saying. That is why enterprise equipment has a flash and a backup flash, and you can select which one to boot from at startup, e.g. Netgear switches and whatever else has a dual-boot type thing. I think even some home motherboards have a dual BIOS. Same reasoning. It's just how other people solve the problem with a 100% guaranteed rollback, since a different chip, a different flash, or a different cloned stick is being relied upon. I feel you have to go the extra mile to reassure people that rollback can't fail, rather than just saying 'yeah, you read these instructions and it's super easy to follow if something breaks', as that relies on the user not being an idiot like me to complete the task correctly. There is also a time element. Follow instructions = time. Unplug one thing and plug in another = 30 seconds. Very different scenario in a time-sensitive window.

Edited by PeteAsking
  • Like 1
Link to comment

To revert, just copy the contents of the 'previous' folder on the flash drive to the top level. Or, if you have a recent flash backup, you can install any version of Unraid you want on that original flash drive and get your configuration from the config folder of your backup.
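
As a rough sketch of the first option from a shell prompt: the function name revert_to_previous is made up for illustration, and it assumes the flash drive is mounted at /boot and that 'previous' holds the old release as plain files directly inside it. Treat it as an illustration of the idea rather than an official procedure.

```shell
#!/bin/sh
# Hypothetical helper: copy the files in <flash>/previous over the
# top level of the flash drive. $1 is the flash mount point
# (default /boot, an assumption).
revert_to_previous() {
  flash="${1:-/boot}"
  prev="$flash/previous"

  if [ ! -d "$prev" ]; then
    echo "no 'previous' folder under $flash" >&2
    return 1
  fi

  # Copy each regular file; stop at the first copy that fails.
  for f in "$prev"/*; do
    [ -f "$f" ] || continue
    cp "$f" "$flash/" || return 1
  done
  return 0
}
```

A reboot would still be needed afterwards so that the server starts from the copied files.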

Link to comment

I understand you feel comfortable and have various methods that you have used to do this restore procedure, and it seems simple to you. I am simply giving you my feeling as a non-expert, and I feel my experience is both valid and possibly shared by other users who are in the same non-expert position. However, I will leave it at that. If you are happy with how the update rollout is going and see no need to improve the situation, then I am happy to leave it to you guys and just get the all clear when it's safe to upgrade. I'm not being mean or anything; I really am happy to chill out and just be told when to jog on and hit update. It doesn't faze me and I don't feel any need to be combative about any of it, if that's what people like. Peace :)

  • Like 1
Link to comment

The "DMAR: ERROR: DMA PTE for vPFN" error is also reported on the Ubuntu bug tracker.
Affected system: Linux kernel 5.15.0.27.30 on an HPE ProLiant DL20 Gen9 server.

See here:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1970453

 

 

With the same workaround:
Setting the intel_iommu=off kernel boot parameter seems to work around the problem.
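
For those who prefer to script the change, below is a minimal sketch that adds a parameter to every 'append' line of a syslinux config file. The function name add_append_param is hypothetical, and /boot/syslinux/syslinux.cfg is assumed to be the file in question on Unraid; back it up first. Note that it touches every boot entry that has an 'append' line, not only the default one.

```shell
#!/bin/sh
# Hypothetical helper: add a kernel parameter to each 'append' line
# of a syslinux config file, unless the line already contains it.
#   $1  path to the config file
#   $2  parameter to add, e.g. intel_iommu=off
add_append_param() {
  cfg="$1"
  param="$2"
  tmp="$cfg.tmp.$$"

  # Insert the parameter right after the word 'append';
  # all other lines pass through unchanged.
  awk -v p="$param" '
    $1 == "append" && index(" " $0 " ", " " p " ") == 0 {
      sub(/append[ \t]+/, "append " p " ")
    }
    { print }
  ' "$cfg" > "$tmp" && mv "$tmp" "$cfg"
}
```

Example call, under the path assumption above: add_append_param /boot/syslinux/syslinux.cfg intel_iommu=off. Running it a second time leaves the file unchanged.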

 

 

Also an interesting comment in the bug report:


I hit this bug upgrading my home server (proliant microserver gen9) and it seems to be causing memory corruption when it occurs ( at least in combination with zfs ). Using zfs mirrored root I experienced this issue after only a few minutes uptime, with DMAR messages flooding the log and very high CPU usage.

After rebooting with intel_iommu=off things are back to normal, but a zfs scrub indicated several thousand checksum errors detected on the root volume, some of them unrecoverable that had to be restored from backup, and a separate zfs RAIDZ1 volume experienced corrupted metadata and had to be rolled back with some data loss.

Edited by Thorsten
  • Like 1
  • Upvote 2
Link to comment

I've got a weird issue that maybe y'all can help me fix. I know I don't have diags, but I don't know if I can find the diags for this issue.

I was using the built-in noVNC viewer for both my Windows 10 VM and the backblaze_personal_backup docker from CA. Both noVNC instances worked in 6.10.1, and also 6.10.0. But both can't connect under 6.10.2.

The Windows 10 VM (and also a new Windows 11 VM I was going to set up) says "Failed to connect to server". Backblaze_personal_backup says "server disconnected, error code 1006". When I try to get to the logs for Backblaze_personal_backup, which used to work fine in 6.10.1 and 6.10, they now just show a blank screen. The Windows 10 VM just has logs pertaining to VM settings, as far as I can tell.

I looked through my other dockers and binhex-krusader (which also uses noVNC) works fine. I have also tried rebooting the dockers/vms, and also turning off and then on both the docker service and the VM service. 

I can connect to the noVNC instances via noVNC viewer on my desktop, but I haven't run across this before. Where would I be able to get the logs/diags for these issues to help you guys help me out?

Link to comment
33 minutes ago, urhellishntemre said:

I was using the built-in noVNC viewer for both my Windows 10 VM and the backblaze_personal_backup docker from CA. Both noVNC instances worked in 6.10.1, and also 6.10.0. But both can't connect under 6.10.2.

Try clearing the browser history, cookies, etc

Link to comment

I agree, a link to the discussion forum in the little release notes provided in the webGUI (the "i" button) would be very helpful and courteous to users. Additionally, I'd love to have more than just the changelog in there. The same text blurbs that are posted in the forum post for the release notes would also be helpful. 

 

I'm personally pretty diligent about reading the upgrade threads from top to bottom, and I'll admit that though it's great we have organized release threads, it's a little annoying to have to search for the thread to be sure I don't run into any show-stoppers. A link would really be lovely.

Anyway, thanks for the release!

 

Edit: Also, my upgrade from 6.10.1 to 6.10.2 went just fine.

Edited by bitcore
Link to comment
23 hours ago, Frank1940 said:

 

On the banner on top of the first page that comes up with the GUI (most likely the MAIN tab), there is a big 'Upgrade now' button in a banner box.  (I know it is there because I usually wait a bit to upgrade until I can read the release notes and a couple of pages of comments in the release thread.)   I don't recall there even being an 'I' button on that banner.  It seems that too many folks simply will click on anything.  (Malware writers often use this same behavior pattern to have the unsuspecting do their bidding...)

 

So you are saying that we should not trust Unraid's top banner telling us to "update" the system because it could be malware?!

This is a joke, right? 

Link to comment
