
Linux 2.6.27 kernel released 9 October 2008.


I like that site. Never seen it before but very readable.

 

There are some changes in this kernel that may benefit unRAID, so bring it on once that serious bug is squashed.

  • Author
There may be one serious problem with it:

 

That's been fixed since rc7.

 

http://www.phoronix.com/scan.php?page=news_item&px=Njc2OA

 

Intel Provides Temporary e1000e Fix

http://www.phoronix.com/scan.php?page=news_item&px=Njc1OQ

 

The Linux 2.6.27 kernel code contained a rather serious regression in which a faulty driver was killing Intel network hardware. Specifically, the e1000 and e1000e network adapters were getting their EEPROM corrupted by the driver, which renders the network interface permanently inoperable unless that non-volatile memory can be restored. The e1000 problem was patched but the Intel e1000e remains problematic. Fortunately, Intel has now provided a workaround so that no further Intel network hardware is damaged.

 

A patch was proposed by Intel last night on the Linux kernel mailing list that prevents the e1000e non-volatile memory (NVM) from being corrupted when the respective Linux driver is loaded. There is no proper fix yet to this situation but Intel is continuing to explore the problem. Intel is also preparing patches that help users with damaged network hardware restore their EEPROM. For the Linux 2.6.28 kernel, Intel will push forward patches that clean up the network driver's use of the hardware/software semaphore.

We're going to release 4.4-beta3 with 2.6.26.6 kernel & then probably wait until 2.6.27.1.  We have been following the e1000 bug with great interest & they claim it is fixed in 2.6.27, but so far I have not wanted to risk bricking one of our motherboards to test this yet :)

 

I think this is a wise decision, in fact I was going to wait myself.

  • Author

If you are afraid of bricking an E1000, download the Intel patch first, then upgrade the kernel... if the E1000 gets hosed, you already have the EEPROM patch to fix it.

In my case, the e1000 driver is used on the original unRAID recommended Intel motherboard... I've no idea if it is as easy as you describe to update its EEPROM. I'm willing to wait a few days longer for a corrected 2.6.27 kernel.

They appear to have figured out the serious bug that was 'bricking' the EEPROMs of Intel hardware that uses the e1000e driver; see this thread (very interesting): http://lkml.org/lkml/2008/10/15/337. Also very interesting are the author's comments on the FTRACE code at fault, found just past halfway down here: http://lwn.net/Articles/303390/.

 

I have to agree with most that there is no way a memory corruption bug should have been able to overwrite an EEPROM.  Seems like poor hardware design here...

Any news here? I see there's already a 2.6.27.3 kernel but still no e1000 bug fixes...

  • Author

The E1000 bug was FIXED several releases ago in the 2.6.27 kernel... actually it was fixed in RC7... only older versions, prior to RC7, had the bug.

I don't think so. Look at this:

 

commit bc5b8bb64a2dc740d8b99635931e689a8b13daf2
Author: Greg Kroah-Hartman <[email protected]>
Date:  Wed Oct 15 16:02:53 2008 -0700

    Linux 2.6.27.1

commit d23d43386311fde5f11e06c16d4185e94a8d6d06
Author: Steven Rostedt <[email protected]>
Date:  Wed Oct 15 18:21:44 2008 -0400

    disable CONFIG_DYNAMIC_FTRACE due to possible memory corruption on module unload

    While debugging the e1000e corruption bug with Intel, we discovered
    today that the dynamic ftrace code in mainline is the likely source of
    this bug.

    For the stable kernel we are providing the only viable fix patch: labeling
    CONFIG_DYNAMIC_FTRACE as broken. (see the patch below)

    We will follow up with a backport patch that contains the fixes. But since
    the fixes are not a one liner, the safest approach for now is to
    disable the code in question.

    The cause of the bug is due to the way the current code in mainline
    handles dynamic ftrace.  When dynamic ftrace is turned on, it also
    turns on CONFIG_FTRACE which enables the -pg config in gcc that places
    a call to mcount at every function call. With just CONFIG_FTRACE this
    causes a noticeable overhead.  CONFIG_DYNAMIC_FTRACE works to ease this
    overhead by dynamically updating the mcount call sites into nops.

    The problem arises when we trace functions and modules are unloaded.
    The first time a function is called, it will call mcount and the mcount
    call will call ftrace_record_ip. This records the calling site and
    stores it in a preallocated hash table. Later on a daemon will
    wake up and call kstop_machine and convert any mcount callers into
    nops.

    The evolution of this code first tried to do this without the kstop_machine
    and used cmpxchg to update the callers as they were called. But I
    was informed that this is dangerous to do on SMP machines if another
    CPU is running that same code. The solution was to do this with
    kstop_machine.

    We still used cmpxchg to test if the code that we are modifying is
    indeed code that we expect to be before updating it - as a final
    line of defense.

    But on 32bit machines, ioremapped memory and modules share the same
    address space. When a module would load its code into memory and execute
    some code, that would register the function.

    On module unload, ftrace incorrectly did not zap these functions from
    its hash (this was the bug). The cmpxchg could have saved us in most
    cases (via luck) - but with ioremap-ed memory that was exactly the wrong
    thing to do - the results of cmpxchg on device memory are undefined.
    (and will likely result in a write)

    The pending .28 ftrace tree does not have this bug anymore, as a general push
    towards more robustness of code patching, this is done differently: we do not
    use cmpxchg and we do a WARN_ON and turn the tracer off if anything deviates
    from its expected state. Furthermore, patch sites are statically identified
    during build time so there's no runtime discovery of dynamic code areas
    anymore, and no room for code unmaps to cause the hash to become out of date.

    We believe the fragility of dynamic patching has been sufficiently
    addressed in the development code via the static patching method, but further
    suggestions to make it more robust are welcome.
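
For anyone who finds the commit message hard to follow, here is a rough toy model of the sequence it describes: call sites get recorded, the module is unloaded without its entries being removed, the same addresses are reused for ioremapped device memory, and the later conversion pass touches that device memory. This is only a sketch in Python, not kernel code. Every class, function and address in it (`ToyAddressSpace`, `ToyTracer`, `run_scenario`, `0x1000`) is made up for illustration, and the "undefined" result of cmpxchg on device memory is simply modeled as a write.

```python
# Toy model of the stale call-site bug described in the commit message.
# NOT kernel code: all names and addresses here are hypothetical.

MCOUNT_CALL = "call mcount"
NOP = "nop"


class ToyAddressSpace:
    """Maps an address to (kind, contents); kind is 'module' or 'ioremap'."""

    def __init__(self):
        self.memory = {}

    def load_module(self, base, n_funcs):
        for i in range(n_funcs):
            self.memory[base + i] = ("module", MCOUNT_CALL)

    def unload_module(self, base, n_funcs):
        for i in range(n_funcs):
            del self.memory[base + i]

    def ioremap(self, base, words):
        # Device memory (e.g. NIC registers/NVM) mapped at the freed addresses.
        for i, word in enumerate(words):
            self.memory[base + i] = ("ioremap", word)


class ToyTracer:
    def __init__(self, purge_on_unload):
        self.recorded = set()  # stands in for the preallocated hash table
        self.purge_on_unload = purge_on_unload

    def record_ip(self, addr):
        self.recorded.add(addr)

    def module_unloaded(self, base, n_funcs):
        # The bug: without this purge, stale addresses stay in the table.
        if self.purge_on_unload:
            self.recorded -= {base + i for i in range(n_funcs)}

    def convert_to_nops(self, space):
        """Patch recorded sites; return addresses of device memory touched."""
        touched = []
        for addr in sorted(self.recorded):
            if addr not in space.memory:
                continue
            kind, contents = space.memory[addr]
            if kind == "ioremap":
                # Modeled as a write, since the result is undefined there.
                space.memory[addr] = (kind, "CORRUPTED")
                touched.append(addr)
            elif contents == MCOUNT_CALL:
                space.memory[addr] = (kind, NOP)
        self.recorded.clear()
        return touched


def run_scenario(purge_on_unload):
    space = ToyAddressSpace()
    tracer = ToyTracer(purge_on_unload)
    space.load_module(0x1000, 2)
    tracer.record_ip(0x1000)      # first call of each function records its site
    tracer.record_ip(0x1001)
    space.unload_module(0x1000, 2)
    tracer.module_unloaded(0x1000, 2)
    space.ioremap(0x1000, ["nvm word 0", "nvm word 1"])  # addresses reused
    return tracer.convert_to_nops(space)


if __name__ == "__main__":
    print("no purge on unload:", run_scenario(False))
    print("purge on unload:   ", run_scenario(True))
```

With `purge_on_unload=False` the conversion pass reports both reused addresses as touched; with `purge_on_unload=True` it reports none, which is the behavior the commit message says was missing.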

  • Author

That references the underlying source of the problem... but a fix to prevent the problem from manifesting itself was already in place by RC7.

 

One is the disease, one is the symptom.

 

correct me if I'm wrong, but isn't the current fix like putting a bandaid on a wart?

  • Author

Actually, a bandaid with a drop of salicylic acid on it will cure a wart.

 

It's more like putting a lock on the door, instead of shooting the burglar.

correct me if I'm wrong, but isn't the current fix like putting a bandaid on a wart?

The current "fix" is a disabling of an option that causes the problem until the correct fix can be designed and developed.

 

Apparently, self-modifying code, intended to improve performance, went about modifying some of the network card's EEPROM that had been mapped to a portion of memory.  This caused the corruption.

 

The self-modifying feature is simply not enabled at this time in the current 2.6.27 release.  If a custom kernel is compiled with the feature enabled, it will still be broken, and potentially trash memory once again until a true fix is coded.

 

Joe L. 
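
If you build your own kernel and want to confirm the option is off, one way is to look for `CONFIG_DYNAMIC_FTRACE` in the kernel's `.config` text. Below is a minimal sketch in Python; the helper name `config_option_state` is made up, and it only relies on the usual `.config` conventions (`CONFIG_FOO=y` for enabled, `# CONFIG_FOO is not set` for disabled). Where the config file lives on a given system is something you would need to check yourself.

```python
def config_option_state(config_text, option="CONFIG_DYNAMIC_FTRACE"):
    """Return the option's value ('y', 'm', ...), 'n' if explicitly not set,
    or None if the option does not appear in the config text at all."""
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {option} is not set":
            return "n"
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return None


if __name__ == "__main__":
    # Hypothetical sample text, not taken from any real kernel build.
    sample = "CONFIG_FTRACE=y\n# CONFIG_DYNAMIC_FTRACE is not set\n"
    print(config_option_state(sample))
```

In practice you would read the text of your own `.config` from the kernel source tree and pass it in. A result of `'y'` would mean the dynamic patching code is compiled in.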

and potentially trash memory once again until a true fix is coded.

 

That's what I thought. So it's not really fixed, just disabled/bypassed.

Archived

This topic is now archived and is closed to further replies.
