Linux 2.6.27 kernel released 9 October 2008.

October 10, 2008

http://kernelnewbies.org/LinuxChanges

October 10, 2008

There may be one serious problem with it: http://www.phoronix.com/scan.php?page=news_item&px=Njc0Nw.

October 10, 2008

I like that site. Never seen it before but very readable.

There are some changes in this kernel that may benefit unRAID, so bring it on once that serious bug is squashed.

October 10, 2008

Author

There may be one serious problem with it:

That's been fixed since rc7.

October 10, 2008

There may be one serious problem with it:

That's been fixed since rc7.

http://www.phoronix.com/scan.php?page=news_item&px=Njc2OA

Intel Provides Temporary e1000e Fix

http://www.phoronix.com/scan.php?page=news_item&px=Njc1OQ

In the Linux 2.6.27 kernel code there was a rather serious regression in which a faulty driver was killing Intel network hardware. Specifically, the e1000 and e1000e network adapters were getting their EEPROM corrupted by the driver, which renders the network interface permanently inoperable unless that non-volatile memory can be restored. The e1000 problem was patched, but the Intel e1000e remains problematic. Fortunately, Intel has now provided a workaround so that no further Intel network hardware is damaged.

A patch was proposed by Intel last night on the Linux kernel mailing list that prevents the e1000e non-volatile memory (NVM) from being corrupted when the respective Linux driver is loaded. There is no proper fix yet for this situation, but Intel is continuing to explore the problem. Intel is also preparing patches that help users with damaged network hardware restore their EEPROM. For the Linux 2.6.28 kernel, Intel will push forward patches that clean up the network driver's use of the hardware/software semaphore.

October 10, 2008

We're going to release 4.4-beta3 with the 2.6.26.6 kernel & then probably wait until 2.6.27.1. We have been following the e1000 bug with great interest & they claim it is fixed in 2.6.27, but so far I have not wanted to risk bricking one of our motherboards to test it.

October 10, 2008

We're going to release 4.4-beta3 with the 2.6.26.6 kernel & then probably wait until 2.6.27.1. We have been following the e1000 bug with great interest & they claim it is fixed in 2.6.27, but so far I have not wanted to risk bricking one of our motherboards to test it.

I think this is a wise decision, in fact I was going to wait myself.

October 15, 2008

Author

If you are afraid of bricking an E1000, download the Intel patch first, then upgrade the kernel... if the E1000 gets hosed, you already have the EEPROM patch to fix it.

October 15, 2008

If you are afraid of bricking an E1000, download the Intel patch first, then upgrade the kernel... if the E1000 gets hosed, you already have the EEPROM patch to fix it.

In my case, the e1000 driver is used on the original unRAID-recommended Intel motherboard... I've no idea if it is as easy as you describe to update its EEPROM. I'm willing to wait a few days longer for a corrected 2.6.27 kernel.

October 21, 2008

They appear to have figured out the serious bug that was 'bricking' the EEPROMs on Intel network hardware driven by the e1000e driver; see this thread (very interesting): http://lkml.org/lkml/2008/10/15/337. Also very interesting are the author's comments on the FTRACE code at fault, found just past halfway down here: http://lwn.net/Articles/303390/.

I have to agree with most that there is no way a memory corruption bug should have been able to overwrite an EEPROM. Seems like poor hardware design here...

October 23, 2008

Any news here? I see there's already a 2.6.27.3 kernel but still no e1000 bug fixes...

October 23, 2008

Author

The E1000 bug was FIXED several releases ago in the 2.6.27 kernel... actually it was fixed in RC7.... only older versions, prior to RC7, had the bug.

October 23, 2008

The E1000 bug was FIXED several releases ago in the 2.6.27 kernel... actually it was fixed in RC7.... only older versions, prior to RC7, had the bug.

I don't think so. Look at this:

commit bc5b8bb64a2dc740d8b99635931e689a8b13daf2
Author: Greg Kroah-Hartman <[email protected]>
Date: Wed Oct 15 16:02:53 2008 -0700

Linux 2.6.27.1

commit d23d43386311fde5f11e06c16d4185e94a8d6d06
Author: Steven Rostedt <[email protected]>
Date: Wed Oct 15 18:21:44 2008 -0400

disable CONFIG_DYNAMIC_FTRACE due to possible memory corruption on module unload

While debugging the e1000e corruption bug with Intel, we discovered today that the dynamic ftrace code in mainline is the likely source of this bug.

For the stable kernel we are providing the only viable fix patch: labeling CONFIG_DYNAMIC_FTRACE as broken. (see the patch below)

We will follow up with a backport patch that contains the fixes. But since the fixes are not a one liner, the safest approach for now is to disable the code in question.

The cause of the bug is due to the way the current code in mainline handles dynamic ftrace. When dynamic ftrace is turned on, it also turns on CONFIG_FTRACE which enables the -pg config in gcc that places a call to mcount at every function call. With just CONFIG_FTRACE this causes a noticeable overhead. CONFIG_DYNAMIC_FTRACE works to ease this overhead by dynamically updating the mcount call sites into nops.

The problem arises when we trace functions and modules are unloaded. The first time a function is called, it will call mcount and the mcount call will call ftrace_record_ip. This records the calling site and stores it in a preallocated hash table. Later on a daemon will wake up and call kstop_machine and convert any mcount callers into nops.
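To make the record-then-convert flow in the commit message concrete, here is a toy Python model (not kernel code). Memory is a `bytearray`; `b"CALL"` and `b"NOP_"` are made-up 4-byte stand-ins for the real mcount call and nop instructions; and the class and method names (`ToyFtrace`, `call_function`, `record_ip`, `convert_daemon`) are hypothetical. The set plays the role of the preallocated hash table, and `convert_daemon` plays the role of the daemon that runs under kstop_machine.

```python
# Toy model of dynamic ftrace's record-then-convert scheme (not kernel code).
# b"CALL" and b"NOP_" are made-up 4-byte stand-ins for real instructions.
CALL_MCOUNT = b"CALL"
NOP = b"NOP_"
SITE_LEN = len(CALL_MCOUNT)


class ToyFtrace:
    def __init__(self, memory: bytearray):
        self.memory = memory
        self.recorded: set[int] = set()  # plays the role of the hash table

    def call_function(self, ip: int) -> None:
        """Enter a function whose call site starts at `ip`."""
        if self.memory[ip:ip + SITE_LEN] == CALL_MCOUNT:
            self.record_ip(ip)  # the mcount call leads to the recording step

    def record_ip(self, ip: int) -> None:
        """Loosely mirrors ftrace_record_ip: remember where the call came from."""
        self.recorded.add(ip)

    def convert_daemon(self) -> int:
        """Later pass (done under kstop_machine in the kernel): call sites -> nops.

        Entries stay in `recorded` afterwards; the bug in the commit message
        depends on such entries outliving the code they point at.
        """
        patched = 0
        for ip in sorted(self.recorded):
            self.memory[ip:ip + SITE_LEN] = NOP
            patched += 1
        return patched
```

Once a site has been turned into a nop, entering that function no longer reaches the recording step at all, which is where the overhead savings come from.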

The evolution of this code first tried to do this without the kstop_machine and used cmpxchg to update the callers as they were called. But I was informed that this is dangerous to do on SMP machines if another CPU is running that same code. The solution was to do this with kstop_machine.

We still used cmpxchg to test if the code that we are modifying is indeed code that we expect to be before updating it - as a final line of defense.
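A minimal sketch of that "final line of defense", using the same kind of made-up 4-byte layout: `cmpxchg_site` (a hypothetical helper) only writes the new bytes if the site still holds what the patcher expects. Python gives no atomicity here, so this shows only the compare-then-write logic, not a real atomic instruction.

```python
def cmpxchg_site(memory: bytearray, ip: int, expected: bytes, new: bytes) -> bool:
    """Replace memory[ip:ip+len(expected)] with `new` only if it still equals `expected`.

    Not atomic: this models the check the commit message describes, nothing more.
    """
    if len(new) != len(expected):
        raise ValueError("replacement must be the same length as the expected bytes")
    end = ip + len(expected)
    if memory[ip:end] != expected:
        return False  # unexpected contents: leave them untouched
    memory[ip:end] = new
    return True
```

For example, with `mem = bytearray(b"CALLxxxxDATA")`, patching offset 0 succeeds while patching offset 8 is refused, because `DATA` is not the expected call site.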

But on 32bit machines, ioremapped memory and modules share the same address space. When a module would load its code into memory and execute some code, that would register the function.

On module unload, ftrace incorrectly did not zap these functions from its hash (this was the bug). The cmpxchg could have saved us in most cases (via luck) - but with ioremap-ed memory that was exactly the wrong thing to do - the results of cmpxchg on device memory are undefined. (and will likely result in a write)
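Putting the pieces of the commit message together, the failure can be modelled like this. It is a simplified sketch with hypothetical names: one 16-byte address range first holds a module and is later reused by an ioremap mapping, `b"DEVICE-REGISTERS"` is placeholder device data, and device memory is assumed to always take the write (standing in for "undefined, and will likely result in a write"). Passing `zap=True` shows what removing the module's entries on unload would have prevented.

```python
# Toy model of the 2.6.27 dynamic ftrace bug; all names are hypothetical.
CALL_MCOUNT, NOP = b"CALL", b"NOP_"


class AddressSpace:
    def __init__(self, size: int):
        self.data = bytearray(size)
        self.device_ranges: list[range] = []  # regions currently ioremapped

    def is_device(self, ip: int) -> bool:
        return any(ip in r for r in self.device_ranges)

    def cmpxchg(self, ip: int, expected: bytes, new: bytes) -> bool:
        end = ip + len(expected)
        if self.is_device(ip):
            # Assumption: on device memory the compare cannot be trusted and
            # the write goes through anyway.
            self.data[ip:end] = new
            return True
        if self.data[ip:end] != expected:
            return False  # ordinary RAM: mismatch caught, nothing written
        self.data[ip:end] = new
        return True


def unload_module(recorded: set[int], start: int, end: int, zap: bool) -> None:
    """Unload a module occupying [start, end). zap=False reproduces the bug."""
    if zap:
        recorded.difference_update({ip for ip in recorded if start <= ip < end})


def run_scenario(zap: bool) -> bytes:
    mem = AddressSpace(16)
    mem.data[4:8] = CALL_MCOUNT   # a module function's mcount call site
    recorded = {4}                # recorded the first time the function ran
    unload_module(recorded, 0, 16, zap)
    mem.device_ranges.append(range(0, 16))  # same addresses reused by ioremap
    mem.data[0:16] = b"DEVICE-REGISTERS"    # the device's data now lives there
    for ip in recorded:                      # the daemon patches "recorded" sites
        mem.cmpxchg(ip, CALL_MCOUNT, NOP)
    return bytes(mem.data)
```

In this model `run_scenario(zap=False)` leaves `b"DEVINOP_EGISTERS"`, while `run_scenario(zap=True)` leaves the device bytes intact. On ordinary RAM the same stale entry would usually be caught by the compare, which is the "via luck" part of the commit message.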

The pending .28 ftrace tree does not have this bug anymore, as a general push towards more robustness of code patching, this is done differently: we do not use cmpxchg and we do a WARN_ON and turn the tracer off if anything deviates from its expected state. Furthermore, patch sites are statically identified during build time so there's no runtime discovery of dynamic code areas anymore, and no room for code unmaps to cause the hash to become out of date.
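A rough sketch of the .28 approach as the commit message describes it, again with hypothetical names and made-up byte patterns. The site list is fixed up front (standing in for build-time identification), no compare-and-exchange is used, and on unexpected bytes the model emits a Python warning in place of WARN_ON and switches the tracer off. How the real code detects a deviation is not spelled out in the message; a plain read-and-compare before each write is an assumption here.

```python
import warnings

CALL_MCOUNT, NOP = b"CALL", b"NOP_"


class StaticTracer:
    """Toy model of the .28 approach described in the commit message."""

    def __init__(self, memory: bytearray, sites: list[int]):
        self.memory = memory
        self.sites = list(sites)  # fixed up front, like a build-time site list
        self.enabled = True

    def set_sites(self, old: bytes, new: bytes) -> int:
        """Rewrite every site from `old` to `new`; return how many were patched."""
        if len(old) != len(new):
            raise ValueError("old and new must be the same length")
        if not self.enabled:
            return 0  # tracer was switched off earlier: touch nothing
        patched = 0
        for ip in self.sites:
            current = bytes(self.memory[ip:ip + len(old)])
            if current != old:
                # Stand-in for WARN_ON: report the deviation and turn the tracer off.
                warnings.warn(f"unexpected code at {ip:#x}: {current!r}")
                self.enabled = False
                break
            self.memory[ip:ip + len(new)] = new  # plain write, no cmpxchg
            patched += 1
        return patched
```

The design choice this illustrates is "stop on the first surprise": rather than trusting a compare on every write, the tracer gives up entirely as soon as any site is not what it expects.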

We believe the fragility of dynamic patching has been sufficiently addressed in the development code via the static patching method, but further suggestions to make it more robust are welcome.

October 23, 2008

Author

That references the underlying source of the problem... but a fix to prevent the problem from manifesting itself was already in place by RC7.

One is the disease, one is the symptom.

October 23, 2008

Correct me if I'm wrong, but isn't the current fix like putting a bandaid on a wart?

October 23, 2008

Author

Actually, a bandaid with a drop of salicylic acid on it will cure a wart.

It's more like putting a lock on the door, instead of shooting the burglar.

October 23, 2008

Correct me if I'm wrong, but isn't the current fix like putting a bandaid on a wart?

The current "fix" is a disabling of an option that causes the problem until the correct fix can be designed and developed.

Apparently, self-modifying code, intended to improve performance, went about modifying some of the network card's EEPROM that had been mapped into a portion of memory. This caused the corruption.

The self-modifying feature is simply not enabled at this time in the current 2.6.27 release. If a custom kernel is compiled with the feature enabled, it will still be broken, and potentially trash memory once again until a true fix is coded.

Joe L.
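For anyone building a custom kernel, one way to see whether a given build has the self-modifying feature turned on is to look for it in the kernel config. A small Python sketch (the function names are hypothetical) that scans config text for `CONFIG_DYNAMIC_FTRACE=y`. Kconfig writes enabled bool options as `CONFIG_FOO=y` and disabled ones as `# CONFIG_FOO is not set`. The `/proc/config.gz` file is only present if the running kernel was built with `CONFIG_IKCONFIG_PROC`.

```python
import gzip
from pathlib import Path


def dynamic_ftrace_enabled(config_text: str) -> bool:
    """Return True if kernel config text enables CONFIG_DYNAMIC_FTRACE.

    An option that is absent or written as "# ... is not set" counts as disabled.
    """
    return any(line.strip() == "CONFIG_DYNAMIC_FTRACE=y"
               for line in config_text.splitlines())


def read_config(path: str) -> str:
    """Read a plain .config file, or a gzipped copy such as /proc/config.gz."""
    p = Path(path)
    if p.suffix == ".gz":
        with gzip.open(p, "rt") as f:
            return f.read()
    return p.read_text()
```

Usage would be something like `dynamic_ftrace_enabled(read_config("/proc/config.gz"))`, or pointing `read_config` at the `.config` in your own kernel build tree (the path depends on where you built it).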

October 23, 2008

and potentially trash memory once again until a true fix is coded.

That's what I thought. So it's not really fixed, just disabled/bypassed.
