unRAID Server Release 4.7 "final" Available

dgaschk · November 30, 2011

Does removing that one line fix the problem?

WeeboTech · November 30, 2011

To have the bug affect you, you would have to write to the exact same set of blocks (a stripe) as is being calculated at that specific moment. As mentioned in some other thread, this bug has been in the "md" driver in all versions of Linux for years.

It's pretty scary when you think about it silently corrupting something.

It disturbs me because the last issue I had happened to occur in the superblock near the start of the drive.

The superblock was also over 1GB in size because there were over 250,000 files on that drive.

Anytime you come back from an abnormal start up, writes occur when the filesystem is mounted and transactions are replayed.

Joe L. · November 30, 2011

I agree. It is exactly that set of simultaneous writes to the initial blocks that would trip the bug.

The parity check that results from a non-clean shutdown is actually a re-construction of parity if parity is out of sync... It could potentially clobber parity, but not data. A subsequent parity check should fix it.

The bigger issue is when re-constructing a replacement data drive. It is there that you can get into trouble.

Joe L.
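
To illustrate why a rebuild is the risky case, here is a minimal sketch of single-parity (XOR) reconstruction. It is a toy model with made-up byte values, not the "md" driver code, and it does not reproduce the driver race itself. It only shows that when parity and the surviving data disagree for a stripe, the rebuilt block comes out wrong with no error reported.

```bash
#!/bin/bash
# Toy model: one "stripe" holds one byte from each disk.
# Parity is the XOR of all data bytes in the stripe.

# xor_all BYTE... : print the XOR of all arguments (as a decimal number)
xor_all() {
    local acc=0 b
    for b in "$@"; do
        acc=$(( acc ^ b ))
    done
    echo "$acc"
}

# Hypothetical data bytes, chosen only for illustration.
disk1=0x5A
disk2=0x3C
disk3=0xF0
parity=$(xor_all "$disk1" "$disk2" "$disk3")

# Rebuild disk2 from parity plus the surviving disks.
rebuilt=$(xor_all "$parity" "$disk1" "$disk3")
printf 'rebuilt disk2:                   0x%02X\n' "$rebuilt"

# Now disk1 changes but parity is NOT updated to match (parity is stale).
disk1=0x5B
bad=$(xor_all "$parity" "$disk1" "$disk3")
printf 'rebuilt disk2 with stale parity: 0x%02X\n' "$bad"
```

The first rebuild returns the original byte (0x3C); the second returns a different byte (0x3D), which is what silent corruption of a rebuilt data drive looks like at the level of a single stripe.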

abs0lut.zer0 · November 30, 2011

Sooo??? As a novice, the global moderators are starting to scare me...

Is there ANY way to avoid this error, or what are the best practices?

Thanks

Joe L. · November 30, 2011

Don't write to a disk you are re-constructing or replacing until the re-construction is complete.

lionelhutz · November 30, 2011

I didn't, and I still saw the bug in action. 3 parity errors during the parity check after rebuilding.

Peter

abs0lut.zer0 · November 30, 2011

Thanks, will do that then.

Question: is a delete the same as a write?

WeeboTech · November 30, 2011

lionelhutz points out that even that is not safe. It really needs to be resolved.

The whole md5sum database idea I had seems to be crucial now for verifying your file integrity.
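
As a rough sketch of what such a checksum database could look like (this is not WeeboTech's actual tool; the function names and the manifest layout are assumptions made for illustration), the two helpers below record an md5sum for every file under a directory and later re-check the files against that record. They assume GNU `find`, `sort`, `xargs` and `md5sum`.

```bash
#!/bin/bash
# build_manifest DIR MANIFEST
# Record an md5sum for every regular file under DIR, with paths relative to DIR.
# Keep MANIFEST outside DIR so the manifest does not hash itself.
build_manifest() {
    local dir=$1 manifest=$2
    ( cd "$dir" && find . -type f -print0 | sort -z | xargs -0 -r md5sum ) > "$manifest"
}

# verify_manifest DIR MANIFEST
# Re-hash the files listed in MANIFEST and print only the ones that fail.
# Returns non-zero if any file is missing or its checksum differs.
verify_manifest() {
    local dir=$1 manifest=$2
    ( cd "$dir" && md5sum --quiet -c - ) < "$manifest"
}
```

For example, `build_manifest /mnt/disk1 /boot/disk1.md5` before replacing a drive and `verify_manifest /mnt/disk1 /boot/disk1.md5` after the rebuild would list any file whose contents changed. Both paths are only examples.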

abs0lut.zer0 · November 30, 2011

So when can we expect it to be implemented?

MortenSchmidt · November 30, 2011

Don't write to a disk you are re-constructing or replacing until the re-construction is complete.

Like lionelhutz, I was also not deliberately writing to the disk. Sabnzbd, Transmission etc. were shut down (otherwise my array won't stop), and it was late at night, so no one else in the house was accessing anything. I experienced this problem two separate times: the first rebuild failed in parity checks, and so did the second. It only started working when I stripped my go script, rebooted and did the rebuild again. To make things more confusing, the rebuild also worked one time with a full go script.

I had the errors in the first 0.1%, so it may be related to this superblock thing WeeboTech writes about, except I didn't have any 'abnormal' shutdown or startup. I just killed sabnzbd, transmission and twonkyserver, then stopped the array, selected the new disk and started up again.

Could this be provoked by any of the 'performance tweak' vm.dirty_xxx settings? I have these in my go script, partly inherited from Purko as I recall:

# Performance Tweaks
for i in /sys/block/[hs]d? ; do echo 128 > $i/queue/max_sectors_kb ; done 2>/dev/null
for i in /sys/block/[hs]d? ; do echo cfq > $i/queue/scheduler ; done 2>/dev/null
sysctl -w vm.min_free_kbytes=8192           # sl:2497 
sysctl -w vm.dirty_expire_centisecs=900     # sl:3000  tm:100 
sysctl -w vm.dirty_writeback_centisecs=300  # sl:500   tm:50 
sysctl -w vm.dirty_ratio=20                 # sl:10    tm:10 
sysctl -w vm.dirty_background_ratio=10      # sl:5     tm:5

More details and syslog from the issues I had are here: http://lime-technology.com/forum/index.php?topic=12884.msg132178#msg132178
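
When comparing a rebuild with and without the go script tweaks, it can help to confirm which values are actually in effect. The sketch below simply prints the current value of each tunable named in the script above by reading `/proc/sys/vm`. The function name and the optional directory argument are assumptions added for illustration; nothing here changes any setting.

```bash
#!/bin/bash
# show_vm_tunables [DIR]
# Print the current value of each vm tunable set in the go script.
# DIR defaults to /proc/sys/vm; it is a parameter only so the function
# can be pointed at another directory.
show_vm_tunables() {
    local dir=${1:-/proc/sys/vm} name
    for name in min_free_kbytes dirty_expire_centisecs dirty_writeback_centisecs \
                dirty_ratio dirty_background_ratio; do
        if [ -r "$dir/$name" ]; then
            printf 'vm.%s = %s\n' "$name" "$(cat "$dir/$name")"
        else
            printf 'vm.%s = (not readable)\n' "$name"
        fi
    done
}
```

Running `show_vm_tunables` once after a normal boot and once after booting with a stripped go script shows whether the tweaks were applied in each case.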

WeeboTech · November 30, 2011

As I mentioned before, if a disk is offline for any reason, the mere act of unmounting or mounting writes to the disk.

Any journal transactions that get replayed also write to the disk (thus updating the superblock at the start of the disk).

lionelhutz · November 30, 2011

Isn't this bug new, or at least something else has been changed so it manifests now? I have done a number of disk upgrades on earlier versions and I never had any post parity issues. The first disk upgrade on 4.7 caused 3 parity errors in the post parity check.

I am in the camp that says 4.7 as the "stable" release shouldn't have this bug.

dgaschk · November 30, 2011

This bug may be manifesting because the unRAID user base is growing.

bcbgboy13 · November 30, 2011

Or perhaps the statistics related to the incidence of ECC errors will manifest due to:

1. generally increased size of memory used in the newer systems;

2. increased time for performing these critical operations, as the size of hard drives has increased tremendously, and with it the possibility of bit-flips due to either natural occurrence or power glitches on non-UPS-protected systems.

In that connection I should mention the very high percentage of bad hard drives experienced by some users, contrary to limited snippets of industry info. If one preclears a 2TB HD 3 times it will read 12TB of data (coincidentally, this is almost exactly the statistical value for NRRE on consumer-level disks: one error per 10^14 bits read, or about 12.5TB). And this procedure will take three and a half days, enough time for power glitches and bit-flips to manifest themselves... Now imagine the people claiming to perform 6, 7 or more passes on their older hardware... without ECC and UPS...
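
The arithmetic behind that comparison can be checked with a few lines of shell. This is only a sketch: the two full reads per preclear cycle are an assumption inferred from the 12TB figure above, and the error rate is the commonly quoted consumer spec of one unrecoverable read error per 10^14 bits.

```bash
#!/bin/bash
disk_tb=2           # drive size in TB (from the post)
passes=3            # preclear cycles (from the post)
reads_per_pass=2    # assumption: one full read before and one after the write

bytes_read=$(( disk_tb * passes * reads_per_pass * 1000**4 ))
bits_read=$(( bytes_read * 8 ))
nrre_bits=$(( 10**14 ))   # one unrecoverable read error per this many bits read

echo "TB read:               $(( bytes_read / 1000**4 ))"
echo "bits read:             $bits_read"
# Expected errors, scaled by 100 to stay in integer arithmetic.
echo "expected errors x 100: $(( bits_read * 100 / nrre_bits ))"
```

Twelve terabytes is 9.6 x 10^13 bits, so under these assumptions three preclear cycles come to roughly 0.96 expected unrecoverable read errors, which is the coincidence the post is pointing at.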

Joe L. · November 30, 2011

Isn't this bug new, or at least something else has been changed so it manifests now? I have done a number of disk upgrades on earlier versions and I never had any post parity issues. The first disk upgrade on 4.7 caused 3 parity errors in the post parity check.

I am in the camp that says 4.7 as the "stable" release shouldn't have this bug.

This bug has been in every version of Linux, in all the "md" drivers that have been in use for years, and was (apparently) only recently identified. I'm not even sure it is fixed in the stock Linux "md" driver in the most recent kernels. If you've run any version of the Linux "raid" driver in past years, you too had the same potential to hit this bug.

unRAID has had this code (and the bug it inherited) from its very first 1.050930 release version until it was fixed in the recent 5.0beta series.

I think it is showing itself more frequently because the hardware is faster, disks are bigger, the user base of unRAID is larger, and we are learning more about what to look for.

WeeboTech · November 30, 2011

I would also add larger arrays, which mean a higher chance of a failure and of needing to rebuild a failed disk.

Plus the automation of emhttp, in that it automounts the disks, thus creating a write immediately even if a parity sync or update is going on.

I would love to have "start/stop" array and "mount/unmount" array as separate options.

If a disk is disabled, it would then require both a start and a mount, so you can decide how you want to handle it.

Zaxxan · December 4, 2011

Tom has disclosed a major bug in 4.7: http://lime-technology.com/forum/index.php?topic=13866.0

I believe I've encountered this bug; see here: http://lime-technology.com/forum/index.php?topic=12884.msg132178#msg132178

So, where is 4.7.1? It's been 4½ months now, and honestly that's just about 4½ months too long to fix a bug of this severity in the "stable" release branch. I have a drive that's starting to reallocate sectors now and want to rebuild it. Not a happy camper here.

Tom hasn't posted in this thread since July 13th so it doesn't look like he has any interest in this version.

Joe L. · December 4, 2011

I'd be willing to guess it is the one he is currently selling if you order a flash drive with unRAID installed.

I do agree though... The two known bugs should be fixed in a 4.7.1 patch release.

(and that should have occurred months ago when the initial parity/disk-reconstruction bug was discovered and fixed in 5.0beta8)

JackBauer · December 4, 2011

A while ago I would have said we should wait it out - get 5 released, then fix 4.7.1.

But with the Linux kernel problems, it might make sense to take a detour to 4.7.1 while the kernel problems work themselves out.

(And I REALLY want 5. I have two 3TB drives waiting to be used, and don't want to use them until the 5.0 beta is generating very few issues.)

glave · December 5, 2011

Is it possible to downgrade to 4.7 from 5.0 beta (14)?

NFS issues and drives not spinning down have me wanting to back out.

dgaschk · December 5, 2011

Yes it is. Just use the backup of your flash made before the upgrade. Or reverse the install instructions.

glave · December 5, 2011

Reverse the install instructions? When I upgraded all I did was replace the bzimage and bzroot, reboot, and then do the permissions setup.

Unfortunately, I had a very short-sighted moment and did not back up the original flash.

Safiraya · December 11, 2011

marcusone · January 24, 2012

Thought I'd bump this up and see if there are any updates on a 4.7.1 release?

abs0lut.zer0 · January 25, 2012

+1
