
unRAID6-beta7/8 POSSIBLE DATA CORRUPTION ISSUE: PLEASE READ


limetech


It's more likely to affect small files than larger ones, but writing to the filesystem in general can potentially cause corruption to other files on the device.  That is why we are recommending that everyone stop writing to their reiserfs disks until beta 9 can be released with the appropriate fix.

 

Uh-oh. That's bad. Checking files written since Beta 8 was installed is fairly straightforward, albeit time-consuming.

 

It'd be nice if you could do some extensive testing (even if it's after Beta 9 is out) so you can verify whether corruption of other files on the device actually happens. If it doesn't, that'd save a lot of time.

 

If that is the case, how can we tell whether other files are corrupted? I take it the timestamps on those files won't change?

Link to comment
It'd be nice if you could do some extensive testing (even if it's after Beta 9 is out) so you can verify whether corruption of other files on the device actually happens. If it doesn't, that'd save a lot of time.

 

Corruption of 'other' files is definitely mentioned in the original bug report.  My guess, but it's only a 'gut' feeling, is that truncation occurs on files as they're being written, while garbage at the beginning of a file is the corruption seen on files that were not being written.

 

If that is the case, how can we tell whether other files are corrupted? I take it the timestamps on those files won't change?

 

The only way to tell is to check the contents of each file.  There is no tell-tale change at the filesystem level.

Link to comment

It'd be nice if you could do some extensive testing (even if it's after Beta 9 is out) so you can verify whether corruption of other files on the device actually happens. If it doesn't, that'd save a lot of time.

 

Corruption of 'other' files is definitely mentioned in the original bug report.  My guess, but it's only a 'gut' feeling, is that truncation occurs on files as they're being written, while garbage at the beginning of a file is the corruption seen on files that were not being written.

 

I took "other" files to be referring to other files on the device not being written. Which is much worse. I can deal with checking all the files written since Beta 8. Checking all the other files on the device (drive?) would be impossible.

Link to comment

I'm assuming an MD5 check of all files would be able to detect any corruption (lesson learned for me). I'm also assuming that running a parity check would not detect any corruption introduced, as the corruption would also have been written to the parity disk and thus no inconsistency would show up. I'm basically after two things: firstly, do I have any corruption in any of my files; and secondly (more difficult to achieve), which files are corrupt, if any (no MD5 checksums currently present). I've resigned myself to the fact that I'll probably never know either of these until I come across a dodgy file during a read.
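
For anyone wanting a baseline going forward, something like this (run from the console - the disk and output paths are just examples) records an MD5 for every file on a disk:

find /mnt/disk1 -type f -exec md5sum {} + > /boot/disk1.md5

Bear in mind this only helps from here on out - any corruption already present would be baked into the sums.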

 

 

Link to comment
I took "other" files to be referring to other files on the device not being written. Which is much worse.

 

Exactly! My guess is that it is files being written which are susceptible to truncation - I have found one such corruption on my system - and existing files on a drive that is being written to which are at risk of having the data at the beginning of the file overwritten.

 

I can deal with checking all the files written since Beta 8. Checking all the other files on the device (drive?) would be impossible.

 

Quite so ... unless you have recorded a hash key for each file.

 

I suspect that we'll all be very keen to try jbartlett's bitrot script in case such an event should overtake us again!

Link to comment

I took "other" files to be referring to other files on the device not being written. Which is much worse.

 

Exactly! My guess is that it is files being written which are susceptible to truncation - I have found one such corruption on my system - and existing files on a drive that is being written to which are at risk of having the data at the beginning of the file overwritten.

 

OK, well let's distinguish between existing files being overwritten (easy to check, they will be timestamped) and existing files not being overwritten. If the latter are being corrupted, that's a much bigger problem. That needs clarifying.

 

While I think of it, if the latter is in effect, could deleting files on a device potentially cause corruption on the device (drive)?

 

I'm going through all my drives to check the last time they were written to and one has not had new files written to it, but has had files deleted. Do I need to worry about the other files on that drive?

Link to comment
OK, well let's distinguish between existing files being overwritten (easy to check, they will be timestamped) and existing files not being overwritten. If the latter are being corrupted, that's a much bigger problem. That needs clarifying.

 

Well, as I said, corruption of existing files was discovered by one of those who made the original bug reports.

 

While I think of it, if the latter is in effect, could deleting files on a device potentially cause corruption on the device (drive)?

 

I'm going through all my drives to check the last time they were written to and one has not had new files written to it, but has had files deleted. Do I need to worry about the other files on that drive?

 

From my understanding - no, not if you simply delete the file without obliterating the data.  The corruption occurs during data writes, not filesystem management.

Link to comment

OMG.

I've used BTSync to sync all my data from my main server (5.0.5) to my backup/test server (6beta8).

Unfortunately, I've used two-way sync.

As soon as I saw this thread I've shutdown my backup server.

Is there a way I can check my data on the main server to find corruption?

Thanks...

Anyone?

Will running a parity check on my 5.0.5 server detect corruption?


Link to comment

Running a parity check should not detect the corruption, since all write operations on the drive are reflected on the parity drive. This includes the corrupt writes.

 

The only way to detect the file corruption is to verify files by checksum and/or manual inspection.
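
If you do happen to have an older checksum list (the file name here is just an example), verification is a one-liner; --quiet makes md5sum print only the files that fail:

md5sum --quiet -c /boot/disk1.md5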

Link to comment

Has anybody begun an MD5 checksum of their files, and if so, has anybody found any mismatches? I'm just trying to get a feel for the number of corrupt files and the size of the files being affected, as Tom posted earlier stating it generally affects "smaller" files.

Link to comment

How about a drive outside the array? I have a disk mounted from my go file (my VM disk).

I suppose it will be affected?

 

If it is formatted ReiserFS, yes.  This bug has nothing to do with unRAID, or the array.  It is, purely and simply, a problem for any ReiserFS device, when used with a 3.16.0 - 3.16.2 Linux kernel.
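
If you're not sure which kernel a given build is running, you can check from the console - anything from 3.16.0 through 3.16.2 is affected:

uname -r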

Link to comment

Looks like I'll be running a checksum verify once b9 is released :o

 

I've been working on a script called "bitrot" that generates a SHA256 key for files and stores it as an extended attribute on the file itself. I've got my media scanned, and I'll be running a verify after beta 9. I'll try to get the script posted soon; I've been working on adding some nice-to-have features.
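
In case anyone wants to experiment before I post it, the basic idea looks something like this - a simplified sketch only, not the script itself, and the attribute name user.sha256 is just an example:

# compute the hash and store it as an extended attribute on the file
setfattr -n user.sha256 -v "$(sha256sum < /mnt/disk1/somefile | cut -d' ' -f1)" /mnt/disk1/somefile

# read it back later to compare against a freshly computed hash
getfattr --only-values -n user.sha256 /mnt/disk1/somefile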

 

This is brilliant. Had I known this was feasible, I would have written my program to use this in addition to the SQLite storage.

Can you start a new thread on this subject matter and explain how you are doing it?

 

I'm almost done with my SQLite locate/stat/hash database.

Since I walk a tree and have the ability to calculate the hash, storing it in two places seems like a brilliant idea.

 

I bet there is a way we can export and/or import regular hashsum-type files to/from the attributes - something like the sketch below.

I'm just not educated on the extended attribute part yet.
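
Something like this might work for the export direction (untested, and assuming the hashes live under a user.sha256 attribute as in the sketch above - paths are just examples):

# dump every stored hash into a standard sha256sum-style list
find /mnt/disk1 -type f | while IFS= read -r f; do
    sum=$(getfattr --only-values -n user.sha256 "$f" 2>/dev/null)
    [ -n "$sum" ] && echo "$sum  $f"
done > /boot/disk1.sha256

# which can then be verified the usual way
sha256sum --quiet -c /boot/disk1.sha256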

Anyway the thread I've been working from is here.

 

RFC: MD5 checksum/Hash Software

http://lime-technology.com/forum/index.php?topic=34988.msg325400#msg325400

Link to comment

It's made very clear in the OP that this is beta software and some features won't work, and there may be bugs.  Everyone that installed a beta is taking on the risk for themselves.  If you can't stomach these kinds of problems, you should not be installing a beta, especially on a production machine where loss of data is a problem.

 

It looks to me that some have become a little complacent because LT does such a fabulous job at wringing out issues before a beta is released, and confidence levels are pretty high.  While this is good for those beta testing, beta software should not be used in a situation where loss of data would be a problem.

 

LT has to deal with the new versions of Linux that are not fully wrung out.  They want to include the latest features of Linux and deliver the best product possible, but there is the risk like in this case where a bug creeps in that they don't have any control over.

 

I get very concerned when I read people in the beta posts asking "Is this beta safe?" and some answering "No problem".  No beta is ever "safe" and I don't think this recommendation is appropriate, especially when I see newer members of the forum ask.  They should not be installing beta software unless they just want to play and check out the features.

 

I think LT should re-think the beta program and close it down to the general public and have a select few beta testers.  I know I am opening myself to some criticism, but as a plugin maintainer, I find it extremely difficult to work on plugins for a beta and then get all sorts of support questions that are not appropriate for beta software.

Link to comment

It's made very clear in the OP that this is beta software and some features won't work, and there may be bugs.  Everyone that installed a beta is taking on the risk for themselves.

 

I think everyone understands this, but a bug at the file system level is an entirely different issue.  I imagine that even the unRAID developers were shocked at a bug being introduced into the file system.  BTRFS and XFS, sure, but ReiserFS?  Wow.  Bugs in an unRAID beta are a given, but ReiserFS is an entirely different story.

Link to comment

IMHO the only bad thing about this is we currently have no way for users to identify corrupt files. A bug is a bug and that's life, but not being able to define the scope of the problem is not ideal.

 

At the very least we should construct a find command that lists all the files that have been written to during the b7/b8 cycle and those that haven't.

 

Also, on the assumption that most people's "bytes" are video related, we could potentially get clever with some generic video file checkers (and any other common file types). << these would be useful tools regardless
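
ffmpeg can already do a crude version of this - decode the whole file and log any errors (assuming ffmpeg is available on your system; the paths are just examples):

# decode the file, discard the output, and capture only decode errors
ffmpeg -v error -i /mnt/disk1/movies/example.mkv -f null - 2>/tmp/example.err

# an empty error file suggests the stream decoded cleanly
wc -c /tmp/example.err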

Link to comment

It's made very clear in the OP that this is beta software and some features won't work, and there may be bugs.  Everyone that installed a beta is taking on the risk for themselves.

 

I think everyone understands this.

 

I am not directing my comments at those here that understand what they are doing.  I wanted to take this opportunity to suggest that the beta testing be limited. 

 

When I see a post asking "How do I upgrade to beta8?", then that person has no business running the beta.  While I recognize most beta testers are sophisticated users, there are those who have no business running a beta version.

 

I maintain some plugins and spend too much time supporting those who shouldn't be using beta software.  I see LT and forum members spending too much time in the forums helping new/inexperienced users with a beta.

 

With that being said, I understand that anyone potentially affected with file issues from the RFS bug would like some assurances that they don't have any corrupted files.  I'm all for that if it can be done.

Link to comment

At the very least we should construct a find command that lists all the files that have been written to during the b7/b8 cycle and those that haven't.

 

With touch -t and find -newer you can use any timestamp in CCYYMMDDHHMM format to find files modified after that time.

Here's what I quickly constructed, grabbing the release dates from the first post.

 

This puts them in CRLF DOS format so you can look at them in Notepad.

Move them to your favorite share or flash and review with your favorite editor.

 

# mark the beta 7 release date, then list everything modified since
touch -t 201408230000 /tmp/beta7.timestamp

find /mnt/disk3 -newer /tmp/beta7.timestamp -fprintf /var/log/beta7.filelist.txt "%h/%f\r\n"

# same again for beta 8
touch -t 201408310000 /tmp/beta8.timestamp

find /mnt/disk3 -newer /tmp/beta8.timestamp -fprintf /var/log/beta8.filelist.txt "%h/%f\r\n"

 

wc -l /var/log/beta*.filelist.txt

 

If you want, you can add -ls for a fuller listing, then pipe that through todos.

Replace disk3 with your associated disks, or * if you want a list for all your disks.

 

find /mnt/disk3 -newer /tmp/beta8.timestamp -ls | todos > /var/log/beta8.filelist.txt

Link to comment


