unRAID6-beta7/8 POSSIBLE DATA CORRUPTION ISSUE: PLEASE READ


limetech


Thanks for this, WeeboTech! One question though: will the actual file datestamp change if filesystem corruption happens? I'm assuming this is useful only for checking newly created files, not for identifying files that were "modified", i.e. corrupted (truncated/garbage).

 

There's no way to tell if a different file is corrupted.

The commands provided will only show files that were modified at the application layer since some point in time.

 

At the very least this provides a list of suspect files for review.

There's no way to tell about other files unless you have stored hash values somewhere.
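For anyone who wants to start storing hash values going forward, here is a minimal sketch. It assumes GNU coreutils (md5sum with --quiet) and GNU find; make_manifest and check_manifest are hypothetical helper names, not part of unRAID. A manifest only helps if it was written before the corruption happened.

```shell
#!/bin/bash
# Sketch only: make_manifest / check_manifest are hypothetical helpers.
# Assumes GNU md5sum (for --quiet) and find.

make_manifest() {
    # $1 = directory to scan, $2 = manifest file to write
    # Paths are stored relative to $1 so the manifest survives a remount.
    ( cd "$1" && find . -type f -exec md5sum {} + ) > "$2"
}

check_manifest() {
    # $1 = directory to verify, $2 = manifest written earlier
    # Prints only files that fail verification; non-zero exit on any failure.
    ( cd "$1" && md5sum --quiet -c ) < "$2"
}
```

Usage would be something like make_manifest /mnt/disk1 /boot/disk1.md5 now, then check_manifest /mnt/disk1 /boot/disk1.md5 later (example paths only). Keep the manifest somewhere other than the disk being checked.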

 

You can look for empty files with

find /mnt/disk* -empty -ls

 

This may or may not reveal truncated files. It's unknown how this affects the metadata.

 

I suppose we could write some ftw64() function that stats every file, then opens and reads it in entirely to verify the size of the file.  I don't know if this will work without actually having a corrupted filesystem to work with.
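A rough shell equivalent of that idea, as a sketch only: check_sizes is a hypothetical name, and it assumes GNU find and stat. It compares the size recorded in the metadata against the number of bytes actually read. As said above, it is unknown whether this catches anything on a genuinely corrupted filesystem.

```shell
#!/bin/bash
# Sketch only: check_sizes is a hypothetical helper. Assumes GNU find/stat.

check_sizes() {
    # $1 = directory (or file) to scan
    find "$1" -type f -print0 |
    while IFS= read -r -d '' f; do
        # Size according to the file's metadata.
        meta=$(stat -c %s -- "$f")
        # Bytes actually delivered when reading the whole file.
        # Piping through cat forces a real read; "wc -c file" may just use stat.
        actual=$(( $(cat -- "$f" 2>/dev/null | wc -c) ))
        if [ "$meta" -ne "$actual" ]; then
            printf 'SIZE MISMATCH: %s (stat=%s, read=%s)\n' "$f" "$meta" "$actual"
        fi
    done
}
```

Running check_sizes /mnt/disk1 prints nothing when every file reads back at its recorded size. It reads every byte on the disk, so expect it to take a while.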

 

Maybe someone else can chime in there.

Link to comment

Guys, if you have things you want me to test or check for corruption on test systems here, let me know what you want me to do.  Right now I'm testing two systems with beta9 to make sure the big bug is squashed.

 

If you KNOW a set of file(s) ARE corrupt, try doing both a cat and a cp of the file to another file in /tmp

 

cat corruptfile > /tmp/corruptfile.cat

and do a

cp corruptfile /tmp/corruptfile.cp

 

Basically copying the file

 

Then do ls -l and stat on each file

 

ls -l corruptfile /tmp/corruptfile.cat /tmp/corruptfile.cp

stat  corruptfile /tmp/corruptfile.cat /tmp/corruptfile.cp

 

Let's see if a variance of size or data read can be used as a determining factor.
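The steps above could be wrapped up roughly like this. It is a sketch only: compare_copies is a hypothetical name, the optional second argument is my addition so the copies can go somewhere other than /tmp, and it assumes GNU stat. It adds a cmp at the end to see whether the two read paths returned the same bytes.

```shell
#!/bin/bash
# Sketch only: compare_copies is a hypothetical helper. Assumes GNU stat.

compare_copies() {
    # $1 = suspect file, $2 = where to put the copies (defaults to /tmp)
    local src=$1 out=${2:-/tmp}
    local base
    base=$(basename -- "$src")

    # Copy the file two different ways.
    cat -- "$src" > "$out/$base.cat"
    cp -- "$src" "$out/$base.cp"

    # Show the size of the original and of each copy.
    stat -c '%s bytes  %n' -- "$src" "$out/$base.cat" "$out/$base.cp"

    # Check whether both copies contain the same data.
    if cmp -s -- "$out/$base.cat" "$out/$base.cp"; then
        echo "cat and cp copies are byte-identical"
    else
        echo "cat and cp copies DIFFER"
    fi
}
```

Call it as compare_copies corruptfile and post the output. A size that differs between the original and either copy would be the interesting result.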

Link to comment

WeeboTech's solution is a good one, but it assumes that the array has been static and only new files were written after the potential bug releases.

 

I have off-loaded and re-loaded all my drives to convert to XFS, so his script would not help me because all the files on my array were written after the beta7/8 release date.

Link to comment

various file system mix/match  (rfs/btrfs cache with rfs array)

mover script

cache-less writes to the rfs array

parity corrections? (not sure how you'd invoke that)

 

I have no idea what your test plans look like but those jump out at me.

 

Link to comment

XFS is not affected by the bug.

RFS is, and anything written to RFS after the beta dates is suspect.

If a whole drive was restored to RFS, after any of the respective beta dates, then all data is suspect.

 

 

How did you offload and reload the data?

Teracopy has the option to CRC-check the data after it's written.

Link to comment

And, it's still unanswered as to whether writing new files to a drive corrupts existing files* on the drive. That needs to be tested (by LT) and made clear ASAP!  :-\

 

 

*NOT overwriting files.

Link to comment

I don't think it affected me.  I did this procedure before writing to the array under beta8.  I skipped beta7 because of the Xen bug.

 

My procedure was to offload one disk at a time to an NTFS drive using Windows, change that disk's format, then write the data back to that disk.  All this was done with beta8 on the server, so I only read RFS files in beta8.

Link to comment

I don't think it's realistic to expect LT to have these answers.  They need to work on getting beta9 tested and released so the bug is no longer there.

Link to comment

We're still trying to learn more about this issue and will share as we discover, but for now, here's what we know: It's more likely to affect small files than larger ones, but writing to the filesystem in general can potentially cause corruption to other files on the device.  That is why we are recommending that everyone stop writing to their reiserfs disks until beta 9 can be released with the appropriate fix.
Link to comment

Both corruptions of which I'm aware have been within my appdata share (associated with docker).  One was a truncated source file in an LMS plugin.  The second was a corrupted (or missing?) .conf file for a deluge plugin.

 

The first one was, almost certainly, corrupted as I copied it from my previous LMS configuration into appdata.  The second one was a file created under beta8 and I believe that it had existed but became corrupted after creation.  Both were relatively small text files.

 

I have not encountered any problems with larger media files (mainly .mkv) which were downloaded by deluge and subsequently processed by mkvmerge.

 

Does this help to identify where to concentrate your testing?

 

Anyway, I'm off to bed now .. perhaps you'll be close to releasing by the time I'm up and about in the morning!

Link to comment

Per my earlier post:

 

It'd be nice if you could do some extensive testing (even if it's after Beta 9 is out) so you can verify that corruption of other files on the device happens. If it doesn't, that'd save a lot of time.

 

I'd be happy with "We don't know yet", if they don't know. It's a bit ambiguous at this point.

Link to comment

Gosh, this is big. My HTPC has been hanging and I could not understand why! I spent more than 16 hrs trying to debug the software on the HTPC over the last couple of days. As a last resort I was going to blame Windoze and restore/reformat my machine. I understand this is beta and I am not pointing fingers over that, but if a bug as big as this is found, PLEASE LET ALL USERS KNOW IMMEDIATELY. I think I am a registered user and get occasional emails from Limetech, but I haven't seen any in the last couple of days. I cannot wait to get back home and stop all writes. How do I make the system read-only so it doesn't write data?

 

Limetech has been great in communicating and working with users but please don't drop the ball on communicating since the potential recovery and fixes could be significant if someone doesn't check here regularly.

Link to comment

OMG.... I have been copying TERABYTES of data from REISERFS to REISERFS to make room for migrating disks to XFS :-(

 

Ok, relax...breathe...  We will get through this together.  I am in a similar situation with multiple systems.

 

What type of data do you have?  Is this just video content or photos + documents + other things?

 

We're trying to get more information on what this bug directly affects, but large files seem to be unaffected so far.

Link to comment

We posted in here about this bug the second that we discovered it and immediately began working on a fix.  We also have been responding in here the best we can while we work through the bug and the fix.  Keep in mind, this isn't affecting unRAID 5 users and this is a beta.  It is the responsibility of everyone participating in a beta to keep checking here for updates on it.  We are definitely thinking about ways to change our beta programs in the future to better communicate with testers separately from our stable users.

Link to comment

neilt0, It's already been stated that corruption to OTHER files occurs.

 

Nope: This states it's a potential problem: http://lime-technology.com/forum/index.php?topic=35161.msg327143#msg327143

 

I know other people have reported file corruption, but those reports are vague and are conflating overwriting with corrupting other non-written files.

 

It'd be nice to have that tested, once the LT peeps are done with Beta 9.

Link to comment

peter, quick question before you hit the hay: when you say the file was truncated, was the file 0 bytes or reduced in size? It might help identify corrupted files if they are all 0 bytes long.

 

It was reduced in size (the end was missing), not completely empty.

 

 

Was the size reported via ls the same as the size reported by the application that read it?

Was that vi or something else?

 

 

I'm trying to determine if the metadata or stat information is intact enough to use for a comparison.

Link to comment
