unRAID6-beta7/8 POSSIBLE DATA CORRUPTION ISSUE: PLEASE READ


limetech

People are actively monitoring this thread for real updates (for obvious reasons). As such, please stay strictly on topic, everyone.

 

thanks

 

Just a reminder: people are subscribing to this thread to get JUST the relevant info. All off-topic posts will be removed from this point on.

 

 

Link to comment

Regarding the current status of beta 9, I think it's safe to say we have the bug squashed, but Tom has been exchanging e-mails with the maintainer of ReiserFS as well as other Linux devs on this topic to ensure the fix applied is solid and covers all the bases.  We are also trying to ascertain the extent of the bug and how users might go about detecting whether they are affected.  The unfortunate nature of this bug is that it was a silent corruption: the system had no idea that files were being corrupted while it was happening.

 

The key parties on the matter seemed to believe that only small files should have been affected (e.g. metadata predominantly), but some users here seem to be reporting corruption in other files as well.

 

The reason for the delay on releasing beta9 is that before we roll out a patch to fix such a nasty bug, we want to make sure that it is truly solid.

 

Providing updates takes time away from that task, so we're back to it now...

 

Also, it's worth noting:  users that attempt to circumvent the rules of our forum after being banned will be subsequently re-banned.  This is not up for debate or discussion.  Thanks.

Link to comment

The key parties on the matter seemed to believe that only small files should have been affected (e.g. metadata predominantly), but some users here seem to be reporting corruption in other files as well.

 

But does that mean the metadata relating to filenames? In a way, that's worse, as you can't "see" the corruption (or can you?) Having filenames swapped over everywhere would be a disaster.

Link to comment

What is the race condition with XFS? I am looking, but outside of this thread I can't pin it down.

 

Thank you Thornwood.

 

Several of us were having issues running rsync's migrating data to XFS volumes, and it would intermittently crash the server.  Tom indicated he found a race condition using XFS.  He said it wasn't XFS itself, but some interaction with unRaid and XFS.

Link to comment


When I said metadata, I meant it in the sense that small files are typically things like application data, such as Plex's media library.  Large files, such as media content, shouldn't be as affected by this.

 

The worst part of this bug is that it's a silent corruption in that there is no identifying it with reiserfsck.

Link to comment

I was doing some testing and found a small number of corruptions. I had copied a lot of LARGE files (~10TB in total, each 4GB+ in size), so this was not the small-file case mentioned above.  When I detected corruption, a lot of the time it was just a few odd bytes in a 4GB file, and in media files these would probably not get noticed.  A couple of other times there was more serious corruption that would make the file unusable (this typically seemed to be a case of part of the file getting zeroed out).
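
For anyone who still has a known-good copy of a suspect file (on a backup or another server), the kind of damage described above can be located with a byte-level comparison. What follows is only a rough sketch, not the method used in the testing above: the function name `diff_ranges` and its parameters are made up for illustration, and it assumes both copies are readable as ordinary files.

```python
CHUNK = 1024 * 1024  # compare 1 MiB at a time so 4GB+ files never sit in memory


def diff_ranges(suspect_path, reference_path, chunk=CHUNK):
    """Return (start, end, zeroed) tuples for byte ranges where the copies differ.

    `end` is exclusive. `zeroed` is True when every differing byte in the
    suspect copy is 0x00, i.e. that part of the file looks zeroed out.
    """
    ranges = []
    start = None    # offset where the current mismatching run began
    zeroed = True   # whether the current run consists only of zero bytes
    offset = 0
    with open(suspect_path, "rb") as s, open(reference_path, "rb") as r:
        while True:
            a, b = s.read(chunk), r.read(chunk)
            if not a and not b:
                break
            if a == b:
                # Fast path: identical chunk, so any open run ends here.
                if start is not None:
                    ranges.append((start, offset, zeroed))
                    start = None
                offset += len(a)
                continue
            for i in range(max(len(a), len(b))):
                x = a[i] if i < len(a) else None  # None means "past end of file"
                y = b[i] if i < len(b) else None
                if x != y:
                    if start is None:
                        start, zeroed = offset + i, True
                    zeroed = zeroed and x == 0
                elif start is not None:
                    ranges.append((start, offset + i, zeroed))
                    start = None
            offset += max(len(a), len(b))
    if start is not None:
        ranges.append((start, offset, zeroed))
    return ranges
```

Calling `diff_ranges("/mnt/disk1/movie.mkv", "/backup/movie.mkv")` (both paths are placeholders) would list a few short ranges for the "odd bytes" case and one long range flagged `zeroed=True` for the zeroed-out case. Only chunks that actually differ are walked byte by byte, so a mostly intact file is compared at close to read speed.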

Link to comment


OK, thanks. You might want to make that crystal clear in future posts. Some users have reported file pointers/actual ReiserFS metadata being corrupted, although we don't know whether that's coincidental: http://lime-technology.com/forum/index.php?topic=35161.msg327479#msg327479

 

Any news on whether the bug affects existing files on the drive -- i.e. files not being written or overwritten?

 

Cheers,

 

Neil.

Link to comment


Yes. As has been stated clearly several times already, it can impact any and all files on a ReiserFS drive that has had writes performed on it.

Link to comment

I'm in the middle of checking all of my md5 checksums, and what I've found so far is that none of my media files (mkv) are corrupt.  However, it looks like most of my .nfo, .tbn, and .jpg files are mismatching the stored checksum.  I can't remember, however, whether I've ever told xbmc to overwrite them during a backup...  But at least none of the mkv's are damaged.  I can always just delete all of the auxiliary files, and xbmc will recreate them.

 

That being said, the checking process will probably take me days if not weeks to complete.  But things look promising so far.
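
For readers who want to run the same kind of check, here is a minimal sketch. It assumes the stored checksums live in a text manifest in the format written by the `md5sum` tool (hash, two characters of separator, then a path relative to some root), which may not match how the checksums above were stored. The helper names `verify_manifest` and `mismatches_by_extension` are hypothetical.

```python
import hashlib
import os
from collections import Counter


def file_md5(path, chunk=1024 * 1024):
    """Hash a file in 1 MiB blocks so large media files are not loaded whole."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_manifest(manifest_path, root="."):
    """Check each manifest entry and sort file names into ok / mismatch / missing."""
    results = {"ok": [], "mismatch": [], "missing": []}
    with open(manifest_path, "r", encoding="utf-8") as manifest:
        for line in manifest:
            line = line.rstrip("\n")
            if not line.strip():
                continue
            expected, _, name = line.partition(" ")
            name = name[1:]  # drop the mode character: " " (text) or "*" (binary)
            path = os.path.join(root, name)
            if not os.path.isfile(path):
                results["missing"].append(name)
            elif file_md5(path) == expected.lower():
                results["ok"].append(name)
            else:
                results["mismatch"].append(name)
    return results


def mismatches_by_extension(results):
    """Count mismatching files per extension, e.g. to see whether only .nfo/.tbn/.jpg fail."""
    return Counter(os.path.splitext(name)[1].lower() for name in results["mismatch"])
```

A mismatch only shows that the file differs from what was hashed. As noted above, a file that was legitimately overwritten after the manifest was made will fail in exactly the same way as a corrupted one, so the per-extension count is a hint, not proof.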

Link to comment

ok, besides the fact that it is a potentially big problem - the announcement also says something along the lines of... there is a possibility that...

obviously you want to do what was mentioned (don't move data etc...). but before we start a mass hysteria here - how many of you have actually discovered problems at this point?

everybody? some of you? just 2 or 3 ppl?

i have to say, i can't find any problems yet - but i have also only done spot checks so far. so i have some hope the bug passed me by.

i highly appreciate tom's and LT's staff working in overdrive to provide a solution for all of us using the latest beta versions.

but back to my question - who actually has so far real problems, which can be traced back to the issue?

we might be blowing something out of proportion here at the moment. at least i hope we are, and that at the end of the day the damage is minimal for everybody, or even better, non-existent.

best wishes in this regard to everybody!

looking forward to b9 at the moment...

 

cheers, L

Link to comment

- how many of you actually discovered problems at this point?

everybody? some of you? just 2 or 3 ppl?

Hi, Lars - here's my experience so far.

 

I had a few disks involved... I had just updated from 5.05 to 6.0b8 and moved 350GB of data from one drive to three others, cleaning it off so I could switch my first drive to XFS.

 

I don't have MD5SUM hashes of those files, but I've checked a bunch of music files with the FooBar2000 validation plugin, used 7-Zip to test a bunch of archive files, browsed large folders of photos, and manually checked the start and end of a bunch of videos.  So far, I am not seeing any corruptions.

 

Although we would need hashes made prior to unRAID 6.0b7 (which I don't have) to rule out scrambled files, based on what I've read, I suspect the issue will hit hardest those folks who were running applications that read and write a bunch of small files.  LT reported that their source code and compiler work files got truncated or otherwise damaged, and other reports suggest damaged fan art and other metadata files from scrapers that download fan art for XBMC or the like.  It sounds like large files would most likely be damaged at the start or end of the file, so that's what I've been testing, and I haven't seen any corruptions.

 

I suspect some reports of corruptions in larger files may have happened long ago, but they are just being discovered due to increased scrutiny.

 

I'm keeping my fingers crossed, and checking those files that I can't easily recreate.  I also reverted my server back to 5.05 until 6.0 Stable is released, and I set up a test server for the beta version, as I should have all along.

 

-- stewartwb

Link to comment

I made md5 sums on all my files pre beta 7.  Thus far, I have only seen md5 failures on .nfo, .tbn, and .jpg files generated by xbmc.  However I cannot rule out the fact that I may have overwritten those files in question (and thereby invalidating the md5)

 

Thus far, I have had no corruptions in media files except for 1-2 files on a certain drive, which I knew had some corruptions on it from way before beta 7, so I'm not surprised.

 

Since my share levels are set to most-free, all of my drives are written to, so all of them are possibly at risk, but thus far my results are promising.  I'm personally not that worried about it.  If one or two movies out of 5000 are corrupted I'm not going to lose sleep over it.  Yes, I've stopped writing to the server, and am writing to another one in the meantime which has 5.05 on it, but I'm not going to bother reverting back to beta 6.  My docker drive is btrfs, so all I had to do was change the parameters being passed to the containers and all is well.

 

I'll probably find out in a week or so what my final stats are on the md5 check.  If nothing else, at least this bug is getting me to check.  I've been meaning to for quite a while (as I knew that I had corruption on one of the drives pre beta7), but just never got around to it.  So, in that respect, this is a good thing.

 

The best thing that will come out of this is that people are now thinking about md5 / sha checksums.
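
For anyone who has no checksums yet and wants a baseline for future checks, a manifest can be built with a few lines of Python. This is just a sketch under assumptions: it uses SHA-256 and writes lines shaped like `sha256sum` output, and the name `build_manifest` and the paths in the usage note are placeholders rather than anything unRAID provides.

```python
import hashlib
import os


def sha256_of(path, chunk=1024 * 1024):
    """Hash a file in 1 MiB blocks so multi-gigabyte files are not loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            digest.update(block)
    return digest.hexdigest()


def build_manifest(root, manifest_path):
    """Write one "<sha256>  <relative path>" line per file under root; return the file count."""
    count = 0
    manifest_abs = os.path.abspath(manifest_path)
    with open(manifest_path, "w", encoding="utf-8") as out:
        for folder, dirs, files in os.walk(root):
            dirs.sort()  # walk directories in a stable order
            for name in sorted(files):
                full = os.path.join(folder, name)
                if os.path.abspath(full) == manifest_abs:
                    continue  # do not hash the manifest itself
                rel = os.path.relpath(full, root)
                out.write(f"{sha256_of(full)}  {rel}\n")
                count += 1
    return count
```

Something like `build_manifest("/mnt/disk1", "/boot/disk1.sha256")` would cover one disk. It is probably wise to keep the manifest on a different device from the data it describes, so the baseline cannot be damaged along with the files.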

Link to comment


I've always appended checksums to my files and I think you are forgetting that XBMC is using the filename of the MKV to create the other files including the checksum you had appended to the end of the MKV... so those files are never going to have a valid CRC32 as it's not their CRC, it's the MKVs.
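
To illustrate the point, here is a small sketch of checking a CRC32 that is embedded in a file name. It assumes the common convention of eight hex digits in square brackets, such as `Movie [CBF43926].mkv`; the exact naming scheme used here may differ, and `check_embedded_crc` is a made-up name.

```python
import os
import re
import zlib

# Assumed naming convention: eight hex digits in square brackets.
CRC_TAG = re.compile(r"\[([0-9A-Fa-f]{8})\]")


def crc32_of(path, chunk=1024 * 1024):
    """Compute the CRC32 of a file incrementally and return it as 8 uppercase hex digits."""
    crc = 0
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            crc = zlib.crc32(block, crc)
    return f"{crc & 0xFFFFFFFF:08X}"


def check_embedded_crc(path):
    """Return True or False if the name carries a CRC tag, or None if it has no tag."""
    match = CRC_TAG.search(os.path.basename(path))
    if match is None:
        return None
    return match.group(1).upper() == crc32_of(path)
```

With this check, `Movie [CBF43926].mkv` passes if its contents hash to that value, while a `Movie [CBF43926].nfo` that XBMC generated next to it fails, because the tag in its name describes the MKV and not the .nfo. That failure says nothing about corruption.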

 

 

Link to comment

Fwiw, I have checksums of my important data that were created before beta 7 or 8, and whenever beta 9 comes out I'll start running the checks. I'll report back with my findings. I upgraded to beta 8 right after it was released, so my server is (unfortunately) a pretty good candidate for running the checksums to see if there was any corruption. Another positive is that I verified all my checksums about a month ago, so if there are any corrupt files it is highly likely to be from beta 8.

Link to comment

 

I've always appended checksums to my files and I think you are forgetting that XBMC is using the filename of the MKV to create the other files including the checksum you had appended to the end of the MKV... so those files are never going to have a valid CRC32 as it's not their CRC, it's the MKVs.

 

I have separate md5's for all of the other files.

Link to comment

I have only checked my music library so far, because it is easy to do with foobar2000. I did move my music library from one disk to another while on beta8, but before this announcement. Of the 19,000+ songs I have, over 6,000 now report minor or unrecoverable issues (about 5,700 minor and 300 unrecoverable).

 

I am going to likely be a higher risk case as I was shuffling data between multiple drives for the week leading up to this report and I likely moved 8-10TB of tv, movies and music. Thankfully I didn't move any pictures around as I would have been devastated to lose those (I backup to CrashPlan, but assume any corruption in pictures would have just replicated to my backup).

 

I have not started looking at video files yet, but have noticed that while I moved a ton of TV shows around using MC it's left phantom folders on the source drive all over the place. I don't know if this means specifically that there was an issue - but this is not common. Usually moving this way is very clean.

 

I would say the risk is obviously proportional to the amount of data movement you've done over the last 6 weeks. Even with no movement there is risk, but it grows the more you move.

 

Link to comment
