
Can't complete reiserfsck --rebuild-tree



I'm at a dead end. One of my drives has suffered some filesystem corruption, and I still don't know why.

 

I first ran reiserfsck --fix-fixable, which fixed some of the errors but not all of them, and then told me to run it again with the --rebuild-tree option. Unfortunately, that run stopped shortly after Pass 0, suggesting memory as a possible cause. I ran memtest, but it reported no errors, and neither does the syslog. I also tried enabling a swap file, as suggested in another thread, without success. Every time I run it, it stops at the same point (see the screenshot).

 

Since reiserfsck hasn't completed, the drive currently shows up as unformatted in the web console. Can anyone advise me on how to recover this drive before I start assuming I've lost everything on it?

Screenshot attached: Screen_shot_2011-06-25_at_17_00_51.png

1 month later...

After a while, I'm back asking for advice.

 

I did indeed lose almost all the data on the disk that suffered the filesystem corruption described in my first post. Unable to fix the corruption from the command line, I had no choice but to buy a recovery program (UFS Explorer, I think) and get back whatever I could. But, surprise: everything the program recovered was just scrambled bytes. Although at first sight most of the files looked intact, none of them had survived. For example, whenever I played one of the movies stored there, I could clearly see signs of corruption (scrambled frames, macroblocks, etc.). I didn't go through all of them, but I concluded that I could not trust any of the data recovered from that disk.

 

Very fortunately, apart from some photos, the disk contained data that could be recreated: software downloads and about a hundred DVDs ripped from my physical collection (although the latter will take longer to restore, since the ripping time is unavoidable).

 

At first I could not find any plausible cause for the corruption: I have never had a power outage, and I didn't write much to that disk. The data was fairly static; it was mostly just read.

Reluctantly, I got over it and started again: I formatted the disk, ran a parity check and finally wrote some data back.

 

Everything seemed fine until, after rebuilding my software collection, I tried to install a program I had downloaded, a more recent version of the one I had installed.

Guess what? My Mac said the .dmg archive was corrupted and could not open it. I started shivering again.

 

I ran some tests. I downloaded the file again, computed its MD5 hash in a Terminal window on my Mac and compared it with the hash of the copy on the unRAID disk. The hashes were different. I deleted the unRAID copy and made two new copies: one to my now-suspect disk and one to another disk in the unRAID array. The copy on the other disk had the same MD5 hash; the copy on the suspect disk was different again. Only on the third try did I get a matching hash.
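For anyone wanting to repeat this kind of check, here is a minimal sketch of the same comparison in Python: hash a known-good reference file once, then compare each copy against it. The script name `md5check.py` and the function names are hypothetical; only the standard `hashlib` module is used.

```python
import hashlib
import sys


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so that large .dmg or video files don't have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"" (EOF)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(reference, *copies):
    """Hash the known-good reference once, then report for each copy
    whether its MD5 matches (True) or differs (False)."""
    expected = md5_of(reference)
    return {str(copy): md5_of(copy) == expected for copy in copies}


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python md5check.py <known-good file> <copy> [<copy> ...]
    for path, ok in copies_match(sys.argv[1], *sys.argv[2:]).items():
        print(f"{path}: {'OK' if ok else 'MISMATCH'}")
```

For example, `python md5check.py GoogleChrome.dmg /mnt/disk1/GoogleChrome.dmg /mnt/disk3/GoogleChrome.dmg` (paths are illustrative). Hashing the same copy several times in a row can also help narrow things down: if the hash of an unchanged file varies between reads, the fault is on the read path rather than in what was written.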

 

Obviously, I can't trust this disk anymore. I'm scared to death of using it, and I will replace it as soon as possible. But I'm also worried that unRAID doesn't provide the most secure way of safeguarding data. I thought I only had to guard against hardware failures, like a disk dying outright; I hadn't considered the subtler kind of hardware/software failure that leads to filesystem corruption, simply because I had never experienced one before.

 

So here are my questions:

 

Is my disk really faulty? How can I check? Can it be done with a S.M.A.R.T. test? Do I need to connect the disk to another machine and run more tests?

 

Is Lime Tech aware of this potential data integrity problem, and are they willing to address it, for example by choosing a less creepy filesystem such as ZFS?

 

Is there any third-party Slackware package or unRAID-specific solution that prevents this kind of problem?

 

Looking forward to some words of wisdom from you all, and thanks in advance for your attention,

 

A now-scared unRAID user.


I thought it was the ReiserFS creator, not the filesystem itself, that was creepy ;)

 

Do you run a correcting or a non-correcting parity check?

Do you get parity errors when you do a parity check?

 

If you have been writing good data to that disk, then parity combined with the other disks should be able to rebuild it onto a new disk and recover the data. Parity is computed from the data being written, not from what is actually stored: it doesn't read the data back when updating parity, so parity reflects what was supposed to be written, even if it never made it onto the disk correctly. However, if you run correcting parity checks and they keep correcting errors, then you've lost some if not all of your data.
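To illustrate Peter's point, here is a simplified sketch assuming unRAID's single parity is a byte-wise XOR across the data disks (real arrays do this per sector across whole drives). `write_stripe` and `rebuild_missing` are hypothetical names for illustration, not unRAID functions.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def write_stripe(intended_data):
    """Hypothetical model of a write: parity is computed from the data the
    array *meant* to write, without reading the data disks back."""
    return xor_blocks(intended_data)


def rebuild_missing(parity, surviving_disks):
    """Recover a missing disk as parity XOR all surviving data disks."""
    return xor_blocks([parity, *surviving_disks])


# Three data disks; disk3 silently stores a damaged copy of what it was sent.
disk1 = b"\x01\x02\x03\x04"
disk2 = b"\x10\x20\x30\x40"
intended_disk3 = b"\xaa\xbb\xcc\xdd"
stored_disk3 = b"\xaa\x00\xcc\x00"  # what actually ended up on the disk

parity = write_stripe([disk1, disk2, intended_disk3])

# Replacing disk3 and rebuilding it from parity + the other disks yields the
# data that was supposed to be written, not the damaged copy.
rebuilt = rebuild_missing(parity, [disk1, disk2])
print(rebuilt == intended_disk3, rebuilt == stored_disk3)  # True False
```

Because XOR is its own inverse, the same `rebuild_missing` call recovers any single missing data disk, as long as parity and the remaining disks are intact.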

 

It's also possible that memory, the network port or some other component is corrupting your data and the disk is actually storing good data. Run non-correcting parity checks and see whether parity errors appear. Do not run a correcting parity check.

 

You could test parity by disconnecting that drive and starting the array. Then see whether the data on that disk appears to be healthy.

 

Peter

 


Interesting, thanks for the clarification. If the problem really is that specific disk, I guess I could have gotten all my data back simply by replacing the drive and rebuilding it from parity, without going through everything I went through...

 

I'll try that as soon as I get home and post back the results.


The parity check was finding lots of errors, so I stopped it.

I removed the disk from the configuration and started the array again.

Now I cannot access the share that uses that disk. It's just invisible.

I tried to access some files on /mnt/disk3 and copy them to another share, and got some SERIOUS errors. I can't really tell what's going on. In case it's useful, I'm attaching my syslog.

syslog.txt


I think the parity disk's data is corrupted as well, but I would rule out the network port or the SATA controller as the cause. Otherwise, why would writing data to the other two disks in the array work? I don't understand.

 

The fact is that, as far as I can tell, I've lost everything on that disk again, and parity can't help me rebuild a healthy disk. Out of curiosity I reran reiserfsck --check, and it confirmed that the disk is definitely compromised: 1 found corruption can be fixed only when running with --rebuild-tree. Sigh...

reiserfsck.txt


Well, yes: if you run a correcting parity check with a bad data disk, parity is updated to match the data disk. I did mention that if you run a correcting parity check and it finds errors, you'll lose some or all of your data...
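The difference between the two kinds of check can be sketched under the same simplifying assumption (single XOR parity, computed byte by byte). `parity_check` is a hypothetical model for illustration, not unRAID's implementation: a non-correcting check only counts mismatches, while a correcting check rewrites parity to match whatever the data disks return now, including damaged data.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_check(parity, data_disks, correct=False):
    """Hypothetical parity check: recompute parity from what the data disks
    return now and count mismatching bytes. Non-correcting mode leaves the
    stored parity alone; correcting mode overwrites it with the recomputed value."""
    recomputed = xor_blocks(data_disks)
    errors = sum(1 for stored, new in zip(parity, recomputed) if stored != new)
    return errors, (recomputed if correct else parity)


disk1 = b"\x01\x02\x03\x04"
disk2 = b"\x10\x20\x30\x40"
good_disk3 = b"\xaa\xbb\xcc\xdd"
bad_disk3 = b"\xaa\x00\xcc\x00"  # disk3 now returns damaged data

parity = xor_blocks([disk1, disk2, good_disk3])  # written while disk3 was good

for correct in (False, True):
    errors, new_parity = parity_check(parity, [disk1, disk2, bad_disk3], correct)
    rebuilt = xor_blocks([new_parity, disk1, disk2])  # rebuild disk3 from parity
    print(f"correct={correct}: {errors} errors, rebuild recovers good data: {rebuilt == good_disk3}")
```

Both runs report the same 2 errors, but only the non-correcting run leaves parity able to rebuild the good disk3; after the correcting run, a rebuild just reproduces the damaged data.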

 

You don't remove the disk from the configuration; you just physically unplug it so that it is missing.

 

Peter


The parity check was running in non-correcting mode, as you suggested, when I said it was finding lots of errors. So I assume parity was untouched when I stopped the check, removed the disk, started the array and checked whether the data was accessible. Unfortunately it wasn't: everything was just as corrupted as on the actual disk.

 

I ran reiserfsck --rebuild-tree as dgaschk suggested and, for the first time, it completed the whole process, I suppose because the disk now holds about a tenth of what it held before. A lost+found folder with a bunch of files in it was created. I reassigned the disk to the array and started it again. Now I don't seem to get any strange errors anymore.

 

All my files, though, are corrupt, every single one. I suppose I'll have to download everything again.

I just ran a test with a GoogleChrome.dmg file, and its MD5 hash did not change after the copy.

I'm facing a dilemma, though: what should I do now?
