October 21, 2009 I've been running 4.5-beta7 for a few days and am now getting "REISERFS error" in my syslog. I don't have a clue how to fix this problem. Any help would be appreciated. Thanks in advance... Phil
October 21, 2009 Look here for how to perform the file-system repair: http://lime-technology.com/wiki/index.php?title=Check_Disk_Filesystems
All the examples in the wiki show "md1" (corresponding to disk1). Your disk with the errors is "/dev/md2". Follow the directions and you should be able to correct the problem easily.
The series of commands for you would be:
    cd
    samba stop
    umount /dev/md2
    reiserfsck /dev/md2
The first three only need to be done once. They stop SAMBA from keeping the disks busy and un-mount the disk being checked. If you have any other add-on programs that are keeping disks busy, they might need to be stopped to allow the disk to be un-mounted.
When reiserfsck prompts, answer "Yes" (capital "Y", lower case "es"). The above command does not do any repair, but it will tell you the proper "reiserfsck" command to run next. It will probably suggest you re-run it as reiserfsck --fix-fixable /dev/md2, but let it guide you. If you have questions, post its output here and you can get guidance.
You only need to stop samba once, and un-mount the drive once. Once the reiserfsck repair is complete, you must re-mount the disk and re-start samba as described in the wiki.
Joe L.
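The check sequence described in this post can be sketched as a small shell script. This is only an illustration under stated assumptions: the `run` and `check_disk` helpers and the `DRY_RUN` switch are made up for this example, and by default the script only prints the commands instead of executing them. The commands themselves are the ones quoted in the post; the re-mount and samba restart steps are left to the wiki, since the post does not spell them out.

```shell
#!/bin/sh
# Sketch of the read-only check sequence quoted in the post above.
# DRY_RUN=1 (the default) only prints each command; nothing is executed.
# The run/check_disk helpers and DRY_RUN are illustrative, not part of unRAID.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

check_disk() {
    dev="$1"               # e.g. /dev/md2 for disk2
    run cd                 # leave any directory on the disk being checked
    run samba stop         # stop SAMBA so it does not keep the disk busy
    run umount "$dev"      # the disk must be un-mounted before the check
    run reiserfsck "$dev"  # check only; answer "Yes" when prompted
    # Follow whatever reiserfsck recommends next, then re-mount the disk
    # and re-start samba as described in the wiki.
}

check_disk "${1:-/dev/md2}"
```

Run it as-is to see the planned commands; setting DRY_RUN=0 would actually execute them, which should only be done on the server itself and after reading the wiki page.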
October 21, 2009 (Author) Thanks, Joe L., for the quick reply. Can always count on you. You got me going in the right direction.
October 21, 2009 (Author) Everything came out fine. Only had one file in lost+found. Thanks for the help, Joe L. It is appreciated. Phil
October 21, 2009 (Author) Now I'm getting the same errors in the syslog again. Time to run the check again. What usually causes this, if anyone knows? Thanks
October 21, 2009 I'd re-run the file system check on all your drives. Is it the same drive reporting errors? I'd also run a memory test: if your RAM is acting up, any kind of corruption you can think of is possible. Lastly, run a long SMART test on the drive itself. It might be the problem.
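For the long SMART test suggested in this post, here is a minimal sketch using smartctl from smartmontools, assuming it is installed on the server. The device name /dev/sdX is a placeholder, not a device from this thread, and the `run` and `smart_long_test` helpers and the `DRY_RUN` switch are made up for illustration. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch: start a long SMART self-test and look at the results afterwards.
# DRY_RUN=1 (the default) only prints the commands; nothing is executed.
# /dev/sdX is a placeholder; substitute the real device for the drive.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

smart_long_test() {
    disk="$1"
    run smartctl -t long "$disk"      # start the extended (long) self-test
    # The test runs on the drive in the background and can take a while;
    # the two commands below are meant to be run once it has finished.
    run smartctl -l selftest "$disk"  # self-test log with pass/fail results
    run smartctl -A "$disk"           # attributes, e.g. reallocated sectors
}

smart_long_test "${1:-/dev/sdX}"
```

The memory test mentioned in the same post is normally run from the boot menu rather than from the command line, so it is not covered here.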
October 21, 2009 (Author) I'll do everything you suggested. Yes, it is the same drive as before. Another WD 500GB drive. This would be 3 out of the 4 that I bought a couple of years ago. I'm running the file system check now. I ran it once and it came up with 40 errors. I changed the cable and am now rerunning it. It is finding the same errors, so after it's done I'll do the --rebuild-tree again.
October 21, 2009 ONLY do the rebuild if it asks you to do it. You must first do the basic test, and then do as it instructs. The last thing to do is a basic test again. It is very possible the disk itself is starting to fail, or, as you suspect, it could be the cable. Are you seeing disk "read" errors or parity errors? They are very different. Joe L.
October 21, 2009 (Author) No parity errors or read errors. Just errors in the syslog. I guess the last hard drives I had that went south have me a little paranoid. Got me looking at the syslog more often to try to prevent or catch problems before they get too big. lol
October 22, 2009 (Author) During the rebuild tree the server locked up. Deader than a doornail. Had to unplug and reboot the machine. Now Disk 2 shows up as unformatted. Starting to get a little nervous. I've attached the syslog. Dinner is ready so I gotta run. Let me know if a screenshot is necessary. Thanks
October 22, 2009 (Author) It allowed me to run the commands below:
    cd
    samba stop
    umount /dev/md2 (it told me the disk was not mounted)
    reiserfsck /dev/md2 (this aborted near the beginning, saying that the --rebuild-tree did not complete)
I then ran:
    reiserfsck --rebuild-tree /dev/md2
It is running as I type this. It is on Pass 0. Will this fix my problem if it completes? If it gets my files back I'm moving them to another drive and throwing this drive out the window (just kidding), well, maybe... Like I said earlier, this will be the 3rd drive out of 4 that has failed, all within the same month. Never going to buy that many drives at one time from one place. Crossing fingers. Thanks for the help.
October 22, 2009 (Author) The rebuild tree completed. I am rerunning the file check and it is still finding errors. It found 31 errors. It recommends using the --rebuild-tree switch. I've done that 2 times now. It's not working. I'm getting corruption only on this drive. I've changed data cables and SATA ports. In the syslog I'm still getting communication problems with this drive, "Hard Resetting Link" and such. I guess it's time to copy the data off; I do have room on another drive in the array. Then I'll remove the drive and replace it with the RMA I have coming tomorrow.
October 22, 2009 Copying the data off to another drive is certainly your safest path, and may be your best step, so you will probably want to ignore the speculation that follows. I noticed something in your syslog that is new to this release. The following appears in both syslogs: "kernel: REISERFS (device md2): Remounting filesystem read-only". This happens quickly on discovering the file system problems, and is quickly followed by "reiserfs_read_locked_inode" errors. While this seems on the surface quite an obvious error message, it is a change, and that bothers me a bit. It makes me wonder whether reverting to a previous version with an older ReiserFS module would behave the same, or would perhaps correctly allow the issues to be fixed. I can't help wondering if this read lock is occurring during the reiserfsck fixing and blocking further fixes, so that the same problems appear to be new problems on subsequent runs. I also wonder if running this on sdi would have a different result, going around the unRAID module's monitoring; perhaps there is a new and unknown interaction. It would, however, not update the parity info, so it is probably not worth testing ...
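To see how often the messages discussed in this thread appear in a saved syslog, something like the following sketch could be used. The `scan_syslog` function name is made up for this example; the search strings are the messages quoted in the thread ("REISERFS error", "Remounting filesystem read-only", "reiserfs_read_locked_inode", "Hard Resetting Link"), matched case-insensitively. The path of the syslog file is an assumption and has to be passed in as an argument.

```shell
#!/bin/sh
# Sketch: count syslog lines matching the messages quoted in this thread.
# scan_syslog is an illustrative helper, not an unRAID tool.
scan_syslog() {
    log="$1"
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 1
    fi
    for pat in "REISERFS error" \
               "Remounting filesystem read-only" \
               "reiserfs_read_locked_inode" \
               "hard resetting link"; do
        # grep -c prints the number of matching lines (0 if there are none);
        # "|| true" keeps a zero count from being treated as a failure.
        n=$(grep -i -c -- "$pat" "$log" || true)
        echo "$pat: $n"
    done
}

# Only scan when a file name was given on the command line.
if [ -n "${1:-}" ]; then
    scan_syslog "$1"
fi
```

Comparing the counts from a syslog taken under 4.5-beta7 with one taken under an older release would be one cheap way to see whether the read-only remount really is new behavior.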
October 23, 2009 (Author) I've also noticed the read-only state. I was trying 4.5-beta7. I have since reverted to 4.4.2. I'm copying off what I can. There are a few files I can't delete or do anything with. These are the files that get the "reiserfs_read_locked_inode" errors. Once I have everything off that I can get, I will rerun reiserfsck in unRAID 4.4.2. I never had this problem before, so I don't know if this is because of the new beta or if the drive is just flaky. I did a short and a long SMART test on the drive and it passed. No errors or reallocated sectors either. All of the corruption is localized to this one drive. Just strange.
October 23, 2009 When you do the reiserfsck you un-mount the partition, therefore I do not think the read-only state would then apply. Joe L.
October 23, 2009 (Author) I finally copied everything I could off of this drive. I ran reiserfsck /dev/md2 and it found the same 31 corrupt files. I then ran reiserfsck --rebuild-tree /dev/md2 and it fixed all corruption. I recovered all of the 31 files!! I did this by reverting to 4.4.2. Maybe this is something Tom needs to look into. The drive is still slow to respond sometimes and the link needs to be reset, but beyond that the drive checks out fine. I'll probably still change the drive out and put this drive into my HTPC for more recording space. It will probably be fine for that.
October 23, 2009 Joe L. wrote: "When you do the reiserfsck you un-mount the partition, therefore I do not think the read-only state would then apply." I agree, in the normal situation, plus your instructions specifically include an unmount command. I did not want to go into more detail about my 'speculative' thinking. What caused me to raise it, though, was the 'Change' made, although it appears so simple on the surface. You and I both know that a coding project, large or small or apparently trivial, is not finished until it has been fully tested internally and in its wider sphere of application (and I would add all related documentation). When I see a software Change, and this is the first usage of it, then that software is immediately suspect until I have seen more time and usage, until I feel satisfied about how comprehensively it has been tested. Now, we constantly see changes, and we would never get any sleep if we worried about all of them, but it is still something we keep in the back of our minds, and it will pop up forcefully if we then see any new and unusual behavior. And that is what we had here: fixes that would not fix. Again, just speculating, but I couldn't help thinking of a way this seemingly innocent Change could be causing this. While we can assume that reiserfsck is mostly independent code, it is highly logical that it uses a common library of ReiserFS helper routines. While reiserfsck itself is surely programmed to laugh at any read-only barriers, it calls these helper routines, and perhaps one of them raised the read-only signal on dealing with damage found, and perhaps another, such as a low-level node update routine, was blocked from making changes. How well was reiserfsck tested? I'm sure they tested the file system aspects of the Change, but do they have a comprehensive test suite that includes reiserfsck and tests for all known types of file damage?
They should, but we all know how poor most 'test suites' are, especially when it involves volunteer effort. Test suites aren't fun work.