February 15, 2010: I am running 4.5.1 on a backup server. A drive failed, so I replaced it with a new drive and did a rebuild. Now these errors appear on screen:
reiserfs error (device md2): reiserfs-2025 reiserfs_cache_bitmap_metadata: bitmap block 97648640 is corrupted: first bit must be 1
reiserfs error (device md2) vs-5150 search_by_key: invalid format found in block 93890322. Fsck?
reiserfs error (device md2): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [2 1402 0x0 SD]
I am including the syslog. Any advice? syslog1.txt
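For anyone triaging a similar syslog, a rough sketch of tallying reiserfs errors per device and error code might look like the following. The regular expression is an assumption based only on the three messages quoted in this post, and the function name is made up for illustration; real syslog lines carry timestamp and kernel prefixes, which the search tolerates.

```python
import re
from collections import Counter

# Assumed pattern, modeled on the messages quoted in this thread, e.g.
#   reiserfs error (device md2): vs-5150 search_by_key: invalid format ...
# The colon after the closing parenthesis is optional because one of the
# quoted messages omits it.
REISERFS_ERROR = re.compile(
    r"reiserfs error \(device (?P<device>\w+)\):?\s*(?P<code>[\w-]+)",
    re.IGNORECASE,
)


def count_reiserfs_errors(lines):
    """Return a Counter keyed by (device, error code)."""
    counts = Counter()
    for line in lines:
        match = REISERFS_ERROR.search(line)
        if match:
            counts[(match.group("device"), match.group("code"))] += 1
    return counts


# The three messages from the post above, used as sample input.
sample = [
    "reiserfs error (device md2): reiserfs-2025 reiserfs_cache_bitmap_metadata: "
    "bitmap block 97648640 is corrupted: first bit must be 1",
    "reiserfs error (device md2) vs-5150 search_by_key: "
    "invalid format found in block 93890322. Fsck?",
    "reiserfs error (device md2): vs-13070 reiserfs_read_locked_inode: "
    "i/o failure occurred trying to find stat data of [2 1402 0x0 SD]",
]
summary = count_reiserfs_errors(sample)
for (device, code), count in sorted(summary.items()):
    print(device, code, count)
```

Errors that pile up on a single device point toward that disk's file system, while errors spread across several devices are more consistent with a shared component such as memory or a controller.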
February 15, 2010: Follow the guidelines given in the wiki: http://lime-technology.com/wiki/index.php?title=Check_Disk_Filesystems For your file system, use /dev/md2 and /mnt/disk2 in place of the md1 and disk1 shown on the wiki. Joe L.
February 15, 2010, Author: I ran reiserfsck on md2. It reported 4 corruptions that can be fixed only by running --rebuild-tree. Should I run this?
February 15, 2010: If that is what it suggests, then yes, run it. It will guide you.
February 16, 2010, Author: Joe, thanks for the help. I ran reiserfsck --rebuild-tree and let it run while I went to a meeting. It finished. I then ran reiserfsck again and this time it reported no errors, so I am now running a parity check. I will let you know how it goes.
February 16, 2010, Author: The parity check ran without sync errors. I then tried to copy data from the prod server to the backup server and got the same error on screen. I ran reiserfsck /dev/md2 again; this time it found 5 corruptions and asked me to run --rebuild-tree again.
February 16, 2010: You have hardware issues. Once you have run a parity calc, any subsequent check should be error free unless you lost power and had an unclean shutdown.
It is very difficult to find the bad piece of hardware. It could be memory (most likely), a bad disk (possible), a bad motherboard (possible), or even the power supply (if noisy or marginal).
Time to post a new syslog, before you reboot. Also, get a smartctl report on the drive with the errors, in case it is having trouble reading sectors.
Make sure the memory voltage, timing, and clock speed are set for your specific make and model of RAM. Do this before anything else. Then run a memory test, preferably overnight. If memory is failing, all bets are off for calculating parity consistently.
The last two people on the forum with a similar situation (a parity calc followed by a parity check that reports parity errors) had a bad disk-controller card and a bad motherboard. (Actually, they gave up on the entire old MB/disk controller and just went for a new MB.)
Joe L.
February 17, 2010, Author: Joe, I did a rebuild-tree again on md2, and I also ran reiserfsck on md1 and md3. Disk md1 was fine with no corruption; however, md3 had one corrupt file, so I ran reiserfsck --fix-fixable as requested. All seems back to normal. I was able to back up about 150 GB with no errors showing up on screen or in the syslog. I also ran a smartctl report on the suspect drive md2 and it passed. I will run a memory test tonight to see if that turns up anything. Thanks, Joe. I wish all the vendors I deal with were as responsive. SV
February 17, 2010: Until you can perform several consecutive parity checks on that server, and have them all come up with no errors, I would not trust it for anything.
The smartctl report will say the drive passed until a given parameter drops below its associated failure threshold, so unless you know what to look for, a "passed" result tells you little. Most larger disks have several thousand spare sectors in reserve, but most people will replace the drive once it starts using up a few percent of the reserve. Were there any sectors pending re-allocation? Or sectors that were re-allocated?
Joe L.
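To answer the two questions above without reading the whole report by eye, a minimal sketch along these lines could pull the raw counts out of the attribute table that `smartctl -A` prints. It assumes the usual ten-column ATA attribute layout and the common attribute names `Reallocated_Sector_Ct` and `Current_Pending_Sector`; some drives name or format these differently. The sample values are invented for illustration and are not taken from the attached smarttest.txt.

```python
# Attribute names assumed to be the ones most ATA drives report.
WATCHED = {"Reallocated_Sector_Ct", "Current_Pending_Sector"}


def sector_counts(smartctl_output):
    """Return {attribute name: raw value} for the watched attributes.

    Expects the text of the attribute table from `smartctl -A`, where each
    row has ten whitespace-separated columns and the raw value is the tenth.
    """
    results = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            results[fields[1]] = int(fields[9])
    return results


# Invented sample rows in the usual smartctl layout (not real data).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       12
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       4521
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       3
"""

counts = sector_counts(SAMPLE)
print(counts)
```

Non-zero raw values for either attribute are the kind of early warning that an overall "passed" result can hide.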
February 18, 2010, Author: Joe, I have run two parity checks with no errors. I have included the smartctl test and syslog. I will run the memory test tonight. smarttest.txt syslog.txt
February 18, 2010, Author: Joe, the memory test did 27 passes with no errors. Is there anything else I should check?
February 18, 2010: If you see no more errors, I'd say you are good to go. Keep an eye out for any other errors. With any luck you won't see any. For the short term, I'd verify the files you copy to the server just to ensure all is well. I'm really suspicious of file-system corruption recurring after you had already fixed it once (if I understood your earlier post correctly). Joe L.
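One way to do the verification suggested above is to compare checksums of each source file and its copy. The following is only a sketch under assumptions: both directory trees are reachable from the machine running the script, SHA-256 is an arbitrary choice of hash, and the function names are made up for illustration.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(source_dir, dest_dir):
    """List relative paths that are missing or differ under dest_dir."""
    source_dir, dest_dir = Path(source_dir), Path(dest_dir)
    mismatches = []
    for source_file in sorted(source_dir.rglob("*")):
        if not source_file.is_file():
            continue
        relative = source_file.relative_to(source_dir)
        dest_file = dest_dir / relative
        if not dest_file.is_file() or file_digest(source_file) != file_digest(dest_file):
            mismatches.append(str(relative))
    return mismatches
```

Calling `find_mismatches` with the source share and the matching directory under /mnt/disk2 (the source path depends on your setup) returns an empty list when every copied file matches. Because a failing component can also corrupt reads, a mismatch is a reason to investigate rather than proof of which copy is bad.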
February 18, 2010, Author: Yes, I am a little nervous about this server. I have tested a few files with known issues. I will run reiserfsck on my disks just to check for corruption. Thanks again, Joe. SV