flambot Posted January 10, 2011

Greetings,

I am currently running a parity check (to verify the parity sync) after removing a drive. I checked the syslog using unMenu and found a whole lot of errors in it, all repeating the sequence below. They started immediately upon startup this morning and continued all day. I only noticed this after the parity check had been running for approximately 9 hours. There are NO sync errors showing on the WebGUI.

Fearing I would use up all the RAM, I turned cache_dirs off and the errors immediately stopped. Not sure what I should do now. The relevant part of the syslog is attached.

Jan 10 10:08:00 Tower cache_dirs: cache_dirs process ID 2409 started, To terminate it, type: cache_dirs -q
Jan 10 10:09:55 Tower kernel: attempt to access beyond end of device
Jan 10 10:09:55 Tower kernel: md15: rw=0, want=27487790696, limit=1953525104
Jan 10 10:09:55 Tower kernel: REISERFS error (device md15): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [1294 1299 0x0 SD]
Jan 10 10:09:55 Tower kernel: REISERFS (device md15): Remounting filesystem read-only
Jan 10 10:09:55 Tower kernel: attempt to access beyond end of device
Jan 10 10:09:55 Tower kernel: md15: rw=0, want=27487790696, limit=1953525104
Jan 10 10:09:55 Tower kernel: REISERFS error (device md15): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [1294 1300 0x0 SD]
Jan 10 10:10:25 Tower kernel: ata11.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jan 10 10:10:25 Tower kernel: ata11.00: failed command: READ DMA EXT
Jan 10 10:10:25 Tower kernel: ata11.00: cmd 25/00:08:37:89:e5/00:00:1d:00:00/e0 tag 0 dma 4096 in
Jan 10 10:10:25 Tower kernel: res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jan 10 10:10:25 Tower kernel: ata11.00: status: { DRDY }
Jan 10 10:10:25 Tower kernel: ata11: hard resetting link
Jan 10 10:10:26 Tower kernel: ata11: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jan 10 10:10:26 Tower kernel: ata11.00: configured for UDMA/133
Jan 10 10:10:26 Tower kernel: ata11.00: device reported invalid CHS sector 0
Jan 10 10:10:26 Tower kernel: ata11: EH complete
Jan 10 10:10:26 Tower kernel: attempt to access beyond end of device
Jan 10 10:10:26 Tower kernel: md15: rw=0, want=27487790696, limit=1953525104
Jan 10 10:10:26 Tower kernel: REISERFS error (device md15): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [1294 1301 0x0 SD]
Jan 10 10:10:40 Tower kernel: REISERFS warning: reiserfs-5090 is_tree_node: node level 52428 does not match to the expected one 1
Jan 10 10:10:40 Tower kernel: REISERFS error (device md15): vs-5150 search_by_key: invalid format found in block 176063846. Fsck?
Jan 10 10:10:40 Tower kernel: REISERFS error (device md15): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [1380 1248 0x0 SD]
Jan 10 10:11:54 Tower kernel: attempt to access beyond end of device
Jan 10 10:11:54 Tower kernel: md15: rw=0, want=27487790696, limit=1953525104
Jan 10 10:11:54 Tower kernel: REISERFS error (device md15): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [1294 1299 0x0 SD]
Jan 10 10:11:54 Tower kernel: attempt to access beyond end of device
Jan 10 10:11:54 Tower kernel: md15: rw=0, want=27487790696, limit=1953525104
Jan 10 10:11:54 Tower kernel: REISERFS error (device md15): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [1294 1300 0x0 SD]
Jan 10 10:11:54 Tower kernel: attempt to access beyond end of device
Jan 10 10:11:54 Tower kernel: md15: rw=0, want=27487790696, limit=1953525104
Jan 10 10:11:54 Tower kernel: REISERFS error (device md15): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [1294 1301 0x0 SD]
Jan 10 11:12:49 Tower kernel: REISERFS warning: reiserfs-5090 is_tree_node: node level 52428 does not match to the expected one 1
Jan 10 11:12:49 Tower kernel: REISERFS error (device md15): vs-5150 search_by_key: invalid format found in block 176063846. Fsck?
Jan 10 11:12:49 Tower kernel: REISERFS error (device md15): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [1380 1248 0x0 SD]
Jan 10 11:12:54 Tower kernel: attempt to access beyond end of device
Jan 10 11:12:54 Tower kernel: md15: rw=0, want=27487790696, limit=1953525104
Jan 10 11:12:54 Tower kernel: REISERFS error (device md15): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [1294 1299 0x0 SD]
Jan 10 11:12:54 Tower kernel: attempt to access beyond end of device
Jan 10 11:12:54 Tower kernel: md15: rw=0, want=27487790696, limit=1953525104
Jan 10 11:12:54 Tower kernel: REISERFS error (device md15): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [1294 1300 0x0 SD]
Jan 10 11:12:54 Tower kernel: attempt to access beyond end of device
Jan 10 11:12:54 Tower kernel: md15: rw=0, want=27487790696, limit=1953525104
Jan 10 11:12:54 Tower kernel: REISERFS error (device md15): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [1294 1301
Jan 10 19:00:01 Tower kernel: REISERFS warning: reiserfs-5090 is_tree_node: node level 52428 does not match to the expected one 1
Jan 10 19:00:01 Tower kernel: REISERFS error (device md15): vs-5150 search_by_key: invalid format found in block 176063846. Fsck?
Jan 10 19:00:01 Tower kernel: REISERFS error (device md15): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [1380 1248 0x0 SD]
Jan 10 19:00:05 Tower kernel: attempt to access beyond end of device
Jan 10 19:00:05 Tower kernel: md15: rw=0, want=27487790696, limit=1953525104
Jan 10 19:00:05 Tower kernel: REISERFS error (device md15): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [1294 1299 0x0 SD]
Jan 10 19:00:05 Tower kernel: attempt to access beyond end of device
Jan 10 19:00:05 Tower kernel: md15: rw=0, want=27487790696, limit=1953525104
Jan 10 19:00:05 Tower kernel: REISERFS error (device md15): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [1294 1300 0x0 SD]
Jan 10 19:00:05 Tower kernel: attempt to access beyond end of device
Jan 10 19:00:05 Tower kernel: md15: rw=0, want=27487790696, limit=1953525104
Jan 10 19:00:05 Tower kernel: REISERFS error (device md15): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [1294 1301 0x0 SD]
Jan 10 19:00:05 Tower kernel: REISERFS warning: reiserfs-5090 is_tree_node: node level 52428 does not match to the expected one 1
Jan 10 19:00:05 Tower kernel: REISERFS error (device md15): vs-5150 search_by_key: invalid format found in block 176063846. Fsck?
Jan 10 19:00:05 Tower kernel: REISERFS error (device md15): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [1380 1248 0x0 SD]
Jan 10 19:00:08 Tower cache_dirs: killing cache_dirs process 2409
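For anyone reading the "attempt to access beyond end of device" lines, here is a minimal sketch that turns the want/limit numbers into byte offsets. It assumes the kernel prints both values in 512-byte sectors; the function name and the regular expression are made up for this illustration and are not part of any unRAID tool.

```python
import re

# Assumption: want= and limit= are counted in 512-byte sectors.
SECTOR_BYTES = 512

# Matches the "md15: rw=0, want=..., limit=..." line from the syslog above.
LINE_RE = re.compile(r"(md\d+): rw=\d+, want=(\d+), limit=(\d+)")


def parse_beyond_end(line):
    """Return (device, want_bytes, limit_bytes), or None if the line does not match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    device = match.group(1)
    want_sectors = int(match.group(2))
    limit_sectors = int(match.group(3))
    return device, want_sectors * SECTOR_BYTES, limit_sectors * SECTOR_BYTES


sample = "Jan 10 10:09:55 Tower kernel: md15: rw=0, want=27487790696, limit=1953525104"
device, want_bytes, limit_bytes = parse_beyond_end(sample)

# Print both offsets in decimal terabytes for a quick comparison.
print(device, want_bytes / 1e12, limit_bytes / 1e12)
```

Under that assumption the device ends at roughly 1 TB while the requested read ends at roughly 14 TB, so the address being requested cannot be a valid one for this disk.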
GK20 Posted January 10, 2011

Quoting flambot: "I turned Cache_dirs off and this error immediately stopped. Not sure what I should do now??"

Jan 10 19:00:05 Tower kernel: attempt to access beyond end of device
Jan 10 19:00:05 Tower kernel: md15: rw=0, want=27487790696, limit=1953525104
Jan 10 19:00:05 Tower kernel: REISERFS error (device md15): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [1294 1301 0x0 SD]
Jan 10 19:00:05 Tower kernel: REISERFS warning: reiserfs-5090 is_tree_node: node level 52428 does not match to the expected one 1
Jan 10 19:00:05 Tower kernel: REISERFS error (device md15): vs-5150 search_by_key: invalid format found in block 176063846. Fsck?
Jan 10 19:00:05 Tower kernel: REISERFS error (device md15): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [1380 1248 0x0 SD]
Jan 10 19:00:08 Tower cache_dirs: killing cache_dirs process 2409

Parity checking works at the disk block level, so it does not examine the file system. You should perform a file system check on the md15 disk. It looks to me like you might have file system corruption.

http://lime-technology.com/wiki/index.php?title=Check_Disk_Filesystems
Joe L. Posted January 10, 2011

Let the parity calc finish and, as said, perform a file system check on disk15 (/dev/md15) following the instructions in the wiki. The file system on that disk was re-mounted as read-only to prevent more corruption from occurring.

Jan 10 10:09:55 Tower kernel: attempt to access beyond end of device
Jan 10 10:09:55 Tower kernel: md15: rw=0, want=27487790696, limit=1953525104
Jan 10 10:09:55 Tower kernel: REISERFS error (device md15): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [1294 1299 0x0 SD]
Jan 10 10:09:55 Tower kernel: REISERFS (device md15): Remounting filesystem read-only

Joe L.
flambot (Author) Posted January 10, 2011

Woke up this morning and the parity check was complete, WITHOUT any sync errors. This surprised me: something is obviously wrong with disk15, yet there weren't any errors. I was expecting some.

Another surprise: the log for this session isn't on the flash drive. The last thing the syslog (in unMenu) says is "Jan 11 04:40:01 Tower syslogd 1.4.1: restart.", with only a few short entries, all about disk spin-down as the smaller disks finished the parity check. I guess the log got full?

Why did the errors stop when I shut off cache_dirs?

Better go and do the file system check. Thx for the help.
flambot (Author) Posted January 10, 2011

The file system check just finished and reported no corruption. Here is the output:

reiserfsck --check started at Tue Jan 11 09:51:29 2011
###########
Replaying journal:
Trans replayed: mountid 813, transid 34649, desc 3366, len 1, commit 3368, next trans offset 3351
Trans replayed: mountid 813, transid 34650, desc 3369, len 1, commit 3371, next trans offset 3354
Replaying journal: Done.
Reiserfs journal '/dev/md15' in blocks [18..8211]: 2 transactions replayed
Checking internal tree.. finished
Comparing bitmaps..finished
Checking Semantic tree: finished
No corruptions found
There are on the filesystem:
Leaves 231066
Internal nodes 1409
Directories 42
Other files 1747
Data block pointers 233533315 (0 of them are zero)
Safe links 0
###########
reiserfsck finished at Tue Jan 11 11:24:41 2011

Not sure what is going on. Where do I go from here?
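If the reiserfsck output has been saved to a file or captured as text, a small script can pull out the two facts that matter here: whether the check came back clean and how many journal transactions were replayed. This is only a sketch that matches the wording seen in the output above; the function name is hypothetical and other reiserfsck versions may word things differently.

```python
import re


def summarize_reiserfsck(output):
    """Summarize reiserfsck --check text based on the phrases seen in this thread."""
    # The clean-check phrase is taken verbatim from the output posted above.
    clean = "No corruptions found" in output
    replayed = re.search(r"(\d+) transactions replayed", output)
    return {
        "clean": clean,
        "transactions_replayed": int(replayed.group(1)) if replayed else 0,
    }


sample_output = """Replaying journal: Done.
Reiserfs journal '/dev/md15' in blocks [18..8211]: 2 transactions replayed
Checking internal tree.. finished
Comparing bitmaps..finished
Checking Semantic tree: finished
No corruptions found
"""

summary = summarize_reiserfsck(sample_output)
print(summary)
```

A clean result here, combined with the earlier kernel errors, points away from on-disk damage and toward something between the disk and the file system code, which fits the advice that follows.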
Joe L. Posted January 10, 2011

Quoting flambot: "The file system check just finished and report no corruption found. [...] Not sure what is going on. Where do I go from here?"

Perform a memory test. Memory is the most likely suspect now.
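A side note on the numbers in the syslog, offered as an observation rather than a diagnosis. Assuming 4096-byte file system blocks and 512-byte sectors (8 sectors per block, which is an assumption, not something the log states), both the bogus "node level" and the out-of-range block address reduce to a repeating 0xCC byte pattern:

```python
want_sectors = 27487790696  # "want=" value from the syslog
node_level = 52428          # "node level" value from the syslog

# Assumption: 4096-byte blocks on 512-byte sectors, so 8 sectors per block.
SECTORS_PER_BLOCK = 8

# "want" is the end of the requested range, so the block read is one before it.
requested_block = want_sectors // SECTORS_PER_BLOCK - 1

print(hex(node_level))       # 0xcccc
print(hex(requested_block))  # 0xcccccccc
```

A value made of one repeated byte looks like fill or garbage data rather than a real tree pointer. That is consistent with the file system code having been handed a bad buffer, but it does not by itself say whether RAM, the controller, the cable, or the drive supplied it.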
flambot (Author) Posted January 10, 2011

Quoting Joe L.: "Perform a memory test. It is most suspect now."

Will do. Thx. I'm currently running a long SMART test on the drive.

Also, is there any reason the errors would have stopped when I disabled cache_dirs?
Joe L. Posted January 10, 2011

Quoting flambot: "Currently running a long smart test on the drive. AND... Should there be any reason the errors stopped when I disabled cache_dirs??"

Don't forget to disable spin-down on the drive. A spin-down will abort the "long" test, which will probably take 4 or 5 hours.

The errors did not go away; the bad file system simply was not being accessed, so no more messages were logged. cache_dirs basically does a periodic directory listing, the same as you would get by browsing from a file explorer. It just does it every few seconds.
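To make the cache_dirs point concrete, here is a rough sketch of what "a periodic directory listing" amounts to. It is not the actual cache_dirs script; the function names, the interval, and the `passes` parameter are invented for this illustration.

```python
import os
import time


def list_tree(root):
    """Walk root once, stat-ing every entry; return how many entries were readable."""
    count = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            try:
                # Each lstat reads file metadata, which is the kind of access
                # that produced the "stat data" errors in the syslog.
                os.lstat(os.path.join(dirpath, name))
                count += 1
            except OSError:
                # A failing entry is skipped; the kernel logs its own error.
                pass
    return count


def keep_listing(root, interval_seconds=10, passes=None):
    """Repeat list_tree every interval_seconds; passes=None means run forever."""
    done = 0
    while passes is None or done < passes:
        list_tree(root)
        done += 1
        if passes is None or done < passes:
            time.sleep(interval_seconds)
    return done
```

Calling something like `keep_listing("/mnt/disk15")` (path given only as an example) would touch the same directory entries on every pass, so a bad entry gets reported again each time. Stop the loop and the messages stop too, even though nothing on the disk has changed.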
flambot (Author) Posted January 10, 2011

Thx Joe. I didn't know about the spin-down; this is the first time I've done a long test. The cache_dirs explanation also makes sense. I'll crank up a memory test on next boot. Appreciate it.