December 13, 2012
My server is running a Parity check. I see the following errors in my syslog during this:

Dec 13 22:10:30 Tower1 kernel: sd 10:0:0:0: [sdn] Device not ready (Drive related)
Dec 13 22:10:30 Tower1 kernel: sd 10:0:0:0: [sdn] Result: hostbyte=0x00 driverbyte=0x08 (System)
Dec 13 22:10:30 Tower1 kernel: sd 10:0:0:0: [sdn] Sense Key : 0x2 [current] (Drive related)
Dec 13 22:10:30 Tower1 kernel: sd 10:0:0:0: [sdn] ASC=0x4 ASCQ=0x2 (Drive related)
Dec 13 22:10:30 Tower1 kernel: sd 10:0:0:0: [sdn] CDB: cdb[0]=0x28: 28 00 00 d6 70 98 00 00 08 00 (Drive related)
Dec 13 22:10:30 Tower1 kernel: end_request: I/O error, dev sdn, sector 14053528 (Errors)
Dec 13 22:10:30 Tower1 kernel: md: disk10 read error (Errors)
Dec 13 22:10:30 Tower1 kernel: handle_stripe read error: 14053464/10, count: 1 (Errors)
Dec 13 22:10:30 Tower1 kernel: REISERFS warning: reiserfs-5090 is_tree_node: node level 51362 does not match to the expected one 2 (Minor Issues)
Dec 13 22:10:30 Tower1 kernel: REISERFS error (device md10): vs-5150 search_by_key: invalid format found in block 1756683. Fsck? (Errors)
Dec 13 22:10:30 Tower1 kernel: REISERFS error (device md10): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [4 10 0x0 SD] (Errors)
Dec 13 22:10:30 Tower1 kernel: sd 10:0:0:0: [sdn] Device not ready (Drive related)
Dec 13 22:10:30 Tower1 kernel: sd 10:0:0:0: [sdn] Result: hostbyte=0x00 driverbyte=0x08 (System)
Dec 13 22:10:30 Tower1 kernel: sd 10:0:0:0: [sdn] Sense Key : 0x2 [current] (Drive related)
Dec 13 22:10:30 Tower1 kernel: sd 10:0:0:0: [sdn] ASC=0x4 ASCQ=0x2 (Drive related)
Dec 13 22:10:30 Tower1 kernel: sd 10:0:0:0: [sdn] CDB: cdb[0]=0x28: 28 00 64 6f 73 d0 00 00 08 00 (Drive related)
Dec 13 22:10:30 Tower1 kernel: end_request: I/O error, dev sdn, sector 1685025744 (Errors)
Dec 13 22:10:30 Tower1 kernel: md: disk10 read error (Errors)
Dec 13 22:10:30 Tower1 kernel: handle_stripe read error: 1685025680/10, count: 1 (Errors)
Dec 13 22:10:30 Tower1 kernel: REISERFS warning: reiserfs-5090 is_tree_node: node level 4380 does not match to the expected one 2 (Minor Issues)
Dec 13 22:10:30 Tower1 kernel: REISERFS error (device md10): vs-5150 search_by_key: invalid format found in block 210628210. Fsck? (Errors)
Dec 13 22:10:30 Tower1 kernel: REISERFS error (device md10): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [3228 3229 0x0 SD] (Errors)

Is there something wrong with the disk? Is it a cable issue? Is it wise to let the Parity check finish? See also the attached syslog. The disk sits in a CSE-M35T cage and is connected to an M1015 controller.

syslog-2012-12-13.zip
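As a rough cross-check of the numbers in the log excerpt above, the sketch below decodes the two CDBs as SCSI READ(10) commands (opcode 0x28) and compares the result with the sector and block numbers the kernel reported. In the SCSI standard, sense key 0x2 means NOT READY, and ASC 0x04 with ASCQ 0x02 means "logical unit not ready, initializing command required", which usually describes a drive that is waiting for a start (spin-up) command rather than one with a media defect. This alone does not rule out a cable or power problem. The helper name `decode_read10` is made up for illustration, and the 8-sectors-per-block ratio (4096-byte filesystem blocks on 512-byte sectors) is an assumption, not something stated in the thread.

```python
def decode_read10(cdb_hex: str) -> dict:
    """Decode a 10-byte SCSI READ(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)  # fromhex skips the spaces between bytes
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    return {
        "lba": int.from_bytes(cdb[2:6], "big"),     # bytes 2-5: logical block address
        "blocks": int.from_bytes(cdb[7:9], "big"),  # bytes 7-8: transfer length
    }


# Assumption: 4096-byte ReiserFS blocks on 512-byte sectors.
SECTORS_PER_FS_BLOCK = 8

# (CDB, md sector from handle_stripe, ReiserFS block) copied from the log above.
cases = [
    ("28 00 00 d6 70 98 00 00 08 00", 14053464, 1756683),
    ("28 00 64 6f 73 d0 00 00 08 00", 1685025680, 210628210),
]

for cdb_hex, md_sector, fs_block in cases:
    cmd = decode_read10(cdb_hex)
    offset = cmd["lba"] - md_sector  # device sector minus md sector
    same_block = md_sector // SECTORS_PER_FS_BLOCK == fs_block
    print(cmd["lba"], cmd["blocks"], offset, same_block)
# prints:
# 14053528 8 64 True
# 1685025744 8 64 True
```

Both decoded addresses match the "I/O error, dev sdn, sector ..." lines, and each failed md sector maps to the ReiserFS block named in the following REISERFS error. In other words, the filesystem errors here look like a direct consequence of the two failed reads. The constant difference of 64 sectors is presumably the partition start offset on the disk, which is a guess.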
December 13, 2012 Author
Upon further investigation, disk 10 (sdn) is spun down. Still, reads are shown on the disk every 2 or 3 minutes in the unRAID main menu. What is going on here?
December 13, 2012
"Upon further investigation, disk 10 (sdn) is spun down. Still, reads are shown on the disk every 2 or 3 minutes in the unRAID main menu. What is going on here?"
The reads are being satisfied from the disk buffer cache, so the disk does not need to be spun up.
December 14, 2012 Author
Parity check finished. I took the array offline. The disk now shows as disabled in myMain of unMENU and will probably be red-balled when the array is started again. Shall I start the array in maintenance mode and run reiserfsck? Or is it safer to unassign the disk, mount it outside the array, copy the data off it and replace the disk? I guess the parity data is invalid now and it's better to rebuild the parity. Maybe it is better to unassign the parity disk, start the array without it, and also take out the bad disk. Will the bad disk show up in maintenance mode, so I can run reiserfsck?
December 14, 2012 Author
"Upon further investigation, disk 10 (sdn) is spun down. Still, reads are shown on the disk every 2 or 3 minutes in the unRAID main menu. What is going on here?"
"The reads are being satisfied from the disk buffer cache, so the disk does not need to be spun up."
Crap! Such things always happen at the worst possible moment, in the middle of a Parity build. No harm done yet, I think, just back to square one! Maybe start over again: unassign parity, set a new config, reassign the disks, start the array, stop the array, assign parity, start the array. Hopefully the parity rebuild then starts and finishes without errors. But first run reiserfsck on all disks (14 and counting). It takes time, but that's the risk.
December 14, 2012 Author
It's out of the question to replace the bad disk with a new one and let it rebuild, because I don't trust the parity data at the moment. Simply a chicken-and-egg situation with the chicken on life support.
December 14, 2012 Author
reiserfsck just finished on the bad disk. No corruption found. What to do now?

reiserfsck --check started at Fri Dec 14 10:17:02 2012
###########
Replaying journal: Trans replayed: mountid 53, transid 13998, desc 2868, len 1, commit 2870, next trans offset 2853
Trans replayed: mountid 53, transid 13999, desc 2871, len 1, commit 2873, next trans offset 2856
Trans replayed: mountid 53, transid 14000, desc 2874, len 1, commit 2876, next trans offset 2859
Replaying journal: Done.
Reiserfs journal '/dev/md10' in blocks [18..8211]: 3 transactions replayed
Checking internal tree.. finished
Comparing bitmaps..finished
Checking Semantic tree: finished
No corruptions found
There are on the filesystem:
    Leaves 239475
    Internal nodes 1495
    Directories 580
    Other files 3303
    Data block pointers 241844901 (0 of them are zero)
    Safe links 0
###########
reiserfsck finished at Fri Dec 14 11:46:31 2012
December 14, 2012 Author
Can I unassign the bad disk, then start the array and leave the slot where the disk was empty? Then start a new parity build, mount the bad disk outside the array, and copy the data off it to a new disk? What would be the best and most risk-free way to do this? And how do I start a new parity build?
December 14, 2012 Author
Parity sync is running without the bad disk now. I hope all will be OK this time. I will look at the bad disk later. A big advantage over RAID systems: you can't "juggle" disks like this in such a system. Well, I'm still a noob with a capital N.

What I don't understand are the following lines in the syslog:

Dec 14 15:36:20 Tower1 kernel: CIFS VFS: did not end path lookup where expected namelen is 0
Dec 14 15:37:24 Tower1 last message repeated 2 times
Dec 14 15:38:40 Tower1 last message repeated 3 times
Dec 14 15:39:53 Tower1 last message repeated 3 times
Dec 14 15:41:44 Tower1 last message repeated 8 times
Dec 14 15:42:50 Tower1 last message repeated 4 times

Does anyone have an explanation for that?
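To get a feel for how often the CIFS message in the excerpt above actually fires, the syslog "last message repeated N times" lines can be expanded into a total count. The sketch below is only an illustration: the function name `count_occurrences` is made up, and it assumes that every line that is not a repeat marker is one occurrence of the same message, which holds for this excerpt but not for a full syslog.

```python
import re

# syslogd collapses identical consecutive messages into this marker line.
REPEAT = re.compile(r"last message repeated (\d+) times")


def count_occurrences(lines):
    """Count one per real message line, plus N for each 'repeated N times' line."""
    total = 0
    for line in lines:
        match = REPEAT.search(line)
        total += int(match.group(1)) if match else 1
    return total


log = [
    "Dec 14 15:36:20 Tower1 kernel: CIFS VFS: did not end path lookup where expected namelen is 0",
    "Dec 14 15:37:24 Tower1 last message repeated 2 times",
    "Dec 14 15:38:40 Tower1 last message repeated 3 times",
    "Dec 14 15:39:53 Tower1 last message repeated 3 times",
    "Dec 14 15:41:44 Tower1 last message repeated 8 times",
    "Dec 14 15:42:50 Tower1 last message repeated 4 times",
]

print(count_occurrences(log))  # prints 21
```

That is 21 occurrences in about six and a half minutes, so the message recurs steadily rather than being a one-off.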