bland328

Everything posted by bland328

  1. Thanks for your responses, gentlemen. The first thing I did upon first seeing this result was to run a 24-hour Memtest; no errors were reported. Also, I did attach SMART results to my original post.

     I assume the more thorough drive test you both refer to is "reiserfsck --check". I just did four back-to-back runs on the data drive, and the results are disturbing:

     Run 1:
       Reiserfs journal '/dev/md1' in blocks [18..8211]: 0 transactions replayed
       Checking internal tree..
       bad_indirect_item: block 212886871: The item (1163 1165 0x1d4cb001 IND (1), len 4048, location 48 entry count 0, fsck need 0, format new) has the bad pointer (117) to the block (213008304), which is in tree already
       finished
       Comparing bitmaps..vpf-10640: The on-disk and the correct bitmaps differs.
       Checking Semantic tree: finished
       3 found corruptions can be fixed when running with --fix-fixable

     Run 2:
       Reiserfs journal '/dev/md1' in blocks [18..8211]: 0 transactions replayed
       Checking internal tree..
       bad_indirect_item: block 47209215: The item (359 369 0xf104001 IND (1), len 4048, location 48 entry count 0, fsck need 0, format new) has the bad pointer (117) to the block (47270912), which is in tree already
       bad_indirect_item: block 182910983: The item (1151 1152 0x17d6001 IND (1), len 4048, location 48 entry count 0, fsck need 0, format new) has the bad pointer (117) to the block (182919168), which is in tree already
       bad_indirect_item: block 212107611: The item (1163 1164 0x551de001 IND (1), len 4048, location 48 entry count 0, fsck need 0, format new) has the bad pointer (117) to the block (212462848), which is in tree already
       finished
       Comparing bitmaps..vpf-10640: The on-disk and the correct bitmaps differs.
       Checking Semantic tree: finished
       5 found corruptions can be fixed when running with --fix-fixable

     Run 3:
       Reiserfs journal '/dev/md1' in blocks [18..8211]: 0 transactions replayed
       Checking internal tree..
       bad_indirect_item: block 83331021: The item (5 15 0x44d1d001 IND (1), len 4048, location 48 entry count 0, fsck need 0, format new) has the bad pointer (117) to the block (83616512), which is in tree already
       bad_indirect_item: block 66715745: The item (428 429 0x1778c001 IND (1), len 4048, location 48 entry count 0, fsck need 0, format new) has the bad pointer (117) to the block (66813956), which is in tree already
       bad_indirect_item: block 182910985: The item (1151 1152 0x1fbe001 IND (1), len 4048, location 48 entry count 0, fsck need 0, format new) has the bad pointer (117) to the block (182921216), which is in tree already
       bad_indirect_item: block 231112707: The item (1231 1257 0x1 IND (1), len 396, location 336 entry count 0, fsck need 0, format new) has the bad pointer (45) to the block (231118080), which is in tree already
       finished
       Comparing bitmaps..vpf-10640: The on-disk and the correct bitmaps differs.
       Checking Semantic tree: finished
       6 found corruptions can be fixed when running with --fix-fixable

     Run 4:
       Reiserfs journal '/dev/md1' in blocks [18..8211]: 0 transactions replayed
       Checking internal tree.. finished
       Comparing bitmaps..finished
       Checking Semantic tree: finished
       No corruptions found
       There are on the filesystem:
         Leaves 24105
         Internal nodes 146
         Directories 46
         Other files 1500
         Data block pointers 24307524 (0 of them are zero)
         Safe links 0

     So... 3 corruptions found on the first run, then 5 on the next, then 6, then 0 on the last run. I emphasize that I never repaired any of the reported filesystem problems, which likely means disk reads are sometimes corrupted, but that the disk itself is not actually corrupted. The disk passes a long SeaTools test and passes preclear. RAM tested clean after 24 hours of continuous testing.

     Replace the SATA cables? Replace the power supply? I'm a capable troubleshooter, but not a Linux or unRAID pro, and at this point it feels like everything is suspect. Any thoughts?
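     In case it helps anyone reproducing this, here is roughly how the repeated read-only checks above can be scripted, along with a quick SMART attribute check that is often useful when cabling is a suspect. This is only a sketch under a few assumptions: the array is stopped so /dev/md1 is not mounted while reiserfsck runs, the data drives really are sda and sdb, and the log file names are just placeholders.

       # Sketch only: repeat a read-only ReiserFS check four times, keeping each
       # log so the reported corruption counts can be compared run to run.
       # Assumes /dev/md1 is not mounted; --yes skips the confirmation prompt.
       for i in 1 2 3 4; do
           reiserfsck --check --yes /dev/md1 2>&1 | tee "reiserfsck_run_$i.log"
       done

       # Sketch only: look at the CRC error attribute on each drive; a climbing
       # UDMA_CRC_Error_Count generally points at cables/connectors rather than
       # the platters. Device names are assumptions -- substitute your own.
       for dev in /dev/sda /dev/sdb; do
           echo "== $dev =="
           smartctl -A "$dev" | grep -i crc
       done

     If the corruption counts keep changing between runs on an idle, unmodified filesystem, that is consistent with the reads being corrupted rather than the on-disk structures.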
  2. Summary

     I recently built my first unRAID server, and I'm having bizarre parity issues, despite my best efforts and a lot of troubleshooting.

     Configuration
       - unRAID 4.7
       - Motherboard: MSI 870S-G46
       - CPU: AMD Athlon II X2 215 2.7 GHz ADX215OCK22GQ
       - RAM: 2GB Kingston PC3-10600 DDR3 dual channel (KVR1333D3K2/2GR)
       - PS: hec Zephyr MX 750
       - Parity drive: 2TB Western Digital WD20EARS
       - Disk1 drive: 1TB Seagate 31000528AS
       (I kept it to just the two drives at first, so I could get to know unRAID a bit.)

     Problem

     After preclearing both drives, firing up the array, copying ~100GB of big multimedia files to the server, and seeing zero errors on both drives in the rightmost column of the Main page, I clicked Check to start a Parity-Check--and was told there are dozens of parity errors.

     Things you might want to know
       - I'm running unRAID 4.7
       - The drives are reporting 25-26°C
       - Both drives are "MBR: 4K-aligned"
       - I've attached SMART reports for both disks, as well as a lengthy, messy syslog (sorry about that)

     Troubleshooting I've done
       - Memtest ran for over 24 hours with no errors reported.
       - The Western Digital and Seagate drive utilities report nothing strange about either drive.
       - Each drive has been precleared at least three times, never with anything I interpret as an error or failure report.

     My testing

     I've seen this problem consistently after preclearing both drives and starting from scratch three times--yes, that's many days of preclearing! During the last preclear cycle, I precleared the 2TB drive once while preclearing the 1TB drive twice back-to-back, just to make sure there was plenty of activity involving both drives. After that last round of preclears, I issued an "initconfig" before repopulating the devices. No problems were reported during the initial Parity-Sync. The copying of the 100GB of big files was started after Parity-Sync completed, and seemingly went smoothly--the Windows 7 box pushing the files didn't complain, anyway. After the copy was complete, I clicked the Check button, and eventually 39 parity sync errors were fixed!

     My questions
       - What do I try next regarding the parity failures? This feels like a hardware failure outside the drives, and I fear I've chosen the wrong motherboard, or have a lemon.
       - Am I right that it is fine to preclear all drives with the -A option (see the sketch below)? My understanding is that the Seagate drive doesn't need it, but that using it does no harm beyond some space lost if I have a huge number of files. For the sake of elegance, simplicity, and never getting it wrong on any given drive in the future, I'm hoping always doing this is fine.

     bland328_SMART_sda.txt
     bland328_SMART_sdb.txt
     bland328_syslog_with_a_few_notes_at_the_top.txt
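     For anyone searching later, this is the shape of the preclear invocation my last question is about. It's only a sketch based on my understanding of Joe L.'s preclear_disk.sh: the device name is a placeholder, the -l listing step is how I understand the script works, and a preclear wipes the target disk, so double-check everything against the script's own help before running it.

       # Sketch only: preclear a drive with the 4K-aligned partition option (-A).
       # /dev/sdX is a placeholder -- confirm the device first, since preclearing
       # destroys all data on the target disk.
       preclear_disk.sh -l            # as I understand it, lists disks eligible for preclearing
       preclear_disk.sh -A /dev/sdX   # one preclear cycle with a 4K-aligned starting partition

     If that understanding is right, always passing -A would mean never having to remember which drives are Advanced Format.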