January 7, 2009

Hi All,

I couldn't see a 4.4 forum, so I have posted this here.

I recently upgraded from a perfectly working unRAID 4.3 + BubbaRaid to unRAID 4.4 + BubbaRaid (the version that works with 4.4). This seemed to be working without any issues until a few days ago, when I replaced my 120 GB IDE cache drive with an 80 GB SATA cache drive and also added a 200 GB SATA drive. I now use only one IDE drive.

I noticed 1 error on the stats page yesterday, so I ran a parity check. The parity check was going at 1,500 KB/s (compared to the normal 55,000 KB/s), so I rebooted. Before I rebooted, I checked 'top' over telnet and noted that a process called 'kblockd' was using 100% of the CPU. Google says this is a kernel memory thing; however, this server has 1 GB of RAM and is only running unRAID + BubbaRaid, so it shouldn't be running out of RAM.

After the reboot, I ran a parity check and it reported 6 errors, but the old IDE drive (although it's only about 1 year old) showed 1 error and no other drive showed any. So I ran the parity check again; this time it reported 5 sync errors, with 2 errors on the old IDE drive (and 0 errors on any other drive). Thinking this was weird, I ran the parity check once more, and again saw 100% CPU usage and a really slow check (1,500 KB/s). This time the CPU was being used by the unraidd process.

I'll do another reboot tonight and re-run parity, but does it seem like the old IDE drive might be on its way out?

Oh snap, I just found a bug: if you take the array offline ('stop') whilst doing a parity check, it sets the other drives as unformatted. Quite dangerous really (can someone else try this?).
Syslog is attached; however, the interesting bits are below:

Jan 5 15:31:31 TANK kernel: ReiserFS: md2: journal params: device md2, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
Jan 6 07:06:06 TANK kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Jan 6 07:06:06 TANK kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
Jan 6 07:06:06 TANK kernel: ide: failed opcode was: unknown
Jan 6 07:10:48 TANK kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Jan 6 07:10:48 TANK kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
Jan 6 07:10:48 TANK kernel: ide: failed opcode was: unknown

This drive had been working perfectly before I changed the IDE drives. Would this be just a coincidence, or could it be related to using 4.4?

Thanks
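For readers wondering how status=0x51 and error=0x84 map to the names in braces, here is a minimal Python sketch that decodes the two register bytes bit by bit. The bit positions follow the classic ATA status and error register layout; the names are chosen to match the words in the log above, and the helper names (`decode`, `decode_line`) are made up for this illustration, not part of any unRAID tool.

```python
import re

# ATA status register bits, named the way the log lines above print them.
STATUS_BITS = {
    0x80: "Busy",
    0x40: "DriveReady",
    0x20: "DeviceFault",
    0x10: "SeekComplete",
    0x08: "DataRequest",
    0x04: "CorrectedError",
    0x02: "Index",
    0x01: "Error",
}

# ATA error register bits (assumed naming; 0x80 is the interface CRC bit
# during UDMA transfers, 0x04 is the "command aborted" bit).
ERROR_BITS = {
    0x80: "BadCRC",
    0x40: "UncorrectableError",
    0x10: "SectorIdNotFound",
    0x04: "DriveStatusError",
    0x02: "TrackZeroNotFound",
    0x01: "AddrMarkNotFound",
}

LINE_RE = re.compile(r"(\w+): dma_intr: (status|error)=0x([0-9a-fA-F]{2})")


def decode(value, table):
    """Return the names of the bits set in value, highest bit first."""
    return [name for bit, name in sorted(table.items(), reverse=True) if value & bit]


def decode_line(line):
    """Decode one syslog line; return None if it carries no register dump."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    device, kind, hex_value = match.groups()
    table = STATUS_BITS if kind == "status" else ERROR_BITS
    return device, kind, decode(int(hex_value, 16), table)


if __name__ == "__main__":
    for text in (
        "Jan 6 07:06:06 TANK kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }",
        "Jan 6 07:06:06 TANK kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }",
    ):
        print(decode_line(text))
```

The CRC bit is set when data was corrupted on its way across the cable, which is why it tends to implicate the cable, connector, or controller rather than the platters.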
January 15, 2009 (Author)

Bump :-( This is still an issue.

<snip>
Jan 10 21:42:34 TANK kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Jan 10 21:42:34 TANK kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
Jan 10 21:42:34 TANK kernel: ide: failed opcode was: unknown
Jan 10 21:42:34 TANK kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Jan 10 21:42:34 TANK kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
Jan 10 21:42:34 TANK kernel: ide: failed opcode was: unknown
Jan 10 21:42:35 TANK kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Jan 10 21:42:35 TANK kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
Jan 10 21:42:35 TANK kernel: ide: failed opcode was: unknown
Jan 10 21:42:35 TANK kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Jan 10 21:42:35 TANK kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
Jan 10 21:42:35 TANK kernel: ide: failed opcode was: unknown
Jan 10 21:42:35 TANK kernel: hda: UDMA/66 mode selected
Jan 10 21:42:35 TANK kernel: ide0: reset: success
Jan 10 21:42:35 TANK kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Jan 10 21:42:35 TANK kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
Jan 10 21:42:35 TANK kernel: ide: failed opcode was: unknown
Jan 10 21:42:36 TANK kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Jan 10 21:42:36 TANK kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
Jan 10 21:42:36 TANK kernel: ide: failed opcode was: unknown
<snip>

The main status page is only showing 1 error for this drive, though. Are there any tests I can perform to see if this drive is faulty?
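Because the status page and the syslog can disagree on the error count, it may help to tally the BadCRC lines directly. The following is a rough Python sketch, assuming the syslog has been copied off the server with one entry per line in the format shown above; `count_badcrc` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches lines such as:
# Jan 10 21:42:34 TANK kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
BADCRC_RE = re.compile(
    r"^(\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ kernel: (\w+): dma_intr: "
    r"error=0x[0-9a-fA-F]+ \{[^}]*BadCRC[^}]*\}"
)


def count_badcrc(lines):
    """Count BadCRC errors per device and per (device, timestamp)."""
    per_device = Counter()
    per_second = Counter()
    for line in lines:
        match = BADCRC_RE.match(line.strip())
        if match:
            timestamp, device = match.groups()
            per_device[device] += 1
            per_second[(device, timestamp)] += 1
    return per_device, per_second


if __name__ == "__main__":
    sample = [
        "Jan 10 21:42:34 TANK kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }",
        "Jan 10 21:42:34 TANK kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }",
        "Jan 10 21:42:34 TANK kernel: ide: failed opcode was: unknown",
        "Jan 10 21:42:35 TANK kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }",
    ]
    devices, seconds = count_badcrc(sample)
    print(devices)
    print(seconds)
```

Several errors inside the same second, as in the excerpt above, suggest a burst of retries on one request rather than many independent failures.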
January 15, 2009
The errors you are showing are frequently associated with bad hardware. It could be the drive, but it could just as easily be the drive cable or the disk controller.

You said you moved drives around. Did you use an 80-conductor flat IDE cable, or an older 40-conductor one you had lying around? (Older 40-conductor cables cannot handle the higher speeds of today's disk drives.) Did you use a "round" cable, or a cable longer than 24 inches? There is a good possibility that neither will meet the proper specs for reliable high-speed operation.

The only other test you can run is a SMART test using "smartctl". Details are in the "Troubleshooting" section of the wiki, but for drive hda the command would be

smartctl -a -d ata /dev/hda

If smartctl complains about a missing library, you'll need to download and install it. Details here: in this post

Joe L.
January 15, 2009 (Author)

I didn't change the IDE cable at all except for removing the second IDE device. Smart did complain about a missing binary, which I have now fixed. If it helps, it seems like all drives would go into PIO mode once I get a single error (I could do a parity check at full speed on any given day until I see one error). The cable is an 80 pin flat.

smart status:

root@TANK:/boot/packages# smartctl -a -d ata /dev/hda
smartctl version 5.38 [i486-slackware-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.10 family
Device Model:     ST3500630A
Serial Number:    9QG1TV5X
Firmware Version: 3.AAE
User Capacity:    500,107,862,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Fri Jan 16 01:09:11 2009 GMT-10
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 430) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 163) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   105   082   006    Pre-fail  Always       -       191221885
  3 Spin_Up_Time            0x0003   093   093   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   098   098   020    Old_age   Always       -       2132
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       6
  7 Seek_Error_Rate         0x000f   084   060   030    Pre-fail  Always       -       273225045
  9 Power_On_Hours          0x0032   088   088   000    Old_age   Always       -       10613
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       186
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   070   048   045    Old_age   Always       -       30 (Lifetime Min/Max 24/32)
194 Temperature_Celsius     0x0022   030   052   000    Old_age   Always       -       30 (0 17 0 0)
195 Hardware_ECC_Recovered  0x001a   061   053   000    Old_age   Always       -       95733308
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       102
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 101 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 101 occurred at disk power-on lifetime: 10525 hours (438 days + 13 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 00 00 00 e0  Error: ICRC, ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 c7 87 5d e0 00      08:03:17.347  READ DMA EXT
  25 00 08 c7 87 5d e0 00      08:03:16.905  READ DMA EXT
  10 00 3f 00 00 00 e0 00      08:03:16.905  RECALIBRATE [OBS-4]
  25 00 08 c7 87 5d e0 00      08:03:16.463  READ DMA EXT
  25 00 08 c7 87 5d e0 00      08:03:16.023  READ DMA EXT

Error 100 occurred at disk power-on lifetime: 10525 hours (438 days + 13 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 00 00 00 e0  Error: ICRC, ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 c7 87 5d e0 00      08:03:14.192  READ DMA EXT
  10 00 3f 00 00 00 e0 00      08:03:16.905  RECALIBRATE [OBS-4]
  25 00 08 c7 87 5d e0 00      08:03:16.905  READ DMA EXT
  25 00 08 c7 87 5d e0 00      08:03:16.463  READ DMA EXT
  c6 00 10 00 00 00 e0 00      08:03:16.023  SET MULTIPLE MODE

Error 99 occurred at disk power-on lifetime: 10525 hours (438 days + 13 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 00 00 00 e0  Error: ICRC, ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 c7 87 5d e0 00      08:03:14.192  READ DMA EXT
  25 00 08 c7 87 5d e0 00      08:03:14.172  READ DMA EXT
  c6 00 10 00 00 00 e0 00      08:03:14.162  SET MULTIPLE MODE
  00 00 40 00 00 00 00 06      08:03:16.463  NOP [Abort queued commands]
  ef 03 40 00 00 00 e0 02      08:03:16.023  SET FEATURES [set transfer mode]

Error 98 occurred at disk power-on lifetime: 10525 hours (438 days + 13 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 00 00 00 e0  Error: ICRC, ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 c7 87 5d e0 00      08:03:14.192  READ DMA EXT
  c6 00 10 00 00 00 e0 00      08:03:14.172  SET MULTIPLE MODE
  00 00 40 00 00 00 00 06      08:03:14.162  NOP [Abort queued commands]
  ef 03 40 00 00 00 e0 02      08:03:14.152  SET FEATURES [set transfer mode]
  25 00 08 c7 87 5d e0 00      08:03:16.023  READ DMA EXT

Error 97 occurred at disk power-on lifetime: 10525 hours (438 days + 13 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 00 00 00 e0  Error: ICRC, ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 c7 87 5d e0 00      08:03:14.192  READ DMA EXT
  25 00 08 c7 87 5d e0 00      08:03:14.172  READ DMA EXT
  10 00 3f 00 00 00 e0 00      08:03:14.162  RECALIBRATE [OBS-4]
  25 00 08 c7 87 5d e0 00      08:03:14.152  READ DMA EXT
  25 00 08 c7 87 5d e0 00      08:03:14.141  READ DMA EXT

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     10432         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
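A report this long is easier to scan if the attribute table is pulled out and only the counters of interest are shown. Below is a minimal Python sketch, assuming the smartctl output was saved as text with one attribute per line. The choice of which attributes to watch is this sketch's own assumption (reallocated sectors, pending sectors, and the interface CRC counter), and `parse_attributes` and `nonzero_watched` are made-up names.

```python
import re

# One row of the "Vendor Specific SMART Attributes with Thresholds" table:
# ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ATTR_RE = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"(\S+)\s+(\S+)\s+(\S+)\s+(\d+)"
)

# Attributes this sketch treats as worth a second look when non-zero.
WATCH = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "UDMA_CRC_Error_Count")


def parse_attributes(report):
    """Map attribute name -> normalized value, worst, threshold and raw count."""
    attrs = {}
    for line in report.splitlines():
        match = ATTR_RE.match(line)
        if match:
            ident, name, value, worst, thresh, kind, _updated, _failed, raw = match.groups()
            attrs[name] = {
                "id": int(ident),
                "value": int(value),
                "worst": int(worst),
                "thresh": int(thresh),
                "type": kind,
                "raw": int(raw),  # only the leading integer of RAW_VALUE
            }
    return attrs


def nonzero_watched(attrs):
    """Return the watched attributes whose raw counter is above zero."""
    return {n: attrs[n]["raw"] for n in WATCH if n in attrs and attrs[n]["raw"] > 0}


if __name__ == "__main__":
    sample = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       6
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       102
"""
    print(nonzero_watched(parse_attributes(sample)))
```

Run against the report above, the two counters that stand out are the 6 reallocated sectors and the 102 UDMA CRC errors.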
January 15, 2009

One thing that seems high is the UDMA CRC error count which points to a bad cable/hardware... see the above post by Joe L

I would try and change the cables and see if that fixes it

Cheers,
Matt
January 15, 2009

Is the remaining drive connected to the end connector? Connecting a drive to the middle connector with the end connector left unused could also cause errors.

Your drive shows a series of 101 ATA errors, the last 5 being logged, and 6 re-allocated sectors. If those errors occurred recently, it might be an indication that the drive is in need of some attention.

Joe L.
January 15, 2009

Very early in my life as an unRAID user, I had issues using the round IDE cables. Everything seemed to be fine at first, but then I had problems where the drives would not mount consistently. DO NOT USE THEM! Using the infuriating flat IDE cables is what is required. Once I switched everything was stable.
January 15, 2009

"Your drive shows a series of 101 ATA errors, the last 5 being logged, and 6 re-allocated sectors. If those errors occurred recently it might be an indication the drive is in need of some attention. Joe L."

They were indeed fairly recent, I guess. The power-on time when the SMART report was taken was 10613 hours, and the errors were recorded at 10525 hours, so roughly 90 power-on hours (10613 - 10525 = 88) before the report was taken. (Figures taken from the SMART report.)

If that helps,
Matt
January 15, 2009 (Author)

"Very early in my life as an unRAID user, I had issues using the round IDE cables. Everything seemed to be fine at first, but then I had problems where the drives would not mount consistently. DO NOT USE THEM! Using the infuriating flat IDE cables is what is required. Once I switched everything was stable."

I am using a flat cable :-)

"I would try and change the cables and see if that fixes it"

I will, after switching the connector :-)

"Is the remaining drive connected to the end connector? Connecting a drive to the middle with the end disconnected could also cause errors."

Pretty sure it's connected to the middle connector; I'll switch that before changing cables.

Thanks guys for your help, I'll try out a few things and update this thread.

PS: Is it possible that unRAID/Linux could have thrown these drives into PIO mode once an error was found? Everything kept getting really slow (multiple streams would stutter, parity checks were slow, etc.).

Cheers
January 15, 2009

"ps: Is it possible that unraid/linux could have thrown these drives into PIO mode once an error is found? Everything kept getting really slow (multiple streaming would stutter, parity checks slow etc)?"

Linux is very persistent in its attempts to communicate with the drives. It will try progressively slower transfer modes until it eventually settles on a very slow PIO mode. Yes, PIO mode would cause everything you described: stutter, slow parity checks, etc.

Interestingly, these exact same issues probably occur in the Windows PCs we have; we just are not informed that the drive is in PIO mode and only see the performance degrade. Eventually we buy a faster, newer machine to read our mail, etc.

Joe L.
January 16, 2009 (Author)

"Your syslog will clearly show drive errors, and speed/mode changes to PIO."

Interesting. I guess I should have looked for it, but it only occurred to me whilst posting my 'symptoms'.

Snippets from the old syslog when this issue occurred:

Jan 5 15:31:29 TANK kernel: hda: host max PIO5 wanted PIO255(auto-tune) selected PIO4
Jan 5 15:31:29 TANK kernel: hda: UDMA/100 mode selected
Jan 5 15:31:29 TANK kernel: Probing IDE interface ide1...
Jan 5 15:31:29 TANK kernel: ide0 at 0xaf00-0xaf07,0xae02 on irq 18
Jan 5 15:31:29 TANK kernel: ide1 at 0xad00-0xad07,0xac02 on irq 18
Jan 5 15:31:29 TANK kernel: i801_smbus 0000:00:1f.3: PCI INT B -> GSI 19 (level, low) -> IRQ 19
Jan 6 20:00:18 TANK kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Jan 6 20:00:18 TANK kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
Jan 6 20:00:18 TANK kernel: ide: failed opcode was: unknown
Jan 6 20:00:18 TANK kernel: hda: UDMA/44 mode selected
Jan 6 20:00:20 TANK kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Jan 6 20:00:20 TANK kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
Jan 6 20:00:20 TANK kernel: ide: failed opcode was: unknown
Jan 6 20:00:20 TANK kernel: hda: UDMA/33 mode selected
Jan 6 20:00:22 TANK kernel: ide: failed opcode was: unknown
Jan 6 20:00:22 TANK kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Jan 6 20:00:22 TANK kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
Jan 6 20:00:22 TANK kernel: ide: failed opcode was: unknown
Jan 6 20:00:22 TANK kernel: hda: UDMA/25 mode selected
Jan 6 20:00:29 TANK kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Jan 6 20:00:29 TANK kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
Jan 6 20:00:29 TANK kernel: ide: failed opcode was: unknown
Jan 6 20:00:29 TANK kernel: hda: UDMA/16 mode selected
Jan 6 20:00:31 TANK kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Jan 6 20:00:31 TANK kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
Jan 6 20:00:31 TANK kernel: ide: failed opcode was: unknown
Jan 6 20:00:31 TANK kernel: hda: no DMA mode selected
Jan 6 20:00:31 TANK kernel: ide0: reset: success

Looks like it was gradually slowing down to me! Does "no DMA mode" mean it's running in PIO mode? Would that mean all drives are running in PIO mode, or would one drive slow down the whole array?

FWIW, I removed BubbaRaid and upgraded unRaid to v4.2.2 and so far have not seen another error. I highly doubt BubbaRaid was causing any issues, but nonetheless I wanted to ensure I'm not running anything unnecessary whilst diagnosing this issue.
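The step-down from UDMA/100 to "no DMA" is easy to miss when it is buried between the error lines, so here is a rough sketch that pulls just the mode changes out of a syslog. It assumes the messages look exactly like the ones pasted in this thread (`hdX: UDMA/NN mode selected` and `hdX: no DMA mode selected`); the function name `mode_changes` and the regular expression are my own, and other kernel versions or libata-based drivers word these messages differently.

```python
import re

# Matches the old IDE driver's mode messages as they appear in this thread.
MODE_RE = re.compile(r"kernel: (hd[a-z]): (UDMA/\d+|no DMA) mode selected")


def mode_changes(lines):
    """Return (device, mode) tuples in the order they appear in the log."""
    changes = []
    for line in lines:
        match = MODE_RE.search(line)
        if match:
            changes.append((match.group(1), match.group(2)))
    return changes


# A few lines copied from the syslog excerpt in this thread.
SAMPLE = [
    "Jan 5 15:31:29 TANK kernel: hda: UDMA/100 mode selected",
    "Jan 6 20:00:18 TANK kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }",
    "Jan 6 20:00:18 TANK kernel: hda: UDMA/44 mode selected",
    "Jan 6 20:00:20 TANK kernel: hda: UDMA/33 mode selected",
    "Jan 6 20:00:31 TANK kernel: hda: no DMA mode selected",
]

if __name__ == "__main__":
    for device, mode in mode_changes(SAMPLE):
        print(device, mode)
```

To run it against a real log, pass an open file instead of `SAMPLE`, for example `mode_changes(open("/var/log/syslog"))`; the path is an assumption and depends on where your syslog is kept.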
January 16, 2009

"Does no DMA mode mean it's running in PIO mode? Would that mean all drives are running in PIO mode or would one drive slow down the whole array?"

Once DMA is disabled, you are using a PIO mode, and there should have been a message to that effect. It only affects this drive, not the others. However, operations that include access to this drive, such as parity checks, will be slowed down too, because they can only run at the speed of the slowest drive. PIO modes tend to result in speeds in the low single digits; around 3MB/s is typical.

By the way, I heartily recommend installing UnMENU and using the MyMain plugin. There is a very under-emphasized feature there, perhaps undiscovered by most, that allows you to examine just the syslog messages that pertain to a single drive. Just click the SY link at the far right to see them. Another great idea from Brian!
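If you do not have UnMENU installed, the same idea of looking only at one drive's messages can be roughed out by hand. This is only a sketch and not how MyMain does it: it assumes the old IDE driver's `kernel: hda:` prefix seen in this thread, and `lines_for_drive` is a name invented here. Note that it will miss lines that do not name the device, such as `ide: failed opcode was: unknown`.

```python
def lines_for_drive(lines, device):
    """Keep only the syslog lines whose kernel message is prefixed by device."""
    tag = f"kernel: {device}:"
    return [line.rstrip("\n") for line in lines if tag in line]


if __name__ == "__main__":
    sample = [
        "Jan 6 20:00:31 TANK kernel: hda: no DMA mode selected\n",
        "Jan 6 20:00:31 TANK kernel: ide0: reset: success\n",
        "Jan 6 20:00:31 TANK kernel: ide: failed opcode was: unknown\n",
    ]
    for line in lines_for_drive(sample, "hda"):
        print(line)
```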
January 19, 2009 (Author)

Just an update: I have replaced the cable with a brand new one (and, contrary to what I thought, the drive was connected to the end of the IDE cable). I have been running for 2 days so far without any errors, but I do see a lot of this in the syslog:

Jan 19 20:18:03 TANK kernel: ReiserFS: warning: is_leaf: item location seems wrong (second one): *3.6* [2139 2200 0x1ce001 IND], item_len 2616, item_locatioon 1224, free_space(entry_count) 0
Jan 19 20:18:03 TANK kernel: ReiserFS: md3: warning: vs-5150: search_by_key: invalid format found in block 106846153. Fsck?
Jan 19 20:18:03 TANK kernel: ReiserFS: md3: warning: vs-13070: reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [2395 2457 0x0 SD]
Jan 19 21:46:45 TANK kernel: ReiserFS: warning: is_leaf: item location seems wrong (second one): *3.6* [2139 2200 0x1ce001 IND], item_len 2616, item_locatio$
Jan 19 21:46:45 TANK kernel: ReiserFS: md3: warning: vs-5150: search_by_key: invalid format found in block 106846153. Fsck?
Jan 19 21:46:45 TANK kernel: ReiserFS: md3: warning: vs-13070: reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [2395 2457 0x0 SD]

Should I be worried about that?
January 19, 2009

"Just an update.. I have replaced the cable (and not as I thought, it was connected to the end of the ide cable) with a brand new one. I have been running 2 days so far without any errors but I do see a lot of this in the syslog: [...] Should I be worried about that?"

You probably need to run a reiserfsck on that drive. (The drive assigned to disk3 in your array.)

http://lime-technology.com/wiki/index.php?title=Check_Disk_Filesystems

Joe L.
January 20, 2009 (Author)

root@TANK:~# samba stop
root@TANK:~# umount /dev/md3
root@TANK:~# reiserfsck /dev/md3
reiserfsck 3.6.19 (2003 www.namesys.com)

*************************************************************
** If you are using the latest reiserfsprogs and it fails **
** please email bug reports to [email protected], **
** providing as much information as possible -- your **
** hardware, kernel, patches, settings, all reiserfsck **
** messages (including version), the reiserfsck logfile, **
** check the syslog file for any related information. **
** If you would like advice on using this program, support **
** is available for $25 at www.namesys.com/support.html. **
*************************************************************

Will read-only check consistency of the filesystem on /dev/md3
Will put log info to 'stdout'

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes
###########
reiserfsck --check started at Wed Jan 21 01:39:41 2009
###########
Replaying journal..
Reiserfs journal '/dev/md3' in blocks [18..8211]: 0 transactions replayed
Checking internal tree../ 2 (of 5)/140 (of 155)/ 45 (of 170)
block 106846153: The number of items (3) is incorrect, should be (0)
the problem in the internal node occured (106846153), whole subtree is skipped
finished
Comparing bitmaps..vpf-10640: The on-disk and the correct bitmaps differs.
Bad nodes were found, Semantic pass skipped
1 found corruptions can be fixed only when running with --rebuild-tree
###########
reiserfsck finished at Wed Jan 21 01:53:38 2009
###########
root@TANK:~#

I'll do the next part now.
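The line that decides what to do next is the summary near the end of the read-only check. As a rough illustration, the sketch below classifies captured reiserfsck output by looking for the phrases that appear in this thread ("--rebuild-tree" in the failing run, "No corruptions found" in a clean one). The `--fix-fixable` branch is from my recollection of reiserfsck's other summary message and is not shown in this thread, and `reiserfsck_verdict` is a hypothetical helper, not part of reiserfsprogs.

```python
def reiserfsck_verdict(output):
    """Classify the text of a read-only reiserfsck --check run."""
    if "No corruptions found" in output:
        return "clean"
    # Check the more drastic remedy first in case both phrases appear.
    if "--rebuild-tree" in output:
        return "needs --rebuild-tree"
    if "--fix-fixable" in output:  # assumed wording, not seen in this thread
        return "needs --fix-fixable"
    return "unknown"


if __name__ == "__main__":
    failing = (
        "Bad nodes were found, Semantic pass skipped\n"
        "1 found corruptions can be fixed only when running with --rebuild-tree\n"
    )
    print(reiserfsck_verdict(failing))
```

Whatever the verdict, reiserfsck's own banner for --rebuild-tree says to make a backup before using it, and that advice still applies.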
January 21, 2009 (Author)

All done:

root@TANK:~# reiserfsck --rebuild-tree /dev/md3
reiserfsck 3.6.19 (2003 www.namesys.com)

*************************************************************
** Do not run the program with --rebuild-tree unless **
** something is broken and MAKE A BACKUP before using it. **
** If you have bad sectors on a drive it is usually a bad **
** idea to continue using it. Then you probably should get **
** a working hard drive, copy the file system from the bad **
** drive to the good one -- dd_rescue is a good tool for **
** that -- and only then run this program. **
** If you are using the latest reiserfsprogs and it fails **
** please email bug reports to [email protected], **
** providing as much information as possible -- your **
** hardware, kernel, patches, settings, all reiserfsck **
** messages (including version), the reiserfsck logfile, **
** check the syslog file for any related information. **
** If you would like advice on using this program, support **
** is available for $25 at www.namesys.com/support.html. **
*************************************************************

Will rebuild the filesystem (/dev/md3) tree
Will put log info to 'stdout'

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes
Replaying journal..
Reiserfs journal '/dev/md3' in blocks [18..8211]: 0 transactions replayed
###########
reiserfsck --rebuild-tree started at Wed Jan 21 01:56:03 2009
###########

Pass 0:
####### Pass 0 #######
Loading on-disk bitmap .. ok, 107213798 blocks marked used
Skipping 11937 blocks (super block, journal, bitmaps) 107201861 blocks will be read
0%....20%....40%block 106846153: The number of items (3) is incorrect, should be (0) - corrected
block 106846153: The free space (0) is incorrect, should be (4072) - corrected
left 0, 16031 /secc
20919 directory entries were hashed with "r5" hash.
"r5" hash is selected
Flushing..finished
Read blocks (but not data blocks) 107201861
Leaves among those 107905
- leaves all contents of which could not be saved and deleted 1
Objectids found 20921

Pass 1 (will try to insert 107904 leaves):
####### Pass 1 #######
Looking for allocable blocks .. finished
0%....20%....40%....60%....80%....100% left 0, 88 /sec
Flushing..finished
107904 leaves read
107805 inserted
99 not inserted
####### Pass 2 #######

Pass 2:
0%....20%....40%....60%....80%....100% left 0, 66 /sec
Flushing..finished
Leaves inserted item by item 99
Pass 3 (semantic):
####### Pass 3 #########
...
ard Top 100 Songs - 1951 - 2000/1968/1968-061 Donovan - Hurdy Gurdy Man.mp3vpf-10680: The file [2395 2456] has the wrong block count in the StatData (9 0) - corrected to (3696)
/MEDIA/mp3/Billboard Top 100 Songs - 1951 - 2000/1968rebuild_semantic_pass: The entry [2395 2457] ("1968-062 Steppenwolf - Magic Carpet Ride.mp3") in direc ry [1162 2395] points to nowhere - is removed
/MEDIA/mp3/Billboard Top 100 Songs - 1951 - 2000/1968vpf-10650: The directory [1162 2395] has the wrong size in the StatData (6888) - corrected to (6824/19
Flushing..finished
Files found: 20007
Directories found: 913
Names pointing to nowhere
Pass 3a (looking for lost dir/fil
####### Pass 3a (lost+found pass)
Looking for lost directories:
Flushing..finished36, 67 /sec
Pass 4 - finished done 0, 0
Deleted unreachable items
Flushing..finished
Syncing..finished
###########
reiserfsck finished at Wed Jan 21
###########
root@TANK:~#
January 22, 2009

That looks like it may have created a mess! I really hope you made a backup of the drive. If so, your best choice is to delete *everything* on this drive, and copy it all back from the backup. I'd probably run one more simple reiserfsck afterward, just to be sure it is now clean.
January 22, 2009 (Author)

"That looks like it may have created a mess! I really hope you made a backup of the drive. If so, your best choice is to delete *everything* on this drive, and copy it all back from the backup. I'd probably run one more simple reiserfsck afterward, just to be sure it is now clean."

Unfortunately, no, I didn't make a backup of the drive (although it has prompted me to go out and buy a portable HDD to keep 'offsite'). I haven't seen a single error or syslog entry since that check, so it's looking good so far!

Thanks heaps for your help, guys. I'm in debt to these forums!

nb: I have 100Mbit colo with unlimited outgoing, so if anyone wants me to help share new releases, let me know!

nnb: I'll run another scan as suggested and post the results.

Cheers

EDIT: after running a check again:

root@TANK:~# samba stop
root@TANK:~# umount /dev/md3
root@TANK:~# reiserfsck /dev/md3
reiserfsck 3.6.19 (2003 www.namesys.com)

<snip> <snip>

Will read-only check consistency of the filesystem on /dev/md3
Will put log info to 'stdout'

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes
###########
reiserfsck --check started at Fri Jan 23 00:33:18 2009
###########
Replaying journal..
Reiserfs journal '/dev/md3' in blocks [18..8211]: 0 transactions replayed
Checking internal tree..finished
Comparing bitmaps..finished
Checking Semantic tree:
finished
No corruptions found
There are on the filesystem:
Leaves 107899
Internal nodes 706
Directories 913
Other files 20007
Data block pointers 107091505 (0 of them are zero)
Safe links 0
###########
reiserfsck finished at Fri Jan 23 01:00:55 2009
###########
root@TANK:~#

;D