UNRAID 4.4 - Failed Drive? Slow Parity Check? Sync Errors? - unRAID Server 4.3 [No new topics]

January 7, 2009

Hi All,

I couldn't see a 4.4 forum, so I have posted this in here.

Recently I upgraded from a perfectly working unRAID 4.3 + BubbaRaid to unRAID 4.4 + BubbaRaid (the version that works with 4.4). This seemed to be working without any issues until a few days ago, when I replaced my 120 GB IDE drive (cache) with an 80 GB SATA drive (cache) and also added a 200 GB SATA drive. I now use only one IDE drive.

I noticed "1" error on the stats page yesterday, so I ran a parity check... the parity check was going at 1,500 kb/s (compared to the normal 55,000 kb/s), so I rebooted. Before I rebooted, I checked 'top' over telnet and noted that a process called 'kblockd' was using 100% of the CPU. Google says this is a kernel memory thing; however, this server has 1 GB of RAM and is only running unRAID + BubbaRaid, so it shouldn't be running out of RAM.
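To put those two speeds in perspective, here is a rough back-of-the-envelope sketch. It assumes "kb/s" means kilobytes per second with 1 KB = 1000 bytes, and it borrows the 500,107,862,016-byte capacity from the smartctl output posted later in the thread; whether that drive is the largest one in the array is an assumption, and the function name is only illustrative.

```python
def hours_to_read(size_bytes, rate_kb_per_s):
    """Hours needed to read size_bytes at a steady rate in KB/s (1 KB = 1000 bytes, assumed)."""
    seconds = size_bytes / (rate_kb_per_s * 1000)
    return seconds / 3600


# Capacity of the 500 GB IDE drive as reported by smartctl later in the thread.
# Using it as the length of the parity check is an assumption.
DISK_BYTES = 500_107_862_016

normal = hours_to_read(DISK_BYTES, 55_000)
slow = hours_to_read(DISK_BYTES, 1_500)
print(f"normal: {normal:.1f} h, degraded: {slow:.1f} h, slowdown: {slow / normal:.0f}x")
```

Under those assumptions a check that normally takes a couple of hours would take several days at the degraded rate, which is why the slowdown is so noticeable.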

After the reboot, I ran a parity check and it said it had found 6 errors, but the old IDE drive (although it's only ~1 year old) showed 1 error and no other drive showed an error. So... I ran the parity check again; this time it said there were '5' sync errors, and 2 errors were on the old IDE drive (with 0 errors on any other drive).

Again, thinking this was weird, I ran the parity check... and yep, 100% CPU usage and a really slow check (1,500 kb/s). This time the CPU was being used by unraidd. I'll do another reboot tonight and re-run parity, but does it seem like the old IDE drive might be on its way out?

oh snap.. I just found a bug... if you take the array offline ('stop') whilst doing a parity check, it sets the other drives as unformatted.. quite dangerous really (can someone else try this?).

Syslog is attached, however, interesting bits are below:

Jan 5 15:31:31 TANK kernel: ReiserFS: md2: journal params: device md2, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30

Jan 6 07:06:06 TANK kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }

Jan 6 07:06:06 TANK kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }

Jan 6 07:06:06 TANK kernel: ide: failed opcode was: unknown

Jan 6 07:10:48 TANK kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }

Jan 6 07:10:48 TANK kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }

Jan 6 07:10:48 TANK kernel: ide: failed opcode was: unknown

This drive has been working perfectly before changing IDE drives... would this be just a coincidence or could it be related to using 4.4?

Thanks

January 7, 2009

Author

The forum won't let me add attachments (the syslog is 12kb zipped)...

January 15, 2009

Author

Bump :-(

This is still an issue.

<snip>

Jan 10 21:42:34 TANK kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }

Jan 10 21:42:34 TANK kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }

Jan 10 21:42:34 TANK kernel: ide: failed opcode was: unknown

Jan 10 21:42:34 TANK kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }

Jan 10 21:42:34 TANK kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }

Jan 10 21:42:34 TANK kernel: ide: failed opcode was: unknown

Jan 10 21:42:35 TANK kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }

Jan 10 21:42:35 TANK kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }

Jan 10 21:42:35 TANK kernel: ide: failed opcode was: unknown

Jan 10 21:42:35 TANK kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }

Jan 10 21:42:35 TANK kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }

Jan 10 21:42:35 TANK kernel: ide: failed opcode was: unknown

Jan 10 21:42:35 TANK kernel: hda: UDMA/66 mode selected

Jan 10 21:42:35 TANK kernel: ide0: reset: success

Jan 10 21:42:35 TANK kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }

Jan 10 21:42:35 TANK kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }

Jan 10 21:42:35 TANK kernel: ide: failed opcode was: unknown

Jan 10 21:42:36 TANK kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }

Jan 10 21:42:36 TANK kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }

Jan 10 21:42:36 TANK kernel: ide: failed opcode was: unknown

Jan 10 21:42:36 TANK kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }

Jan 10 21:42:36 TANK kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }

Jan 10 21:42:36 TANK kernel: ide: failed opcode was: unknown

<snip>

Main status page is only showing 1 error for this drive though... Are there any tests I can perform to see if this drive is faulty?
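For anyone reading these log lines later: the status= and error= bytes are bit fields, and the names in braces are just the driver's labels for the bits that are set. Below is a minimal sketch that decodes the two values seen in this thread and counts BadCRC events per device. It only maps the bits that actually appear in the excerpts above (anything else is printed as a raw bit), and the function names and the regular expression are my own, not part of any unRAID tool.

```python
import re
from collections import Counter

# Only the bits that appear in this thread's log excerpts are named here.
STATUS_BITS = {0x40: "DriveReady", 0x10: "SeekComplete", 0x01: "Error"}
ERROR_BITS = {0x80: "BadCRC", 0x04: "DriveStatusError"}


def decode(value, names):
    """Return the names of the bits set in value, highest bit first."""
    out = []
    for bit in (0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02, 0x01):
        if value & bit:
            out.append(names.get(bit, "bit 0x%02x" % bit))
    return out


# Matches lines such as "... kernel: hda: dma_intr: error=0x84 { ... }"
ERROR_LINE = re.compile(r"kernel: (\w+): dma_intr: error=0x([0-9a-fA-F]{2})")


def count_crc_errors(lines):
    """Count log lines per device whose error byte has the CRC bit (0x80) set."""
    counts = Counter()
    for line in lines:
        m = ERROR_LINE.search(line)
        if m and int(m.group(2), 16) & 0x80:
            counts[m.group(1)] += 1
    return counts


sample = [
    "Jan 6 07:06:06 TANK kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }",
    "Jan 6 07:06:06 TANK kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }",
    "Jan 6 07:06:06 TANK kernel: ide: failed opcode was: unknown",
]
print(decode(0x51, STATUS_BITS))
print(decode(0x84, ERROR_BITS))
print(dict(count_crc_errors(sample)))
```

To run it against a whole syslog, pass the open file object (or a list of its lines) to count_crc_errors instead of the three sample lines.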

January 15, 2009

The errors you are showing are frequently associated with bad hardware. It could be the drive, but it could just as easily be the drive cable or the disk controller.

You said you moved drives around. Did you use an 80-conductor flat IDE cable, or did you use an older 40-conductor one you had lying around? (Older 40-conductor cables cannot handle the higher transfer speeds of today's disk drives.)

Did you use a "round" cable, or a cable longer than 24 inches? There is a good possibility that neither will meet the proper specs for reliable high-speed operation.

The only other test you can run is a "SMART" test using "smartctl". Details are in the "Troubleshooting" section of the wiki, but for drive hda the command would be:

smartctl -a -d ata /dev/hda

If smartctl complains about a missing library, you'll need to download and install it. Details are in this post.

Joe L.

January 15, 2009

Author

I didn't change the IDE cable at all except for removing the second IDE device.

Smart did complain about a missing binary, which I have now fixed.

If it helps, it seems like all drives go into PIO mode once I get a single error (I can run a parity check at full speed on any given day until I see one error).

The cable is an 80-conductor flat cable.

smart status:

root@TANK:/boot/packages# smartctl -a -d ata /dev/hda
smartctl version 5.38 [i486-slackware-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.10 family
Device Model:     ST3500630A
Serial Number:    9QG1TV5X
Firmware Version: 3.AAE
User Capacity:    500,107,862,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Fri Jan 16 01:09:11 2009 GMT-10
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 430) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 163) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   105   082   006    Pre-fail  Always       -       191221885
  3 Spin_Up_Time            0x0003   093   093   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   098   098   020    Old_age   Always       -       2132
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       6
  7 Seek_Error_Rate         0x000f   084   060   030    Pre-fail  Always       -       273225045
  9 Power_On_Hours          0x0032   088   088   000    Old_age   Always       -       10613
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       186
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   070   048   045    Old_age   Always       -       30 (Lifetime Min/Max 24/32)
194 Temperature_Celsius     0x0022   030   052   000    Old_age   Always       -       30 (0 17 0 0)
195 Hardware_ECC_Recovered  0x001a   061   053   000    Old_age   Always       -       95733308
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       102
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 101 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 101 occurred at disk power-on lifetime: 10525 hours (438 days + 13 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 00 00 00 e0  Error: ICRC, ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 c7 87 5d e0 00      08:03:17.347  READ DMA EXT
  25 00 08 c7 87 5d e0 00      08:03:16.905  READ DMA EXT
  10 00 3f 00 00 00 e0 00      08:03:16.905  RECALIBRATE [OBS-4]
  25 00 08 c7 87 5d e0 00      08:03:16.463  READ DMA EXT
  25 00 08 c7 87 5d e0 00      08:03:16.023  READ DMA EXT

Error 100 occurred at disk power-on lifetime: 10525 hours (438 days + 13 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 00 00 00 e0  Error: ICRC, ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 c7 87 5d e0 00      08:03:14.192  READ DMA EXT
  10 00 3f 00 00 00 e0 00      08:03:16.905  RECALIBRATE [OBS-4]
  25 00 08 c7 87 5d e0 00      08:03:16.905  READ DMA EXT
  25 00 08 c7 87 5d e0 00      08:03:16.463  READ DMA EXT
  c6 00 10 00 00 00 e0 00      08:03:16.023  SET MULTIPLE MODE

Error 99 occurred at disk power-on lifetime: 10525 hours (438 days + 13 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 00 00 00 e0  Error: ICRC, ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 c7 87 5d e0 00      08:03:14.192  READ DMA EXT
  25 00 08 c7 87 5d e0 00      08:03:14.172  READ DMA EXT
  c6 00 10 00 00 00 e0 00      08:03:14.162  SET MULTIPLE MODE
  00 00 40 00 00 00 00 06      08:03:16.463  NOP [Abort queued commands]
  ef 03 40 00 00 00 e0 02      08:03:16.023  SET FEATURES [set transfer mode]

Error 98 occurred at disk power-on lifetime: 10525 hours (438 days + 13 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 00 00 00 e0  Error: ICRC, ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 c7 87 5d e0 00      08:03:14.192  READ DMA EXT
  c6 00 10 00 00 00 e0 00      08:03:14.172  SET MULTIPLE MODE
  00 00 40 00 00 00 00 06      08:03:14.162  NOP [Abort queued commands]
  ef 03 40 00 00 00 e0 02      08:03:14.152  SET FEATURES [set transfer mode]
  25 00 08 c7 87 5d e0 00      08:03:16.023  READ DMA EXT

Error 97 occurred at disk power-on lifetime: 10525 hours (438 days + 13 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 00 00 00 e0  Error: ICRC, ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 c7 87 5d e0 00      08:03:14.192  READ DMA EXT
  25 00 08 c7 87 5d e0 00      08:03:14.172  READ DMA EXT
  10 00 3f 00 00 00 e0 00      08:03:14.162  RECALIBRATE [OBS-4]
  25 00 08 c7 87 5d e0 00      08:03:14.152  READ DMA EXT
  25 00 08 c7 87 5d e0 00      08:03:14.141  READ DMA EXT

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     10432         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

January 15, 2009

One thing that seems high is the UDMA CRC error count, which points to a bad cable/hardware... see the above post by Joe L.

I would try changing the cables and see if that fixes it.

Cheers,

Matt
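If you want to pull the interesting counters out of the smartctl attribute table without reading it by eye, something like the following sketch would do it. It assumes the table layout shown in this thread (ten whitespace-separated columns with a hex FLAG in the third), and the watch list is my own choice of attributes, not an official one; raw values on other drives may not be plain integers.

```python
# Attributes picked for illustration; this list is an assumption, not a standard.
WATCH = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
)


def parse_attributes(text):
    """Map attribute name -> raw value (as int) from 'smartctl -a' style output."""
    attrs = {}
    for line in text.splitlines():
        parts = line.split(None, 9)
        # Attribute rows start with a numeric ID and carry a hex FLAG column.
        if len(parts) == 10 and parts[0].isdigit() and parts[2].startswith("0x"):
            # Keep only the leading number of RAW_VALUE, e.g. "30 (Lifetime ...)".
            attrs[parts[1]] = int(parts[9].split()[0])
    return attrs


def nonzero_watched(attrs):
    """Return the watched attributes whose raw counter is above zero."""
    return {name: attrs[name] for name in WATCH if attrs.get(name, 0) > 0}


# Rows copied from the smartctl output posted in this thread.
sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       6
190 Airflow_Temperature_Cel 0x0022   070   048   045    Old_age   Always       -       30 (Lifetime Min/Max 24/32)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       102
"""
print(nonzero_watched(parse_attributes(sample)))
```

On the output posted above this flags UDMA_CRC_Error_Count (102) and Reallocated_Sector_Ct (6). The CRC counter is the one that lines up with the BadCRC messages in the syslog.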

January 15, 2009

I didn't change the IDE cable at all except for removing the second IDE device.

Smart did complain about a missing binary, which I have now fixed.

If it helps, it seems like all drives would go into PIO mode once I get a single error (I could do a parity check at full speed on any given day until I see one error).

The cable is an 80 pin flat.

Is the remaining drive connected to the end connector? Connecting a drive to the middle with the end disconnected could also cause errors.

smart status:

root@TANK:/boot/packages# smartctl -a -d ata /dev/hda
smartctl version 5.38 [i486-slackware-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.10 family
Device Model:     ST3500630A
Serial Number:    9QG1TV5X
Firmware Version: 3.AAE
User Capacity:    500,107,862,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Fri Jan 16 01:09:11 2009 GMT-10
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 430) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 163) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   105   082   006    Pre-fail  Always       -       191221885
  3 Spin_Up_Time            0x0003   093   093   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   098   098   020    Old_age   Always       -       2132
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       6
  7 Seek_Error_Rate         0x000f   084   060   030    Pre-fail  Always       -       273225045
  9 Power_On_Hours          0x0032   088   088   000    Old_age   Always       -       10613
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       186
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   070   048   045    Old_age   Always       -       30 (Lifetime Min/Max 24/32)
194 Temperature_Celsius     0x0022   030   052   000    Old_age   Always       -       30 (0 17 0 0)
195 Hardware_ECC_Recovered  0x001a   061   053   000    Old_age   Always       -       95733308
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       102
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 101 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 101 occurred at disk power-on lifetime: 10525 hours (438 days + 13 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 00 00 00 e0  Error: ICRC, ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 c7 87 5d e0 00      08:03:17.347  READ DMA EXT
  25 00 08 c7 87 5d e0 00      08:03:16.905  READ DMA EXT
  10 00 3f 00 00 00 e0 00      08:03:16.905  RECALIBRATE [OBS-4]
  25 00 08 c7 87 5d e0 00      08:03:16.463  READ DMA EXT
  25 00 08 c7 87 5d e0 00      08:03:16.023  READ DMA EXT

Error 100 occurred at disk power-on lifetime: 10525 hours (438 days + 13 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 00 00 00 e0  Error: ICRC, ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 c7 87 5d e0 00      08:03:14.192  READ DMA EXT
  10 00 3f 00 00 00 e0 00      08:03:16.905  RECALIBRATE [OBS-4]
  25 00 08 c7 87 5d e0 00      08:03:16.905  READ DMA EXT
  25 00 08 c7 87 5d e0 00      08:03:16.463  READ DMA EXT
  c6 00 10 00 00 00 e0 00      08:03:16.023  SET MULTIPLE MODE

Error 99 occurred at disk power-on lifetime: 10525 hours (438 days + 13 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 00 00 00 e0  Error: ICRC, ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 c7 87 5d e0 00      08:03:14.192  READ DMA EXT
  25 00 08 c7 87 5d e0 00      08:03:14.172  READ DMA EXT
  c6 00 10 00 00 00 e0 00      08:03:14.162  SET MULTIPLE MODE
  00 00 40 00 00 00 00 06      08:03:16.463  NOP [Abort queued commands]
  ef 03 40 00 00 00 e0 02      08:03:16.023  SET FEATURES [set transfer mode]

Error 98 occurred at disk power-on lifetime: 10525 hours (438 days + 13 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 00 00 00 e0  Error: ICRC, ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 c7 87 5d e0 00      08:03:14.192  READ DMA EXT
  c6 00 10 00 00 00 e0 00      08:03:14.172  SET MULTIPLE MODE
  00 00 40 00 00 00 00 06      08:03:14.162  NOP [Abort queued commands]
  ef 03 40 00 00 00 e0 02      08:03:14.152  SET FEATURES [set transfer mode]
  25 00 08 c7 87 5d e0 00      08:03:16.023  READ DMA EXT

Error 97 occurred at disk power-on lifetime: 10525 hours (438 days + 13 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 00 00 00 e0  Error: ICRC, ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 c7 87 5d e0 00      08:03:14.192  READ DMA EXT
  25 00 08 c7 87 5d e0 00      08:03:14.172  READ DMA EXT
  10 00 3f 00 00 00 e0 00      08:03:14.162  RECALIBRATE [OBS-4]
  25 00 08 c7 87 5d e0 00      08:03:14.152  READ DMA EXT
  25 00 08 c7 87 5d e0 00      08:03:14.141  READ DMA EXT

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     10432         -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Your drive shows a series of 101 ATA errors, the last 5 being logged, and 6 re-allocated sectors. If those errors occurred recently it might be an indication the drive is in need of some attention.

Joe L.
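
For anyone reading the SMART log by hand, the ER and ST register values can be decoded bit by bit. This is a minimal sketch, assuming the usual ATA bit assignments for the Error and Status registers (mnemonics vary a little between ATA revisions, and the table and function names here are illustrative, not part of smartctl):

```python
# Bit number -> mnemonic for the ATA Error and Status registers.
# Assumption: bit 7 of the Error register is read as ICRC (interface CRC),
# which is how it is reported for Ultra DMA transfers.
ERROR_BITS = {7: "ICRC", 6: "UNC", 5: "MC", 4: "IDNF",
              3: "MCR", 2: "ABRT", 1: "NM", 0: "AMNF"}
STATUS_BITS = {7: "BSY", 6: "DRDY", 5: "DF", 4: "DSC",
               3: "DRQ", 2: "CORR", 1: "IDX", 0: "ERR"}


def decode(value, names):
    """Return the mnemonics of all bits set in value, highest bit first."""
    return [names[bit] for bit in range(7, -1, -1) if value & (1 << bit)]


print(decode(0x84, ERROR_BITS))   # ['ICRC', 'ABRT']
print(decode(0x51, STATUS_BITS))  # ['DRDY', 'DSC', 'ERR']
```

The ER value 84 decodes to ICRC plus ABRT, matching the "Error: ICRC, ABRT" text smartctl prints. ICRC flags a CRC mismatch on the interface between drive and controller, so it is worth checking the cabling as well as the drive.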

January 15, 2009

Very early in my life as an unRAID user, I had issues using the round IDE cables. Everything seemed to be fine at first, but then I had problems where the drives would not mount consistently. DO NOT USE THEM! Using the infuriating flat IDE cables is what is required. Once I switched everything was stable.

January 15, 2009

Your drive shows a series of 101 ATA errors, the last 5 being logged, and 6 re-allocated sectors. If those errors occurred recently it might be an indication the drive is in need of some attention.

Joe L.

They were indeed fairly recent, I guess. The power-on time when the SMART report was taken was 10613 hours and the errors were recorded at 10525 hours, so about 88 power-on hours before the report was taken (figures from the SMART report).

If that helps,

Matt
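
The gap between the logged errors and the report is simple arithmetic on the two power-on-hour figures quoted in this post. A small sketch (variable names are illustrative):

```python
report_hours = 10613  # power-on hours when the SMART report was taken
error_hours = 10525   # power-on hours recorded with errors 97 to 101

gap_hours = report_hours - error_hours
gap_days, rem_hours = divmod(gap_hours, 24)
print(f"{gap_hours} power-on hours = {gap_days} days + {rem_hours} hours")

# Cross-check of the figure smartctl printed: 438 days + 13 hours.
print(divmod(error_hours, 24))  # (438, 13)
```

So the errors were logged roughly 88 power-on hours (a little under four days of uptime) before the report.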

January 15, 2009

Author

Very early in my life as an unRAID user, I had issues using the round IDE cables. Everything seemed to be fine at first, but then I had problems where the drives would not mount consistently. DO NOT USE THEM! Using the infuriating flat IDE cables is what is required. Once I switched everything was stable.

I am using a flat cable :-)

I would try and change the cables and see if that fixes it

I will after switching the connector :-)

Is the remaining drive connected to the end connector? Connecting a drive to the middle with the end disconnected could also cause errors.

Pretty sure it's connected to the middle connector, I'll switch that before changing cables.

Thanks guys for your help, I'll try out a few things and update this thread.

PS: Is it possible that unRAID/Linux threw these drives into PIO mode once an error was found? Everything kept getting really slow (multiple streams would stutter, parity checks were slow, etc.).

Cheers

January 15, 2009

PS: Is it possible that unRAID/Linux threw these drives into PIO mode once an error was found? Everything kept getting really slow (multiple streams would stutter, parity checks were slow, etc.).

Linux is very persistent in its attempt to communicate with the drives. It will try progressively slower methods to communicate until it eventually settles on a very slow PIO mode.

Yes, PIO mode would cause everything you described... stutter, slow parity checks, etc.

Interestingly, these exact same issues probably occur on the Windows PCs we have too. We are just not told the drive is in PIO mode and only see the performance degrade. Eventually we buy a faster, newer machine to read our mail, etc.

Joe L.

January 15, 2009

Your syslog will clearly show drive errors, and speed/mode changes to PIO.

January 16, 2009

Author

Your syslog will clearly show drive errors, and speed/mode changes to PIO.

Interesting.. I guess I should have looked for it but it only occurred to me whilst posting my 'symptoms'.

Snippets from the old syslog when this issue occurred:

Jan 5 15:31:29 TANK kernel: hda: host max PIO5 wanted PIO255(auto-tune) selected PIO4

Jan 5 15:31:29 TANK kernel: hda: UDMA/100 mode selected

Jan 5 15:31:29 TANK kernel: Probing IDE interface ide1...

Jan 5 15:31:29 TANK kernel: ide0 at 0xaf00-0xaf07,0xae02 on irq 18

Jan 5 15:31:29 TANK kernel: ide1 at 0xad00-0xad07,0xac02 on irq 18

Jan 5 15:31:29 TANK kernel: i801_smbus 0000:00:1f.3: PCI INT B -> GSI 19 (level, low) -> IRQ 19

Jan 6 20:00:18 TANK kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }

Jan 6 20:00:18 TANK kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }

Jan 6 20:00:18 TANK kernel: ide: failed opcode was: unknown

Jan 6 20:00:18 TANK kernel: hda: UDMA/44 mode selected

Jan 6 20:00:20 TANK kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }

Jan 6 20:00:20 TANK kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }

Jan 6 20:00:20 TANK kernel: ide: failed opcode was: unknown

Jan 6 20:00:20 TANK kernel: hda: UDMA/33 mode selected

Jan 6 20:00:22 TANK kernel: ide: failed opcode was: unknown

Jan 6 20:00:22 TANK kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }

Jan 6 20:00:22 TANK kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }

Jan 6 20:00:22 TANK kernel: ide: failed opcode was: unknown

Jan 6 20:00:22 TANK kernel: hda: UDMA/25 mode selected

Jan 6 20:00:29 TANK kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }

Jan 6 20:00:29 TANK kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }

Jan 6 20:00:29 TANK kernel: ide: failed opcode was: unknown

Jan 6 20:00:29 TANK kernel: hda: UDMA/16 mode selected

Jan 6 20:00:31 TANK kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }

Jan 6 20:00:31 TANK kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }

Jan 6 20:00:31 TANK kernel: ide: failed opcode was: unknown

Jan 6 20:00:31 TANK kernel: hda: no DMA mode selected

Jan 6 20:00:31 TANK kernel: ide0: reset: success

Looks like it was gradually slowing down to me!
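
The step-down is easier to see once the "mode selected" lines are pulled out on their own. A minimal sketch, assuming the syslog line format shown in the excerpt above (the excerpt is abridged here, and the regular expression and function name are illustrative):

```python
import re

# Abridged copy of the syslog lines quoted above.
SYSLOG_EXCERPT = """\
Jan 5 15:31:29 TANK kernel: hda: UDMA/100 mode selected
Jan 6 20:00:18 TANK kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
Jan 6 20:00:18 TANK kernel: hda: UDMA/44 mode selected
Jan 6 20:00:20 TANK kernel: hda: UDMA/33 mode selected
Jan 6 20:00:22 TANK kernel: hda: UDMA/25 mode selected
Jan 6 20:00:29 TANK kernel: hda: UDMA/16 mode selected
Jan 6 20:00:31 TANK kernel: hda: no DMA mode selected
"""

# Captures the device name and whatever precedes " mode selected".
MODE_RE = re.compile(r"kernel: (\w+): (.+?) mode selected")


def mode_changes(text, device):
    """Return the transfer modes selected for one device, in log order."""
    modes = []
    for line in text.splitlines():
        match = MODE_RE.search(line)
        if match and match.group(1) == device:
            modes.append(match.group(2))
    return modes


print(" -> ".join(mode_changes(SYSLOG_EXCERPT, "hda")))
# UDMA/100 -> UDMA/44 -> UDMA/33 -> UDMA/25 -> UDMA/16 -> no DMA
```

Each BadCRC error is followed by a slower UDMA mode until DMA is dropped entirely, all within about 13 seconds.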

Does no DMA mode mean it's running in PIO mode? Would that mean all drives are running in PIO mode or would one drive slow down the whole array?

FWIW, I removed BubbaRaid and upgraded unRAID to v4.2.2 and so far have not seen another error. I highly doubt BubbaRaid was causing any issues but nonetheless, I wanted to ensure I'm not running anything unnecessary whilst diagnosing this issue.

January 16, 2009

Does no DMA mode mean it's running in PIO mode? Would that mean all drives are running in PIO mode or would one drive slow down the whole array?

Once DMA is disabled you are using a PIO mode, and there should have been a message to that effect. It only affects this drive, not the others. However, any operation that includes access to this drive, such as a parity check, can be slowed down to the speed of the slowest drive. PIO modes tend to result in speeds in the low single digits; around 3 MB/s is typical.
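
To put those speeds in perspective, here is a rough estimate of how long one full pass over a drive takes at each rate. This is only a sketch: the 200 GB size is an assumed example, and the 55 MB/s and 1.5 MB/s figures are the parity-check speeds reported at the start of the thread, read as roughly 55,000 and 1,500 KB/s.

```python
def hours_to_read(size_gb, mb_per_s):
    """Hours needed to read size_gb gigabytes at mb_per_s megabytes per second."""
    seconds = size_gb * 1000 / mb_per_s
    return seconds / 3600


DRIVE_GB = 200  # assumed drive size, for illustration only

for label, speed in [("normal DMA", 55), ("typical PIO", 3), ("observed slow check", 1.5)]:
    print(f"{label:>20}: {hours_to_read(DRIVE_GB, speed):5.1f} h")
```

On those assumptions a pass that normally takes about an hour stretches to most of a day or longer.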

By the way, I heartily recommend installing UnMENU and using the MyMain plugin. There is a very under-emphasized feature there, perhaps undiscovered by most, that allows you to examine just the syslog messages that pertain to a single drive. Just click the SY link at the far right to see them. Another great idea from Brian!

January 19, 2009

Author

Just an update..

I have replaced the cable with a brand new one (and, contrary to what I thought, the drive was connected to the end connector of the IDE cable).

I have been running 2 days so far without any errors but I do see a lot of this in the syslog:

Jan 19 20:18:03 TANK kernel: ReiserFS: warning: is_leaf: item location seems wrong (second one): *3.6* [2139 2200 0x1ce001 IND], item_len 2616, item_location 1224, free_space(entry_count) 0

Jan 19 20:18:03 TANK kernel: ReiserFS: md3: warning: vs-5150: search_by_key: invalid format found in block 106846153. Fsck?

Jan 19 20:18:03 TANK kernel: ReiserFS: md3: warning: vs-13070: reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [2395 2457 0x0 SD]

Jan 19 21:46:45 TANK kernel: ReiserFS: warning: is_leaf: item location seems wrong (second one): *3.6* [2139 2200 0x1ce001 IND], item_len 2616, item_locatio$

Jan 19 21:46:45 TANK kernel: ReiserFS: md3: warning: vs-5150: search_by_key: invalid format found in block 106846153. Fsck?

Jan 19 21:46:45 TANK kernel: ReiserFS: md3: warning: vs-13070: reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [2395 2457 0x0 SD]

Should I be worried about that?

January 19, 2009

You probably need to run a reiserfsck on that drive. (The drive assigned to disk3 in your array.)

http://lime-technology.com/wiki/index.php?title=Check_Disk_Filesystems

Joe L.
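
The device to check can be read straight from the warnings: each one names the md device, and on unRAID mdN corresponds to diskN as noted above. A minimal sketch that pulls the device and block number out of the quoted lines (regular expressions and function name are illustrative):

```python
import re
from collections import Counter

# Warning lines copied from the syslog excerpt in this thread.
WARNINGS = [
    "Jan 19 20:18:03 TANK kernel: ReiserFS: md3: warning: vs-5150: search_by_key: invalid format found in block 106846153. Fsck?",
    "Jan 19 20:18:03 TANK kernel: ReiserFS: md3: warning: vs-13070: reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [2395 2457 0x0 SD]",
    "Jan 19 21:46:45 TANK kernel: ReiserFS: md3: warning: vs-5150: search_by_key: invalid format found in block 106846153. Fsck?",
]

DEVICE_RE = re.compile(r"ReiserFS: (md\d+): warning")
BLOCK_RE = re.compile(r"found in block (\d+)")


def summarize(lines):
    """Count warnings per md device and collect the block numbers mentioned."""
    devices = Counter()
    blocks = set()
    for line in lines:
        device = DEVICE_RE.search(line)
        if device:
            devices[device.group(1)] += 1
        block = BLOCK_RE.search(line)
        if block:
            blocks.add(int(block.group(1)))
    return devices, blocks


devices, blocks = summarize(WARNINGS)
print(devices)  # Counter({'md3': 3})
print(blocks)   # {106846153}
```

Every warning points at md3 and at the same block, which suggests one damaged spot on one filesystem rather than a problem spread across the array.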

January 20, 2009

Author

root@TANK:~# samba stop

root@TANK:~# umount /dev/md3

root@TANK:~# reiserfsck /dev/md3

reiserfsck 3.6.19 (2003 www.namesys.com)

*************************************************************

** If you are using the latest reiserfsprogs and it fails **

** please email bug reports to [email protected], **

** providing as much information as possible -- your **

** hardware, kernel, patches, settings, all reiserfsck **

** messages (including version), the reiserfsck logfile, **

** check the syslog file for any related information. **

** If you would like advice on using this program, support **

** is available for $25 at www.namesys.com/support.html. **

*************************************************************

Will read-only check consistency of the filesystem on /dev/md3

Will put log info to 'stdout'

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes

###########

reiserfsck --check started at Wed Jan 21 01:39:41 2009

###########

Replaying journal..

Reiserfs journal '/dev/md3' in blocks [18..8211]: 0 transactions replayed

Checking internal tree../ 2 (of 5)/140 (of 155)/ 45 (of 170)block 106846153: The number of items (3) is incorrect, should be (0)

the problem in the internal node occured (106846153), whole subtree is skipped finished

Comparing bitmaps..vpf-10640: The on-disk and the correct bitmaps differs.

Bad nodes were found, Semantic pass skipped

1 found corruptions can be fixed only when running with --rebuild-tree

###########

reiserfsck finished at Wed Jan 21 01:53:38 2009

###########

root@TANK:~#

I'll do the next part now
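
The verdict of a read-only check is in its last few lines. A minimal sketch that classifies a saved report using only the two summary messages that appear in this thread; any other output is deliberately left for a human to read, and the function name is illustrative:

```python
def next_step(report):
    """Classify reiserfsck --check output by its summary message."""
    if "No corruptions found" in report:
        return "clean"
    if "can be fixed only when running with --rebuild-tree" in report:
        return "needs --rebuild-tree (back up first)"
    return "unrecognised, read the report manually"


CHECK_OUTPUT = """\
Bad nodes were found, Semantic pass skipped
1 found corruptions can be fixed only when running with --rebuild-tree
"""

print(next_step(CHECK_OUTPUT))  # needs --rebuild-tree (back up first)
```

The banner reiserfsck prints for --rebuild-tree says to make a backup first, so the second branch is a prompt to stop and copy the data, not a cue to run the repair immediately.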

January 21, 2009

Author

All done:

root@TANK:~# reiserfsck --rebuild-tree /dev/md3

reiserfsck 3.6.19 (2003 www.namesys.com)

*************************************************************

** Do not run the program with --rebuild-tree unless **

** something is broken and MAKE A BACKUP before using it. **

** If you have bad sectors on a drive it is usually a bad **

** idea to continue using it. Then you probably should get **

** a working hard drive, copy the file system from the bad **

** drive to the good one -- dd_rescue is a good tool for **

** that -- and only then run this program. **

** If you are using the latest reiserfsprogs and it fails **

** please email bug reports to [email protected], **

** providing as much information as possible -- your **

** hardware, kernel, patches, settings, all reiserfsck **

** messages (including version), the reiserfsck logfile, **

** check the syslog file for any related information. **

** If you would like advice on using this program, support **

** is available for $25 at www.namesys.com/support.html. **

*************************************************************

Will rebuild the filesystem (/dev/md3) tree

Will put log info to 'stdout'

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes

Replaying journal..

Reiserfs journal '/dev/md3' in blocks [18..8211]: 0 transactions replayed

###########

reiserfsck --rebuild-tree started at Wed Jan 21 01:56:03 2009

###########

Pass 0:

####### Pass 0 #######

Loading on-disk bitmap .. ok, 107213798 blocks marked used

Skipping 11937 blocks (super block, journal, bitmaps) 107201861 blocks will be read

0%....20%....40%block 106846153: The number of items (3) is incorrect, should be (0) - corrected

block 106846153: The free space (0) is incorrect, should be (4072) - corrected

left 0, 16031 /secc

20919 directory entries were hashed with "r5" hash.

"r5" hash is selected

Flushing..finished

Read blocks (but not data blocks) 107201861

Leaves among those 107905

- leaves all contents of which could not be saved and deleted 1

Objectids found 20921

Pass 1 (will try to insert 107904 leaves):

####### Pass 1 #######

Looking for allocable blocks .. finished

0%....20%....40%....60%....80%....100% left 0, 88 /sec

Flushing..finished

107904 leaves read

107805 inserted

99 not inserted

####### Pass 2 #######

Pass 2:

0%....20%....40%....60%....80%....100% left 0, 66 /sec

Flushing..finished

Leaves inserted item by item 99

Pass 3 (semantic):

####### Pass 3 #########

... ard Top 100 Songs - 1951 - 2000/1968/1968-061 Donovan - Hurdy Gurdy Man.mp3vpf-10680: The file [2395 2456] has the wrong block count in the StatData (9 0) - corrected to (3696)

/MEDIA/mp3/Billboard Top 100 Songs - 1951 - 2000/1968rebuild_semantic_pass: The entry [2395 2457] ("1968-062 Steppenwolf - Magic Carpet Ride.mp3") in direc ry [1162 2395] points to nowhere - is removed

/MEDIA/mp3/Billboard Top 100 Songs - 1951 - 2000/1968vpf-10650: The directory [1162 2395] has the wrong size in the StatData (6888) - corrected to (6824/19 Flushing..finished

Files found: 20007

Directories found: 913

Names pointing to nowhere

Pass 3a (looking for lost dir/fil

####### Pass 3a (lost+found pass)

Looking for lost directories:

Flushing..finished36, 67 /sec

Pass 4 - finished done 0, 0

Deleted unreachable items

Flushing..finished

Syncing..finished

###########

reiserfsck finished at Wed Jan 21

###########

root@TANK:~#
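
The counters in the rebuild output can be cross-checked against each other. A small sketch using only the numbers printed above (variable names are illustrative, and the check says nothing about the contents of the files themselves):

```python
leaves_found = 107905          # Pass 0: "Leaves among those"
leaves_unsalvageable = 1       # Pass 0: leaves that could not be saved
leaves_to_insert = 107904      # Pass 1: "will try to insert"
inserted_whole = 107805        # Pass 1: "inserted"
inserted_item_by_item = 99     # Pass 2: "Leaves inserted item by item"
files_found = 20007            # Pass 3
directories_found = 913        # Pass 3

# Every salvageable leaf should be accounted for by Pass 1 plus Pass 2.
leaves_accounted = inserted_whole + inserted_item_by_item
print(leaves_found - leaves_unsalvageable == leaves_to_insert)  # True
print(leaves_accounted == leaves_to_insert)                     # True
print(files_found + directories_found)                          # 20920
```

On these figures one leaf was lost and every other leaf was reinserted, which fits the single directory entry that Pass 3 reports as removed.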

January 22, 2009

That looks like it may have created a mess! I really hope you made a backup of the drive. If so, your best choice is to delete *everything* on this drive, and copy it all back from the backup. I'd probably run one more simple reiserfsck afterward, just to be sure it is now clean.

January 22, 2009

Author

That looks like it may have created a mess! I really hope you made a backup of the drive. If so, your best choice is to delete *everything* on this drive, and copy it all back from the backup. I'd probably run one more simple reiserfsck afterward, just to be sure it is now clean.

Unfortunately, no, I didn't make a backup of the drive (although, it has prompted me to go out and buy a portable HDD to keep 'offsite').

I haven't seen a single error or syslog entry since that check so it's looking good so far!

Thanks guys heaps for your help, I'm in debt to these forums!

nb: I have 100Mbit colo with unlimited outgoing, so if anyone wants me to help share new releases, let me know!

nnb: I'll run another scan as suggested and post the results.

Cheers

EDIT:

After running the check again:

root@TANK:~# samba stop

root@TANK:~# umount /dev/md3

root@TANK:~# reiserfsck /dev/md3

reiserfsck 3.6.19 (2003 www.namesys.com)

<snip>

Will read-only check consistency of the filesystem on /dev/md3

Will put log info to 'stdout'

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes

###########

reiserfsck --check started at Fri Jan 23 00:33:18 2009

###########

Replaying journal..

Reiserfs journal '/dev/md3' in blocks [18..8211]: 0 transactions replayed

Checking internal tree..finished

Comparing bitmaps..finished

Checking Semantic tree:

finished

No corruptions found

There are on the filesystem:

Leaves 107899

Internal nodes 706

Directories 913

Other files 20007

Data block pointers 107091505 (0 of them are zero)

Safe links 0

###########

reiserfsck finished at Fri Jan 23 01:00:55 2009

###########

root@TANK:~#

;D

January 23, 2009

Tom... this thread should be moved to the appropriate 4.4 section...
