UNRAID 4.4 - Failed Drive? Slow Parity Check? Sync Errors?

Hi All,

 

I couldn't see a 4.4 forum, so I have posted this in here.

 

Recently I upgraded from a perfectly working unraid 4.3 + BubbaRaid to unraid 4.4 + BubbaRaid (the version that works with 4.4). This seemed to be working without any issues until a few days ago, when I replaced my 120 GB IDE cache drive with an 80 GB SATA cache drive and also added a 200 GB SATA drive. I now use only one IDE drive.

 

I noticed 1 error on the stats page yesterday, so I did a parity check. The parity check was going at 1,500 kb/s (compared to the normal 55,000 kb/s), so I rebooted. Before I rebooted, I checked 'top' in telnet and noted that a process called 'kblockd' was using 100% of the CPU. Google says this is a kernel memory thing; however, this server has 1 GB of RAM and is only running unraid + BubbaRaid, so it shouldn't be running out of RAM.

 

After the reboot, I did a parity check and it said it had found 6 errors, but the old IDE drive (although it's only ~1 year old) showed 1 error and no other drive showed any. So I ran the parity check again; this time it reported 5 sync errors, with 2 errors on the old IDE drive (and 0 errors on any other drive).

 

Again, thinking this was weird, I ran the parity check once more... and yep, 100% CPU usage and a really slow check (1,500 kb/s). This time the CPU was being used by unraidd. I'll do another reboot tonight and re-run parity, but does it seem like the old IDE drive might be on its way out?

 

oh snap.. I just found a bug... if you take the array offline ('stop') whilst doing a parity check, it sets the other drives as unformatted.. quite dangerous really (can someone else try this?).

 

Syslog is attached, however, interesting bits are below:

 

Jan  5 15:31:31 TANK kernel: ReiserFS: md2: journal params: device md2, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30

Jan  6 07:06:06 TANK kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }

Jan  6 07:06:06 TANK kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }

Jan  6 07:06:06 TANK kernel: ide: failed opcode was: unknown

Jan  6 07:10:48 TANK kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }

Jan  6 07:10:48 TANK kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }

Jan  6 07:10:48 TANK kernel: ide: failed opcode was: unknown

 

This drive had been working perfectly before I changed the IDE drives... would this be just a coincidence, or could it be related to using 4.4?

 

 

Thanks

 

  • Author

The forum won't let me add attachments (the syslog is 12kb zipped)...

  • 2 weeks later...
  • Author

Bump :-(

 

This is still an issue.

 

<snip>

Jan 10 21:42:34 TANK kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }

Jan 10 21:42:34 TANK kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }

Jan 10 21:42:34 TANK kernel: ide: failed opcode was: unknown

Jan 10 21:42:34 TANK kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }

Jan 10 21:42:34 TANK kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }

Jan 10 21:42:34 TANK kernel: ide: failed opcode was: unknown

Jan 10 21:42:35 TANK kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }

Jan 10 21:42:35 TANK kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }

Jan 10 21:42:35 TANK kernel: ide: failed opcode was: unknown

Jan 10 21:42:35 TANK kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }

Jan 10 21:42:35 TANK kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }

Jan 10 21:42:35 TANK kernel: ide: failed opcode was: unknown

Jan 10 21:42:35 TANK kernel: hda: UDMA/66 mode selected

Jan 10 21:42:35 TANK kernel: ide0: reset: success

Jan 10 21:42:35 TANK kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }

Jan 10 21:42:35 TANK kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }

Jan 10 21:42:35 TANK kernel: ide: failed opcode was: unknown

Jan 10 21:42:36 TANK kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }

Jan 10 21:42:36 TANK kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }

Jan 10 21:42:36 TANK kernel: ide: failed opcode was: unknown

Jan 10 21:42:36 TANK kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }

Jan 10 21:42:36 TANK kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }

Jan 10 21:42:36 TANK kernel: ide: failed opcode was: unknown

<snip>

 

Main status page is only showing 1 error for this drive though... Are there any tests I can perform to see if this drive is faulty?
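Since the status page and the syslog disagree about how many errors there were, it may help to count the BadCRC lines directly. This is a rough Python sketch, assuming the syslog line format shown in this thread; the function name count_crc_errors and the sample list are only for illustration.

```python
import re
from collections import Counter

# Matches the "error=" line of each pair, for example:
# "Jan 10 21:42:34 TANK kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }"
ERROR_LINE = re.compile(
    r"^\w{3}\s+\d+ \d\d:\d\d:\d\d \S+ kernel: "
    r"(?P<dev>hd[a-z]): dma_intr: error=0x[0-9a-fA-F]+ \{(?P<flags>[^}]*)\}"
)


def count_crc_errors(lines):
    """Count BadCRC error lines per IDE device in an iterable of syslog lines."""
    per_device = Counter()
    for line in lines:
        match = ERROR_LINE.match(line.strip())
        if match and "BadCRC" in match.group("flags").split():
            per_device[match.group("dev")] += 1
    return per_device


# Two of the lines posted above, used here as sample input.
sample = [
    "Jan  6 07:06:06 TANK kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }",
    "Jan  6 07:06:06 TANK kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }",
    "Jan  6 07:06:06 TANK kernel: ide: failed opcode was: unknown",
    "Jan  6 07:10:48 TANK kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }",
]
print(count_crc_errors(sample))
```

To run it against a real log, pass an open file instead of the sample list, for example open("/var/log/syslog") if that is where your syslog lives (the path is an assumption).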

The errors you are showing are frequently associated with bad hardware. It could be the drive, but it could just as easily be the drive cable or the disk controller.

 

You said you moved drives around. Did you use an 80-conductor flat IDE cable, or an older 40-conductor one you had lying around? (Older 40-conductor cables cannot handle the higher transfer speeds of today's disk drives.)

Did you use a "round" cable, or a cable longer than 24 inches? There is a good possibility that neither will meet the proper specs for reliable high-speed operation.

 

The only other test you can run is a "SMART" test using "smartctl". Details are in the "Troubleshooting" section of the wiki, but for drive hda the command would be:

smartctl -a -d ata /dev/hda

 

If smartctl complains about a missing library, you'll need to download and install it. Details are in this post.

 

Joe L.

  • Author

I didn't change the IDE cable at all except for removing the second IDE device.

 

Smart did complain about a missing binary, which I have now fixed.

 

If it helps, it seems like all drives would go into PIO mode once I get a single error (I could do a parity check at full speed on any given day until I see one error).
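To put the slowdown in perspective, here is a back-of-the-envelope Python sketch of how long it takes just to read this one 500 GB drive end to end at the two speeds mentioned in this thread. It assumes "kb/s" on the status page means 1024 bytes per second and that the speed stays constant, neither of which is confirmed. The capacity is the "User Capacity" figure from the smartctl output in this post.

```python
DRIVE_BYTES = 500_107_862_016  # "User Capacity" reported by smartctl for hda
KB = 1024                      # assumption: "kb/s" means 1024 bytes per second


def hours_to_read(total_bytes, kb_per_sec):
    """Hours needed to read total_bytes at a constant kb_per_sec."""
    return total_bytes / (kb_per_sec * KB) / 3600


normal = hours_to_read(DRIVE_BYTES, 55_000)
slow = hours_to_read(DRIVE_BYTES, 1_500)
print(f"{normal:.1f} h at 55,000 kb/s, {slow:.1f} h at 1,500 kb/s")
```

That works out to roughly two and a half hours at the normal rate versus well over three days at the slow rate. A full parity check is bounded by the size of the parity drive, which is not stated here, so treat these as figures for this drive only.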

 

The cable is an 80 pin flat.

 

smart status:

 

root@TANK:/boot/packages# smartctl -a -d ata /dev/hda

smartctl version 5.38 [i486-slackware-linux-gnu] Copyright © 2002-8 Bruce Allen

Home page is http://smartmontools.sourceforge.net/

 

=== START OF INFORMATION SECTION ===

Model Family:    Seagate Barracuda 7200.10 family

Device Model:    ST3500630A

Serial Number:    9QG1TV5X

Firmware Version: 3.AAE

User Capacity:    500,107,862,016 bytes

Device is:        In smartctl database [for details use: -P show]

ATA Version is:  7

ATA Standard is:  Exact ATA specification draft version not indicated

Local Time is:    Fri Jan 16 01:09:11 2009 GMT-10

SMART support is: Available - device has SMART capability.

SMART support is: Enabled

 

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED

 

General SMART Values:

Offline data collection status:  (0x82) Offline data collection activity

                                        was completed without error.

                                        Auto Offline Data Collection: Enabled.

Self-test execution status:      (  0) The previous self-test routine completed

                                        without error or no self-test has ever

                                        been run.

Total time to complete Offline

data collection:                ( 430) seconds.

Offline data collection

capabilities:                    (0x5b) SMART execute Offline immediate.

                                        Auto Offline data collection on/off support.

                                        Suspend Offline collection upon new

                                        command.

                                        Offline surface scan supported.

                                        Self-test supported.

                                        No Conveyance Self-test supported.

                                        Selective Self-test supported.

SMART capabilities:            (0x0003) Saves SMART data before entering

                                        power-saving mode.

                                        Supports SMART auto save timer.

Error logging capability:        (0x01) Error logging supported.

                                        General Purpose Logging supported.

Short self-test routine

recommended polling time:        (  1) minutes.

Extended self-test routine

recommended polling time:        ( 163) minutes.

 

SMART Attributes Data Structure revision number: 10

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   105   082   006    Pre-fail  Always       -       191221885
  3 Spin_Up_Time            0x0003   093   093   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   098   098   020    Old_age   Always       -       2132
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       6
  7 Seek_Error_Rate         0x000f   084   060   030    Pre-fail  Always       -       273225045
  9 Power_On_Hours          0x0032   088   088   000    Old_age   Always       -       10613
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       186
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   070   048   045    Old_age   Always       -       30 (Lifetime Min/Max 24/32)
194 Temperature_Celsius     0x0022   030   052   000    Old_age   Always       -       30 (0 17 0 0)
195 Hardware_ECC_Recovered  0x001a   061   053   000    Old_age   Always       -       95733308
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       102
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0

 

SMART Error Log Version: 1

ATA Error Count: 101 (device log contains only the most recent five errors)

        CR = Command Register [HEX]

        FR = Features Register [HEX]

        SC = Sector Count Register [HEX]

        SN = Sector Number Register [HEX]

        CL = Cylinder Low Register [HEX]

        CH = Cylinder High Register [HEX]

        DH = Device/Head Register [HEX]

        DC = Device Command Register [HEX]

        ER = Error register [HEX]

        ST = Status register [HEX]

Powered_Up_Time is measured from power on, and printed as

DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,

SS=sec, and sss=millisec. It "wraps" after 49.710 days.

 

Error 101 occurred at disk power-on lifetime: 10525 hours (438 days + 13 hours)

  When the command that caused the error occurred, the device was active or idle.

 

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  84 51 00 00 00 00 e0  Error: ICRC, ABRT at LBA = 0x00000000 = 0

 

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  25 00 08 c7 87 5d e0 00      08:03:17.347  READ DMA EXT

  25 00 08 c7 87 5d e0 00      08:03:16.905  READ DMA EXT

  10 00 3f 00 00 00 e0 00      08:03:16.905  RECALIBRATE [OBS-4]

  25 00 08 c7 87 5d e0 00      08:03:16.463  READ DMA EXT

  25 00 08 c7 87 5d e0 00      08:03:16.023  READ DMA EXT

 

Error 100 occurred at disk power-on lifetime: 10525 hours (438 days + 13 hours)

  When the command that caused the error occurred, the device was active or idle.

 

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  84 51 00 00 00 00 e0  Error: ICRC, ABRT at LBA = 0x00000000 = 0

 

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  25 00 08 c7 87 5d e0 00      08:03:14.192  READ DMA EXT

  10 00 3f 00 00 00 e0 00      08:03:16.905  RECALIBRATE [OBS-4]

  25 00 08 c7 87 5d e0 00      08:03:16.905  READ DMA EXT

  25 00 08 c7 87 5d e0 00      08:03:16.463  READ DMA EXT

  c6 00 10 00 00 00 e0 00      08:03:16.023  SET MULTIPLE MODE

 

Error 99 occurred at disk power-on lifetime: 10525 hours (438 days + 13 hours)

  When the command that caused the error occurred, the device was active or idle.

 

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  84 51 00 00 00 00 e0  Error: ICRC, ABRT at LBA = 0x00000000 = 0

 

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  25 00 08 c7 87 5d e0 00      08:03:14.192  READ DMA EXT

  25 00 08 c7 87 5d e0 00      08:03:14.172  READ DMA EXT

  c6 00 10 00 00 00 e0 00      08:03:14.162  SET MULTIPLE MODE

  00 00 40 00 00 00 00 06      08:03:16.463  NOP [Abort queued commands]

  ef 03 40 00 00 00 e0 02      08:03:16.023  SET FEATURES [set transfer mode]

 

Error 98 occurred at disk power-on lifetime: 10525 hours (438 days + 13 hours)

  When the command that caused the error occurred, the device was active or idle.

 

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  84 51 00 00 00 00 e0  Error: ICRC, ABRT at LBA = 0x00000000 = 0

 

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  25 00 08 c7 87 5d e0 00      08:03:14.192  READ DMA EXT

  c6 00 10 00 00 00 e0 00      08:03:14.172  SET MULTIPLE MODE

  00 00 40 00 00 00 00 06      08:03:14.162  NOP [Abort queued commands]

  ef 03 40 00 00 00 e0 02      08:03:14.152  SET FEATURES [set transfer mode]

  25 00 08 c7 87 5d e0 00      08:03:16.023  READ DMA EXT

 

Error 97 occurred at disk power-on lifetime: 10525 hours (438 days + 13 hours)

  When the command that caused the error occurred, the device was active or idle.

 

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  84 51 00 00 00 00 e0  Error: ICRC, ABRT at LBA = 0x00000000 = 0

 

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  25 00 08 c7 87 5d e0 00      08:03:14.192  READ DMA EXT

  25 00 08 c7 87 5d e0 00      08:03:14.172  READ DMA EXT

  10 00 3f 00 00 00 e0 00      08:03:14.162  RECALIBRATE [OBS-4]

  25 00 08 c7 87 5d e0 00      08:03:14.152  READ DMA EXT

  25 00 08 c7 87 5d e0 00      08:03:14.141  READ DMA EXT

 

SMART Self-test log structure revision number 1

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error

# 1  Short offline      Completed without error      00%    10432        -

 

SMART Selective self-test log data structure revision number 1

SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS

    1        0        0  Not_testing

    2        0        0  Not_testing

    3        0        0  Not_testing

    4        0        0  Not_testing

    5        0        0  Not_testing

Selective self-test flags (0x0):

  After scanning selected spans, do NOT read-scan remainder of disk.

If Selective self-test is pending on power-up, resume after 0 minute delay.
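That is a lot of output to read by eye, so here is a small Python sketch that pulls the raw values out of the attribute table and lists the non-zero ones among a few counters that are usually worth watching. The watch list and the function names are my own choices, not anything smartctl defines. The large raw values for attributes like Raw_Read_Error_Rate and Seek_Error_Rate are vendor-specific on many drives and are deliberately left out.

```python
# Attributes whose raw value is usually a plain count (assumed watch list).
WATCH = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
)

# A few rows copied from the smartctl output in this thread.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       6
190 Airflow_Temperature_Cel 0x0022   070   048   045    Old_age   Always       -       30 (Lifetime Min/Max 24/32)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       102
"""


def parse_attributes(text):
    """Return {attribute_name: raw_value} from smartctl -a attribute rows."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID, then a name and a hex flag,
        # and carry the raw value in the tenth column.
        if len(fields) >= 10 and fields[0].isdigit() and fields[2].startswith("0x"):
            attrs[fields[1]] = int(fields[9])
    return attrs


def nonzero_watched(attrs):
    """Return the watched attributes whose raw value is not zero."""
    return {name: attrs[name] for name in WATCH if attrs.get(name, 0) != 0}


print(nonzero_watched(parse_attributes(SAMPLE)))
```

For this drive it flags Reallocated_Sector_Ct (6) and UDMA_CRC_Error_Count (102). The CRC counter tracks transfer errors on the interface between drive and controller, while reallocated sectors relate to the disk surface itself.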

 

 

 


One thing that seems high is the UDMA CRC error count (102), which points to a bad cable or other hardware... see the above post by Joe L.

 

I would try changing the cables and see if that fixes it.

 

Cheers,

Matt

I didn't change the IDE cable at all except for removing the second IDE device.

 

Smart did complain about a missing binary, which I have now fixed.

 

If it helps, it seems like all drives would go into PIO mode once I get a single error (I could do a parity check at full speed on any given day until I see one error).

 

The cable is an 80 pin flat.

Is the remaining drive connected to the end connector? Connecting a drive to the middle with the end disconnected could also cause errors.

smart status:

root@TANK:/boot/packages# smartctl -a -d ata /dev/hda
smartctl version 5.38 [i486-slackware-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.10 family
Device Model:     ST3500630A
Serial Number:    9QG1TV5X
Firmware Version: 3.AAE
User Capacity:    500,107,862,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Fri Jan 16 01:09:11 2009 GMT-10
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 430) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 163) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   105   082   006    Pre-fail  Always       -       191221885
  3 Spin_Up_Time            0x0003   093   093   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   098   098   020    Old_age   Always       -       2132
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       6
  7 Seek_Error_Rate         0x000f   084   060   030    Pre-fail  Always       -       273225045
  9 Power_On_Hours          0x0032   088   088   000    Old_age   Always       -       10613
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       186
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   070   048   045    Old_age   Always       -       30 (Lifetime Min/Max 24/32)
194 Temperature_Celsius     0x0022   030   052   000    Old_age   Always       -       30 (0 17 0 0)
195 Hardware_ECC_Recovered  0x001a   061   053   000    Old_age   Always       -       95733308
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       102
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 101 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 101 occurred at disk power-on lifetime: 10525 hours (438 days + 13 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 00 00 00 e0  Error: ICRC, ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 c7 87 5d e0 00      08:03:17.347  READ DMA EXT
  25 00 08 c7 87 5d e0 00      08:03:16.905  READ DMA EXT
  10 00 3f 00 00 00 e0 00      08:03:16.905  RECALIBRATE [OBS-4]
  25 00 08 c7 87 5d e0 00      08:03:16.463  READ DMA EXT
  25 00 08 c7 87 5d e0 00      08:03:16.023  READ DMA EXT

Error 100 occurred at disk power-on lifetime: 10525 hours (438 days + 13 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 00 00 00 e0  Error: ICRC, ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 c7 87 5d e0 00      08:03:14.192  READ DMA EXT
  10 00 3f 00 00 00 e0 00      08:03:16.905  RECALIBRATE [OBS-4]
  25 00 08 c7 87 5d e0 00      08:03:16.905  READ DMA EXT
  25 00 08 c7 87 5d e0 00      08:03:16.463  READ DMA EXT
  c6 00 10 00 00 00 e0 00      08:03:16.023  SET MULTIPLE MODE

Error 99 occurred at disk power-on lifetime: 10525 hours (438 days + 13 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 00 00 00 e0  Error: ICRC, ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 c7 87 5d e0 00      08:03:14.192  READ DMA EXT
  25 00 08 c7 87 5d e0 00      08:03:14.172  READ DMA EXT
  c6 00 10 00 00 00 e0 00      08:03:14.162  SET MULTIPLE MODE
  00 00 40 00 00 00 00 06      08:03:16.463  NOP [Abort queued commands]
  ef 03 40 00 00 00 e0 02      08:03:16.023  SET FEATURES [set transfer mode]

Error 98 occurred at disk power-on lifetime: 10525 hours (438 days + 13 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 00 00 00 e0  Error: ICRC, ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 c7 87 5d e0 00      08:03:14.192  READ DMA EXT
  c6 00 10 00 00 00 e0 00      08:03:14.172  SET MULTIPLE MODE
  00 00 40 00 00 00 00 06      08:03:14.162  NOP [Abort queued commands]
  ef 03 40 00 00 00 e0 02      08:03:14.152  SET FEATURES [set transfer mode]
  25 00 08 c7 87 5d e0 00      08:03:16.023  READ DMA EXT

Error 97 occurred at disk power-on lifetime: 10525 hours (438 days + 13 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 00 00 00 e0  Error: ICRC, ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 c7 87 5d e0 00      08:03:14.192  READ DMA EXT
  25 00 08 c7 87 5d e0 00      08:03:14.172  READ DMA EXT
  10 00 3f 00 00 00 e0 00      08:03:14.162  RECALIBRATE [OBS-4]
  25 00 08 c7 87 5d e0 00      08:03:14.152  READ DMA EXT
  25 00 08 c7 87 5d e0 00      08:03:14.141  READ DMA EXT

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     10432         -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

 

Your drive shows a series of 101 ATA errors (only the most recent five are logged) and 6 reallocated sectors.  If those errors occurred recently, it might be an indication that the drive needs some attention.

 

Joe L.
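
For anyone who wants to pull those figures out of a saved report instead of reading the table by eye, here is a minimal Python sketch. It assumes the smartctl attribute table layout shown above (ten whitespace-separated columns per row). The function names, the list of watched attributes, and the "non-zero raw value is worth a look" rule are illustrative assumptions, not vendor thresholds.

```python
# Attribute names are taken from the report above. Treating any non-zero raw
# value as "worth a look" is an assumption for illustration only.
WATCHED = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
)


def parse_attributes(report):
    """Map attribute name -> leading integer of RAW_VALUE for each table row."""
    attrs = {}
    for line in report.splitlines():
        fields = line.split()
        # A table row starts with a numeric ID and has a hex FLAG in column 3.
        if len(fields) >= 10 and fields[0].isdigit() and fields[2].startswith("0x"):
            attrs[fields[1]] = int(fields[9])
    return attrs


def flag_attributes(attrs):
    """Return the watched attributes whose raw value is above zero."""
    return {name: attrs[name] for name in WATCHED if attrs.get(name, 0) > 0}


SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       6
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       102
"""

if __name__ == "__main__":
    print(flag_attributes(parse_attributes(SAMPLE)))
```

On the report above this would flag the 6 reallocated sectors and the 102 UDMA CRC errors. CRC errors are transfer errors between the drive and the controller, so the cable is a common suspect, though the sketch itself cannot tell you the cause.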

Very early in my life as an unRAID user, I had issues using the round IDE cables.  Everything seemed to be fine at first, but then I had problems where the drives would not mount consistently.  DO NOT USE THEM!  Using the infuriating flat IDE cables is what is required.  Once I switched everything was stable.

Your drive shows a series of 101 ATA errors (only the most recent five are logged) and 6 reallocated sectors.  If those errors occurred recently, it might be an indication that the drive needs some attention.

 

Joe L.

 

They were indeed fairly recent, I guess. The power-on time when the SMART report was taken was 10613 hours and the errors were recorded at 10525 hours, so they occurred 88 power-on hours before the report was taken (both figures are from the SMART report).

 

If that helps,

Matt
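
The same subtraction can be done straight from a saved report. This is only a sketch: it assumes the Power_On_Hours row and the "Error N occurred at disk power-on lifetime" lines look exactly as they do in the report above, and the function name is made up for illustration.

```python
import re

# Matches the per-error header lines in the smartctl error log shown above.
ERROR_RE = re.compile(r"occurred at disk power-on lifetime: (\d+) hours")


def hours_since_last_error(report):
    """Return Power_On_Hours minus the lifetime hour of the newest logged error."""
    power_on = None
    for line in report.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[1] == "Power_On_Hours":
            power_on = int(fields[9])
    error_hours = [int(h) for h in ERROR_RE.findall(report)]
    if power_on is None or not error_hours:
        return None  # the report lacks one of the two figures
    return power_on - max(error_hours)


SAMPLE = """\
  9 Power_On_Hours          0x0032   088   088   000    Old_age   Always       -       10613
Error 101 occurred at disk power-on lifetime: 10525 hours (438 days + 13 hours)
Error 100 occurred at disk power-on lifetime: 10525 hours (438 days + 13 hours)
"""

if __name__ == "__main__":
    print(hours_since_last_error(SAMPLE), "power-on hours since the last logged error")
```

Note that these are power-on hours, not wall-clock hours, so how many days ago that was depends on how long the server has been running each day.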

  • Author

Very early in my life as an unRAID user, I had issues using the round IDE cables.  Everything seemed to be fine at first, but then I had problems where the drives would not mount consistently.  DO NOT USE THEM!  Using the infuriating flat IDE cables is what is required.  Once I switched everything was stable.

 

I am using a flat cable :-)

 

I would try and change the cables and see if that fixes it

 

I will after switching the connector :-)

 

Is the remaining drive connected to the end connector? Connecting a drive to the middle with the end disconnected could also cause errors.

 

Pretty sure it's connected to the middle connector; I'll switch that before changing cables.

 

 

Thanks guys for your help, I'll try out a few things and update this thread.

 

ps: Is it possible that unraid/linux could have thrown these drives into PIO mode once an error is found? Everything kept getting really slow (multiple streaming would stutter, parity checks slow etc)?

 

 

Cheers

 

ps: Is it possible that unraid/linux could have thrown these drives into PIO mode once an error is found? Everything kept getting really slow (multiple streaming would stutter, parity checks slow etc)?

Linux is very persistent in its attempt to communicate with the drives.  It will try progressively slower methods to communicate until it eventually settles on a very slow PIO mode.

 

Yes, PIO mode would cause everything you described... stutter, slow parity checks, etc.

 

It is interesting that these exact same issues probably occur in the Windows PCs we have; we are just not informed that the drive is in PIO mode and only see the performance degrade.  Eventually we buy a faster, newer machine to read our mail, etc.

 

Joe L.

Your syslog will clearly show drive errors, and speed/mode changes to PIO.

  • Author

Your syslog will clearly show drive errors, and speed/mode changes to PIO.

 

Interesting.. I guess I should have looked for it, but it only occurred to me whilst posting my 'symptoms'.

 

Snippets from the old syslog from when this issue occurred:

 

Jan  5 15:31:29 TANK kernel: hda: host max PIO5 wanted PIO255(auto-tune) selected PIO4

Jan  5 15:31:29 TANK kernel: hda: UDMA/100 mode selected

Jan  5 15:31:29 TANK kernel: Probing IDE interface ide1...

Jan  5 15:31:29 TANK kernel: ide0 at 0xaf00-0xaf07,0xae02 on irq 18

Jan  5 15:31:29 TANK kernel: ide1 at 0xad00-0xad07,0xac02 on irq 18

Jan  5 15:31:29 TANK kernel: i801_smbus 0000:00:1f.3: PCI INT B -> GSI 19 (level, low) -> IRQ 19

 

Jan  6 20:00:18 TANK kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }

Jan  6 20:00:18 TANK kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }

Jan  6 20:00:18 TANK kernel: ide: failed opcode was: unknown

Jan  6 20:00:18 TANK kernel: hda: UDMA/44 mode selected

 

Jan  6 20:00:20 TANK kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }

Jan  6 20:00:20 TANK kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }

Jan  6 20:00:20 TANK kernel: ide: failed opcode was: unknown

Jan  6 20:00:20 TANK kernel: hda: UDMA/33 mode selected

 

Jan  6 20:00:22 TANK kernel: ide: failed opcode was: unknown

Jan  6 20:00:22 TANK kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }

Jan  6 20:00:22 TANK kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }

Jan  6 20:00:22 TANK kernel: ide: failed opcode was: unknown

Jan  6 20:00:22 TANK kernel: hda: UDMA/25 mode selected

 

Jan  6 20:00:29 TANK kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }

Jan  6 20:00:29 TANK kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }

Jan  6 20:00:29 TANK kernel: ide: failed opcode was: unknown

Jan  6 20:00:29 TANK kernel: hda: UDMA/16 mode selected

 

Jan  6 20:00:31 TANK kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }

Jan  6 20:00:31 TANK kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }

Jan  6 20:00:31 TANK kernel: ide: failed opcode was: unknown

Jan  6 20:00:31 TANK kernel: hda: no DMA mode selected

Jan  6 20:00:31 TANK kernel: ide0: reset: success

 

Looks like it was gradually slowing down to me!
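
If you want to spot this step-down without scrolling through the whole syslog, something like the following Python sketch would do it. The regular expression is modelled only on the "mode selected" lines pasted above, so other kernels or drivers may word them differently, and treating a final "no DMA mode selected" as a fallback to PIO is an assumption of the sketch. The function names are hypothetical.

```python
import re

# Modelled on the lines above, e.g. "kernel: hda: UDMA/44 mode selected".
MODE_RE = re.compile(r"kernel: (hd[a-z]): (UDMA/\d+|no DMA) mode selected")


def transfer_mode_history(syslog):
    """Return {device: [modes in the order they were selected]}."""
    history = {}
    for line in syslog.splitlines():
        match = MODE_RE.search(line)
        if match:
            history.setdefault(match.group(1), []).append(match.group(2))
    return history


def fell_back_to_pio(history):
    """Devices whose most recent mode line says DMA was dropped (assumed PIO)."""
    return sorted(dev for dev, modes in history.items() if modes[-1] == "no DMA")


SAMPLE = """\
Jan  5 15:31:29 TANK kernel: hda: UDMA/100 mode selected
Jan  6 20:00:18 TANK kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
Jan  6 20:00:18 TANK kernel: hda: UDMA/44 mode selected
Jan  6 20:00:20 TANK kernel: hda: UDMA/33 mode selected
Jan  6 20:00:22 TANK kernel: hda: UDMA/25 mode selected
Jan  6 20:00:29 TANK kernel: hda: UDMA/16 mode selected
Jan  6 20:00:31 TANK kernel: hda: no DMA mode selected
"""

if __name__ == "__main__":
    history = transfer_mode_history(SAMPLE)
    for device, modes in history.items():
        print(device, " -> ".join(modes))
    print("DMA dropped on:", fell_back_to_pio(history))
```

Run against the full syslog, the per-device history makes it obvious whether one drive or several were stepped down.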

 

Does no DMA mode mean it's running in PIO mode? Would that mean all drives are running in PIO mode or would one drive slow down the whole array?

 

FWIW, I removed BubbaRaid and upgraded unRAID to v4.2.2 and so far have not seen another error. I highly doubt BubbaRaid was causing any issues, but nonetheless I wanted to ensure I'm not running anything unnecessary whilst diagnosing this issue.

 

 

Does no DMA mode mean it's running in PIO mode? Would that mean all drives are running in PIO mode or would one drive slow down the whole array?

 

Once DMA is disabled, the drive is using a PIO mode, and there should have been a message to that effect.  It only affects this drive, not the others.  It only affects operations that access this drive, such as parity checks, but those operations can then be slowed to the speed of the slowest drive.  PIO modes tend to result in speeds in the low single digits; around 3 MB/s is typical.

 

By the way, I heartily recommend installing UnMENU and using the MyMain plugin.  There is a very under-emphasized feature there, perhaps undiscovered by most, that allows you to examine just the syslog messages that pertain to a single drive.  Just click the SY link at the far right to see them.  Another great idea from Brian!
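
To put rough numbers on the "speed of the slowest drive" point, here is a back-of-the-envelope Python sketch. It assumes a parity check reads every drive in step, so the effective rate is simply the minimum of the per-drive rates. The 3 MB/s figure is the typical PIO rate mentioned above and 55 MB/s is roughly the normal parity check speed reported at the start of the thread. The device names other than hda and the 200 GB size are made-up example values.

```python
def parity_check_rate(drive_rates_mb_s):
    """Effective rate of an operation that reads all drives in step."""
    return min(drive_rates_mb_s.values())


def estimated_hours(largest_drive_gb, rate_mb_s):
    """Hours to read the largest drive end to end at the given rate."""
    return largest_drive_gb * 1000 / rate_mb_s / 3600


if __name__ == "__main__":
    # hda has dropped to PIO; sda and sdb are hypothetical healthy drives.
    degraded = {"hda": 3.0, "sda": 55.0, "sdb": 55.0}
    healthy = {"hda": 55.0, "sda": 55.0, "sdb": 55.0}
    for label, rates in (("degraded", degraded), ("healthy", healthy)):
        rate = parity_check_rate(rates)
        print(f"{label}: {rate} MB/s, about {estimated_hours(200, rate):.1f} h for 200 GB")
```

Real checks will not be this tidy, since drive speed varies across the platter and other I/O competes for the bus, but it shows why a single drive in PIO mode drags the whole check down.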

  • Author

Just an update..

 

I have replaced the cable with a brand new one (and, contrary to what I thought, the drive was connected to the end connector of the IDE cable).

 

I have been running 2 days so far without any errors but I do see a lot of this in the syslog:

 

Jan 19 20:18:03 TANK kernel: ReiserFS: warning: is_leaf: item location seems wrong (second one): *3.6* [2139 2200 0x1ce001 IND], item_len 2616, item_location 1224, free_space(entry_count) 0

Jan 19 20:18:03 TANK kernel: ReiserFS: md3: warning: vs-5150: search_by_key: invalid format found in block 106846153. Fsck?

Jan 19 20:18:03 TANK kernel: ReiserFS: md3: warning: vs-13070: reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [2395 2457 0x0 SD]

Jan 19 21:46:45 TANK kernel: ReiserFS: warning: is_leaf: item location seems wrong (second one): *3.6* [2139 2200 0x1ce001 IND], item_len 2616, item_locatio$

Jan 19 21:46:45 TANK kernel: ReiserFS: md3: warning: vs-5150: search_by_key: invalid format found in block 106846153. Fsck?

Jan 19 21:46:45 TANK kernel: ReiserFS: md3: warning: vs-13070: reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [2395 2457 0x0 SD]

 

Should I be worried about that?

You probably need to run a reiserfsck on that drive.  (The drive assigned to disk3 in your array.)

http://lime-technology.com/wiki/index.php?title=Check_Disk_Filesystems

 

Joe L.
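
If the syslog is long, a small script can tell you which array disks are producing these warnings. This Python sketch relies only on the message format visible in the lines quoted in this thread ("ReiserFS: mdN: warning: code:"). Mapping mdN to diskN generalizes from the md3/disk3 case above and is an assumption, as are the function names.

```python
import re
from collections import Counter

# Modelled on lines such as "kernel: ReiserFS: md3: warning: vs-5150: ...".
WARN_RE = re.compile(r"kernel: ReiserFS: (md\d+): warning: ([\w-]+):")


def reiserfs_warnings(syslog):
    """Count warnings per (md device, warning code)."""
    counts = Counter()
    for line in syslog.splitlines():
        match = WARN_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


def disks_to_check(counts):
    """Array disk names to run reiserfsck on (assumes mdN maps to diskN)."""
    return sorted({"disk" + device[2:] for device, _code in counts})


SAMPLE = """\
Jan 19 20:18:03 TANK kernel: ReiserFS: md3: warning: vs-5150: search_by_key: invalid format found in block 106846153. Fsck?
Jan 19 20:18:03 TANK kernel: ReiserFS: md3: warning: vs-13070: reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [2395 2457 0x0 SD]
Jan 19 21:46:45 TANK kernel: ReiserFS: md3: warning: vs-5150: search_by_key: invalid format found in block 106846153. Fsck?
Jan 19 21:46:45 TANK kernel: ReiserFS: md3: warning: vs-13070: reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [2395 2457 0x0 SD]
"""

if __name__ == "__main__":
    counts = reiserfs_warnings(SAMPLE)
    for (device, code), n in sorted(counts.items()):
        print(device, code, n)
    print("check:", disks_to_check(counts))
```

Lines without a device name, like the "is_leaf" warning, are skipped on purpose because they cannot be tied to a disk by themselves.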

  • Author

root@TANK:~# samba stop

root@TANK:~# umount /dev/md3

root@TANK:~# reiserfsck /dev/md3

reiserfsck 3.6.19 (2003 www.namesys.com)

 

*************************************************************

** If you are using the latest reiserfsprogs and  it fails **

** please  email bug reports to reiserfs-list@namesys.com, **

** providing  as  much  information  as  possible --  your **

** hardware,  kernel,  patches,  settings,  all reiserfsck **

** messages  (including version),  the reiserfsck logfile, **

** check  the  syslog file  for  any  related information. **

** If you would like advice on using this program, support **

** is available  for $25 at  www.namesys.com/support.html. **

*************************************************************

 

Will read-only check consistency of the filesystem on /dev/md3

Will put log info to 'stdout'

 

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes

###########

reiserfsck --check started at Wed Jan 21 01:39:41 2009

###########

Replaying journal..

Reiserfs journal '/dev/md3' in blocks [18..8211]: 0 transactions replayed

Checking internal tree../  2 (of  5)/140 (of 155)/ 45 (of 170)block 106846153: The number of items (3) is incorrect, should be (0)

the problem in the internal node occured (106846153), whole subtree is skipped                                      finished

Comparing bitmaps..vpf-10640: The on-disk and the correct bitmaps differs.

Bad nodes were found, Semantic pass skipped

1 found corruptions can be fixed only when running with --rebuild-tree

###########

reiserfsck finished at Wed Jan 21 01:53:38 2009

###########

root@TANK:~#

 

I'll do the next part now
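
For anyone scripting this, the decision about "the next part" comes down to the summary line near the end of the read-only check. The Python sketch below recognizes only the two summary messages that actually appear in this thread and reports anything else as unknown. The function name and the return labels are made up for illustration.

```python
import re

# Summary line printed by the read-only check earlier in this thread.
REBUILD_RE = re.compile(
    r"(\d+) found corruptions can be fixed only when running with --rebuild-tree"
)


def next_step(check_output):
    """Classify the output of a read-only reiserfsck run."""
    if "No corruptions found" in check_output:
        return "clean"
    if REBUILD_RE.search(check_output):
        return "rebuild-tree"
    return "unknown"  # any other summary: read the output by hand


SAMPLE = """\
Comparing bitmaps..vpf-10640: The on-disk and the correct bitmaps differs.
Bad nodes were found, Semantic pass skipped
1 found corruptions can be fixed only when running with --rebuild-tree
"""

if __name__ == "__main__":
    print(next_step(SAMPLE))
```

Even when the answer is "rebuild-tree", take the program's own warning seriously and back up the drive first if at all possible.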

  • Author

All done:

 

root@TANK:~# reiserfsck --rebuild-tree /dev/md3

reiserfsck 3.6.19 (2003 www.namesys.com)

 

*************************************************************

** Do not  run  the  program  with  --rebuild-tree  unless **

** something is broken and MAKE A BACKUP  before using it. **

** If you have bad sectors on a drive  it is usually a bad **

** idea to continue using it. Then you probably should get **

** a working hard drive, copy the file system from the bad **

** drive  to the good one -- dd_rescue is  a good tool for **

** that -- and only then run this program.                **

** If you are using the latest reiserfsprogs and  it fails **

** please  email bug reports to reiserfs-list@namesys.com, **

** providing  as  much  information  as  possible --  your **

** hardware,  kernel,  patches,  settings,  all reiserfsck **

** messages  (including version),  the reiserfsck logfile, **

** check  the  syslog file  for  any  related information. **

** If you would like advice on using this program, support **

** is available  for $25 at  www.namesys.com/support.html. **

*************************************************************

 

Will rebuild the filesystem (/dev/md3) tree

Will put log info to 'stdout'

 

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes

Replaying journal..

Reiserfs journal '/dev/md3' in blocks [18..8211]: 0 transactions replayed

###########

reiserfsck --rebuild-tree started at Wed Jan 21 01:56:03 2009

###########

 

Pass 0:

####### Pass 0 #######

Loading on-disk bitmap .. ok, 107213798 blocks marked used

Skipping 11937 blocks (super block, journal, bitmaps) 107201861 blocks will be read

0%....20%....40%block 106846153: The number of items (3) is incorrect, should be (0) - corrected

block 106846153: The free space (0) is incorrect, should be (4072) - corrected

                                                      left 0, 16031 /secc

20919 directory entries were hashed with "r5" hash.

        "r5" hash is selected

Flushing..finished

        Read blocks (but not data blocks) 107201861

                Leaves among those 107905

                        - leaves all contents of which could not be saved and deleted 1

                Objectids found 20921

 

Pass 1 (will try to insert 107904 leaves):

####### Pass 1 #######

Looking for allocable blocks .. finished

0%....20%....40%....60%....80%....100%                          left 0, 88 /sec

Flushing..finished

        107904 leaves read

                107805 inserted

                99 not inserted

####### Pass 2 #######

 

Pass 2:

0%....20%....40%....60%....80%....100%                          left 0, 66 /sec

Flushing..finished

        Leaves inserted item by item 99

Pass 3 (semantic):

####### Pass 3 #########

... ard Top 100 Songs - 1951 - 2000/1968/1968-061 Donovan - Hurdy Gurdy Man.mp3vpf-10680: The file [2395 2456] has the wrong block count in the StatData (9  0) - corrected to (3696)

/MEDIA/mp3/Billboard Top 100 Songs - 1951 - 2000/1968rebuild_semantic_pass: The entry [2395 2457] ("1968-062 Steppenwolf - Magic Carpet Ride.mp3") in direc  ry [1162 2395] points to nowhere - is removed

/MEDIA/mp3/Billboard Top 100 Songs - 1951 - 2000/1968vpf-10650: The directory [1162 2395] has the wrong size in the StatData (6888) - corrected to (6824/19  Flushing..finished

        Files found: 20007

        Directories found: 913

        Names pointing to nowhere

Pass 3a (looking for lost dir/fil

####### Pass 3a (lost+found pass)

Looking for lost directories:

Flushing..finished36, 67 /sec

Pass 4 - finished      done 0, 0

        Deleted unreachable items

Flushing..finished

Syncing..finished

###########

reiserfsck finished at Wed Jan 21

###########

root@TANK:~#

 

That looks like it may have created a mess!  I really hope you made a backup of the drive.  If so, your best choice is to delete *everything* on this drive, and copy it all back from the backup.  I'd probably run one more simple reiserfsck afterward, just to be sure it is now clean.

  • Author

That looks like it may have created a mess!  I really hope you made a backup of the drive.  If so, your best choice is to delete *everything* on this drive, and copy it all back from the backup.  I'd probably run one more simple reiserfsck afterward, just to be sure it is now clean.

 

Unfortunately, no, I didn't make a backup of the drive (although, it has prompted me to go out and buy a portable HDD to keep 'offsite').

 

I haven't seen a single error or syslog entry since that check so it's looking good so far!

 

Thanks guys heaps for your help, I'm in debt to these forums!

 

nb: I have 100Mbit colo with unlimited outgoing, so if anyone wants me to help share new releases, let me know!

 

nnb: I'll run another scan as suggested and post the results.

 

 

Cheers

 

 

EDIT:

 

after running a check again:

 

root@TANK:~# samba stop

root@TANK:~# umount /dev/md3

root@TANK:~# reiserfsck /dev/md3

reiserfsck 3.6.19 (2003 www.namesys.com)

 

<snip>

<snip>

 

Will read-only check consistency of the filesystem on /dev/md3

Will put log info to 'stdout'

 

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes

###########

reiserfsck --check started at Fri Jan 23 00:33:18 2009

###########

Replaying journal..

Reiserfs journal '/dev/md3' in blocks [18..8211]: 0 transactions replayed

Checking internal tree..finished

Comparing bitmaps..finished

Checking Semantic tree:

finished

No corruptions found

There are on the filesystem:

        Leaves 107899

        Internal nodes 706

        Directories 913

        Other files 20007

        Data block pointers 107091505 (0 of them are zero)

        Safe links 0

###########

reiserfsck finished at Fri Jan 23 01:00:55 2009

###########

root@TANK:~#

 

;D ;D ;D

 

Tom... this thread should be moved to the appropriate 4.4 section...

 

Archived

This topic is now archived and is closed to further replies.
