
Red ball drive and reiserfsck



I had a drive (WDC_WD20EADS) drop out with the red ball. SMART queries reported that neither this drive nor the parity drive supported SMART, and the parity drive also showed millions of errors. I shut down the array and rebooted, and now both drives show up as SMART-capable and the parity drive shows no errors. A few short SMART tests on the drive in question all passed, although Raw_Read_Error_Rate and Calibration_Retry_Count look mildly suspicious:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   180   180   051    Pre-fail  Always       -       205540
  3 Spin_Up_Time            0x0027   171   149   021    Pre-fail  Always       -       8416
  4 Start_Stop_Count        0x0032   098   098   000    Old_age   Always       -       2463
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   080   080   000    Old_age   Always       -       14721
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   068   068   000    Old_age   Always       -       65535
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       45
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       20
193 Load_Cycle_Count        0x0032   150   150   000    Old_age   Always       -       151104
194 Temperature_Celsius     0x0022   118   114   000    Old_age   Always       -       34
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     14716         -
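One way to read a table like this is smartctl's own rule: an attribute counts as failed when its normalized VALUE (or WORST, for "failed in the past") is at or below THRESH. Below is a minimal Python sketch of that check; the class and function names are hypothetical helpers, not part of smartctl or unRAID. It deliberately ignores RAW_VALUE, which is vendor-specific, so Calibration_Retry_Count (THRESH 000) can never trip it regardless of its raw 65535.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SmartAttribute:
    id: int
    name: str
    value: int   # normalized current value
    worst: int   # lowest normalized value recorded
    thresh: int  # vendor failure threshold (0 = never triggers)
    kind: str    # "Pre-fail" or "Old_age"
    raw: str     # vendor-specific raw value, kept as text


def parse_attributes(text: str) -> list[SmartAttribute]:
    """Parse attribute rows in the `smartctl -A` layout shown above."""
    attrs = []
    for line in text.splitlines():
        f = line.split()
        # Attribute rows start with a numeric ID and have 10+ columns.
        if len(f) < 10 or not f[0].isdigit():
            continue
        attrs.append(SmartAttribute(int(f[0]), f[1], int(f[3]), int(f[4]),
                                    int(f[5]), f[6], f[9]))
    return attrs


def failing(attrs: list[SmartAttribute], use_worst: bool = False) -> list[str]:
    """Names of attributes at or below threshold; threshold 0 is ignored."""
    return [a.name for a in attrs
            if a.thresh > 0 and (a.worst if use_worst else a.value) <= a.thresh]


SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   180   180   051    Pre-fail  Always       -       205540
  3 Spin_Up_Time            0x0027   171   149   021    Pre-fail  Always       -       8416
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0032   068   068   000    Old_age   Always       -       65535
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
"""

attrs = parse_attributes(SAMPLE)
print(failing(attrs))                  # []
print(failing(attrs, use_worst=True))  # []
# Headroom above threshold for the attributes that have one.
print({a.name: a.value - a.thresh for a in attrs if a.thresh > 0})
```

Against the full table above nothing is at or below its threshold; the smallest headroom is Reallocated_Sector_Ct (200 against 140). This says nothing about the raw counters, which need vendor-specific interpretation.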

I also noticed this in the syslog after reboot:

Oct  1 12:18:29 unraid kernel: REISERFS warning: reiserfs-5082 is_leaf: free space seems wrong: level=1, nr_items=4, free_space=48 rdkey  (Minor Issues)
Oct  1 12:18:29 unraid kernel: REISERFS error (device md1): vs-5150 search_by_key: invalid format found in block 8211. Fsck? (Errors)
Oct  1 12:18:29 unraid kernel: REISERFS (device md1): Remounting filesystem read-only (Drive related)
Oct  1 12:18:29 unraid kernel: REISERFS error (device md1): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [1 2 0x0 SD] (Errors)
Oct  1 12:18:29 unraid kernel: REISERFS (device md1): Using r5 hash to sort names (Routine)
Oct  1 12:18:29 unraid kernel: REISERFS (device md4): Using r5 hash to sort names (Routine)
Oct  1 12:18:29 unraid logger: mount: /dev/md1: can't read superblock
Oct  1 12:18:29 unraid emhttp: _shcmd: shcmd (23): exit status: 32 (Other emhttp)
Oct  1 12:18:29 unraid emhttp: disk1 mount error: 32 (Errors)
Oct  1 12:18:29 unraid emhttp: shcmd (24): rmdir /mnt/disk1 (Other emhttp)
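When a syslog is long, a small filter helps spot which md device has filesystem errors and which array disk failed to mount (here md1 and disk1, matching the "mount: /dev/md1" line). This is a minimal Python sketch with hypothetical names; the regexes are written against the exact line formats quoted above and may need adjusting for other kernel or emhttp versions. It only prints a suggestion and runs nothing.

```python
import re
from collections import defaultdict

# "REISERFS error (device md1): vs-5150 ..." -> ("md1", "vs-5150")
REISERFS_ERR = re.compile(r"REISERFS error \(device (md\d+)\): (\S+)")
# "emhttp: disk1 mount error: 32" -> ("disk1", "32")
MOUNT_ERR = re.compile(r"emhttp: (disk\d+) mount error: (\d+)")


def triage(lines):
    """Group reiserfs error codes by md device and collect emhttp mount errors."""
    fs_errors = defaultdict(list)
    mount_errors = {}
    for line in lines:
        m = REISERFS_ERR.search(line)
        if m:
            fs_errors[m.group(1)].append(m.group(2))
            continue
        m = MOUNT_ERR.search(line)
        if m:
            mount_errors[m.group(1)] = int(m.group(2))
    return dict(fs_errors), mount_errors


LOG = [
    "Oct  1 12:18:29 unraid kernel: REISERFS warning: reiserfs-5082 is_leaf: free space seems wrong: level=1, nr_items=4, free_space=48 rdkey  (Minor Issues)",
    "Oct  1 12:18:29 unraid kernel: REISERFS error (device md1): vs-5150 search_by_key: invalid format found in block 8211. Fsck? (Errors)",
    "Oct  1 12:18:29 unraid kernel: REISERFS error (device md1): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [1 2 0x0 SD] (Errors)",
    "Oct  1 12:18:29 unraid emhttp: disk1 mount error: 32 (Errors)",
]

fs_errors, mount_errors = triage(LOG)
print(fs_errors)     # {'md1': ['vs-5150', 'vs-13070']}
print(mount_errors)  # {'disk1': 32}
for dev in fs_errors:
    # Suggestion only; nothing is executed.
    print(f"check read-only first: reiserfsck --check /dev/{dev}")
```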

 

Running "reiserfsck --check /dev/md1" told me to run it again with "--rebuild-tree". I'm about halfway through that (estimated 10 hours to finish). When it is done, how do I put the drive back in the array (assuming it's still usable)? Do I just make it trusted? The "make trusted" FAQ page says to do that only if no writes have been made, and I'm not sure what "--rebuild-tree" does with respect to this.

 

Thanks!

 

EDIT: The syslog from after the reboot is attached. The previous syslog was full of these lines, repeated every 10 seconds:

Oct  1 07:19:36 unraid emhttp: mdcmd: write: Input/output error
Oct  1 07:19:36 unraid emhttp: mdcmd: write: Input/output error
Oct  1 07:19:36 unraid kernel: mdcmd (116544): spindown 0
Oct  1 07:19:36 unraid kernel: md: disk0: ATA_OP_STANDBYNOW1 ioctl error: -5
Oct  1 07:19:36 unraid kernel: mdcmd (116545): spindown 1
Oct  1 07:19:36 unraid kernel: md: disk1: ATA_OP_STANDBYNOW1 ioctl error: -5

syslog-2011-10-01_1.txt


If you ran reiserfsck on /dev/md1, then parity has been updated with the fixes reiserfsck has made.
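Why parity stays in step can be shown with a toy model of single XOR parity. This is a simplification: the 2-byte "disks" and the `md_write`/`raw_write` helpers are hypothetical and are not unRAID's md driver code. A write through the md device updates parity by read-modify-write, P' = P xor D_old xor D_new, so each fix reiserfsck writes via /dev/md1 is folded into parity; in the model, a write that bypasses the md layer leaves parity stale.

```python
from functools import reduce
from operator import xor


def parity(disks):
    """Single XOR parity: byte i of parity is the XOR of byte i on every data disk."""
    return bytes(reduce(xor, column) for column in zip(*disks))


def md_write(disks, p, idx, new_block):
    """Toy md-layer write: update the data disk and parity together
    using read-modify-write, P' = P xor D_old xor D_new."""
    new_p = bytes(pb ^ ob ^ nb for pb, ob, nb in zip(p, disks[idx], new_block))
    updated = list(disks)
    updated[idx] = new_block
    return updated, new_p


def raw_write(disks, idx, new_block):
    """Toy write straight to the member disk, bypassing the md layer: parity untouched."""
    updated = list(disks)
    updated[idx] = new_block
    return updated


disks = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb"]  # hypothetical data disks
p = parity(disks)

via_md, p_md = md_write(disks, p, 0, b"\xff\x00")  # e.g. a fix written to md1
print(p_md == parity(via_md))  # True: parity still matches the data

bypass = raw_write(disks, 0, b"\xff\x00")
print(p == parity(bypass))     # False: parity is now stale
```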

 

If you had a "red" indicator, then a write to the drive failed. The drive should be reconstructed, since its contents are guaranteed to be incorrect (remember, the write failed).
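A toy model of single XOR parity shows why rebuilding beats keeping the disk as-is. It assumes, as the reply implies, that parity already reflects the intended write even though the data-disk write failed; the helper names and byte values are hypothetical. Rebuilding XORs parity with every other data disk and never reads the failed disk, so it recovers the intended contents rather than the stale data still on that disk.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(disks, p, missing):
    """Rebuild disk `missing` from parity plus every other data disk;
    the missing disk's own contents are never read."""
    others = [d for i, d in enumerate(disks) if i != missing]
    return xor_blocks([p] + others)


intended = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb"]  # hypothetical data
p = xor_blocks(intended)  # parity reflects the intended write to disk 0

on_disk = list(intended)
on_disk[0] = b"\x00\x00"  # the write to disk 0 failed: it holds stale data

rebuilt = reconstruct(on_disk, p, 0)
print(rebuilt == intended[0])     # True: rebuilt from parity
print(on_disk[0] == intended[0])  # False: the disk itself is out of date
```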

 

 

