
Drive errors while trying to build parity. Disk marked bad, but data is intact



My parity drive was marked bad a few weeks back.  I replaced it and told UnRAID to rebuild parity onto the new drive.  The parity build ran for a few hours, and then I noticed that there were hundreds of thousands of errors on one of the data drives, and the number was climbing quickly.

 

I restarted the array, and when it came back up both the parity drive and the drive that had the errors were marked as bad.  I pulled both drives and ran the manufacturer's disk utilities on them, and both disks checked out fine.  I loaded the data drive in YAReG and could see that all of the data on the drive was still accessible.  To be sure, I copied several files from the disk and opened them in their respective apps.  Everything worked fine.

 

I put the drives back into the array.  They were still marked bad in the array.  On the advice of a friend I took the flash drive, and renamed super.dat to super.old.  Then I booted the array, and both drives were available.  The parity drive, and all data drives were available. 

 

I started the parity build again and monitored its progress.  After a time I saw the same behavior as before.  This time it was a different data drive, but it started producing a large number of errors.  I ran through the process above again and got everything back to stable.  I decided to try it once more.  This time I copied the syslog off, so I could provide you with a fresh syslog along with screenshots when it happened again. 

 

Based on your suggestions in the other posts in this forum I verified that all drive settings in the BIOS that can be set to AHCI have been set to AHCI.  I don't believe I have IDE Emulation set.  I believe the following post is relevant, but I don't see a complete solution. http://lime-technology.com/forum/index.php?topic=21169.0

 

All told, I've been through the same process three or four times with the exact same results.  The only thing that seems to change is which disk gets the errors and is eventually marked bad. 

 

I've posted a complete syslog file for the event.

 

UnRAID version 4.7

 

I'm using the following hardware:

Mobo: ASUS P5B-VM DO

Processor: Intel Celeron 430 Conroe-L 1.8GHz LGA 775 35W Single-Core Processor

Memory: 2GB

SATA Cards: Promise SATA300 TX4

syslog-2012-07-16.zip

Link to comment

Post a SMART report for disk 9.

 

NOTE: I've been helping egnever with this issue. 

 

The problem is that this happens on a different disk each time.  The disk is pulled, replaced, and tested, and all is well. He's going to pull a SMART report anyway, but I thought I should mention that.  The power supply COULD be the cause.  I have a power supply tester, but if the fault is intermittent I am not sure how to confirm it.  He's already replaced the motherboard, processor, and memory. 

Link to comment

Disk 9 is dying. It has reallocated 1983 sectors with 64 new sectors destined for reallocation upon write. This disk will continue to give read errors and must be replaced.

 

5 Reallocated_Sector_Ct  0x0033  100  100  036    Pre-fail  Always      -      1983

197 Current_Pending_Sector  0x0012  100  100  000    Old_age  Always      -      64

 

Post SMART reports for all of the other drives.

Link to comment

The following drives need to be monitored more closely than the others:

 

disk10.txt:  5 Reallocated_Sector_Ct  0x0033  100  100  036    Pre-fail  Always      -      5

disk10.txt:197 Current_Pending_Sector  0x0012  100  100  000    Old_age  Always      -      0

disk2.txt:  5 Reallocated_Sector_Ct  0x0033  100  100  036    Pre-fail  Always      -      1

disk2.txt:197 Current_Pending_Sector  0x0012  100  100  000    Old_age  Always      -      0

disk3.txt:  5 Reallocated_Sector_Ct  0x0033  100  100  036    Pre-fail  Always      -      10

disk3.txt:197 Current_Pending_Sector  0x0012  100  100  000    Old_age  Always      -      0

disk5.txt:  5 Reallocated_Sector_Ct  0x0033  200  200  140    Pre-fail  Always      -      0

disk5.txt:197 Current_Pending_Sector  0x0032  200  199  000    Old_age  Always      -      8

disk6.txt:  5 Reallocated_Sector_Ct  0x0033  100  100  036    Pre-fail  Always      -      1

disk6.txt:197 Current_Pending_Sector  0x0012  100  100  000    Old_age  Always      -      0

disk7.txt:  5 Reallocated_Sector_Ct  0x0033  100  100  036    Pre-fail  Always      -      191

disk7.txt:197 Current_Pending_Sector  0x0012  100  100  000    Old_age  Always      -      0

disk9.txt:  5 Reallocated_Sector_Ct  0x0033  100  100  036    Pre-fail  Always      -      1983

disk9.txt:197 Current_Pending_Sector  0x0012  100  100  000    Old_age  Always      -      64

 

Disks with Reallocated_Sector_Ct need to be watched to see if the Current_Pending_Sector or Reallocated_Sector_Ct increase. A few reallocated sectors are ok. Watch these numbers for increasing values. As long as the value remains stable the disk is fine.
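
If you want to automate the "watch for increasing values" check, something along these lines might work. It is only a rough sketch: it assumes you have saved two SMART reports for the same disk as text (an earlier one and a later one) and that the attribute rows look like the ones quoted in this thread, with the raw value in the tenth column. The function names are made up for illustration.

```python
# Attributes discussed in this thread; extend the tuple if you track others.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector")


def parse_smart_attributes(report_text):
    """Return {attribute_name: raw_value} for the watched attributes.

    Assumes rows shaped like the ones in this thread:
    ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    values = {}
    for line in report_text.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            values[fields[1]] = int(fields[9])
    return values


def increased_attributes(old_report, new_report):
    """Return {name: (old_raw, new_raw)} for watched attributes that grew."""
    old = parse_smart_attributes(old_report)
    new = parse_smart_attributes(new_report)
    return {
        name: (old.get(name, 0), raw)
        for name, raw in new.items()
        if raw > old.get(name, 0)
    }
```

An empty result means neither counter moved between the two reports, which is the "value remains stable" case. Anything returned is a counter that went up and deserves a closer look.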

 

Disks with Current_Pending_Sector will give read errors until the pending sectors are rewritten. The easiest way to rewrite the sectors is to rewrite all of the sectors by doing a disk rebuild.

 

There are two disks with currently pending sectors (disk5 and disk9), and they will give read errors. Two disks giving read errors will prevent the reconstruction of either. I hope you have backups of the contents of disk5 and disk9.
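
For anyone who wants to pick out the disks with pending sectors from grep-style output like the listing above, here is a small sketch. It assumes each line is prefixed with the report filename and a colon, as in this thread (for example disk5.txt:), and it only looks at Current_Pending_Sector rows. The function name is hypothetical; the sample rows are copied from the listing above.

```python
def disks_with_pending(grep_output):
    """Return disk names whose Current_Pending_Sector raw value is non-zero."""
    flagged = []
    for line in grep_output.splitlines():
        # Split "disk5.txt:197 Current_Pending_Sector ..." at the first colon.
        filename, sep, row = line.partition(":")
        if not sep:
            continue
        fields = row.split()
        if len(fields) >= 10 and fields[1] == "Current_Pending_Sector":
            if int(fields[9]) > 0:
                name = filename.strip()
                if name.endswith(".txt"):
                    name = name[:-4]
                flagged.append(name)
    return flagged


# Three of the Current_Pending_Sector rows quoted earlier in this thread.
THREAD_OUTPUT = """\
disk10.txt:197 Current_Pending_Sector  0x0012  100  100  000    Old_age  Always      -      0
disk5.txt:197 Current_Pending_Sector  0x0032  200  199  000    Old_age  Always      -      8
disk9.txt:197 Current_Pending_Sector  0x0012  100  100  000    Old_age  Always      -      64
"""

print(disks_with_pending(THREAD_OUTPUT))  # ['disk5', 'disk9']
```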

 

You can try to copy the files off of these disks. But I don't think it will be easy. You may need to put the disks in a Windows machine and use something like this: http://www.recoverdatasoftware.com/recover-reiserfs-files.html

 

UnRAID cannot help recover from this because all remaining disks need to be healthy in order to recover a single failed drive.
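
To see why a second unreadable disk blocks the rebuild, here is a toy illustration. It assumes a single parity drive that holds the bitwise XOR of all data drives, which is my understanding of how parity works in this version of UnRAID. The byte values and function names are invented for the example and have nothing to do with the actual array.

```python
from functools import reduce


def xor_parity(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """XOR the surviving data blocks with parity.

    This equals the missing block only when exactly one block is missing.
    """
    return xor_parity(list(surviving_blocks) + [parity])


# Made-up contents for three tiny "disks".
disk_a = b"\x01\x02"
disk_b = b"\x10\x20"
disk_c = b"\xff\x00"
parity = xor_parity([disk_a, disk_b, disk_c])

# One disk lost: the other two plus parity recover it exactly.
one_lost = rebuild_missing([disk_a, disk_c], parity)

# Two disks lost: the result is disk_b XOR disk_c, which is neither disk.
two_lost = rebuild_missing([disk_a], parity)
```

With one disk missing, one_lost matches disk_b. With two missing, two_lost is only the XOR of the two lost disks, so there is no way to separate them again without another source for one of them.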

Link to comment

Archived

This topic is now archived and is closed to further replies.
