egnever Posted August 20, 2012

My parity drive was marked bad a few weeks back. I replaced it and told UnRAID to rebuild parity onto the new drive. The parity build ran for a few hours, and I noticed that there were hundreds of thousands of errors on one of the drives, with the number climbing quickly. I restarted the array, and when it came back up both the parity drive and the drive that had the errors were marked as bad.

I pulled both drives and ran the manufacturer's disk utilities on them, and both disks checked out fine. I loaded the data drive in YAReG and could see that all of the data on the drive was still accessible. To be sure, I copied several files from the disk and opened them in their respective apps. Everything worked fine.

I put the drives back into the array. They were still marked bad. On the advice of a friend I took the flash drive and renamed super.dat to super.old. Then I booted the array, and the parity drive and all data drives were available.

I started the parity build again and monitored its progress. After a time I saw the same behavior as before. This time it was a different data drive, but it started producing a large number of errors. I ran through the process above again and got everything back to stable.

I decided to try it once more. This time I copied the syslog off, so I could provide you with a fresh syslog along with screenshots when it happened again.

Based on your suggestions in the other posts in this forum, I verified that all drive settings in the BIOS that can be set to AHCI have been set to AHCI. I don't believe I have IDE emulation set.

I believe the following post is relevant, but I don't see a complete solution: http://lime-technology.com/forum/index.php?topic=21169.0

All told, I've been through the same process three or four times with the exact same results. The only thing that seems to change is which disk gets the errors and is eventually marked bad.

I've posted a complete syslog file for the event.

UnRAID version: 4.7

I'm using the following hardware:
Mobo: ASUS P5B-VM DO
Processor: Intel Celeron 430 Conroe-L 1.8GHz LGA 775 35W Single-Core Processor
Memory: 2GB
SATA Cards: Promise SATA 300 TX4

Attachment: syslog-2012-07-16.zip
dgaschk Posted August 20, 2012

What PSU? Disk 9 has read errors.
egnever Posted August 20, 2012

Silencer 610 EPS12V
610W Continuous @ 40C (670W Peak)
80+ Certified (83%); .99 Active PFC
+12VDC @ 49A (Large Single Rail)
24-pin, 8-pin*, 4-pin M/B Connectors
2 PCI-E and 15 Drive Connectors
dgaschk Posted August 20, 2012

Post a SMART report for disk 9.
hypyke Posted August 20, 2012

dgaschk wrote: "Post a SMART report for disk 9."

NOTE: I've been helping egnever with this issue. The problem is that this happens on a different disk each time. The disk is pulled, replaced, and tested, and all is well. He's going to pull a SMART report anyway, but I thought I should mention that.

The PSU COULD be the cause. I have a power supply tester, but if the fault is intermittent I am not sure how to confirm it. He's already replaced the mobo, processor, and memory.
egnever Posted August 21, 2012

Here's the SMART report for disk 9. Hypyke was right: this has happened on a different disk each time. It was disk 7 once, and disk 10 another time.

Attachment: disk9.txt
dgaschk Posted August 21, 2012

Disk 9 is dying. It has reallocated 1983 sectors, with 64 more sectors pending reallocation upon write. This disk will continue to give read errors and must be replaced.

    5 Reallocated_Sector_Ct   0x0033 100 100 036 Pre-fail Always - 1983
  197 Current_Pending_Sector  0x0012 100 100 000 Old_age  Always - 64

Post SMART reports for all of the other drives.
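The two attribute rows quoted above follow the usual smartctl attribute-table layout (ID, name, flag, value, worst, threshold, type, updated, when-failed, raw value). As a rough sketch of how such rows can be read programmatically, the following parses a row and reports the two conditions discussed in this thread. The names `SmartAttr`, `parse_attr`, and `assess` are hypothetical, and the "any non-zero count" rule is taken from the advice in this thread, not from any vendor specification.

```python
from typing import List, NamedTuple, Optional


class SmartAttr(NamedTuple):
    attr_id: int
    name: str
    value: int
    worst: int
    thresh: int
    raw: int


def parse_attr(line: str) -> Optional[SmartAttr]:
    """Parse one attribute row; return None if the line is not a row."""
    fields = line.split()
    # A row has at least 10 columns and starts with a numeric attribute ID.
    if len(fields) < 10 or not fields[0].isdigit():
        return None
    try:
        return SmartAttr(
            attr_id=int(fields[0]),
            name=fields[1],
            value=int(fields[3]),
            worst=int(fields[4]),
            thresh=int(fields[5]),
            raw=int(fields[9]),  # first token of the raw-value column
        )
    except ValueError:
        return None


def assess(attrs: List[SmartAttr]) -> List[str]:
    """Return human-readable notes; an assumption-based rule of thumb."""
    notes = []
    for a in attrs:
        if a.attr_id == 197 and a.raw > 0:
            notes.append(f"{a.raw} pending sectors: expect read errors until rewritten")
        elif a.attr_id == 5 and a.raw > 0:
            notes.append(f"{a.raw} reallocated sectors: watch for growth")
    return notes


if __name__ == "__main__":
    rows = [
        "5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 1983",
        "197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 64",
    ]
    parsed = [a for a in (parse_attr(r) for r in rows) if a is not None]
    for note in assess(parsed):
        print(note)
```

The sketch only reads the raw counts; it does not try to judge how many reallocated sectors are too many, since that depends on whether the count is still increasing.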
egnever Posted August 22, 2012

Here are the SMART reports for all of my drives.

Attachment: smart-reports-all-disks.zip
dgaschk Posted August 22, 2012

The following drives need to be monitored more closely than the others:

disk10.txt:  5 Reallocated_Sector_Ct   0x0033 100 100 036 Pre-fail Always - 5
disk10.txt:197 Current_Pending_Sector  0x0012 100 100 000 Old_age  Always - 0
disk2.txt:   5 Reallocated_Sector_Ct   0x0033 100 100 036 Pre-fail Always - 1
disk2.txt: 197 Current_Pending_Sector  0x0012 100 100 000 Old_age  Always - 0
disk3.txt:   5 Reallocated_Sector_Ct   0x0033 100 100 036 Pre-fail Always - 10
disk3.txt: 197 Current_Pending_Sector  0x0012 100 100 000 Old_age  Always - 0
disk5.txt:   5 Reallocated_Sector_Ct   0x0033 200 200 140 Pre-fail Always - 0
disk5.txt: 197 Current_Pending_Sector  0x0032 200 199 000 Old_age  Always - 8
disk6.txt:   5 Reallocated_Sector_Ct   0x0033 100 100 036 Pre-fail Always - 1
disk6.txt: 197 Current_Pending_Sector  0x0012 100 100 000 Old_age  Always - 0
disk7.txt:   5 Reallocated_Sector_Ct   0x0033 100 100 036 Pre-fail Always - 191
disk7.txt: 197 Current_Pending_Sector  0x0012 100 100 000 Old_age  Always - 0
disk9.txt:   5 Reallocated_Sector_Ct   0x0033 100 100 036 Pre-fail Always - 1983
disk9.txt: 197 Current_Pending_Sector  0x0012 100 100 000 Old_age  Always - 64

Disks with a non-zero Reallocated_Sector_Ct need to be watched to see whether Current_Pending_Sector or Reallocated_Sector_Ct increases. A few reallocated sectors are OK. As long as the values remain stable, the disk is fine.

Disks with a non-zero Current_Pending_Sector will give read errors until the pending sectors are rewritten. The easiest way to rewrite those sectors is to rewrite every sector by doing a disk rebuild.

There are two disks with currently pending sectors (disk 5 and disk 9), and they will give read errors. Two disks giving read errors will prevent the reconstruction of either. I hope you have backups of the contents of disk 5 and disk 9.

You can try to copy the files off of these disks, but I don't think it will be easy. You may need to put the disks in a Windows machine and use something like this: http://www.recoverdatasoftware.com/recover-reiserfs-files.html

UnRAID cannot help recover from this, because all remaining disks need to be healthy in order to recover a single failed drive.
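The point about needing every remaining disk to be healthy can be illustrated with a small sketch. It assumes a single-parity scheme in which the parity block is the XOR of the corresponding blocks on all data disks; the one-byte "blocks" and the names `parity_of` and `rebuild` are hypothetical and chosen only for illustration.

```python
from functools import reduce
from operator import xor
from typing import List, Optional


def parity_of(blocks: List[int]) -> int:
    """XOR of all data blocks at the same offset."""
    return reduce(xor, blocks, 0)


def rebuild(blocks: List[Optional[int]], parity: int) -> int:
    """Recover the single missing block (marked None) from parity.

    Raises ValueError unless exactly one block is missing, because one
    parity value gives one equation and can solve for one unknown only.
    """
    missing = [i for i, b in enumerate(blocks) if b is None]
    if len(missing) != 1:
        raise ValueError(f"need exactly one unreadable block, got {len(missing)}")
    known = reduce(xor, (b for b in blocks if b is not None), 0)
    return parity ^ known


if __name__ == "__main__":
    data = [0x3A, 0xC5, 0x0F]  # illustrative one-byte blocks on three disks
    p = parity_of(data)

    # One unreadable block: recoverable from parity plus the other disks.
    print(hex(rebuild([0x3A, None, 0x0F], p)))

    # Two unreadable blocks at the same offset: not recoverable.
    try:
        rebuild([None, None, 0x0F], p)
    except ValueError as err:
        print("cannot rebuild:", err)
```

Under this assumption, a read error on a second disk at the same offset leaves two unknowns and only one parity equation, which is why two disks returning read errors block the reconstruction of either.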
egnever Posted August 27, 2012

Thanks for your help.