March 28, 2010

I have been having issues for some time now with one particular slot, slot 8, in my raid. It started a few weeks ago, when I decided to replace a 500GB drive with a 2TB one. After the rebuild finished, there were about 15,000 sync errors. I redid the parity sync and got 0 errors. But later, when I started copying files to that drive, I started getting numerous errors again, hundreds of new ones every few seconds while copying.

I had a spare 2TB drive for an emergency such as this, so I replaced the drive and did a rebuild. No errors this time. All was fine for a couple of weeks. Today I opened the console and saw that the drive shows a red dot next to it. I did copy a few files yesterday, but did not check the status, so I am not sure when that happened. Anyway, I forgot to capture the syslog.

Rebooted the server: no change. Shut down, reseated the power connector, replaced the SATA cable: no effect. Finally, I shut down, unplugged the cache drive, and plugged drive 8 into a different slot, thinking maybe the slot on the card was faulty: nothing. I also tried connecting it directly to a SATA port on the board: again, no effect.

I am at a loss at this point and need help, please. I checked the syslog I just captured after a reboot and don't see any errors. But then again, I don't know what to look for besides error/fail/warn/etc.

syslog.txt
March 28, 2010

You have several issues.

1. One of your disks has a very large number of re-allocated sectors.

5 Reallocated_Sector_Ct 0x0033 159 159 140 Pre-fail Always - 328

Its current normalized value is 159. When it gets to 140, the SMART logic will consider the pool of reserve blocks all used and the drive will report as FAILING_NOW. I personally would not wait that long and would plan on an RMA now. I'd also copy any files off of it that you think are critical.

Parity sync errors are NOT reported on a specific drive. They occur because, when reading ALL your drives, the calculated parity at a given bit position was not an even number. The errors could be because of anything... a disk, a drive cable, bad memory, bad or noisy power, anything. They should never repeat, since when a parity error is detected the parity drive is updated. If subsequent errors occur, then typically the data being read is inconsistent, again because of a disk, a drive cable, bad memory, bad or noisy power, anything.

A disk is made "red" when it cannot be written to. It is taken off-line and will not go back online (even if it was a loose cable you re-seated) until you un-assign it, reboot, then re-assign it. That reboot with it un-assigned will cause unRAID to forget its serial number and allow it to think it is a replacement drive.

We must assume that the drive with the "red" ball failed when it was written to at some point. (The failure could have been a loose power or data cable, a drive whose firmware was re-calibrating itself and timed out, a faulty drive, your guess is as good as mine.) Moving and re-assigning the drives as you have has only muddied the situation.

Errors reported on a specific drive are "read" errors. Those occur when the drive reports it cannot read a given sector. (That same inability to read a sector usually marks it for re-allocation by the drive itself.)

There is a procedure in the wiki on how to re-enable a drive with a red ball:
http://lime-technology.com/wiki/index.php?title=FAQ#Hard_Drives
It also describes what the red ball indicates.

Your description of the errors makes it difficult to learn what you are really experiencing. A current syslog would be helpful.

Joe L.
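To make the SMART reading in the post above concrete, here is a minimal Python sketch that splits the quoted attribute line into its columns and compares the normalized value against the threshold. It assumes the classic ten-column ATA attribute layout printed by smartctl; the function names `parse_smart_attribute` and `headroom` are made up for illustration and are not part of any tool.

```python
def parse_smart_attribute(line):
    """Split one SMART attribute row into named columns.

    Assumes the ten-column layout: ID, name, flag, value, worst,
    threshold, type, updated, when_failed, raw value.
    """
    fields = line.split()
    return {
        "id": int(fields[0]),
        "name": fields[1],
        "flag": fields[2],
        "value": int(fields[3]),    # current normalized value
        "worst": int(fields[4]),    # lowest normalized value seen so far
        "thresh": int(fields[5]),   # failure threshold
        "type": fields[6],
        "updated": fields[7],
        "when_failed": fields[8],
        "raw": " ".join(fields[9:]),  # raw value can contain spaces on some drives
    }


def headroom(attr):
    """How far the normalized value still is above the failure threshold."""
    return attr["value"] - attr["thresh"]


def is_failing(attr):
    """The attribute counts as failed once the value is at or below the threshold."""
    return attr["value"] <= attr["thresh"]


line = "5 Reallocated_Sector_Ct 0x0033 159 159 140 Pre-fail Always - 328"
attr = parse_smart_attribute(line)
print(attr["name"], "raw:", attr["raw"],
      "headroom:", headroom(attr), "failing:", is_failing(attr))
```

For the line quoted in this thread, the raw count is 328 re-allocated sectors and the normalized value of 159 sits 19 points above the threshold of 140, which is why the drive is not yet flagged as failing even though replacing it is advised.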
March 28, 2010

Thanks Joe. The disk with 328 reallocated sectors is the parity disk, so I guess I don't need to back up anything off of it.
True, nothing to back up.

The disk with the red ball is the data disk, which according to the SMART test is perfectly healthy.
Yes, but remember... a "write" to it failed. That is why it is marked with a red ball. You don't know why the write failed, but you can be certain the data on that disk is NOT as you expect. (Otherwise the write would have succeeded.)

So, am I correct that what I need to do is:

1. Back up critical data from the data drive with the red ball.
You can try, but remember the contents will be re-constructed from parity in combination with the other disks. If the parity drive keeps finding more sectors it cannot read, the contents cannot be re-constructed correctly. You'll just need to see.

2. Re-enable the drive (use the disk reconstruct procedure, not the restore procedure, aka Trust my Array).
Un-assign the drive, reboot, re-assign it. Then use "Start" to begin the reconstruction. DO NOT USE the button labeled "restore", as that immediately invalidates parity and prevents any re-construction of the failed data drive. You are correct, do not use the trust my parity process.

3. Replace the parity drive with a new one.

Attached is my current syslog.
I'll take a look.
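The two points made in this thread, that a parity mismatch cannot be pinned on one drive and that a rebuild is only as good as the parity it reads, can be illustrated with a toy model. The Python sketch below assumes plain single-parity XOR (even parity per bit position) over tiny in-memory byte strings standing in for disks. It is not unRAID's actual code, and the names `xor_blocks`, `parity_errors` and `rebuild` are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_errors(data_blocks, parity):
    """Return byte offsets where stored parity disagrees with computed parity.

    The result gives a position only; nothing in it says which disk
    (or cable, or memory) produced the inconsistent data.
    """
    expected = xor_blocks(data_blocks)
    return [i for i, (e, p) in enumerate(zip(expected, parity)) if e != p]


def rebuild(surviving_blocks, parity):
    """Reconstruct the one missing data block from the survivors plus parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


# Three toy "data disks" of three bytes each.
disk1 = b"\x0f\xa0\x33"
disk2 = b"\xf0\x0a\x55"
disk3 = b"\x3c\xc3\x99"
parity = xor_blocks([disk1, disk2, disk3])

# With consistent data the check is clean and disk2 can be rebuilt exactly.
clean = parity_errors([disk1, disk2, disk3], parity)
good_rebuild = rebuild([disk1, disk3], parity)

# Flip one bit in the stored parity, as an unreadable or bad sector might.
bad_parity = bytes([parity[0], parity[1] ^ 0x01, parity[2]])
mismatches = parity_errors([disk1, disk2, disk3], bad_parity)
bad_rebuild = rebuild([disk1, disk3], bad_parity)

print("clean check:", clean)
print("rebuild matches disk2:", good_rebuild == disk2)
print("mismatch offsets with bad parity:", mismatches)
print("rebuild from bad parity matches disk2:", bad_rebuild == disk2)
```

In this model a damaged parity byte shows up as a mismatch at one offset, and the same damage carries straight into the rebuilt block at that offset, which is the risk described above when the parity drive itself has sectors it cannot read.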