October 26, 2013
Hoping some knowledgeable folk can help sort out the situation I'm in.

I'm running 5.0 final, and about 2 weeks ago I noticed both my parity drive and one of my data drives (disk4) showing errors. Nothing was red-balled, just errors showing on the webgui. I purchased two 4TB drives to replace the drives showing errors (both existing drives are 2TB).

The first new drive finished pre-clearing yesterday (2 cycles, 0 sectors pending re-allocation each cycle), and after some searching around on the forums I read that I should do a non-correcting parity check before upgrading the parity drive to 4TB. That parity check returned no errors (and I have yet to receive an error during any of my monthly parity checks). I've attached that syslog below.

The next step was to remove the old parity drive and replace it with the new one (stop array, remove drive, etc.). Once the array was back up, I assigned the new drive to the parity slot and it started to rebuild parity. I've now received 475 corrected errors during the parity-sync, and the webgui shows 475 errors on disk4.

I still have the old parity drive and have not touched it since the last parity check with 0 errors. How do I proceed? Can I even trust the old parity drive, given that it was itself showing errors on the webgui prior to the upgrade?

I can't grab a SMART report for the old parity drive right now (I'm currently pre-clearing the other new 4TB drive), but I've attached the SMART report for disk4, as well as the most current syslog. Thanks in advance.

Disk4 SMART -> http://tny.cz/c6cd4d20
Syslog prior to parity upgrade -> http://tny.cz/50b506d9
Syslog after parity upgrade/sync -> http://tny.cz/779f7495
October 26, 2013
If your parity check showed zero sync errors, and you knew Disk #4 had problems, you should have replaced that disk BEFORE upgrading the parity drive. Obviously this would have meant you had to buy another drive no larger than your old parity disk ... although if you were confident the disk was actually okay (loose cables or not well seated in a cage), you could have unassigned it; Started the array so it showed a missing disk; Stopped the array and reassigned it; and then Started the array and let it rebuild the disk.

You COULD still do this IF you're (a) CERTAIN you haven't written any data to the array since you removed the old parity disk; and (b) are confident the old parity disk has good parity. To do that, just put the old parity disk back; do a New Config -- assigning everything as it is now, but with the old parity disk -- and check the "Trust Parity" box. Then you can do the Stop; unassign disk; Start; Stop; reassign disk; and Start process to rebuild Disk #4. But DO NOT do that if you have any doubts about the integrity of the parity disk or whether or not you may have had writes to the array.

As for what's happening now -- it's not clear what you're saying. You indicated you're getting errors on Disk4, but also say you've seen "... 475 corrected errors during the parity sync". There's NO parity available during a parity sync ... that's what it's building -- so there's nothing to correct. The log indicates the sync completed successfully, so I'm not sure what you're saying here.

IF, however, there were indeed a lot of errors when reading disk4, that would result in BAD parity ... as the parity bit will be computed based on the bad data. Doing a rebuild with the original configuration (and original parity disk) would seem the best thing to do now IF you are absolutely POSITIVE you never wrote to the array, and the old parity disk is good.
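To make the "bad parity" point concrete, here is a minimal sketch in Python, assuming single-disk parity is a plain byte-wise XOR across the data disks. The disk contents, the variable names, and the one-bit misread are all made up for illustration; this is not unRAID's actual code. It shows that parity built from a misread disk4 is self-consistent, so a later rebuild faithfully reproduces the misread data rather than the original.

```python
from functools import reduce


def xor_parity(blocks):
    """Byte-wise XOR of equally sized blocks (one block per disk)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(parity, surviving):
    """Reconstruct the one missing block from parity plus the other blocks."""
    return xor_parity([parity, *surviving])


# Hypothetical 4-byte "disks"; real disks are just much longer byte strings.
disk1 = b"\x10\x20\x30\x40"
disk2 = b"\x0a\x0b\x0c\x0d"
disk4_true = b"\xde\xad\xbe\xef"      # what disk4 is supposed to contain
disk4_misread = b"\xde\xad\xbe\xee"   # what a failing disk4 returned (1 bit off)

# Parity computed while disk4 read correctly vs. while it misread.
good_parity = xor_parity([disk1, disk2, disk4_true])
bad_parity = xor_parity([disk1, disk2, disk4_misread])

# Rebuilding disk4 from each parity: the result matches whatever was read
# at sync time, not necessarily what was originally written.
rebuilt_from_good = rebuild(good_parity, [disk1, disk2])
rebuilt_from_bad = rebuild(bad_parity, [disk1, disk2])

print(rebuilt_from_good == disk4_true)     # parity from clean reads recovers the data
print(rebuilt_from_bad == disk4_misread)   # parity from bad reads recovers the bad data
```

This is why the old parity disk is only useful if nothing has been written to the array since it was removed: its parity describes the array as it was at that moment.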
October 26, 2013 (Author)
Hi garycase, thanks for your reply!

> If your parity check showed zero sync errors, and you knew Disk #4 had problems, you should have replaced that disk BEFORE upgrading the parity drive. Obviously this would have meant you had to buy another drive no larger than your old parity disk ... although if you were confident the disk was actually okay (loose cables or not well seated in a cage), you could have unassigned it; Started the array so it showed a missing disk; Stopped the array and reassigned it; and then Started the array and let it rebuild the disk.

I knew disk4 was bad, not just a loose cable. I also knew I was getting errors on my old parity drive at the same time. So in that situation, which one should I have replaced first?

> You COULD still do this IF you're (a) CERTAIN you haven't written any data to the array since you removed the old parity disk; and (b) are confident the old parity disk has good parity. To do that, just replace the old parity disk; do a New Config -- assigning everything as it is now with the old parity disk -- and check the "Trust Parity" box. Then you can do the Stop; unassign disk; Start; Stop; reassign disk; and Start process to rebuild Disk #4. But DO NOT do that if you have any doubts about the integrity of the parity disk or whether or not you may have had writes to the array.

Not an option any longer, as I know the array has been written to since the parity check that showed 0 errors.

> As for what's happening now -- it's not clear what you're saying. You indicated you're getting errors on Disk4, but also say you've seen "... 475 corrected errors during the parity sync". There's NO parity available during a parity sync ... that's what it's building -- so there's nothing to correct. The log indicates the sync was completed successfully, so I'm not sure what you're saying here. IF, however, there were indeed a lot of errors when reading disk4, that would result in BAD parity ... as the parity bit will be computed based on the bad data. Doing a rebuild with the original configuration (and original parity disk) would seem the best thing to do now IF you are absolutely POSITIVE you never wrote to the array, and the old parity disk is good.

Sorry if I didn't explain it well enough. Looking in Unmenu during the parity rebuild, it was showing 475 corrected parity-sync errors. After the rebuild, this is what it looks like:

The unRAID webgui is where it shows the 475 errors on disk4; but it also says that parity is valid.
October 26, 2013
Okay, so at this point you have a "good" parity drive -- but there's a question as to what data was incorporated in those parity calculations (i.e. it may be incorrect data on disk4). And you also have a questionable disk4 ... but it's not "red balled", so the system hasn't encountered any write errors to it.

So you basically have three choices ...

(1) Assume that disk4's data is okay, but since you know the drive has issues, go ahead and replace it [use the Stop; unassign; Start; Stop; Assign new drive; Start process to force a rebuild onto a new disk]. This will rebuild the data onto a new disk4 ... but if the data was bad, it will still be bad.

or

(2) Replace disk4 and do a "New Config" with the new disk4 (much faster, but you'll be running "at risk" while a new parity sync is done). Then copy the data that was on disk4 from your backups to the array. [Alternatively, you could attach the old disk4 to another system (e.g. a Windows PC with the free LinuxReader installed) and copy the data from it to the array => but, as with #1, this means you may be copying corrupted data.]

or

(3) Do nothing. The system "thinks" everything is okay. The SMART data for Disk4 is okay, and the system hasn't red-balled it, so it hasn't detected any write errors. It's quite possible all of the read errors were corrected by re-writing the sectors from reconstructed data (that's what fault-tolerance is all about).

I'd tend towards #3 as long as you have good backups. In fact, what I'd do at this point is run a comparison between the data on disk4 and my backups.
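One way to run that comparison is to hash every file on both sides and list the mismatches. The following is a rough Python sketch, not a tested tool: the function names are invented for this example, and it assumes disk4 and the backup are both reachable as directory trees with the same relative layout.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large media files fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_trees(array_root, backup_root):
    """Report files that differ or exist on only one side."""
    a = tree_digests(array_root)
    b = tree_digests(backup_root)
    return {
        "missing_in_backup": sorted(a.keys() - b.keys()),
        "missing_on_array": sorted(b.keys() - a.keys()),
        "different": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }


# Example call; both paths are assumptions to adjust for your own setup:
# report = compare_trees("/mnt/disk4", "/mnt/backup/disk4")
# for category, names in report.items():
#     print(category, len(names), names[:10])
```

Anything listed under "different" is a file whose contents no longer match the backup; whether the array copy or the backup copy is the good one still has to be judged per file (for example, by opening it).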
October 27, 2013 (Author)
OK, I've compared against my backups and all seems well to my eyes. I will be replacing the drive, so I'll follow your step #1 and report back. It should be another few days, as the new disk is almost halfway through its pre-clear.

Thanks, gary
October 27, 2013
Sounds good. Sounds like you'll have a good array in another day or two. You could then do a pre-clear on the disk you just replaced, just to confirm whether it's actually bad or not ... although given its history, I'd be more inclined to relegate it to off-line backup purposes than to put it back in service in the array.