ednigma Posted April 29, 2010

My original PATA unRaid server developed write errors to one of the disks a while back, but for various reasons I have not had the time to debug it. After some months, I started the server and now I had 4 drives missing. Aha! These drives are "paired": the parity drive and disk 1, and disk 8 and disk 9 (disk 9 had the original drive errors). I opened the case and realized that the Y power splitter for disk 9 was suspect. I replaced the power splitters, reseated the IDE cables for the 4 drives, and rebooted the system. Now parity, disk 1, and disk 8 are OK, but disk 9 was still marked disabled.

I copied about 80G of data to my desktop, letting unRaid reconstruct it. I physically pulled disk 9 and ran SpinRite, which found no errors, so I figure that I only had cabling errors and all the data is OK.

So my plan was to put the drive back in, use the Trust My Array procedure to initialize the array, and run a parity check -nocorrect as verification that no data is actually in error. After starting the array, a parity check started, which I wanted to stop (so I could start a no-correct one), and I mistakenly pressed Stop array instead of Cancel parity. I restarted the array and unMenu says "Parity updated 130 times to address sync errors".

So now my questions...

Since I feel that all my data was OK to begin with, where are these errors coming from? Parity was only running a very short time; I pressed Stop as soon as I could after the array started from the Trust My Array procedure.

Does this mean that my parity disk has now changed, and my only option is to forget about running a parity check -nocorrect, assume that my disk 9 is valid, and just run a normal parity check (using the restore array), letting the parity disk get updated? Disk 9 is a 250G drive, almost full, of which I could only copy about 80G to free space on my desktop.

Thanks.. Ed
Joe L. Posted April 29, 2010

ednigma wrote:
> I restarted the array and unMenu says that "Parity updated 130 times to address sync errors"

We've seen those at the very beginning of a parity check. They are usually associated with the housekeeping areas of the file-system. You'll probably be fine. The reiser file-systems have journal entries that can be re-played; the parity disk does not.

The "Trust" procedure will frequently find differences as well. It almost has to. Remember, the disk was taken out of service because a write to it failed. That same "write" that failed on the data disk was successful in updating parity.

As an example, if you had written a completely new file to the disk, and on the very first write it was discovered to be un-responsive, then none of that new file was ever written to the data disk. If you invoke the "trust" procedure you'll use the data disk to correct parity, erasing any trace that the new file ever existed (probably NOT what you would have expected). Instead, if you had re-constructed the contents of the data disk onto its replacement, you would have the file you had written.

So, your choice. If you had not written any new files when the disk failure occurred then you can use the "trust" procedure, otherwise....

If disk9 is still marked as disabled:
Stop the array
Un-assign disk9
Start the array (This will cause it to forget the model/serial number of disk9)
Stop the array
Re-assign disk9 (It will think it is a replacement disk)
Start the array, letting it rebuild the contents.

Do NOT press the button labeled as "restore" if doing this second series of steps. We do NOT want to make it forget parity.

Joe L.
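A rough illustration of the trade-off Joe L. describes, sketched in Python. It assumes a simplified single-parity model in which parity is the byte-wise XOR of the data disks; the disk contents, the two-disk layout, and the helper names `xor_parity` and `rebuild` are made up for the example and are not unRaid code or commands.

```python
from functools import reduce


def xor_parity(blocks):
    """Return the byte-wise XOR of equally sized byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(parity, surviving_blocks):
    """Reconstruct one missing block from parity plus all surviving blocks."""
    return xor_parity([parity, *surviving_blocks])


# Hypothetical contents: disk9 is the disk that stopped responding.
disk1 = b"AAAA"
disk9_on_platter = b"\x00\x00\x00\x00"  # the failed write never reached the disk
new_file = b"FILE"                      # what that write was meant to store

# The write to disk9 failed, but parity was updated as if it had succeeded.
parity = xor_parity([disk1, new_file])

# Option 1: rebuild disk9 from parity and the other disks.
rebuilt_disk9 = rebuild(parity, [disk1])

# Option 2: "trust" the physical disks and recompute parity from them.
trusted_parity = xor_parity([disk1, disk9_on_platter])
disk9_after_trust = rebuild(trusted_parity, [disk1])

# Positions where a parity check after "trust" would have to correct parity.
sync_corrections = sum(a != b for a, b in zip(parity, trusted_parity))

print(rebuilt_disk9)       # contains the new file
print(disk9_after_trust)   # contains only what was physically on disk9
print(sync_corrections)    # number of parity bytes that changed
```

In this toy model the rebuild path recovers the new file, while the "trust" path makes parity agree with whatever is physically on disk9, and every byte where the two disagree shows up as a parity correction.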
ednigma Posted April 29, 2010 Author

Joe L. wrote:
> We've seen those at the very beginning of a parity check. They are usually affiliated with the housekeeping areas of the file-system. You'll probably be fine. The reiser file-systems have journal entries that can be re-played, the parity disk does not.

So it is somewhat common to get address sync errors at the beginning of a parity check? Are you saying that these errors come from differences in the journal entries of the data drives? I failed to mention that I mounted disk9 on my XP desktop using a PATA to USB2 adapter and YAReG-1.0 to read the disk and see if the data was there at all. I've never seen these address sync errors in any parity check before.

Are you also saying that parity w.r.t. the data drives is intact? I was afraid that the reported errors resulted in the parity being updated.

Joe L. wrote:
> The "Trust" procedure will frequently find differences as well. It almost has to. [...] Do NOT press the button labeled as "restore" if doing this second series of steps. We do NOT want to make it forget parity.

I already used the Trust procedure to get to this point (which includes the Restore). Before seeing your reply, I decided to unassign disk9, start the array, and copy the rest of the data from the array to some space I freed on another desktop. Since I've now unassigned disk9 and restarted the array, I'm committed to the rebuilding procedure above.

I'm still hung up on the possibility that those address sync errors changed the parity, so that rebuilding will write incorrect data. I have a sinking feeling that trying to run a parity check -nocorrect as a sanity check was not a good idea, and that I should have just started a rebuild from the start.

Thanks.. Ed