Brucey7 Posted August 11, 2015

I had a situation where my system crashed, and on reboot it destroyed my parity protection. After the reboot, one data disk was producing lots of UREs (unrecoverable read errors), and the system immediately wrote garbage to the parity disk. The cause turned out to be a faulty cable. I suggest that the option the scheduler offers for regular parity checks, to run without writing parity changes, also be applied to the check that runs after a reboot. I now always run a parity check without writing parity first, and if errors are found I consider my options and make an informed choice.
garycase Posted August 11, 2015

I agree it would be nice if the automatic parity check after an unclean shutdown required user interaction, i.e. do NOT auto-start the array in this case. When you go to the GUI, you would get a message window noting that you've recovered from an unclean shutdown and that a parity check should be run, with a prompt along the lines of "Run Parity Check Now?". This would let you NOT run that check if you didn't want to. The parity status could change to something like "Unknown" until you've actually run a check. If you skipped the check and later rebooted, the array would revert to its normal "auto start" setting, but the parity status would still be "Unknown" until you actually ran a check.
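The proposed behaviour can be sketched as a small state model. This is only an illustration of the suggestion in the post above, not how the product actually works; `ArrayState`, `ParityStatus`, `on_boot` and `on_parity_check_completed` are hypothetical names invented for the sketch.

```python
from dataclasses import dataclass
from enum import Enum


class ParityStatus(Enum):
    VALID = "Valid"
    UNKNOWN = "Unknown"


@dataclass
class ArrayState:
    auto_start: bool  # the user's normal auto-start setting
    parity_status: ParityStatus = ParityStatus.VALID


def on_boot(state: ArrayState, unclean_shutdown: bool) -> str:
    """Return the action taken at boot under the proposed scheme (hypothetical)."""
    if unclean_shutdown:
        # Do not auto-start; mark parity as unverified and ask the user
        # "Run Parity Check Now?" instead of checking automatically.
        state.parity_status = ParityStatus.UNKNOWN
        return "prompt"
    # Clean boot: fall back to the normal auto-start setting.
    # The parity status is left untouched, so it can still be UNKNOWN.
    return "start" if state.auto_start else "wait"


def on_parity_check_completed(state: ArrayState) -> None:
    """Only an actual parity check clears the Unknown status."""
    state.parity_status = ParityStatus.VALID
```

The point of the sketch is that the "Unknown" status survives later clean reboots and is cleared only by a completed check, while the auto-start setting itself is never changed.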
Brucey7 (Author) Posted August 11, 2015

Correct me if I'm wrong, but I believe that with array Auto Start set to OFF, an unclean shutdown does not trigger an immediate parity check at power-on. However, when you then start the array, there is no way to avoid the parity check that writes parity corrections to your parity drive and risks destroying valid parity. Another answer would be to NOT write parity for any bit of data where a URE was detected. If you can't read that bit of data, it isn't valid to calculate and write parity for it, when the chances are that the corresponding bit on the parity drive is ALREADY correct. I accept that the current method is acceptable for a single URE or a small number of them, but it's 100% the wrong solution for thousands of UREs: you lose any chance of ever recovering the disk with the UREs.
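A minimal sketch of that idea, assuming single XOR parity and modelling each disk as a list of integer blocks in which `None` stands for a block that returned a URE. The functions `check_parity` and `reconstruct_block` are hypothetical illustrations, not the actual parity-check code.

```python
from functools import reduce
from typing import List, Optional, Tuple

Disk = List[Optional[int]]  # None marks a block that could not be read (URE)


def check_parity(
    data_disks: List[Disk], parity: List[int], write_corrections: bool
) -> Tuple[List[int], List[int], List[int]]:
    """Return (resulting parity, mismatched block indexes, skipped block indexes)."""
    new_parity = list(parity)
    mismatches: List[int] = []
    skipped: List[int] = []
    for i in range(len(parity)):
        blocks = [disk[i] for disk in data_disks]
        if any(b is None for b in blocks):
            # A data block is unreadable: parity cannot be recomputed,
            # so leave the existing parity block alone.
            skipped.append(i)
            continue
        expected = reduce(lambda a, b: a ^ b, blocks, 0)
        if expected != parity[i]:
            mismatches.append(i)
            if write_corrections:
                new_parity[i] = expected
    return new_parity, mismatches, skipped


def reconstruct_block(data_disks: List[Disk], parity: List[int], i: int) -> int:
    """Rebuild the single unreadable block at index i from parity and the other disks."""
    readable = [disk[i] for disk in data_disks if disk[i] is not None]
    return reduce(lambda a, b: a ^ b, readable, parity[i])
```

Calling `check_parity` with `write_corrections=False` corresponds to the check-first approach: it reports mismatches and skipped blocks but returns the parity unchanged. Because blocks with a URE are never overwritten, `reconstruct_block` can still recover the unreadable data from the preserved parity.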
Archived
This topic is now archived and is closed to further replies.