Parity check checkpoint and restart


c3


Please consider adding a checkpoint and restart ability to parity check.

 

The use case would be to schedule parity checks for limited time periods. As drive sizes continue to increase, the time required to complete a parity check gets longer. By limiting how long a parity check can run, the impact during "prime" time is avoided. With restart, the entire check will eventually be completed, instead of just starting from the beginning over and over.

 

If a parity check requires 24 hours to complete, the likelihood of the parity check overlapping with normal usage approaches 100%. If the parity check is scheduled for just 1 hour per day, it is completed in 24 days, within the month, and overlap is very limited, if any.
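To make the request concrete, here is a minimal sketch of a time-limited check that saves its position and resumes from it on the next run. It is only an illustration of the idea, not how unRAID implements parity checks: `run_check_window`, the JSON checkpoint file, and the `check_block` callback are all hypothetical names made up for this example.

```python
import json
import time
from pathlib import Path


def run_check_window(total_blocks, checkpoint_path, budget_seconds,
                     check_block, clock=time.monotonic):
    """Check blocks until the time budget runs out, then save the position.

    Returns True when the whole pass has finished, False when it was
    interrupted and a checkpoint was written. All names are hypothetical.
    """
    path = Path(checkpoint_path)

    # Resume from the saved position if an earlier window was interrupted.
    block = 0
    if path.exists():
        block = json.loads(path.read_text())["next_block"]

    deadline = clock() + budget_seconds
    while block < total_blocks and clock() < deadline:
        check_block(block)  # placeholder for the real per-block parity test
        block += 1

    if block >= total_blocks:
        # Pass complete: drop the checkpoint so the next run starts at block 0.
        path.unlink(missing_ok=True)
        return True

    # Out of time: remember where to continue next time.
    path.write_text(json.dumps({"next_block": block}))
    return False
```

Called once a day with `budget_seconds=3600`, a check that needs 24 hours in total would finish after 24 daily windows instead of restarting from the beginning each time.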

Link to comment

I want to add this, as it fits my use case better:

 

Even better, and a solution I would already like to have these days, is to perform parity checks during idle times.

Instead of spinning down the drives, run the parity check if one is scheduled.

It doesn't matter if it takes days to complete, imo. All that matters is that each bit gets checked once in a while.

 

A scheduled check during certain times may suit those who have the server running 24/7.

Personally, I run my server only on demand, but when it's up after 17:00 it will run until 01:00 at night, no matter if it's used or not.

Most of the time it is fired up for the kids to watch a cartoon before bedtime.

If possible, we access the server again later for an episode or a movie.

But most of the time the server sits idle until it shuts down at night.

 

Having the server perform the parity check during the idle period should be doable once the "checkpoint and restart" ability is in place.

Link to comment

If you were to implement this feature, could it also include a periodic check of drive temperatures? 

 

If any drive exceeds a limit of, say, 49 degrees, pause the parity check for 30 minutes, or until all drives are below 45 degrees.
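As a rough sketch of the rule proposed above (pause above 49 degrees, resume after 30 minutes or once every drive is below 45), the decision could look like this. The function name and parameters are hypothetical, and how drive temperatures are actually read is left out entirely.

```python
def should_pause(paused, temps_c, paused_for_min,
                 pause_above=49, resume_below=45, max_pause_min=30):
    """Return True if the parity check should be paused right now.

    paused         -- whether the check is currently paused
    temps_c        -- current temperature of every drive, in degrees Celsius
    paused_for_min -- minutes spent in the current pause (ignored if running)
    """
    if not paused:
        # Running: pause as soon as any single drive exceeds the upper limit.
        return any(t > pause_above for t in temps_c)

    # Paused: resume after the time limit, or once ALL drives have cooled
    # below the lower limit, whichever comes first.
    if paused_for_min >= max_pause_min:
        return False
    if all(t < resume_below for t in temps_c):
        return False
    return True
```

The gap between the two thresholds keeps the check from flapping on and off around a single temperature. Note that resuming on the 30-minute timer while a drive is still hot would simply trigger another pause at the next evaluation.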

 

I wouldn't try to resolve hardware issues with special logic in the parity check code. If your temps are running in the 50s, YOU need to isolate why that's happening. With notifications, you would get an e-mail telling you about the high temps, and could then take steps to resolve the problem: for example, you might abort a parity check and then check for a failed fan, adjust the location or airflow of your drives and fans, or do whatever else is needed to improve the cooling. If you simply have a drive that's running very hot, that is a very good sign that it's failing -- so the solution to that is to replace the drive.

 

Link to comment

There are a variety of ways to address this. At one time Tom had indicated he was looking into automatic throttling of the checks during other activity, which would also help reduce or eliminate any impact on array performance during "prime time". A "Pause/Resume" button would be a nice adjunct to this, as it would allow running the checks only when you wanted them running. For those who don't run their array 24/7, a shutdown could automatically "Pause" any active check, and on reboot any in-progress check could automatically be resumed (as long as throttling was implemented, this wouldn't cause any performance issues).

 

I agree that with the check times on the very large drives that are now being used in a lot of arrays, this is much more of an issue than it used to be.

 

Link to comment

I would definitely love a modification to the parity check behavior to address issues with parity checks that run 1+ days. Having them throttled down or paused when data needs to be accessed would be great. Heck, mine right now shows an 11-day estimate to complete lol (that's a whole other issue). Ideally it could pause/restart at will, or based on a schedule, or automatically detect when to pause/restart.

Link to comment

This is a very interesting topic. My array is small enough that this isn't an issue for me, but I can see how it can be for some. One question: with stopping and starting a parity check, and the potential for a check to take days, how would writing data to the array during this time affect the check itself? I've always run my checks at night while sleeping, and never considered writing data to the array during a check.

Link to comment


Writing data to the array during a check simply slows down both the check and the writes while both are running at once. It doesn't cause any problems with the check itself: if the write is to an area that has already been checked, it simply updates parity, and that area will be checked again on the next parity check; if the write is to an area that has not yet been checked, parity will be updated and will then be checked when the check gets to that area.
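A toy model shows why this works, assuming single parity computed as the XOR of the data disks and a write path that updates parity together with the data. The three helper functions and the tiny in-memory "disks" are made up for illustration only.

```python
from functools import reduce
from operator import xor


def parity_of(disks, i):
    """Expected parity for block i: XOR of that block on every data disk."""
    return reduce(xor, (d[i] for d in disks))


def write(disks, parity, disk, i, value):
    """Write one block and update parity in the same step."""
    # Flip in parity exactly the bits that change in the data block.
    parity[i] ^= disks[disk][i] ^ value
    disks[disk][i] = value


def check(disks, parity, start, stop):
    """Return the block numbers in [start, stop) whose parity is wrong."""
    return [i for i in range(start, stop) if parity_of(disks, i) != parity[i]]


disks = [[1, 2, 3, 4], [5, 6, 7, 8]]
parity = [parity_of(disks, i) for i in range(4)]

first_half = check(disks, parity, 0, 2)    # check covers blocks 0-1 so far
write(disks, parity, 0, 0, 9)              # write to an already-checked area
write(disks, parity, 1, 3, 0)              # write to a not-yet-checked area
second_half = check(disks, parity, 2, 4)   # check continues over blocks 2-3
full_recheck = check(disks, parity, 0, 4)  # the next complete check
```

Because every write keeps parity in step with the data, neither half of the interrupted check nor the later full check reports a mismatch, wherever the writes landed relative to the check position.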

 

Link to comment
