Scheduled Parity Check

syntaxx · October 3, 2012

What is the recommended schedule for parity checks? Is there one? Mine is set to run every week; is that all right?

dalben · October 3, 2012

I used to do it weekly when I could be sure it would be finished in 7 hours (midnight to 7am).

Now I do it monthly because it runs into the day when I want to use the server. If parity checks ever get faster I would consider going back to weekly.

WeeboTech · October 3, 2012

I do mine monthly on the 28th day (rather than the first of the month).

I do daily disk read tests with badblocks on days 1-20 with each disk # scheduled for the day of the month.

Last time I had to do a recovery, a bad block showed up in the middle of the disk causing the rebuild to fail.

That caused all sorts of grief. At least now I'll get monthly read tests and smart reports to see if I'm getting pending sectors.

When unRAID goes final, I'll release the package. I'm still working on it.

It would also require monthly checks to occur on a different day than the first of the month or everything will slow down.

chip · October 4, 2012

Weebotech -

How do you schedule it for the 28th of the month?

Also can you expand on disk read tests and how you do those....

Thanks

WeeboTech · October 4, 2012

This is my monthly_parity_check.sh script.

#!/bin/sh

[ ${DEBUG:=0} -gt 0 ] && set -x -v

# Working copy of root's crontab; it is loaded into cron at the end.
CRONTAB=/var/spool/cron/crontabs/root-
SCHEDULE="00 00 27 * *"
CRONLINE="/root/mdcmd check NOCORRECT"

# Append the entry only if it is not already present.
if ! grep -qF "${CRONLINE}" "${CRONTAB}"; then
    echo "# check parity near the end of the month at midnight with /root/mdcmd check." >> "${CRONTAB}"
    echo "# We do this on the 27th so that long-running checks finish before the new month." >> "${CRONTAB}"
    echo "# Also to ensure that days 1-24 are free for checking individual drives" >> "${CRONTAB}"
    echo "# without interference." >> "${CRONTAB}"
    echo "${SCHEDULE} ${CRONLINE}" >> "${CRONTAB}"
    # Install the updated file as root's crontab.
    crontab -u root - < "${CRONTAB}"
fi

exit

FWIW, I was wrong, I do it on the 27th, just in case it runs past 24 hours. I would not want it to interfere with the next day's disk read test.

When we go past 27 drives, we may have a little trouble here. I may have to use some kind of external schedule file or do more than one disk a day.

WeeboTech · October 4, 2012

Also can you expand on disk read tests and how you do those....

The breakdown is:

1. read /proc/mdcmd and load into a bash array.

2. Find today's day of month.

3. Select the disk # that matches today's day of month.

4. smartctl -a > /boot/logs/rdevModel.rdevSerial.CCYYMMDD.PRE

5. badblocks-1.42 -o /boot/logs/rdevModel.rdevSerial.CCYYMMDD.badblocks

6. smartctl -a > /boot/logs/rdevModel.rdevSerial.CCYYMMDD.PST

If there are any bad blocks in the badblocks file, post a message to syslog (for now).

I have not worked out all the details. I plan to alter badblocks-1.42 to print the speed at which the reads are going and provide updates every minute instead of every second. Then pipe that to logger which would go into the syslog file.

If I can get this to work in the background correctly, it would be a good basis for triggering via emhttp and/or doing a preclear.

Joe L's preclear script is great, but I think using badblocks instead of DD would be better. Once I work out the details of a customized badblocks I think we can support this going forward.

So in the meantime I'm testing my script out via cron to see how things pan out.

Every day on days 1-20, monthly_disk_check.sh is run, selecting the disk that matches the current day of the month.

This means a full, verified disk read twice a month:

once with the daily badblocks read test, and once with the full parity check.
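Steps 1-3 of the breakdown above can be sketched roughly as follows. This is only an illustration, not the script from the thread: it assumes the status text contains key=value lines such as rdevName.3=sdd, rdevModel.3=... and rdevSerial.3=..., and the exact layout of /proc/mdcmd may differ between unRAID releases. The function names are made up for the example.

```shell
#!/bin/sh
# ASSUMPTION: status text is "key=value" lines, e.g. rdevName.3=sdd.
# The function names below are hypothetical helpers for this sketch.

# field_for KEY N: print the value of KEY.N from status text on stdin.
field_for() {
    awk -F= -v key="$1.$2" '$1 == key { print $2; exit }'
}

# log_prefix STATUS_FILE N DATE: build the log file prefix
# /boot/logs/rdevModel.rdevSerial.CCYYMMDD used in the breakdown.
log_prefix() {
    model=$(field_for rdevModel "$2" < "$1")
    serial=$(field_for rdevSerial "$2" < "$1")
    printf '/boot/logs/%s.%s.%s\n' "$model" "$serial" "$3"
}

# today_disk_number: day of month without a leading zero,
# so that "08" selects disk 8.
today_disk_number() {
    date +%d | sed 's/^0//'
}
```

A caller might combine them as `field_for rdevName "$(today_disk_number)" < /proc/mdcmd` to get the device name for today's disk, again assuming that file has the layout described above.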

tr0910 · October 24, 2012

More awesomeness in the works. I am following your filelist project too. Love it....

I am looking for documentation for the badblocks 1.42 program on your Google Code page.

1. If I do a "badblocks /dev/sda" will it do a destructive write test or is this safe on a live working array drive?

2. What is the correct way to do a 4 pass write test on a blank non array drive? Is this destructive?

3. What is the difference between badblocks 1.42 on your Google Code page and the unRAID native one?

Again, my Linux skills are woeful. Thanks for your work.

WeeboTech · October 24, 2012

1. If I do a "badblocks /dev/sda" will it do a destructive write test or is this safe on a live working array drive?

The default is a read-only test. The issue I've come across is that emhttp will try to spin down the drive midstream.

I read the output of /proc/mdcmd and run a background process that touches a file on the drive every minute.

In some respect this is probably good as it will reset the arm of the drive to another position.

At the end of badblocks the background process is removed.
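A minimal sketch of such a keep-alive helper is shown below. It is not the original script; it assumes that touching a file on the mounted disk counts as activity for the spin-down timer, and the function names and the .keepalive file name are invented for the example.

```shell
#!/bin/sh
# ASSUMPTION: periodic writes to the mounted disk keep it from being spun down.
# start_keepalive / stop_keepalive are hypothetical helper names.

# start_keepalive DIR [SECONDS]: touch DIR/.keepalive periodically in the
# background and remember the loop's PID in KEEPALIVE_PID.
start_keepalive() {
    keep_dir=$1
    keep_interval=${2:-60}
    (
        while :; do
            touch "$keep_dir/.keepalive"
            sleep "$keep_interval"
        done
    ) &
    KEEPALIVE_PID=$!
}

# stop_keepalive DIR: stop the background loop and remove the marker file.
stop_keepalive() {
    kill "$KEEPALIVE_PID" 2>/dev/null || true
    wait "$KEEPALIVE_PID" 2>/dev/null || true
    rm -f "$1/.keepalive"
}
```

A script would call start_keepalive with the disk's mount point before badblocks starts and stop_keepalive after it ends; setting `trap 'stop_keepalive "$dir"' EXIT` is one way to make sure the loop does not outlive the parent.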

2. What is the correct way to do a 4 pass write test on a blank non array drive? Is this destructive?

badblocks -sv -w -o /tmp/badblocks.out /dev/sd?

It is destructive.

By hand I suggest:

smartctl -a /dev/sd? > /tmp/smartctl.start

badblocks -sv -w -o /tmp/badblocks.out /dev/sd?

smartctl -a /dev/sd? > /tmp/smartctl.end

diff -u /tmp/smartctl.start /tmp/smartctl.end

Then inspect /tmp/badblocks.out and the diff report.

3. What is the difference between badblocks 1.42 on your Google Code page and the unRAID native one?

unRAID's version is slightly older, the status screen is not as informative as 1.42.

I plan to modify 1.42 to also spit out how many MB/s like Joe L's DD so we can use it as a replacement for the DD command in Joe's preclear script.

It will add an even higher level of confidence to the preclear because if ANY bad blocks are reported, that can easily be tested for in the output file.

If any exist, the drive should NOT be used with unRAID.

The last part of the 4 pass write is 0x00 just like Joe's DD and the preclear.

This means after a badblocks 4 pass test you can just write the MBR/Partition and preclear signature.

I've had drives pass the preclear without reallocated sectors but not the badblocks test. It's pretty thorough.
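Testing the output file can be as simple as checking whether it is empty, since badblocks -o writes one bad block number per line. The sketch below shows one way to do that and post a syslog message; the function names and the logger tag are illustrative choices, not part of any existing package.

```shell
#!/bin/sh
# count_badblocks FILE: print how many bad blocks are listed in a file
# written by "badblocks -o FILE" (one block number per line).
count_badblocks() {
    if [ -s "$1" ]; then
        grep -c . "$1"
    else
        echo 0
    fi
}

# report_badblocks FILE DEVICE: post a syslog message and return non-zero
# when the file lists any bad blocks. (Hypothetical helper for this sketch.)
report_badblocks() {
    bad=$(count_badblocks "$1")
    if [ "$bad" -gt 0 ]; then
        logger -t badblocks "$2: $bad bad block(s) listed in $1" 2>/dev/null || true
        return 1
    fi
    return 0
}
```

For example, `report_badblocks /tmp/badblocks.out /dev/sdX || echo "do not use this drive"` would flag a drive whose test produced any entries.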

subwars · October 27, 2012

wow, look forward to seeing this added to a package and scripted

subwars · March 3, 2013

Hi WeeboTech, just wondering if you've managed to do any development on a badblocks automation script?

Looks like I've got a bad block on my cache drive. I just pulled it and ran SeaTools, which failed, so that has prompted me that I really should run something on all my other drives.

Can someone confirm for me the correct procedure to run a safe test on my active live disks?

WeeboTech · March 3, 2013

I have not gotten any further since my apartment was flooded by Sandy.

I just bought an Areca controller. When it comes in I might be able to capture some source code from the raid1 part of the array. I don't know yet.

In any case,

You can do badblocks in read-only mode on the raw device while unRAID is in maintenance mode.

If you start emhttp, then you have to do something to keep the md device busy, i.e. periodic reads or writes, or turn off the sleep timer. Otherwise emhttp will spin down the drive while badblocks is reading it, thus causing a lot of spin-ups and spin-downs.

If you want to force a clearing or rewrite of all sectors, get your data off, then do badblocks in 4 pass write mode.

Automatic · March 3, 2013

If you start emhttp, then you have to do something to keep the md device busy, i.e. periodic reads or writes, or turn off the sleep timer. Otherwise emhttp will spin down the drive while badblocks is reading it, thus causing a lot of spin-ups and spin-downs.

Will it just cause one every hour (Spin down time is an hour) or one every couple of seconds?

WeeboTech · March 3, 2013

It will go into a loop because you are doing badblocks on the /dev/sd? device and unRAID is monitoring the /dev/md? device.

I suppose you could use the /dev/md? device, but then if the read fails, badblocks won't get the status. unRAID will emulate the failed drive thus defeating the purpose of badblocks.

Anyone know of a way to turn off spindown temporarily for a device?

madburg · March 4, 2013

I have to assume issuing hdparm -S0 /dev/xxx is a no-go here, as unRAID will spin down the drive based on the global or spin-up group setting.

So the only 2 things that come to mind at the moment are:

1) Have the script store the current value of the global "Default spin down delay", then change it to "Never" until the script completes. This would also keep other drives, which you're not currently running the script against, from spinning down (so it may not be what someone wants). Change the global "Default spin down delay" back to the original value once the script completes.

2) Have the script set a cron job to run every few minutes, writing a file to the disk the script is running against. Once the script completes, it should remove the cron job and delete the temp file it was writing (the one used to keep the drive from spinning down).

subwars · March 4, 2013

Can you please elaborate on the emhttp thing? I've got no problem manually setting each disk I'm working on to never spin down while running the scan, and setting it back when finished. I just need to know which commands are required. P.S. I don't want it wiping data (destructive mode), and I want the server still available and sharing the data while the scans are being done.

WeeboTech · March 4, 2013

2) Have the script set a cron job to run every few minutes, writing a file to the disk the script is running against. Once the script completes, it should remove the cron job and delete the temp file it was writing (the one used to keep the drive from spinning down).

My working script actually did this: it spawned a co-process that would read the drive every minute. When the parent ended, so did the co-process. I'm pretty close to ordering a new laptop capable of VMs and I've ordered/received parts for an ESXi/unRAID machine. I should be up and running at the end of the month.

madburg · March 4, 2013

So it sounds like your script is good to go then. Are you in NYC? If so and there's anything you need, even if just to borrow, and I have it, you're welcome to it. Just PM me.

WeeboTech · March 4, 2013

My script is GONE! It was underwater for many hours. The flash drive no longer exists.

I have to re-write the script.

Shouldn't be that hard, but I have to build an unRAID server first.

It was pretty basic.

Took day of month as the number of the drive to process.

Fired off a co-process, disowned it, and remembered the PID.

ran smartctl

ran badblocks

ran smartctl again

diffed them.

killed the co-process

Exited.

It ran on days 1-26. I ran the parity check on the 27th so it would finish before the first of the month.

This way you got a monthly badblocks test of every drive and a monthly parity check of the whole system.
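Since the original script was lost, the outline above can only be approximated. The following is a rough reconstruction under stated assumptions, not the author's code: the function names are invented, DRY_RUN defaults to 1 so that commands are printed rather than executed, and the spin-down keep-alive co-process is left out for brevity.

```shell
#!/bin/sh
# ASSUMPTION: with DRY_RUN=1 (the default here) nothing touches a disk;
# each command is only printed. All function names are hypothetical.
dry() { [ "${DRY_RUN:-1}" = 1 ]; }

# smart_snapshot DEVICE FILE: save a SMART report for later comparison.
smart_snapshot() {
    if dry; then echo "+ smartctl -a $1 > $2"; else smartctl -a "$1" > "$2"; fi
}

# read_test DEVICE FILE: badblocks without -w is a read-only test;
# -s shows progress, -v is verbose, -o names the bad block list file.
read_test() {
    if dry; then echo "+ badblocks -sv -o $2 $1"; else badblocks -sv -o "$2" "$1"; fi
}

# check_disk DEVICE PREFIX: SMART before, read test, SMART after, then diff.
check_disk() {
    smart_snapshot "$1" "$2.PRE"
    read_test "$1" "$2.badblocks"
    smart_snapshot "$1" "$2.PST"
    if dry; then
        echo "+ diff -u $2.PRE $2.PST"
    else
        diff -u "$2.PRE" "$2.PST"   # a non-zero exit only means the reports differ
    fi
    return 0
}
```

Calling `check_disk /dev/sdX /boot/logs/model.serial.20130304` with the default DRY_RUN prints the four commands in order, which makes it easy to review the plan before setting DRY_RUN=0.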

Eventually I was going to add a tool to capture a filelist of the drive.

Then do an md5sum on new files so you had a list of what was there and what the md5 was in the event you lost a drive and could only recover partially.

I could possibly have the filelist/md5 part on my raid1 drive from the Areca, but I have to wait for an Areca controller to come in before I can access those files... Dontcha just know it. I actually migrated a lot of the raid1 data to an SSD, but it was too low in the server and was submerged. We'll see how I fare with the raid1 array, since that was highest in the server and many of those drives survived the flood.

madburg · March 4, 2013

oh man

WeeboTech · March 5, 2013

It's the pits but I'll rebuild. I saved about 75% of the drives. This is where unRAID really helped. I'll have only lost the drives that were underwater and will not spin back up. The others will be OK.

madburg · March 5, 2013

Good luck to you!
