WeeboTech

Everything posted by WeeboTech

  1. "What applications or situations make sense to write to multiple drives at the same time? I plan to expand my unRAID drives one at a time (as opposed to buying 24 drives for the Norco at once) as I reach the capacity of the system, so I imagine I will only be writing to the newest drive."

     Let's say you are torrenting or using some other downloader to an array drive. Then you decide to save the latest edited image from your workstation or update a set of MP3 tags in a directory. The extra speed of a 7200 RPM drive will come into play (at a minor, but noticeable, level). The more high-speed streaming you have going on, the more noticeable it is. When I used to torrent to a 5400 RPM data drive and 5400 RPM parity drive, anything else I did on the array would drag and I would be left waiting. Today's drives are larger and faster, but I would still recommend 7200 RPM drives for parity and for the busiest drive in your array. Movie, media, and other read-mostly data can easily live on slower drives without much of a penalty.

     "Should I be torrenting directly to an array drive instead of the cache? I don't know the benefits/cons of either method."

     I did and still do. In fact, I set up an HP MicroServer with a 3TB 7200 RPM parity drive and a 3TB 7200 RPM data drive (disk1) for the live torrents, then move them to the 5400 RPM 3TB drives after I've used them or choose to retire them after seeding.

     "Not too sure what is meant by user share but I'll read some more about it. I don't know what is meant by 'INCLUDE' either, but I think this may just be a result of me never having used unRAID before."

     This will become more evident as you use unRAID and see how user shares are configured. A user share is a virtual join that spans multiple disks; INCLUDE is a way of confining that join to specific disks.

     "This is not something I realized ... perhaps I should just use a standard HDD instead of an SSD for my cache ..."

     There will be a measurable difference. With direct array writes you can burst at max Ethernet speed, and then it will slow down to somewhere between 35-60 MB/s. With an SSD cache, it can sustain almost 90 MB/s. It depends on how much data will be moved at one time and whether you can wait. I move and edit MP3s all day from captured streams, and I get good speed without a cache. However, a cache drive is needed if you are going to run VMs or Docker containers.

     "It seems that I probably don't need a cache drive and should save my torrent files directly to the parity-protected array."

     As I mentioned above, I write torrents directly to the array, but I use top-speed drives for the torrent drive and parity. When I get a good high-speed torrent I can sustain 7.5 MB/s for as long as it takes with no slowdown.
  2. I would say yes, especially if you plan to use hash checksums on the files via bunker, bitrot, or any of the newer tools. Reading the data is usually at a constant speed, but having the extra horsepower to calculate one or more file hashes in parallel helps. The difference from a 2.2 GHz dual-core AMD to a 2.4 GHz dual-core hyper-threaded Xeon was dramatic in my case. In the past we would say no, but in today's day and age, where we do more with the NAS and proactively monitor the health of our files, the extra cores help.
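On the parallel-hashing point, here is a minimal sketch of how extra cores get used: hash the files several at a time instead of one after another. This is an illustration, not tooling from the post; hash_tree is an invented helper, and md5sum stands in for whichever hash the checksum tool actually uses.

```shell
#!/bin/bash
# Hash every file under a directory in parallel. xargs -P runs several
# md5sum processes at once; the job count would normally match the core
# count (nproc). Requires GNU find/xargs, as on unRAID.
hash_tree() {
  local dir="$1" jobs="${2:-$(nproc)}"
  # -print0/-0 keep filenames with spaces intact; -n 16 batches files
  # per md5sum invocation; -r avoids running md5sum with no arguments.
  find "$dir" -type f -print0 | xargs -0 -r -P "$jobs" -n 16 md5sum
}

# Example: hash_tree /mnt/disk1 4 > /boot/disk1.md5
```

With a single-threaded hasher the CPU is the bottleneck on fast sequential reads; running one hashing process per core is what makes the faster CPU pay off.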
  3. If you write to multiple drives at the same time, then a 7200 RPM parity drive helps. This is especially true when creating a brand-new parity drive. Not so much for parity check operations, but for parity generation, yes; for multiple parallel writes, yes, to some degree. If you are torrenting directly to an array drive, then the torrent drive and the parity drive should be 7200 RPM. In my builds, the parity drive and the most-used drive (I have about 6 mini systems) are 7200 RPM. So usually drive 1 and parity are 7200 RPM. I found I did not need a cache in each of these systems when configured like this: 60 MB/s writes with a burstable throughput of up to 100 MB/s for the first few GB works for me. Where an SSD cache helps is in not having to spin up drives, but since I write all day long on some servers, it doesn't really matter. It also depends on where you configure the torrents to be written. If to a user share, then you may want to confine it to specific drives with INCLUDE. If configured for a user share, files are written to the cache first (if enabled) and moved later (if they are not open). Many of my systems use the HGST 6TB 7200 RPM NAS drives for parity. They get about 225 MB/s on the outer tracks, and I get excellent burst write speeds after a few kernel tunings.
  4. Preclear is probably all you need at the most basic level (note it will erase the drive) for a drive destined to be used in unRAID. On another note, the SMART long (extended) test is a read-only operation. Another thought is to use HD Sentinel, but that looks to be PC and Linux oriented. Some other links which may help you help yourself:
     http://apple.stackexchange.com/questions/135565/how-do-i-get-detailed-smart-disk-information-on-os-x-mavericks-or-later
     http://hints.macworld.com/article.php?story=20031122041138373
     http://pondini.org/OSX/DU9.html
     If you can get smartmontools on your version of OS X, then you have command-line access to trigger the SMART firmware tests.
  5. If you have a spare slot in your unRAID server you can use the preclear script. Other than that, there is the Linux badblocks command. There are also the standard SMART firmware commands: the conveyance test, which verifies the drive's functionality after any transit damage, and the SMART long test, which scans the surface for issues. My confidence test is run on an unRAID server:
     1. SMART conveyance test (check logs after complete) {minutes}
     2. SMART long test (check logs after complete) {hours}
     3. badblocks 4-pass write/read test (check screen for any failures) {days}
     4. SMART long test again (check logs after complete) {hours}
     If I need to add a preclear signature, I use Joe L.'s excellent preclear script to add the signature. This is a pretty long procedure on large drives, but it has saved me from adding new and old questionable drives into my array.
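The four-step confidence test above can be sketched as a script. This is an illustrative sketch, not tooling from the post: run, confidence_test, and DRY_RUN are invented names. Note that smartctl self-tests execute asynchronously in the drive's firmware, so a real run would poll smartctl -l selftest and wait for each test to finish before the next step, and badblocks -w is destructive.

```shell
#!/bin/bash
# Sketch of the conveyance -> long -> badblocks -> long sequence.
# badblocks -w ERASES the drive. Set DRY_RUN=1 to print the commands
# instead of executing them.

run() {
  # Print the command instead of executing it when DRY_RUN=1.
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

confidence_test() {
  local dev="$1"
  run smartctl -t conveyance "$dev"   # 1. transit-damage check {minutes}
  run smartctl -t long "$dev"         # 2. full surface scan {hours}
  run badblocks -b 4096 -wsv "$dev"   # 3. 4-pattern write test (default 0xaa 0x55 0xff 0x00) {days}
  run smartctl -t long "$dev"         # 4. final long test {hours}
  run smartctl -l selftest "$dev"     # review the self-test log when done
}

# Example (dry run, nothing is touched):
DRY_RUN=1 confidence_test /dev/sdX
```

The badblocks default write patterns (0xaa, 0x55, 0xff, 0x00) match the 4-pass test described elsewhere in these posts.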
  6. Send an email to [email protected]. He may be able to compile the driver and allow you to insmod the individual driver until a new release comes out.
  7. Great additions, I will incorporate them into my go script. Here is my prefix, waiting for /dev/md1 to come online:

     declare -a CHAR=('+' 'x')
     let i=0 notices=60
     DEV=/dev/md1
     while [[ ${notices} -gt 0 && ! -b ${DEV} ]]
     do
         printf "Waiting $notices seconds for ${DEV}. Press ANY key to continue: [${CHAR[${i}]}]: "
         read -n1 -t1 && break
         echo -e "\r\c"
         (( notices-=1 ))
         [[ $(( i+=1 )) -ge ${#CHAR[@]} ]] && let i=0
     done
     [ ${notices} -ne 60 ] && echo

     let i=0 notices=60
     DIR=/mnt/disk1
     while [[ ${notices} -gt 0 && ! -d "${DIR}" ]]
     do
         printf "Waiting $notices seconds for ${DIR}. Press ANY key to continue: [${CHAR[${i}]}]: "
         read -n1 -t1 && break
         echo -e "\r\c"
         (( notices-=1 ))
         [[ $(( i+=1 )) -ge ${#CHAR[@]} ]] && let i=0
     done
     [ ${notices} -ne 60 ] && echo

     There is more code, but it's a visually appealing notice with a max wait value that lets you break out of the loop early when doing maintenance. Ideally this should be adjusted for the last drive in the array; however, reading the sb.dat file in bash is more than I want to deal with.
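The two near-identical loops could be folded into one helper. A condensed sketch follows; wait_for_path is an invented name, it uses a generic -e existence test instead of the original -b/-d checks, and it falls back to sleep when stdin is not a terminal so it also behaves when run non-interactively from the go script.

```shell
#!/bin/bash
# Wait up to $2 seconds for a path (device node, directory, file) to
# appear. Interactively, any keypress skips the wait; non-interactively
# it just sleeps between checks. Exit status reports whether the path
# showed up.
wait_for_path() {
  local path="$1" timeout="${2:-60}" i=0
  local spin=('+' 'x')
  while [ "$timeout" -gt 0 ] && [ ! -e "$path" ]; do
    printf 'Waiting %d seconds for %s [%s] \r' "$timeout" "$path" "${spin[$i]}"
    if [ -t 0 ]; then
      read -r -n1 -t1 && break   # any key skips the wait
    else
      sleep 1                    # no terminal: just pace the loop
    fi
    timeout=$(( timeout - 1 ))
    i=$(( (i + 1) % ${#spin[@]} ))
  done
  echo
  [ -e "$path" ]
}

# Example: wait_for_path /dev/md1 60 && wait_for_path /mnt/disk1 60
```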
  8. My preference would be for the last (configurable #) unread notifications to be on the dashboard as a list with the other data, allowing them to be read, deleted, or archived with a full view. The popups do have the description, but not the long description. The popups can be an annoying feature after being away from the webGui for weeks. I hardly ever go to the webGui, but when I do, I have to acknowledge large amounts of popup notifications. Then I have to go to notification settings and trash them. Furthermore, there's no way to click on a message and get the long notification data. The long notification data does not seem to be preserved in the .notify file. Example:

     /usr/local/emhttp/plugins/dynamix/scripts/notify -e event -s subject -d description -m 'long description '

     root@unRAIDm:/boot/local/bin# more /tmp/notifications/unread/event_1445711452.notify
     timestamp = 1445711452
     event = event
     subject = subject
     description = description
     importance = normal

     Here is what was received via email:

     Event: event
     Subject: subject
     Description: description
     Importance: normal
     long description
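For reviewing those files from the shell instead of popup by popup, here is a small sketch. It is a hypothetical helper, not part of dynamix, and it assumes the "key = value" .notify layout shown above.

```shell
#!/bin/bash
# List unread notifications from the key = value .notify files.
NOTIFY_DIR="${NOTIFY_DIR:-/tmp/notifications/unread}"

list_unread() {
  local f
  for f in "$NOTIFY_DIR"/*.notify; do
    [ -e "$f" ] || continue   # glob did not match: nothing unread
    # Each file holds "key = value" lines; print the interesting fields.
    awk -F' = ' '$1 == "subject" || $1 == "description" { printf "%s: %s\n", $1, $2 }' "$f"
  done
}
```

As the post notes, the long description is not written to the .notify file, so a helper like this can only ever show the short fields.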
  9. I'm guessing... perhaps the disk partition table is not cached, so the drive is examined, thus causing it to spin up. Just a guess.
  10. I think that's for another thread. The unassigned devices plugin must not be using smartctl -n standby, as in:

      root@unRAIDm:/usr/local/emhttp# find . -name '*.php' | xargs grep smartctl
      ./plugins/dynamix/include/DeviceList.php: if (!file_exists($smart) || (time()-filemtime($smart)>=$var['poll_attributes'])) exec("smartctl -n standby -A /dev/$device > $smart");
  11. Same for me as well. I have a SanDisk Ultra Fit and a SanDisk Cruzer Fit, both allocated to unRAID. What people may be unaware of is that ESX also reports 'Reset high speed USB' messages in /var/log/vmkernel.log until the messages stop in unRAID. Correct; by the way, I have those exact two USB sticks (one Cruzer Fit, one Ultra Fit) in my server as well.

      "I wonder if putting the flash on a USB hub would have the same effect."

      No dice so far for me; putting a single flash drive on a USB hub did not stop the continuous messages.
  12. Capital -S#. Also, when you initially set the value with -S240 or something similar, the drive will spin up and then wait for the counter to time out. I would suggest setting it higher.
  13. It might be, but check with the command I provided to validate. Some drives also spin up when queried with SMART or some other program; it depends on the firmware. Recent drives do not have this problem, but older drives may exhibit this behavior.
  14. You can try the following command at the command line (replacing your device id) to see its state:

      root@unRAIDm:/usr/local/sbin# hdparm -C /dev/disk/by-id/ata-ST4000VN000-1H4168_S3012W7N
      /dev/disk/by-id/ata-ST4000VN000-1H4168_S3012W7N: drive state is: standby
  15. >> Values from 1 to 240 specify multiples of 5 seconds, yielding timeouts from 5 seconds to 20 minutes.

      6 seconds may be too low, and the firmware may be overriding it. I've never gone below 241. My drives have spun down and are showing the same in the webGui. There may also be a timeout or webGui attribute update based on the polling time of 1800 seconds (half an hour).

      root@unRAIDm:/usr/local/sbin# cat /etc/unraid-version
      version="6.1.3"
      root@unRAIDm:/usr/local/sbin# /boot/local/bin/hdparm_set_default_standby.sh
      /dev/disk/by-id/ata-HGST_HDN726060ALE610_NAG1D7TP: drive state is: standby
      /dev/disk/by-id/ata-HGST_HDN726060ALE610_NAG1DEKP: drive state is: standby
      /dev/disk/by-id/ata-ST3000DM001-1CH166_W1F1GTFJ: drive state is: standby
      /dev/disk/by-id/ata-ST3000DM001-1CH166_Z1F2WFKV: drive state is: standby
      /dev/disk/by-id/ata-ST4000VN000-1H4168_S3012W7N: drive state is: standby
      /dev/disk/by-id/ata-ST4000VN000-1H4168_S3012WS6: drive state is: standby
      /dev/disk/by-id/ata-ST4000VN000-1H4168_S301HS8H: drive state is: standby
      /dev/disk/by-id/ata-ST6000DX000-1H217Z_Z4D0EE7M: drive state is: standby
      /dev/disk/by-id/ata-ST6000DX000-1H217Z_Z4D0EEDV: drive state is: standby
      /dev/disk/by-id/ata-Samsung_SSD_840_PRO_Series_S1AXNSAF701196M: drive state is: active/idle
      /dev/disk/by-id/ata-Samsung_SSD_840_PRO_Series_S1AXNSAF701196M: setting standby to 243 (1 hours + 30 minutes)
  16. /boot/local/bin/smartd.sh

      #!/bin/bash
      [ ${DEBUG:=0} -gt 0 ] && set -x -v

      P=${0##*/}    # basename of program
      R=${0%%/$P}   # dirname of program
      P=${P%.*}     # strip off after last . character

      # If fd1 (stdout) is not connected to a terminal,
      # redirect to a logger coprocess.
      # COPROC[0] is connected to the standard output of the co-process.
      # COPROC[1] is connected to the standard input of the co-process.
      if [ ! -t 1 ]
      then coproc /usr/bin/logger -t${P}[$$]
           # Redirect stdout/stderr to logger
           eval "exec 1>&${COPROC[1]} 2>&1 ${COPROC[0]}>&-"
      fi

      [ ! -d /var/lib/smartd ] && mkdir -p /var/lib/smartd

      if grep -wq '^DEVICESCAN$' /etc/smartd.conf
      then sed -i -e 's#^DEVICESCAN$#DEVICESCAN -m root#g' /etc/smartd.conf
      fi

      renice -n 19 $$ >/dev/null
      exec /usr/sbin/smartd --savestates='/var/lib/smartd/' --quit=onecheck

      /boot/local/bin/hdparm_set_default_standby.sh

      #!/bin/bash
      [ ${DEBUG:=0} -gt 0 ] && set -x -v

      P=${0##*/}    # basename of program
      R=${0%%/$P}   # dirname of program
      P=${P%.*}     # strip off after last . character

      # From the hdparm manpage:
      # -s  Enable/disable the power-on in standby feature, if supported by the drive.
      #     VERY DANGEROUS. Do not use unless you are absolutely certain that both the system BIOS (or firmware)
      #     and the operating system kernel (Linux >= 2.6.22) support probing for drives that use this feature.
      #     When enabled, the drive is powered-up in the standby mode to allow the controller to sequence the
      #     spin-up of devices, reducing the instantaneous current draw burden when many drives share a power supply.
      #     Primarily for use in large RAID setups. This feature is usually disabled and the drive is powered-up
      #     in the active mode (see -C above). Note that a drive may also allow enabling this feature by a jumper.
      #     Some SATA drives support the control of this feature by pin 11 of the SATA power connector.
      #     In these cases, this command may be unsupported or may have no effect.
      #
      # -S  Put the drive into idle (low-power) mode, and also set the standby (spindown) timeout for the drive.
      #     This timeout value is used by the drive to determine how long to wait (with no disk activity)
      #     before turning off the spindle motor to save power. Under such circumstances,
      #     the drive may take as long as 30 seconds to respond to a subsequent disk access,
      #     though most drives are much quicker. The encoding of the timeout value is somewhat peculiar.
      #     A value of zero means "timeouts are disabled": the device will not automatically enter standby mode.
      #     Values from 1 to 240 specify multiples of 5 seconds, yielding timeouts from 5 seconds to 20 minutes.
      #     Values from 241 to 251 specify from 1 to 11 units of 30 minutes, yielding timeouts from 30 minutes
      #     to 5.5 hours. A value of 252 signifies a timeout of 21 minutes.
      #     A value of 253 sets a vendor-defined timeout period between 8 and 12 hours,
      #     and the value 254 is reserved. 255 is interpreted as 21 minutes plus 15 seconds.
      #     Note that some older drives may have very different interpretations of these values.
      #
      S=243

      # If fd1 (stdout) is not connected to a terminal,
      # redirect to a logger coprocess.
      if [ ! -t 1 ]
      then coproc /usr/bin/logger -t${P}[$$]
           # Redirect stdout/stderr to logger
           eval "exec 1>&${COPROC[1]} 2>&1 ${COPROC[0]}>&-"
      fi

      # ls -l /dev/disk/by-id | grep -v "\-part" | egrep 'ata-|scsi-' | cut -d" " -f10 | while read device
      ls -L1 /dev/disk/by-id | grep -v "\-part" | egrep 'ata-|scsi-' | while read device
      do
          [ "${device}" == "" ] && continue
          STATE=`hdparm -C /dev/disk/by-id/${device}`
          STATE=${STATE//$'\n'/}   # Remove all newlines.
          echo ${STATE}
          STATE="${STATE##* }"
          if [[ "${STATE}" =~ "standby" ]]
          then : # echo "Skipping set of standby timer for ${device}"
               continue
          fi
          # -s Set power-up in standby flag (0/1) (DANGEROUS)
          # -S Set standby (spindown) timeout
          hdparm -S${S} /dev/disk/by-id/${device}
      done

      if [ ! -t 1 ]
      then eval "exec ${COPROC[1]}>&-"
      fi

      And the cron entries that trigger these daily, in /etc/cron.d/smartd (rsynced from /boot/local/etc/cron.d/smartd in the go script):

      50 05 * * * /boot/local/bin/smartd.sh 2>&1 | exec /usr/bin/logger -tsmartd[$$]
      05 06 * * * /boot/local/bin/hdparm_set_default_standby.sh 2>&1 | exec /usr/bin/logger -thdparm_set_default_standby[$$]

      # * * * * * <command to be executed>
      # | | | | |
      # | | | | +---- Day of the Week (range: 0-7, 0 and 7 standing for Sunday)
      # | | | +------ Month of the Year (range: 1-12)
      # | | +-------- Day of the Month (range: 1-31)
      # | +---------- Hour (range: 0-23)
      # +------------ Minute (range: 0-59)
  17. unRAID does not do automated SMART tests. However, unRAID does inspect attributes when the drive is spun up and part of the array. I'm unsure of the current status of unassigned devices; I believe you can see the attributes if the drive is spinning, but the attributes are not monitored. My smartd shell script checks the attributes once a day, with the side effect of spinning all drives up. It is separate from emhttp's attribute monitoring.
  18. Currently my script affects all connected drives. I do this for a few reasons, one of which is that the timer is set to a default value, so even if emhttp is not running, the drive will spin down after a set period of time. My script is probably not what you want, as you can add a single command to the go script and achieve the desired results. On the hdparm manpage, scroll to the -S section: http://linux.die.net/man/8/hdparm I'm including it here for quick access.

      -S  Put the drive into idle (low-power) mode, and also set the standby (spindown) timeout for the drive. This timeout value is used by the drive to determine how long to wait (with no disk activity) before turning off the spindle motor to save power. Under such circumstances, the drive may take as long as 30 seconds to respond to a subsequent disk access, though most drives are much quicker. The encoding of the timeout value is somewhat peculiar. A value of zero means "timeouts are disabled": the device will not automatically enter standby mode. Values from 1 to 240 specify multiples of 5 seconds, yielding timeouts from 5 seconds to 20 minutes. Values from 241 to 251 specify from 1 to 11 units of 30 minutes, yielding timeouts from 30 minutes to 5.5 hours. A value of 252 signifies a timeout of 21 minutes. A value of 253 sets a vendor-defined timeout period between 8 and 12 hours, and the value 254 is reserved. 255 is interpreted as 21 minutes plus 15 seconds. Note that some older drives may have very different interpretations of these values.

      Find the device serial you want. Example:

      root@unRAIDm:~# ls -l /dev/disk/by-id | egrep -v '\-part' | egrep 'scsi-|ata-'
      lrwxrwxrwx 1 root root 9 Oct 22 09:06 ata-HGST_HDN726060ALE610_NAG1D7TP -> ../../sdk
      lrwxrwxrwx 1 root root 9 Oct 22 09:06 ata-HGST_HDN726060ALE610_NAG1DEKP -> ../../sdj
      lrwxrwxrwx 1 root root 9 Oct 22 09:06 ata-ST3000DM001-1CH166_W1F1GTFJ -> ../../sdd
      lrwxrwxrwx 1 root root 9 Oct 22 09:06 ata-ST3000DM001-1CH166_Z1F2WFKV -> ../../sdg
      lrwxrwxrwx 1 root root 9 Oct 22 09:06 ata-ST4000VN000-1H4168_S3012W7N -> ../../sdh
      lrwxrwxrwx 1 root root 9 Oct 22 09:06 ata-ST4000VN000-1H4168_S3012WS6 -> ../../sdi
      lrwxrwxrwx 1 root root 9 Oct 22 09:06 ata-ST4000VN000-1H4168_S301HS8H -> ../../sdc
      lrwxrwxrwx 1 root root 9 Oct 22 09:06 ata-ST6000DX000-1H217Z_Z4D0EE7M -> ../../sde
      lrwxrwxrwx 1 root root 9 Oct 22 09:06 ata-ST6000DX000-1H217Z_Z4D0EEDV -> ../../sdf
      lrwxrwxrwx 1 root root 9 Oct 22 09:06 ata-Samsung_SSD_840_PRO_Series_S1AXNSAF701196M -> ../../sdb

      Add a call to hdparm like this in your go script, using the device/serial of the respective drive. This value does not change the way /dev/sd[a-z] does.

      hdparm -S243 /dev/disk/by-id/ata-ST4000VN000-1H4168_S301HS8H

      Set the numeric value according to the encoding described in the manpage.
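The peculiar -S encoding can be computed rather than memorized. Here is a sketch covering the two common ranges quoted above; hdparm_s_value is an invented helper, and timeouts the simple ranges cannot express (e.g. 25 minutes) are rejected rather than approximated.

```shell
#!/bin/bash
# Convert a desired spindown timeout in minutes to the hdparm -S encoding:
# values 1-240 are 5-second units (up to 20 minutes), 241-251 are
# 30-minute units (30 minutes to 5.5 hours). Returns status 1 for
# timeouts outside those two ranges.
hdparm_s_value() {
  local min="$1"
  if [ "$min" -ge 1 ] && [ "$min" -le 20 ]; then
    echo $(( min * 60 / 5 ))       # 5-second units
  elif [ "$min" -ge 30 ] && [ "$min" -le 330 ] && [ $(( min % 30 )) -eq 0 ]; then
    echo $(( 240 + min / 30 ))     # 30-minute units
  else
    return 1
  fi
}

# Example: hdparm -S"$(hdparm_s_value 90)" /dev/disk/by-id/...
```

90 minutes encodes to 243, which matches the "setting standby to 243 (1 hours + 30 minutes)" output shown in an earlier post.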
  19. If desired, I can post a very rudimentary shell script that I use in cron to set the hdparm automated firmware spindown of every drive in the system to some value. In my particular case, I have unassigned drives in warm standby. I run smartd in onecheck mode to do a smartd analysis of each drive, assigned or unassigned, save the state, and email me if there are issues. It has the side effect of spinning all drives up once a day. Thus I have another shell script to set the spindown timer of every drive using hdparm -S2## (greater than the highest unRAID value). This causes the drive's firmware to spin the drive down when inactivity reaches this timer, which allows the drives to spin down automatically even if emhttp is not running (which is sometimes the case when I do maintenance). If this value is too low, there will be collisions between what unRAID is doing and the drive's firmware; therefore this is a stopgap. I used to have a shell script that would apply specific hdparm options to specific drives; that was helpful for customized hdparm calls in the go script. That's another option for those looking to explore possibilities. Ideally the webGui should have the ability to manage unassigned drives; after all, our licensing is based on drives whether they are in the array or not. Therefore this should probably be requested in the Feature Request area.
  20. Not sure what about this surprised you, but don't expect a device read test to generate pending sectors. The surface scan does not count as an I/O request. Your results don't show anything odd. Typically in my prior tests, when a surface scan fails with a read error, the number of pending sectors is incremented, at least with the drives I've used. I'm tossing it. I don't use them either.
  21. While I realize this is an old post, the data presented here has historical value in that badblocks, preclear, and/or a full parity read scan cannot always reveal a potential problem with a hard drive.

      Anecdotal evidence: notice how the SMART short tests did not reveal the problem. When checking the health of this drive I did the following procedure:
      SMART short test.
      SMART long test: read failure.
      4-pass badblocks: pass 1 revealed the problem; passes 2-4 did not. 16 sectors reallocated at start, 30 sectors reallocated at end, no pending sectors.
      SMART long test again: read failure in a different place. No pending sectors. (Now that surprised me.)
      4-pass badblocks: no errors reported, no sectors reallocated, no pending sectors.
      SMART long test: passes.

      What this reveals is that badblocks and/or preclear alone cannot detect or force the error to occur. Nor can a full badblocks read or parity check read. It's probably more likely that a problem will be revealed with a full read, but in this case that did not happen until the SMART long test occurred. It's crucial to do the SMART long test before inserting the drive into your array, or someone may be unpleasantly surprised.

      When inserting a drive into the array I always do:
      1. Conveyance test (which tests mechanics); it's a manufacturer's test for potential shipping damage.
      2. SMART long test (marks a line in the SMART log).
      3. Check for issues in the SMART report.
      4. Preclear, or my own 4-pass badblocks of 0xaa, 0x55, 0xff, 0x00.
      5. Check for issues in the SMART report.
      6. SMART long test again (marks a line in the SMART log).
      7. Check for issues and save the SMART report.
  22. Anecdotal evidence: notice how the SMART short tests did not reveal the problem. This case is as in the prior post:
      SMART short test.
      SMART long test: read failure.
      4-pass badblocks: pass 1 revealed the problem; passes 2-4 did not. 16 sectors reallocated at start, 30 sectors reallocated at end, no pending sectors.
      SMART long test again: read failure in a different place. No pending sectors. (This is what really surprised me.)
      4-pass badblocks: no errors reported, no sectors reallocated, no pending sectors.
      SMART long test: passes.

      What this reveals is that badblocks or preclear alone cannot detect or force the error to occur. Nor can a full badblocks read or parity check read. It's probably more likely that a problem will be revealed with a full read, but in this case that did not happen until the SMART long test occurred. It's crucial to do the SMART long test before inserting the drive into your array, or someone may be unpleasantly surprised.
  23. I've run into issues whereby a badblocks read of the entire disk (like a parity check) succeeds without a hint of trouble, while a SMART long test catches an LBA that is causing problems. In fact, I ran into this about 3 days ago while testing a new drive:
      Executed SMART long test; READ ERROR flagged at LBA nnnnnn.
      Executed 4-pass badblocks; the first pass reported bad blocks, the next 3 passes succeeded without issue. Checked SMART; sectors were reallocated.
      Reran the SMART long test; a new READ ERROR was flagged at LBA oooooo (this one much further into the drive).
      Reran the 4-pass badblocks test; no errors reported. SMART did not reveal any new pending sectors or additional sector reallocations.
      Reran the SMART long test; it succeeded without error.

      In addition, I've run into issues before where a parity check was executed with no hint of trouble, followed by a double drive failure only hours later, with the SMART long test revealing pending sectors. I would be in the "it's not needed" camp as well; however, my experience has shown that a SMART long test can reveal problems that other basic read tests do not. Every new drive gets a conveyance test, SMART long test, 4-pass badblocks, and a final SMART long test. Periodically I run SMART long tests just for good measure. They have revealed LBAs that were not flagged in the normal course of a full drive read. Perhaps the SMART long test is less forgiving and flags an error earlier than the firmware's read ECC/recovery does.
  24. This is probably all you need. The issue is, spin-down has to be temporarily disabled while this test is running. Until there's an API for that, it's kind of hard. I had started on some sort of dd of single random blocks, but stopped, as the real way to fix this is to turn off the spin-down timer temporarily. The short test is easy and finishes in minutes, but it's not really comprehensive enough. Perhaps the spin-down logic could inspect the SMART data and, if a test is being executed, skip the spin-down until the test is no longer active.

      "Oh, so you're saying that currently if you are running an extended test, the disk will still spin down if it's not being accessed otherwise?"

      Last I remember, yes. A spin-down is issued and it aborts the SMART test. This may have been changed, but I may have missed it. As far as short vs. extended: if we do a monthly parity check on the 27th and an extended test on one drive a day, starting with disk1 through diskN, where each day of the month is the disk number, this could be done nicely. At least that's how I planned to do it. That's two full sweeps of each disk a month. Another idea is to schedule a SMART extended test for all drives on the 27th and a parity check on the 28th.

      "Sounds good, but currently there is no automated way to do this, correct? You just have to remember and then manually do it."

      That is correct. It's fairly easy to write a script to take today's date and turn it into a drive assignment. The issue is telling emhttp not to spin down the drive. It may be easier with unRAID 6; I have not explored it further after coming across this issue in unRAID 5. Last I remember, even the webGui SMART long test would still abend due to emhttp's spin-down functionality.
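The "turn today's date into a drive assignment" part is indeed easy to script. A dry-run sketch follows; pick_daily_long_test and DAY_OVERRIDE are invented names, the devices are illustrative, the smartctl command is printed rather than executed, and (as discussed above) emhttp's spin-down would still have to be held off for the test to survive.

```shell
#!/bin/bash
# Map the day of the month to one drive from a list and print the
# smartctl command that would start an extended test on it.
pick_daily_long_test() {
  local devices=("$@")
  local day idx
  day="${DAY_OVERRIDE:-$(date +%-d)}"       # DAY_OVERRIDE is only for testing
  idx=$(( (day - 1) % ${#devices[@]} ))      # wrap around past the last disk
  echo "smartctl -t long ${devices[$idx]}"
}

# Example, run once a day from cron (devices are placeholders):
pick_daily_long_test /dev/sdb /dev/sdc /dev/sdd
```

With the wraparound, arrays smaller than 31 disks simply get extra sweeps later in the month; an alternative closer to the post's plan is to skip days past diskN.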