January 13, 2012
I'm working on enhancing the scripts to run md5sums on all disks at the same time, to help with checking parity issues:
http://lime-technology.com/wiki/index.php?title=FAQ#How_To_Troubleshoot_Recurring_Parity_Errors

What is the impact on
    dd if=/dev/${DeviceAddr} skip=${StartHere} count=10000000 | md5sum -b
if the skip exceeds the size of the disk? What impact would it have on the USB flash drive or the cache drive? Is the USB drive always /dev/sda? For example, if the bad block you want to check is 1.5 TB into the disk and some of your disks are 1 TB or smaller, what would happen with the dd command above?

January 13, 2012
>> What impact is there on dd if=/dev/${DeviceAddr} skip=${StartHere} count=10000000 | md5sum -b if the SKIP exceeds the disk?
Some disks will lock up if you attempt to seek to a block that does not exist. (Some will even stop responding until power cycled.)
>> What impact would it have on the USB, Cache drive?
Completely unknown.
>> Is the USB always /dev/sda ??
No.
>> For example, if the bad block you want to check is 1.5 TB into the disk, and you have some disks 1TB or less, what would happen with the dd cmd above?
It might fail with no bytes returned, might lock up the drive, or might return zero bytes and not fail. It depends on the disk and the disk controller. If the bad block is at an address greater than the size of the disk, that disk cannot be the source of the bad block.

January 13, 2012 (Author)
OK, thanks. I was wondering about this before writing a script for general use that would run the dd against all /dev/[hs]d[a-z]; I wouldn't want to cause anyone problems. Since all my drives are currently the same size, I wouldn't run into the situation described. If I get something working for my own use and want to publish it, I'd need to put a warning on it.

By the way, is there a command that returns the number of blocks on a drive? Is there a way to determine which /dev/[hs]d[a-z] is the USB drive, so I could check for it and exclude it? And would the dd command mentioned earlier work if I referenced /dev/md# instead of /dev/[hs]d[a-z]?

January 13, 2012
Interesting thought: do a "for disk in /dev/md[1-20]". I believe this would miss the parity drive though, right? It is likely simpler to use "for disk in /dev/[hs]d[a-z]" and then add a check to skip the flash drive.

I'm assuming that in your worker script you want to check that the beginning and end blocks don't go past the end of the drive. Something like: if the beginning goes past the end, just exit, maybe with a message; if only the end goes past, run the test but stop at the end of the drive.

Peter

January 13, 2012 (Author)
>> Interesting thought, do a "for disk in /dev/md[1-20]". I believe this would miss the parity drive though, right? Likely simpler to use "for disk in /dev/[hs]d[a-z]" and then do a check to skip the flash drive.
>> I'm assuming in your worker script you want to do a check that the beginning and end blocks don't go past the end of the drive.
Correct. /dev/[hs]d[a-z] would be preferred because, as you mentioned, /dev/md[1-20] would miss the parity drive. If I can find out (or someone tells me) how to determine whether a drive is the USB drive, I will add that to the script. I don't know offhand how to find the end block of a drive, but yes, that is the logic I'd put in place.

Bottom line: I'd like a single command (driverscript badparityblock) to kick it off for all drives at the same time, perhaps with one more parameter to control the number of passes, defaulting to 5.

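A rough sketch of the enumeration step discussed above, assuming the flash device has already been identified. The function name list_array_disks and its two arguments are made up for illustration. One thing worth knowing when comparing the two patterns: as a shell glob, "[1-20]" is a bracket expression that matches a single character (1, 2 or 0), so /dev/md[1-20] would only ever match md0, md1 and md2. Likewise, [hs]d[a-z] only matches three-letter names, so a drive named sdaa would be missed.

```shell
#!/bin/sh
# list_array_disks DEV_DIR FLASH_DISK   (hypothetical helper, not from the thread)
# Prints every DEV_DIR/[hs]d[a-z] node except FLASH_DISK, one per line.
list_array_disks() {
    dev_dir=$1
    flash=$2
    for disk in "$dev_dir"/[hs]d[a-z]; do
        # An unmatched glob stays literal, so skip anything that does not exist.
        if [ ! -e "$disk" ]; then continue; fi
        # Leave out the flash drive.
        if [ "$disk" = "$flash" ]; then continue; fi
        printf '%s\n' "$disk"
    done
}
```

Called as "list_array_disks /dev /dev/sds", this would print the whole-disk nodes other than sds. A driver script could then start one worker per disk in the background and wait for them all to finish.
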
January 13, 2012
>> a method of determining if the drive is the USB, then I would add that to the script.
Here's an example:
root@atlas ~ #mount | grep boot
/dev/sds1 on /boot type vfat (rw,noatime,nodiratime,umask=0,shortname=mixed)
root@atlas ~ #mount | grep boot | awk ' { print $1 } '
/dev/sds1

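Building on that example, here is a small sketch that turns the mount output into a whole-disk name that can be compared against /dev/[hs]d[a-z]. It assumes the flash drive is mounted at /boot (as in the output above) and that the partition is named like /dev/sds1, i.e. the disk name plus a numeric suffix. The function name flash_disk_from_mount is made up for illustration.

```shell
#!/bin/sh
# flash_disk_from_mount   (hypothetical helper, not from the thread)
# Reads `mount` output on stdin and prints the whole-disk device whose
# partition is mounted at /boot, e.g. /dev/sds1 becomes /dev/sds.
flash_disk_from_mount() {
    # Match the mount point field exactly, so paths that merely contain
    # "boot" are not picked up; then strip the trailing partition number.
    awk '$3 == "/boot" { print $1; exit }' | sed 's/[0-9]*$//'
}
```

Usage would be something like: flash=$(mount | flash_disk_from_mount). If nothing is mounted at /boot the result is empty, which the calling script should check for before excluding anything.
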
January 13, 2012
>> Can't parity be accessed at md0. I can't check because I'm not near a server.
No, it cannot.

January 13, 2012
>> Can't parity be accessed at md0. I can't check because I'm not near a server.
The device node is not there.

January 13, 2012 (Author)
>> root@atlas ~ #mount | grep boot | awk ' { print $1 } '
Excellent, I can easily work with that. Now I just need to find out how to determine the end block number of a disk.

January 14, 2012
>> Now just need to find out how to determine the end block # of a disk.
Try something with fdisk -ul /dev/sd?

January 14, 2012
>> try something with fdisk -ul /dev/sd?
Or, this is what I use:
blockdev --getsz /dev/sdX | awk '{ print $1 }'

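With the size in hand, the range check Peter described could look roughly like the sketch below. blockdev --getsz reports the size in 512-byte sectors, which lines up with dd's default 512-byte block size, so the numbers can be compared directly. The function name clamp_count and its argument order are made up for illustration.

```shell
#!/bin/sh
# clamp_count DISK_BLOCKS START COUNT   (hypothetical helper, not from the thread)
# Prints how many blocks can be read from START without running past the
# end of the disk. Returns 1 and prints nothing if START is at or beyond
# the end of the disk.
clamp_count() {
    disk_blocks=$1
    start=$2
    count=$3
    if [ "$start" -ge "$disk_blocks" ]; then
        return 1
    fi
    remaining=$((disk_blocks - start))
    if [ "$count" -gt "$remaining" ]; then
        count=$remaining
    fi
    printf '%s\n' "$count"
}
```

In a worker script this might be used as: blocks=$(blockdev --getsz /dev/sdX); count=$(clamp_count "$blocks" "$StartHere" 10000000) || exit 0. A disk that is too small to contain the bad block is simply skipped, and dd is never asked to seek past the end.
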
January 14, 2012
>> What impact is there on dd if=/dev/${DeviceAddr} skip=${StartHere} count=10000000 | md5sum -b if the SKIP exceeds the disk?
From your script's perspective, dd will produce no output on standard output (i.e., into your pipe), will return an error status to its caller (your script), and will write an error message to standard error (along with its usual report verbiage). Internally, dd does an llseek() to the byte offset corresponding to the skip (x 512) and gets an error return from the kernel with error code EINVAL, because that offset is invalid: it is beyond end-of-file (i.e., for a block device, its addressable capacity).

Similarly, if the skip is in range but the added count would exceed capacity, things will proceed normally and terminate normally (but prematurely) when end-of-file (i.e., addressable capacity) is reached. No error status is returned to the script, and no error message is written to standard error.

Short answer: Don't worry; be happy!

The only thing you might want to take precautions for, given what (I think) your goal is, is that your input from /dev/sdX will be coming through the kernel's buffer cache. Consequently, repeated runs of the same operation might never actually be read from the drive (i.e., they are fulfilled from the buffer cache), which could give your test a false positive. I believe there is a mechanism by which you can cause the kernel to discard its (read) buffers, but I don't know the details.

>> What impact would it have on the USB, Cache drive?
See above. A block device is a block device.

[ Note that I am not contradicting what JoeL wrote regarding the behavior of the disk drive itself. Fortunately, we can't get there from here: the kernel knows the addressable capacity of all its block devices and avoids that snafu. (Try the command "dmesg | grep logical".) ]

--UhClem

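One scripting detail to watch for: in a plain "dd ... | md5sum" pipeline, the shell's $? holds the exit status of the last command (md5sum), so dd's error status has to be captured separately. Below is one possible way to do that in plain sh, offered as a sketch only. The function name checksum_range and the temporary status file are made up for illustration.

```shell
#!/bin/sh
# checksum_range DEVICE SKIP COUNT   (hypothetical helper, not from the thread)
# Prints the md5 of COUNT 512-byte blocks starting at block SKIP.
# Returns 1 if dd itself failed, e.g. because the skip was out of range.
checksum_range() {
    dev=$1
    skip=$2
    count=$3
    status_file=$(mktemp) || return 2
    sum=$(
        {
            dd if="$dev" skip="$skip" count="$count" 2>/dev/null
            # Save dd's own exit status; the pipeline would otherwise hide it.
            echo $? > "$status_file"
        } | md5sum -b
    )
    dd_status=$(cat "$status_file")
    rm -f "$status_file"
    if [ "$dd_status" -ne 0 ]; then
        echo "dd failed on $dev (status $dd_status)" >&2
        return 1
    fi
    # md5sum prints "<hash> *-"; keep only the hash.
    printf '%s\n' "${sum%% *}"
}
```

That way, a disk on which dd fails is reported as an error rather than quietly producing the md5 of empty input.
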
January 14, 2012
>> I believe there is a mechanism by which you can cause the kernel to discard its (read) buffers, but I don't know the details.
Here is my drop_cache script:

#!/bin/sh
#
# To free pagecache:
# echo 1 > /proc/sys/vm/drop_caches
# To free dentries and inodes:
# echo 2 > /proc/sys/vm/drop_caches
# To free pagecache, dentries and inodes:
echo 3 > /proc/sys/vm/drop_caches

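Putting the cache drop together with the repeated reads, a multi-pass check might look something like this sketch. It assumes it is run as root on the server, since writing to drop_caches needs root; when that file is not writable the drop is skipped and the reads may be served from cache. The function name repeat_checksum is made up, and the default of 5 passes follows the number suggested earlier in the thread.

```shell
#!/bin/sh
# repeat_checksum DEVICE SKIP COUNT [PASSES]   (hypothetical helper)
# Reads the same block range PASSES times (default 5), dropping the kernel
# caches before each read, and returns 1 as soon as two passes disagree.
repeat_checksum() {
    dev=$1
    skip=$2
    count=$3
    passes=${4:-5}
    first=""
    pass=1
    while [ "$pass" -le "$passes" ]; do
        # Drop cached data so the read has to come from the drive again.
        if [ -w /proc/sys/vm/drop_caches ]; then
            ( echo 3 > /proc/sys/vm/drop_caches ) 2>/dev/null || true
        fi
        sum=$(dd if="$dev" skip="$skip" count="$count" 2>/dev/null | md5sum -b)
        sum=${sum%% *}
        echo "pass $pass: $sum"
        if [ -z "$first" ]; then
            first=$sum
        elif [ "$sum" != "$first" ]; then
            echo "MISMATCH on $dev at pass $pass" >&2
            return 1
        fi
        pass=$((pass + 1))
    done
}
```

A disk whose checksums differ between passes is returning inconsistent data for the same blocks, which is the symptom this test is meant to catch.
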
January 14, 2012
>> blockdev --getsz /dev/sdX | awk '{ print $1 }'
Much better solution.

January 15, 2012
Forgive a noob's intrusion, if you would... Is this all an attempt to help find which drive is causing a parity error when one shows up? I would be very interested in knowing how one can easily find which drive caused the bad parity calculation.

January 15, 2012 (Author)
Yes. The plan is to have this as another tool to use alongside the others to help determine where the issue may be. It is not a replacement for the other tools, just another one that might help.

January 16, 2012
>> is this all in an attempt to help find which drive is causing a parity error when one shows up?
It is and it isn't. The idea is to find out whether a disk, or its related hardware, is not working consistently and is returning bad data when the drive is read. So you will know that a certain disk is causing parity errors if it does not consistently return correct data. This isn't the only cause of parity errors, though.

Peter