WeeboTech Posted August 14, 2013

> Badblock's equivalent is the -b option, and badblocks seems to ignore bytes that don't constitute a full block.

My prior comment may have been in haste. I believe an incomplete -b (block size) is ignored; the -c option is how many -b blocks to read in one pass. If last_block is not supplied, badblocks calculates how many -b blocks fit in the total size, with the remainder discarded when the size is not evenly divisible. If there is a partial block at the end, it will probably be ignored, which is why we do not see the expected results for the 1 extra byte at the end.

What is the exact condition of your test with a hard drive? That's the telltale.

> We can use Joe.L's example from the original post:

That's not an accurate example. The numbers there are actually divisible by the block size; the count is based on complete blocks of -b bytes.

> I don't think the user of badblocks should burden himself with such calculations. I would expect that every single byte of the disk will be tested, or at least a warning be given otherwise.

Every block should be read unless indivisible values are supplied on the command line or the program is not used as intended. This is why I wanted to see your exact test against the drive itself; something is flawed here. Every single BLOCK of the disk should be read unless incorrect values are provided via -c and -b, and calculating -b and -c incorrectly will surely produce incorrect results.

The -b option sets the size of a single block read, and -c is how many -b blocks to read at once. If a read comes up short, an error is printed. I will re-confirm this, since I kept seeing those odd errors printed while I was trying different values. I believe badblocks counts from 0 to n blocks based on -b; if there is an extra byte or two past the end, it is not read. The total size must be evenly divisible by -b; anything else will cause a problem.
Normally these values are calculated automatically, so I would expect it to operate correctly under regular conditions. If you were to fill a hard drive with nulls and then try to add 4 more bytes at the end, you would fail: disks come in a set number of blocks, and you don't usually have an incomplete block on a disk. That's why the testing criteria are important. If there's a reproducible bug, it can be repaired, but I've never seen a disk with n blocks plus half a block unless something funky is going on with an HPA.

2,000,398,934,016 is evenly divisible by 1024.

I would really like to go through this more so I can compare your test against the source code. I can see exactly why adding 4 bytes to the end of a flat file produces what looks like a program bug, but in reality the tool is not working with complete disk blocks, so it miscalculates how many blocks to read. I'm not so sure this is a bug, since the program expects to work with disk blocks of a fixed size. Once I locate the math for automatically calculating last_block, it should be apparent.

My tests using minute values with -b 1 -c 1 show that data at the end of the file is picked up as invalid. I think the issue comes down to even divisibility of -b into the total size of whatever you are testing; any remainder is discarded, since the program expects to work with fixed blocks. I'll have to review it further. Another emergency is taking precedence.
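The divisibility check is easy to script. Here's a minimal sketch (the byte count is the 2 TB User Capacity figure quoted in this thread; the candidate -b values are arbitrary examples):

```shell
# For a given device size, report which -b values divide it evenly and how
# many trailing bytes badblocks would silently leave unread otherwise.
size=2000398934016          # User Capacity of the 2 TB disk in question
for bs in 1024 4096 65536; do
  rem=$(( size % bs ))
  if [ "$rem" -eq 0 ]; then
    echo "-b $bs: ok, all $(( size / bs )) blocks would be covered"
  else
    echo "-b $bs: $rem trailing bytes would never be read"
  fi
done
```

For this size, 1024 and 4096 divide evenly, while -b 65536 leaves a 24576-byte remainder.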
Barziya Posted August 14, 2013

> Badblock's equivalent is the -b option, and badblocks seem to ignore bytes that don't constitute a full block. My prior comment may have been in haste, I believe an incomplete -c (block size) is ignored. -b option is how many -c's to read in one instance.

You've mixed up the -b and the -c. It's actually the other way around: -b is the block-size, and -c is the number of such blocks to be tested at a time. That led you to completely misinterpret my previous post. The numbers there still hold: 2000398934016 is not evenly divisible by 65536, so any garbage you have in the last 24576 bytes of that disk will not be detected by the command Joe.L used in the OP.
WeeboTech Posted August 14, 2013

You are right, I flipped -b and -c. I was dealing with another situation: power loss at the office with the phone ringing constantly. I'm going to re-edit my prior post so as not to confuse the issue.

The root cause I'm finding is that running badblocks on a flat file whose size is not evenly divisible by the block size causes it to skip the last, incomplete block. This should not happen on a disk, since the disk is probed for its size, and that size is expected to already be divisible by the block size unless a user supplied something different.

    root@unRAID:/tmp# badblocks
    Usage: badblocks [-b block_size] [-i input_file] [-o output_file] [-svwnf]
           [-c blocks_at_once] [-d delay_factor_between_reads] [-e max_bad_blocks]
           [-p num_passes] [-t test_pattern [-t test_pattern [...]]]
           device [last_block [first_block]]

From the source:

    case 'c':
        blocks_at_once = parse_uint(optarg, "blocks at once");
        break;
    case 'b':
        block_size = parse_uint(optarg, "block size");
        break;
    ...
    bb_count = test_func(dev, last_block, block_size, first_block, blocks_at_once);
    ...
    set_o_direct(dev, buffer, try * block_size, ((ext2_loff_t) current_block) * block_size);
    ...
    got = read (dev, buffer, try * block_size);

Normally last_block is determined by the device size (which should be divisible by block_size):

    device_name = argv[optind++];
    if (optind > argc - 1) {
        errcode = ext2fs_get_device_size2(device_name, block_size, &last_block);
        if (errcode == EXT2_ET_UNIMPLEMENTED) {
            com_err(program_name, 0,
                _("Couldn't determine device size; you "
                  "must specify\nthe size manually\n"));
            exit(1);
        }
        if (errcode) {
            com_err(program_name, errcode,
                _("while trying to determine device size"));
            exit(1);
        }
    } else {
        errno = 0;
        last_block = parse_uint(argv[optind], _("last block"));
        last_block++;
        optind++;
    }
    if (optind <= argc-1) {
        errno = 0;
        first_block = parse_uint(argv[optind], _("first block"));
    } else
        first_block = 0;

2000398934016 is evenly divisible by -b 1024.
The total size needs to be evenly divisible by the block_size, not by the count. Reads happen in chunks of block_size * count, up until last_block. last_block is calculated from block_size and the total size of the file being tested, with the remainder dropped. That is why partial blocks at the end are not read when the values don't line up.

    root@unRAID:/tmp# dd if=/dev/zero of=zeros bs=1024 count=1024
    1024+0 records in
    1024+0 records out
    1048576 bytes (1.0 MB) copied, 0.0040629 s, 258 MB/s
    root@unRAID:/tmp# ls -l zeros
    -rw-rw-rw- 1 root root 1048576 Aug 14 18:15 zeros
    root@unRAID:/tmp# badblocks -b 1024 -c 50000 -t0x00 -v zeros
    Checking blocks 0 to 1023
    Checking for bad blocks in read-only mode
    Testing with pattern 0x00: done
    Pass completed, 0 bad blocks found.
    root@unRAID:/tmp# echo "testing" >> zeros
    root@unRAID:/tmp# ls -l zeros
    -rw-rw-rw- 1 root root 1048584 Aug 14 18:18 zeros
    root@unRAID:/tmp# badblocks -b 1024 -c 50000 -t0x00 -v zeros
    Checking blocks 0 to 1023
    Checking for bad blocks in read-only mode
    Testing with pattern 0x00: done
    Pass completed, 0 bad blocks found.

NOT DETECTED, SINCE last_block IS STILL 1023 (1048584 / 1024 = 1024.0078125).

Add a full block:

    root@unRAID:/tmp# dd if=/dev/zero bs=1024 count=1 >> zeros
    1+0 records in
    1+0 records out
    1024 bytes (1.0 kB) copied, 3.8096e-05 s, 26.9 MB/s
    root@unRAID:/tmp# badblocks -b 1024 -c 50000 -t0x00 -v zeros
    Checking blocks 0 to 1024
    Checking for bad blocks in read-only mode
    Testing with pattern 0x00: 1024
    done
    Pass completed, 1 bad blocks found.

THE INCORRECT VALUE IS NOW DETECTED.

Notice how it calculated the "Checking blocks" range differently now. last_block seems to be int(total_size / block_size). The program wasn't designed to work with a total_size that is not a multiple of block_size. It's clear: it's designed to work with fixed disk blocks.

If you can supply the test case where the disk was written with invalid values and the program failed to detect them, I would like to see it. If I can reproduce it, I can fix it.
From what I'm seeing in my tests, the start is detected, the middle is detected, and the end is detected up to and including the last block, i.e. int(total_size / block_size). This is a known factor: a remainder of a partial block is not expected on a physical disk.
Barziya Posted August 14, 2013

> 2000398934016 is evenly divisible by -b 1024.

You're still confusing things. The OP used "badblocks -b 65536".

> If you can supply the test case where the disk was written to with invalid values and the program failed to detect it I would like to see it. If I can reproduce it, I can fix it.

There's your test case, in the first post of this thread. If you use the exact same command that the OP did, then badblocks won't touch whatever's in the last 24576 bytes of that disk:

> The full badblocks command I used was:
>
>     badblocks -c 1024 -b 65536 -vsw -o /boot/badblocks_out_sdl.txt /dev/sdl
>
> The smart report looks like this:
>
>     User Capacity: 2,000,398,934,016 bytes
WeeboTech Posted August 15, 2013

> You're still confusing things. The OP used "badblocks -b 65536". There's your test case, in the first post of this thread. If you use the exact same command that the OP did, then badblocks won't touch whatever's in the last 24576 bytes of that disk:
>
>     badblocks -c 1024 -b 65536 -vsw -o /boot/badblocks_out_sdl.txt /dev/sdl
>     User Capacity: 2,000,398,934,016 bytes

It's incorrect usage: it will ignore anything not evenly divisible, which is what I've been stating. It's a known issue and has been for years; I remember this being the case from long ago. The problem is the program doesn't report it. It expects you to know what you're doing when you redefine what would have been retrieved from the file system call. I remember the warnings from years ago on the format command too. Redo the test with sensible values and it should work correctly.
WeeboTech Posted August 15, 2013

> I wrote all zeros to a disk, then intentionally wrote a few non-zero bytes here and there. To my dismay, a read-only badblocks pattern test for zeros passed with flying colors:
>
>     Checking for bad blocks in read-only mode
>     Testing with pattern 0x00: done
>     Pass completed, 0 bad blocks found. (0/0/0 errors)
>
> Yet, I could hexdump the disk and see the non-zero bytes.

This is the test that needs review. Joe.L's usage will have issues; my own tests using those conditions reveal issues. It's known that ending blocks are not read if you supply values that do not make sense for the geometry of a disk and then let it auto-calculate the block count from uneven amounts. It's the "here and there" that's the issue.

If you don't tell badblocks the first and last block, it calculates last_block from the block_size. If you give it a block size that does not divide the total size evenly, it will not work correctly. If badblocks is allowed to use what it detects from the ext2fs call, do these conditions come up? In fact, it tells us which blocks it is reading with Joe's command line; none of us caught that it missed a partial block.

In normal usage I don't touch the block size unless it's a multiple of the disk size. In every invocation I've done, I let it calculate the first and last block from what it finds on the file system; I have only used -c to increase the count. My other tests show this works:

    Disk /dev/md3: 3000.6 GB, 3000592928768 bytes
    root@unRAID:/tmp# badblocks -vs /dev/md3
    Checking blocks 0 to 2930266531
    root@unRAID:/tmp# badblocks -vs -c 1024 /dev/md3
    Checking blocks 0 to 2930266531
    root@unRAID:/tmp# badblocks -vs -b 1024 -c 65536 /dev/md3
    Checking blocks 0 to 2930266531
    root@unRAID:/tmp# badblocks -vs -c 1024 -b 4096 /dev/md3
    Checking blocks 0 to 732566632
    root@unRAID:/tmp# badblocks -vs -c 1024 -b 65536 /dev/md3
    Checking blocks 0 to 45785413

That last one clearly is not going to work, as some blocks have now been discarded due to truncation.
3000592928768 / 65536 = 45785414.5625
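The same integer-division cutoff can be checked for the 3 TB md3 device above. A quick sketch of how many bytes each -b choice leaves untested:

```shell
# last_block is int(total_size / block_size); anything left over is skipped.
size=3000592928768          # /dev/md3 from the fdisk line above
for bs in 1024 4096 65536; do
  last_block=$(( size / bs ))
  skipped=$(( size - last_block * bs ))
  echo "-b $bs: checks blocks 0 to $(( last_block - 1 )), skips $skipped bytes"
done
```

For -b 1024 and -b 4096 nothing is skipped, and the "checks blocks 0 to ..." values match the badblocks output above; for -b 65536, the 0.5625 fractional block means 36864 bytes are never read.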
Barziya Posted August 15, 2013

> It's the "here and there" that's the issue.

The "here and there" was a poor choice of words, I apologize. How I discovered this issue was, I wanted to "preclear" the disk with a single write-pass of badblocks with a zero pattern. Then I hexdumped the disk for peace of mind, and I saw the garbage at the end. Trying to reproduce the thing, I zeroed out the whole disk with dd and just wrote a few non-zero bytes near the end. The single read-pass of badblocks passed that, and that freaked me out, and I reported it here. So, the non-zero bytes weren't really "here and there"; they were mostly near the end of the disk.

Now I understand what's happened. Still, that's not how I would expect badblocks to work. I guess I'll always have to make careful calculations myself before invoking badblocks. That's certainly not intuitive behavior.
WeeboTech Posted August 15, 2013

> The "here and there" was a poor choice of words, I apologize. How I discovered this issue was, I wanted to "preclear" the disk with a single write-pass of badblocks with a zero pattern. Then I hexdumped the disk for peace of mind, and I saw the garbage at the end. Trying to reproduce the thing, I zeroed out the whole disk with dd and just wrote a few non-zero bytes near the end. The single read-pass of badblocks passed that, and that freaked me out, and I reported it here. So, the non-zero bytes weren't really "here and there"; they were mostly near the end of the disk. Now I understand what's happened. Still, that's not how I would expect badblocks to work. I guess I'll always have to make careful calculations myself before invoking badblocks. That's certainly not intuitive behavior.

It works correctly if you let it get the information from the file system call. Joe's example, albeit for speed, overrode what the program would have retrieved from that call. At the very least you've brought up a real potential issue for people who use the program and alter the block size. Thanks for your patience with this one; I was in fireman mode today.

It's not that it did a short read. It's that the calculated values say to read only from 0 to here, where "here" is calculated incorrectly if an uneven block size is supplied. I suppose if we were ever to use badblocks to replace the dd part, we would print a warning when the calculated block count has a remainder. However, the tool is designed to let you pick your own values if you want to go over a specific set of blocks on your disk. For example, I remember reading a few pages years back about using badblocks to read from/to problematic blocks with -p num_passes, i.e. to force a reallocation.
Even in my own recovery attempt a year or two back, it was important to supply specific values to read and re-read blocks in specific areas to help recover a failing disk.
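For that kind of targeted re-read, the block arguments have to be derived from byte offsets. A hypothetical sketch (the suspect offset, window size, and /dev/sdX are made-up example values; note from the usage text that badblocks takes last_block before first_block):

```shell
# Convert a suspect byte offset into badblocks first/last block indices, so a
# -p multi-pass run can hammer just that region. All numbers are examples.
bs=4096
suspect=1234567890                      # byte offset of the problem area (made up)
window=$(( 64 * 1024 * 1024 ))          # re-read a 64 MiB window around it
first=$(( (suspect - window / 2) / bs ))
last=$(( (suspect + window / 2) / bs ))
echo "badblocks -b $bs -p 5 -sv /dev/sdX $last $first"
```

The echo just shows the command that would be run; drop the echo (and substitute the real device) to actually exercise the region.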
Barziya Posted August 15, 2013

> It works correctly if you let it get the information from the file system call. Joe's example, albeit for speed, overrode what the program would have retrieved from that call.

No, badblocks doesn't do any calculations about what the -b and -c values should be. If you don't supply -b and -c, it just uses default values of 1024 and 64 respectively, which coincidentally work for (most?) disks. The only calculation, apparently, is the total number of bytes integer-divided by the -b block size to get the number of blocks it will work on.

Now that we know that badblocks cuts off along the -b block-size line, I just need to do some more investigation to see whether it also cuts off along the -c blocks-at-a-time line. If that turns out to be true, then there's also a potential for garbage resulting from improper -c values.
WeeboTech Posted August 15, 2013

> No, badblocks doesn't do any calculations about what the -b and -c values should be. If you don't supply -b and -c, it just uses default values of 1024 and 64 respectively, which coincidentally work for (most?) disks. The only calculation, apparently, is the total number of bytes integer-divided by the -b block size to get the number of blocks it will work on. Now that we know that badblocks cuts off along the -b block-size line, I just need to do some more investigation to see whether it also cuts off along the -c blocks-at-a-time line. If that turns out to be true, then there's also a potential for garbage resulting from improper -c values.

It calculates last_block based on the block size and what is returned by the device/filesystem calls.

    typedef __u64 blk64_t

    if (optind > argc - 1) {
        errcode = ext2fs_get_device_size2(device_name, block_size, &last_block);
        if (errcode == EXT2_ET_UNIMPLEMENTED) {
            com_err(program_name, 0,
                _("Couldn't determine device size; you "
                  "must specify\nthe size manually\n"));

In another library:

    /*
     * Returns the number of blocks in a partition
     */
    errcode_t ext2fs_get_device_size2(const char *file, int blocksize, blk64_t *retblocks)
    {

Examples here: http://libext2fs-wii.googlecode.com/svn-history/r20/trunk/source/getsize.c

The -c should not be an issue. In do_read, the read amount is block_size * try (the count). It will return whatever was read, up to and including the maximum if it was available, just like before.
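The batching behavior is easy to model. This sketch is not the actual badblocks source; it just mirrors the try = min(count, remaining) logic described above, showing that the final short batch is still read:

```shell
# Model of the read loop: 10 total blocks, -c 4 blocks at once.
# Batches come out as 4, 4, 2 -- the final short batch is read in full.
blocks=10
c=4
current=0
while [ "$current" -lt "$blocks" ]; do
  try=$(( blocks - current ))
  if [ "$try" -gt "$c" ]; then
    try=$c
  fi
  echo "read $try blocks at block $current"
  current=$(( current + try ))
done
```

So an "improper" -c only changes the batch sizes; it never drops blocks, which matches what the thread concludes below about -c being harmless.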
Barziya Posted August 15, 2013

> Now that we know that badblocks cuts off along the -b block-size line, I just need to do some more investigation to see whether it also cuts off along the -c blocks-at-a-time line.

It turned out that weird -c blocks-at-a-time values don't cause badblocks to ignore any blocks when the last batch of blocks is smaller than the -c value. So we need not worry about -c. The only thing we need to ensure is that the total byte count of the disk/partition/file is evenly divisible by the -b block size; then every single byte of it will be tested.

Once I had that verified to my satisfaction, I did some speed tests on a couple of disks. I noticed that bumping the default -b block size of 1024 to values higher than 4096 doesn't bring any additional speed improvement. The command I liked best in the end was:

    badblocks -vsw -b 4096 -c 1024 -d 1 /dev/sdX

While the above was running, I had a separate background task doing a read from a random place on the disk once every few seconds. That's a preclear setup I am happy with. I am glad we sorted this out. Thanks Weebo!
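Given that divisibility requirement, a guard in front of the badblocks invocation is cheap. A sketch using a scratch file so the demo is self-contained; on a real disk the size would come from `blockdev --getsize64 /dev/sdX` instead (assumption: util-linux is available):

```shell
# Refuse to run unless the target's size is a multiple of the chosen -b.
bs=4096
target=/tmp/preclear_guard_demo
dd if=/dev/zero of="$target" bs=1024 count=9 2>/dev/null   # 9216 bytes: NOT a multiple of 4096
size=$(wc -c < "$target")
if [ $(( size % bs )) -ne 0 ]; then
  echo "refusing: $(( size % bs )) trailing bytes would go untested"
else
  echo "would run: badblocks -vsw -b $bs -c 1024 -d 1 $target"
fi
rm -f "$target"
```

With the 9216-byte demo file this prints the "refusing" branch, since 1024 trailing bytes would be skipped at -b 4096.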
BobPhoenix Posted August 15, 2013

> That is too bad! I had very high hopes for badblocks. See, I was investigating alternate ways to do a preclear, before Google brought me to this thread here. The thing that started me was the sum command preclear uses for the post-read, which is a very, very slow way to do that. I might as well capture and parse the output of a `hexdump /dev/sdX | head` and do the job five times faster. In the end, I narrowed my choices down to cmp and badblocks. I especially liked badblocks for its -d option (read delay factor), which could make the system more responsive during a long-running preclear. It's too bad I can't rely on badblocks to go through every single byte on the disk; that's a show-stopper.

You do realize that Joe L is making preclear do random seeks in addition to the compares, to stress the disks more with additional seeks? That is one reason the post-read is slow compared to the pre-read and write.
Barziya Posted August 15, 2013

> You do realize that Joe L is making preclear do random seeks in addition to the compares to stress out the disks more with additional seeks?

You do realize that I do realize that? For homework, patch that script to do the `sum` with -s, and watch it do the job twice as fast, which is still nowhere near how fast the other methods do it, especially on older CPUs or on ULV CPUs.
BobPhoenix Posted August 15, 2013

> You do realize that I do realize that? For homework, patch that script to do the `sum` with -s, and watch it do the job twice as fast, which is still nowhere near how fast the other methods do it, especially on older CPUs or on ULV CPUs.

Just making sure. I'm not disputing it could be faster; it just didn't seem like what you wanted would stress-test the disk. It seemed like it would just read start to finish and compare, without the random seeks, but I will take your word for it.
WeeboTech Posted August 16, 2013

Is the whole disk read with random seeks? I thought there was a test with tons of random read seeks, but that it was just there to shake the heads around, i.e. it did not actually compare data across the whole drive. Am I mistaken?

In any case, badblocks can do the random seeks: you give it a starting block and a last block. So if a program kept a table, it could read every block by telling badblocks which starting and ending block to use, checking the return value, and marking those blocks as read. Once all blocks have been read, the test is over.
Barziya Posted August 16, 2013

> In any case, badblocks can do the random seeks.

No need for that, really. The random seeks are just there to add some extra stress to the heads while badblocks is doing the real job of writing/verifying. The random seeks are not speed-critical, and they can be done from a simple Bash script. What I do is, my "preclear" script spawns a simple child shell that does random reads from the disk every second or so. Then the preclear script invokes the badblocks command, and once badblocks is done, the script just kills the child shell. There's no need for anything more complicated than that.
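A minimal sketch of that spawn/kill pattern, using a scratch file so it is self-contained; in the real script the target would be /dev/sdX and the sleep would be the long badblocks run:

```shell
#!/bin/bash
# Background "seeker": random 4 KiB reads from the target about once a second,
# killed as soon as the foreground job finishes.
target=/tmp/seek_demo
dd if=/dev/zero of="$target" bs=1M count=8 2>/dev/null

( while true; do
    # pick a random 4 KiB block within the 8 MiB file (RANDOM is bash-specific)
    dd if="$target" of=/dev/null bs=4096 skip=$(( RANDOM % 2048 )) count=1 2>/dev/null
    sleep 1
  done ) &
seeker=$!

sleep 2                      # stand-in for the long-running badblocks pass
kill "$seeker"
wait "$seeker" 2>/dev/null || true
rm -f "$target"
echo "seeker stopped"
```

Against a real disk you would want larger random skips (spread across the whole device) so the heads actually travel, but the spawn/kill structure is the same.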
BobPhoenix Posted August 17, 2013

> No need for that, really. The random seeks are just there to add some extra stress to the heads while badblocks is doing the real job of writing/verifying. The random seeks are not speed-critical, and they can be done from a simple Bash script. What I do is, my "preclear" script spawns a simple child shell that does random reads from the disk every second or so. Then the preclear script invokes the badblocks command, and once badblocks is done, the script just kills the child shell. There's no need for anything more complicated than that.

Ah! Now that sounds like a plan. I understand now why it is better.
WeeboTech Posted August 17, 2013

> Ah! Now that sounds like a plan. I understand now why it is better.

What I like about Joe's test is that it really shakes (rattles and rolls) the drive for a few moments. I remember seeing the 'status' come up and hearing the drive go crazy for a short while. While it concerned me, it also gave me a level of confidence in the drive. I think it's a worthy test.
Barziya Posted August 17, 2013

> What I like about Joe's test is that it really shakes (rattles and rolls) the drive ... I think it's a worthy test.

Yes, it is. I am talking about exactly the same shakes, rattles, and rolls as Joe's script does them. Only I don't really need 2266 lines of Bash code to accomplish that if I can do it in about 50 lines.
darkside40 Posted July 14, 2015

After three years, one of my Hitachi 5K3000 drives showed some write errors, so I replaced it with a new drive. Now I'm letting it run a cycle with the preclear script. What do you think: if it passes that cycle, would it be okay to use that HDD somewhere where less important data is handled (outside of my unRAID)?
c3 Posted July 14, 2015

> After three years, one of my Hitachi 5K3000 drives showed some write errors, so I replaced it with a new drive. ... if it passes that cycle, would it be okay to use that HDD somewhere where less important data is handled?

Without data (a SMART report), it's just a guess.
darkside40 Posted July 14, 2015 Share Posted July 14, 2015 Okay here is the result of the preclear run: Disk: /dev/sdb smartctl 6.2 2013-07-26 r3841 [i686-linux-3.9.11p-unRAID] (local build) Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org === START OF INFORMATION SECTION === Model Family: Hitachi Deskstar 5K3000 Device Model: Hitachi HDS5C3020ALA632 Serial Number: XXXXXXXXXXXXXXXXXX LU WWN Device Id: 5 000cca 369d50055 Firmware Version: ML6OA5C0 User Capacity: 2,000,398,934,016 bytes [2.00 TB] Sector Size: 512 bytes logical/physical Rotation Rate: 5940 rpm Device is: In smartctl database [for details use: -P show] ATA Version is: ATA8-ACS T13/1699-D revision 4 SATA Version is: SATA 2.6, 6.0 Gb/s (current: 3.0 Gb/s) Local Time is: Tue Jul 14 22:47:03 2015 CEST SMART support is: Available - device has SMART capability. SMART support is: Enabled === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED General SMART Values: Offline data collection status: (0x84) Offline data collection activity was suspended by an interrupting command from host. Auto Offline Data Collection: Enabled. Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run. Total time to complete Offline data collection: (24664) seconds. Offline data collection capabilities: (0x5b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. No Conveyance Self-test supported. Selective Self-test supported. SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer. Error logging capability: (0x01) Error logging supported. General Purpose Logging supported. Short self-test routine recommended polling time: ( 1) minutes. Extended self-test routine recommended polling time: ( 411) minutes. 
SCT capabilities: (0x003d) SCT Status supported. SCT Error Recovery Control supported. SCT Feature Control supported. SCT Data Table supported. SMART Attributes Data Structure revision number: 16 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE 1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always - 0 2 Throughput_Performance 0x0005 134 134 054 Pre-fail Offline - 103 3 Spin_Up_Time 0x0007 161 161 024 Pre-fail Always - 356 (Average 325) 4 Start_Stop_Count 0x0012 098 098 000 Old_age Always - 9528 5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 0 7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always - 0 8 Seek_Time_Performance 0x0005 146 146 020 Pre-fail Offline - 29 9 Power_On_Hours 0x0012 099 099 000 Old_age Always - 12112 10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always - 0 12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 3386 192 Power-Off_Retract_Count 0x0032 093 093 000 Old_age Always - 9533 193 Load_Cycle_Count 0x0012 093 093 000 Old_age Always - 9533 194 Temperature_Celsius 0x0002 146 146 000 Old_age Always - 41 (Min/Max 15/43) 196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0 197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 0 198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0 199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 21 SMART Error Log Version: 1 ATA Error Count: 21 (device log contains only the most recent five errors) CR = Command Register [HEX] FR = Features Register [HEX] SC = Sector Count Register [HEX] SN = Sector Number Register [HEX] CL = Cylinder Low Register [HEX] CH = Cylinder High Register [HEX] DH = Device/Head Register [HEX] DC = Device Command Register [HEX] ER = Error register [HEX] ST = Status register [HEX] Powered_Up_Time is measured from power on, and printed as DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes, SS=sec, and sss=millisec. It "wraps" after 49.710 days. 
Error 21 occurred at disk power-on lifetime: 12026 hours (501 days + 2 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 84 51 01 7e cc 00 00 Error: ICRC, ABRT at LBA = 0x0000cc7e = 52350 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- ca 00 08 77 cc 00 e0 00 05:53:28.285 WRITE DMA ef 10 02 00 00 00 a0 00 05:53:28.285 SET FEATURES [Enable SATA feature] 27 00 00 00 00 00 e0 00 05:53:28.285 READ NATIVE MAX ADDRESS EXT [OBS-ACS-3] ec 00 00 00 00 00 a0 00 05:53:28.284 IDENTIFY DEVICE ef 03 46 00 00 00 a0 00 05:53:28.284 SET FEATURES [set transfer mode] Error 20 occurred at disk power-on lifetime: 12026 hours (501 days + 2 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 84 51 01 7e cc 00 00 Error: ICRC, ABRT at LBA = 0x0000cc7e = 52350 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- ca 00 08 77 cc 00 e0 00 05:53:27.970 WRITE DMA ef 10 02 00 00 00 a0 00 05:53:27.970 SET FEATURES [Enable SATA feature] 27 00 00 00 00 00 e0 00 05:53:27.970 READ NATIVE MAX ADDRESS EXT [OBS-ACS-3] ec 00 00 00 00 00 a0 00 05:53:27.969 IDENTIFY DEVICE ef 03 46 00 00 00 a0 00 05:53:27.969 SET FEATURES [set transfer mode] Error 19 occurred at disk power-on lifetime: 12026 hours (501 days + 2 hours) When the command that caused the error occurred, the device was active or idle. 
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 01 7e cc 00 00  Error: ICRC, ABRT at LBA = 0x0000cc7e = 52350

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ca 00 08 77 cc 00 e0 00      05:53:27.655  WRITE DMA
  ef 10 02 00 00 00 a0 00      05:53:27.655  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 e0 00      05:53:27.655  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 00      05:53:27.654  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00      05:53:27.654  SET FEATURES [set transfer mode]

Error 18 occurred at disk power-on lifetime: 12026 hours (501 days + 2 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 01 7e cc 00 00  Error: ICRC, ABRT at LBA = 0x0000cc7e = 52350

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ca 00 08 77 cc 00 e0 00      05:53:27.340  WRITE DMA
  c8 00 08 7f cc 00 e0 00      05:53:27.340  READ DMA
  c8 00 08 77 cc 00 e0 00      05:53:18.370  READ DMA
  e5 00 00 00 00 00 40 00      05:52:38.165  CHECK POWER MODE
  e5 00 00 00 00 00 40 00      05:51:38.527  CHECK POWER MODE

Error 17 occurred at disk power-on lifetime: 11618 hours (484 days + 2 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 01 66 dd 00 00  Error: ICRC, ABRT at LBA = 0x0000dd66 = 56678

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ca 00 08 5f dd 00 e0 00      02:48:16.360  WRITE DMA
  c8 00 08 5f dd 00 e0 00      02:48:16.352  READ DMA
  ea 00 00 00 00 00 a0 00      02:48:16.341  FLUSH CACHE EXT
  ca 00 08 6f dd 00 e0 00      02:48:16.341  WRITE DMA
  ca 00 08 67 dd 00 e0 00      02:48:16.341  WRITE DMA

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Aborted by host               90%     12046         -
# 2  Short offline       Completed without error       00%     12046         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

I can't see anything obvious. Maybe it was just a faulty cable?
RobJ Posted July 15, 2015

Over 12,000 hours of operation, and all I can see wrong are 21 corrupted packets! Nothing wrong with the drive. And the CRC errors would not have caused write errors; they are retried until successful. About 86 operational hours ago there was a cluster of 4 CRC errors, which is odd: possibly strong interference near the cable, or possibly some really flaky power. Perhaps there were power issues then and they caused the write errors, but at the moment that looks like a fluke. No reason at all you should not trust this drive. The cable may not be perfect, but it certainly isn't terrible, with only 21 errors in over 12,000 hours.
darkside40 Posted July 15, 2015

Yep, the UDMA_CRC_Error_Count is far higher on my parity drive:

root@HTMS:~# smartctl --all /dev/sdb | grep 199
199 UDMA_CRC_Error_Count    0x003e   200   199   000    Old_age   Always       -       77
root@HTMS:~# smartctl --all /dev/sdc | grep 199
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
root@HTMS:~# smartctl --all /dev/sdd | grep 199
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0
root@HTMS:~# smartctl --all /dev/sde | grep 199
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       4
root@HTMS:~# smartctl --all /dev/sdf | grep 199
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       2
root@HTMS:~# smartctl --all /dev/sdg | grep 199
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       2
root@HTMS:~# smartctl --all /dev/sdh | grep 199
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

Maybe the cable is not the best?
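If you just want the raw counts rather than the whole grep line, a small awk filter over the smartctl attribute table works. This is only a sketch: it assumes the standard `smartctl -A` column layout, where the attribute ID is the first field and the raw value is the last.

```shell
#!/bin/sh
# Extract the raw value (last column) of SMART attribute 199 from
# smartctl attribute-table output. Sketch only: assumes the standard
# "ID# ATTRIBUTE_NAME ... RAW_VALUE" layout shown above.
crc_count() {
  awk '$1 == 199 { print $NF }'
}

# Example using a captured attribute line; on a live system you would
# pipe "smartctl -A /dev/sdb" into crc_count instead.
echo "199 UDMA_CRC_Error_Count 0x003e 200 199 000 Old_age Always - 77" | crc_count
```

Looping the same filter over /dev/sd? would give one count per drive without the repeated grep commands.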
RobJ Posted July 15, 2015

Quote: "Yep, the UDMA_CRC_Error_Count is far higher on my parity drive. Maybe the cable is not the best?"

I'd replace it.
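One note on checking whether a replacement cable helped: the CRC counter is cumulative and does not reset, so the useful test is whether it keeps climbing. A minimal sketch of that comparison, assuming a hypothetical baseline file recorded when the cable was swapped (the counts here are hard-coded stand-ins for real smartctl readings):

```shell
#!/bin/sh
# Compare a previously saved CRC count against a new reading.
# Sketch only: the baseline file path and both counts are hypothetical;
# on a live system new_count would come from:
#   smartctl -A /dev/sdb | awk '$1 == 199 { print $NF }'
baseline_file=/tmp/crc_baseline_sdb
echo 77 > "$baseline_file"   # count recorded when the cable was replaced

new_count=77                 # stand-in for a fresh smartctl reading
old_count=$(cat "$baseline_file")

if [ "$new_count" -gt "$old_count" ]; then
  echo "CRC errors still increasing: $old_count -> $new_count"
else
  echo "No new CRC errors since baseline"
fi
```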