Everything posted by vca

  1. The preclear of a pair of 4TB Seagate desktop drives finished on the weekend. So here are the results for the old and new (32bit) preclears. Note the old preclear was done on a 2 pass basis.

     == invoked as: ./preclear_disk.sh -c 2 /dev/sdd
     == ST4000DM000-1F2168 Z30093E3
     == Disk /dev/sdd has been successfully precleared
     == with a starting sector of 1
     == Ran 2 cycles
     ==
     == Using :Read block size = 8388608 Bytes
     == Last Cycle's Pre Read Time : 10:57:55 (101 MB/s)
     == Last Cycle's Zeroing time : 10:02:21 (110 MB/s)
     == Last Cycle's Post Read Time : 22:53:14 (48 MB/s)
     == Last Cycle's Total Time : 32:56:35
     ==
     == Total Elapsed Time 76:56:37

     == invoked as: ./pc15b.sh -f /dev/sdc
     == ST4000DM000-1F2168 Z30093E3
     == Disk /dev/sdc has been successfully precleared
     == with a starting sector of 1
     == Ran 1 cycle
     ==
     == Using :Read block size = 8388608 Bytes
     == Last Cycle's Pre Read Time : 11:07:18 (99 MB/s)
     == Last Cycle's Zeroing time : 9:55:48 (111 MB/s)
     == Last Cycle's Post Read Time : 11:39:35 (95 MB/s)
     == Last Cycle's Total Time : 32:43:43
     ==
     == Total Elapsed Time 32:43:43

     == invoked as: ./preclear_disk.sh -c 2 /dev/sdc
     == ST4000DM000-1F2168 W3002WDC
     == Disk /dev/sdc has been successfully precleared
     == with a starting sector of 1
     == Ran 2 cycles
     ==
     == Using :Read block size = 8388608 Bytes
     == Last Cycle's Pre Read Time : 10:56:18 (101 MB/s)
     == Last Cycle's Zeroing time : 9:36:47 (115 MB/s)
     == Last Cycle's Post Read Time : 23:18:09 (47 MB/s)
     == Last Cycle's Total Time : 32:55:57
     ==
     == Total Elapsed Time 76:35:13

     == invoked as: ./pc15b.sh -f /dev/sdd
     == ST4000DM000-1F2168 W3002WDC
     == Disk /dev/sdd has been successfully precleared
     == with a starting sector of 1
     == Ran 1 cycle
     ==
     == Using :Read block size = 8388608 Bytes
     == Last Cycle's Pre Read Time : 11:07:22 (99 MB/s)
     == Last Cycle's Zeroing time : 9:56:03 (111 MB/s)
     == Last Cycle's Post Read Time : 11:39:59 (95 MB/s)
     == Last Cycle's Total Time : 32:44:25
     ==
     == Total Elapsed Time 32:44:25
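     As a sanity check on the MB/s figures: each rate is just the drive capacity divided by the elapsed time for that phase. A quick illustration in Python (this is not part of the preclear script; it assumes decimal megabytes, which is what the figures above line up with):

         capacity_bytes = 4_000_000_000_000          # 4 TB drive
         zeroing_seconds = 10 * 3600 + 2 * 60 + 21   # the 10:02:21 zeroing time above
         print(int(capacity_bytes / zeroing_seconds / 1_000_000), "MB/s")   # -> 110 MB/s

     Regards, Stephen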
  2. I have two of these drives. I used one in my unRAID server for about a year without any issues, but recently I replaced it with the NAS version (I'll use the desktop version for backup storage). The thing that was bothering me about these drives was the UDMA_CRC_Error_Count, though most of that may have come from one cable problem. I just finished retesting these drives by preclearing them, without any indications of trouble. Both of these drives also show large values for the Seek_Error_Rate, and they also have the 60/30 numbers for the worst and thresh normalized values - so I figure these are typical of this particular drive. Here are the SMART reports from my drives so you can compare (note that the newer version of unRAID has an updated version of the smart tool that gives some better attribute names than the version you have):

     First drive:

     ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
       1 Raw_Read_Error_Rate 0x000f 120 099 006 Pre-fail Always - 2137384
       3 Spin_Up_Time 0x0003 092 092 000 Pre-fail Always - 0
       4 Start_Stop_Count 0x0032 095 095 020 Old_age Always - 5765
       5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
       7 Seek_Error_Rate 0x000f 071 060 030 Pre-fail Always - 13846410
       9 Power_On_Hours 0x0032 093 093 000 Old_age Always - 6620
      10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
      12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 36
     183 Runtime_Bad_Block 0x0032 098 098 000 Old_age Always - 2
     184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
     187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
     188 Command_Timeout 0x0032 100 100 000 Old_age Always - 0 0 0
     189 High_Fly_Writes 0x003a 098 098 000 Old_age Always - 2
     190 Airflow_Temperature_Cel 0x0022 071 059 045 Old_age Always - 29 (Min/Max 21/34)
     191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 0
     192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 9
     193 Load_Cycle_Count 0x0032 092 092 000 Old_age Always - 16430
     194 Temperature_Celsius 0x0022 029 041 000 Old_age Always - 29 (0 20 0 0 0)
     197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
     198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
     199 UDMA_CRC_Error_Count 0x003e 200 195 000 Old_age Always - 6949
     240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 1089h+21m+15.431s
     241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 51506131152
     242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 222555473047

     Second drive:

     ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
       1 Raw_Read_Error_Rate 0x000f 117 099 006 Pre-fail Always - 125981968
       3 Spin_Up_Time 0x0003 092 091 000 Pre-fail Always - 0
       4 Start_Stop_Count 0x0032 098 098 020 Old_age Always - 2281
       5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
       7 Seek_Error_Rate 0x000f 067 060 030 Pre-fail Always - 5927616
       9 Power_On_Hours 0x0032 097 097 000 Old_age Always - 2670
      10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
      12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 31
     183 Runtime_Bad_Block 0x0032 099 099 000 Old_age Always - 1
     184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
     187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
     188 Command_Timeout 0x0032 100 100 000 Old_age Always - 0 0 0
     189 High_Fly_Writes 0x003a 091 091 000 Old_age Always - 9
     190 Airflow_Temperature_Cel 0x0022 069 050 045 Old_age Always - 31 (Min/Max 21/36)
     191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 0
     192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 8
     193 Load_Cycle_Count 0x0032 097 097 000 Old_age Always - 6649
     194 Temperature_Celsius 0x0022 031 050 000 Old_age Always - 31 (0 21 0 0 0)
     197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
     198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
     199 UDMA_CRC_Error_Count 0x003e 200 194 000 Old_age Always - 668
     240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 475h+23m+44.493s
     241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 44747895302
     242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 147844353977
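     If you want to pull these attributes out programmatically rather than copying them from the GUI, something along these lines works with smartmontools installed (a rough sketch only - the device path is just an example, you will generally need root, and the raw column is kept as text because some values, like the temperature lines above, carry extra fields):

         import subprocess

         def smart_attributes(device):
             """Parse 'smartctl -A' output into {name: (value, worst, thresh, raw)}."""
             out = subprocess.run(["smartctl", "-A", device],
                                  capture_output=True, text=True).stdout
             attrs = {}
             for line in out.splitlines():
                 parts = line.split()
                 # Attribute rows start with a numeric ID, e.g. "199 UDMA_CRC_Error_Count ..."
                 if len(parts) >= 10 and parts[0].isdigit():
                     attrs[parts[1]] = (int(parts[3]), int(parts[4]), int(parts[5]),
                                        " ".join(parts[9:]))
             return attrs

         print(smart_attributes("/dev/sdc").get("UDMA_CRC_Error_Count"))

     Regards, Stephen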
  3. Here's my first result from the 32-bit version of the first beta, on a pair of old WD 2TB green drives:

     == invoked as: ./pc15b.sh -f /dev/sdc
     == WDCWD20EARS-00J2GB0 WD-WCAYY0100121
     == Disk /dev/sdc has been successfully precleared
     == with a starting sector of 63
     == Ran 1 cycle
     ==
     == Using :Read block size = 8388608 Bytes
     == Last Cycle's Pre Read Time : 8:08:34 (68 MB/s)
     == Last Cycle's Zeroing time : 10:56:21 (50 MB/s)
     == Last Cycle's Post Read Time : 8:00:06 (69 MB/s)
     == Last Cycle's Total Time : 27:06:01

     == invoked as: ./pc15b.sh -f /dev/sdd
     == WDCWD20EARS-00MVWB0 WD-WCAZA6293604
     == Disk /dev/sdd has been successfully precleared
     == with a starting sector of 63
     == Ran 1 cycle
     ==
     == Using :Read block size = 8388608 Bytes
     == Last Cycle's Pre Read Time : 8:09:39 (68 MB/s)
     == Last Cycle's Zeroing time : 10:57:49 (50 MB/s)
     == Last Cycle's Post Read Time : 7:59:14 (69 MB/s)
     == Last Cycle's Total Time : 27:07:44

     I'll rerun these drives through an old preclear next. And here are the results with the old preclear:

     == invoked as: ./preclear_disk.sh /dev/sdc
     == WDCWD20EARS-00J2GB0 WD-WCAYY0100121
     == Disk /dev/sdc has been successfully precleared
     == with a starting sector of 63
     == Ran 1 cycle
     ==
     == Using :Read block size = 8388608 Bytes
     == Last Cycle's Pre Read Time : 7:37:40 (72 MB/s)
     == Last Cycle's Zeroing time : 11:13:52 (49 MB/s)
     == Last Cycle's Post Read Time : 15:02:26 (36 MB/s)
     == Last Cycle's Total Time : 33:54:57
     ==
     == Total Elapsed Time 33:54:57

     == invoked as: ./preclear_disk.sh /dev/sdd
     == WDCWD20EARS-00MVWB0 WD-WCAZA6293604
     == Disk /dev/sdd has been successfully precleared
     == with a starting sector of 63
     == Ran 1 cycle
     ==
     == Using :Read block size = 8388608 Bytes
     == Last Cycle's Pre Read Time : 7:37:22 (72 MB/s)
     == Last Cycle's Zeroing time : 11:13:41 (49 MB/s)
     == Last Cycle's Post Read Time : 14:51:50 (37 MB/s)
     == Last Cycle's Total Time : 33:43:53
     ==
     == Total Elapsed Time 33:43:53

     So the new preclear cut the second pass (post-read) time from 15 hours to 8 hours, which is great. One odd thing is that the pre-read time is about 30 minutes longer with the new code. Stephen
  4. Just finished a two-pass preclear (the old, slow version) on a pair of 4TB Seagate NAS drives. It took about 75 hours to run. Looking forward to a faster version. Regards, Stephen
  5. I'm doing preclears right now and for the next week... I've got a pair of new Seagate NAS 4TB drives to burn in, a pair of Seagate Desktop 4TB drives to retest, and a pair of old WD 2TB greens that I'm taking out of service (so I'm preclearing them to erase and test them). The 4TB NAS drives are just at 26% complete on the post-read of the first of 2 cycles and have taken about 26 hours so far; they'll probably take about 34-38 hours if I recall correctly. So I'd be interested in running the new beta. Regards, Stephen
  6. When the tape drive I was using for backups started to die back in about 2004 or 2005, I ended up writing my own backup utility - initially to store the backups to DVDs, and then, as the cost of hard drives dropped, I switched to using external drives. The utility is written in Python and I use it to back up my unRAID server to removable drives attached to my Windows desktop. It is built on the notion of a single full backup followed by an unlimited number of incrementals, so while the first backup takes a lot of time, the incrementals run pretty quickly. Typically I run an incremental pass on the weekend to grab all the new media files, a process that might take half an hour or so.

     The backups are written in user-configurable chunks, typically about 500MB (the system will automatically split large files across multiple chunks), to a drive in my Windows desktop machine. From there they get copied to an external drive in one of my backup media sets. I have two media sets; one is kept at a remote location (to further protect against fire, flood or theft - but not far enough away to protect against a meteor strike). Periodically I take the external drive I am currently saving backups to over to the remote location, swap it for the last disk in that set, and bring that disk back. When I return with the swapped disk I update it with the backup chunks that were kept on the workstation in its absence, and then I can delete those from the workstation and repeat the process. In this way I have quadruple redundancy for all the backed-up data almost all the time:

     1. the unRAID disk where the data resides
     2. the unRAID parity protection (not truly a copy, but close)
     3. the copy on the workstation's internal cache drive
     4. the copy on the local external drive

     Once the data is swapped off-site, items 3 and 4 become the local external drive and the remote external drive.

     About once every year or two I restart the whole process, because by then I'll have some higher-capacity drives that I can use to remove the older (and smaller) backup drives from service. The last time I did this I was able to retire a handful of 500GB drives, replacing them with 2TB units that I had removed from the unRAID box when I started moving to 4TB drives.

     The data on the external drives is checksummed both at the chunk level and at the individual file level. The database that manages all this also has a SHA1 hash of every individual file, so in theory I could use it to check against the current contents of the unRAID server without having to access any of the external drives. But I've not written that code yet.

     The backup utility is called ArcvBack and is available at: http://arcvback.com/arcvback.html It currently uses Python 2.5; one of these days I'll have to update it to the Python 3.x series.
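     To give a feel for the chunk-plus-checksum idea, here is a much simplified sketch of the full-backup pass (this is not the actual ArcvBack code - the names and on-disk layout are made up for illustration, and an incremental pass would additionally compare files against the saved catalog and only write what changed):

         import hashlib, os

         CHUNK_BYTES = 500 * 1024 * 1024   # user-configurable chunk size, roughly 500MB

         def backup_tree(src_root, dest_dir):
             """Write everything under src_root into numbered chunk files in dest_dir.
             Returns {file_path: (sha1_hex, [(chunk_index, offset, length), ...])} so a
             catalog/database can later verify files or locate them for a restore."""
             catalog, chunk_idx, used = {}, 0, 0
             chunk = open(os.path.join(dest_dir, "chunk%06d.dat" % chunk_idx), "wb")
             for dirpath, _dirs, names in os.walk(src_root):
                 for name in sorted(names):
                     path = os.path.join(dirpath, name)
                     sha, pieces = hashlib.sha1(), []
                     with open(path, "rb") as f:
                         while True:
                             if used >= CHUNK_BYTES:   # current chunk is full, start a new one
                                 chunk.close()
                                 chunk_idx, used = chunk_idx + 1, 0
                                 chunk = open(os.path.join(dest_dir, "chunk%06d.dat" % chunk_idx), "wb")
                             data = f.read(min(1 << 20, CHUNK_BYTES - used))
                             if not data:
                                 break
                             sha.update(data)
                             pieces.append((chunk_idx, used, len(data)))  # large files span chunks
                             chunk.write(data)
                             used += len(data)
                     catalog[path] = (sha.hexdigest(), pieces)
             chunk.close()
             return catalog

     Regards, Stephen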
  7. The point is that I, and probably others, would worry about it. Yes, there's a two-year warranty, but the fact is that they specify a two-year warranty constrained by a ridiculously low power-on hours figure, bearing in mind that they also promote (in the first document quoted) the drives as suitable for home servers and NAS devices, which are typically powered 24/7. It simply does not make sense. I would generally prefer to buy drives and other parts where I know for sure what I am getting.

     I agree. I had bought the first drive on the basis of their "Best-fit applications" list and the fact that it had a 2-year warranty; it was when I was considering buying a second that I looked further (I wanted to check the power requirements) and saw the odd 2400-hour issue, and that has made me stop to consider what to do. Heck, I'm spending the first 100 hours (4.2%) of the drive's life just running a 2-cycle preclear... Regards, Stephen
  8. I picked up one of these Seagate 4TB drives (internal packaging) and verified that it has a 2-year warranty (via Seagate's web site). Reading the "Desktop HDD Data Sheet":

     http://www.seagate.com/files/staticfiles/docs/pdf/datasheet/disc/barracuda-desktop-hdd-ds-1770-1-1212us.pdf

     I saw them list the "Best-Fit Applications":

     - Desktop or all-in-one PCs
     - Home servers
     - PC-based gaming systems
     - Desktop RAID
     - Direct-attached external storage devices (DAS)
     - Network-attached storage devices (NAS)

     So it sounds like a good choice for my unRAID box... But on the next page they list the detailed specs, and there they quote "Power-On Hours: 2400", which is only 100 days in 24x7 mode. My guess is that they might use this as a get-out-of-warranty card. So it's time to turn the drive spin-down mode back on. Regards, Stephen
  9. Yup, sounds correct. The most likely cause of this sort of suddenly appearing parity error is an unclean machine shutdown - such as a power issue, or someone just hitting ctrl-alt-del to get a Windows login prompt. At this point it would be useful if there were a utility that could take the list of parity error locations and output a list of the files (on all your disks) that occupy those locations; then you would have a short list of things to check for possible corruption. Other than that, unless you have a backup that you can compare against, you are probably out of luck.
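     The arithmetic for the first step of such a utility is at least straightforward: turn the reported sector into a byte offset and filesystem block number, which a filesystem-specific tool could then map back to a file. A sketch (assuming the reported location is in 512-byte sectors relative to the start of the partition, and using a 4K filesystem block size purely as an example):

         SECTOR_BYTES = 512        # assumption: parity check reports 512-byte sectors
         FS_BLOCK_BYTES = 4096     # example filesystem block size

         def locate(parity_sector):
             """Byte offset and filesystem block number for a reported parity-error sector."""
             byte_offset = parity_sector * SECTOR_BYTES
             return byte_offset, byte_offset // FS_BLOCK_BYTES

         print(locate(123456789))  # hypothetical sector number from a parity check report

     Stephen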
  10. Yes, if you have two memory sticks you should be able to run the system on one at a time to see if that improves things. Most of the past recommendations are to run memtest for at least a night, though in the two cases I have encountered personally where a memory stick went bad, memtest was able to detect the issue in one or two passes (usually one of the 8 tests it runs will fail). So, if you've already done 10 full passes (say about 5 hours or so of testing) I would say your memory (and probably also the CPU) is fine. Regards, Stephen
  11. It is telling you the sector, relative to the start of the partition, where the parity error was detected. The unRAID software has absolutely no way to know which bit(s) are wrong - just that an exclusive or of all the bits of some word in the sector did not end up with an even number of bits set to "1". It could be a bit set to zero when it should be a one, or a bit set to one when it should be zero, and it could be on any of your disks, or it could be transposed in RAM if memory is flaky, or in the disk controller chipset, or on the motherboard itself. It is VERY hard to diagnose. It is one reason why old Nforce4 chipsets are to be avoided; they exhibited random parity errors that caused some to lose their hair. Many times it is a single disk, and repeated md5sums of the same data on that disk give different results. The fix: replace the disk. These disks often show no errors otherwise. - Joe L.

      In this post: http://lime-technology.com/forum/index.php?topic=10364.msg98580#msg98580 I reported my experiences with resolving an ongoing parity error issue. You might be experiencing something similar, but you need to do further tests.

      First run some more parity checks in the "nocorrect" mode. If these show exactly the same errors (to the block) then what you probably have is an issue where data was being written to the drives when the machine got shut down unexpectedly. This will leave the parity drive out of sync with the data. Doing a parity check (in the normal check-and-update mode) will bring the parity disk back up to date with the contents of the drives, and further parity checks should show no more errors.

      If the errors go away by themselves (for a few parity checks) or change locations between checks - which was what was happening to me - then you have a harder-to-diagnose problem. It might be as simple as a faulty cable/connector, or it might be flaky memory, the motherboard chipset or a drive. In my case it turned out to be one of my 8 drives, and the SMART tests (both long and short) gave no indication that the drive was acting up.

      Anyway, don't panic. Run one test at a time, keep a written log of what you test along the way and the relevant results (such as SMART test logs and the reported locations of the parity errors), and you should sort it out in time.
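      A toy illustration of Joe L.'s point about why a parity error tells you the location but not the disk (this is just a demonstration of even parity in general, not unRAID's actual code):

          from functools import reduce

          # One byte from the same offset on each data disk, plus the computed parity byte.
          data = [0b10110010, 0b01101100, 0b11100001]
          parity = reduce(lambda a, b: a ^ b, data)

          # A parity check XORs everything, including the parity disk; all zeros means consistent.
          assert reduce(lambda a, b: a ^ b, data + [parity]) == 0

          # Flip a single bit on any one disk and the check comes up nonzero at this offset,
          # but the result only says which bit position disagrees, not which disk is wrong.
          data[1] ^= 0b00010000
          print(format(reduce(lambda a, b: a ^ b, data + [parity]), "08b"))   # -> 00010000

      Regards, Stephen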