Everything posted by RobJ

  1. That is where they started as a brand new drive; notice that the VALUE and WORST are 100, essentially perfect. The problem is that for some of these newest attributes, the SMART developers have set the THRESH very close to the starting value. I suppose the thinking is that with Spin_Retry_Count you get 3 strikes and you're out, while with End-to-End_Error you only get one strike and you're dead. Keep in mind though that in both cases they have not yet decided (for whatever reason) that either of these is a critical attribute, one that fails the drive, even though both are probably predictive of impending drive failure.
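     To see these columns for yourself, something like the following works (a sketch, assuming smartctl is installed and the drive is /dev/sdb - substitute your own device symbol):

       # Print the attribute table; compare VALUE and WORST against THRESH.
       # An attribute has "failed" only when VALUE drops to or below THRESH.
       smartctl -A /dev/sdb
       # Show just the two attributes discussed above:
       smartctl -A /dev/sdb | grep -E 'Spin_Retry_Count|End-to-End_Error'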
  2. It's probably getting bogged down, waiting for the drive to deal with all of the bad sectors. Current count of reallocated sectors is 3168, and there are 4904 bad sectors still to be dealt with, and as you said, it's only gotten through about a third of the drive. Not good.
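     If you want to watch those two counts while the preclear runs, something like this would do it (a sketch; /dev/sdb is a placeholder for the drive being precleared):

       # Attribute 5 = Reallocated_Sector_Ct (already remapped),
       # attribute 197 = Current_Pending_Sector (still waiting to be dealt with).
       watch -n 60 "smartctl -A /dev/sdb | grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector'"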
  3. I agree with Joe, that combo of drive model and firmware is not confidence inspiring. Any chance of a firmware update for it?
  4. This error line, with the BadCRC flag set, indicates corrupted packets across the cabling. That is almost always a bad SATA cable, easily replaced. It could also be a power issue to the drive, but if it were a power issue you would probably have had more BadCRC errors on other drives too. Since you didn't indicate any others, it's probably a bad cable to this drive. Both drives, after the bad sector handling, indicated "configured for UDMA/100", so I figured you must have them connected to an older disk controller, with UDMA/100 speed only. That is going to significantly impact their read and write speeds.
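     One way to check whether the CRC errors keep climbing after replacing the cable (a sketch; attribute 199 is the usual UDMA_CRC_Error_Count, and /dev/sdb is assumed):

       # The raw count of attribute 199 only ever increases; note it now,
       # swap the cable, then re-check after some heavy I/O.
       smartctl -A /dev/sdb | grep -i 'UDMA_CRC_Error_Count'
       # The negotiated transfer mode shows up in the kernel log:
       dmesg | grep -i 'udma'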
  5. Strangely though, it looks like later in the log those errors are showing for a different drive. "Advice? Are these drives hosed?" Not hosed, but you do have some bad sectors on 2 drives, sdd and sdb. You also probably have a bad SATA cable to drive sdb.
  6. The number 65535 is not really "65 thousand something"; it's the unsigned representation of the 2-byte signed integer value -1, so I wouldn't put too much significance in it. A return of -1 is usually a flag value, indicating an error, or that the true number is currently unavailable, but not a valid return value. If I saw a value of 65535, I would grab a few more SMART reports if possible, to see whether it changes to a valid value. If not, I'd reboot and check again. If still not, then I would have to assume a bug in the SMART functions of the drive's firmware. As to the End-to-End error, I'm not sure we should give that much significance either, because it is a relatively new attribute, and looks to me to be still experimental. That does not mean it is not significant, it IS informational, but the SMART reports where both of these occurred (this one and one a week or two ago) both indicate that this was flagged 'Old_age', not 'Pre-fail', and therefore it is NOT considered a critical attribute. Since it is not considered critical, it is only informational, and as such it may be useful, but I'm not sure it should carry much weight in your decision making. There are many critics of the SMART system, and both of these situations just give them more ammo. Some don't trust SMART reports at all, but I feel that once you understand and have some experience with the vagaries and inconsistencies of SMART numbers and report info, there is useful info there. I just wish they would get their act together a little better.
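     A quick way to see the 65535/-1 equivalence at a 16-bit width (a sketch using shell arithmetic; any shell with a POSIX printf works):

       # -1 stored in 16 bits is all ones: 0xFFFF = 65535 when read as unsigned.
       printf '%u\n' $(( -1 & 0xFFFF ))    # prints 65535
       printf '0x%X\n' 65535               # prints 0xFFFF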
  7. Both drives listed in your attached file look fine, no issues, brand new.
  8. The Maxtor looks fine, no issues at all. It's not that old either, with less than 20000 hours on it. At 250GB, it's a bit small, but that depends on what you want to use it for. The Samsung is a little older, with 28000 hours on it, but it too looks fine.
  9. Quoting Joe L.: "You are probably experiencing resource contention of some kind. They are probably each waiting on some resource the other has. Since you did not attach a syslog, I can assume you've already looked there for clues and found nothing." And the reply: "Thanks - no I didn't look - actually didn't even think to attach the log, thinking that since the array isn't on, what could the log show.. sorry that was nooby of me. Anyhoo - after a little while, it took off again, and is in the 90MB/s range - I guess knowing it was an EARX, I panicked too quickly. Chugging along now. The parity check also finished on my other VM. Glad that the machine didn't croak with all these disks chugging at the same time. *whew*" OK, are you saying you had a parity check running in another VM on the SAME physical machine? That would be major resource contention! UnRAID, especially during a parity check/build, is I/O bound, meaning it will be making maximum use of the available I/O bandwidth, and everything else has to wait its turn. A VM is great for sharing unused resources, so multiple VMs can use idle CPU time and unused RAM, but NOT any unused I/O, because there isn't any! Running 2 VMs will not double your available I/O capabilities! So of course it sped up close to normal once the parity check finished.
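     If you want to see that I/O saturation for yourself while a parity check runs, iostat shows it directly (a sketch; iostat comes from the sysstat package, which may not be on a stock UnRAID install):

       # %util near 100 on the array drives means the I/O path is saturated;
       # a second VM hitting the same disks just queues up behind it.
       iostat -x 5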
  10. These increases in Current Pending Sectors *after* a Preclear are not a good sign! I would follow Joe's advice in this post.
  11. Unfortunately, that disables all daily cron jobs. It would be better to examine /etc/cron.daily and determine what may be running that is causing the disk usage. Also, if cron was set to run daily jobs at 4:40am and your spindown delay is one hour, then the drives would spin down after 5:40am, not before that. What is your default spindown delay? If one hour, then it looks like something is running at 4:30am and accessing Disk 3 and Disk 4, and once in a while Disk 2, for several minutes. It ran for 4 to 7 minutes last week, but only about 3 and a half minutes this week.
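     To find the actual culprit rather than disabling everything, something like this is a starting point (a sketch, assuming the usual Slackware-style cron layout UnRAID uses; paths may differ on your build):

       # List what actually runs daily:
       ls -l /etc/cron.daily/
       # Confirm when the daily jobs are scheduled:
       grep -E 'cron.daily|run-parts' /etc/crontab /var/spool/cron/crontabs/root 2>/dev/null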
  12. That looks like one hour after the mover runs, so you probably have the spindown timer set to one hour. So the question for you becomes - why are all of my drives except Disk 2 spinning up for the mover?
  13. This should perhaps be a FAQ item! Short answer is that the pre-read is a straight linear read, the post-read is much more complicated, with extra testing. Better answer is provided by Joe earlier in this thread, not too long ago I think.
  14. This drive is not off to a good start, with numerous bad sectors right off the bat. This drive wasn't drop-shipped to you, it was drop-kicked to you! You may want to *try* to capture a SMART report, but if you can't - RMA it. On the bright side, it's always good to find out these things *before* you trust your data to it.
  15. This is a new disk, and it appears there was a small region with sectors not correctly initialized. Perhaps there had been an electrical spike that scrambled them? I don't know. But the zeroing phase appears to have had no problems rewriting them, so they are correct now. Why not preclear the disk one more time, to reassure yourself that the disk is fine? I really doubt you will see any further issues with this disk.
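     For reference, a repeat pass is just another invocation of Joe L.'s script (a sketch; substitute your device symbol, and double-check it first, since preclearing erases the disk):

       # Verify which symbol the disk currently has:
       preclear_disk.sh -l
       # Then run another full preclear cycle on it (destructive!):
       preclear_disk.sh /dev/sdX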
  16. I don't see any connection with memory here. I'll just add though that if I have any suspicions at all about the memory of a computer, then I consider that computer to be completely unusable! Period. When you get the new memory sticks, test them with memtest overnight, until you are completely confident in them. That pretty well rules out bandwidth differences, unless there was an issue with that specific port or cable. You might try preclearing the slow drive one more time, connected to the cable and port used by one of the faster drives. And if you have the syslog from the slow drive's preclear, check it for any drive-related errors/exceptions.
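     If you still have that syslog, something like this narrows the search down (a sketch; sdX is a placeholder for the slow drive's symbol, and the log path may differ on your system):

       # Pull out anything the kernel logged about that drive:
       grep -i 'sdX' /var/log/syslog | grep -iE 'error|exception|reset|timeout'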
  17. In your BIOS setup, make sure that both the onboard and addon SATA controllers are configured for native SATA mode, preferably AHCI mode, *NOT* in an IDE emulation mode. Having drive symbols appear as hdx usually means they are IDE drives or configured to emulate IDE drives.
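     After changing the BIOS setting, you can confirm from Linux that the controller actually came up in AHCI mode (a sketch; the exact strings vary by chipset):

       # The kernel announces AHCI controllers at boot:
       dmesg | grep -i ahci
       # And lspci shows each storage controller's operating mode in its description:
       lspci | grep -iE 'sata|ide'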
  18. I'll start by saying that all 3 drives look fine, no issues at all.

     For the following, ignore the temperature SMART attributes (190 and 194); they have their own interpretation rules. For the rest, you should not look at the Raw numbers for most SMART attributes, just the VALUE column (and perhaps the WORST column), which are an attempt by the manufacturer to indicate its own valuation of the numbers. Generally the VALUEs will be from 1 to 100, but often they may be from 1 to 200, or even, for a few Maxtors, 1 to 253 (the 200's can be halved and considered as 100's). That means you can generally think of them as percentages of perfect: 100 is considered factory perfect, 1 is bottomed-out bad, and 50 is probably not very good. So even though a Raw_Read_Error_Rate may have a very high raw number, if its VALUE is 100, then the drive manufacturer considers its read error rate to be perfectly normal. I think we look too fast at the raw numbers; most of them should be ignored. We should look first at all the 100's and/or 200's in the VALUE and WORST columns.

     As to a possible cable issue, I see no evidence of that. Would need the corresponding syslog to know for sure.

     From here on, I'm moving from factual to speculative, trying to come up with ideas why one drive performed somewhat slower, even though all 3 drives were identical and had the same firmware version. The most likely reason is that different controller chipsets or busses were involved, and the slow drive was stuck with the slower hardware, or was bullied out of a fair share of the available I/O bandwidth. You could verify this by swapping its connection with one of the fast drives, and retesting.

     But since we're already speculating ... we'll drift a little farther out in left field. All 3 SMART reports were essentially identical, as would be expected of identical drives. But there were 2 odd differences: one was that the slow drive had a far higher SEEK_ERROR_RATE raw number (which we should normally ignore!), and the other was that the slow drive reported a much longer time needed for offline data collection, which is really strange for identical drives! The fast drives sda and sdc reported they only need 80 seconds for offline data collection; the slow drive sdd reports it needs 139 seconds! I don't want to put too much importance on that, since the manufacturers do not provide any info on properly interpreting these numbers, but it does seem very odd, and perhaps indicative of a slower drive. And while I really don't want to draw any conclusions from the SEEK_ERROR_RATE raw number, it plausibly *may* represent the need for many more seeks than the other drives, and seeks are relatively slow actions. You might want to run a drive speed testing tool (HDTune?) on both the slow drive and a fast drive, and compare.

     Now for some crazy speculation... Manufacturers like to cut corners. What if, to make 1TB, 2TB, 3TB, and 4TB drives, they just set up production lines for 4TB platter sets, and then, when factory testing them, sell the partially defective ones as smaller drives? So if one cannot support 4TB, determine how much it CAN support and sell it accordingly. Now, what if a platter set had a bad region only in the faster tracks, but had 3TB available in the slower tracks? You would create a good 3TB drive, but it would be significantly slower than the average 3TB drive. No easy way to know for sure...
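     On the Linux side, hdparm gives a quick apples-to-apples sequential read comparison instead of HDTune (a sketch; run it on an idle array, substituting your own slow and fast device symbols):

       # Buffered sequential read speed from the drive:
       hdparm -t /dev/sdd    # the slow drive
       hdparm -t /dev/sda    # a fast drive, for comparison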
  19. SMART attribute 4, Start_Stop_Count, is not a critical attribute, so it does not count as part of the SMART pass/fail testing. You can tell which are critical attributes by the Type column of the SMART report: Pre-fail indicates a critical item, Old_age just indicates other non-critical items. It looks to me as if the Threshold for Start_Stop_Count should be 000, just like Load_Cycle_Count, which has also bottomed out. These just indicate you have a lot of wear on this drive. It appears the Power_On_Hours value is corrupted, perhaps also due to old age. All other numbers look great; you should still have considerable life left in this drive. By the way, this drive is a 160GB Seagate Momentus ST9160823AS drive; it is not an SSD.
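     To list only the critical attributes that count toward pass/fail (a sketch; /dev/sda assumed):

       # Only Pre-fail attributes can trip the overall SMART health verdict:
       smartctl -A /dev/sda | grep 'Pre-fail'
       # The overall verdict itself:
       smartctl -H /dev/sda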
  20. The final SMART report looks clean, but indicates it has had some questionable sectors, just as the Preclear report indicated when it said that the Current_Pending_Sector count rose from 2 to 9, then ended up cleared. The fact that all questionable sectors were cleared for further use seems to indicate that the drive's media surface is OK, but there was some event that scrambled the data in a few sectors, perhaps an electrical spike or a sudden power outage while writing to the drive. SMART actually says there have been 319 ATA errors, but only shows the last one (which is somewhat odd; it usually shows the last 5). You mention it may have been a factory reconditioned drive, so perhaps they reset some of the SMART parameters. It *claims* to be a very young drive, with only 205 hours on it. There's no reason to RMA it now, but it does seem a little suspicious. I recommend running one or two more Preclears on it, just to be sure you can trust it.
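     You can pull the drive's own ATA error log directly, which sometimes shows more detail than the summary report (a sketch; /dev/sdb assumed):

       # Print the SMART error log (recent ATA errors, timestamped in power-on hours):
       smartctl -l error /dev/sdb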
  21. "1. Would very much appreciate, if someone verifies the results, that drives are ok." All 3 drives look great. "2. What's the preferred method to post the results: via attached files, or in a 'code' insert?" Attached files are better (as you did). "5. It's been more than 10 hours since I ran preclear, but myMain always shows the drives as spinning. Why haven't they been spun down after an hour of inactivity? Is it because they're unassigned? (see picture below)" UnRAID manages spindown only for assigned drives, and you haven't done that yet. You can manually spin them down in the UnMENU interface using the little spin icons for each drive.
  22. I don't know about your Samsung, but this Seagate 2TB drive looks fine. Device Model: ST2000DL004 HD204UI Serial Number: *********2495
  23. Remember that the device symbols (sda, sdg, etc) may change at each boot. Is it possible that the drive was sdg on the previous boot, but not in the current session? Try a 'preclear_disk.sh -l' first, to verify current symbol assignments.
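     To map the changeable sdX symbols to each drive's fixed model and serial number, the by-id names are handy (a sketch; works on any modern Linux with udev):

       # by-id names embed model and serial, so they survive reboots:
       ls -l /dev/disk/by-id/ | grep -v part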