RobJ

Everything posted by RobJ

  1. No problems at all. Since the rate of zero (the RAW number) is the same, there is no real change here. For some reason, the scaled numbers, VALUE and WORST, have been reset. The number 253 usually seems to indicate "Not Used Yet".
  2. Drive looks perfect, brand new with less than one day of service!
  3. That is not normal; both read and write caching are usually enabled. I don't have any idea what to make of it, though. You are using a board based on VIA chipsets, which is usually problematic. I can't say it won't work, but I have seen little success with VIA-based boards here. I know you said the 3 drives are "Seagate Barracuda ES 750Gb" drives, but they don't look like anything I have ever seen before. Neither the Linux kernel nor smartctl 5.38 was able to identify the manufacturer. SMART identifies them as follows (each with a different serial number): Device Model: GB0750C4414; Serial Number: 5QD51Y9T; Firmware Version: HPG4. Perhaps someone with experience with the ES series of drives can help here. Squeaking sounds are definitely not normal either; I don't think I have ever heard a hard drive squeak. Are you positive that the squeaks are coming from the drive, and not a fan?
  4. No problems that I can see. Neither VALUE nor WORST dropped from 200, so it is still considered essentially perfect. At some point, it would be good for someone to prepare a table of these attributes, with expected ranges, and comments on the seriousness (or lack thereof) of particular results. Unfortunately, the table would have to take into account all of the inconsistencies between brands and models.
  5. No, I don't know of any reason (for now) that High_Fly_Writes should be given any significance at all. It is not a Pre-fail item, so it isn't considered critical, and doesn't impact the Pass/Fail health test of the drive. I'm going to suggest to Joe and Brian that they consider dropping any checks of 189 High_Fly_Writes and all of the 240's, as there is no reason to unnecessarily alarm users. In researching through SMART reports that I have seen, only the latest large Seagate drives actually use the High_Fly_Writes attribute. They appear to be counting something associated with High Fly Writes in the RAW_VALUE. VALUE and WORST are just 100 minus the count in RAW_VALUE, until RAW_VALUE reaches 99, at which point VALUE and WORST hit bottom at 001. The threshold THRESH is only used for items marked as Pre-fail, and is otherwise usually meaningless. It sometimes appears to have a good practical threshold value, which I assume means "it is preferable that WORST stay above this value", but has no other impact. In this case, since VALUE drops from 100 to 001, I have to assume that they haven't fully implemented this attribute, or haven't decided on a reasonable scale of values. Future drives will undoubtedly change how High_Fly_Writes is implemented, counted, and scaled. Somewhere, Brian did some research on it, and had some comments, but I don't know where they are.
  6. I just want to be clear, I did not see any problems with sdq, the other drive, and only what appeared to be cable or connection related errors with sdn. At least from the info I had available, I don't see any problems with either of the drives themselves. I would like to note that although the Power-Off_Retract_Count did increment, its VALUE did not budge from 200 (its peak or starting value), which indicates to me that this is well within expected values, and probably not a concern (at least as far as you can trust SMART data!).
  7. That syslog is a mess! And it's only the latter part too, it is missing the 600 to 900 odd lines of system setup at the beginning. The drive with ID of sdn probably has a poor quality cable. I would replace it if at all possible. And Joe is right, there were page allocation failures for many subsystems, including the share file system, Samba, and possibly involving the networking and Reiser file system modules, which is worrying. In this piece of the syslog, I don't see any kernel panics, so I don't think we can say for sure that there is any damage, such as evidence of flaky memory, or corrupted Reiser file systems, but I never fully trust a system that has crashed. Always better to restart fresh. I certainly would not try to run anything important, once I saw the first sign of suspicious system operation. Those 'Call Traces' definitely qualify as suspicious system operation. Grabbing the syslog and waiting for advice was the correct thing to do. Even though I saw no 'panics' here, to be safe, I would reboot and run a full memory test first, then run reiserfsck on each of the data drives (see the Check Disk File systems page for instructions). I'm sorry, it is somewhat time-consuming, but it is better to be safe. The memory test is probably not needed, so you can postpone it if you wish, but I like to be thorough, and know whether a system is truly trustworthy, especially when I have just had extensive memory-related problems. I would like to say test only the data drives you were actually using, but it appears that there were numerous spin downs to many drives, and the mover ran at least twice, so it looks like all or most of your drives may have been written to. 2 GB of memory should have been more than enough. I can't see any reason so far for the problems, at least not from this syslog.
  8. "Aaaaaaaa!!!!!!! HOW do I do that?? I've been searching these boards for the mail command in unraid... I apologize for this being off topic, but if it has been discussed elsewhere on the boards, can somebody please point me in the right direction?" If I recall, there have been 3 mail scripts written, although they probably build on each other. This should get you started. (The info about the missing library was for older versions of unRAID; the newest versions include it.) * Email Notifications
  9. The BadCRC error flag is usually associated with a poor cable, not the drive. Try replacing/upgrading the cable to sdl on ata10.00. The Devices tab or your syslog should help you determine which drive that is.
  10. I don't think I have ever seen a drive with support for those, so I too think it may be only in SCSI drives, or perhaps enterprise class drives. To be honest, that seems like overkill to me. In my own view, 2 or 3 passes is enough to thoroughly test a drive. All you want to do is test it hard enough to force weak or fragile drives to fail now, rather than later when online with your data stored on them. Beating them to death does not seem necessary.
  11. I don't see anything wrong with the drives, after taking a quick look at your post and your syslog. How do you know it froze at 88%? Is there some other reason you think the drives are bad?
  12. Short answer - nothing at all is wrong. It is a good illustration of why I think that the value of 253 represents 'Un-initialized' or 'Unused yet'. I have not seen anything in the SMART docs or literature yet about this, but there does seem to be a quiet convention that if a variable has not yet been used, it is given the 'factory installed' value of 253. I think most or all of the drive manufacturers do this, although none of them do it in a standard or completely consistent way. You will notice in your example that the WORST value is 253, and I have come to see that as the originally installed value, which is then set to a true initial value by whatever the particular programmer of that part of the firmware decides to use, on the first actual usage of that value. In your case, he (or his internal docs) had decided to initialize Raw_Read_Error_Rate as 200, on its actual first internal usage. These are incredibly inconsistent between manufacturers. Some of these line items are only set when offline testing occurs, while others may be partly initialized, but never actually set until a relevant event occurs, and others are updated all of the time.
  13. I was getting ready to reply, then discovered your original message had completely vanished! A little disconcerting! The script itself is at the very bottom of the very first post in this thread. An intro and summary of it is at the Preclear Disk section of the UnRAID Add Ons wiki page. For Telnet, check the Telnet page, and the FAQ, unRAID Console and Addon Questions section. After checking these myself, I can tell I need to reorganize and 'raise' the visibility of the Telnet and PuTTY help.
  14. As Joe said, that is not good! Especially for a drive that says it has zero Power_On_Hours. The one test you might try is a SMART long test, which is supposed to resolve those 'pending' sectors into either 'remapped' or recovered sectors. See the bottom of the Troubleshooting page, Obtaining a SMART report section, for info on running the long SMART test. Just a clarification: this is a Western Digital green drive, not a Maxtor green?
  15. These are just follow-ups to the original real error. Locate the first error sequences involving sdd or sd 6:0:0:0. Also determine which drive sdd is, whether it is your new Maxtor Green, or a different drive that has decided to fail now.
  16. Completely agree. Plus, the UDMA_CRC_Error_Count increased from 1 to 3, which is also indicative of a cable or other interface issue. Most of those syslog errors occurred after the drive was disabled at 03:20:40, which is like 'pulling the plug'. It's generally fatal, and you can ignore all errors that subsequently occur. I would not bother with any further testing until you can replace that SATA cable, or discover something loose in the power cabling or connectors. The drive itself looks fine.
  17. Tom had to deal with this quite a while back; it's somewhere in the Release Notes. We used to have occasional posts about temps not showing for certain drives, and we would try to work through why SMART was not enabled for that drive. Tom decided he might as well always enable it first, and I don't think we have seen problems like that since. It baffles me why there is even an option to have SMART disabled, in any drive. You don't have to use the SMART data. What advantage could there possibly be to having SMART disabled? (I'm not referring to unRAID, just drives in general.) And I really find it incomprehensible that a tool like Drobo (possibly the closest and most similar competitor to unRAID) would have SMART turned off on the drives that GoChris pulled! Does that make any sense at all? Can you imagine choosing to run a tool like unRAID without SMART data?
  18. Looks good. Actually, it's claiming to be 'better than good'. If you check the Raw_Read_Error_Rate, the VALUE starts at an initialized value of 100, and rises to 118! It's like asking for maximum effort from someone, and having them respond that they will give 110%, which is not possible but you know what they mean. In this case, their scientists have measured and calculated appropriate scales for these error rates, and your drive must have such a low error rate, relative to their statistical norms, that their algorithm determined a higher than 100 value. The Seek_Error_Rate stayed at 100. The idea with these seems to be that as the rate of errors increases with wear and tear, the *_Error_Rate VALUE will drop from 100 down to the threshold value, at which point they have decided that the error rate is too high to trust the drive, and it will return a failing SMART grade. With the *_Error_Rate attributes, you should probably ignore the RAW values. They may or may not be actual error counts, but what is being monitored with these attributes is not how many errors there are, but at what rate they are occurring, and how these numbers fit within the expected norms for that drive model.
  19. I have seen that in 1 or 2 of the other pre_clear reports earlier. It is completely harmless, just means a test was aborted, so no result from it. I suspect that something in the pre_clear script is initiating an offline test that is aborted by a later command to the drive. Possibly just a timing issue.
  20. I personally don't think there are any experts yet, as to interpreting SMART reports. It is too new a data analysis tool, plus they keep changing and adding attributes, *and* they are different for each drive vendor. All I can give is my impressions from what I have seen so far, and learn from each new one I see. The Raw_Read_Error_Rate and the corresponding Hardware_ECC_Recovered are quite normal for a Seagate, but would be very high for any other brand. The one attribute I would keep an eye on is the Seek_Error_Rate, which seems higher to me than it should be, and has taken a hit in its VALUE. The High_Fly_Writes attribute is rather new, and I don't think anyone really understands it yet. What troubles me about it is that the VALUE and WORST have already bottomed out, but perhaps Seagate themselves did not know how to appropriately scale it. Overall, it looks fine, but I would monitor it once a month for a while. I think you will learn from watching it what is OK and what bears continued monitoring.
  21. Looks very good, about typical for new Seagates. I checked and there is STILL nothing newer than smartctl v5.38. We need a newer one to properly interpret attributes 240, 241, and 242. (Apparently, logger is too dumb to display tabs (^I, ASCII 9) correctly!)
  22. There is no need to format a disk before a drive rebuild, and in fact if it had formatted it first, the 'format' would then be overwritten by the rebuild. All a format is, is the creation of a brand new file system on the drive, including the hidden file system structures, and an empty root directory, but no files and folders. In a sense, a drive rebuild is almost the same thing, in that it copies a complete file system with all of its hidden structures to the drive, but it *also* copies all of the files and folders. Put another way, a drive rebuild copies the 'format' of the previous drive.
  23. No, unless it continues to increase. If it rises further, then you may want to replace its SATA cable with a better one.
  24. Your helpful info appears to be applicable in general to recent machines with an AMI BIOS. Another more general page with helpful info near the bottom about BIOS settings for bootable USB drives: http://www.weethet.nl/english/hardware_bootfromusbstick.php.
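
Several of the posts above (5, 12 and 18 in particular) describe rules of thumb for reading SMART attribute rows. As a rough sketch only, here is how those rules might be written down in Python. The class and function names are made up for illustration, the 253 "not used yet" convention is RobJ's observation rather than anything documented, and the High_Fly_Writes scaling is only what was seen on the Seagate drives discussed in post 5.

```python
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    """One row of a SMART attribute table (hypothetical container type)."""
    attr_id: int
    name: str
    value: int      # normalized VALUE column
    worst: int      # normalized WORST column
    thresh: int     # THRESH column
    pre_fail: bool  # True for Pre-fail attributes, False otherwise
    raw: int        # RAW_VALUE column


# Assumption taken from posts 1 and 12: 253 seems to mean "not used yet".
UNUSED_MARKER = 253


def looks_unused(attr: SmartAttribute) -> bool:
    """True if VALUE or WORST still holds the apparent 'factory' value of 253."""
    return attr.value == UNUSED_MARKER or attr.worst == UNUSED_MARKER


def is_failing(attr: SmartAttribute) -> bool:
    """Per posts 5 and 18: only Pre-fail attributes affect the health grade,
    and only once VALUE has dropped down to THRESH."""
    return attr.pre_fail and attr.value <= attr.thresh


def high_fly_writes_value(raw: int) -> int:
    """Scaling observed in post 5: VALUE is 100 minus RAW_VALUE, bottoming out at 1."""
    return max(1, 100 - raw)
```

For example, a row with VALUE 200, WORST 253 and THRESH 51 would be reported as not yet fully initialized and not failing, which matches the "nothing at all is wrong" reading in post 12.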