February 1, 2009
I've been enjoying my unRAID server since April '08. I'm running 4.4 beta 2. I currently have 6 of these 1TB drives (ST31000340AS, firmware SD15) installed in my array. They have been running well so far, with no major SMART errors or symptoms of failing. However, the increasing reports of problems with the Seagate drives have me worried. My options are:
1. Leave them alone. If one of them does fail, replace it immediately (along with the parity drive) with a WD green drive (probably the 2TB models, given that the price will be affordable if/when my Seagates die).
2. Try to update the firmware. I have a spare Windows machine I can do this on, but I've read many posts on various forums about drives bricked as a result. I'm wary of this option.
3. Replace all the drives in my array with WD green drives, specifically the new EADS models.
If I choose option 3, what would be the proper (and safest) way to accomplish this? I've read a lot of info here and on the Wiki about replacing drives, etc. I'm assuming I would need to replace parity first, then replace each drive one at a time, rebuilding the array as I go.
Also, what drives would you recommend besides the WD greens? Does anyone have a preference other than Seagate drives?
I value the knowledge of the members here very much and would appreciate any advice.
February 1, 2009
OF COURSE you leave them alone! If at some point you find a way to upgrade the firmware on your drives, do it one at a time. Chances are that if you succeed with the first one, then you did it right and all will succeed.
Now if you have the money to replace SIX 1TB drives (i.e. NEW drives) with others of the same or larger size just because a drive MIGHT fail, then I have a mortgage I'd like to discuss with you in private... In that case the safest way would be to take them all out, put all the new ones in and let the box build all the disks from the old ones (keep them close though). Don't forget to ship the "buggy" ones to me for testing.
OK, after three possibly stupid jokes (although the mortgage exists), the best way to replace the drives is just "one at a time" (of course). It doesn't matter which way you do it. If you replace parity first, you let parity rebuild once and then replace each data drive one after the other, waiting for each new one to be rebuilt from the new parity. If you replace parity last, each new data drive is rebuilt from the old parity, and then you replace the parity drive, which is itself rebuilt from the data drives. It doesn't really matter.
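Why the order does not matter can be seen in a toy model. This is a minimal sketch, assuming the parity drive holds the byte-wise XOR of all data drives (single parity); the three tiny "disks" and the helper name `xor_blocks` are made up for illustration and are not unRAID code.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Hypothetical toy array: three data "disks" of four bytes each.
data_disks = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]

# Parity is computed from all data disks.
parity = xor_blocks(data_disks)

# Replace data disk 1: its contents are rebuilt from the other disks plus parity.
survivors = [disk for index, disk in enumerate(data_disks) if index != 1]
rebuilt = xor_blocks(survivors + [parity])
assert rebuilt == data_disks[1]

# Replace the parity disk instead: it is simply recomputed from the data disks.
new_parity = xor_blocks(data_disks)
assert new_parity == parity
```

Either way, only one drive is missing at any moment, so the remaining drives always hold enough information to rebuild it.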
February 2, 2009
> the best way to replace the drives is just "one at time" (of course). Doesn't matter which way you do it. See if you replace parity first, then you let parity rebuild once and then replace each data drive one after the other after each new one is rebuild from the new parity. If you replace parity last, then each new drive is rebuilt by the old parity and then you replace it and also gets rebuilt from the data drives.
I understand that you will get the result you are looking for this way, but you'll be giving up a week of your time and putting the drives through a very nice torture test...
If you have the space, plug everything in, all 12 drives. If not, do as many as you can. Then go in through telnet and move the files from old disk to new. I think you can stop the array before the move to save the parity update and gain some speed, but I'm not sure I would.
Either way is better than rebuilding 6 drives from parity (my opinion anyway).
Gog
February 2, 2009
I would do the following.
Install Joe L.'s unMENU utility and Joe L.'s preclearing utility. unMENU makes it easier to mount the hard drives and look at them.
http://lime-technology.com/forum/index.php?topic=2595.0
http://lime-technology.com/forum/index.php?topic=2696.msg22267#msg22267
Run a parity check, then replace the parity drive and rebuild parity onto the new drive.
Connect the new drives to the machine. You can do one or a few at a time. Doing one at a time makes it easier to keep track of them and not screw up.
Use the preclearing utility to clear each new drive.
Use unMENU to mount the drive and copy the data from one of the existing drives to the new drive: old disk 1 to the new disk 1, 2 to 2, and so on until you have done them all. It is best to use a command prompt, but you can do it over the network too if you want to.
Now install the 6 new drives in place of the old ones. Assign the new drives to the proper slots (you must keep track of the drives as you fill them and assign them where they belong).
Use the "Trust Parity" command to restart the array without building parity. If you can't be bothered with that, then just let the parity rebuild. Either way, you still have the old set of drives with the data on them in case something goes wrong while you rebuild.
http://lime-technology.com/wiki/index.php?title=Make_unRAID_Trust_the_Parity_Drive%2C_Avoid_Rebuilding_Parity_Unnecessarily
If you are not going to use this trust parity command, then you can skip the preclearing and just format the drives instead.
You can also copy data to one drive, install and assign it, then "trust parity" to replace it right away, so you go one drive at a time. There are a lot of ways to do it. A straight copy will be quicker than a rebuild for each drive.
Peter
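Before retiring an old disk after a straight copy, it may be worth confirming that the copy is complete. The following is a minimal sketch, assuming Python is available on a machine that can see both mounted disks; the function names are made up for illustration, and the mount points you pass in depend on your own setup.

```python
import hashlib
import os

def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def compare_trees(old_root, new_root):
    """List files under old_root that are missing or different under new_root."""
    problems = []
    for dirpath, _dirnames, filenames in os.walk(old_root):
        for name in filenames:
            old_path = os.path.join(dirpath, name)
            relative = os.path.relpath(old_path, old_root)
            new_path = os.path.join(new_root, relative)
            if not os.path.isfile(new_path):
                problems.append((relative, "missing"))
            elif file_digest(old_path) != file_digest(new_path):
                problems.append((relative, "differs"))
    return sorted(problems)
```

Calling `compare_trees` with the old disk's mount point and the new disk's mount point returns an empty list when every file on the old disk has an identical copy on the new one. Hashing reads every file twice, so expect it to take roughly as long as the copy itself.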
February 2, 2009
Guys, you are missing one thing: we don't know whether he has any empty SATA ports, or how many...
February 2, 2009
I just have to speak up for a different perspective, perhaps the poor man's perspective. If you bend with every wind of trouble, you will be replacing most of your equipment and software constantly, at a high price, and sometimes causing yourself more problems with the transitions and migrations than if you had experienced one of the reported possible failures.
You often need to step back and get a feel for how serious the problem really is, and how well understood it is too. In this and many industries, you hear of failures far, far more often than the successes. You are far more likely to hear about the few drives that are DOA than the hundreds of thousands that work fine. In this case, there are certainly reports of failures, including a few among us, but (and this is just my opinion) it does not sound like a thousandth of the many hundreds of thousands of Seagates out there. I can't help but think your chance of problems would be higher by trying to move all of the data to new drives. But perhaps I'm cheap!
I still say, don't over-react, don't create more problems than there really are. All drives have a percentage of early failures, from possibly faulty mechanics, media, connectors, or firmware. You can't be sure that the new replacement drives don't have their own problems, as yet unknown, but possibly worse than what you have. And once there is an easy, safe, well-tested way to upgrade the firmware on your drives, you can do so.
I don't in any way want to tell you what to do; it is completely under your control, but I don't like to see people jumping too quickly. You make the best decisions when you keep a level head and a balanced perspective.
February 3, 2009
VERY well put!!! I agree 100%. I don't think of that as a cheap perspective at all; it is a well-rounded approach to solving any problem.
Cheers,
Matt
February 3, 2009 (Author)
Thanks, fellows. I've been reading so many tech forums lately that I got spooked. I'm sticking with option 1 and hope I don't have a problem until the new 2TB drives come down in price and have a proven reliability record. Then I'll replace the parity drive so as to be best prepared for upgrades/failures. Thanks for the good advice.
(NLS, if any of those Seagates fail prematurely, I'll shoot you a PM so you can give me your shipping address...)
February 3, 2009
According to Seagate's site, you have one of the affected firmwares. They also go on to say that not all drives have problems. Below is a link so you can check your serial numbers. Now, if you could empty one disk completely, you could then flash it. Move data to it from another disk and flash that one. Rinse and repeat until all 6 disks are completed. It's unfortunate Seagate ran into this problem, but as long as you bought the disks prior to 2009 you have a 5-year warranty on them.
http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=207951
February 3, 2009
> Thanks, fellows. I've been reading so many tech forums lately I got spooked. I'm sticking with option 1 and hope I don't have a problem until the new 2 TB drives come down in price and have a proven reliability record. Then, I'll replace the parity drive so as to best prepared for upgrades/failures. Thanks for the good advice. (NLS, If any of those Seagates fail prematurely, I'll shoot you a PM so you can give me your shipping address... )
Good decision IMO. I wouldn't be in a big hurry for a 2TB drive. Let other people find the problems.
I'd suggest letting the dust settle on the firmware updates, then applying them to your drives. Do not try to apply firmware updates to drives that do NOT have an affected firmware, even if you read that someone did this successfully!
(The following advice applies to all of us ...)
1. Run monthly parity checks. If there are drive problems, parity checks are the best way to find them.
2. Run SMART reports before and after parity checks, and be VERY conscious of any unexpected changes in attributes (esp. reallocated sectors and pending sectors). An easy way to do this is via the smart view of unMENU, which displays all SMART attribute issues on all drives on one screen.
3. Run smarthistory if you want to monitor SMART attributes on a daily basis. Realize, however, that THIS IS NOT A TEST. If the drive is idle or lightly used for days at a time, SMART reports are not going to show you much for those days. (This is why parity checks are so useful: they DO exercise the drives on every sector. Right after a parity check is the only time you are getting (almost) realtime feedback about the status of the drive via a SMART report.)
unRAID's redundancy will protect you against a single drive failure. If you follow these steps and know how to recover, the chances of losing data even with this firmware issue are pretty darn small, comparable to the chances of destructive hardware failures (like a PSU going haywire), house fires, theft, war, and other unavoidable risks.
If you really want to protect yourself, back up your more precious data and burn it to DVD or other media. Put it in your safety deposit box or give it to a family member at another house. (I do this with irreplaceable digital photos and scanned records.)
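The before/after comparison in point 2 can also be done with a small script. This is a minimal sketch, assuming each report is the saved text output of `smartctl -A` for one drive, where attribute rows start with a numeric ID and carry the raw value in the tenth column; the function names and the two sample rows are made up for illustration, not taken from a real drive.

```python
# Attributes worth watching, per the advice above.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector")

def parse_raw_values(report):
    """Map attribute name -> raw value for each attribute row in a report."""
    values = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit():
            values[fields[1]] = fields[9]
    return values

def changed_attributes(before, after, watched=WATCHED):
    """Return {name: (old_raw, new_raw)} for watched attributes that changed."""
    old, new = parse_raw_values(before), parse_raw_values(after)
    return {
        name: (old.get(name), new.get(name))
        for name in watched
        if old.get(name) != new.get(name)
    }

# Made-up sample rows in the usual smartctl attribute layout.
BEFORE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""
AFTER = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
"""

print(changed_attributes(BEFORE, AFTER))
```

An empty result after a parity check means neither watched attribute moved; anything else is the kind of unexpected change worth a closer look.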
February 4, 2009 (Author)
Yes, I have the great unMENU installed and use myMain to monitor my SMART data on a regular basis. I perform monthly parity checks as suggested, and have zero errors to date. I also scrutinize the detailed SMART reports. My really important data is backed up weekly to an external USB drive, stored in a safe location.
Here is mainly why I'm a little worried: ALL of my drives have the affected firmware. The spin_retry_counts are an annoyance at best, because they haven't risen lately. I suspect they are a remnant statistic from when I first built the array and power-cycled the machine several times. The high temp readings on drives 1 & 2 must be anomalous (probably defective temp sensors on the drives), but they bother me nonetheless. None of the drives have any re-mapped sectors.
I have recently set the spin-down time to never, to prevent unnecessary wear (I generally power down the server in the morning when I go to work and then start it up each night for regular use). I also have 2 hot swap bays available, along with a spare 1TB drive of the same type, so transferring data and/or flashing drives shouldn't be a problem if there ever is an issue.
Thanks for all the valuable help and guidance.
February 4, 2009
I'd be worried about 54C drive temps!!! I see your other drives are much cooler. Is it possible that these two particular drives are in a place with near zero airflow, while the others are not?
I have heard people comment that their sensors seem off by several degrees, and even heard it hypothesized that some manufacturers intentionally locate the sensor in places other than the hot spot. But if your sensors really are wrong, it will be the first time I've ever seen a sensor report a VERY hot temp that wasn't true.
One relatively easy way to see if the temp is real is to shut down the server, let it cool overnight, power up in the morning, and check the temps immediately. If they are cool and you can watch them heat up, you know you are dealing with a REAL temperature emergency. If not, I would RMA the drives. You want functioning sensors.
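The cold-boot test above boils down to one comparison. This is a minimal sketch, assuming you note the room temperature yourself; the function name and the 10-degree margin are arbitrary choices for illustration, not figures from any drive manufacturer.

```python
def classify_cold_boot_reading(cold_boot_temp_c, ambient_c, margin_c=10):
    """Judge a drive's reported temperature taken right after a cold boot.

    A drive that has been off for hours should start near room temperature,
    so a reading far above ambient points at the sensor, not at real heat.
    """
    if cold_boot_temp_c - ambient_c > margin_c:
        return "sensor suspect"
    return "reading plausible; watch it heat up"

# Example: a drive reporting 54C moments after a cold boot in a 22C room.
print(classify_cold_boot_reading(54, 22))
```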
February 4, 2009 (Author)
All my drives are in the Athena backplanes. The two drives in question are in the lower-most slots of the bottom cage. It must be a result of defective sensor readings, as these are the reported temps moments after cold booting the server when it has been off for many hours.
Of particular note: these two drives were the first ones I purchased for my server, as OEM drives from NewEgg. They are actually replacements for defective drives I originally received. The original two drives immediately exhibited the click of death. They were promptly RMA'd, and NewEgg sent these two "new" drives as the replacements. I have often wondered about the QC at NewEgg; perhaps these drives were refurbished or returns from someone else? All the other drives were purchased as retail kits from B&M stores. This is the only way I will purchase drives from now on.
Is there a simple way to get the actual temp reading on those drives without relying on SMART/software data? In any event, I feel confident that those temps are bogus and the 'real' temps must be similar to the other drives.
February 4, 2009
Interesting. If you can't rely on the sensors, you are back to using more conventional means of taking temperature readings. I have an indoor/outdoor thermometer that has a long cord with a temperature sensor on the end. It is made to be put outside a window to take outdoor temps, but I have used it in the past to take temperature readings inside my computer case. You could use something like that.
But if you are getting these high temps immediately after a cold boot, I'd tend to agree that it is likely a sensor malfunction.