June 24, 2010
About 2 weeks ago I added a new 2 TB Western Digital drive with the 64 MB cache. I read that they are compatible if you add a jumper to pins 7-8, so I added the jumper, cleared the disk, and added it to the array. However, now I am having issues where I have no choice but to do a hard reset.

The first time the issue appeared, I was copying data to the disk and the system went belly up. I tried to run a powerdown from the console and the system hung. After the hard reset, the parity check ran and everything looked OK, so I left it. Then today I had issues again. I came home from work and the array could not be accessed. I can ping the system, but the shares are not accessible and neither is the GUI. So again I went to the console and tried to log in, but it just hangs. I can't do anything with the system at this point except a hard reset.

Unfortunately I don't think I have a log for any of this activity, but I can say there was a bunch of crazy text output on the console when I got there. I hit Enter and the login prompt appeared, but like I said, it hung, and after a while more of the same text was displayed.

The drive is under warranty so I can get it replaced, but I don't know whether it is the drive or something else, much more fatal, looming. I would hate to lose my data. After a reboot everything is fine, but then this failure appears at random, and two occurrences in one week have me worried.

Please help. I am fairly new to unRAID and I don't know where to go from here. Thanks.

Attached is my most recent syslog; the others are dated a few days out.
syslog-2010-06-24.txt
June 24, 2010
What is your motherboard and power supply? It is odd that the drive is recognized as hda and hdd. Any special hardware or IDE-to-SATA converters in place?

Jun 24 13:55:39 Tower kernel: atiixp 0000:00:14.1: IDE controller (0x1002:0x439c rev 0x00)
Jun 24 13:55:39 Tower kernel: ATIIXP_IDE 0000:00:14.1: PCI INT A -> GSI 16 (level, low) -> IRQ 16
Jun 24 13:55:39 Tower kernel: atiixp 0000:00:14.1: not 100%% native mode: will probe irqs later
Jun 24 13:55:39 Tower kernel: ide0: BM-DMA at 0xfa00-0xfa07
Jun 24 13:55:39 Tower kernel: ide1: BM-DMA at 0xfa08-0xfa0f
Jun 24 13:55:39 Tower kernel: Probing IDE interface ide0...
Jun 24 13:55:39 Tower kernel: Probing IDE interface ide1...
Jun 24 13:55:39 Tower kernel: hdc: WDC WD1001FALS-75J7B0, ATA DISK drive
Jun 24 13:55:39 Tower kernel: hdd: WDC WD20EARS-00MVWB0, ATA DISK drive
Jun 24 13:55:39 Tower kernel: hdc: host max PIO4 wanted PIO255(auto-tune) selected PIO4
Jun 24 13:55:39 Tower kernel: hdc: UDMA/100 mode selected
Jun 24 13:55:39 Tower kernel: hdd: host max PIO4 wanted PIO255(auto-tune) selected PIO4
Jun 24 13:55:39 Tower kernel: hdd: UDMA/100 mode selected
Jun 24 13:55:39 Tower kernel: ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Jun 24 13:55:39 Tower kernel: ide1 at 0x170-0x177,0x376 on irq 15
Jun 24 13:55:39 Tower kernel: ide-gd driver 1.18
Jun 24 13:55:39 Tower kernel: hdc: max request size: 512KiB
Jun 24 13:55:39 Tower kernel: hdc: Host Protected Area detected.
Jun 24 13:55:39 Tower kernel: ^Icurrent capacity is 1953523055 sectors (1000203 MB)
Jun 24 13:55:39 Tower kernel: ^Inative capacity is 1953525168 sectors (1000204 MB)
Jun 24 13:55:39 Tower kernel: hdc: 1953523055 sectors (1000203 MB) w/32767KiB Cache, CHS=65535/255/63
Jun 24 13:55:39 Tower kernel: hdc: cache flushes supported
Jun 24 13:55:39 Tower kernel: hdc: hdc1
Jun 24 13:55:39 Tower kernel: hdd: max request size: 512KiB
Jun 24 13:55:39 Tower kernel: hdd: 3907029168 sectors (2000398 MB), CHS=65535/255/63
Jun 24 13:55:39 Tower kernel: hdd: cache flushes supported
Jun 24 13:55:39 Tower kernel: hdd: hdd1
...
Jun 24 13:55:39 Tower emhttp: pci-0000:00:11.0-scsi-0:0:0:0 host1 (sda) WDC_WD20EADS-65R6B0_WD-WCAVY2217816
Jun 24 13:55:39 Tower emhttp: pci-0000:00:11.0-scsi-1:0:0:0 host2 (sdb) WDC_WD20EADS-65R6B0_WD-WCAVY2136899
Jun 24 13:55:39 Tower emhttp: pci-0000:00:11.0-scsi-2:0:0:0 host3 (sdc) WDC_WD15EADS-00P8B0_WD-WMAVU0856021
Jun 24 13:55:39 Tower emhttp: pci-0000:00:11.0-scsi-3:0:0:0 host4 (sdd) WDC_WD15EADS-00P8B0_WD-WMAVU0526394
Jun 24 13:55:39 Tower emhttp: pci-0000:00:14.1-ide-1:0 ide1 (hdc) WDC_WD1001FALS-75J7B0_WD-WMATV2054515
Jun 24 13:55:39 Tower emhttp: pci-0000:00:14.1-ide-1:1 ide1 (hdd) WDC_WD20EARS-00MVWB0_WD-WMAZ20082367

From what I can see, you have 2 drives on an IDE interface sharing one cable (and interrupt). Is this true? This is not good practice in any RAID environment. What kind of cable or hardware is in place for those drives?
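
If you want to check your own syslog for the same thing, here is a minimal Python sketch that groups the emhttp device lines by channel and reports any channel carrying more than one drive. The regular expression and the name `shared_channels` are my own assumptions, based only on the lines quoted above; other unRAID versions may format these lines differently.

```python
import re
from collections import defaultdict

# A few emhttp lines copied from the syslog excerpt quoted above.
SYSLOG = """\
Jun 24 13:55:39 Tower emhttp: pci-0000:00:11.0-scsi-0:0:0:0 host1 (sda) WDC_WD20EADS-65R6B0_WD-WCAVY2217816
Jun 24 13:55:39 Tower emhttp: pci-0000:00:11.0-scsi-1:0:0:0 host2 (sdb) WDC_WD20EADS-65R6B0_WD-WCAVY2136899
Jun 24 13:55:39 Tower emhttp: pci-0000:00:14.1-ide-1:0 ide1 (hdc) WDC_WD1001FALS-75J7B0_WD-WMATV2054515
Jun 24 13:55:39 Tower emhttp: pci-0000:00:14.1-ide-1:1 ide1 (hdd) WDC_WD20EARS-00MVWB0_WD-WMAZ20082367
"""

# Assumed layout: "emhttp: pci-<path> <channel> (<device>) <model_serial>"
DEVICE_LINE = re.compile(
    r"emhttp: pci-\S+ (?P<channel>\S+) \((?P<dev>\w+)\) (?P<model>\S+)"
)


def shared_channels(text):
    """Return {channel: [devices]} for channels that carry more than one drive."""
    by_channel = defaultdict(list)
    for match in DEVICE_LINE.finditer(text):
        by_channel[match["channel"]].append(match["dev"])
    return {ch: devs for ch, devs in by_channel.items() if len(devs) > 1}


print(shared_channels(SYSLOG))  # {'ide1': ['hdc', 'hdd']}
```

With the lines above, only ide1 shows up, carrying hdc and hdd, which is what prompted the question about a shared cable.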
June 25, 2010 Author
OK, it is an AMD board with 6 on-board SATA connections. I believe 2 can actually function as a RAID controller, but with RAID disabled in the BIOS they are supposed to be just ordinary SATA ports. Each drive has its own SATA cable. Currently there are 6 drives: 5 data and 1 parity. Is it bad practice to use these on-board ports?

I know the next step is to add a controller. I was thinking of one of these, I believe:
http://www.newegg.com/product/product.aspx?item=n82e16815121009&nm_mc=AFC-C8Junction&cm_mmc=AFC-C8Junction-_-RSSDailyDeals-_-na-_-na&AID=10521304&PID=3136390&SID=
but I was not planning on that until my next drive upgrade.

So everything is supposed to be SATA, and all drives are SATA. The 5th drive (hda, I guess) has been in the system for months with no issue, but adding the 6th drive made things go crazy. Hard resets make me nervous, as they are never good for the hardware, and I would hate to break something else along the way.
June 25, 2010
>> OK, it is an AMD board with 6 on-board SATA connections. I believe 2 can actually function as a RAID controller, but with RAID disabled in the BIOS it is supposed to be just another SATA port. They all have their own SATA cable. Currently there are 6 drives, 5 data and 1 parity. Is it bad practice to use these on-board ports?

You've probably got the BIOS set to emulate IDE mode on those SATA ports. On many BIOSes that is the default.

>> I know the next step is to add a controller, I was thinking one of these I believe, http://www.newegg.com/product/product.aspx?item=n82e16815121009&nm_mc=AFC-C8Junction&cm_mmc=AFC-C8Junction-_-RSSDailyDeals-_-na-_-na&AID=10521304&PID=3136390&SID=, but plans for this weren't going to be until my next drive upgrade. So, everything is supposed to be SATA, all drives are SATA and the 5th drive hda I guess at that point has been in the system for months with no issue, but adding the 6th drive made things go crazy and hard resets make me nervous as they are never good for the hardware, so I would hate to break something else along the way.

Hard resets could easily indicate a power supply issue or a memory issue. Are you using a single 12 volt rail power supply or a multiple-rail supply? Which exact brand/model, if multiple rail?
June 25, 2010
One more thing: your syslog shows you are running the version of unRAID with the "format" bug. Whatever you do, be absolutely certain that ONLY the one new drive you are adding is showing as "unformatted" before you push the "Format" button; otherwise, it will format ALL your disks. Upgrade to 4.5.4 as soon as possible and that bug will not be there to bite you.

Also, it appears that several of your disks have an HPA. Did you have them on a Gigabyte MB previously?

/dev/sdd
Jun 24 13:55:39 Tower kernel: ata4.00: HPA detected: current 2930275055, native 2930277168
Jun 24 13:55:39 Tower kernel: ata4.00: ATA-8: WDC WD15EADS-00P8B0, 01.00A01, max UDMA/133
Jun 24 13:55:39 Tower kernel: ata4.00: 2930275055 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
Jun 24 13:55:39 Tower kernel: ata4.00: configured for UDMA/133
Jun 24 13:55:39 Tower kernel: scsi 4:0:0:0: Direct-Access ATA WDC WD15EADS-00P 01.0 PQ: 0 ANSI: 5
Jun 24 13:55:39 Tower kernel: sd 4:0:0:0: [sdd] 2930275055 512-byte logical blocks: (1.50 TB/1.36 TiB)

and /dev/hdc
Jun 24 13:55:39 Tower kernel: hdc: WDC WD1001FALS-75J7B0, ATA DISK drive
Jun 24 13:55:39 Tower kernel: hdc: Host Protected Area detected.
Jun 24 13:55:39 Tower kernel: current capacity is 1953523055 sectors (1000203 MB)
Jun 24 13:55:39 Tower kernel: native capacity is 1953525168 sectors (1000204 MB)
Jun 24 13:55:39 Tower kernel: hdc: 1953523055 sectors (1000203 MB) w/32767KiB Cache, CHS=65535/255/63

Neither is a problem if you are not on a motherboard with a BIOS that is adding them, as both are data drives that are just being made to look a tiny bit smaller than they actually are. This has nothing to do with your hard resets/crashes, but you may want to address it once you are stable again.

Joe L.
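
To put a number on "a tiny bit smaller", here is a rough Python sketch that pulls the current and native sector counts out of the two kinds of HPA messages quoted above and subtracts them. The patterns and the name `hpa_sectors` are my own assumptions, written against only these sample lines, and the 512-byte sector size is taken from the "512-byte logical blocks" line for sdd.

```python
import re

# HPA-related kernel lines copied from the syslog excerpts quoted above.
LOG = """\
Jun 24 13:55:39 Tower kernel: ata4.00: HPA detected: current 2930275055, native 2930277168
Jun 24 13:55:39 Tower kernel: hdc: Host Protected Area detected.
Jun 24 13:55:39 Tower kernel: current capacity is 1953523055 sectors (1000203 MB)
Jun 24 13:55:39 Tower kernel: native capacity is 1953525168 sectors (1000204 MB)
"""

SECTOR_BYTES = 512  # assumed from the "512-byte logical blocks" line

# Message style seen for the SATA (ataN) device.
SATA_HPA = re.compile(r"HPA detected: current (\d+), native (\d+)")
# Message style seen for the IDE (hdX) device, spread over two lines.
IDE_HPA = re.compile(
    r"current capacity is (\d+) sectors.*\n.*native capacity is (\d+) sectors"
)


def hpa_sectors(text):
    """Return the hidden sector count (native - current) for each HPA found."""
    sizes = []
    for pattern in (SATA_HPA, IDE_HPA):
        for match in pattern.finditer(text):
            current, native = int(match[1]), int(match[2])
            sizes.append(native - current)
    return sizes


for sectors in hpa_sectors(LOG):
    print(sectors, "sectors hidden =", sectors * SECTOR_BYTES, "bytes")
```

Both drives come out at 2113 hidden sectors, which is about 1 MB each.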
June 25, 2010
>> What is your motherboard and power supply?

I'll ask again. What is your motherboard and power supply?
June 25, 2010 Author
>> You've probably got the BIOS set to emulate IDE mode on those SATA ports. On many BIOS that is the default.

Will turning off IDE emulation potentially break something in my configuration? If it is on and I turn it off, I assume these drives will switch from hdX to sdX, so the assignment within unRAID will change. The new drive is not a huge issue, but I am concerned about my 4th data drive, which has a lot more data on it.

As for the mobo and PSU models, I will get those this afternoon.

Thanks
June 25, 2010
>> You've probably got the BIOS set to emulate IDE mode on those SATA ports. On many BIOS that is the default.

>> Will turning off IDE emulation potentially break something within my configuration? When I turn it off if it is on, I assume it will switch these to sdx drive vs hdx so the assignment within unraid will change.

Since they are on the same ports on the same disk controller, odds are unRAID will figure it out by itself. It does not depend on a specific "device" name, so it should not "break" anything. If unRAID does not figure it out, the array will not start and you just need to go to the "devices" page to assign the disks back to their respective slots in the array.

Joe L.
June 26, 2010 Author
OK, in the BIOS I have a setting, on chip SATA type, with the options native IDE, RAID, and AHCI. It is currently on native IDE. If I select AHCI, it opens up another setting, on chip SATA ports 4/5, with the option of IDE or SATA. This setting is not available in native IDE mode.

The mobo is a Gigabyte GA-MA74GM-S2 and the PSU is a Rosewill RG530-2:
http://www.newegg.com/product/product.aspx?item=N82E16813128342
http://www.newegg.com/Product/Product.aspx?Item=N82E16817182160&cm_re=rg530-2-_-17-182-160-_-Product
June 26, 2010
I would try AHCI and SATA. Take a screen capture of the drive setup before you make any changes.

The power supply looks satisfactory. 20A on the 12V line should be plenty. But that does not rule it out if it's getting flaky.
June 26, 2010
>> OK, in the BIOS, I have a setting, on chip sata type, native IDE, RAID, AHCI. It is currently on native IDE. If I select AHCI, it opens up another setting, on chip sata ports 4/5 with the option of IDE or SATA. this setting is not available when in native IDE. The mobo is a gigbyte GA-MA74GM-S2 and the PSU is a Rosewill RG530-2: http://www.newegg.com/product/product.aspx?item=N82E16813128342, http://www.newegg.com/Product/Product.aspx?Item=N82E16817182160&cm_re=rg530-2-_-17-182-160-_-Product

Your power supply is a 4 rail supply. Typically, only one of the rails is available on the molex and SATA power connectors. If you figure that some drives can draw up to 3 amps per drive on spin-up, and between 2 and 2.5 amps is very common, you can see there might be an issue. Your rails are typically current limited at roughly 20 Amps, and probably rated for 18 Amps continuous since that is the ATX standard.

With that supply, you are basically limited to an absolute max of roughly 6 or 7 disk drives, fewer if you are powering case fans, etc., and that is if the manufacturer is not playing marketing games inflating the actual capacity of the power supply.

That might explain the instabilities when you add drives. It has nothing directly to do with the disks themselves, but with the total power drawn when they are spun up. I honestly think you have the wrong supply for your server, especially if you are seeing hard resets.

You'll need to consult your power supply manual to learn the actual distribution of the 12 volt rails, but I'll be shocked if more than one rail feeds ALL your disks.

As far as the AHCI setting, you probably want SATA on ports 4 and 5.

Joe L.
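
The arithmetic behind that drive limit can be sketched in a few lines of Python. This is only a back-of-the-envelope model under the assumptions stated in the post: every drive hangs off a single 12 V rail, all drives spin up at the same moment, and each draws 2 to 3 amps while doing so. The function name `max_drives` and the `other_load_amps` parameter (fans and the like) are hypothetical, not taken from any PSU specification.

```python
def max_drives(rail_amps, amps_per_drive, other_load_amps=0.0):
    """Largest number of drives whose simultaneous spin-up fits on one rail.

    Assumes every drive is fed from the same 12 V rail and all spin up together.
    """
    available = rail_amps - other_load_amps
    return max(0, int(available // amps_per_drive))


# 18 A continuous and 20 A current limit are the figures quoted in the post.
for rail in (18.0, 20.0):
    for per_drive in (2.0, 2.5, 3.0):
        count = max_drives(rail, per_drive)
        print(f"{rail:.0f} A rail, {per_drive} A per drive: {count} drives")
```

At the worst-case 3 amps per drive this gives 6 drives for either rail figure, and at 2.5 amps against the 18 amp continuous rating it gives 7, which lines up with the "roughly 6 or 7" estimate. Passing a non-zero `other_load_amps` for case fans lowers the count further.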
June 26, 2010
I've had more drives on a less powerful power supply with no issues, so it may not always be a straight numbers game. That was 8 drives: 4 WD 1TB green, 1 Seagate 5900 RPM, 2 Seagate 1.5TB 7200 RPM, and 1 Seagate 1TB 7200 RPM. The supply was rated for 17A on the 12V line. When I put in the 9th drive I started to have spin-up issues, but not hard resets.

If the supply is flaky or beefed up with marketing hype, then that could add to it. I would disable anything on board that you do not use: floppy, PATA, serial, sound, etc. Keep in mind that reviews on Newegg said the board was flaky too. Try re-adjusting the BIOS settings and see what happens.

If the other drives were spun down while the new drive was being used heavily, this should not cause lockups.
June 26, 2010
Reading some of the "bad" reviews on the Newegg site for that supply, I found these comments:

Cons: Randomly reboots my system when far under the rated wattage (intel i5 idle with 1 HD and basic graphics). After reading all of the other user comments, I'm disappointed that this issue hasn't been resolved by Rosewill and I'm disappointed in myself for buying this. This seems like some basic QA testing could have saved people time, money, and frustration. I need my machine for grad research and now I'm set back a week because someone was either careless or greedy.

Cons: Random reboots of system-Last time twice in 10min at less then 25% processor use. Max 231w seen with system at 100%. Returned to my previous 400w power supply-stable before and stable once again. Requested RMA for refund.

Cons: Definitely does not output 492w on the 12v rails (combined). I have this power supply hooked up to an audio amplifier rated at 400w RMS, and the power supply protection circuits kick in well before 492w. I estimate that I get about 200-300w at most out of the 12v, maybe less.

Any of these symptoms sound similar to yours?
June 26, 2010 Author
Awesome, thanks guys. After the AHCI and SATA setting, I had to assign the disks again and things are back online. I will monitor things and see if it fails/breaks again.

I have my disks set to stay spun up. It is a debate what the answer is: use power and keep the disks spinning, or spin them down until needed and risk failure. Who knows. But I do have some fans running to cool the drives, so the PSU theory could be real, especially since the problem started when a new drive was added. It is strange that it is not constant, though. If it were power, would it not fail right away? Or is the usage still fluctuating, so it is random when it fails? Suddenly there is added load and boom, the system craps out and cannot shut down safely, requiring me to hit the reset button.

As far as a good PSU that will be ready to grow past 6 drives, what is recommended that won't break the bank? I figure that's a good place to start. Depending on time to failure, if it is short, I should be able to test with a new PSU. Then, if things fail with a new PSU, it isn't the PSU; but if they suddenly remain stable, then we know it was the PSU and I go about my business and everything is cool.

I do have a BFG, BFGr550wgspsu
http://www.newegg.com/Product/Product.aspx?Item=N82E16817702010&nm_mc=OTC-Froogle&cm_mmc=OTC-Froogle-_-Power+Supplies-_-BFG+Technologies-_-17702010
in another system that I can swap in if that would be any better. Otherwise, I guess I will try something totally new, as I have plans to grow in the future, but I also don't want to go crazy.

Thanks again. At least we are on to something now.
June 26, 2010
>> I've had more drives on a less powerful power supply with no issues. So it may not aways be a straight numbers game.

Very true. The max combined current available on all the 12V lines is 41 Amps. Granted, we don't have a motherboard drawing 20 amps and a PCIe video card drawing high current, but it all adds up.

But there are some reviews where the supply did not seem to perform as well as it should for its "rated" capacity. Who knows, it might be that the 5 Volt line is being overstressed. The disk drives use it too.

Joe L.
June 26, 2010 Author
OK, I guess I should clarify "hard reset". What I mean is that the system is hung and I have to manually press the reset button to reboot the machine and get it back online. Yes, the system is on a UPS.
June 26, 2010
I would definitely try the alternate supply, especially if you already have it. It is a 2 rail supply with two 18 Amp 12 Volt rails. It will be very similar to your existing supply in its capacity, but good for swapping in to see if the hard resets stop.

>> or is the usage still fluctuating, so it is random when it fails. Suddenly there is added load and boom, the system craps out and cannot shut down safely requiring me to hit the reset button.

Every time a disk has to move its heads to a different position, there is a fluctuation in the current needed. It is nowhere near constant.

Joe L.
June 26, 2010
You can also set your other drives to spin down, spin them down, access ONLY the drives you need, and do what you've been doing.

With problems like this, it's hard to determine the cause. I've never had a PSU or lack of power cause a machine to lock up like that. A hard lockup like that is usually a hardware communication issue. I'm not ruling out the PSU as the trigger for it, though. For all we know, the extra drain on power keeps the network adapter from performing correctly or the NB/SB from moving data correctly.

With modern drives, I do not think you have to worry about spin-ups and spin-downs the way you used to. Before moving any hardware, I would make the spin-down adjustment and test it out. The supply should be handling 6 drives.

How many fans ya got in there? Is it possible to move any to the motherboard headers?

Corsair has some nice single rail supplies. I would suggest checking into that.
June 26, 2010
>> Reading some of the "bad" reviews on the newegg site for that supply I found these comments:
>> Cons: Randomly reboots my system when far under the rated wattage (intel i5 idle with 1 HD and basic graphics). After reading all of the other user comments, I'm disappointed that this issue hasn't been resolved by Rosewill and I'm disappointed in myself for buying this. This seems like some basic QA testing could have saved people time, money, and frustration. I need my machine for grad research and now I'm set back a week because someone was either careless or greedy.
>> Cons: Random reboots of system-Last time twice in 10min at less then 25% processor use. Max 231w seen with system at 100%. Returned to my previous 400w power supply-stable before and stable once again. Requested RMA for refund.
>> Cons: Definitely does not output 492w on the 12v rails (combined). I have this power supply hooked up to an audio amplifier rated at 400w RMS, and the power supply protection circuits kick in well before 492w. I estimate that I get about 200-300w at most out of the 12v, maybe less.
>> Any of these symptoms sound similar to yours?

Now these sound like PSU issues. Random reboots and such are what I expect from starved-for-power problems. I think I'm more aligned with Joe now on starved for power, but I also think there are other circumstances here. The board reviews say it locks up intermittently too.

Turn off what you can in the BIOS. Spin down drives, then test. If the problem eases, then you have your answer. You can swap in the other PSU or shop for a new one.
June 26, 2010 Author
Thanks guys. I will monitor it a bit more with the current PSU and HD config. The more I think about it, the more I think it is something else. When I went to the console the system was still available, but I just couldn't get it to shut down. It would start but fail. This suggests to me a potential drive issue causing system instability, or something with unRAID 4.5.4 and this new drive.

I have upgraded to 4.5.5, set the BIOS to AHCI with ports 4/5 as SATA, and re-added the drives to the array, so now I will wait. If it breaks again I will get a PSU and start there, since I have time to RMA the drive if it comes to that. I figure the drive is good since pre-clear completed fine, or so I thought. Maybe I will even run it through the WD tools to check it.

From what I am reading, I will need a new PSU anyway to move past 6 drives. Is the wattage OK, and do I just need a single rail unit like the Corsairs? There is a nice one for about 80 bucks right now at Newegg, if that will work:
http://www.newegg.com/Product/Product.aspx?Item=N82E16817139004&cm_re=corsair_power_supply-_-17-139-004-_-Product

As far as the number of fans, there are 4 80mm ones, 2 for the case and 2 blowing on the drives to keep them cool, so nothing crazy. The CPU is an FX64, a bit much I know.
June 26, 2010
Try spinning down the unused disks also. Without seeing the actual failure messages, it's hard to tell. I would also swap the cable. I've seen that resolve issues like this.

The CORSAIR CMPSU-550VX 550W is a good choice. The Rosewill PSU "should" support 6 drives without issue, but from what is posted about its quality, there is good reason to suspect it. How many drives do you plan to go with?

>> I figure it is good since pre-clear completed fine, or so i thought.

This is good in some respects, but pre-clear does not use the parity drive at the same time.

My issue was with spin-up when the drive had been put in standby. There were also random dropouts of drives in the array. I removed the last drive and those issues stopped (but I was at 8 drives, adding a 9th).
June 27, 2010 Author
The total number of drives is unknown today, but it is possible that with growth it will one day go to 20 drives.
July 14, 2010 Author
OK, just wanted to report back that things seem to have settled down. Since I changed the BIOS settings, things seem to be a bit more stable. I have been copying data to the new drives and things remain accessible. I watched some movies with no issues, so I am pretty happy with that. I was really worried for a while there.

I wanted to go ahead and attach my latest syslog, just for a quick health check. I want to make sure that even if I don't notice something on the surface, there isn't another issue brewing deep down. Thanks guys.

As far as the comment to spin down the drives, what are the thoughts on that? I know there are arguments that drives fail at spin-up, so if they stay spinning they last longer, but with today's drives is this really an issue any more? I keep mine spinning 24/7, but I would like to hear from the more experienced veterans what they do and how they handle spin-up and spin-down.

syslog.txt
July 28, 2010 Author
Well, it looks like things were too good to be true. At least this time I have some better logs.

I was copying 2 sets of files at roughly the same time, one movie and some pictures. First the copy of my pictures failed, and then I found that my movie copy had failed too. The movie was going to disk 5 and the pictures were going to disk 2.

Before doing anything drastic, I wanted to first get a good log that hopefully captures what is going on. I would guess one of 2 things: either it is the mobo or the suspected PSU. However, I hope we can get something concrete. I would hate to drop $100 on a new PSU and then learn that it isn't even my issue.

I have attached the latest syslog. If there is another file I can pull, let me know, as I am currently still able to access some things, but I can see that access is limited.

Thanks!

syslog-2010-07-27.zip