phatmack

Everything posted by phatmack

  1. Well, thank you all for your help. It seems the new controller solved (at least temporarily) my problems. Parity re-syncs and checks have worked flawlessly since I swapped in the SAS2LP-MV8. I wonder if the old controllers just couldn't handle all of the simultaneous reads or something. Who knows. Regardless, it seems fixed now. Thanks again, everyone.
  2. Fail. FML. Here's hoping the replacement controller works.
  3. Memory test came back clean... I updated the mobo BIOS (it was acting strangely, taking forever to boot, etc.) and decided to run another parity re-sync while I wait for the SAS2LP-MV8 to show up today. We'll see!!
  4. Thanks y'all. I'm going to run my memory test tonight but I thought I'd reach out and ask about this in advance (in case the memory test doesn't come back with issues). Thanks again.
  5. I don't know, guys... thinking ahead, I'm wondering if it is, in fact, my PCI Express x1 4-port SATA cards: http://www.newegg.com/Product/Product.aspx?Item=N82E16816124064 Too good to be true? What would be a decent replacement? I only have the following expansion slots:
     1 - PCI Express 3.0 x16
     2 - PCI Express x1
     There doesn't seem to be much out there in the way of 4-port PCI Express x1 controller cards; everything seems to require x4 or x8. Is this not doable? I need to get 8 drives onto those three expansion slots. I've been looking through the hardware compatibility list but I can't seem to find anything that fits the bill... Please advise! Thanks. (Rough bandwidth math on these x1 cards below.)
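     P.S. Some rough bandwidth math on the "too good to be true" question, assuming that card is a PCIe 1.x part (I haven't confirmed the exact chipset, so treat this as a back-of-the-envelope sketch): a PCIe 1.x x1 link tops out around 250 MB/s after encoding overhead. Split across 4 drives during a parity check, that's roughly 60 MB/s per drive, well under the ~120-150 MB/s a modern 7200 RPM drive can stream sequentially, so the card throttles every parity operation even when it's working perfectly. A PCIe 2.0 x1 part doubles the link to ~500 MB/s, which is still shared 4 ways.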
  6. The second drive down, disk 8, was the one that started causing issues as soon as I started using the NFS share, and it racked up a quarter-million errors before disk 7 failed... Disk 7 is also part of the NFS share and apparently is also having problems. Lastly, disk 9 decided to log 6,000+ errors. I'll post SMART reports on these three HDDs, but I'm 99.9% sure the HDDs are not the issue: I've run SMART tests against them before with no problems, and the drives don't report a single error when parity isn't being computed. I guess I need to buckle down and run that memtest now. The server is headless in the basement, so I've been avoiding it because it's a pain... I'm exhausted, though. This is SO frustrating. I just want it to work. It's been MONTHS now of nothing but issues. EDIT: To get the array back into a working state I stopped it, and all three of the drives listed above (notice that two are "green ball") have dropped out of the system and are showing as missing. Same freaking jacked-up behavior as always.
  7. Hey y'all. I swapped out the parity drive, unbundled my SATA data cables, and re-seated every connection. It's currently computing parity, and there's some very interesting behavior going on: the parity rebuild is at 47% done, and everything was running smoothly with no errors on any of my disks until (and immediately after) I re-enabled a couple of VMware guests that run on the NFS datastore unRAID is hosting. This is very suspect to me. One of my NFS share disks has 6429 errors (and climbing, rapidly) but is still "green ball" and the parity rebuild continues. Again, this didn't happen until I started using the NFS share... the errors are as follows:
     Jun 24 19:35:48 hoarder kernel: md: disk8 read error, sector=61752
     Jun 24 19:35:48 hoarder kernel: md: disk8 read error, sector=61760
     Jun 24 19:35:48 hoarder kernel: md: disk8 read error, sector=61768
     Jun 24 19:35:48 hoarder kernel: md: disk8 read error, sector=61776
     Jun 24 19:35:48 hoarder kernel: md: disk8 read error, sector=61784
     Jun 24 19:35:48 hoarder kernel: sd 10:0:0:0: [sdi] Unhandled error code
     Jun 24 19:35:48 hoarder kernel: sd 10:0:0:0: [sdi] Result: hostbyte=0x04 driverbyte=0x00
     Jun 24 19:35:48 hoarder kernel: sd 10:0:0:0: [sdi] CDB: cdb[0]=0x28: 28 00 00 00 f1 a0 00 00 08 00
     Jun 24 19:35:48 hoarder kernel: end_request: I/O error, dev sdi, sector 61856
     Should I be concerned? (Just to re-iterate, these errors never occurred prior to computing parity, and not until I started using the NFS share... albeit sparingly.) Thanks!!
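     If anyone else wants to tally these up, here's a quick-and-dirty Python sketch that counts the md read-error lines per disk; the /var/log/syslog path and the exact message format are assumptions based on the excerpt above, so adjust to taste:

          import re
          from collections import Counter

          # Matches unRAID md-driver lines like:
          #   Jun 24 19:35:48 hoarder kernel: md: disk8 read error, sector=61752
          PATTERN = re.compile(r"md: (disk\d+) read error, sector=(\d+)")

          errors = Counter()   # read-error count per disk
          first_sector = {}    # first failing sector seen per disk

          with open("/var/log/syslog") as log:  # path is an assumption
              for line in log:
                  m = PATTERN.search(line)
                  if m:
                      disk, sector = m.group(1), int(m.group(2))
                      errors[disk] += 1
                      first_sector.setdefault(disk, sector)

          for disk, count in errors.most_common():
              print(f"{disk}: {count} read errors, first at sector {first_sector[disk]}")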
  8. As you know, I'll try anything to get this sucker working. I've spent too much money and too many hours to let it sit unprotected forever.
  9. (Quoting the earlier reply:) "Using all the drives is not the same as actively reading from all of them at once. Unless you had a LOT of simultaneous reads from multiple systems, each of which was from a different drive, you would NOT be stressing them nearly as much as a parity check. Doesn't mean that's absolutely the issue -- but it IS strange that that's the only time you see problems, which could easily mean you have a power-related issue. Re: '... I guess what I mean is that if it were a cabling problem, the drives with poor cabling would have problems with or without the parity drive... right? ...' ==> Not necessarily. If the cables were bad, then you're correct. But if the problem is a 'noise' issue on the cables, then not necessarily. Do you have your SATA cables 'bunched' (tied together)? If so, unbundle them! And how many drives do you have on a single cable from the PSU?"
     Interesting. I do have all of the SATA data cables bunched together. I'll unbundle them. The PSU cables are spread out as evenly as possible. I think I have 3 drives per cable and maybe only one splitter. I'll have to look; it's been about a month since I've opened it up.
  10. (Quoting the earlier reply:) "Not necessarily all that strange. When you're computing and/or checking parity is probably the ONLY time you're actually using all 12 drives at once. [A drive rebuild also does this; but clearly you aren't doing any rebuilds without good parity]"
      I guess what I mean is that if it were a cabling problem, the drives with poor cabling would have problems with or without the parity drive... right? I definitely use all drives on the share with regularity...
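      (Side note for anyone following along: the reason parity operations are the one thing that touches every drive at once is that parity is, conceptually, the XOR of the same-offset block on every data disk, so checking or rebuilding it has to stream all disks in lockstep. A toy Python sketch of the idea -- not unRAID's actual implementation:)

          from functools import reduce

          def compute_parity(blocks: list[bytes]) -> bytes:
              """XOR the same-offset block from each data disk into one parity block."""
              return bytes(reduce(lambda a, b: a ^ b, chunk) for chunk in zip(*blocks))

          def rebuild_missing(parity: bytes, surviving: list[bytes]) -> bytes:
              """Any single missing block is the XOR of parity with all surviving blocks."""
              return compute_parity(surviving + [parity])

          # Toy example with three tiny "disks": rebuilding disk 0 from parity works.
          disks = [b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"]
          parity = compute_parity(disks)
          assert rebuild_missing(parity, disks[1:]) == disks[0]

      A normal file read only touches the one disk holding the file, which is why an array can look rock solid right up until a parity sync lights up all 12 drives simultaneously.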
  11. Worth a shot. The server hasn't been touched, though, and I would think that if it were a cabling problem it would manifest with the parity drive disconnected as well. Super strange that it's rock solid sans parity drive.
  12. That's what I would think too... which is why I upgraded to the Seasonic 850 Gold PSU (http://www.newegg.com/Product/Product.aspx?Item=N82E16817151102) and I would think that would be enough to power my 12 drives...? I hope! I don't think my wife is going to let me buy another PSU.
  13. Hi all: Just came back to see what was shakin', and I'm excited for the release of RC15. Going to give it a go this week. I also wanted to provide some more data for all of you awesome troubleshooters, to see if you can help me continue to pinpoint my ongoing problem. I've been running unprotected, without a parity drive, since May 21st. Zero problems whatsoever. Runs great. As soon as I add that parity drive back in: problems galore. Same stuff I've seen previously. Random drives drop out. I'm wondering what you all think this could be pointing to. It seems like it could be that my parity drive itself is having issues. Would that cause other drives to drop out? Gary, I've not run the memory test yet and I haven't put it on a UPS. I know, I know. I need to do these things, especially if you think this behavior points to a memory issue. I'll try to run it sometime this week. Just not looking forward to the 24hrs+ of downtime. Thanks y'all!
  14. Yes... in the last 24 hours I have had issues with drives on all three controllers (the mobo ports plus one drive on each of the PCI controllers). I hope it's a memory issue; that should be fairly straightforward to fix. I'll also look into BIOS and firmware updates for the motherboard and cards, respectively.
  15. That is all correct, yes. I'll try the memtest as you mentioned above and report back. Right now I'm running the array without the parity drive, since the issues seem to show up when I'm doing a parity-sync. Maybe that points to memory. I don't know, but I'll try anything at this point.
  16. Ugh. Not solved. Still having intermittent HDD problems. Parity disk, data disk, doesn't matter; they seem to drop off at random. Re-seated cables; doesn't seem to matter. This is getting REALLY ridiculous really quickly. Do I have to rebuild from scratch? Seems insane. I'll get a new syslog up here once it happens again... which will be soon, I'm sure.
  17. Just wanted to give everyone that's helped me out a quick update. I did switch back to the Thermaltakes and immediately got a couple of HDD errors, so I swapped BACK to the CoolerMasters and it's been rock solid ever since. For those of you who were counting, that was three issues that needed resolving: 1) bad PSU, 2) bad drive enclosures, 3) bad HDD. But we got there! Thanks for all your help. FYI, here are the temps I'm seeing with the CoolerMasters: [screenshot of drive temps] Thanks y'all! I'm super duper happy!
  18. Well, I know you all have been waiting with bated breath for a status update on my particular issue... I believe this is solved!! w0ot. Power supply replacement FTW. I'm gonna keep an eye on it for a while before I authoritatively state that it's solved, but it's been running pretty solid since I swapped out the PSU last Wed (and replaced the crap HDD). I'm back to the Thermaltake enclosures and things seem good. Gary, I'll get you temps tomorrow, but they're going to be pretty inaccurate because my ambient has been changing quite a bit as of late. Yaay spring. THANK YOU ALL SO MUCH FOR YOUR HELP! Seriously, fantastic community, really appreciate it. -Dan
  19. (Quoting the earlier reply:) "Well, you could always go back :) ... however, before doing so (if you're really so inclined), I'd do the experiment I noted in post #50"
      Once (if?) it all gets fixed, I think I am going to go back, mostly just because I can return these CoolerMasters but would have to go through the trouble of selling the Thermaltakes on eBay. Call me lazy! I'll definitely get drive temp numbers, though, and hopefully write up a review on the Thermaltakes. Oh, and I misspoke earlier: the SATA connection that was mangled was on the drive side, so I went ahead and replaced it with another drive I have... it's pre-clearing at the moment. The SMART status report shows that this drive has 38972 Power_On_Hours! I think I'll be shopping for a replacement soon.
  20. Well, now it gets interesting... Disk 10 (sdj) failed (again) as soon as I tried to make changes to the files that live on that drive... so it's promising to me that it's the same disk that failed. That makes me happy. IIRC, this drive has a slightly mangled SATA data connector... luckily, I have a spare that I can swap in when I get home tonight. More later!! (Makes me wish I had kept my hot-swap bays in there, though... Oh, and I always love it when you're troubleshooting and you find out that it's actually multiple issues. Just compounds the required effort!)
  21. So it will be done! Thanks for the heads up. Unrelated: I have noticed that my system takes about 2-3x the time to boot since switching to AHCI mode... but that could be attributed to something else. If I ever get this sucker stable I'll look into that... as you can imagine, boot time isn't my top priority at the moment! Haha.
  22. Thanks Gary... I'm sure I'll eventually get to that (memtest) point. I'm definitely willing to try anything but I'm skeptical of it being a memory issue because those sticks of memory ran in a separate box for 2 years prior to being brought over to my unRAID setup. Anything is possible though! After running several SMART tests on the (previously) failed disk, the array is back up. Currently, I'm running a parity check and no errors so far. Of course, this means nothing at this point in the game but I'm curious if it will complete without any errors. If that happens, I'll start using it heavily again and see if the problem re-surfaces with a different disk. Then I'll go the memtest route. After that, I'll look at a new motherboard (off of the approved list). Next, new controllers (also off of the approved list). Finally, if I'm still having issues, I'll roof test the whole blasted thing.
  23. Well... the Extended offline SMART self-test completed without error, so I guess that's really crappy news. The drive seems like it's okay... and I'm out of ideas. A couple of days ago I switched my BIOS' SATA Mode Selection from IDE to AHCI, but that clearly didn't make any difference... so what now? Swap out the motherboard and CPU and get new PCIe HDD controllers? Ugh. Thanks in advance. -Dan
  24. Here is the SMART status report: http://pastebin.com/MKGgTZya I guess the following items concern me (but I know nothing, at the moment, about SMART status reports; see the note below):
      ID# ATTRIBUTE_NAME       FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
        1 Raw_Read_Error_Rate  0x000f 113   097   006    Pre-fail Always  -           103272373
      ...
        7 Seek_Error_Rate      0x000f 085   060   030    Pre-fail Always  -           383254921
      Thanks for the AWESOME community support! You guys rock. EDIT: Short offline SMART Self-test "Completed without error". Running the long test now. Will report back in an hour...
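      (A note on reading these, for anyone who finds this later: on Seagate drives, huge raw values for Raw_Read_Error_Rate and Seek_Error_Rate are normal -- the raw number packs several counters into bit fields and isn't a literal error count. What matters is the normalized VALUE staying comfortably above THRESH, and here 113 vs. 006 and 085 vs. 030 both look healthy, which squares with the self-tests passing. A minimal Python sketch of that check, assuming the standard smartctl -A column layout:)

          # Flag any SMART attribute whose normalized VALUE is close to its THRESH.
          # Column layout assumed from standard `smartctl -A` output.
          def check_attributes(smart_output: str, margin: int = 20) -> None:
              for line in smart_output.splitlines():
                  fields = line.split()
                  # Attribute rows start with a numeric ID and have 10 columns.
                  if len(fields) >= 10 and fields[0].isdigit():
                      name, value, thresh = fields[1], int(fields[3]), int(fields[5])
                      status = "OK" if value - thresh > margin else "WATCH"
                      print(f"{status:5} {name}: value={value} thresh={thresh}")

          report = """\
          1 Raw_Read_Error_Rate  0x000f 113 097 006 Pre-fail Always - 103272373
          7 Seek_Error_Rate      0x000f 085 060 030 Pre-fail Always - 383254921
          """
          check_attributes(report)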