NAS

Moderators
  • Posts

    5040

Posts posted by NAS

  1. When I try to telnet to the box it works, but it takes about 10 seconds to give a prompt. This happens with a few different telnet clients, so the problem is on the unRAID side.

     

    I suspect it has to do with the static IP and there being no real entries in resolv.conf, but before I start messing about with it, does anyone else get this?
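
One common cause of a login delay like this is the server waiting on a DNS lookup of the client address that has to time out first. A quick way to test the resolv.conf suspicion is to see whether the file lists any nameservers at all. The sketch below is only an illustration: `parse_nameservers` is a hypothetical helper name, and it assumes the usual `nameserver <address>` line format.

```python
def parse_nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        # Treat anything after '#' or ';' as a comment.
        for marker in ("#", ";"):
            line = line.split(marker, 1)[0]
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def check_resolv_conf(path="/etc/resolv.conf"):
    """Print a short report on the nameservers found in the given file."""
    with open(path) as handle:
        servers = parse_nameservers(handle.read())
    if servers:
        print("nameservers configured:", ", ".join(servers))
    else:
        print("no nameserver entries found; lookups may have to time out")
    return servers
```

Calling `check_resolv_conf()` on the box reports what is configured. An empty result would be consistent with the delay, but does not prove it.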

  2. An update.

     

    I have changed to 4.2-beta-2 and started again, slowly testing every piece of equipment. As it stands I am back to all the original hardware, including the 3rd TX4 card, and the system is rock solid. Not a single error.

     

    The last test will be to add in the 9th drive, but I want to run with the current hardware for a bit first to see if it glitches.

  3. Swings and roundabouts. I would suggest that one bigger one would be better, since there is an economy of scale in UPSs (to a point).

     

    The thing is, though, unless you have some serious systems, one UPS to cover 3 of them still won't be that huge or expensive. It all comes down to uptime in the event of a power failure; that's where the money disappears.

     

    If you want 3 systems to be up for 2 hours after a complete power outage you will spend a lot more than if you only need 5 minutes.

     

    I would suggest that the main protection people here need is from short blackouts but, more importantly, brownouts (the real kit killer).

     

    If unRAID could implement some UPS integration agent, then all you would need is 10 minutes of battery coverage to allow for a graceful shutdown.

     

    One final note: pay careful attention to weight. UPS companies have deals with delivery specialists to make delivery cheap, so you may be unpleasantly surprised how heavy some of these things are.
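
To put rough numbers on the runtime point, here is a back-of-the-envelope sketch. Every figure in it is an assumption for illustration: the 450 W combined load, the 85% inverter efficiency and the 80% usable battery fraction are not measurements from anyone's kit, and both function names are made up. It also treats the battery as linear, which real batteries under heavy load are not, so treat the results as optimistic.

```python
def required_battery_wh(load_watts, minutes,
                        inverter_efficiency=0.85, usable_fraction=0.8):
    """Rough battery capacity (Wh) needed to hold a load for some minutes."""
    if load_watts <= 0 or minutes <= 0:
        raise ValueError("load and runtime must be positive")
    energy_needed_wh = load_watts * (minutes / 60.0)
    return energy_needed_wh / (inverter_efficiency * usable_fraction)


def estimate_runtime_minutes(battery_wh, load_watts,
                             inverter_efficiency=0.85, usable_fraction=0.8):
    """Rough runtime (minutes) a battery of the given capacity can provide."""
    if load_watts <= 0:
        raise ValueError("load must be positive")
    usable_wh = battery_wh * inverter_efficiency * usable_fraction
    return usable_wh / load_watts * 60.0


if __name__ == "__main__":
    load = 450  # assumed: three systems at roughly 150 W each
    for minutes in (5, 10, 120):
        print(minutes, "min ->", round(required_battery_wh(load, minutes)), "Wh")
```

Under this linear model the capacity scales directly with runtime, so a 2 hour target needs 12 times the battery of a 10 minute graceful-shutdown target for the same load.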

  4. Ahh that explains it.

     

    FYI this "may" have fixed a problem I reported in my TX4 thread. I need to do a lot of incremental testing to confirm that though. (I only post this in case someone else is in the same boat.)

     

    Update: thinking about it, since this is an event, perhaps this along with any other relevant ones should be sent to the syslog, i.e. when unRAID does something, log it.

  5. What I would do is buy a big-brand UPS such as APC.

     

    Now before you say "too expensive", check out eBay and get one, even if it has no battery. Then you can get a brand new proper battery by mail order. What you end up with is essentially a brand new top-end UPS for a fraction of the cost.

     

    As for how big: once you go to a proper brand you no longer have to think about the quality of the protection and are merely calculating the amount of time the system will run without mains.

  6. A small update.

     

    I've found a pattern that may or may not be a red herring. Several of the fatal lockups seem to happen after /usr/sbin/hdparm -y /dev/sdg >/dev/null, like:

     

    Sep  2 17:37:44 NASBOX emhttp[1294]: shcmd (14): /usr/sbin/hdparm -y /dev/sdg >/dev/null

    Sep  2 17:37:54 NASBOX kernel: [18173.780487] ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen

     

     

    Does that hint at anything?
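
To check whether this is a real pattern or a coincidence, the syslog can be scanned for every spin-down command that is followed shortly by an ATA exception. This is only a sketch based on the two line formats quoted above. The 30 second window, the function names and the placeholder year (syslog stamps carry no year) are all assumptions, and a hit shows the two events were close in time, not that the drive named in the command sits on the port that failed.

```python
import re
from datetime import datetime, timedelta

SPINDOWN = re.compile(r"hdparm -y (/dev/\w+)")
ATA_EXCEPTION = re.compile(r"kernel: .*?(ata\d+(?:\.\d+)?): exception")


def parse_stamp(line):
    """Parse the leading 'Sep  2 17:37:44' stamp; 2000 is a placeholder year."""
    return datetime.strptime("2000 " + line[:15], "%Y %b %d %H:%M:%S")


def spindowns_before_exceptions(lines, window_seconds=30):
    """Return (device, ata_port, seconds_between) for each close pairing."""
    window = timedelta(seconds=window_seconds)
    last_spindown = None  # (timestamp, device) of the latest hdparm -y seen
    hits = []
    for raw in lines:
        line = raw.strip()
        match = SPINDOWN.search(line)
        if match:
            last_spindown = (parse_stamp(line), match.group(1))
            continue
        match = ATA_EXCEPTION.search(line)
        if match and last_spindown is not None:
            gap = parse_stamp(line) - last_spindown[0]
            if timedelta(0) <= gap <= window:
                hits.append((last_spindown[1], match.group(1),
                             gap.total_seconds()))
    return hits
```

Feeding it the whole syslog (for example `spindowns_before_exceptions(open("syslog"))`, where the file name is a placeholder) and comparing the number of hits with the total number of exceptions would show how often the lockups really follow a spin-down.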

  7. Thanks for the replies :)  Hopefully I answer all the questions here:

     

    department of longshots:

    I seem to recall something about problems with more than two Promise controllers on one motherboard. Not a Unraid issue, but a Promise issue.

    /Rene

     

    If anyone can point me at any of these it would be appreciated. I will keep looking as well.

     

    The posts I saw on this topic referred to it being solved by swapping PCI slots, but you did that.

     

    I have even tried swapping the cards out for new ones, as I have spares. Have I tried every mathematical combination of cards and slots? Probably not, but I'm not sure that's completely practical... that's an awful lot of reboots.

     

    * You mention not having changed anything but ACPI, does this mean it worked before?

     

    No, sorry, that was perhaps a red herring. All the ACPI fix did was remove a recurring syslog entry complaining about a second CPU (which didn't exist).

    Fix applied as per a post here and the error went away.

     

    * Did you try turning ACPI back to its original setting?

     

    Yes, and the problem persists. As the problem is sporadic it sometimes takes a while to come back, so at first I thought the ACPI fix had done it, but as I mentioned previously it was a red herring.

     

    * Have you tried getting a smaller number of drives working?  Then add them until it is no longer stable.

     

    The system has grown, so in essence I have done this. It wasn't until I added this latest drive (the first on the 3rd card) that the problem became quite so prevalent, although it's always been there.

     

    * Are there drives hooked up directly to the mobo?

     

    No, they are all on the TX4 cards. In fact the mobo BIOS has all the SATA ports turned off, as well as the secondary IDE. The primary IDE has a DVDR on it.

     

    * Are your drives set so they work with SATA150?

     

    That's a new one on me. What needs to be done to set a drive to SATA150?

     

    * Did you reseat the cards and cables (power and signal)?  SATA cables can come loose fairly easily.

     

    Obviously there are a lot of cables etc., but they are relatively neat and I believe they are all OK, having checked them several times.

     

    * Is your PSU up to the task?  For nine drives I would expect at least a good 500W supply

     

    It's a Hyper Type R 580W power supply, so I assume yes.

     

    * Please post your complete config - be specific (mobo, age, cpu, drives, psu, memory, other devices, brands/models, how hooked up, ...)

    Hyper Type R 580W power supply

    Asrock K7S8XE+

    AMD Athlon XP 2500+

    2 GeIL 512MB PC3200 memory sticks

    3 Promise SATA150 TX4 controllers in PCI slots 1,2 and 3

    DVDR (Would need to identify the brand but brand new)

    Parity  WDC WD5000AAKS-0

    disk1 WDC WD5000YS

    disk2 WDC WD5000AAKS

    disk3 WDC WD5000YS

    disk4 WDC WD5000YS

    disk5 WDC WD5000AAKS

    disk6 WDC WD5000YS

    disk7 WDC WD5000AAKS

    disk8 WDC WD5000YS

     

  8. I am having a lot of problems getting my unRAID box stable. It has nine 500GB WD SATA II drives connected to three Promise SATA150 TX4 controllers.

     

    The symptoms are that sporadically I get a flood of syslog entries (posted next) and then the box becomes completely unresponsive; I can't even ping it. I have turned off everything unnecessary in the BIOS, and tried swapping PCI slots, rolling back unRAID versions, swapping NICs, turning off spin down and as many other things as I can think of.

     

    I am prepared to post as much information as anyone would like and to try basically anything necessary to get to the root of this problem, as it is exceptionally annoying and it's just a matter of time before I lose data.

     

    Nothing has been changed in unRAID except turning off ACPI via the syslinux boot code.

     

    Syslog samples

     

    Sep  2 17:37:43 NASBOX emhttp[1294]: shcmd (12): /usr/sbin/hdparm -y /dev/sde >/dev/null

    Sep  2 17:37:44 NASBOX emhttp[1294]: shcmd (13): /usr/sbin/hdparm -y /dev/sdf >/dev/null

    Sep  2 17:37:44 NASBOX emhttp[1294]: shcmd (14): /usr/sbin/hdparm -y /dev/sdg >/dev/null

    Sep  2 17:37:54 NASBOX kernel: [18173.780487] ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen

    Sep  2 17:37:54 NASBOX kernel: [18173.780500] ata9.00: cmd e0/00:00:00:00:00/00:00:00:00:00/00 tag 0 cdb 0x0 data 0

    Sep  2 17:37:54 NASBOX kernel: [18173.780502]          res 40/00:00:00:40:00/0c:00:3a:00:00/00 Emask 0x4 (timeout)

    Sep  2 17:37:55 NASBOX kernel: [18174.110115] ata9: soft resetting port

    Sep  2 17:37:55 NASBOX kernel: [18174.110123] ata9: SATA link down (SStatus 614 SControl 300)

    Sep  2 17:37:55 NASBOX kernel: [18174.110131] ata9: failed to recover some devices, retrying in 5 secs

     

    Sep  2 17:47:12 NASBOX kernel: [  234.527461] ata9.00: exception Emask 0x10 SAct 0x0 SErr 0x100 action 0x2

    Sep  2 17:47:12 NASBOX kernel: [  234.527469] ata9.00: (port_status 0x20200000)

    Sep  2 17:47:12 NASBOX kernel: [  234.527478] ata9.00: cmd 25/00:f8:ef:c9:0b/00:01:00:00:00/e0 tag 0 cdb 0x0 data 258048 in

    Sep  2 17:47:12 NASBOX kernel: [  234.527480]          res 51/0c:97:50:cb:0b/0c:00:00:00:00/e0 Emask 0x10 (ATA bus error)

    Sep  2 17:47:12 NASBOX kernel: [  234.850346] ata9: soft resetting port

    Sep  2 17:47:12 NASBOX kernel: [  235.010193] ata9: SATA link up 1.5 Gbps (SStatus 113 SControl 300)

    Sep  2 17:47:12 NASBOX kernel: [  235.060666] ata9.00: configured for UDMA/33

    Sep  2 17:47:12 NASBOX kernel: [  235.060680] ata9: EH complete

    Sep  2 17:47:12 NASBOX kernel: [  235.129956] sd 9:0:0:0: [sdg] 976773168 512-byte hardware sectors (500108 MB)

    Sep  2 17:47:12 NASBOX kernel: [  235.178340] sd 9:0:0:0: [sdg] Write Protect is off

    Sep  2 17:47:12 NASBOX kernel: [  235.178347] sd 9:0:0:0: [sdg] Mode Sense: 00 3a 00 00

    Sep  2 17:47:12 NASBOX kernel: [  235.255839] sd 9:0:0:0: [sdg] Write cache: enabled, read cache: enabled, doesn't support DPO
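
With a long syslog it helps to boil the kernel messages down to a tally of which port reported which kind of failure. The sketch below is a rough illustration that assumes the layout seen in the samples above, where the `res ... Emask ... (reason)` line has no port prefix of its own and follows the `ataN.00: cmd` line it belongs to. `count_error_reasons` is a made-up name.

```python
import re
from collections import Counter

# Lines that name the port an error report belongs to.
PORT = re.compile(r"\b(ata\d+)(?:\.\d+)?: (?:exception|cmd)\b")
# The result line that carries the human-readable reason in parentheses.
REASON = re.compile(r"\bres .*Emask 0x[0-9a-f]+ \(([^)]+)\)")


def count_error_reasons(lines):
    """Tally (port, reason) pairs such as ('ata9', 'timeout')."""
    counts = Counter()
    current_port = None  # port named by the most recent exception/cmd line
    for line in lines:
        match = PORT.search(line)
        if match:
            current_port = match.group(1)
            continue
        match = REASON.search(line)
        if match and current_port is not None:
            counts[(current_port, match.group(1))] += 1
    return counts
```

If nearly every entry lands on one port, that points at a single drive, cable or card channel. A spread across all the ports would point more towards something shared such as the PCI bus or the power supply.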