Trying larger Zacate board - suggestions?



I'm ordering a Zacate board for my backup unRAID, and as something of an experiment. I'm considering larger boards like the Asus E35M1-M PRO, primarily for future expansion. I've seen the little ASRock but the extra $45 buys:

 

5th SATA on the interior.

Another PCIE - (x1)

2 PCI

and USB3, should things ever move in that direction.

 

If anyone has suggestions or alternatives I'm all ears.

 

Thanks.

Link to comment

Depends on your needs, of course, but in general I would give that upgrade a solid 'meh'.  Essentially you are gaining only three extra SATA ports (one on the motherboard, two via the PCIe x1 slot).  You could put a fourth SATA drive on one of the PCI slots, but any more than that and you have to start worrying about bottlenecks (since PCI slots generally share bandwidth).  USB3 is definitely a nice feature if you have some other devices to make use of it, but generally speaking eSATA is more useful in the unRAID environment as it allows you to add more array drives (USB3 is faster, but you can only mount USB3 drives outside the array).
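To put a rough number on the PCI bottleneck, here is a minimal sketch. It assumes a conventional 32-bit/33 MHz PCI bus with a theoretical peak of about 133 MB/s shared by every card on it; that figure and the function name are assumptions for illustration, not measurements from this board. During a parity check every drive is read at once, so the bus is split between them at best.

```python
# Theoretical peak of a conventional 32-bit / 33 MHz PCI bus, shared by all slots.
# Assumption: real-world throughput will be lower than this.
PCI_BUS_MBPS = 133.0

def per_drive_mbps(drives_on_pci: int) -> float:
    """Best-case bandwidth per drive when all PCI-attached drives are busy at once."""
    if drives_on_pci < 1:
        raise ValueError("need at least one drive")
    return PCI_BUS_MBPS / drives_on_pci

for n in (1, 2, 4):
    print(f"{n} drive(s) on PCI: at most {per_drive_mbps(n):.1f} MB/s each")
```

One drive on PCI has the bus to itself; by four drives each gets roughly a third of what a single modern disk can stream.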

 

What are your future expansion goals?  If you buy a board that just gives you options with no specific goal in mind, then there's a good chance that you'll never even make use of those options (at least that's what I always do).

Link to comment

I hear you about the "expandability" thing. Can't count the number of boxes I've built with future-proofing that never saw daylight again.

 

This will be a 10 drive unraid. 7 drives now to back up my primary and room for mild growth. It's also my first zacate of course, but honestly, beyond the inevitable day of messing around, once running it should hum along out of sight for multiple years. The supposed low power consumption was a plus for this.

 

This seemed to nicely make use of my parts bin debris. I have too many 2 and 1-port SATA PCIE and PCI cards so hitting 10 drives would be free. It probably won't even get an external preclear bay.

 

The original plan was something Atom based, but these little guys seem to raise the bar both on low-power consumption and performance/W, and I'd like to give one a try. If you think Atom boards are still a better solution I'd certainly consider them. I'll admit I'm trying to squeeze a new toy into the project, if it makes any sense.

Link to comment

In that case I think the Zacate board is just right for you.  I do think the Zacate CPUs are a better option than the Atoms at this point.  The Atom based server that I've been playing with recently has been giving me problems (though I may have a defective unit).  The Zacate also gives you far more options for repurposing at a later date.

 

So by my count you can run:

 

5 drives on the motherboard

2 in the PCIe x16 slot

2 in the PCIe x1 slot

1 in the PCI slot

 

Assuming you already own all the SATA expansion cards you need for this, you should be able to reach 10 drives easily (and you have room to expand via a SASLP card later if you need to).
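The tally above as a quick sanity check. This is only a sketch: the per-slot counts are the ones listed in this post, and the card descriptions in the labels are assumptions about which parts-bin cards end up where.

```python
# Drive counts per connection, taken from the list in this post.
ports = {
    "motherboard SATA": 5,
    "PCIe x16 slot (2-port card)": 2,
    "PCIe x1 slot (2-port card)": 2,
    "PCI slot (1-port card)": 1,
}
TARGET_DRIVES = 10  # the planned array size mentioned earlier in the thread

total = sum(ports.values())
print(f"total ports: {total}, spare: {total - TARGET_DRIVES}")
```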

Link to comment

Early results: Asus E35M1-M PRO M1 board (w/cpu fan), single PCIE SATAx2 card, 2x120mm case fans, 5x2TB drives (3 Hitachi 7K3000, 1 Hitachi 5K3000, 1 WD EARS) idles at 36W all drives spun down, 59-68W all spun up. Just a curiosity data-point so far. Actual drives will be 1x7k3000 parity + 5ks for data. I'll transplant components into the case and start a real array tomorrow.

Link to comment

I am also using the Asus E35M1-M Pro but without the cpu fan.  Seems to be working great and can't complain with the amount of power used.

 

Link to comment

Any issues getting 4.7 running on yours? Everything behaved fine here until I started testing under load, then the realtek driver seems to die.

"r8169: eth0: link up...last message repeated 139 times..." :o Then the console freezes with what looks like a dying-gasp call trace.

 

I'll try to narrow things down.

Link to comment

Bah, I got the onboard NIC its own interrupt and things improved, but the NIC still goes down and returns under load, often taking the system with it. If the system is quiet I can copy files to it, but if much is happening it stumbles (eth0: link up repeated xx times) and eventually dies. Feels like polled IO or a driver problem. I hate being so out of touch with things. Time to google. I'm listening if anyone has suggestions.
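For anyone wanting to compare how badly their link flaps, a rough sketch that counts "link up" events in a syslog, including the "last message repeated N times" lines. The sample lines are hypothetical, modelled on the messages quoted in this thread; the exact syslog prefix on a real box will differ, and the function name is made up for illustration.

```python
import re

LINK_UP = re.compile(r"r8169: (\w+): link up")
REPEATED = re.compile(r"last message repeated (\d+) times")

def count_link_up(lines):
    """Count 'link up' events, adding syslog repeat counts that follow one."""
    total = 0
    prev_was_link_up = False
    for line in lines:
        if LINK_UP.search(line):
            total += 1
            prev_was_link_up = True
            continue
        m = REPEATED.search(line)
        if m and prev_was_link_up:
            # The repeat line refers to the preceding 'link up' message.
            total += int(m.group(1))
        else:
            prev_was_link_up = False
    return total

# Hypothetical sample, not a real capture.
sample = [
    "kernel: r8169: eth0: link up",
    "last message repeated 139 times",
    "kernel: some unrelated message",
    "last message repeated 3 times",
]
print(count_link_up(sample))  # 140
```

On a live system the same function could be fed the lines of the syslog file instead of the sample list.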

Link to comment

Cyrnel, there is a whole thread on this mobo on AVS. Some people there have the same issue and suggest turning off a few energy-saving parameters on the NIC, which seems to solve the problem you describe.

I'm buying mine now, will report on it as well.

One more warning: I read some reports on the ASUS forum of issues with RAID cards, so beware of that as well!

Link to comment
I am using 5.0b6a and I haven't had an issue.  What kind of load triggers the problem?  Maybe I haven't stressed my box enough to see it.

 

I started a thread in general support, but basically large-file Gb copies from a client to the server start out great then cause havoc on the server. The time between initial "eth0 up" "repeat" messages and freeze is maybe 5 seconds. If caught before the freeze the server will recover but the problem doesn't go away. Preventing shared interrupts and disabling all non-essential devices seemed to help but it just delays the inevitable.

 

I'm playing with Slackware 13 now but given life and my usual pace of remembering things my Intel card will arrive first. Still, it'd be nice to figure this one out. If it can work on some units of the same board model then it's more likely to be a config issue than something insurmountable.

Link to comment

Same experience here as cyrnel with my ASUS E35M1-M:

* Great platform for unRAID: cool, quiet, and power consumption as low as it gets (35W without HDDs, on a Corsair 620W power supply that's not the greenest on earth)

* HDD temps are also lower than with my previous board (Asus P5B with an Intel C2D), since this board runs really cool!

* Huge issue with the network card: it seems to dislike unRAID for some reason and crashes under load. Will replace it soon!

Link to comment

Under 4.7:

        Supported ports: [ TP MII ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Half 1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Half 1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: MII
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: pumbg
        Wake-on: g
        Current message level: 0x00000033 (51)
        Link detected: yes
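If you want to diff these settings between a working and a failing box, here is a small sketch that turns a listing like the one above into a dictionary. It assumes the listing is ethtool-style "Key: value" output with indented continuation lines; the function name is made up, and the sample lines are copied from the listing.

```python
def parse_ethtool(text):
    """Turn 'Key: value' lines into a dict; continuation lines extend the previous value."""
    settings = {}
    last_key = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if ":" in line:
            key, value = line.split(":", 1)
            last_key = key.strip()
            settings[last_key] = value.strip()
        elif last_key is not None:
            # No colon: this line continues the previous multi-line value.
            settings[last_key] += " " + line
    return settings

sample = """\
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
        Speed: 1000Mb/s
        Duplex: Full
        Wake-on: g
        Link detected: yes
"""
info = parse_ethtool(sample)
print(info["Speed"], info["Duplex"], info["Link detected"])
```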

 

It's working better now. No SATA problems, and I can write to it over the nic at 65MB/s, but copying big files to the array during a parity check still gets progressive "link up" death.

 

I'll be able to work on it again Sunday.

Link to comment

IO is respectable. I'm stress-testing now so have lots going on:

 

- Repeating parity check on the 5 drives, all connected to on-board ports

- rsync several TB of movies & music from another unRAID (15-20MB/s)

- Crashplan slowly receiving files over the net

- Playing 4 movies (3 DVDs & 1 avi) from unRAID volume on networked pc (annoying but smooth)

 

unMenu's disk performance total ranges between 80-230MB/s. Load ~4.5. System responds quickly to unRAID & unMenu page requests. It went all night and the only syslog additions are normal activity.

 

This is still with an Intel Pro PCI nic. The on-board Realtek just wouldn't play nice under load. Using pci=nomsi for the kernel to eliminate interrupt problems. Don't know how this might affect you guys with the multiport pcie cards.
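A quick way to confirm the parameter actually took effect after a reboot is to look at the kernel command line. A minimal sketch: /proc/cmdline is the standard Linux location, while the example string and helper names are assumptions for illustration.

```python
def has_kernel_param(cmdline: str, param: str) -> bool:
    """True if `param` appears as a whole word on the kernel command line."""
    return param in cmdline.split()

def running_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read()

# Hypothetical command line for illustration; on a live box use running_cmdline().
example = "initrd=bzroot pci=nomsi"
print(has_kernel_param(example, "pci=nomsi"))
```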

Link to comment

I was becoming anxious about not having the backup server functional so have had to change directions on this thing. I threw in an old 939 board. Ironically it's cranking along with its onboard Marvell nic. The bad part is 2x power consumption vs. the Zacate board.

 

Back to the Zacate, I suspect the root of my problem has been my inability to separate work between interrupts. This EFI-based board wants to assign eth, sata, usb, and more to two interrupts. I started playing with affinity, in case fixing cache misses would be enough, but I'd still get "interrupt disabled" and fall into polling mode when the right disks were used at the same time as big network loads. Hate to blame new technology but I think this EFI may need to grow before it's ready for a server. Take that with the salt of my re-introduction to Linux after a very long absence; the problem may very well be myself, lack of time, and need for backup.
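For anyone repeating the affinity experiment: Linux takes IRQ affinity as a hex CPU bitmask written to /proc/irq/<n>/smp_affinity. A small sketch for building that mask follows; the function name and the IRQ number in the printed command are placeholders, and as noted above, pinning alone did not cure the problem here.

```python
def affinity_mask(cpus):
    """Hex bitmask string for a set of CPU indexes (bit 0 = CPU0, bit 1 = CPU1, ...)."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU index must be non-negative")
        mask |= 1 << cpu
    return format(mask, "x")

# Placeholder IRQ number; substitute one from your own /proc/interrupts.
irq = 17
print(f"echo {affinity_mask([1])} > /proc/irq/{irq}/smp_affinity")
```

On a dual-core part the useful masks are 1 (CPU0 only), 2 (CPU1 only) and 3 (either core). Writing the value needs root.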

 

Anyway, I'll pop the Zacate board into a test box and continue as time allows. If any of you with Asus or ASRock boards want to defend the platform I wouldn't mind seeing your /proc/interrupts. My examples are offline but basically everything related to external IO would be assigned to IRQ 17 & 18 (or higher depending on apic parameters), neither of which would have core affinity. Seems like the worst combination.
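To make comparing /proc/interrupts easier, here is a rough sketch that lists every numbered IRQ with more than one device attached. The sample text is invented to resemble the situation described above, not a capture from this board, and the parser assumes device names contain no spaces.

```python
def shared_irqs(text):
    """Map IRQ number -> device list for every numbered IRQ with more than one device."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    shared = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        label = label.strip()
        if not label.isdigit():
            continue  # skip NMI, LOC, ERR and similar summary rows
        fields = rest.split(None, ncpu)  # per-CPU counts, then the remainder
        if len(fields) <= ncpu:
            continue
        # Remainder looks like "<controller> dev1, dev2, ..."; keep the last word of each part.
        names = [part.split()[-1] for part in fields[ncpu].split(",") if part.strip()]
        if len(names) > 1:
            shared[int(label)] = names
    return shared

# Invented sample for illustration only.
sample = """\
           CPU0       CPU1
  0:         45          0   IO-APIC-edge      timer
 17:     123456          0   IO-APIC-fasteoi   ahci, eth0, ehci_hcd:usb1
 18:      65432          0   IO-APIC-fasteoi   ohci_hcd:usb3, ohci_hcd:usb4
NMI:          0          0   Non-maskable interrupts
"""
for irq, devices in shared_irqs(sample).items():
    print(irq, devices)
```

On a real system, pass it the contents of /proc/interrupts instead of the sample string.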

Link to comment

Yes, I saw interrupt problems with the Intel nic though it dealt with them better. When I was pushing the nic full speed then did a local disk to disk copy the system would disable the nic/SATA interrupt every few minutes. It would then move along in what seemed like polled mode until it later recovered on its own. Under the same conditions the onboard Realtek nic would reboot or freeze with a call trace. To me the Intel card & driver worked but couldn't make up for lost events.

 

"Stale me" disclaimers apply, but it seems the board's EFI just doesn't want to spread the work across IRQs and its simple gui doesn't provide overrides (that I know of). I was able to trick it temporarily by adding/removing cards & other devices but a few reboots later I'd notice it had reverted to eth0, sata, usb, etc all using the same IRQ. Somewhat frustrating.

 

More gui controls or a proper shell to talk to the EFI could change things completely.

 

Link to comment

I was a day away from ordering the board too.  Now I'm not so sure.  I'm not sure what kind of impact the interrupt issue has on the system either.  Does it lead to performance issues or something more serious?

 

Also, people have been reporting no problems with even the onboard NIC with the 5.0 beta.  I'm just wondering if the new kernel fixes the issues or if they just haven't properly tested the board.  I'm guessing you haven't tried the beta?

Link to comment

I've tried both. This week has been 5.0b6a.

 

The reports of it working for others here have crossed my mind. I'm guessing people aren't nailing their systems as badly. It did seem to work fine for me until I saw mild problems and tried to get rid of them.

 

Do you know if the ASRock boards use EFI?

Link to comment

Just saw that myself. It could be different enough that it doesn't matter. Anyway, I shouldn't start down the anti-EFI path. At this point the most I can claim is one bad apple. Also, others seem to be having better luck. It could be they know more, got lucky, or aren't as picky. I'm hoping some notice us here and relate their experiences.

 

The simple version of interrupts is they're the concierge telling you your car is ready. He tells you, you go get your car, then another car arrives for the next guest. When interrupts are storming, people don't have time to get their car before the next arrives. That leads to overruns or worse. The fallback is polling, which would sort of be the concierge checking to make sure you were gone before allowing the next car to arrive. Much slower and cpu intensive but predictable.
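The same idea as a toy model, for those who prefer numbers to valets. This is a deliberately simplified sketch and not how the kernel works: events arrive at a fixed interval, the handler takes a fixed time, and anything that arrives while the handler is busy is lost. All names and numbers are made up.

```python
def count_overruns(n_events, interval, service_time):
    """Single-slot model: an event arriving while the handler is still busy is lost."""
    busy_until = 0
    lost = 0
    for i in range(n_events):
        arrival = i * interval
        if arrival < busy_until:
            lost += 1  # the next car showed up before the last guest had left
        else:
            busy_until = arrival + service_time
    return lost

print(count_overruns(10, interval=5, service_time=3))  # 0: handler keeps up
print(count_overruns(10, interval=2, service_time=3))  # 5: every other event is lost
```

As soon as the handler takes longer than the gap between events, losses start and keep growing with the load.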

 

Helpful? Too simple?

 

Interrupts aren't as simple as they used to be. Pin/cost reduction has shifted lots of it away from what I used to write to intermediate levels, but the basic ideas remain.

Link to comment
