Trying larger Zacate board - suggestions?


If any of you with Asus or ASRock boards want to defend the platform I wouldn't mind seeing your /proc/interrupts

 

Here are mine, Asus E35M1-M Pro.

 

          CPU0       CPU1       
 0:      13218     446293   IO-APIC-edge      timer
 1:          0          2   IO-APIC-edge      i8042
 9:          0          0   IO-APIC-fasteoi   acpi
12:          0          3   IO-APIC-edge      i8042
17:         13       1019   IO-APIC-fasteoi   ehci_hcd:usb1, ehci_hcd:usb2, ehci_hcd:usb3
18:         12        125   IO-APIC-fasteoi   ohci_hcd:usb4, ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7
19:      12404    9601262   IO-APIC-fasteoi   ahci
40:          0          0   PCI-MSI-edge      xhci_hcd
41:          0          0   PCI-MSI-edge      xhci_hcd
42:          0          0   PCI-MSI-edge      xhci_hcd
43:        487       7216   PCI-MSI-edge      eth0
NMI:          0          0   Non-maskable interrupts
LOC:     446679      13562   Local timer interrupts
SPU:          0          0   Spurious interrupts
PMI:          0          0   Performance monitoring interrupts
PND:          0          0   Performance pending work
RES:   65138088    1002131   Rescheduling interrupts
CAL:         87         39   Function call interrupts
TLB:      11747      10737   TLB shootdowns
TRM:          0          0   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
MCE:          0          0   Machine check exceptions
MCP:         16         16   Machine check polls
ERR:          0
MIS:          0

 

Don't know if the 'rescheduling interrupts' indicate a problem or not.  From what I've read here, https://help.ubuntu.com/community/ReschedulingInterrupts, they may.

Link to comment

That's easy. Wow, there must be an EFI combination or secret handshake that works better than anything I tried.

 

What was the approximate uptime when you snapped /proc/interrupts?

 

Have you noticed any performance issues? "IRQ xx disabled" or worse messages in your syslog? Freezes with a call trace on the console? What are your network transfer speeds like? If all seems well you may not want to mess with it.
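A quick way to check the syslog for that kind of message is a grep along these lines. This is only a sketch: the helper name `irq_trouble` is made up, the default log path is an assumption, and the patterns are the usual kernel wording ("irq N: nobody cared", "Disabling IRQ #N"), which may differ between kernel versions.

```shell
# irq_trouble [LOGFILE]
# Print log lines that look like the kernel giving up on an interrupt line.
# The patterns and the default path are assumptions; adjust for your system.
irq_trouble() {
    grep -Ei 'irq [0-9]+: nobody cared|disabling irq' "${1:-/var/log/syslog}"
}

# Usage:
#   irq_trouble                      # search the default log
#   irq_trouble /var/log/messages    # or name another log file
```

No output means nothing matched those patterns, which is not the same as proof that the interrupts are healthy.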

 

One thing I'd want to try, since your eth0 has an interrupt to itself, is to assign it to one or the other core. As far as I remember a single TCP/IP stack doesn't appreciate SMP. If one core is sleeping to save power then being woken up to do something badly, well... If that's what's happening. Which is where disabling power saving features and/or setting core affinity would tell more.

Link to comment

I'm such a stooge. After your posts I couldn't help but start another test system with my board. Used the CLRTC jumper, copied a fresh unraid and bam, my /proc/interrupts is clean. My old habit of disabling unimportant mobo features first seems to have been an error. Now for fresh tests...

 

BTW, this was the first time I've tried a USB3 flash drive in a USB3 port. It makes a huge improvement in boot time. The loading dots take about 3.5 seconds. (SuperTalent Express Duo 16)

Link to comment
> What was the approximate uptime when you snapped /proc/interrupts?

 

It wasn't very long, a couple of hours maybe.  I moved my server back to my closet and it's been up for 16 hours, and this is what my /proc/interrupts looks like now:

 

          CPU0       CPU1       
 0:       3239    5831142   IO-APIC-edge      timer
 1:          0          2   IO-APIC-edge      i8042
 9:          0          0   IO-APIC-fasteoi   acpi
12:          0          3   IO-APIC-edge      i8042
17:         21       1135   IO-APIC-fasteoi   ehci_hcd:usb1, ehci_hcd:usb2, ehci_hcd:usb3
18:         16         85   IO-APIC-fasteoi   ohci_hcd:usb4, ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7
19:       1016      85187   IO-APIC-fasteoi   ahci
40:          0          0   PCI-MSI-edge      xhci_hcd
41:          0          0   PCI-MSI-edge      xhci_hcd
42:          0          0   PCI-MSI-edge      xhci_hcd
43:       2153     500058   PCI-MSI-edge      eth0
NMI:          0          0   Non-maskable interrupts
LOC:    5831528       3583   Local timer interrupts
SPU:          0          0   Spurious interrupts
PMI:          0          0   Performance monitoring interrupts
PND:          0          0   Performance pending work
RES:     989307     919345   Rescheduling interrupts
CAL:        389        355   Function call interrupts
TLB:     177042     177892   TLB shootdowns
TRM:          0          0   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
MCE:          0          0   Machine check exceptions
MCP:        195        195   Machine check polls
ERR:          0
MIS:          0

 

The rescheduling interrupts are more sane now.  I enabled EPU power savings in the BIOS which was off by default and off prior to my last reboot.  I would have thought that would have more ill effect on the rescheduling interrupts, but they seem to have improved.  Odd.
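Since these counters are cumulative from boot, a rough way to compare two snapshots is to divide by uptime. A minimal sketch, assuming a made-up helper name `irq_rate`. The file arguments default to the live /proc files and exist only so that saved copies can be checked.

```shell
# irq_rate LABEL [INTERRUPTS_FILE] [UPTIME_FILE]
# Sum the per-CPU counts on the row tagged LABEL (e.g. RES) and divide
# by seconds since boot (the first field of /proc/uptime).
irq_rate() {
    up=$(cut -d' ' -f1 "${3:-/proc/uptime}")
    awk -v tag="$1:" -v up="$up" '
        $1 == tag {
            total = 0
            # Add up the numeric columns; stop at the text description.
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) total += $i
            printf "%.1f\n", total / up
            exit
        }
    ' "${2:-/proc/interrupts}"
}

# Usage:
#   irq_rate RES    # rescheduling interrupts per second, all cores combined
```

Taking the 16-hour figure at face value, the snapshot above works out to roughly 33 rescheduling interrupts per second across both cores. The earlier snapshot, if it really covered about two hours, would be on the order of several thousand per second.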

 

> Have you noticed any performance issues? "IRQ xx disabled" or worse messages in your syslog? Freezes with a call trace on the console? What are your network transfer speeds like? If all seems well you may not want to mess with it.

 

No, it's been very, very stable.  I had an unexpected reboot yesterday morning which I can't explain.  I left the closet door open and I have a theory that the cat stepped on the reset/power button.  It's been running 24/7 for a few weeks with no problems until yesterday. It's also the first time the server was accessible to the cat.  I recently put the server on a new UPS, a cyberpower, which may not be compatible with the installed power management software.  Maybe a random control signal could have caused the reboot?  I had it attached to an APC prior to moving it to the closet.  Will be keeping an eye on it in the coming weeks.

 

Syslog is clean.  Transfer speeds are good but not spectacular.  I'm in the 40's on writes (with cache drive) and well above that on reads.  That's a HUGE improvement from my ReadyNAS NV+ so I haven't worried too much about it.  

 

> One thing I'd want to try, since your eth0 has an interrupt to itself, is to assign it to one or the other core. As far as I remember a single TCP/IP stack doesn't appreciate SMP.

 

I may look into that but don't know if it's possible.

 

I'm wondering how much power can be conserved by tweaking the BIOS and setting all settings to 'max power savings', disabling the 'turbo switch', and maybe undervolting the cpu and ram.  Right now, with all 5 of my drives powered down, my server is at 24 watts.  I just love the power consumption of these zacate systems.

Link to comment

It doesn't look bad in this copy. Most eth0 work is going to the 2nd core so I doubt there's much of a performance impact. If you want to try it, something simple like:

 

echo "2" > /proc/irq/43/smp_affinity

 

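should pin it to the 2nd core. The value is a hexadecimal CPU bitmask (1 = CPU0, 2 = CPU1), and this assumes eth0 is still on IRQ 43. It'll take effect immediately, but it won't survive a reboot.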
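For anyone scripting this, here is a minimal sketch of the same idea that looks up the IRQ by device name instead of hard-coding 43 and builds the mask from a CPU index. The function names `irq_for` and `cpu_mask` are made up for illustration, and the optional file argument is only there so the lookup can be tried against a saved copy of /proc/interrupts.

```shell
# Sketch only: irq_for and cpu_mask are hypothetical helper names.

# Print the IRQ number of the first numbered row that mentions the device.
# $1 = device name (e.g. eth0), $2 = file to read (default /proc/interrupts)
irq_for() {
    awk -v dev="$1" '
        $1 ~ /^[0-9]+:$/ && index($0, dev) { sub(":", "", $1); print $1; exit }
    ' "${2:-/proc/interrupts}"
}

# Print the hexadecimal smp_affinity mask for a zero-based CPU index.
# CPU0 -> 1, CPU1 -> 2, CPU2 -> 4, ...
cpu_mask() {
    printf '%x\n' $((1 << $1))
}

# Usage (as root), pinning eth0 to the 2nd core:
#   irq=$(irq_for eth0)
#   [ -n "$irq" ] && cpu_mask 1 > "/proc/irq/$irq/smp_affinity"
#   cat "/proc/irq/$irq/smp_affinity"    # read back to confirm
```

Looking the number up each time matters here because, as this thread shows, a BIOS update or settings change can move devices to different interrupt lines.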

 

Cats. My gf's cat has done that kind of thing to me a few times. I don't recall ever having to go down that troubleshooting path in my cubicle days. :D

 

If you increase power saving features keep an eye out for problems when you hit the server for files. The transitions in and out of low-power modes are a classic place where chipsets and drivers start failing.

Link to comment
  • 2 weeks later...

I am much more the layman than you guys, but I'm struggling with a Gigabyte Zacate board at the moment with the eth0: link up issue, as is everyone else. Is there a general consensus yet as to which power saving settings in the BIOS or other tweaks will fix or at least stabilise the lan port?

 

Using a PCI card for ethernet isn't an option for me - the board has a single pci-express slot which I am using for an 8-port sata controller, and I am not keen to go for a USB to ethernet adapter.

 

Thanks!

Link to comment

The effect of BIOS settings will vary by board so I can't help on that one. My suggestion would be to disable everything power saving related and work back from a known-good situation.

 

I could never get the Asus board working reliably. With the kindest settings it might test good for 12hrs but could then throw several errors within the next hour. I'm currently waiting for a replacement on the off chance the specific board was at fault, but I'm not holding my breath. More likely it'll be revisions to the EFI and drivers that address the problems, but that doesn't explain how some seem to have theirs working without much effort.

 

Also, my last tests were using the Intel Pro nic. Using the onboard Realtek nic only ensured things would fail under a much lighter load.

Link to comment

My Asus Zacate system has now been up for 12 days without an issue.  Once I closed the closet door and kept the cat away from the reset button, it's been rock solid.  I have been using it, backing up a couple of PCs with CrashPlan and storing/streaming DVDs.  I don't understand why my build has been so trouble free when others' have been a nightmare.

Link to comment

Apologies, but my dead horse is back.

 

I'm now trying the new EFI BIOS 1002.

 

Looks better so far. To start with, it's keeping eth0 and ahci apart, and separate from USB.
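To check for sharing on any board, it's enough to list the /proc/interrupts rows that have more than one driver on them. A minimal sketch, with a made-up helper name `shared_irqs`. It relies on the kernel separating multiple drivers on one line with commas, as in the ehci_hcd and ohci_hcd rows posted earlier.

```shell
# shared_irqs [INTERRUPTS_FILE]
# Print numbered IRQ rows whose device list contains a comma, i.e. rows
# where more than one driver is registered on the same interrupt line.
shared_irqs() {
    awk '$1 ~ /^[0-9]+:$/ && /,/' "${1:-/proc/interrupts}"
}

# Usage:
#   shared_irqs                 # live system
#   shared_irqs saved-copy.txt  # a saved snapshot (hypothetical file name)
```

USB controllers sharing a line among themselves is common. The case to look out for is eth0 or ahci appearing on the same row as something else.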

 

I threw a bunch of rsync & simultaneous local copies at it, assuming it would fail quickly. Since it survived I'm letting it run a 5-drive parity check for the night. If it's okay in the morning I'll start the usual torture test.

 

I thought this was pre-release but it looks like they just posted:

 

http://www.asus.com/Motherboards/AMD_CPU_on_Board/E35M1M_PRO/#download

Link to comment
  • 2 months later...

No.

 

1) The 1002 release changed symptoms but would not work reliably with unRAID or stock Slackware 13.1. Interrupt problems would sometimes hide but would resurface, not always related to system load. Tried another of the same board without improvement.

2) Six weeks after the first 1002 release, another 1002 release appeared. Go figure. I haven't tested the 2nd one yet.

 

The tech and development support with this board has renewed my caution about new platforms. Too many EFI BIOS issues and they do not put adequate resources into resolving their bugs for non-Windows installations. I'd say go with an AM3 Sempron combination at the low end, or an i3 for more processing headroom. Both can have quite low power consumption but with much higher processing potential. The Sempron would be the least expensive of the three.

 

Link to comment

Thanks for your post Cyrnel, but the problems you describe are only with the network, right?

So if I buy an Intel card, that would be fine?

 

My problem is that I bought this mainboard not long ago, so I would not like to buy another one 2 months later :/

 

I bought my NAS to use WHS, so I did not check the compatibility with unRAID. Now I know WHS is crap and I need unRAID :(

 

On the Asus forum I read about some problems with the PCIe slot not recognizing any SATA card, but it seems that this is solved.

So I am not sure whether to switch to unRAID. Nice OS, but it seems to have many problems with my hardware.

Link to comment
  • 3 weeks later...

None of the posted BIOS versions have resolved all the problems, even when adding an Intel Pro NIC. I'm trying a development BIOS now which seems to help but I have no idea about release dates, and honestly, it's somewhat on the back burner now. I had to go with another board for my backup server. I'm hopeful the new kernel and drivers of 5.0b12 plus eventual BIOS fixes will help. My E35M1-M Pro sits in a test rig waiting for updates. Until those things stabilize I can't imagine starting a new build with it.

 

IMO, look at some of the recent i3 configurations. They'll match the power savings at the low end, scale far better if needed, and without the headaches.

Link to comment
  • 3 weeks later...

I have the Asus E35M1-M (so NOT the Pro version) running Beta 12a and really have no problems. I have already transferred 3.5 TB via LAN to my array without problems (though I have to say that I only used 100 Mbit LAN).

BIOS Version is latest 1002.

So IMO there is nothing wrong with this board.

On another forum there are two users with the Pro version of the board, latest BIOS and unRAID V4.7, without problems.

Link to comment