ysss Posted April 9, 2014

Hey y'all,

I've just started setting up a Supermicro H8DME-2 based server from Tam's Solution (24-bay 4U Supermicro case, 2x quad-core CPUs and 16GB RAM) and migrated my old 19-drive unRAID server, which was housed in a Norco 4220 case, to it.

First, the good news: everything booted up normally and all I had to do was start the array to get things running. But once I started the parity check, I was really disappointed to see that it was only hitting 15-20MB/sec, whereas I've read posts saying that Tam's server should hit 105MB/sec or so without any changes.

I have 19 drives. About 2/3 are 3TB or 4TB ones which clock in at 130-150MB/sec each (on hdparm -tT), and the slowest ones are WD 2TB EARS, which do 100-105MB/sec. This same set of drives was getting parity check speeds of 55-65MB/sec from the get-go (going up to 90-100MB/sec above 2TB) in my Norco setup.

I've played with the disk settings to no avail (md_num_stripes at 2048, md_sync_window at 1280). There are no AHCI settings in the BIOS, and I've disabled the IDE and SATA controllers anyway, since no drives are connected to the motherboard.

Copying files from a Windows machine to the server, I got 33MB/sec.

Things to try:
- I have yet to disable int13h on the SAT2-MV8s.
- I haven't updated the BIOS to 3.5a.

I've also noticed that the openwyrn-openssh.plg plugin takes a helluva long time to start up (nearly 10 minutes). It may have always been like that in my old Norco setup, but if so I didn't realize it, since the machine hardly ever reboots.

What am I doing wrong? How can I speed things up?

Thanks

Attachment: syslog.txt

dgaschk Posted April 9, 2014

The syslog does not include a parity check and doesn't show any problems. Try starting in SAFE-MODE and running the check.

ysss (Author) Posted April 10, 2014

Thank you for checking my syslog. This is what a parity check looks like right now (I started it, then stopped it not long after):

md: recovery thread woken up ...
md: recovery thread checking parity...
md: using 5120k window, over a total of 3907018532 blocks.
mdcmd (87): nocheck
md: md_do_sync: got signal, exit...
md: recovery thread sync completion status: -4

The parity check speed stays around 16-20MB/sec whether the server is doing nothing else at all or I put a slight load on it (streaming a movie off it, running preclear). Currently I'm preclearing a 4TB disk (it started at 148MB/s and is now hovering around 96MB/s at the 3TB mark).

I will try a firmware update and post an update here again.

ysss (Author) Posted April 10, 2014

I've flashed the motherboard's firmware to the latest one (3.5a) and there doesn't seem to be any difference at all. There is no newer firmware for the SAT2-MV8 cards either.

Parity check maxed out at 21.6MB/s just now and, curiously, it seems to really tie up the /mnt/user share (shfs). I tried to do a syslog dump to /mnt/user/syslog.txt and it just hung there for minutes, until I cancelled the parity check.

Attached are the syslog (with parity check start and stop) and a `ps -aux` dump. And these:

root@archive:~# dmesg | grep IRQ
ACPI: BIOS IRQ0 override ignored.
ACPI: IRQ9 used by override.
ACPI: IRQ14 used by override.
ACPI: IRQ15 used by override.
NR_IRQS:2304 nr_irqs:744 16
spurious 8259A interrupt: IRQ7.
ACPI: PCI Interrupt Link [LNKA] (IRQs 16 17 18 19) *10
ACPI: PCI Interrupt Link [LNKB] (IRQs 16 17 18 19) *0, disabled.
ACPI: PCI Interrupt Link [LNKC] (IRQs 16 17 18 19) *0, disabled.
ACPI: PCI Interrupt Link [LNKD] (IRQs 16 17 18 19) *0, disabled.
ACPI: PCI Interrupt Link [LNEA] (IRQs 16 17 18 19) *14
ACPI: PCI Interrupt Link [LNEB] (IRQs 16 17 18 19) *0, disabled.
ACPI: PCI Interrupt Link [LNEC] (IRQs 16 17 18 19) *5
ACPI: PCI Interrupt Link [LNED] (IRQs 16 17 18 19) *0, disabled.
ACPI: PCI Interrupt Link [LUB0] (IRQs 21 22 23) *14
ACPI: PCI Interrupt Link [LMAD] (IRQs 20) *11
ACPI: PCI Interrupt Link [LUB2] (IRQs 21 22 23) *7
ACPI: PCI Interrupt Link [LMAC] (IRQs 20) *10
ACPI: PCI Interrupt Link [LAZA] (IRQs 21 22 23) *0, disabled.
ACPI: PCI Interrupt Link [LSMB] (IRQs 21 22 23) *11
ACPI: PCI Interrupt Link [LPMU] (IRQs 21 22 23) *5
ACPI: PCI Interrupt Link [LSA0] (IRQs 21 22 23) *0, disabled.
ACPI: PCI Interrupt Link [LSA1] (IRQs 21 22 23) *0, disabled.
ACPI: PCI Interrupt Link [LATA] (IRQs 21 22 23) *0, disabled.
ACPI: PCI Interrupt Link [LSA2] (IRQs 21 22 23) *0, disabled.
PCI: Using ACPI for IRQ routing
ACPI: PCI Interrupt Link [LUB0] enabled at IRQ 23
ACPI: PCI Interrupt Link [LUB2] enabled at IRQ 22
Serial: 8250/16550 driver, 1 ports, IRQ sharing disabled
ACPI: PCI Interrupt Link [LMAC] enabled at IRQ 20
ACPI: PCI Interrupt Link [LNEC] enabled at IRQ 19
sata_mv 0000:03:04.0: Gen-II 32 slots 8 ports SCSI mode IRQ via INTx
ACPI: PCI Interrupt Link [LNEA] enabled at IRQ 18
sata_mv 0000:03:06.0: Gen-II 32 slots 8 ports SCSI mode IRQ via INTx
sata_mv 0000:04:06.0: Gen-II 32 slots 8 ports SCSI mode IRQ via INTx
ACPI: PCI Interrupt Link [LMAD] enabled at IRQ 20

root@archive:~# dmesg | grep irq
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 14 global_irq 14 high edge)
ACPI: INT_SRC_OVR (bus 0 bus_irq 15 global_irq 15 high edge)
nr_irqs_gsi: 40
NR_IRQS:2304 nr_irqs:744 16
CPU 0 irqstacks, hard=ed00a000 soft=ed00c000
CPU 1 irqstacks, hard=ed0c0000 soft=ed0c2000
CPU 2 irqstacks, hard=ed0cc000 soft=ed0ce000
CPU 3 irqstacks, hard=ed0ec000 soft=ed0ee000
CPU 4 irqstacks, hard=ed0f8000 soft=ed0fa000
CPU 5 irqstacks, hard=ed10c000 soft=ed10e000
CPU 6 irqstacks, hard=ed128000 soft=ed12a000
CPU 7 irqstacks, hard=ed13c000 soft=ed13e000
pcieport 0000:00:0a.0: irq 40 for MSI/MSI-X
pcieport 0000:00:0d.0: irq 41 for MSI/MSI-X
pcieport 0000:00:0e.0: irq 42 for MSI/MSI-X
pcieport 0000:00:0f.0: irq 43 for MSI/MSI-X
ehci-pci 0000:00:02.1: irq 22, io mem 0xfc2bec00
ohci_hcd 0000:00:02.0: irq 23, io mem 0xfc2bf000
serio: i8042 KBD port at 0x60,0x64 irq 1
serio: i8042 AUX port at 0x60,0x64 irq 12
ata1: SATA max UDMA/133 mmio m1048576@0xfd700000 port 0xfd722000 irq 19
ata2: SATA max UDMA/133 mmio m1048576@0xfd700000 port 0xfd724000 irq 19
ata3: SATA max UDMA/133 mmio m1048576@0xfd700000 port 0xfd726000 irq 19
ata4: SATA max UDMA/133 mmio m1048576@0xfd700000 port 0xfd728000 irq 19
ata5: SATA max UDMA/133 mmio m1048576@0xfd700000 port 0xfd732000 irq 19
ata6: SATA max UDMA/133 mmio m1048576@0xfd700000 port 0xfd734000 irq 19
ata7: SATA max UDMA/133 mmio m1048576@0xfd700000 port 0xfd736000 irq 19
ata8: SATA max UDMA/133 mmio m1048576@0xfd700000 port 0xfd738000 irq 19
ata9: SATA max UDMA/133 mmio m1048576@0xfd600000 port 0xfd622000 irq 18
ata10: SATA max UDMA/133 mmio m1048576@0xfd600000 port 0xfd624000 irq 18
ata11: SATA max UDMA/133 mmio m1048576@0xfd600000 port 0xfd626000 irq 18
ata12: SATA max UDMA/133 mmio m1048576@0xfd600000 port 0xfd628000 irq 18
ata13: SATA max UDMA/133 mmio m1048576@0xfd600000 port 0xfd632000 irq 18
ata14: SATA max UDMA/133 mmio m1048576@0xfd600000 port 0xfd634000 irq 18
ata15: SATA max UDMA/133 mmio m1048576@0xfd600000 port 0xfd636000 irq 18
ata16: SATA max UDMA/133 mmio m1048576@0xfd600000 port 0xfd638000 irq 18
ata17: SATA max UDMA/133 mmio m1048576@0xfeb00000 port 0xfeb22000 irq 18
ata18: SATA max UDMA/133 mmio m1048576@0xfeb00000 port 0xfeb24000 irq 18
ata19: SATA max UDMA/133 mmio m1048576@0xfeb00000 port 0xfeb26000 irq 18
ata20: SATA max UDMA/133 mmio m1048576@0xfeb00000 port 0xfeb28000 irq 18
ata21: SATA max UDMA/133 mmio m1048576@0xfeb00000 port 0xfeb32000 irq 18
ata22: SATA max UDMA/133 mmio m1048576@0xfeb00000 port 0xfeb34000 irq 18
ata23: SATA max UDMA/133 mmio m1048576@0xfeb00000 port 0xfeb36000 irq 18
ata24: SATA max UDMA/133 mmio m1048576@0xfeb00000 port 0xfeb38000 irq 18

root@archive:~# cat /proc/interrupts
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
0: 51 2 0 15 97 2450 13047 127350 IO-APIC-edge timer
1: 0 0 0 0 0 0 0 2 IO-APIC-edge i8042
7: 1 0 0 0 0 0 0 0 IO-APIC-edge
9: 0 0 0 0 0 0 0 0 IO-APIC-fasteoi acpi
12: 0 0 0 0 0 0 0 3 IO-APIC-edge i8042
18: 1 0 0 17 10 549 1078002 10708699 IO-APIC-fasteoi sata_mv, sata_mv
19: 1 0 0 13 30 746 1948720 2820399 IO-APIC-fasteoi sata_mv
22: 0 0 0 0 0 0 3 1247 IO-APIC-fasteoi ehci_hcd:usb1
23: 0 0 0 0 0 0 1 38 IO-APIC-fasteoi ohci_hcd:usb2
44: 1 0 0 3 8 1810 9139 124227 PCI-MSI-edge eth0
NMI: 0 0 0 0 0 0 0 0 Non-maskable interrupts
LOC: 12653 18382 20300 14410 22636 21462 27219 47093 Local timer interrupts
SPU: 0 0 0 0 0 0 0 0 Spurious interrupts
PMI: 0 0 0 0 0 0 0 0 Performance monitoring interrupts
IWI: 0 0 0 0 0 0 0 0 IRQ work interrupts
RTR: 0 0 0 0 0 0 0 0 APIC ICR read retries
RES: 53601 14237 10420 8528 865819 834927 62112 6919 Rescheduling interrupts
CAL: 59286 2328 2320 1701 41 28 20 20 Function call interrupts
TLB: 133 824 319 225 275 744 306 142 TLB shootdowns
TRM: 0 0 0 0 0 0 0 0 Thermal event interrupts
THR: 0 0 0 0 0 0 0 0 Threshold APIC interrupts
MCE: 0 0 0 0 0 0 0 0 Machine check exceptions
MCP: 10 10 10 10 10 10 10 10 Machine check polls
ERR: 1
MIS: 0

Attachments: syslog-041014.txt, ps-041014.txt

ysss (Author) Posted April 10, 2014

I've dropped the RAM from 16GB down to 4GB.

I've moved the SAT2-MV8 cards so they each get their own IRQ (compare the sata_mv assignments to the ones above):

root@archive:~# cat /proc/interrupts
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
0: 48 1 2 7 2 0 16 7344 IO-APIC-edge timer
1: 0 0 0 0 0 0 0 2 IO-APIC-edge i8042
7: 1 0 0 0 0 0 0 0 IO-APIC-edge
9: 0 0 0 0 0 0 0 0 IO-APIC-fasteoi acpi
12: 0 0 0 0 0 0 0 3 IO-APIC-edge i8042
17: 0 0 0 2 0 0 0 435 IO-APIC-fasteoi sata_mv
18: 0 0 0 0 0 0 1 451 IO-APIC-fasteoi sata_mv
19: 0 0 0 4 0 0 0 214 IO-APIC-fasteoi sata_mv
22: 0 0 0 0 0 0 1 1127 IO-APIC-fasteoi ehci_hcd:usb1
23: 0 0 0 0 0 0 0 0 IO-APIC-fasteoi ohci_hcd:usb2
44: 0 0 0 0 0 0 5 8106 PCI-MSI-edge eth0
NMI: 0 0 0 0 0 0 0 0 Non-maskable interrupts
LOC: 2099 3768 2294 2152 1723 1719 2696 138 Local timer interrupts
SPU: 0 0 0 0 0 0 0 0 Spurious interrupts
PMI: 0 0 0 0 0 0 0 0 Performance monitoring interrupts
IWI: 0 0 0 0 0 0 0 0 IRQ work interrupts
RTR: 0 0 0 0 0 0 0 0 APIC ICR read retries
RES: 2340 3533 3436 1313 1947 3985 1961 2722 Rescheduling interrupts
CAL: 94 80 757 63 12 14 15 14 Function call interrupts
TLB: 49 382 172 59 57 450 114 61 TLB shootdowns
TRM: 0 0 0 0 0 0 0 0 Thermal event interrupts
THR: 0 0 0 0 0 0 0 0 Threshold APIC interrupts
MCE: 0 0 0 0 0 0 0 0 Machine check exceptions
MCP: 2 2 2 2 2 2 2 2 Machine check polls
ERR: 1
MIS: 0

I'm still stuck at 24MB/s parity check. If any HDD were bottlenecking this process, it should show up with hdparm -tT, no?

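For anyone wanting to check IRQ sharing without eyeballing the table, here is a rough Python sketch. It assumes the /proc/interrupts layout shown in this thread (per-CPU counters, then the controller type, then a comma-separated device list); newer kernels print extra columns, so the parsing is an assumption rather than a general-purpose parser. The function name shared_irqs is made up for illustration.

```python
def shared_irqs(interrupts_text):
    """Return {irq_number: [devices]} for IRQ lines that list more than one device.

    Assumes the column layout seen in this thread's /proc/interrupts output.
    """
    shared = {}
    for line in interrupts_text.splitlines():
        label, sep, rest = line.strip().partition(":")
        if not sep or not label.isdigit():
            continue  # skip the CPU header and named rows such as NMI or LOC
        fields = rest.split()
        i = 0
        while i < len(fields) and fields[i].isdigit():
            i += 1  # step over the per-CPU interrupt counters
        # fields[i] is the controller type (e.g. IO-APIC-fasteoi); devices follow it
        devices = [name.strip(",") for name in fields[i + 1:]]
        if len(devices) > 1:
            shared[int(label)] = devices
    return shared


# Three rows copied from the first /proc/interrupts dump in this thread.
SAMPLE = """\
18: 1 0 0 17 10 549 1078002 10708699 IO-APIC-fasteoi sata_mv, sata_mv
19: 1 0 0 13 30 746 1948720 2820399 IO-APIC-fasteoi sata_mv
44: 1 0 0 3 8 1810 9139 124227 PCI-MSI-edge eth0
"""

print(shared_irqs(SAMPLE))  # {18: ['sata_mv', 'sata_mv']}
```

Run against the second dump (after the cards were moved), the same function should return an empty dict, since IRQs 17, 18 and 19 each list a single sata_mv.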
LinuxGuyGary Posted April 10, 2014

Have you verified that all the drives are connecting at full speed?

I have the same server, and had swapped to some new cables with latches during a server cleanup, and found that my parity check speeds were very low. It turned out that the new cables were junk. I double- and triple-checked that they were fully seated etc., but nothing made the drives all connect at full speed. I reverted to the original cables (no latch, though) and link speeds were once again solid.

Use dmesg | grep "SATA link" to look for the speeds. I was seeing some drives at 1.5 Gbps and some at 3.0; moving the new but obviously poor cables around moved the slow link speed to a different drive.

dmesg | grep "SATA link"
ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

Surprisingly, these Monoprice cables were the problem. Nothing against Monoprice, though, because I have used their 8087-sata forward breakout cables with 100% success on other servers.

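With 24 ports it is easy to miss one slow link in the grep output, so here is a small Python sketch of the same check. It only assumes the "SATA link up N Gbps" / "SATA link down" wording shown above; the function name slow_links and the 1.5 Gbps sample line are made up for illustration and are not taken from anyone's actual log.

```python
import re

# Matches e.g. "ata3: SATA link up 3.0 Gbps (...)" or "ata4: SATA link down (...)".
LINK_RE = re.compile(r"(ata\d+): SATA link (?:up (\d+(?:\.\d+)?) Gbps|down)")


def slow_links(dmesg_text, expected_gbps=3.0):
    """Return [(port, gbps)] for links that negotiated below expected_gbps.

    Ports reported as "link down" are skipped, since an empty bay is not a slow link.
    """
    slow = []
    for line in dmesg_text.splitlines():
        match = LINK_RE.search(line)
        if match and match.group(2) is not None:
            speed = float(match.group(2))
            if speed < expected_gbps:
                slow.append((match.group(1), speed))
    return slow


SAMPLE = """\
ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata4: SATA link down (SStatus 0 SControl 300)
ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
"""  # the ata5 line is a hypothetical example of a degraded link

print(slow_links(SAMPLE))  # [('ata5', 1.5)]
```

The text to feed in would be whatever dmesg | grep "SATA link" prints on the server; an empty result means every connected drive came up at the expected speed.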
vl1969 Posted April 10, 2014

The original cards in this server do not support SATA3 cables (with latches). I tried them and could not connect all the cables to the card. Only SATA2 cables (no latches) are supported.

ysss (Author) Posted April 10, 2014

@LinuxGuyGary: I'm using the stock SATA cables that came with the system; they don't look pristine, but they seem to be working well:

ata9: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata17: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata4: SATA link down (SStatus 0 SControl 300)
ata5: SATA link down (SStatus 0 SControl 300)
ata6: SATA link down (SStatus 0 SControl 300)
ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata10: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata11: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata12: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata13: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata14: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata15: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata16: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata18: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata19: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata20: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata21: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata22: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata23: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata24: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

(ata4, 5 and 6 are supposed to be empty)

@vl1969: Thanks, I'll stick with the stock cables for now.

ysss (Author) Posted April 10, 2014

Finally... this seems to be the problem?

Apr 10 23:21:28 archive kernel: INFO: rcu_sched self-detected stall on CPU { 7} (t=6000 jiffies g=1434 c=1433 q=7796)
Apr 10 23:21:28 archive kernel: Pid: 2836, comm: unraidd Not tainted 3.9.11p-unRAID #5 (Errors)
Apr 10 23:21:28 archive kernel: Call Trace: (Errors)
Apr 10 23:21:28 archive kernel: [<c1062c2a>] print_cpu_stall+0xbc/0x107 (Errors)
Apr 10 23:21:28 archive kernel: [<c1062eba>] __rcu_pending+0x4f/0x12a (Errors)
Apr 10 23:21:28 archive kernel: [<c1063008>] rcu_check_callbacks+0x73/0x9b (Errors)
Apr 10 23:21:28 archive kernel: [<c1032ed9>] update_process_times+0x2d/0x53 (Errors)
Apr 10 23:21:28 archive kernel: [<c105520b>] tick_sched_timer+0x77/0xa1 (Errors)
Apr 10 23:21:28 archive kernel: [<c1040e02>] ? __remove_hrtimer+0x25/0x7a (Errors)
Apr 10 23:21:28 archive kernel: [<c1040f45>] __run_hrtimer+0x45/0xaf (Errors)
Apr 10 23:21:28 archive kernel: [<c10412ad>] hrtimer_interrupt+0xf1/0x1e7 (Errors)
Apr 10 23:21:28 archive kernel: [<c10483d9>] ? sched_clock_cpu+0x3f/0x13f (Errors)
Apr 10 23:21:28 archive kernel: [<c101c43a>] smp_apic_timer_interrupt+0x6d/0x7f (Errors)
Apr 10 23:21:28 archive kernel: [<c1401411>] apic_timer_interrupt+0x2d/0x34 (Errors)
Apr 10 23:21:28 archive kernel: [<c124408a>] ? xor_sse_5_pf64+0x70/0x32c (Errors)
Apr 10 23:21:28 archive kernel: [<c12435de>] xor_blocks+0x74/0x7c (Errors)
Apr 10 23:21:28 archive kernel: [<f88d50b8>] check_parity+0x96/0xcc [md_mod] (Errors)
Apr 10 23:21:28 archive kernel: [<f88d5bfb>] handle_stripe+0xa29/0xceb [md_mod] (Errors)
Apr 10 23:21:28 archive kernel: [<c1044f5f>] ? __wake_up+0x3b/0x42 (Errors)
Apr 10 23:21:28 archive kernel: [<f88d5f2e>] unraidd+0x71/0xb5 [md_mod] (Errors)
Apr 10 23:21:28 archive kernel: [<f88d2cb2>] md_thread+0xd3/0xea [md_mod] (Errors)
Apr 10 23:21:28 archive kernel: [<c103f031>] ? wake_up_bit+0x5b/0x5b (Errors)
Apr 10 23:21:28 archive kernel: [<c103ebf1>] kthread+0x90/0x95 (Errors)
Apr 10 23:21:28 archive kernel: [<f88d2bdf>] ? import_device+0x166/0x166 [md_mod] (Errors)
Apr 10 23:21:28 archive kernel: [<c1401837>] ret_from_kernel_thread+0x1b/0x28 (Errors)
Apr 10 23:21:28 archive kernel: [<c103eb61>] ? kthread_freezable_should_stop+0x4a/0x4a (Errors)
Apr 10 23:21:40 archive kernel: mce: [Hardware Error]: Machine check events logged (Errors)
Apr 10 23:23:28 archive kernel: INFO: rcu_sched self-detected stall on CPU { 7} (t=6000 jiffies g=1439 c=1438 q=7157)
Apr 10 23:23:28 archive kernel: Pid: 2836, comm: unraidd Not tainted 3.9.11p-unRAID #5 (Errors)
Apr 10 23:23:28 archive kernel: Call Trace: (Errors)
Apr 10 23:23:28 archive kernel: [<c1062c2a>] print_cpu_stall+0xbc/0x107 (Errors)
Apr 10 23:23:28 archive kernel: [<c1062eba>] __rcu_pending+0x4f/0x12a (Errors)
Apr 10 23:23:28 archive kernel: [<c1063008>] rcu_check_callbacks+0x73/0x9b (Errors)
Apr 10 23:23:28 archive kernel: [<c1032ed9>] update_process_times+0x2d/0x53 (Errors)
Apr 10 23:23:28 archive kernel: [<c105520b>] tick_sched_timer+0x77/0xa1 (Errors)
Apr 10 23:23:28 archive kernel: [<c1040e02>] ? __remove_hrtimer+0x25/0x7a (Errors)
Apr 10 23:23:28 archive kernel: [<c1040f45>] __run_hrtimer+0x45/0xaf (Errors)
Apr 10 23:23:28 archive kernel: [<c10412ad>] hrtimer_interrupt+0xf1/0x1e7 (Errors)
Apr 10 23:23:28 archive kernel: [<c10483d9>] ? sched_clock_cpu+0x3f/0x13f (Errors)
Apr 10 23:23:28 archive kernel: [<c101c43a>] smp_apic_timer_interrupt+0x6d/0x7f (Errors)
Apr 10 23:23:28 archive kernel: [<c1044d0b>] ? check_preempt_curr+0x39/0x64 (Errors)
Apr 10 23:23:28 archive kernel: [<c1401411>] apic_timer_interrupt+0x2d/0x34 (Errors)
Apr 10 23:23:28 archive kernel: [<c12440c9>] ? xor_sse_5_pf64+0xaf/0x32c (Errors)
Apr 10 23:23:28 archive kernel: [<c12435de>] xor_blocks+0x74/0x7c (Errors)
Apr 10 23:23:28 archive kernel: [<f88d50b8>] check_parity+0x96/0xcc [md_mod] (Errors)
Apr 10 23:23:28 archive kernel: [<f88d5bfb>] handle_stripe+0xa29/0xceb [md_mod] (Errors)
Apr 10 23:23:28 archive kernel: [<c1044f5f>] ? __wake_up+0x3b/0x42 (Errors)
Apr 10 23:23:28 archive kernel: [<f88d5f2e>] unraidd+0x71/0xb5 [md_mod] (Errors)
Apr 10 23:23:28 archive kernel: [<f88d2cb2>] md_thread+0xd3/0xea [md_mod] (Errors)
Apr 10 23:23:28 archive kernel: [<c103f031>] ? wake_up_bit+0x5b/0x5b (Errors)
Apr 10 23:23:28 archive kernel: [<c103ebf1>] kthread+0x90/0x95 (Errors)
Apr 10 23:23:28 archive kernel: [<f88d2bdf>] ? import_device+0x166/0x166 [md_mod] (Errors)
Apr 10 23:23:28 archive kernel: [<c1401837>] ret_from_kernel_thread+0x1b/0x28 (Errors)
Apr 10 23:23:28 archive kernel: [<c103eb61>] ? kthread_freezable_should_stop+0x4a/0x4a (Errors)
Apr 10 23:24:27 archive kernel: mdcmd (68): nocheck (unRAID engine)

I looked up 'rcu_sched self-detected stall on CPU' and it relates to the PowerNow setting. Here are the pertinent BIOS settings:

MTRR Mapping [Continuous]
Thermal Throttling [Disabled]
PowerNow [Disabled]
Secure Virtual Machine Mode [Enabled]
CPU Page Translation Table [Enabled]
CPU Prefetching [Enabled]
IO Prefetching [Enabled]
Probe Filter [Auto]

I have a pair of 2346 HE Opterons installed (default type from Tam's), using fan-less heatsinks. The CPU monitor pegs the temperature at around 30°C, as the machine is stored in a pretty cool room (21°C ambient temperature).

vl1969 Posted April 10, 2014

DO NOT, I repeat, DO NOT use fanless heatsinks in this box. I can sell you several after my experience with them.

After several strange issues, reboots, shutdowns and even the complete system going into power-off mode with a strange alarm blaring all over my basement, all within a one-week period, I traced it to CPU overheating. I had used a solid copper fanless HS for a similar CPU, but one designed for a 1U server; the airflow in this chassis is not enough to cool it.

I ended up getting a pair of Cooler Master T-4 heatsinks. There is a similar model, the 212 EVO, but it is just a little bit taller than the case. The T-4 fits perfectly; it actually fits on the existing mounts (the black plastic kind). I had removed the mounts initially to fit the fanless HS, but put them back for this one. Using just the spring holder with the HS fit it perfectly.

The fans are very quiet, and if need be you can mount a second fan on the other side of the HS; my CPU stays cool as it is. Best coolers for this box...

ysss (Author) Posted April 10, 2014

@vl1969: Undone! As soon as I typed that last post, I went back to the server room to refit the stock AMD fan on one of the Opterons and took out the second CPU, to reduce the number of variables.

Still no joy: parity check at 21MB/sec. I'm not sure what to think now.

Edit: cat /proc/cpuinfo shows the CPU frequency at 1000MHz for all cores? (It's supposed to be 1.8GHz.)

I think I should double-check the power connectors to the mobo and maybe swap the CPU if that doesn't work. And swap the PSU back to the default one, and try to build a new array with my spare HDDs and a fresh unRAID stick.

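The cpuinfo check can be scripted as well. This is a minimal Python sketch, assuming the usual x86 "cpu MHz : value" lines in /proc/cpuinfo; the function name throttled_cores and the 5% tolerance are arbitrary choices made for illustration, and the two-core sample is made up apart from the 1000MHz reading and 1.8GHz rating quoted in the post above. Note that a low reading is not proof of a fault by itself, since frequency scaling can also report a reduced clock on an idle core.

```python
def throttled_cores(cpuinfo_text, rated_mhz, tolerance=0.05):
    """Return [(core_index, mhz)] for cores reporting below rated_mhz * (1 - tolerance).

    Cores are numbered in the order their "cpu MHz" lines appear.
    """
    low = []
    core = 0
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "cpu MHz":
            mhz = float(value)
            if mhz < rated_mhz * (1 - tolerance):
                low.append((core, mhz))
            core += 1
    return low


# Hypothetical two-core excerpt; only the 1000 MHz figure comes from the thread.
SAMPLE = """\
processor\t: 0
cpu MHz\t\t: 1000.000
processor\t: 1
cpu MHz\t\t: 1000.000
"""

print(throttled_cores(SAMPLE, rated_mhz=1800))  # [(0, 1000.0), (1, 1000.0)]
```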
ysss (Author) Posted April 17, 2014

This issue has completely baffled me...

So I let the parity check run to completion (nearly 2 days):

It started at 24MB/s.
As soon as it passed the 2TB mark, the speed went up to 36MB/s.
Then, passing 3TB, it went up to 55MB/s or so.

I have 18 drives plus parity; about 8 of them are still 2TB, and the rest are a mix of 3TB and 4TB. I'm still slowly moving away from the 2TB drives because of their performance and age.

At first I thought this was a clear sign of bus bottlenecking, but I looked up PCI-X speeds and the bus should do at least 800MB/s, which means my parity check should be at least 90MB/s at the start.

I've also run diskspeed.sh, which tests each HDD on the system and generates an average speed (from hdparm) for each drive. None of my drives scored below 85MB/s.

Help me, Obi-Wan.

Spec:
Supermicro H8DME-2 motherboard
AMD 2346 HE quad core
8GB ECC RAM
3x AOC-SAT2-MV8 controllers
19 drives of various makes and models (8x 2TB drives, 6x 3TB drives, 5x 4TB drives)
Enermax Revolution 87+ PSU

ysss (Author) Posted April 17, 2014

Btw, I have tried adding another AOC-SAT2-MV8 controller, so each controller is now wired to 6 HDDs (down from 8 HDDs per controller). The parity check speed increased from 24MB/s to 35MB/s.

dgaschk Posted April 17, 2014

Replace one of the AOC-SAT2-MV8s with an AOC-SASLP-MV8 and make sure that the AOC-SAT2-MV8s are on separate buses.

ZeroK Posted April 17, 2014

dgaschk wrote: Replace one of the AOC-SAT2-MV8s with an AOC-SASLP-MV8 and make sure that the AOC-SAT2-MV8s are on separate buses.

+1

One of the first things I did with my Tam's server was replace the PCI-X SATA controllers with the SAS-MV8s. I never got to see what the old PCI-X SATA controllers would do; I only know what the SAS versions do. I get about 68MB/s, and I have a mix of 2TB and 3TB drives, 24TB total.

BobPhoenix Posted April 17, 2014

On my original unRAID server (X7SBE MB) I used two AOC-SAT2-MV8s plus the MB SATA ports for 22 drives. One SAT2-MV8 was on the 133MHz PCI-X bus and the other on the 100MHz PCI-X bus. I got 50-100MB/s on the 2TB WD Greens I had at the time on those cards. So maybe not as fast as the PCIe-based cards, but still acceptable, at least to me. When I tried to run 3 AOC-SAT2-MV8s with 2 cards on the 133MHz bus, the speeds dropped to 30-65MB/s, depending on where on the platter it was reading or writing.

ysss (Author) Posted April 18, 2014

@dgaschk, ZeroK, BobPhoenix: thanks a lot, guys.

All signs pointed to bus bottlenecking, but I remembered a few posts about this server in which people mentioned that these servers are ready to use (without mods) and that people have gotten 105MB/s parity check speeds. I've also read that PCI-X should do 1GB/s at 133MHz and 800MB/s at 100MHz (which should allow at least around 100MB/s per drive at the SAT2-MV8's max capacity, which is 8 drives per controller).

.... It didn't make sense to me, until I went back to the manual and studied the diagram:

dgaschk Posted April 18, 2014

PCIe x8 moves about 2GBps. It easily accommodates 1000MBps + 800MBps. The issue is that PCI-X slots share the bus capacity. One of the buses has 2 cards, and 16 drives can use more than 1000MBps. If the system had 2 cards it would have acceptable speeds. How many PCI-X cards are the people who are getting 105MBps using? How many drives are connected in those systems? The third card is killing performance. Replacing one of the cards with a PCIe card should substantially improve performance.

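The shared-bus argument comes down to simple division, sketched below in Python. The figures used are the ones quoted in this thread (roughly 1000MBps for the 133MHz bus, 800MBps for the 100MHz bus, 8 drives per card); the function names are made up for illustration, and the results are theoretical ceilings only, since real PCI-X throughput is lower once protocol overhead is counted.

```python
def pcix_peak_mb_s(clock_mhz, width_bits=64):
    """Theoretical peak of a PCI-X bus: one transfer of width_bits per clock cycle."""
    return clock_mhz * width_bits / 8


def per_drive_ceiling(bus_mb_s, drives_on_bus):
    """Upper bound per drive when all drives on one bus are read at the same time."""
    return bus_mb_s / drives_on_bus


print(pcix_peak_mb_s(100))          # 800.0
print(pcix_peak_mb_s(133))          # 1064.0
print(per_drive_ceiling(800, 8))    # 100.0 (one full card alone on the 100MHz bus)
print(per_drive_ceiling(1000, 16))  # 62.5  (two full cards sharing the 133MHz bus)
```

A parity check reads every drive in parallel and can only advance as fast as the slowest one, so under these assumptions the most crowded bus sets the ceiling for the whole array.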
clowrym Posted April 19, 2014

I have a Tam's server with 3 PCI-X cards, although on a different motherboard (X7DBE, I believe). I typically get a 70-90MB/s parity check average, and I do see over 100 on occasion.

dgaschk Posted April 19, 2014

clowrym wrote: I have a tams server, 3 pci x cards, although a different motherboard, X7dbe i believe.....I typically get 70-90 for parity check average....I do see over 100 on occasion

How many drives?

ysss (Author) Posted April 19, 2014

dgaschk wrote: PCIe x8 moves about 2GBps. It easily accommodates 1000MBps + 800MBps. The issue is that PCI-X slots share the bus capacity. One of the buses has 2 cards, and 16 drives can use more than 1000MBps. If the system had 2 cards it would have acceptable speeds. How many PCI-X cards are the people who are getting 105MBps using? How many drives are connected in those systems? The third card is killing performance. Replacing one of the cards with a PCIe card should substantially improve performance.

Ah, all right. I'll try 1 PCI-X card per bus. What about the onboard SATA connectors (nForce chipset)? Those usually operate at full speed, right?

dgaschk Posted April 19, 2014

Yes. The onboard ports are the best.

ysss (Author) Posted April 22, 2014

I'm still baffled.... Here are the stats from my tests through some changes of HBA config; in the later ones I started using the onboard controller. I'm waiting for SFF-8087 breakout cables before I can add M1015s to the mix.

1:
Onboard: -
SLOT1 (100MHz): -
SLOT2 (100MHz): SAT2-MV8: 7 HDD
SLOT3 (133MHz): SAT2-MV8: 6 HDD
SLOT4 (133MHz): SAT2-MV8: 6 HDD
Parity check: 24MB/s
Total = 456MB/s
100MHz channel = 168MB/s
133MHz channel = 288MB/s

2:
Onboard: -
SLOT1 (100MHz): SAT2-MV8: 5 HDD
SLOT2 (100MHz): SAT2-MV8: 5 HDD
SLOT3 (133MHz): SAT2-MV8: 5 HDD
SLOT4 (133MHz): SAT2-MV8: 4 HDD
Parity check: 34MB/s
Total = 646MB/s
100MHz channel = 340MB/s
133MHz channel = 306MB/s

3:
Onboard: 6 HDD
SLOT1 (100MHz): 4 HDD
SLOT2 (100MHz): 4 HDD
SLOT3 (133MHz): 5 HDD
SLOT4 (133MHz): -
Parity check: 30MB/s
Total = 570MB/s
100MHz channel = 240MB/s
133MHz channel = 150MB/s

4:
Onboard: 6 HDD
SLOT1 (100MHz): -
SLOT2 (100MHz): 4 HDD
SLOT3 (133MHz): 4 HDD
SLOT4 (133MHz): 5 HDD
Parity check: 28MB/s
Total = 532MB/s
100MHz channel = 112MB/s
133MHz channel = 252MB/s

PS: the parity check figure is just the initial starting speed (first 5-10 minutes, taken from a few samples).

PPS: where's the logic in all this? What is the actual bottleneck?

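The totals in the four layouts above appear to be the parity check speed multiplied by the number of drives behind each path, on the assumption that every drive is read at the parity check rate. A short Python sketch reproducing them follows; the drive counts per bus are summed from the slot listings, and the dictionary keys and the function name aggregate are made up for illustration.

```python
# Drives per path and starting parity check speed (MB/s), taken from the four layouts above.
CONFIGS = {
    1: {"onboard": 0, "bus_100mhz": 7, "bus_133mhz": 12, "parity_mb_s": 24},
    2: {"onboard": 0, "bus_100mhz": 10, "bus_133mhz": 9, "parity_mb_s": 34},
    3: {"onboard": 6, "bus_100mhz": 8, "bus_133mhz": 5, "parity_mb_s": 30},
    4: {"onboard": 6, "bus_100mhz": 4, "bus_133mhz": 9, "parity_mb_s": 28},
}


def aggregate(cfg):
    """Aggregate MB/s per path, assuming each drive is read at the parity check speed."""
    speed = cfg["parity_mb_s"]
    result = {}
    for path in ("onboard", "bus_100mhz", "bus_133mhz"):
        result[path] = cfg[path] * speed
    result["total"] = sum(result.values())
    return result


for number, cfg in CONFIGS.items():
    print(number, aggregate(cfg))
# 1 {'onboard': 0, 'bus_100mhz': 168, 'bus_133mhz': 288, 'total': 456}
# 2 {'onboard': 0, 'bus_100mhz': 340, 'bus_133mhz': 306, 'total': 646}
# 3 {'onboard': 180, 'bus_100mhz': 240, 'bus_133mhz': 150, 'total': 570}
# 4 {'onboard': 168, 'bus_100mhz': 112, 'bus_133mhz': 252, 'total': 532}
```

By this arithmetic the busiest single bus in any of the four layouts carries 340MB/s, which is well under the 800MB/s and 1GB/s figures quoted earlier in the thread.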
BobPhoenix Posted April 22, 2014

Try it this way and see what your times are:

Onboard: 6 HDD
SLOT1 (100MHz): -
SLOT2 (100MHz): SAT2-MV8: 7 HDD
SLOT3 (133MHz): -
SLOT4 (133MHz): SAT2-MV8: 6 HDD

clowrym Posted April 22, 2014

clowrym wrote: I have a tams server, 3 pci x cards, although a different motherboard, X7dbe i believe.....I typically get 70-90 for parity check average....I do see over 100 on occasion

dgaschk wrote: How many drives?

I have 9 installed, including cache and parity.