October 12, 2012
Hi Guys, weird one for you all.. Really dates back from here: http://lime-technology.com/forum/index.php?topic=20674.msg183469#msg183469

But it has got a WHOLE lot more serious. The whole machine now restarts even though:

1. It is 50 hours Prime95 stable at 3 GHz, 1.1V - disks unplugged, drawing 150W.
2. It is 48 hours Prime95 stable at 3.2 GHz, 1.1V - disks unplugged, drawing 150W.
3. It is 48 hours memtest stable WITHOUT ECC enabled (ECC is now enabled).
4. The PSU's 12V and 5V rails are solid - even with the CPU at full volts (1.25V) and all disks spinning during a parity check (200W). It is a two-year-old 650W PSU.

Currently, the machine at stock clocks and volts (3 GHz, 1066 MHz, 1.25V, etc.) will restart at any time (even if it is under no apparent load whatsoever) and come up with these errors:

Oct 12 16:55:58 Tower kernel: mdcmd (23): check CORRECT (unRAID engine)
Oct 12 16:55:58 Tower kernel: md: recovery thread woken up ... (unRAID engine)
Oct 12 16:55:58 Tower kernel: md: recovery thread checking parity... (unRAID engine)
Oct 12 16:55:58 Tower kernel: md: using 10000k window, over a total of 1953514552 blocks. (unRAID engine)
Oct 12 16:55:59 Tower kernel: ------------[ cut here ]------------
Oct 12 16:55:59 Tower kernel: WARNING: at kernel/workqueue.c:1220 worker_enter_idle+0xf8/0x104() (Minor Issues)
Oct 12 16:55:59 Tower kernel: Hardware name: System Product Name
Oct 12 16:55:59 Tower kernel: Modules linked in: cpufreq_conservative md_mod xor k10temp sg asus_atk0110 r8168(O) hwmon ahci powernow_k8 mperf atiixp libahci sata_sil24 (Drive related)
Oct 12 16:55:59 Tower kernel: Pid: 8, comm: kworker/1:0 Tainted: G O 3.4.11-unRAID #1 (Errors)
Oct 12 16:55:59 Tower kernel: Call Trace: (Errors)
Oct 12 16:55:59 Tower kernel: [<c1022c18>] warn_slowpath_common+0x65/0x7a (Errors)
Oct 12 16:55:59 Tower kernel: [<c1030f62>] ? worker_enter_idle+0xf8/0x104 (Errors)
Oct 12 16:55:59 Tower kernel: [<c1022c3c>] warn_slowpath_null+0xf/0x13 (Errors)
Oct 12 16:55:59 Tower kernel: [<c1030f62>] worker_enter_idle+0xf8/0x104 (Errors)
Oct 12 16:55:59 Tower kernel: [<c1033404>] worker_thread+0x2a7/0x2c3 (Errors)
Oct 12 16:55:59 Tower kernel: [<c103315d>] ? rescuer_thread+0x1dd/0x1dd (Errors)
Oct 12 16:55:59 Tower kernel: [<c1035bb4>] kthread+0x67/0x6c (Errors)
Oct 12 16:55:59 Tower kernel: [<c1035b4d>] ? kthread_freezable_should_stop+0x49/0x49 (Errors)
Oct 12 16:55:59 Tower kernel: [<c13208f6>] kernel_thread_helper+0x6/0xd (Errors)
Oct 12 16:55:59 Tower kernel: ---[ end trace 6ce8446c979e1d2d ]---
Oct 12 16:55:59 Tower avahi-daemon[5001]: Service "Tower" (/services/afp.service) successfully established.
Oct 12 16:55:59 Tower avahi-daemon[5001]: Service "Tower-SMB" (/services/smb.service) successfully established.
Oct 12 16:56:14 Tower logger: Fri Oct 12 16:55:14 BST 2012 - Hard Drives active, resetting counter
Oct 12 16:57:05 Tower kernel: INFO: rcu_sched self-detected stall on CPU { 3} (t=6001 jiffies)
Oct 12 16:57:05 Tower kernel: Pid: 5057, comm: unraidd Tainted: G W O 3.4.11-unRAID #1 (Errors)
Oct 12 16:57:05 Tower kernel: Call Trace: (Errors)
Oct 12 16:57:05 Tower kernel: [<c10551fc>] print_cpu_stall+0x59/0xd1 (Errors)
Oct 12 16:57:05 Tower kernel: [<c10552af>] __rcu_pending+0x3b/0x125 (Errors)
Oct 12 16:57:05 Tower kernel: [<c105540f>] rcu_check_callbacks+0x76/0xa1 (Errors)
Oct 12 16:57:05 Tower kernel: [<c102b12b>] update_process_times+0x2d/0x58 (Errors)
Oct 12 16:57:05 Tower kernel: [<c1048e6a>] tick_periodic+0x63/0x65 (Errors)
Oct 12 16:57:05 Tower kernel: [<c1048e85>] tick_handle_periodic+0x19/0x6c (Errors)
Oct 12 16:57:05 Tower kernel: [<c10173fa>] smp_apic_timer_interrupt+0x67/0x7a (Errors)
Oct 12 16:57:05 Tower kernel: [<c131ff92>] apic_timer_interrupt+0x2a/0x30 (Errors)
Oct 12 16:57:05 Tower kernel: [<c1081359>] ? __slab_free+0x73/0x25a (Errors)
Oct 12 16:57:05 Tower kernel: [<c13208e9>] ? common_interrupt+0x29/0x30 (Errors)
Oct 12 16:57:05 Tower kernel: [<c1081e99>] kmem_cache_free+0x83/0x8c (Errors)
Oct 12 16:57:05 Tower kernel: [<c1081e99>] ? kmem_cache_free+0x83/0x8c (Errors)
Oct 12 16:57:05 Tower kernel: [<c105fe73>] ? mempool_free_slab+0xe/0x10 (Errors)
Oct 12 16:57:05 Tower last message repeated 2 times
Oct 12 16:57:05 Tower kernel: [<c105fe73>] mempool_free_slab+0xe/0x10 (Errors)
Oct 12 16:57:05 Tower kernel: [<c105ff0a>] mempool_free+0x5d/0x64 (Errors)
Oct 12 16:57:05 Tower kernel: [<c1230cc0>] scsi_sg_free+0x3f/0x42 (Errors)
Oct 12 16:57:05 Tower kernel: [<c11ac14b>] __sg_free_table+0x47/0x5e (Errors)
Oct 12 16:57:05 Tower kernel: [<c1230c81>] ? scsi_init_sgtable+0x76/0x76 (Errors)
Oct 12 16:57:05 Tower kernel: [<c1230d26>] __scsi_release_buffers+0x22/0x96 (Errors)
Oct 12 16:57:05 Tower kernel: [<c123110e>] scsi_end_request+0x5f/0x70 (Errors)
Oct 12 16:57:05 Tower kernel: [<c1231439>] scsi_io_completion+0x1ae/0x3e5 (Errors)
Oct 12 16:57:05 Tower kernel: [<c12311d2>] ? scsi_device_unbusy+0x7c/0x82 (Errors)
Oct 12 16:57:05 Tower kernel: [<c122c560>] scsi_finish_command+0x9d/0xa3 (Errors)
Oct 12 16:57:05 Tower kernel: [<c1231739>] scsi_softirq_done+0xba/0xc2 (Errors)
Oct 12 16:57:05 Tower kernel: [<c119653a>] blk_done_softirq+0x4a/0x57 (Errors)
Oct 12 16:57:05 Tower kernel: [<c1027056>] __do_softirq+0x6b/0xe5 (Errors)
Oct 12 16:57:05 Tower kernel: [<c1026feb>] ? irq_enter+0x41/0x41 (Errors)
Oct 12 16:57:05 Tower kernel: <IRQ> [<c1026e9f>] ? irq_exit+0x32/0x58
Oct 12 16:57:05 Tower kernel: [<c1003506>] ? do_IRQ+0x7c/0x90 (Errors)
Oct 12 16:57:05 Tower kernel: [<c13208e9>] ? common_interrupt+0x29/0x30 (Errors)
Oct 12 16:57:05 Tower kernel: [<c131f980>] ? _raw_spin_unlock_irqrestore+0x8/0xa (Errors)
Oct 12 16:57:05 Tower kernel: [<c103b36c>] ? __wake_up+0x32/0x3b (Errors)
Oct 12 16:57:05 Tower kernel: [<f855dd9f>] ? md_done_sync+0x23/0x27 [md_mod] (Errors)
Oct 12 16:57:05 Tower kernel: [<f8561599>] ? handle_stripe+0xb61/0xcfb [md_mod] (Errors)
Oct 12 16:57:05 Tower kernel: [<f85617aa>] ? unraidd+0x77/0xb8 [md_mod] (Errors)
Oct 12 16:57:05 Tower kernel: [<f855e925>] ? md_thread+0xcc/0xe3 [md_mod] (Errors)
Oct 12 16:57:05 Tower kernel: [<c1035eac>] ? wake_up_bit+0x5b/0x5b (Errors)
Oct 12 16:57:05 Tower kernel: [<f855e859>] ? import_device+0x147/0x147 [md_mod] (Errors)
Oct 12 16:57:05 Tower kernel: [<c1035bb4>] ? kthread+0x67/0x6c (Errors)
Oct 12 16:57:05 Tower kernel: [<c1035b4d>] ? kthread_freezable_should_stop+0x49/0x49 (Errors)
Oct 12 16:57:05 Tower kernel: [<c13208f6>] ? kernel_thread_helper+0x6/0xd (Errors)

Having now forced it to the performance governor, it seems to be OK. Please note that this is from a COMPLETELY STOCK INSTALL WITH NO ADDONS (apart from my clock/voltage and CPU governor script).

Why would the CPU frequency governor cause such carnage!??
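
For anyone wanting to repeat the governor test, here is a minimal sketch of forcing a governor on every core. It is not the script mentioned above (that one was not posted); it only assumes the standard Linux cpufreq sysfs layout under /sys/devices/system/cpu, and the function name `set_governor` and its `cpu_root` parameter are made up for illustration.

```python
from pathlib import Path

# Assumption: the standard Linux cpufreq sysfs root. Pass a different
# cpu_root to try the function against a scratch directory instead.
CPU_ROOT = Path("/sys/devices/system/cpu")


def set_governor(governor, cpu_root=CPU_ROOT):
    """Write `governor` to every cpuN/cpufreq/scaling_governor file.

    Returns the names of the CPUs that were updated (e.g. ["cpu0", "cpu1"]).
    Raises ValueError if a CPU lists its available governors and the
    requested one is not among them.
    """
    updated = []
    # "cpu[0-9]*" matches cpu0, cpu1, ... but skips entries such as cpuidle.
    for gov_file in sorted(cpu_root.glob("cpu[0-9]*/cpufreq/scaling_governor")):
        cpu_name = gov_file.parent.parent.name
        available = gov_file.with_name("scaling_available_governors")
        if available.exists() and governor not in available.read_text().split():
            raise ValueError(f"{governor!r} is not available on {cpu_name}")
        gov_file.write_text(governor + "\n")
        updated.append(cpu_name)
    return updated
```

On a real system this would be called as root, e.g. `set_governor("performance")`, and the governor module has to be loaded or built into the kernel for the write to succeed.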
October 13, 2012 Author
Yep, completed two whole parity checks with Prime95 running at 1.1V with the performance governor and not a single error. Will try the ondemand governor next.
October 13, 2012 Author
Completed a third full parity check, did a 50 GB Time Machine backup, and transferred 20 GB of photos and 100 GB of videos to the server, all without a single error on the performance governor. Will try ondemand once all the file system I/O has completed.