Joe L. Posted June 7, 2013
See here for full syslog: http://lime-technology.com/forum/index.php?topic=27720.msg245463#msg245463
Jun 6 21:20:28 Tower1 logger: /usr/sbin/rpc.mountd
Jun 6 21:20:28 Tower1 mountd[9899]: Kernel does not have pseudo root support.
Jun 6 21:20:28 Tower1 mountd[9899]: NFS v4 mounts will be disabled unless fsid=0
Jun 6 21:20:28 Tower1 mountd[9899]: is specfied in /etc/exports file.
Jun 6 21:20:28 Tower1 emhttp: shcmd (106): /usr/local/sbin/emhttp_event svcs_restarted
Jun 6 21:20:28 Tower1 emhttp_event: svcs_restarted
Jun 6 21:25:37 Tower1 kernel: mdcmd (61): check NOCORRECT
Jun 6 21:25:37 Tower1 kernel: md: recovery thread woken up ...
Jun 6 21:25:37 Tower1 kernel: md: recovery thread checking parity...
Jun 6 21:25:38 Tower1 kernel: md: using 1536k window, over a total of 2930273228 blocks.
Jun 6 22:33:49 Tower1 kernel: INFO: rcu_sched self-detected stall on CPU { 0} (t=6001 jiffies g=17792 c=17791 q=12897)
Jun 6 22:33:49 Tower1 kernel: Pid: 9006, comm: unraidd Not tainted 3.9.3-unRAID #8
Jun 6 22:33:49 Tower1 kernel: Call Trace:
Jun 6 22:33:49 Tower1 kernel: [<c1062abc>] print_cpu_stall+0xbc/0x107
Jun 6 22:33:49 Tower1 kernel: [<c1062d4c>] __rcu_pending+0x4f/0x12a
Jun 6 22:33:49 Tower1 kernel: [<c1062e9a>] rcu_check_callbacks+0x73/0x9b
Jun 6 22:33:49 Tower1 kernel: [<c1032e89>] update_process_times+0x2d/0x53
Jun 6 22:33:49 Tower1 kernel: [<c10550db>] tick_sched_timer+0x77/0xa1
Jun 6 22:33:49 Tower1 kernel: [<c1040d4a>] ? __remove_hrtimer+0x25/0x7a
Jun 6 22:33:49 Tower1 kernel: [<c1040e8d>] __run_hrtimer+0x45/0xaf
Jun 6 22:33:49 Tower1 kernel: [<c10411f5>] hrtimer_interrupt+0xf1/0x1e7
Jun 6 22:33:49 Tower1 kernel: [<c101c426>] smp_apic_timer_interrupt+0x6d/0x7f
Jun 6 22:33:49 Tower1 kernel: [<c14030f9>] apic_timer_interrupt+0x2d/0x34
Jun 6 22:33:49 Tower1 kernel: [<c12f007b>] ? ide_dump_status+0xab/0x14a
Jun 6 22:33:49 Tower1 kernel: [<c1247446>] ? blk_update_request+0x12f/0x308
Jun 6 22:33:49 Tower1 kernel: [<c124762d>] blk_update_bidi_request+0xe/0x4f
Jun 6 22:33:49 Tower1 kernel: [<c1248045>] blk_end_bidi_request+0x1d/0x53
Jun 6 22:33:49 Tower1 kernel: [<c12480ba>] blk_end_request+0x12/0x14
Jun 6 22:33:49 Tower1 kernel: [<c12fbc45>] scsi_end_request+0x1f/0x70
Jun 6 22:33:49 Tower1 kernel: [<c12fbfb0>] scsi_io_completion+0x1b0/0x421
Jun 6 22:33:49 Tower1 kernel: [<c12fbfb0>] ? scsi_io_completion+0x1b0/0x421
Jun 6 22:33:49 Tower1 kernel: [<c12fbd49>] ? scsi_device_unbusy+0x7c/0x82
Jun 6 22:33:49 Tower1 kernel: [<c12f6de4>] scsi_finish_command+0x91/0x97
Jun 6 22:33:49 Tower1 kernel: [<c12fc2f5>] scsi_softirq_done+0xc5/0xcd
Jun 6 22:33:49 Tower1 kernel: [<c124c87a>] blk_done_softirq+0x4a/0x57
Jun 6 22:33:49 Tower1 kernel: [<c102e980>] __do_softirq+0x8d/0x145
Jun 6 22:33:49 Tower1 kernel: [<c102ea16>] ? __do_softirq+0x123/0x145
Jun 6 22:33:49 Tower1 kernel: [<c102ea98>] irq_exit+0x33/0x6c
Jun 6 22:33:49 Tower1 kernel: [<c100367e>] do_IRQ+0x87/0x9b
Jun 6 22:33:49 Tower1 kernel: [<c1403a2c>] common_interrupt+0x2c/0x31
Jun 6 22:33:49 Tower1 kernel: [<c125e07e>] ? memcmp+0x17/0x25
Jun 6 22:33:49 Tower1 kernel: [<f882484f>] handle_stripe+0xa53/0xcf6 [md_mod]
Jun 6 22:33:49 Tower1 kernel: [<c1044e4b>] ? __wake_up+0x3b/0x42
Jun 6 22:33:49 Tower1 kernel: [<f8824b63>] unraidd+0x71/0xb5 [md_mod]
Jun 6 22:33:49 Tower1 kernel: [<f8821b7a>] md_thread+0xd3/0xea [md_mod]
Jun 6 22:33:49 Tower1 kernel: [<c103ef79>] ? wake_up_bit+0x5b/0x5b
Jun 6 22:33:49 Tower1 kernel: [<c103eb39>] kthread+0x90/0x95
Jun 6 22:33:49 Tower1 kernel: [<f8821aa7>] ? import_device+0x166/0x166 [md_mod]
Jun 6 22:33:49 Tower1 kernel: [<c1403537>] ret_from_kernel_thread+0x1b/0x28
Jun 6 22:33:49 Tower1 kernel: [<c103eaa9>] ? kthread_freezable_should_stop+0x4a/0x4a
Jun 6 22:34:59 Tower1 kernel: INFO: rcu_sched self-detected stall on CPU { 0} (t=6001 jiffies g=17793 c=17792 q=19669)
Jun 6 22:34:59 Tower1 kernel: Pid: 9006, comm: unraidd Not tainted 3.9.3-unRAID #8
Jun 6 22:34:59 Tower1 kernel: Call Trace:
Jun 6 22:34:59 Tower1 kernel: [<c1062abc>] print_cpu_stall+0xbc/0x107
Jun 6 22:34:59 Tower1 kernel: [<c1062d4c>] __rcu_pending+0x4f/0x12a
Jun 6 22:34:59 Tower1 kernel: [<c1062e9a>] rcu_check_callbacks+0x73/0x9b
Jun 6 22:34:59 Tower1 kernel: [<c1032e89>] update_process_times+0x2d/0x53
Jun 6 22:34:59 Tower1 kernel: [<c10550db>] tick_sched_timer+0x77/0xa1
Jun 6 22:34:59 Tower1 kernel: [<c1040d4a>] ? __remove_hrtimer+0x25/0x7a
Jun 6 22:34:59 Tower1 kernel: [<c1040e8d>] __run_hrtimer+0x45/0xaf
Jun 6 22:34:59 Tower1 kernel: [<c10411f5>] hrtimer_interrupt+0xf1/0x1e7
Jun 6 22:34:59 Tower1 kernel: [<c101c426>] smp_apic_timer_interrupt+0x6d/0x7f
Jun 6 22:34:59 Tower1 kernel: [<c14030f9>] apic_timer_interrupt+0x2d/0x34
Jun 6 22:34:59 Tower1 kernel: [<c104007b>] ? cpu_timer_fire+0x35/0x5c
Jun 6 22:34:59 Tower1 kernel: [<c1402b53>] ? _raw_spin_unlock_irqrestore+0x8/0xa
Jun 6 22:34:59 Tower1 kernel: [<c1044e4b>] __wake_up+0x3b/0x42
Jun 6 22:34:59 Tower1 kernel: [<f8820f33>] md_done_sync+0x2b/0x2f [md_mod]
Jun 6 22:34:59 Tower1 kernel: [<f8824947>] handle_stripe+0xb4b/0xcf6 [md_mod]
Jun 6 22:34:59 Tower1 kernel: [<c1044e4b>] ? __wake_up+0x3b/0x42
Jun 6 22:34:59 Tower1 kernel: [<f8824b63>] unraidd+0x71/0xb5 [md_mod]
Jun 6 22:34:59 Tower1 kernel: [<f8821b7a>] md_thread+0xd3/0xea [md_mod]
Jun 6 22:34:59 Tower1 kernel: [<c103ef79>] ? wake_up_bit+0x5b/0x5b
Jun 6 22:34:59 Tower1 kernel: [<c103eb39>] kthread+0x90/0x95
Jun 6 22:34:59 Tower1 kernel: [<f8821aa7>] ? import_device+0x166/0x166 [md_mod]
Jun 6 22:34:59 Tower1 kernel: [<c1403537>] ret_from_kernel_thread+0x1b/0x28
Jun 6 22:34:59 Tower1 kernel: [<c103eaa9>] ? kthread_freezable_should_stop+0x4a/0x4a
Jun 6 22:36:25 Tower1 kernel: INFO: rcu_sched self-detected stall on CPU { 0} (t=6000 jiffies g=17799 c=17798 q=8142)
Jun 6 22:36:25 Tower1 kernel: Pid: 9006, comm: unraidd Not tainted 3.9.3-unRAID #8
Jun 6 22:36:25 Tower1 kernel: Call Trace:
Jun 6 22:36:25 Tower1 kernel: [<c1062abc>] print_cpu_stall+0xbc/0x107
Jun 6 22:36:25 Tower1 kernel: [<c1062d4c>] __rcu_pending+0x4f/0x12a
Jun 6 22:36:25 Tower1 kernel: [<c1062e9a>] rcu_check_callbacks+0x73/0x9b
Jun 6 22:36:25 Tower1 kernel: [<c1032e89>] update_process_times+0x2d/0x53
Jun 6 22:36:25 Tower1 kernel: [<c10550db>] tick_sched_timer+0x77/0xa1
Jun 6 22:36:25 Tower1 kernel: [<c1040d4a>] ? __remove_hrtimer+0x25/0x7a
Jun 6 22:36:25 Tower1 kernel: [<c1040e8d>] __run_hrtimer+0x45/0xaf
Jun 6 22:36:25 Tower1 kernel: [<c10411f5>] hrtimer_interrupt+0xf1/0x1e7
Jun 6 22:36:25 Tower1 kernel: [<c101c426>] smp_apic_timer_interrupt+0x6d/0x7f
Jun 6 22:36:25 Tower1 kernel: [<c14030f9>] apic_timer_interrupt+0x2d/0x34
Jun 6 22:36:25 Tower1 kernel: [<c124007b>] ? crypto_aes_expand_key+0x123/0x39b
Jun 6 22:36:25 Tower1 kernel: [<c12fb5a7>] ? scsi_request_fn+0x33a/0x371
Jun 6 22:36:25 Tower1 kernel: [<c1246d9b>] __blk_run_queue+0x28/0x31
Jun 6 22:36:25 Tower1 kernel: [<c12470dd>] blk_run_queue+0x1b/0x2c
Jun 6 22:36:25 Tower1 kernel: [<c12fadf2>] scsi_run_queue+0xe4/0x151
Jun 6 22:36:25 Tower1 kernel: [<c12fb771>] scsi_next_command+0x28/0x34
Jun 6 22:36:25 Tower1 kernel: [<c12fbc8c>] scsi_end_request+0x66/0x70
Jun 6 22:36:25 Tower1 kernel: [<c12fbfb0>] scsi_io_completion+0x1b0/0x421
Jun 6 22:36:25 Tower1 kernel: [<c1246d9b>] ? __blk_run_queue+0x28/0x31
Jun 6 22:36:25 Tower1 kernel: [<c12fbd49>] ? scsi_device_unbusy+0x7c/0x82
Jun 6 22:36:25 Tower1 kernel: [<c12f6de4>] scsi_finish_command+0x91/0x97
Jun 6 22:36:25 Tower1 kernel: [<c12fc2f5>] scsi_softirq_done+0xc5/0xcd
Jun 6 22:36:25 Tower1 kernel: [<c12fbc8c>] ? scsi_end_request+0x66/0x70
Jun 6 22:36:25 Tower1 kernel: [<c124c87a>] blk_done_softirq+0x4a/0x57
Jun 6 22:36:25 Tower1 kernel: [<c102e980>] __do_softirq+0x8d/0x145
Jun 6 22:36:25 Tower1 kernel: [<c12fbd49>] ? scsi_device_unbusy+0x7c/0x82
Jun 6 22:36:25 Tower1 kernel: [<c102ea98>] irq_exit+0x33/0x6c
Jun 6 22:36:25 Tower1 kernel: [<c100367e>] do_IRQ+0x87/0x9b
Jun 6 22:36:25 Tower1 kernel: [<c12fc2f5>] ? scsi_softirq_done+0xc5/0xcd
Jun 6 22:36:25 Tower1 kernel: [<c1403a2c>] common_interrupt+0x2c/0x31
Jun 6 22:36:25 Tower1 kernel: [<c104007b>] ? cpu_timer_fire+0x35/0x5c
Jun 6 22:36:25 Tower1 kernel: [<c1245100>] ? xor_avx_5+0x6e/0x34c
Jun 6 22:36:25 Tower1 kernel: [<c124315e>] xor_blocks+0x74/0x7c
Jun 6 22:36:25 Tower1 kernel: [<f8823ce2>] check_parity+0x96/0xcc [md_mod]
Jun 6 22:36:25 Tower1 kernel: [<f882482b>] handle_stripe+0xa2f/0xcf6 [md_mod]
Jun 6 22:36:25 Tower1 kernel: [<c1044e4b>] ? __wake_up+0x3b/0x42
Jun 6 22:36:25 Tower1 kernel: [<f8824b63>] unraidd+0x71/0xb5 [md_mod]
Jun 6 22:36:25 Tower1 kernel: [<f8821b7a>] md_thread+0xd3/0xea [md_mod]
Jun 6 22:36:25 Tower1 kernel: [<c103ef79>] ? wake_up_bit+0x5b/0x5b
Jun 6 22:36:25 Tower1 kernel: [<c103eb39>] kthread+0x90/0x95
Jun 6 22:36:25 Tower1 kernel: [<f8821aa7>] ? import_device+0x166/0x166 [md_mod]
Jun 6 22:36:25 Tower1 kernel: [<c1403537>] ret_from_kernel_thread+0x1b/0x28
Jun 6 22:36:25 Tower1 kernel: [<c103eaa9>] ? kthread_freezable_should_stop+0x4a/0x4a
Jun 6 22:42:06 Tower1 kernel: INFO: rcu_sched self-detected stall on CPU { 0} (t=6000 jiffies g=17986 c=17985 q=10992)
Jun 6 22:42:06 Tower1 kernel: Pid: 9006, comm: unraidd Not tainted 3.9.3-unRAID #8
Jun 6 22:42:06 Tower1 kernel: Call Trace:
Jun 6 22:42:06 Tower1 kernel: [<c1062abc>] print_cpu_stall+0xbc/0x107
Jun 6 22:42:06 Tower1 kernel: [<c1062d4c>] __rcu_pending+0x4f/0x12a
Jun 6 22:42:06 Tower1 kernel: [<c1062e9a>] rcu_check_callbacks+0x73/0x9b
Jun 6 22:42:06 Tower1 kernel: [<c1032e89>] update_process_times+0x2d/0x53
Jun 6 22:42:06 Tower1 kernel: [<c10550db>] tick_sched_timer+0x77/0xa1
Jun 6 22:42:06 Tower1 kernel: [<c1040d4a>] ? __remove_hrtimer+0x25/0x7a
Jun 6 22:42:06 Tower1 kernel: [<c1040e8d>] __run_hrtimer+0x45/0xaf
Jun 6 22:42:06 Tower1 kernel: [<c10411f5>] hrtimer_interrupt+0xf1/0x1e7
Jun 6 22:42:06 Tower1 kernel: [<c12fadf2>] ? scsi_run_queue+0xe4/0x151
Jun 6 22:42:06 Tower1 kernel: [<c101c426>] smp_apic_timer_interrupt+0x6d/0x7f
Jun 6 22:42:06 Tower1 kernel: [<c14030f9>] apic_timer_interrupt+0x2d/0x34
Jun 6 22:42:06 Tower1 kernel: [<c12f007b>] ? ide_dump_status+0xab/0x14a
Jun 6 22:42:06 Tower1 kernel: [<c12f6dc3>] ? scsi_finish_command+0x70/0x97
Jun 6 22:42:06 Tower1 kernel: [<c12fc2f5>] scsi_softirq_done+0xc5/0xcd
Jun 6 22:42:06 Tower1 kernel: [<c12fc2f5>] ? scsi_softirq_done+0xc5/0xcd
Jun 6 22:42:06 Tower1 kernel: [<c10482c5>] ? sched_clock_cpu+0x3f/0x13f
Jun 6 22:42:06 Tower1 kernel: [<c124c87a>] blk_done_softirq+0x4a/0x57
Jun 6 22:42:06 Tower1 kernel: [<c102e980>] __do_softirq+0x8d/0x145
Jun 6 22:42:06 Tower1 kernel: [<c102ea16>] ? __do_softirq+0x123/0x145
Jun 6 22:42:06 Tower1 kernel: [<c102ea98>] irq_exit+0x33/0x6c
Jun 6 22:42:06 Tower1 kernel: [<c100367e>] do_IRQ+0x87/0x9b
Jun 6 22:42:06 Tower1 kernel: [<c100367e>] ? do_IRQ+0x87/0x9b
Jun 6 22:42:06 Tower1 kernel: [<c1403a2c>] common_interrupt+0x2c/0x31
Jun 6 22:42:06 Tower1 kernel: [<f8823cb8>] ? check_parity+0x6c/0xcc [md_mod]
Jun 6 22:42:06 Tower1 kernel: [<f882482b>] handle_stripe+0xa2f/0xcf6 [md_mod]
Jun 6 22:42:06 Tower1 kernel: [<c1044e4b>] ? __wake_up+0x3b/0x42
Jun 6 22:42:06 Tower1 kernel: [<f8824b63>] unraidd+0x71/0xb5 [md_mod]
Jun 6 22:42:06 Tower1 kernel: [<f8821b7a>] md_thread+0xd3/0xea [md_mod]
Jun 6 22:42:06 Tower1 kernel: [<c103ef79>] ? wake_up_bit+0x5b/0x5b
Jun 6 22:42:06 Tower1 kernel: [<c103eb39>] kthread+0x90/0x95
Jun 6 22:42:06 Tower1 kernel: [<f8821aa7>] ? import_device+0x166/0x166 [md_mod]
Jun 6 22:42:06 Tower1 kernel: [<c1403537>] ret_from_kernel_thread+0x1b/0x28
Jun 6 22:42:06 Tower1 kernel: [<c103eaa9>] ? kthread_freezable_should_stop+0x4a/0x4a
Jun 6 22:45:11 Tower1 kernel: INFO: rcu_sched self-detected stall on CPU { 0} (t=6001 jiffies g=18128 c=18127 q=9065)

The phrase "Not tainted" (in "comm: unraidd Not tainted") just means that no kernel taint flags are set; in particular, no proprietary (non-GPL) modules are loaded. "unraidd" is simply the name of the process that was running on the CPU when the stall was detected. If you submit an error report to the kernel developers and they see you're using a "tainted" kernel, they usually won't help you. That is not the case with the unRAID kernel build.

As for the stall, I have not reproduced it yet. The message is informational, meaning nothing is crashing, but it still should not happen, and I'll get to this issue at some point.
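As background to the "Not tainted" explanation: the kernel keeps its taint state as a bitmask, which it exposes in /proc/sys/kernel/tainted. The trace header prints "Not tainted" only when that mask is zero. The Python sketch below decodes a hand-picked subset of the documented taint bits. The table is illustrative and deliberately incomplete. Newer kernels define additional bits, and not every bit listed here exists in older kernels such as the 3.9.3 build in this log. The `decode_taint` helper is hypothetical, not an unRAID or kernel tool.

```python
# Hypothetical helper: a SUBSET of the kernel's documented taint bits
# (bit number -> (flag letter, meaning)). Not exhaustive; older kernels
# may lack some of these bits and newer kernels define more.
TAINT_FLAGS = {
    0: ("P", "proprietary (non-GPL) module loaded"),
    1: ("F", "module was force-loaded"),
    4: ("M", "machine check exception occurred"),
    7: ("D", "kernel died recently (oops or BUG)"),
    9: ("W", "kernel issued a warning"),
    12: ("O", "out-of-tree module loaded"),
    13: ("E", "unsigned module loaded"),
}


def decode_taint(value: int) -> list[str]:
    """Describe each known taint bit set in `value`; 0 means "Not tainted"."""
    found = []
    for bit, (letter, meaning) in sorted(TAINT_FLAGS.items()):
        if value & (1 << bit):
            found.append(f"{letter}: {meaning}")
    # Report any bits this partial table does not know about, rather than hiding them.
    known_mask = sum(1 << bit for bit in TAINT_FLAGS)
    unknown = value & ~known_mask
    if unknown:
        found.append(f"other bits set: {unknown:#x}")
    return found
```

On a running Linux system, `decode_taint(int(open("/proc/sys/kernel/tainted").read()))` returns an empty list when the kernel is untainted, which matches the "Not tainted" shown in the traces above.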
jbartlett Posted June 17, 2013
I would get a lot of these CPU stall errors when running a parity check, and the value of md_sync_window seemed to affect how often they occurred. I did not get any CPU stall errors on RC15 when I ran a non-correcting parity check.
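To compare how often the stalls occur across runs (for example, with different md_sync_window values), it can help to pull the relevant lines out of the syslog mechanically. The Python sketch below is a hypothetical helper, not an unRAID tool. It makes three assumptions:

- Lines follow the "Mon D HH:MM:SS host ..." layout seen in the log above.
- The first `mdcmd (N): check` line marks the start of the parity check.
- Every `rcu_sched self-detected stall` line is one stall report.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, Optional

# Assumed syslog layout, matching the excerpt above:
#   "Jun 6 22:33:49 Tower1 kernel: INFO: rcu_sched self-detected stall ..."
LINE = re.compile(r"^(?P<ts>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ (?P<msg>.*)$")
# RCU stall header: captures the CPU number and the "t=" value in jiffies.
STALL = re.compile(
    r"rcu_sched self-detected stall on CPU \{\s*(?P<cpu>\d+)\}\s*\(t=(?P<jiffies>\d+) jiffies"
)
# Parity-check start as logged in this excerpt ("mdcmd (61): check NOCORRECT").
CHECK_START = re.compile(r"mdcmd \(\d+\): check\b")


def parse_ts(ts: str, year: int = 2000) -> datetime:
    # Syslog lines carry no year, so a fixed (leap) year is supplied. That is fine
    # for differences within one log that does not span New Year. %b assumes an
    # English/C locale for month names.
    return datetime.strptime(f"{year} {ts}", "%Y %b %d %H:%M:%S")


def summarize_stalls(lines: Iterable[str]) -> dict:
    """Collect RCU stall reports and time the first one from the parity-check start."""
    check_start: Optional[datetime] = None
    stalls = []  # list of (timestamp, cpu, jiffies)
    for line in lines:
        m = LINE.match(line.rstrip("\n"))
        if not m:
            continue
        ts, msg = parse_ts(m["ts"]), m["msg"]
        if check_start is None and CHECK_START.search(msg):
            check_start = ts
        s = STALL.search(msg)
        if s:
            stalls.append((ts, int(s["cpu"]), int(s["jiffies"])))
    first_after: Optional[timedelta] = None
    if check_start is not None and stalls:
        first_after = stalls[0][0] - check_start
    return {
        "check_start": check_start,
        "stall_count": len(stalls),
        "stalls": stalls,
        "first_stall_after": first_after,
    }
```

Against the excerpt in the first post, the first stall appears 1 hour, 8 minutes and 12 seconds after the parity check starts (21:25:37 to 22:33:49). On a live server you could call `summarize_stalls(open("/var/log/syslog"))`, although the log path is an assumption here.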
dikkiedirk Posted June 17, 2013
Just installed RC15 and I'm running a parity check now. Some time after starting the parity check, these messages began appearing in my syslog.
graywolf Posted June 19, 2013
Quote: Just installed RC15 and I'm running a parity check now. Some time after starting the parity check, these messages began appearing in my syslog.
Please post your syslog here so Tom has something to work with and review.