
Not sure whether this could cause any issues, but my server's log file repeatedly reports a stall on CPU 3.

I am running an AMD A8 processor on an Asus F1A75-V Pro mobo.

Log details:


Jan 5 09:37:13 Tower kernel: Call Trace:

Jan 5 09:37:13 Tower kernel: [] print_cpu_stall+0x59/0xd1

Jan 5 09:37:13 Tower kernel: [] __rcu_pending+0x3b/0x125

Jan 5 09:37:13 Tower kernel: [] rcu_check_callbacks+0x76/0xa1

Jan 5 09:37:13 Tower kernel: [] update_process_times+0x2d/0x58

Jan 5 09:37:13 Tower kernel: [] tick_periodic+0x63/0x65

Jan 5 09:37:13 Tower kernel: [] tick_handle_periodic+0x19/0x6c

Jan 5 09:37:13 Tower kernel: [] smp_apic_timer_interrupt+0x67/0x7a

Jan 5 09:37:13 Tower kernel: [] apic_timer_interrupt+0x2a/0x30

Jan 5 09:37:13 Tower kernel: [] ? sas_queuecommand+0x19c/0x1c4 [libsas]

Jan 5 09:37:13 Tower kernel: [] scsi_dispatch_cmd+0xfa/0x125

Jan 5 09:37:13 Tower kernel: [] scsi_request_fn+0x24a/0x365

Jan 5 09:37:13 Tower kernel: [] __blk_run_queue+0x14/0x16

Jan 5 09:37:13 Tower kernel: [] blk_run_queue+0x1b/0x2c

Jan 5 09:37:13 Tower kernel: [] scsi_run_queue+0xe4/0x151

Jan 5 09:37:13 Tower kernel: [] scsi_next_command+0x28/0x34

Jan 5 09:37:13 Tower kernel: [] scsi_end_request+0x66/0x70

Jan 5 09:37:13 Tower kernel: [] scsi_io_completion+0x1ae/0x3e5

Jan 5 09:37:13 Tower kernel: [] ? scsi_device_unbusy+0x7c/0x82

Jan 5 09:37:13 Tower kernel: [] scsi_finish_command+0x9d/0xa3

Jan 5 09:37:13 Tower kernel: [] scsi_softirq_done+0xba/0xc2

Jan 5 09:37:13 Tower kernel: [] blk_done_softirq+0x4a/0x57

Jan 5 09:37:13 Tower kernel: [] __do_softirq+0x6b/0xe5

Jan 5 09:37:13 Tower kernel: [] ? irq_enter+0x41/0x41

Jan 5 09:37:13 Tower kernel: [] ? irq_exit+0x32/0x58

Jan 5 09:37:13 Tower kernel: [] ? do_IRQ+0x7c/0x90

Jan 5 09:37:13 Tower kernel: [] ? common_interrupt+0x29/0x30

Jan 5 09:37:13 Tower kernel: [] ? memcmp+0x17/0x25

Jan 5 09:37:13 Tower kernel: [] ? handle_stripe+0xaa1/0xcfb [md_mod]

Jan 5 09:37:13 Tower kernel: [] ? unraidd+0x77/0xb8 [md_mod]

Jan 5 09:37:13 Tower kernel: [] ? md_thread+0xcc/0xe3 [md_mod]

Jan 5 09:37:13 Tower kernel: [] ? wake_up_bit+0x5b/0x5b

Jan 5 09:37:13 Tower kernel: [] ? import_device+0x147/0x147 [md_mod]

Jan 5 09:37:13 Tower kernel: [] ? kthread+0x67/0x6c

Jan 5 09:37:13 Tower kernel: [] ? kthread_freezable_should_stop+0x49/0x49

Jan 5 09:37:13 Tower kernel: [] ? kernel_thread_helper+0x6/0xd

Jan 5 09:50:28 Tower kernel: INFO: rcu_sched self-detected stall on CPU { 3} (t=6000 jiffies)

Jan 5 09:50:28 Tower kernel: Pid: 1550, comm: unraidd Tainted: G O 3.4.11-unRAID #1

Jan 5 09:50:28 Tower kernel: Call Trace:

Jan 5 09:50:28 Tower kernel: [] print_cpu_stall+0x59/0xd1

Jan 5 09:50:28 Tower kernel: [] __rcu_pending+0x3b/0x125

Jan 5 09:50:28 Tower kernel: [] rcu_check_callbacks+0x76/0xa1

Jan 5 09:50:28 Tower kernel: [] update_process_times+0x2d/0x58

Jan 5 09:50:28 Tower kernel: [] tick_periodic+0x63/0x65

Jan 5 09:50:28 Tower kernel: [] tick_handle_periodic+0x19/0x6c

Jan 5 09:50:28 Tower kernel: [] smp_apic_timer_interrupt+0x67/0x7a

Jan 5 09:50:28 Tower kernel: [] apic_timer_interrupt+0x2a/0x30

Jan 5 09:50:28 Tower kernel: [] ? page_address+0x5/0x8f

Jan 5 09:50:28 Tower kernel: [] check_parity+0x6a/0xc3 [md_mod]

Jan 5 09:50:28 Tower kernel: [] handle_stripe+0xa7d/0xcfb [md_mod]

Jan 5 09:50:28 Tower kernel: [] unraidd+0x77/0xb8 [md_mod]

Jan 5 09:50:28 Tower kernel: [] md_thread+0xcc/0xe3 [md_mod]

Jan 5 09:50:28 Tower kernel: [] ? wake_up_bit+0x5b/0x5b

Jan 5 09:50:28 Tower kernel: [] ? import_device+0x147/0x147 [md_mod]

Jan 5 09:50:28 Tower kernel: [] kthread+0x67/0x6c

Jan 5 09:50:28 Tower kernel: [] ? kthread_freezable_should_stop+0x49/0x49

Jan 5 09:50:28 Tower kernel: [] kernel_thread_helper+0x6/0xd

Jan 5 10:18:49 Tower kernel: INFO: rcu_sched self-detected stall on CPU { 3} (t=6000 jiffies)

Jan 5 10:18:49 Tower kernel: Pid: 1550, comm: unraidd Tainted: G O 3.4.11-unRAID #1

Jan 5 10:18:49 Tower kernel: Call Trace:

Jan 5 10:18:49 Tower kernel: [] print_cpu_stall+0x59/0xd1

Jan 5 10:18:49 Tower kernel: [] __rcu_pending+0x3b/0x125

Jan 5 10:18:49 Tower kernel: [] rcu_check_callbacks+0x76/0xa1

Jan 5 10:18:49 Tower kernel: [] update_process_times+0x2d/0x58

Jan 5 10:18:49 Tower kernel: [] tick_periodic+0x63/0x65

Jan 5 10:18:49 Tower kernel: [] tick_handle_periodic+0x19/0x6c

Jan 5 10:18:49 Tower kernel: [] smp_apic_timer_interrupt+0x67/0x7a

Jan 5 10:18:49 Tower kernel: [] apic_timer_interrupt+0x2a/0x30

Jan 5 10:18:49 Tower kernel: [] ? xor_sse_5+0x29e/0x3d8 [xor]

Jan 5 10:18:49 Tower kernel: [] ? native_smp_send_reschedule+0x3f/0x41

Jan 5 10:18:49 Tower kernel: [] xor_blocks+0x66/0x71 [xor]

Jan 5 10:18:49 Tower kernel: [] ? xor_blocks+0x66/0x71 [xor]

Jan 5 10:18:49 Tower kernel: [] check_parity+0x8d/0xc3 [md_mod]

Jan 5 10:18:49 Tower kernel: [] handle_stripe+0xa7d/0xcfb [md_mod]

Jan 5 10:18:49 Tower kernel: [] unraidd+0x77/0xb8 [md_mod]

Jan 5 10:18:49 Tower kernel: [] md_thread+0xcc/0xe3 [md_mod]

Jan 5 10:18:49 Tower kernel: [] ? wake_up_bit+0x5b/0x5b

Jan 5 10:18:49 Tower kernel: [] ? import_device+0x147/0x147 [md_mod]

Jan 5 10:18:49 Tower kernel: [] kthread+0x67/0x6c

Jan 5 10:18:49 Tower kernel: [] ? kthread_freezable_should_stop+0x49/0x49

Jan 5 10:18:49 Tower kernel: [] kernel_thread_helper+0x6/0xd

Jan 5 10:45:39 Tower kernel: INFO: rcu_sched self-detected stall on CPU { 3} (t=6000 jiffies)

Jan 5 10:45:39 Tower kernel: Pid: 1550, comm: unraidd Tainted: G O 3.4.11-unRAID #1

Jan 5 10:45:39 Tower kernel: Call Trace:

Jan 5 10:45:39 Tower kernel: [] print_cpu_stall+0x59/0xd1

Jan 5 10:45:39 Tower kernel: [] __rcu_pending+0x3b/0x125

Jan 5 10:45:39 Tower kernel: [] rcu_check_callbacks+0x76/0xa1

Jan 5 10:45:39 Tower kernel: [] update_process_times+0x2d/0x58

Jan 5 10:45:39 Tower kernel: [] tick_periodic+0x63/0x65

Jan 5 10:45:39 Tower kernel: [] tick_handle_periodic+0x19/0x6c

Jan 5 10:45:39 Tower kernel: [] smp_apic_timer_interrupt+0x67/0x7a

Jan 5 10:45:39 Tower kernel: [] apic_timer_interrupt+0x2a/0x30

Jan 5 10:45:39 Tower kernel: [] ? memcmp+0x13/0x25

Jan 5 10:45:39 Tower kernel: [] handle_stripe+0xaa1/0xcfb [md_mod]

Jan 5 10:45:39 Tower kernel: [] unraidd+0x77/0xb8 [md_mod]

Jan 5 10:45:39 Tower kernel: [] md_thread+0xcc/0xe3 [md_mod]

Jan 5 10:45:39 Tower kernel: [] ? wake_up_bit+0x5b/0x5b

Jan 5 10:45:39 Tower kernel: [] ? import_device+0x147/0x147 [md_mod]

Jan 5 10:45:39 Tower kernel: [] kthread+0x67/0x6c

Jan 5 10:45:39 Tower kernel: [] ? kthread_freezable_should_stop+0x49/0x49

Jan 5 10:45:39 Tower kernel: [] kernel_thread_helper+0x6/0xd

Jan 5 10:49:10 Tower kernel: INFO: rcu_sched self-detected stall on CPU { 3} (t=6000 jiffies)

Jan 5 10:49:10 Tower kernel: Pid: 1550, comm: unraidd Tainted: G O 3.4.11-unRAID #1

Jan 5 10:49:10 Tower kernel: Call Trace:

Jan 5 10:49:10 Tower kernel: [] print_cpu_stall+0x59/0xd1

Jan 5 10:49:10 Tower kernel: [] __rcu_pending+0x3b/0x125

Jan 5 10:49:10 Tower kernel: [] rcu_check_callbacks+0x76/0xa1

Jan 5 10:49:10 Tower kernel: [] update_process_times+0x2d/0x58

Jan 5 10:49:10 Tower kernel: [] tick_periodic+0x63/0x65

Jan 5 10:49:10 Tower kernel: [] tick_handle_periodic+0x19/0x6c

Jan 5 10:49:10 Tower kernel: [] smp_apic_timer_interrupt+0x67/0x7a

Jan 5 10:49:10 Tower kernel: [] apic_timer_interrupt+0x2a/0x30

Jan 5 10:49:10 Tower kernel: [] ? _raw_spin_unlock_irqrestore+0x8/0xa

Jan 5 10:49:10 Tower kernel: [] blk_end_bidi_request+0x46/0x50

Jan 5 10:49:10 Tower kernel: [] blk_end_request+0xa/0xc

Jan 5 10:49:10 Tower kernel: [] scsi_end_request+0x1f/0x70

Jan 5 10:49:10 Tower kernel: [] scsi_io_completion+0x1ae/0x3e5

Jan 5 10:49:10 Tower kernel: [] ? scsi_device_unbusy+0x7c/0x82

Jan 5 10:49:10 Tower kernel: [] scsi_finish_command+0x9d/0xa3

Jan 5 10:49:10 Tower kernel: [] scsi_softirq_done+0xba/0xc2

Jan 5 10:49:10 Tower kernel: [] blk_done_softirq+0x4a/0x57

Jan 5 10:49:10 Tower kernel: [] __do_softirq+0x6b/0xe5

Jan 5 10:49:10 Tower kernel: [] ? irq_enter+0x41/0x41

Jan 5 10:49:10 Tower kernel: [] ? irq_exit+0x32/0x58

Jan 5 10:49:10 Tower kernel: [] ? do_IRQ+0x7c/0x90

Jan 5 10:49:10 Tower kernel: [] ? common_interrupt+0x29/0x30

Jan 5 10:49:10 Tower kernel: [] ? xor_blocks+0x39/0x71 [xor]

Jan 5 10:49:10 Tower kernel: [] ? check_parity+0xb3/0xc3 [md_mod]

Jan 5 10:49:10 Tower kernel: [] ? handle_stripe+0xa7d/0xcfb [md_mod]

Jan 5 10:49:10 Tower kernel: [] ? unraidd+0x77/0xb8 [md_mod]

Jan 5 10:49:10 Tower kernel: [] ? md_thread+0xcc/0xe3 [md_mod]

Jan 5 10:49:10 Tower kernel: [] ? wake_up_bit+0x5b/0x5b

Jan 5 10:49:10 Tower kernel: [] ? import_device+0x147/0x147 [md_mod]

Jan 5 10:49:10 Tower kernel: [] ? kthread+0x67/0x6c

Jan 5 10:49:10 Tower kernel: [] ? kthread_freezable_should_stop+0x49/0x49

Jan 5 10:49:10 Tower kernel: [] ? kernel_thread_helper+0x6/0xd

Jan 5 10:57:01 Tower kernel: INFO: rcu_sched self-detected stall on CPU { 3} (t=6000 jiffies)

Jan 5 10:57:01 Tower kernel: Pid: 1550, comm: unraidd Tainted: G O 3.4.11-unRAID #1
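
For anyone wanting to see how often this happens without scrolling through the traces, here is a minimal Python sketch that pulls just the "self-detected stall" lines out of a log like the one above. The names `find_stalls`, `stalls_per_cpu` and `SAMPLE` are made up for illustration. The conversion from jiffies to seconds assumes the kernel tick rate is HZ=100, which is only a guess for this build; with HZ=1000 the same 6000 jiffies would be 6 seconds instead of 60.

```python
import re
from collections import Counter

# Matches lines such as:
# "Jan 5 09:50:28 Tower kernel: INFO: rcu_sched self-detected stall on CPU { 3} (t=6000 jiffies)"
STALL_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"INFO: rcu_sched self-detected stall on CPU \{\s*(?P<cpu>\d+)\}\s+"
    r"\(t=(?P<jiffies>\d+) jiffies\)"
)


def find_stalls(lines, hz=100):
    """Return (timestamp, cpu, seconds) for every stall report in lines.

    hz is an ASSUMPTION about the kernel tick rate (jiffies per second);
    it is not stated anywhere in the log itself.
    """
    stalls = []
    for line in lines:
        match = STALL_RE.match(line.strip())
        if match:
            seconds = int(match.group("jiffies")) / hz
            stalls.append((match.group("stamp"), int(match.group("cpu")), seconds))
    return stalls


def stalls_per_cpu(stalls):
    """Count how many stall reports each CPU number produced."""
    return Counter(cpu for _stamp, cpu, _seconds in stalls)


# A few lines copied from the log above, used as sample input.
SAMPLE = [
    "Jan 5 09:50:28 Tower kernel: INFO: rcu_sched self-detected stall on CPU { 3} (t=6000 jiffies)",
    "Jan 5 09:50:28 Tower kernel: Pid: 1550, comm: unraidd Tainted: G O 3.4.11-unRAID #1",
    "Jan 5 10:18:49 Tower kernel: INFO: rcu_sched self-detected stall on CPU { 3} (t=6000 jiffies)",
]

if __name__ == "__main__":
    found = find_stalls(SAMPLE)
    for stamp, cpu, seconds in found:
        print(f"{stamp}  CPU {cpu}  ~{seconds:.0f}s (assuming HZ=100)")
    print(dict(stalls_per_cpu(found)))
```

To use it on a real server, pass the lines of the syslog file (wherever yours lives) to `find_stalls` instead of `SAMPLE`.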


I am not overclocking or underclocking the CPU, and I have already updated the BIOS to the latest version.

However, moving my Supermicro controller card to a different PCI-E slot not only fixed the disk disconnection problem I was having during parity checks, it also resolved the CPU stall issue.

Not sure why this is the case, but after 2 weeks of continually receiving the CPU stall messages posted above, they stopped once I moved the controller card.

I have now put a different Supermicro card into the "bad" PCI-E slot, and all looks good: no CPU stall.

I will continue to monitor the issue, but it looks resolved now.


Found this:

 

"The fundamental operation of most CPUs, regardless of the physical form they take, is to execute a sequence of stored instructions called a program. The program is represented by a series of numbers that are kept in some kind of computer memory. There are four steps that nearly all CPUs use in their operation: fetch, decode, execute, and writeback.

 

The first step, fetch, involves retrieving an instruction (which is represented by a number or sequence of numbers) from program memory. The location in program memory is determined by a program counter (PC), which stores a number that identifies the current position in the program. After an instruction is fetched, the PC is incremented by the length of the instruction word in terms of memory units. Often, the instruction to be fetched must be retrieved from relatively slow memory, causing the CPU to stall while waiting for the instruction to be returned. This issue is largely addressed in modern processors by caches and pipeline architectures"

 

From here: http://en.wikipedia.org/wiki/Central_processing_unit
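
To make the four steps in that quote a bit more concrete, here is a toy sketch in Python of a fetch, decode, execute, writeback loop. Everything in it (the `run` function, the `LOAD`/`ADD`/`HALT` opcodes, the four registers) is invented for illustration and does not model any real CPU; a real program is stored as numbers rather than tuples. Also, as far as I understand it, the "stall" in the quote is the hardware sense of the word (the CPU waiting on slow memory), which is not the same thing as the rcu_sched warning the kernel prints in the log above.

```python
def run(program):
    """Run a toy program and return the final register values.

    Each instruction is a tuple: (opcode, operand, ...).
    This instruction set is hypothetical, made up for the example.
    """
    regs = [0, 0, 0, 0]   # four general-purpose registers
    pc = 0                # program counter: index of the next instruction

    while pc < len(program):
        instr = program[pc]        # fetch: read the instruction at PC
        pc += 1                    # advance PC by the instruction length (1 here)

        op, *args = instr          # decode: split opcode from operands

        # execute: compute the result for this opcode
        if op == "LOAD":           # LOAD rd, value
            rd, value = args
            result = value
        elif op == "ADD":          # ADD rd, ra, rb
            rd, ra, rb = args
            result = regs[ra] + regs[rb]
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode: {op}")

        regs[rd] = result          # writeback: store the result in a register

    return regs


# Example program: r0 = 2, r1 = 3, r2 = r0 + r1
EXAMPLE = [
    ("LOAD", 0, 2),
    ("LOAD", 1, 3),
    ("ADD", 2, 0, 1),
    ("HALT",),
]

if __name__ == "__main__":
    print(run(EXAMPLE))
```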

 

Just really interested to know the cause, as I did not know about this.
