Glimmerman911 Posted January 6, 2013
Not sure if this could cause any issues or not, but my server repeatedly reports a CPU 3 stall in my log file. I am running an AMD A8 processor on an Asus F1A75-V Pro mobo.
Log details:
Jan 5 09:37:13 Tower kernel: Call Trace:
Jan 5 09:37:13 Tower kernel: [] print_cpu_stall+0x59/0xd1
Jan 5 09:37:13 Tower kernel: [] __rcu_pending+0x3b/0x125
Jan 5 09:37:13 Tower kernel: [] rcu_check_callbacks+0x76/0xa1
Jan 5 09:37:13 Tower kernel: [] update_process_times+0x2d/0x58
Jan 5 09:37:13 Tower kernel: [] tick_periodic+0x63/0x65
Jan 5 09:37:13 Tower kernel: [] tick_handle_periodic+0x19/0x6c
Jan 5 09:37:13 Tower kernel: [] smp_apic_timer_interrupt+0x67/0x7a
Jan 5 09:37:13 Tower kernel: [] apic_timer_interrupt+0x2a/0x30
Jan 5 09:37:13 Tower kernel: [] ? sas_queuecommand+0x19c/0x1c4 [libsas]
Jan 5 09:37:13 Tower kernel: [] scsi_dispatch_cmd+0xfa/0x125
Jan 5 09:37:13 Tower kernel: [] scsi_request_fn+0x24a/0x365
Jan 5 09:37:13 Tower kernel: [] __blk_run_queue+0x14/0x16
Jan 5 09:37:13 Tower kernel: [] blk_run_queue+0x1b/0x2c
Jan 5 09:37:13 Tower kernel: [] scsi_run_queue+0xe4/0x151
Jan 5 09:37:13 Tower kernel: [] scsi_next_command+0x28/0x34
Jan 5 09:37:13 Tower kernel: [] scsi_end_request+0x66/0x70
Jan 5 09:37:13 Tower kernel: [] scsi_io_completion+0x1ae/0x3e5
Jan 5 09:37:13 Tower kernel: [] ? scsi_device_unbusy+0x7c/0x82
Jan 5 09:37:13 Tower kernel: [] scsi_finish_command+0x9d/0xa3
Jan 5 09:37:13 Tower kernel: [] scsi_softirq_done+0xba/0xc2
Jan 5 09:37:13 Tower kernel: [] blk_done_softirq+0x4a/0x57
Jan 5 09:37:13 Tower kernel: [] __do_softirq+0x6b/0xe5
Jan 5 09:37:13 Tower kernel: [] ? irq_enter+0x41/0x41
Jan 5 09:37:13 Tower kernel: [] ? irq_exit+0x32/0x58
Jan 5 09:37:13 Tower kernel: [] ? do_IRQ+0x7c/0x90
Jan 5 09:37:13 Tower kernel: [] ? common_interrupt+0x29/0x30
Jan 5 09:37:13 Tower kernel: [] ? memcmp+0x17/0x25
Jan 5 09:37:13 Tower kernel: [] ? handle_stripe+0xaa1/0xcfb [md_mod]
Jan 5 09:37:13 Tower kernel: [] ? unraidd+0x77/0xb8 [md_mod]
Jan 5 09:37:13 Tower kernel: [] ? md_thread+0xcc/0xe3 [md_mod]
Jan 5 09:37:13 Tower kernel: [] ? wake_up_bit+0x5b/0x5b
Jan 5 09:37:13 Tower kernel: [] ? import_device+0x147/0x147 [md_mod]
Jan 5 09:37:13 Tower kernel: [] ? kthread+0x67/0x6c
Jan 5 09:37:13 Tower kernel: [] ? kthread_freezable_should_stop+0x49/0x49
Jan 5 09:37:13 Tower kernel: [] ? kernel_thread_helper+0x6/0xd
Jan 5 09:50:28 Tower kernel: INFO: rcu_sched self-detected stall on CPU { 3} (t=6000 jiffies)
Jan 5 09:50:28 Tower kernel: Pid: 1550, comm: unraidd Tainted: G O 3.4.11-unRAID #1
Jan 5 09:50:28 Tower kernel: Call Trace:
Jan 5 09:50:28 Tower kernel: [] print_cpu_stall+0x59/0xd1
Jan 5 09:50:28 Tower kernel: [] __rcu_pending+0x3b/0x125
Jan 5 09:50:28 Tower kernel: [] rcu_check_callbacks+0x76/0xa1
Jan 5 09:50:28 Tower kernel: [] update_process_times+0x2d/0x58
Jan 5 09:50:28 Tower kernel: [] tick_periodic+0x63/0x65
Jan 5 09:50:28 Tower kernel: [] tick_handle_periodic+0x19/0x6c
Jan 5 09:50:28 Tower kernel: [] smp_apic_timer_interrupt+0x67/0x7a
Jan 5 09:50:28 Tower kernel: [] apic_timer_interrupt+0x2a/0x30
Jan 5 09:50:28 Tower kernel: [] ? page_address+0x5/0x8f
Jan 5 09:50:28 Tower kernel: [] check_parity+0x6a/0xc3 [md_mod]
Jan 5 09:50:28 Tower kernel: [] handle_stripe+0xa7d/0xcfb [md_mod]
Jan 5 09:50:28 Tower kernel: [] unraidd+0x77/0xb8 [md_mod]
Jan 5 09:50:28 Tower kernel: [] md_thread+0xcc/0xe3 [md_mod]
Jan 5 09:50:28 Tower kernel: [] ? wake_up_bit+0x5b/0x5b
Jan 5 09:50:28 Tower kernel: [] ? import_device+0x147/0x147 [md_mod]
Jan 5 09:50:28 Tower kernel: [] kthread+0x67/0x6c
Jan 5 09:50:28 Tower kernel: [] ? kthread_freezable_should_stop+0x49/0x49
Jan 5 09:50:28 Tower kernel: [] kernel_thread_helper+0x6/0xd
Jan 5 10:18:49 Tower kernel: INFO: rcu_sched self-detected stall on CPU { 3} (t=6000 jiffies)
Jan 5 10:18:49 Tower kernel: Pid: 1550, comm: unraidd Tainted: G O 3.4.11-unRAID #1
Jan 5 10:18:49 Tower kernel: Call Trace:
Jan 5 10:18:49 Tower kernel: [] print_cpu_stall+0x59/0xd1
Jan 5 10:18:49 Tower kernel: [] __rcu_pending+0x3b/0x125
Jan 5 10:18:49 Tower kernel: [] rcu_check_callbacks+0x76/0xa1
Jan 5 10:18:49 Tower kernel: [] update_process_times+0x2d/0x58
Jan 5 10:18:49 Tower kernel: [] tick_periodic+0x63/0x65
Jan 5 10:18:49 Tower kernel: [] tick_handle_periodic+0x19/0x6c
Jan 5 10:18:49 Tower kernel: [] smp_apic_timer_interrupt+0x67/0x7a
Jan 5 10:18:49 Tower kernel: [] apic_timer_interrupt+0x2a/0x30
Jan 5 10:18:49 Tower kernel: [] ? xor_sse_5+0x29e/0x3d8 [xor]
Jan 5 10:18:49 Tower kernel: [] ? native_smp_send_reschedule+0x3f/0x41
Jan 5 10:18:49 Tower kernel: [] xor_blocks+0x66/0x71 [xor]
Jan 5 10:18:49 Tower kernel: [] ? xor_blocks+0x66/0x71 [xor]
Jan 5 10:18:49 Tower kernel: [] check_parity+0x8d/0xc3 [md_mod]
Jan 5 10:18:49 Tower kernel: [] handle_stripe+0xa7d/0xcfb [md_mod]
Jan 5 10:18:49 Tower kernel: [] unraidd+0x77/0xb8 [md_mod]
Jan 5 10:18:49 Tower kernel: [] md_thread+0xcc/0xe3 [md_mod]
Jan 5 10:18:49 Tower kernel: [] ? wake_up_bit+0x5b/0x5b
Jan 5 10:18:49 Tower kernel: [] ? import_device+0x147/0x147 [md_mod]
Jan 5 10:18:49 Tower kernel: [] kthread+0x67/0x6c
Jan 5 10:18:49 Tower kernel: [] ? kthread_freezable_should_stop+0x49/0x49
Jan 5 10:18:49 Tower kernel: [] kernel_thread_helper+0x6/0xd
Jan 5 10:45:39 Tower kernel: INFO: rcu_sched self-detected stall on CPU { 3} (t=6000 jiffies)
Jan 5 10:45:39 Tower kernel: Pid: 1550, comm: unraidd Tainted: G O 3.4.11-unRAID #1
Jan 5 10:45:39 Tower kernel: Call Trace:
Jan 5 10:45:39 Tower kernel: [] print_cpu_stall+0x59/0xd1
Jan 5 10:45:39 Tower kernel: [] __rcu_pending+0x3b/0x125
Jan 5 10:45:39 Tower kernel: [] rcu_check_callbacks+0x76/0xa1
Jan 5 10:45:39 Tower kernel: [] update_process_times+0x2d/0x58
Jan 5 10:45:39 Tower kernel: [] tick_periodic+0x63/0x65
Jan 5 10:45:39 Tower kernel: [] tick_handle_periodic+0x19/0x6c
Jan 5 10:45:39 Tower kernel: [] smp_apic_timer_interrupt+0x67/0x7a
Jan 5 10:45:39 Tower kernel: [] apic_timer_interrupt+0x2a/0x30
Jan 5 10:45:39 Tower kernel: [] ? memcmp+0x13/0x25
Jan 5 10:45:39 Tower kernel: [] handle_stripe+0xaa1/0xcfb [md_mod]
Jan 5 10:45:39 Tower kernel: [] unraidd+0x77/0xb8 [md_mod]
Jan 5 10:45:39 Tower kernel: [] md_thread+0xcc/0xe3 [md_mod]
Jan 5 10:45:39 Tower kernel: [] ? wake_up_bit+0x5b/0x5b
Jan 5 10:45:39 Tower kernel: [] ? import_device+0x147/0x147 [md_mod]
Jan 5 10:45:39 Tower kernel: [] kthread+0x67/0x6c
Jan 5 10:45:39 Tower kernel: [] ? kthread_freezable_should_stop+0x49/0x49
Jan 5 10:45:39 Tower kernel: [] kernel_thread_helper+0x6/0xd
Jan 5 10:49:10 Tower kernel: INFO: rcu_sched self-detected stall on CPU { 3} (t=6000 jiffies)
Jan 5 10:49:10 Tower kernel: Pid: 1550, comm: unraidd Tainted: G O 3.4.11-unRAID #1
Jan 5 10:49:10 Tower kernel: Call Trace:
Jan 5 10:49:10 Tower kernel: [] print_cpu_stall+0x59/0xd1
Jan 5 10:49:10 Tower kernel: [] __rcu_pending+0x3b/0x125
Jan 5 10:49:10 Tower kernel: [] rcu_check_callbacks+0x76/0xa1
Jan 5 10:49:10 Tower kernel: [] update_process_times+0x2d/0x58
Jan 5 10:49:10 Tower kernel: [] tick_periodic+0x63/0x65
Jan 5 10:49:10 Tower kernel: [] tick_handle_periodic+0x19/0x6c
Jan 5 10:49:10 Tower kernel: [] smp_apic_timer_interrupt+0x67/0x7a
Jan 5 10:49:10 Tower kernel: [] apic_timer_interrupt+0x2a/0x30
Jan 5 10:49:10 Tower kernel: [] ? _raw_spin_unlock_irqrestore+0x8/0xa
Jan 5 10:49:10 Tower kernel: [] blk_end_bidi_request+0x46/0x50
Jan 5 10:49:10 Tower kernel: [] blk_end_request+0xa/0xc
Jan 5 10:49:10 Tower kernel: [] scsi_end_request+0x1f/0x70
Jan 5 10:49:10 Tower kernel: [] scsi_io_completion+0x1ae/0x3e5
Jan 5 10:49:10 Tower kernel: [] ? scsi_device_unbusy+0x7c/0x82
Jan 5 10:49:10 Tower kernel: [] scsi_finish_command+0x9d/0xa3
Jan 5 10:49:10 Tower kernel: [] scsi_softirq_done+0xba/0xc2
Jan 5 10:49:10 Tower kernel: [] blk_done_softirq+0x4a/0x57
Jan 5 10:49:10 Tower kernel: [] __do_softirq+0x6b/0xe5
Jan 5 10:49:10 Tower kernel: [] ? irq_enter+0x41/0x41
Jan 5 10:49:10 Tower kernel: [] ? irq_exit+0x32/0x58
Jan 5 10:49:10 Tower kernel: [] ? do_IRQ+0x7c/0x90
Jan 5 10:49:10 Tower kernel: [] ? common_interrupt+0x29/0x30
Jan 5 10:49:10 Tower kernel: [] ? xor_blocks+0x39/0x71 [xor]
Jan 5 10:49:10 Tower kernel: [] ? check_parity+0xb3/0xc3 [md_mod]
Jan 5 10:49:10 Tower kernel: [] ? handle_stripe+0xa7d/0xcfb [md_mod]
Jan 5 10:49:10 Tower kernel: [] ? unraidd+0x77/0xb8 [md_mod]
Jan 5 10:49:10 Tower kernel: [] ? md_thread+0xcc/0xe3 [md_mod]
Jan 5 10:49:10 Tower kernel: [] ? wake_up_bit+0x5b/0x5b
Jan 5 10:49:10 Tower kernel: [] ? import_device+0x147/0x147 [md_mod]
Jan 5 10:49:10 Tower kernel: [] ? kthread+0x67/0x6c
Jan 5 10:49:10 Tower kernel: [] ? kthread_freezable_should_stop+0x49/0x49
Jan 5 10:49:10 Tower kernel: [] ? kernel_thread_helper+0x6/0xd
Jan 5 10:57:01 Tower kernel: INFO: rcu_sched self-detected stall on CPU { 3} (t=6000 jiffies)
Jan 5 10:57:01 Tower kernel: Pid: 1550, comm: unraidd Tainted: G O 3.4.11-unRAID #1
abs0lut.zer0 Posted January 7, 2013
Does this mean that one of the cores is bad? This is the first time I have heard of that.
bcbgboy13 Posted January 7, 2013
The motherboard has a newer BIOS available (from 2012.11.01). Try that one.
lionelhutz Posted January 7, 2013
Did you do any overclocking or undervolting or CPU core unlocking or any other such thing on this system?
Glimmerman911 (Author) Posted January 10, 2013
I am not overclocking or underclocking the CPU, and I have already updated the BIOS to the latest version. However, switching my Supermicro controller card to a different PCI-E slot not only fixed the disk disconnection problem I was having during parity checks, it also resolved the CPU stalling issue. I am not sure why this is the case, but after 2 weeks of continually receiving the above-posted CPU stall messages, they stopped after the controller card switch. I have now put a different Supermicro card back into the "bad" PCI-E slot, and all looks good, with no CPU stall. I will continue to monitor the issue, but it looks resolved now.
Glimmerman911 (Author) Posted January 10, 2013
I spoke too soon: the same message is repeating in my log during parity sync, though it does seem less frequent. What next steps should I look at regarding this CPU info message, if any? Memtest passed 24 hours of testing error free, and as I mentioned, I am running the latest firmware on my motherboard.
dgaschk Posted January 11, 2013
The particular slot may not be compatible with the card. This could be a defect, bug, or just a limitation of the design. You may need to consider a different MB.
Glimmerman911 (Author) Posted January 11, 2013
The controller card issue is resolved, but the CPU stall issue persists (if it is an issue at all; it only shows up as an INFO message in my log).
Glimmerman911 (Author) Posted January 11, 2013
Update: I just upgraded a hard drive, so I went through the rebuild process with the log open, and saw no CPU stall. The CPU stall seems to occur only when running a parity check, and then it happens every 45 minutes or so.
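For anyone wanting to check the "every 45 minutes or so" estimate against the log rather than by eye, here is a minimal Python sketch that pulls the stall timestamps out of syslog lines and prints the gaps between them. The function names (stall_events, gaps_seconds) are made up for illustration, the year is an assumption because these syslog lines do not carry one, and the sample lines are copied from the log posted above. Treat it as a rough helper, not a diagnostic tool.

```python
import re
from datetime import datetime

# Matches the INFO line the kernel prints for a self-detected RCU stall,
# in the syslog layout shown in the first post.
STALL_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"INFO: rcu_sched self-detected stall on CPU \{\s*(?P<cpu>\d+)\}"
)


def stall_events(lines, year=2013):
    """Return (timestamp, cpu) for every stall report found in `lines`.

    `year` is an assumption: the syslog lines do not include one.
    """
    events = []
    for line in lines:
        match = STALL_RE.match(line)
        if match:
            stamp = datetime.strptime(
                f"{year} {match.group('ts')}", "%Y %b %d %H:%M:%S"
            )
            events.append((stamp, int(match.group("cpu"))))
    return events


def gaps_seconds(events):
    """Return the time in whole seconds between consecutive stall reports."""
    times = [stamp for stamp, _ in events]
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


# Sample lines taken from the log in the first post.
sample = [
    "Jan 5 09:50:28 Tower kernel: INFO: rcu_sched self-detected stall on CPU { 3} (t=6000 jiffies)",
    "Jan 5 09:50:28 Tower kernel: Call Trace:",
    "Jan 5 10:18:49 Tower kernel: INFO: rcu_sched self-detected stall on CPU { 3} (t=6000 jiffies)",
    "Jan 5 10:45:39 Tower kernel: INFO: rcu_sched self-detected stall on CPU { 3} (t=6000 jiffies)",
]

events = stall_events(sample)
for gap in gaps_seconds(events):
    print(f"{gap // 60} min {gap % 60} s between stall reports")
```

To run it over a whole log, pass the lines of the syslog file to stall_events instead of the sample list (the file location depends on the system and is not given in this thread).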
abs0lut.zer0 Posted January 11, 2013
I found this:
"The fundamental operation of most CPUs, regardless of the physical form they take, is to execute a sequence of stored instructions called a program. The program is represented by a series of numbers that are kept in some kind of computer memory. There are four steps that nearly all CPUs use in their operation: fetch, decode, execute, and writeback. The first step, fetch, involves retrieving an instruction (which is represented by a number or sequence of numbers) from program memory. The location in program memory is determined by a program counter (PC), which stores a number that identifies the current position in the program. After an instruction is fetched, the PC is incremented by the length of the instruction word in terms of memory units. Often, the instruction to be fetched must be retrieved from relatively slow memory, causing the CPU to stall while waiting for the instruction to be returned. This issue is largely addressed in modern processors by caches and pipeline architectures"
From here: http://en.wikipedia.org/wiki/Central_processing_unit
I am just really interested to know the cause, as I did not know about this.