BillyJ Posted February 24, 2015

Hi Guys,

This seems to only occur when I invoke the Mover, and it writes to the log every 3 minutes. I can telnet into the server but the webGUI is unresponsive, and running dockers come and go; Plex, for example, will be browsable but then die somewhere in the 3-minute window. The only choice left is to log in via IPMI and issue a power cycle. Any ideas? It seems to have only happened on Beta 14.

Thanks, Will

EDIT: I had just changed my cache from BTRFS to XFS after upgrading to B14.

Feb 24 19:16:03 server-ramford emhttp: shcmd (10418): /usr/local/sbin/mover |& logger &
Feb 24 19:16:03 server-ramford logger: mover started
Feb 24 19:16:03 server-ramford logger: moving "Music Videos"
Feb 24 19:16:03 server-ramford logger: ./Music Videos/METALLICA - Main feature.mp4
Feb 24 19:17:03 server-ramford kernel: INFO: rcu_sched self-detected stall on CPU { 1} (t=6000 jiffies g=2821570 c=2821569 q=29868)
Feb 24 19:17:03 server-ramford kernel: Task dump for CPU 1:
Feb 24 19:17:03 server-ramford kernel: shfs R running task 0 2081 1 0x00000008
Feb 24 19:17:03 server-ramford kernel: 0000000000000000 ffff88041fc43da8 ffffffff8105e0b5 0000000000000001
Feb 24 19:17:03 server-ramford kernel: 0000000000000001 ffff88041fc43dc8 ffffffff81060780 0000000000000002
Feb 24 19:17:03 server-ramford kernel: ffffffff81834400 ffff88041fc43df8 ffffffff8107845f ffffffff81834400
Feb 24 19:17:03 server-ramford kernel: Call Trace:
Feb 24 19:17:03 server-ramford kernel: <IRQ> [<ffffffff8105e0b5>] sched_show_task+0xbe/0xc3
Feb 24 19:17:03 server-ramford kernel: [<ffffffff81060780>] dump_cpu_task+0x35/0x39
Feb 24 19:17:03 server-ramford kernel: [<ffffffff8107845f>] rcu_dump_cpu_stacks+0x6a/0x8c
Feb 24 19:17:03 server-ramford kernel: [<ffffffff8107acb5>] rcu_check_callbacks+0x1db/0x4f9
Feb 24 19:17:03 server-ramford kernel: [<ffffffff81088601>] ? tick_sched_handle+0x34/0x34
Feb 24 19:17:03 server-ramford kernel: [<ffffffff8107ca53>] update_process_times+0x3a/0x64
Feb 24 19:17:03 server-ramford kernel: [<ffffffff810885ff>] tick_sched_handle+0x32/0x34
Feb 24 19:17:03 server-ramford kernel: [<ffffffff81088638>] tick_sched_timer+0x37/0x61
Feb 24 19:17:03 server-ramford kernel: [<ffffffff8107cf9b>] __run_hrtimer.isra.29+0x57/0xb0
Feb 24 19:17:03 server-ramford kernel: [<ffffffff8107d48a>] hrtimer_interrupt+0xd9/0x1c0
Feb 24 19:17:03 server-ramford kernel: [<ffffffff8102f72e>] local_apic_timer_interrupt+0x50/0x54
Feb 24 19:17:03 server-ramford kernel: [<ffffffff8102fb0b>] smp_apic_timer_interrupt+0x3c/0x4e
Feb 24 19:17:03 server-ramford kernel: [<ffffffff815fdf7d>] apic_timer_interrupt+0x6d/0x80
Feb 24 19:17:03 server-ramford kernel: <EOI> [<ffffffff8114c543>] ? __discard_prealloc+0x9f/0xb3
Feb 24 19:17:03 server-ramford kernel: [<ffffffff8114c5bb>] reiserfs_discard_all_prealloc+0x44/0x4e
Feb 24 19:17:03 server-ramford kernel: [<ffffffff81168de0>] do_journal_end+0x4e7/0xc78
Feb 24 19:17:03 server-ramford kernel: [<ffffffff81169ad0>] journal_end+0xae/0xb6
Feb 24 19:17:03 server-ramford kernel: [<ffffffff8115a1c9>] reiserfs_dirty_inode+0x6c/0x7c
Feb 24 19:17:03 server-ramford kernel: [<ffffffff810478ac>] ? ns_capable+0x3a/0x4f
Feb 24 19:17:03 server-ramford kernel: [<ffffffff8111611c>] __mark_inode_dirty+0x30/0x1e1
Feb 24 19:17:03 server-ramford kernel: [<ffffffff81155f57>] reiserfs_setattr+0x262/0x297
Feb 24 19:17:03 server-ramford kernel: [<ffffffff810f95ae>] ? __sb_start_write+0x9a/0xce
Feb 24 19:17:03 server-ramford kernel: [<ffffffff81103706>] ? final_putname+0x30/0x34
Feb 24 19:17:03 server-ramford kernel: [<ffffffff8110d343>] notify_change+0x1dc/0x2d0
Feb 24 19:17:03 server-ramford kernel: [<ffffffff8110fecc>] ? __mnt_want_write+0x43/0x4a
Feb 24 19:17:03 server-ramford kernel: [<ffffffff8111a350>] utimes_common+0x114/0x174
Feb 24 19:17:03 server-ramford kernel: [<ffffffff8111a49b>] do_utimes+0xeb/0x125
Feb 24 19:17:03 server-ramford kernel: [<ffffffff8111a624>] SyS_futimesat+0x7f/0x9a
Feb 24 19:17:03 server-ramford kernel: [<ffffffff8111a653>] SyS_utimes+0x14/0x19
Feb 24 19:17:03 server-ramford kernel: [<ffffffff815fd1a9>] system_call_fastpath+0x12/0x17
Feb 24 19:20:03 server-ramford kernel: INFO: rcu_sched self-detected stall on CPU { 1} (t=24003 jiffies g=2821570 c=2821569 q=112021)
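For anyone hitting the same hang: since the box stays reachable over telnet, the syslog can usually be saved off before the forced power cycle wipes it. A minimal sketch, assuming the stock unRAID layout where /var/log/syslog is the live log and /boot is the flash drive:

# Save the in-memory syslog to flash so it survives the reboot
cp /var/log/syslog /boot/syslog-$(date +%Y%m%d-%H%M%S).txt

# Confirm which task the kernel is blaming for the stall
grep -A 2 "rcu_sched self-detected stall" /var/log/syslog | tail -n 20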
jonp Posted February 24, 2015

[quoting BillyJ's original post and stack trace above]

Please post a complete syslog for review.
BillyJ Posted February 24, 2015 (Author)

http://lime-technology.com/forum/index.php?topic=37848.msg350080#msg350080

This seems similar, but I don't use AFP or Time Machine.

Thanks jonp, syslog attached.

syslog_copy.zip
BRiT Posted February 24, 2015

The workaround everyone who has encountered this has used is to switch and convert all their array drives from ReiserFS (RFS) to XFS. There's definitely some negative interaction between ReiserFS and SHFS going on. If you search the forums you can find roughly one new account of this per week. It's a long process to convert off RFS, but the good news is that once you do, you shouldn't see this issue again. Smallwood82 has been stable for well over 40 days since his conversion; beforehand his system couldn't last more than a day or two.
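Roughly, each conversion pass looks like the sketch below. The disk numbers are only examples, and it assumes you have one empty XFS-formatted disk in the array to copy into:

# Copy one ReiserFS disk onto the empty XFS disk, preserving
# permissions, timestamps, and extended attributes
rsync -avPX /mnt/disk1/ /mnt/disk2/

# Dry-run checksum comparison to verify the copy before touching disk1
rsync -navc /mnt/disk1/ /mnt/disk2/

# Then stop the array, set disk1's filesystem to XFS in the webGUI,
# and start the array to format it; disk1 becomes the empty target
# for the next pass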
jonp Posted February 24, 2015

If it were reiserfs / shfs, why wouldn't everyone, or at least far more people, be reporting this? The majority of users still have reiser disks. I'm not convinced this is a reiserfs / shfs issue just yet, but I will be pulling together all the logs posted on CPU stalls to find out more.
BillyJ Posted February 25, 2015 (Author)

I've hit another CPU stall, this time while running the 14a update via the webGUI. It seemed to be stuck on "sync" in the list of processes the upgrade went through. A power cycle was the only thing that brought it back. Latest syslog attached.

syslog.txt
BRiT Posted February 25, 2015

[quoting jonp's reply above]

From what I recall, all of the kernel dumps show reiserfs and shfs in the stack trace. This is the first one I can recall being posted where it didn't have shfs listed as the other process.
BillyJ Posted February 25, 2015 (Author)

I now have data loss of items that had been on the array, not just the cache disk. The syslog indicates the files that are missing had in fact been moved to the array at 1am. Should I format my cache back to BTRFS? The problem never occurred with that FS.
BRiT Posted February 25, 2015

[quoting BillyJ's post above]

It doesn't occur on a fully XFS setup either, with both the cache and the data drives converted.
SmallwoodDR82 Posted February 25, 2015

[quoting jonp's reply above]

jonp, of the users who have moved to 6.0, here are the people who have reported it, with syslogs. It may not be a "ton", but I would guess it's a decent percentage of 6.0/forum users. Keep in mind it took me over a month just to finally snag a syslog and figure out what was going on. I agree the majority of users are on RFS, and I would also say the majority of users are still on 5.x. With that said, once the wave moves to 6.0, be ready for many more of these. I would not ignore this issue.

http://lime-technology.com/forum/index.php?topic=38370.msg356079#msg356079
http://lime-technology.com/forum/index.php?topic=38119.msg353003#msg353003
http://lime-technology.com/forum/index.php?topic=37311.msg344863#msg344863
http://lime-technology.com/forum/index.php?topic=38108.msg352782#msg352782
http://lime-technology.com/forum/index.php?topic=38019.msg351938#msg351938
http://lime-technology.com/forum/index.php?topic=37864.msg350200#msg350200
http://lime-technology.com/forum/index.php?topic=37316.msg344859#msg344859
http://lime-technology.com/forum/index.php?topic=37185.msg343742#msg343742
http://lime-technology.com/forum/index.php?topic=35788.msg339577#msg339577
http://lime-technology.com/forum/index.php?topic=37848.msg350080#msg350080
http://lime-technology.com/forum/index.php?topic=38409.msg356875#msg356875
http://lime-technology.com/forum/index.php?topic=38863.0

Edit: 2/26/2015 @ 3:42PM PST - Added 1 more case.
Edit: 3/21/2015 @ 8:22PM PST - Added 1 more case.

I hope this helps, and I hope this does not result in more data loss. Good luck!
jonp Posted February 26, 2015

[quoting SmallwoodDR82's post and list of cases above]

Definitely not ignoring it; I was just saying I'm not necessarily convinced the root cause is shfs and reiserfs. Trust me, this is DEFINITELY a concern of mine, but I need to find a way to replicate it. I've tested with reiser disks on a few different systems now, but haven't been able to cause it. Once I can recreate the error consistently, we are golden. I need to compile a spreadsheet from these posts with as much information as I can; I will be doing this later tonight. Finding the correlation is key.
SmallwoodDR82 Posted February 26, 2015

I will continue to add links to my original post; hopefully that will help. Also, FYI: since moving all my array drives to XFS I'm 50-plus days without a crash.
limetech Posted February 26, 2015

[quoting BillyJ's original post above]

Please start the array in 'maintenance' mode and click 'Check Filesystem' for each device. Does that turn up any corruptions?
BillyJ Posted February 26, 2015 (Author)

reiserfsck -yq /dev/md1 2>&1
reiserfsck 3.6.24

Will read-only check consistency of the filesystem on /dev/md1
Will put log info to 'stdout'

###########
reiserfsck --check started at Thu Feb 26 13:07:15 2015
###########
Replaying journal: Done.
Reiserfs journal '/dev/md1' in blocks [18..8211]: 0 transactions replayed

Is this the end? Or should I be waiting for more? The webGUI is frozen on multiple machines; I logged in via telnet and can see the reiserfsck process working away... I'll wait.
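For reference, the same check can be run by hand over telnet while the webGUI is frozen; a sketch, assuming the array is started in maintenance mode so /dev/md1 exists but isn't mounted:

# Read-only consistency check, with output saved to flash
reiserfsck --check /dev/md1 2>&1 | tee /boot/reiserfsck-md1.log

A clean run should finish with a "No corruptions found" summary; on a large volume the check can sit for a long time after the journal replay with no intermediate output, so a quiet terminal is normal.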
ljm42 Posted February 26, 2015

[quoting SmallwoodDR82: "I will continue to add links to my original post. Hopefully that will help."]

I have a shorter list going over here: http://lime-technology.com/forum/index.php?topic=37767.msg354452#msg354452 ... you found quite a few more than I did.

[quoting jonp: "Definitely not ignoring it; I was just saying I'm not necessarily convinced the root cause is shfs and reiserfs."]

I manage two very similar systems: mine is based on the ASRock E3C226D2I and my sister's is based on the E3C224D2I. Mine did not have the crashing problems on later versions of 6, but hers did. Her system stabilized once I converted it from RFS to XFS. These are the main differences between our systems:

- Hers is in a small business environment and was probably worked a bit harder, with more simultaneous writes. It crashed a few times during the day under heavy use, but it also crashed at night when nobody was using it.
- Hers has AFP enabled, as they have a Mac in the office; mine does not.

Because her system is remote, I wasn't able to do a lot of troubleshooting. I did convert the cache drive from BTRFS to XFS remotely, but that didn't help. So I drove up there with a pre-cleared drive and started moving data; thankfully, once all the data drives were on XFS the problems went away.

Can you tell from the logs people provided whether AFP and/or heavy use is a common theme?
SmallwoodDR82 Posted February 26, 2015

Mine gets heavy use during the day and light to no use at night. I do not have AFP enabled.
BillyJ Posted February 27, 2015 (Author)

This is starting to tick me off a bit; so far I've moved 2 of my drives to XFS. The Mover would have been moving the file to the new XFS disk. Should I put my cache back to BTRFS? It's currently XFS, but I have seen Tom suggest in another post that the recommended v6 setup is a BTRFS cache and an XFS array.

Feb 27 15:07:38 server-ramford login[763]: ROOT LOGIN on '/dev/pts/5' from 'Medias-Mac-mini.willyweb.com.au'
Feb 27 16:00:01 server-ramford logger: mover started
Feb 27 16:00:01 server-ramford logger: moving "TV"
Feb 27 16:00:01 server-ramford logger: ./TV/Vikings/Season 3/Vikings - S03E02 - The Wanderer HDTV-720p.mkv
Feb 27 16:01:01 server-ramford kernel: INFO: rcu_sched self-detected stall on CPU { 7} (t=6000 jiffies g=431630 c=431629 q=110233)
Feb 27 16:01:01 server-ramford kernel: Task dump for CPU 7:
Feb 27 16:01:01 server-ramford kernel: shfs R running task 0 4528 1 0x00000008
Feb 27 16:01:01 server-ramford kernel: 0000000000000000 ffff88041fdc3da8 ffffffff8105e0b5 0000000000000007
Feb 27 16:01:01 server-ramford kernel: 0000000000000007 ffff88041fdc3dc8 ffffffff81060780 0000000000000080
Feb 27 16:01:01 server-ramford kernel: ffffffff81834400 ffff88041fdc3df8 ffffffff8107845f ffffffff81834400
Feb 27 16:01:01 server-ramford kernel: Call Trace:
Feb 27 16:01:01 server-ramford kernel: <IRQ> [<ffffffff8105e0b5>] sched_show_task+0xbe/0xc3
Feb 27 16:01:01 server-ramford kernel: [<ffffffff81060780>] dump_cpu_task+0x35/0x39
Feb 27 16:01:01 server-ramford kernel: [<ffffffff8107845f>] rcu_dump_cpu_stacks+0x6a/0x8c
Feb 27 16:01:01 server-ramford kernel: [<ffffffff8107acb5>] rcu_check_callbacks+0x1db/0x4f9
Feb 27 16:01:01 server-ramford kernel: [<ffffffff81088601>] ? tick_sched_handle+0x34/0x34
Feb 27 16:01:01 server-ramford kernel: [<ffffffff8107ca53>] update_process_times+0x3a/0x64
Feb 27 16:01:01 server-ramford kernel: [<ffffffff810885ff>] tick_sched_handle+0x32/0x34
Feb 27 16:01:01 server-ramford kernel: [<ffffffff81088638>] tick_sched_timer+0x37/0x61
Feb 27 16:01:01 server-ramford kernel: [<ffffffff8107cf9b>] __run_hrtimer.isra.29+0x57/0xb0
Feb 27 16:01:01 server-ramford kernel: [<ffffffff8107d48a>] hrtimer_interrupt+0xd9/0x1c0
Feb 27 16:01:01 server-ramford kernel: [<ffffffff8102f72e>] local_apic_timer_interrupt+0x50/0x54
Feb 27 16:01:01 server-ramford kernel: [<ffffffff8102fb0b>] smp_apic_timer_interrupt+0x3c/0x4e
Feb 27 16:01:01 server-ramford kernel: [<ffffffff815fdf7d>] apic_timer_interrupt+0x6d/0x80
Feb 27 16:01:01 server-ramford kernel: <EOI> [<ffffffff8114c4bb>] ? __discard_prealloc+0x17/0xb3
Feb 27 16:01:01 server-ramford kernel: [<ffffffff8114c5bb>] reiserfs_discard_all_prealloc+0x44/0x4e
Feb 27 16:01:01 server-ramford kernel: [<ffffffff81168de0>] do_journal_end+0x4e7/0xc78
Feb 27 16:01:01 server-ramford kernel: [<ffffffff81169ad0>] journal_end+0xae/0xb6
Feb 27 16:01:01 server-ramford kernel: [<ffffffff8115a1c9>] reiserfs_dirty_inode+0x6c/0x7c
Feb 27 16:01:01 server-ramford kernel: [<ffffffff810478ac>] ? ns_capable+0x3a/0x4f
Feb 27 16:01:01 server-ramford kernel: [<ffffffff8111611c>] __mark_inode_dirty+0x30/0x1e1
Feb 27 16:01:01 server-ramford kernel: [<ffffffff81155f57>] reiserfs_setattr+0x262/0x297
Feb 27 16:01:01 server-ramford kernel: [<ffffffff810f95ae>] ? __sb_start_write+0x9a/0xce
Feb 27 16:01:01 server-ramford kernel: [<ffffffff81103706>] ? final_putname+0x30/0x34
Feb 27 16:01:01 server-ramford kernel: [<ffffffff8110d343>] notify_change+0x1dc/0x2d0
Feb 27 16:01:01 server-ramford kernel: [<ffffffff8110fecc>] ? __mnt_want_write+0x43/0x4a
Feb 27 16:01:01 server-ramford kernel: [<ffffffff8111a350>] utimes_common+0x114/0x174
Feb 27 16:01:01 server-ramford kernel: [<ffffffff8111a49b>] do_utimes+0xeb/0x125
Feb 27 16:01:01 server-ramford kernel: [<ffffffff8111a624>] SyS_futimesat+0x7f/0x9a
Feb 27 16:01:01 server-ramford kernel: [<ffffffff8111a653>] SyS_utimes+0x14/0x19
Feb 27 16:01:01 server-ramford kernel: [<ffffffff815fd1a9>] system_call_fastpath+0x12/0x17
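A simple way to catch the stall as it happens is to watch the syslog during a mover run; the match strings below are taken straight from the traces in this thread:

# Follow the log and flag the stall signature the moment it appears
tail -f /var/log/syslog | grep --line-buffered -E "rcu_sched self-detected stall|Call Trace"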
SmallwoodDR82 Posted February 27, 2015

@BillyJ I had similar issues, and you just have to fight through it. Continue moving your array drives to XFS; once you're 100% XFS, your issue "should" go away. What I did was stop my mover from running until the conversion was complete. Just a thought. Good luck!
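If you want to find where the mover is scheduled so you can pause it the same way, one place to look is the cron configuration; a sketch only, since the exact location varies by unRAID version and these paths are assumptions to check rather than guaranteed:

# Look for the mover's cron entry
crontab -l | grep -i mover
grep -ri mover /etc/cron.d/ 2>/dev/null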
BillyJ Posted March 5, 2015 (Author)

Well, I've moved 21TB of data to XFS, and after 3 days not one CPU stall.
SmallwoodDR82 Posted March 22, 2015

Added another case of CPU stall to my list above.
BRiT Posted January 20, 2016

For the record, this was an interaction issue between unRAID parity checks and the Linux kernel. It was resolved by setting CONFIG_PREEMPT to enable the preemptible kernel (to address the RCU timeout errors) in unRAID version 6.0-beta15. Here is the announcement thread: http://lime-technology.com/forum/index.php?topic=39343.0

* Preemptible kernel. This should solve the 'RCU Timeout' errors seen by some h/w configurations. Though additional overhead comes with a preemptible kernel, the general "response" will be much smoother, especially within VMs. Some users may experience a slight decrease in parity sync/check times; let us know your results.
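For anyone verifying the fix on their own build, the kernel's preemption setting can usually be read back at runtime; a sketch, assuming the kernel exposes its config via /proc/config.gz (i.e. it was built with CONFIG_IKCONFIG_PROC):

# Check whether the running kernel was built preemptible
zcat /proc/config.gz | grep -E "^CONFIG_PREEMPT"
# Per the beta15 announcement, 6.0-beta15 and later should report CONFIG_PREEMPT=y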