BillyJ Posted February 24, 2015

Hi Guys,

This seems to only occur when I invoke the Mover, and it writes to the log every 3 minutes. I can telnet into the server but the webGUI is unresponsive, and running dockers come and go; Plex, for example, will be browsable but then die somewhere in the 3-minute window. The only choice left is to log in via IPMI and issue a power cycle. Any ideas? It seems to have only happened on Beta 14.

Thanks, Will

EDIT: I had just changed my cache from BTRFS to XFS after upgrading to B14.

Feb 24 19:16:03 server-ramford emhttp: shcmd (10418): /usr/local/sbin/mover |& logger &
Feb 24 19:16:03 server-ramford logger: mover started
Feb 24 19:16:03 server-ramford logger: moving "Music Videos"
Feb 24 19:16:03 server-ramford logger: ./Music Videos/METALLICA - Main feature.mp4
Feb 24 19:17:03 server-ramford kernel: INFO: rcu_sched self-detected stall on CPU { 1} (t=6000 jiffies g=2821570 c=2821569 q=29868)
Feb 24 19:17:03 server-ramford kernel: Task dump for CPU 1:
Feb 24 19:17:03 server-ramford kernel: shfs R running task 0 2081 1 0x00000008
Feb 24 19:17:03 server-ramford kernel: 0000000000000000 ffff88041fc43da8 ffffffff8105e0b5 0000000000000001
Feb 24 19:17:03 server-ramford kernel: 0000000000000001 ffff88041fc43dc8 ffffffff81060780 0000000000000002
Feb 24 19:17:03 server-ramford kernel: ffffffff81834400 ffff88041fc43df8 ffffffff8107845f ffffffff81834400
Feb 24 19:17:03 server-ramford kernel: Call Trace:
Feb 24 19:17:03 server-ramford kernel: <IRQ> [<ffffffff8105e0b5>] sched_show_task+0xbe/0xc3
Feb 24 19:17:03 server-ramford kernel: [<ffffffff81060780>] dump_cpu_task+0x35/0x39
Feb 24 19:17:03 server-ramford kernel: [<ffffffff8107845f>] rcu_dump_cpu_stacks+0x6a/0x8c
Feb 24 19:17:03 server-ramford kernel: [<ffffffff8107acb5>] rcu_check_callbacks+0x1db/0x4f9
Feb 24 19:17:03 server-ramford kernel: [<ffffffff81088601>] ? tick_sched_handle+0x34/0x34
Feb 24 19:17:03 server-ramford kernel: [<ffffffff8107ca53>] update_process_times+0x3a/0x64
Feb 24 19:17:03 server-ramford kernel: [<ffffffff810885ff>] tick_sched_handle+0x32/0x34
Feb 24 19:17:03 server-ramford kernel: [<ffffffff81088638>] tick_sched_timer+0x37/0x61
Feb 24 19:17:03 server-ramford kernel: [<ffffffff8107cf9b>] __run_hrtimer.isra.29+0x57/0xb0
Feb 24 19:17:03 server-ramford kernel: [<ffffffff8107d48a>] hrtimer_interrupt+0xd9/0x1c0
Feb 24 19:17:03 server-ramford kernel: [<ffffffff8102f72e>] local_apic_timer_interrupt+0x50/0x54
Feb 24 19:17:03 server-ramford kernel: [<ffffffff8102fb0b>] smp_apic_timer_interrupt+0x3c/0x4e
Feb 24 19:17:03 server-ramford kernel: [<ffffffff815fdf7d>] apic_timer_interrupt+0x6d/0x80
Feb 24 19:17:03 server-ramford kernel: <EOI> [<ffffffff8114c543>] ? __discard_prealloc+0x9f/0xb3
Feb 24 19:17:03 server-ramford kernel: [<ffffffff8114c5bb>] reiserfs_discard_all_prealloc+0x44/0x4e
Feb 24 19:17:03 server-ramford kernel: [<ffffffff81168de0>] do_journal_end+0x4e7/0xc78
Feb 24 19:17:03 server-ramford kernel: [<ffffffff81169ad0>] journal_end+0xae/0xb6
Feb 24 19:17:03 server-ramford kernel: [<ffffffff8115a1c9>] reiserfs_dirty_inode+0x6c/0x7c
Feb 24 19:17:03 server-ramford kernel: [<ffffffff810478ac>] ? ns_capable+0x3a/0x4f
Feb 24 19:17:03 server-ramford kernel: [<ffffffff8111611c>] __mark_inode_dirty+0x30/0x1e1
Feb 24 19:17:03 server-ramford kernel: [<ffffffff81155f57>] reiserfs_setattr+0x262/0x297
Feb 24 19:17:03 server-ramford kernel: [<ffffffff810f95ae>] ? __sb_start_write+0x9a/0xce
Feb 24 19:17:03 server-ramford kernel: [<ffffffff81103706>] ? final_putname+0x30/0x34
Feb 24 19:17:03 server-ramford kernel: [<ffffffff8110d343>] notify_change+0x1dc/0x2d0
Feb 24 19:17:03 server-ramford kernel: [<ffffffff8110fecc>] ? __mnt_want_write+0x43/0x4a
Feb 24 19:17:03 server-ramford kernel: [<ffffffff8111a350>] utimes_common+0x114/0x174
Feb 24 19:17:03 server-ramford kernel: [<ffffffff8111a49b>] do_utimes+0xeb/0x125
Feb 24 19:17:03 server-ramford kernel: [<ffffffff8111a624>] SyS_futimesat+0x7f/0x9a
Feb 24 19:17:03 server-ramford kernel: [<ffffffff8111a653>] SyS_utimes+0x14/0x19
Feb 24 19:17:03 server-ramford kernel: [<ffffffff815fd1a9>] system_call_fastpath+0x12/0x17
Feb 24 19:20:03 server-ramford kernel: INFO: rcu_sched self-detected stall on CPU { 1} (t=24003 jiffies g=2821570 c=2821569 q=112021)
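For anyone hitting the same hang: since the box stays reachable over telnet, the syslog can usually be saved off before the forced power cycle wipes it. A minimal sketch, assuming the stock unRAID layout where /var/log/syslog is the live log and /boot is the flash drive:

# Save the in-memory syslog to flash so it survives the reboot
cp /var/log/syslog /boot/syslog-$(date +%Y%m%d-%H%M%S).txt

# Confirm which task the kernel is blaming for the stall
grep -A 2 "rcu_sched self-detected stall" /var/log/syslog | tail -n 20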
jonp Posted February 24, 2015

[quoting BillyJ's original post and stack trace above]

Please post a complete syslog for review.
BillyJ Posted February 24, 2015 (Author)

http://lime-technology.com/forum/index.php?topic=37848.msg350080#msg350080

This seems similar, but I don't use AFP or Time Machine.

Thanks jonp, syslog attached.

syslog_copy.zip
BRiT Posted February 24, 2015

The workaround everyone who has encountered this has used is to switch and convert all their array drives from ReiserFS (RFS) to XFS. There's definitely some negative interaction between ReiserFS and SHFS going on. If you search the forums you can find roughly one new account of this per week. It's a long process to convert off RFS, but the good news is that once you do, you shouldn't see this issue again. Smallwood82 has been stable for well over 40 days since his conversion; beforehand his system couldn't last more than a day or two.
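Roughly, each conversion pass looks like the sketch below. The disk numbers are only examples, and it assumes you have one empty XFS-formatted disk in the array to copy into:

# Copy one ReiserFS disk onto the empty XFS disk, preserving
# permissions, timestamps, and extended attributes
rsync -avPX /mnt/disk1/ /mnt/disk2/

# Dry-run checksum comparison to verify the copy before touching disk1
rsync -navc /mnt/disk1/ /mnt/disk2/

# Then stop the array, set disk1's filesystem to XFS in the webGUI,
# and start the array to format it; disk1 becomes the empty target
# for the next pass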
jonp Posted February 24, 2015

If it were reiserfs / shfs, why wouldn't everyone, or at least far more people, be reporting this? The majority of users still have reiser disks. I'm not convinced this is a reiserfs / shfs issue just yet, but I will be pulling together all the logs posted on CPU stalls to find out more.
BillyJ Posted February 25, 2015 (Author)

I've hit another CPU stall, this time while running the 14a update via the webGUI. It seemed to be stuck on "sync" in the list of processes the upgrade went through. A power cycle was the only thing that brought it back. Latest syslog attached.

syslog.txt
BRiT Posted February 25, 2015

[quoting jonp's reply above]

From what I recall, all of the kernel dumps show reiserfs and shfs in the stack trace. This is the first one I can recall being posted where it didn't have shfs listed as the other process.
BillyJ Posted February 25, 2015 (Author)

I now have data loss of items that had been on the array, not just the cache disk. The syslog indicates the files that are missing had in fact been moved to the array at 1am. Should I format my cache back to BTRFS? The problem never occurred with that FS.
BRiT Posted February 25, 2015

[quoting BillyJ's post above]

It doesn't occur on a fully XFS setup either, with both the cache and the data drives converted.
SmallwoodDR82 Posted February 25, 2015

[quoting jonp's reply above]

jonp, of the users who have moved to 6.0, here are the people who have reported it, with syslogs. It may not be a "ton", but I would guess it's a decent percentage of 6.0/forum users. Keep in mind it took me over a month just to finally snag a syslog and figure out what was going on. I agree the majority of users are on RFS, and I would also say the majority of users are still on 5.x. With that said, once the wave moves to 6.0, be ready for many more of these. I would not ignore this issue.

http://lime-technology.com/forum/index.php?topic=38370.msg356079#msg356079
http://lime-technology.com/forum/index.php?topic=38119.msg353003#msg353003
http://lime-technology.com/forum/index.php?topic=37311.msg344863#msg344863
http://lime-technology.com/forum/index.php?topic=38108.msg352782#msg352782
http://lime-technology.com/forum/index.php?topic=38019.msg351938#msg351938
http://lime-technology.com/forum/index.php?topic=37864.msg350200#msg350200
http://lime-technology.com/forum/index.php?topic=37316.msg344859#msg344859
http://lime-technology.com/forum/index.php?topic=37185.msg343742#msg343742
http://lime-technology.com/forum/index.php?topic=35788.msg339577#msg339577
http://lime-technology.com/forum/index.php?topic=37848.msg350080#msg350080
http://lime-technology.com/forum/index.php?topic=38409.msg356875#msg356875
http://lime-technology.com/forum/index.php?topic=38863.0

Edit: 2/26/2015 @ 3:42PM PST - Added 1 more case.
Edit: 3/21/2015 @ 8:22PM PST - Added 1 more case.

I hope this helps, and I hope this does not result in more data loss. Good luck!
jonp Posted February 26, 2015

[quoting SmallwoodDR82's post and list of cases above]

Definitely not ignoring it; I was just saying I'm not necessarily convinced the root cause is shfs and reiserfs. Trust me, this is DEFINITELY a concern of mine, but I need to find a way to replicate it. I've tested with reiser disks on a few different systems now, but haven't been able to cause it. Once I can recreate the error consistently, we are golden. I need to compile a spreadsheet from these posts with as much information as I can; I will be doing this later tonight. Finding the correlation is key.
SmallwoodDR82 Posted February 26, 2015

I will continue to add links to my original post; hopefully that will help. Also, FYI: since moving all my array drives to XFS I'm 50-plus days without a crash.
limetech Posted February 26, 2015

[quoting BillyJ's original post above]

Please start the array in 'maintenance' mode and click 'Check Filesystem' for each device. Does that turn up any corruptions?
BillyJ Posted February 26, 2015 (Author)

reiserfsck -yq /dev/md1 2>&1
reiserfsck 3.6.24

Will read-only check consistency of the filesystem on /dev/md1
Will put log info to 'stdout'

###########
reiserfsck --check started at Thu Feb 26 13:07:15 2015
###########
Replaying journal: Done.
Reiserfs journal '/dev/md1' in blocks [18..8211]: 0 transactions replayed

Is this the end? Or should I be waiting for more? The webGUI is frozen on multiple machines; I logged in via telnet and can see the reiserfsck process working away... I'll wait.
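For reference, the same check can be run by hand over telnet while the webGUI is frozen; a sketch, assuming the array is started in maintenance mode so /dev/md1 exists but isn't mounted:

# Read-only consistency check, with output saved to flash
reiserfsck --check /dev/md1 2>&1 | tee /boot/reiserfsck-md1.log

A clean run should finish with a "No corruptions found" summary; on a large volume the check can sit for a long time after the journal replay with no intermediate output, so a quiet terminal is normal.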
ljm42 Posted February 26, 2015

[quoting SmallwoodDR82: "I will continue to add links to my original post. Hopefully that will help."]

I have a shorter list going over here: http://lime-technology.com/forum/index.php?topic=37767.msg354452#msg354452 ... you found quite a few more than I did.

[quoting jonp: "Definitely not ignoring it; I was just saying I'm not necessarily convinced the root cause is shfs and reiserfs."]

I manage two very similar systems: mine is based on the ASRock E3C226D2I and my sister's is based on the E3C224D2I. Mine did not have the crashing problems on later versions of 6, but hers did. Her system stabilized once I converted it from RFS to XFS. These are the main differences between our systems:

- Hers is in a small business environment and was probably worked a bit harder, with more simultaneous writes. It crashed a few times during the day under heavy use, but it also crashed at night when nobody was using it.
- Hers has AFP enabled, as they have a Mac in the office; mine does not.

Because her system is remote, I wasn't able to do a lot of troubleshooting. I did convert the cache drive from BTRFS to XFS remotely, but that didn't help. So I drove up there with a pre-cleared drive and started moving data; thankfully, once all the data drives were on XFS the problems went away.

Can you tell from the logs people provided whether AFP and/or heavy use is a common theme?
SmallwoodDR82 Posted February 26, 2015

Mine gets heavy use during the day and light to no use at night. I do not have AFP enabled.
BillyJ Posted February 27, 2015 (Author)

This is starting to tick me off a bit; so far I've moved 2 of my drives to XFS. The Mover would have been moving the file to the new XFS disk. Should I put my cache back to BTRFS? It's currently XFS, but I have seen Tom suggest in another post that the recommended v6 setup is a BTRFS cache and an XFS array.

Feb 27 15:07:38 server-ramford login[763]: ROOT LOGIN on '/dev/pts/5' from 'Medias-Mac-mini.willyweb.com.au'
Feb 27 16:00:01 server-ramford logger: mover started
Feb 27 16:00:01 server-ramford logger: moving "TV"
Feb 27 16:00:01 server-ramford logger: ./TV/Vikings/Season 3/Vikings - S03E02 - The Wanderer HDTV-720p.mkv
Feb 27 16:01:01 server-ramford kernel: INFO: rcu_sched self-detected stall on CPU { 7} (t=6000 jiffies g=431630 c=431629 q=110233)
Feb 27 16:01:01 server-ramford kernel: Task dump for CPU 7:
Feb 27 16:01:01 server-ramford kernel: shfs R running task 0 4528 1 0x00000008
Feb 27 16:01:01 server-ramford kernel: 0000000000000000 ffff88041fdc3da8 ffffffff8105e0b5 0000000000000007
Feb 27 16:01:01 server-ramford kernel: 0000000000000007 ffff88041fdc3dc8 ffffffff81060780 0000000000000080
Feb 27 16:01:01 server-ramford kernel: ffffffff81834400 ffff88041fdc3df8 ffffffff8107845f ffffffff81834400
Feb 27 16:01:01 server-ramford kernel: Call Trace:
Feb 27 16:01:01 server-ramford kernel: <IRQ> [<ffffffff8105e0b5>] sched_show_task+0xbe/0xc3
Feb 27 16:01:01 server-ramford kernel: [<ffffffff81060780>] dump_cpu_task+0x35/0x39
Feb 27 16:01:01 server-ramford kernel: [<ffffffff8107845f>] rcu_dump_cpu_stacks+0x6a/0x8c
Feb 27 16:01:01 server-ramford kernel: [<ffffffff8107acb5>] rcu_check_callbacks+0x1db/0x4f9
Feb 27 16:01:01 server-ramford kernel: [<ffffffff81088601>] ? tick_sched_handle+0x34/0x34
Feb 27 16:01:01 server-ramford kernel: [<ffffffff8107ca53>] update_process_times+0x3a/0x64
Feb 27 16:01:01 server-ramford kernel: [<ffffffff810885ff>] tick_sched_handle+0x32/0x34
Feb 27 16:01:01 server-ramford kernel: [<ffffffff81088638>] tick_sched_timer+0x37/0x61
Feb 27 16:01:01 server-ramford kernel: [<ffffffff8107cf9b>] __run_hrtimer.isra.29+0x57/0xb0
Feb 27 16:01:01 server-ramford kernel: [<ffffffff8107d48a>] hrtimer_interrupt+0xd9/0x1c0
Feb 27 16:01:01 server-ramford kernel: [<ffffffff8102f72e>] local_apic_timer_interrupt+0x50/0x54
Feb 27 16:01:01 server-ramford kernel: [<ffffffff8102fb0b>] smp_apic_timer_interrupt+0x3c/0x4e
Feb 27 16:01:01 server-ramford kernel: [<ffffffff815fdf7d>] apic_timer_interrupt+0x6d/0x80
Feb 27 16:01:01 server-ramford kernel: <EOI> [<ffffffff8114c4bb>] ? __discard_prealloc+0x17/0xb3
Feb 27 16:01:01 server-ramford kernel: [<ffffffff8114c5bb>] reiserfs_discard_all_prealloc+0x44/0x4e
Feb 27 16:01:01 server-ramford kernel: [<ffffffff81168de0>] do_journal_end+0x4e7/0xc78
Feb 27 16:01:01 server-ramford kernel: [<ffffffff81169ad0>] journal_end+0xae/0xb6
Feb 27 16:01:01 server-ramford kernel: [<ffffffff8115a1c9>] reiserfs_dirty_inode+0x6c/0x7c
Feb 27 16:01:01 server-ramford kernel: [<ffffffff810478ac>] ? ns_capable+0x3a/0x4f
Feb 27 16:01:01 server-ramford kernel: [<ffffffff8111611c>] __mark_inode_dirty+0x30/0x1e1
Feb 27 16:01:01 server-ramford kernel: [<ffffffff81155f57>] reiserfs_setattr+0x262/0x297
Feb 27 16:01:01 server-ramford kernel: [<ffffffff810f95ae>] ? __sb_start_write+0x9a/0xce
Feb 27 16:01:01 server-ramford kernel: [<ffffffff81103706>] ? final_putname+0x30/0x34
Feb 27 16:01:01 server-ramford kernel: [<ffffffff8110d343>] notify_change+0x1dc/0x2d0
Feb 27 16:01:01 server-ramford kernel: [<ffffffff8110fecc>] ? __mnt_want_write+0x43/0x4a
Feb 27 16:01:01 server-ramford kernel: [<ffffffff8111a350>] utimes_common+0x114/0x174
Feb 27 16:01:01 server-ramford kernel: [<ffffffff8111a49b>] do_utimes+0xeb/0x125
Feb 27 16:01:01 server-ramford kernel: [<ffffffff8111a624>] SyS_futimesat+0x7f/0x9a
Feb 27 16:01:01 server-ramford kernel: [<ffffffff8111a653>] SyS_utimes+0x14/0x19
Feb 27 16:01:01 server-ramford kernel: [<ffffffff815fd1a9>] system_call_fastpath+0x12/0x17
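A simple way to catch the stall as it happens is to watch the syslog during a mover run; the match strings below are taken straight from the traces in this thread:

# Follow the log and flag the stall signature the moment it appears
tail -f /var/log/syslog | grep --line-buffered -E "rcu_sched self-detected stall|Call Trace"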
SmallwoodDR82 Posted February 27, 2015

@BillyJ I had similar issues, and you just have to fight through it. Continue moving your array drives to XFS; once you're 100% XFS, your issue "should" go away. What I did was stop my mover from running until the conversion was complete. Just a thought. Good luck!
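If you want to find where the mover is scheduled so you can pause it the same way, one place to look is the cron configuration; a sketch only, since the exact location varies by unRAID version and these paths are assumptions to check rather than guaranteed:

# Look for the mover's cron entry
crontab -l | grep -i mover
grep -ri mover /etc/cron.d/ 2>/dev/null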
BillyJ Posted March 5, 2015 (Author)

Well, I've moved 21TB of data to XFS, and after 3 days not one CPU stall.
SmallwoodDR82 Posted March 22, 2015

Added another case of CPU stall to my list above.
BRiT Posted January 20, 2016

For the record, this was an interaction issue between unRAID parity checks and the Linux kernel. It was resolved by setting CONFIG_PREEMPT to enable the preemptible kernel (to address the RCU timeout errors) in unRAID version 6.0-beta15. Here is the announcement thread: http://lime-technology.com/forum/index.php?topic=39343.0

* Preemptible kernel. This should solve the 'RCU Timeout' errors seen by some h/w configurations. Though additional overhead comes with a preemptible kernel, the general "response" will be much smoother, especially within VMs. Some users may experience a slight decrease in parity sync/check times; let us know your results.
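For anyone verifying the fix on their own build, the kernel's preemption setting can usually be read back at runtime; a sketch, assuming the kernel exposes its config via /proc/config.gz (i.e. it was built with CONFIG_IKCONFIG_PROC):

# Check whether the running kernel was built preemptible
zcat /proc/config.gz | grep -E "^CONFIG_PREEMPT"
# Per the beta15 announcement, 6.0-beta15 and later should report CONFIG_PREEMPT=y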