June 6, 2012

Edit: NOTE, I got these errors in previous builds, so this may not belong here.

Hi guys, got an interesting one for you:

As soon as I start a parity check I get a warning. Every single time after I have just booted. If I stop and start it again, you can clearly see it is fine.

I've tested the hardware using IntelBurnTest in Windows over 25 loops using 7GB RAM. The system is stable. Just to double check, I reduced all the frequencies and upped all the voltages, and the log is the result of those changes, i.e. not a bloody thing makes the error go away. So now I am stumped as to what it is.

No plugins running apart from UnMenu. I use the conservative module for power saving. I will disable that once the check has completed. (I suspect I will find no errors; I always have done, even with these errors coming up.)

Ideas/thoughts?

Jun 6 18:20:43 Tower kernel: mdcmd (22): check CORRECT
Jun 6 18:20:43 Tower kernel: md: recovery thread woken up ...
Jun 6 18:20:43 Tower kernel: md: recovery thread checking parity...
Jun 6 18:20:43 Tower kernel: md: using 10000k window, over a total of 1953514552 blocks.
Jun 6 18:20:43 Tower kernel: ------------[ cut here ]------------
Jun 6 18:20:43 Tower kernel: WARNING: at kernel/workqueue.c:1224 worker_enter_idle+0xf8/0x104()
Jun 6 18:20:43 Tower kernel: Hardware name: System Product Name
Jun 6 18:20:43 Tower kernel: Modules linked in: cpufreq_conservative powernow_k8 mperf md_mod xor sg ahci libahci asus_atk0110 k10temp hwmon atiixp r8169
Jun 6 18:20:43 Tower kernel: Pid: 578, comm: kworker/1:1 Not tainted 3.0.33-unRAID #1
Jun 6 18:20:43 Tower kernel: Call Trace:
Jun 6 18:20:43 Tower kernel: [<c1028c68>] warn_slowpath_common+0x65/0x7a
Jun 6 18:20:43 Tower kernel: [<c10375f4>] ? worker_enter_idle+0xf8/0x104
Jun 6 18:20:43 Tower kernel: [<c1028c8c>] warn_slowpath_null+0xf/0x13
Jun 6 18:20:43 Tower kernel: [<c10375f4>] worker_enter_idle+0xf8/0x104
Jun 6 18:20:43 Tower kernel: [<c1039a08>] worker_thread+0x2a6/0x2c2
Jun 6 18:20:43 Tower kernel: [<c1039762>] ? rescuer_thread+0x1dc/0x1dc
Jun 6 18:20:43 Tower kernel: [<c103c075>] kthread+0x62/0x67
Jun 6 18:20:43 Tower kernel: [<c103c013>] ? kthread_worker_fn+0x10a/0x10a
Jun 6 18:20:43 Tower kernel: [<c1310176>] kernel_thread_helper+0x6/0xd
Jun 6 18:20:43 Tower kernel: ---[ end trace b34d33af29ac3106 ]---
Jun 6 18:20:56 Tower kernel: mdcmd (23): nocheck
Jun 6 18:20:56 Tower kernel: md: md_do_sync: got signal, exit...
Jun 6 18:20:56 Tower kernel: md: recovery thread sync completion status: -4
Jun 6 18:21:05 Tower kernel: mdcmd (24): check CORRECT
Jun 6 18:21:05 Tower kernel: md: recovery thread woken up ...
Jun 6 18:21:05 Tower kernel: md: recovery thread checking parity...
Jun 6 18:21:05 Tower kernel: md: using 10000k window, over a total of 1953514552 blocks.
Jun 6 18:21:23 Tower logger: Wed Jun 6 18:19:22 BST 2012 - Hard Drives active, resetting counter
Jun 6 18:21:39 Tower kernel: mdcmd (25): nocheck
Jun 6 18:21:39 Tower kernel: md: md_do_sync: got signal, exit...
Jun 6 18:21:39 Tower kernel: md: recovery thread sync completion status: -4
Jun 6 18:22:01 Tower in.telnetd[7432]: connect from 192.168.1.5 (192.168.1.5)
Jun 6 18:22:01 Tower login[7433]: invalid password for 'username' on '/dev/pts/0' from 'Mac-Pro'
Jun 6 18:22:02 Tower login[7433]: ROOT LOGIN on '/dev/pts/0' from 'Mac-Pro'
Jun 6 18:22:23 Tower logger: Wed Jun 6 18:19:22 BST 2012 - Hard Drives active, resetting counter
Jun 6 18:22:37 Tower in.telnetd[7631]: connect from 192.168.1.5 (192.168.1.5)
Jun 6 18:22:38 Tower login[7632]: invalid password for 'username' on '/dev/pts/1' from 'Mac-Pro'
Jun 6 18:22:39 Tower login[7632]: ROOT LOGIN on '/dev/pts/1' from 'Mac-Pro'
Jun 6 18:23:14 Tower kernel: mdcmd (26): check CORRECT
Jun 6 18:23:14 Tower kernel: md: recovery thread woken up ...
Jun 6 18:23:14 Tower kernel: md: recovery thread checking parity...
Jun 6 18:23:14 Tower kernel: md: using 10000k window, over a total of 1953514552 blocks.
Jun 6 18:23:23 Tower logger: Wed Jun 6 18:19:22 BST 2012 - Hard Drives active, resetting counter
Jun 6 18:23:40 Tower kernel: mdcmd (27): nocheck
Jun 6 18:23:40 Tower kernel: md: md_do_sync: got signal, exit...
Jun 6 18:23:40 Tower kernel: md: recovery thread sync completion status: -4
Jun 6 18:23:52 Tower kernel: mdcmd (28): check CORRECT
Jun 6 18:23:52 Tower kernel: md: recovery thread woken up ...
Jun 6 18:23:52 Tower kernel: md: recovery thread checking parity...
Jun 6 18:23:52 Tower kernel: md: using 10000k window, over a total of 1953514552 blocks.

syslog-2012-06-06.txt
June 6, 2012

Here's the code generating this warning:

    /*
     * Sanity check nr_running. Because trustee releases gcwq->lock
     * between setting %WORKER_ROGUE and zapping nr_running, the
     * warning may trigger spuriously. Check iff trustee is idle.
     */
    WARN_ON_ONCE(gcwq->trustee_state == TRUSTEE_DONE &&
                 gcwq->nr_workers == gcwq->nr_idle &&
                 atomic_read(get_gcwq_nr_running(gcwq->cpu)));

I have no clue. I changed the subject of your message; let's see if anyone else gets this message.
June 6, 2012 (Author)

Thanks. It does say there that it may be triggered spuriously; maybe my system has the right config for that to happen? I'll do more testing when I have time. I have a hunch it is to do with the power-saving and frequency changes; however, that doesn't explain why it is OK after it has been run once. *shrug*
June 6, 2012

"Thanks. It does say there that it may be triggered spuriously; maybe my system has the right config for that to happen? I'll do more testing when I have time. I have a hunch it is to do with the power-saving and frequency changes; however, that doesn't explain why it is OK after it has been run once. *shrug*"

It is only reported once; it may be happening a lot, and whether that's a problem or not, I don't know.
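Because `WARN_ON_ONCE` logs only the first time its condition is true after boot, a syslog holds at most one trace from this check per boot, however often the condition recurs. A minimal sketch for pulling such warning blocks out of a saved syslog is below. The function names are made up for illustration, and the only assumption about the log is the "cut here" / "end trace" framing visible in the first post.

```shell
# Print every kernel warning block (from "cut here" to "end trace") in a syslog.
# The syslog path is passed as the first argument.
show_kernel_warnings() {
    awk '/\[ cut here \]/ { inblock = 1 }
         inblock          { print }
         /\[ end trace /  { inblock = 0 }' "$1"
}

# Count the WARNING lines in a syslog; prints 0 when there are none.
count_kernel_warnings() {
    grep -c 'kernel: WARNING: at ' "$1" || true
}
```

Run against the attachment from the first post, this would be `show_kernel_warnings syslog-2012-06-06.txt`. A count of 1 after several stop/start cycles is what the once-only behaviour predicts; it does not show that the condition occurred only once.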
June 7, 2012

I examined your syslog and compared it with one from a month ago, and noted a number of things that seem unique to your system. Perhaps you have forgotten you are loading them, or in some of the cases you would not know they were being loaded.

Note: I keep a lot of syslogs around just for research and comparison, to isolate issues with different hardware and software versions. If anyone does not want me to keep their syslog, I will immediately delete it, but I find it very helpful in researching new issues.

* Java jre6u27 is loading, could not find anyone else loading jre6; there are a few loading jre1.5.0_15 in connection with something called watchdog-webAccess in vmware
* your system is loading asus_atk0110, could not find anyone else using it
* you are loading cpufreq and powernow, could not find anyone else using it (I'm sure there are, but I don't have their syslogs, so that makes it at least somewhat rare)
* your machine is the one that has the Realtek chipset that appears to crash and spew "eth0: link up" messages; I think you and only one other had that issue; it now loads the Realtek firmware patch - hopefully that fixed that issue?
* not unique to you, but still relatively uncommon, you have turned on much of the Apple support; just mentioning it because the majority of users do not use it, and it does start a number of services that they don't run
* as you mentioned, UnMENU is loading (not unique to you of course)

For testing purposes, it might help Tom to turn off Java, cpufreq and powernow, UnMENU, and the Apple support, test, then turn them back on one by one, testing again after each. I have no idea what the asus thing is, or how to remove it. If you need Java, you might look for an updated version, currently jre6u32.

One last comment: the workqueue problem occurs after the parity check starts, as you mentioned. In the current rc4 syslog, it occurs immediately (in less than a second) after the check starts. In the rc2 syslog of a month ago, it occurs exactly 5 minutes and 0 seconds after the check starts, which is a time that seems more than a little coincidental. I cannot help wondering if there is a process that runs precisely 5 minutes after the check starts.

Edit: I did not find anyone else with the workqueue issue.
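The gap between the start of the check and the warning can be computed from a syslog rather than read off by eye. The sketch below assumes the line formats seen in the log from the first post ("mdcmd (N): check" and "kernel: WARNING: at"), and that both events fall on the same day. The function name `warning_delay` is hypothetical.

```shell
# Print the number of seconds between the first parity-check command and the
# first kernel WARNING that follows it. Dates are ignored, so both events
# must fall on the same day (an assumption of this sketch).
warning_delay() {
    awk '
        # Convert the HH:MM:SS field of a syslog line to seconds since midnight.
        function secs(t,  p) { split(t, p, ":"); return p[1] * 3600 + p[2] * 60 + p[3] }
        /mdcmd \([0-9]+\): check/ && !start_seen { start = secs($3); start_seen = 1 }
        /kernel: WARNING: at / && start_seen     { print secs($3) - start; exit }
    ' "$1"
}
```

For the rc4 log posted above, both lines are stamped 18:20:43, so `warning_delay syslog-2012-06-06.txt` should print 0. A log with the 5-minute gap described for rc2 would print 300.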
June 7, 2012

"* you are loading cpufreq and powernow, could not find anyone else using it (I'm sure there are, but I don't have their syslogs, so that makes it at least somewhat rare)"

I deliberately load acpi_cpufreq, with a modprobe in my go script, and that calls for mperf ... I'm not sure whether this is the same thing.
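One way to compare two setups like these is to read what each CPU reports under sysfs: `scaling_driver` names the frequency-scaling driver in use and `scaling_governor` names the active governor. The sketch below prints both per CPU. The function name and the optional root argument are hypothetical, the latter added only so the function can be tried against a scratch directory.

```shell
# Print "<cpu> <driver> <governor>" for every CPU that exposes cpufreq.
# The optional first argument overrides the sysfs root (useful for testing).
show_cpufreq() {
    root="${1:-/sys/devices/system/cpu}"
    for dir in "$root"/cpu[0-9]*/cpufreq; do
        # Skip CPUs (or an unmatched glob) without cpufreq support.
        [ -r "$dir/scaling_driver" ] || continue
        cpu=$(basename "$(dirname "$dir")")
        echo "$cpu $(cat "$dir/scaling_driver") $(cat "$dir/scaling_governor")"
    done
}
```

Calling `show_cpufreq` with no argument on each server would show whether the two machines end up with the same driver and governor, even though they load different modules.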
June 7, 2012

PeterB- You get a chance to test out NFS yet & see if stale file handle issue is solved?
June 7, 2012

"PeterB- You get a chance to test out NFS yet & see if stale file handle issue is solved?"

Hi Tom ... the stale handle problem has never been reproducible at will. All I can say is that I've not experienced a stale handle with RC4 ...... yet!
June 7, 2012

"PeterB- You get a chance to test out NFS yet & see if stale file handle issue is solved?"

"Hi Tom ... the stale handle problem has never been reproducible at will. All I can say is that I've not experienced a stale handle with RC4 ...... yet!"

Ok, I don't mean to divert this thread, so if you find anything, use one of the NFS issue threads please.
June 7, 2012 (Author)

"Java jre6u27 is loading, could not find anyone else loading jre6; there are a few loading jre1.5.0_15 in connection with something called watchdog-webAccess in vmware"

Java was disabled in the syslog posted above?

"Your system is loading asus_atk0110, could not find anyone else using it"

No idea about that one. I don't have anything to do with that. My mobo is an Asus M4A78LT-M, however.

"You are loading cpufreq and powernow, could not find anyone else using it (I'm sure there are, but I don't have their syslogs, so that makes it at least somewhat rare)"

It is my next test to disable all my power saving features.

"Your machine is the one that has the Realtek chipset that appears to crash and spew 'eth0: link up' messages; I think you and only one other had that issue; it now loads the Realtek firmware patch - hopefully that fixed that issue?"

Nope, that issue isn't fixed. I haven't had time to test the NIC properly, but when I did a quick test a while back I managed 70MB/sec to an HD via AFP to SMB, which is faster than the 50MB/sec I get when running to the server's cache disk via AFP. It never used to come up in b8 or something (can't remember), but once I have confirmed I can hammer the NIC at 100+MB/sec with no issues, then I know it is a Linux driver thing.

"Not unique to you, but still relatively uncommon, you have turned on much of the Apple support; just mentioning it because the majority of users do not use it, and it does start a number of services that they don't run"

*shrug* I will test with AFP disabled.

"As you mentioned, UnMENU is loading (not unique to you of course)"

I kept that running just due to the ease with which I can get the full syslog. I'll disable that as a last test.
June 7, 2012 (Author)

I *think* I've found the issue.

Edit: I tell a lie. It is to do with the conservative governor and power saving. If I disable that, it is fine.
June 7, 2012

"Java jre6u27 is loading, could not find anyone else loading jre6; there are a few loading jre1.5.0_15 in connection with something called watchdog-webAccess in vmware"

"Java was disabled in the syslog posted above?"

I apologize, this is what appeared in your syslog:

Jun 6 18:19:14 Tower logger: Verifying package jre-6u27-i586-1.txz.
Jun 6 18:19:19 Tower logger: Installing package jre-6u27-i586-1.txz:

The actual details of its installation were not logged, but I assumed (wrongly?) that that was suppressed, running in a non-verbose mode. Guess I need to learn more about what the package installation options are. (And perhaps it should not say it is installing!)
June 7, 2012 (Author)

"Java jre6u27 is loading, could not find anyone else loading jre6; there are a few loading jre1.5.0_15 in connection with something called watchdog-webAccess in vmware"

"Java was disabled in the syslog posted above?"

"I apologize, this is what appeared in your syslog:

Jun 6 18:19:14 Tower logger: Verifying package jre-6u27-i586-1.txz.
Jun 6 18:19:19 Tower logger: Installing package jre-6u27-i586-1.txz:

The actual details of its installation were not logged, but I assumed (wrongly?) that that was suppressed, running in a non-verbose mode. Guess I need to learn more about what the package installation options are. (And perhaps it should not say it is installing!)"

*shrug*. It is disabled now, that is for sure. I deleted it.

Anyway, I think I've found the issue. It is to do with the power states switching.
June 7, 2012 (Author)

Yep. Found the cause:

GO Script:

modprobe powernow-k8
modprobe cpufreq_conservative
echo conservative > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo conservative > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
echo conservative > /sys/devices/system/cpu/cpu2/cpufreq/scaling_governor
echo conservative > /sys/devices/system/cpu/cpu3/cpufreq/scaling_governor

Causes the issue. No idea why.
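The four per-CPU echo lines can be collapsed into a loop, which also makes it easy to switch governors while testing whether the warning follows the governor. This is only a sketch: `set_governor` and its optional second argument (a sysfs root, there so the function can be dry-run against a scratch directory) are hypothetical names, not part of unRAID.

```shell
# Write the given cpufreq governor to every CPU found under the sysfs root.
# Usage: set_governor <governor> [sysfs-root]
set_governor() {
    governor="$1"
    root="${2:-/sys/devices/system/cpu}"
    for file in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        # Skip CPUs (or an unmatched glob) that have no writable governor file.
        [ -w "$file" ] || continue
        echo "$governor" > "$file"
    done
}
```

In the go script this would follow the two modprobe lines as `set_governor conservative`. To test without a reboot, something like `set_governor performance` could be run from a telnet session, assuming that governor is available in this kernel.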
June 7, 2012

"Yep. Found the cause."

Please rename the thread to (solved). I'm starting to feel like a broken record, but please in future disable all non-standard scripts/plugins etc. before posting in the RC thread; otherwise just post as is in general support.
June 7, 2012 (Author)

"Yep. Found the cause."

"Please rename the thread to (solved). I'm starting to feel like a broken record, but please in future disable all non-standard scripts/plugins etc. before posting in the RC thread; otherwise just post as is in general support."

Technically, it is not non-standard. The loaded modules are in the kernel. Given it doesn't seem to affect the server negatively, I'm going to ignore it. A solution, however, is still welcome. Perhaps this would be better suited to General?
June 7, 2012

Any changes to the go file result in a non-standard configuration that is not supported by Lime. Please only post problems with stock unRAID distributions in the RC forum. Issues with non-standard configurations should be posted in the User Customizations forum or General. Perhaps a comment in the go file that says "DO NOT EDIT THIS FILE" would be helpful.
June 7, 2012

"Any changes to the go file result in a non-standard configuration that is not supported by Lime. Please only post problems with stock unRAID distributions in the RC forum. Issues with non-standard configurations should be posted in the User Customizations forum or General. Perhaps a comment in the go file that says 'DO NOT EDIT THIS FILE' would be helpful."

It is intended to be modified... has been from the very beginning... In fact, to deal with users with various editors, the config/go file is processed through fromdos before invoking a copy of it... just in case a user used an MS-DOS editor, the DOS line endings are stripped out for them. There would be absolutely no reason to do that processing if it was never to be edited.

So, even though there is no warning, it should not say "do not edit".

Joe L.
June 8, 2012 (Author)

"Any changes to the go file result in a non-standard configuration that is not supported by Lime. Please only post problems with stock unRAID distributions in the RC forum. Issues with non-standard configurations should be posted in the User Customizations forum or General. Perhaps a comment in the go file that says 'DO NOT EDIT THIS FILE' would be helpful."

Sorry, but that is a daft thing to say. The standard kernel loads different things on different computers; it is still the default config. Example: Realtek drivers and Intel NIC drivers. Just because I turn on the conservative governor rather than the performance governor (which I suspect is the default) does not mean it is non-standard. Plus, there is absolutely NOTHING I can do about this error apart from ignore it or turn power saving off, which for 30W I'll keep, thanks.

Either way, given this does not just relate to RC4, perhaps a mod could move it to a more general problem solving board? However, again, it is a kernel thing so *shrug*. People with more knowledge than me should comment.
June 8, 2012

Perhaps my comment was a bit tongue-in-cheek, but the premise is sound. The RC forum is for detecting problems with the standard distribution. E.g., functions such as sleep and WOL are also not standard and not supported. And if any problems are detected, these changes need to be removed to determine whether the issue lies with the stock distribution or is a result of modification to the go file. Posting to the RC forum should only be done on systems that exhibit problems when using the stock version of the go file. Allowing personal, however innocuous, changes to the go file will devolve this forum to the status of General support. The efforts of this forum have at least been doubled due to people submitting issues that result from a modified go file.

How about, "EDIT THIS FILE AT YOUR PERIL" and "ALL EDITS BELOW THIS LINE". All the edits can be placed in an if statement switched on the variable !DEBUG.
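The if-on-!DEBUG idea could look something like the sketch below. Everything here is hypothetical: `DEBUG` is the variable name suggested in the post, while `custom_edits`, `run_custom_edits`, and the way `DEBUG` would get set (edited by hand at the top of the go file, say) are assumptions made for illustration, not anything the stock go file provides.

```shell
# EDIT THIS FILE AT YOUR PERIL
# ALL EDITS BELOW THIS LINE go inside custom_edits.
custom_edits() {
    # Placeholder for user additions such as modprobe or governor lines.
    echo "loading custom settings"
}

# Run the user edits only when DEBUG is empty or unset; with DEBUG set,
# the server boots with stock behaviour so a problem can be re-tested.
run_custom_edits() {
    if [ -z "${DEBUG:-}" ]; then
        custom_edits
    else
        echo "DEBUG set: skipping custom edits"
    fi
}
```

The go file would end with a call to `run_custom_edits`. Setting `DEBUG=1` before that call would then give a stock boot without deleting any of the user's lines.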
June 8, 2012 (Author)

"Perhaps my comment was a bit tongue-in-cheek, but the premise is sound. The RC forum is for detecting problems with the standard distribution. E.g., functions such as sleep and WOL are also not standard and not supported. And if any problems are detected, these changes need to be removed to determine whether the issue lies with the stock distribution or is a result of modification to the go file. Posting to the RC forum should only be done on systems that exhibit problems when using the stock version of the go file. Allowing personal, however innocuous, changes to the go file will devolve this forum to the status of General support. The efforts of this forum have at least been doubled due to people submitting issues that result from a modified go file.

How about, 'EDIT THIS FILE AT YOUR PERIL' and 'ALL EDITS BELOW THIS LINE'. All the edits can be placed in an if statement switched on the variable !DEBUG."

If a mod can move this to the General Bug fixing forum then we can give it a bash. But I suspect it is a kernel thing, not something I or we can fix ourselves.

Edit: Thanks! Now, does anybody have any idea why this is happening?