nick5429 Posted May 10, 2012

I seem to have developed intermittent parity errors (i.e., iterative non-correcting parity checks don't all report the same errors). I had an unclean shutdown a while back, and it's possible I didn't let the parity recalc finish. That was careless, but it's somewhat separate from my issue.

/var/log/syslog.1:May 1 00:00:01 nickserver kernel: md: recovery thread checking parity...
/var/log/syslog.1:May 1 02:26:37 nickserver kernel: md: parity incorrect: 1542896200 <--- this one appears "real"
/var/log/syslog.1:May 1 02:40:30 nickserver kernel: md: parity incorrect: 1669823784

The others are all phantom:

/var/log/syslog:May 9 10:38:38 nickserver kernel: md: recovery thread checking parity...
/var/log/syslog:May 9 14:15:23 nickserver kernel: md: parity incorrect: 1542896200
/var/log/syslog:May 9 16:16:46 nickserver kernel: md: parity incorrect: 2290922496
/var/log/syslog:May 9 17:05:06 nickserver kernel: md: parity incorrect: 2558346984
/var/log/syslog:May 9 17:36:25 nickserver kernel: md: parity incorrect: 2676010000
/var/log/syslog:May 9 17:49:42 nickserver kernel: md: parity incorrect: 2740517552
/var/log/syslog:May 9 22:19:38 nickserver kernel: md: recovery thread checking parity...
/var/log/syslog:May 9 22:44:34 nickserver kernel: md: parity incorrect: 281891928
/var/log/syslog:May 9 22:49:24 nickserver kernel: md: parity incorrect: 333552304
/var/log/syslog:May 10 00:08:53 nickserver kernel: md: parity incorrect: 1158023712
/var/log/syslog:May 10 00:49:43 nickserver kernel: md: parity incorrect: 1542896200
/var/log/syslog:May 10 00:50:10 nickserver kernel: md: parity incorrect: 1546624504
/var/log/syslog:May 10 00:50:49 nickserver kernel: md: parity incorrect: 1552673536
/var/log/syslog:May 10 02:31:30 nickserver kernel: md: parity incorrect: 2374474944
/var/log/syslog:May 10 08:17:10 nickserver kernel: md: recovery thread checking parity...
/var/log/syslog:May 10 08:40:56 nickserver kernel: md: parity incorrect: 277107080
/var/log/syslog:May 10 10:43:11 nickserver kernel: md: parity incorrect: 1542896200
/var/log/syslog:May 10 10:57:54 nickserver kernel: md: parity incorrect: 1676424632
/var/log/syslog:May 10 12:12:00 nickserver kernel: md: parity incorrect: 2286487984

Configuration: unRAID 4.7 on a full Slackware (13.1?) installation. All drives are SATA and connected directly to motherboard headers. I have three 1.5TB drives and two 2TB drives:

Status  Disk      Mounted     Device    Model/Serial                    Temp  Reads     Writes   Errors  Size   Used     %Used  Free
OK      parity                /dev/sde  9VT1_5YD517KW                   31°C  38725618  2774174
OK      /dev/md1  /mnt/disk1  /dev/sda  SAMSUNG_HD154UI_S1Y6J1KS744713  *     33225543  415589           1.50T  1.50T    100%   1.38M
OK      /dev/md2  /mnt/disk2  /dev/sdc  00Z_WD-WMAVU3394155             *     24331755  242123           1.50T  1.47T    99%    26.74G
OK      /dev/md3  /mnt/disk3  /dev/sdb  SAMSUNG_HD154UI_S1Y6J1KS744712  *     30915273  381624           1.50T  915.74G  62%    584.52G
OK      /dev/md4  /mnt/disk4  /dev/sdd  00P_WD-WCAZAD107336             31°C  46038453  1765660          2.00T  488.22G  25%    1.51T
Total:                                                                                                   6.50T  4.38T    67%    2.12T

Apparently my system configuration has some sort of log rotation turned on, so my syslog [attached] doesn't show my last boot (~2 months ago).

May 7 05:37:00 nickserver kernel: ------------[ cut here ]------------
May 7 05:37:00 nickserver kernel: WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0xf5/0x175()
May 7 05:37:00 nickserver kernel: Hardware name: MS-7576
May 7 05:37:00 nickserver kernel: Modules linked in: md_mod xor dm_mod fglrx(P) [last unloaded: md_mod]
May 7 05:37:00 nickserver kernel: Pid: 0, comm: swapper Tainted: P 2.6.32.9-unRAID #9
May 7 05:37:00 nickserver kernel: Call Trace:
May 7 05:37:00 nickserver kernel: [<c102523f>] warn_slowpath_common+0x65/0x7c
May 7 05:37:00 nickserver kernel: [<c12f838c>] ? dev_watchdog+0xf5/0x175
May 7 05:37:00 nickserver kernel: [<c102528a>] warn_slowpath_fmt+0x24/0x27
May 7 05:37:00 nickserver kernel: [<c12f838c>] dev_watchdog+0xf5/0x175
May 7 05:37:00 nickserver kernel: [<c1031c1b>] ? insert_work+0x41/0x49
May 7 05:37:00 nickserver kernel: [<c1031f4e>] ? __queue_work+0x2a/0x2f
May 7 05:37:00 nickserver kernel: [<c102c983>] run_timer_softirq+0x112/0x166
May 7 05:37:00 nickserver kernel: [<c12f8297>] ? dev_watchdog+0x0/0x175
May 7 05:37:00 nickserver kernel: [<c1029269>] __do_softirq+0x79/0xee
May 7 05:37:00 nickserver kernel: [<c1029304>] do_softirq+0x26/0x2b
May 7 05:37:00 nickserver kernel: [<c10293e3>] irq_exit+0x29/0x2b
May 7 05:37:00 nickserver kernel: [<c10126cf>] smp_apic_timer_interrupt+0x6f/0x7d
May 7 05:37:00 nickserver kernel: [<c10031f6>] apic_timer_interrupt+0x2a/0x30
May 7 05:37:00 nickserver kernel: [<c100843f>] ? default_idle+0x2d/0x42
May 7 05:37:00 nickserver kernel: [<c100868c>] c1e_idle+0xcd/0xd2
May 7 05:37:00 nickserver kernel: [<c1001b66>] cpu_idle+0x3a/0x50
May 7 05:37:00 nickserver kernel: [<c1346fe7>] rest_init+0x53/0x55
May 7 05:37:00 nickserver kernel: [<c14fb79d>] start_kernel+0x27b/0x280
May 7 05:37:00 nickserver kernel: [<c14fb097>] i386_start_kernel+0x97/0x9e
May 7 05:37:00 nickserver kernel: ---[ end trace 6f5f19d34dc73db0 ]---

Since a large portion of the identified "bad" blocks are >1500000000, my inclination is to think the issue lies with one of the 2TB drives (or, hopefully, the SATA cables attached to them). SMART reports attached.

Aside: has anyone figured out a good way to determine which file a block maps to with ReiserFS yet? ext2/3/4 has the 'debugfs' tool that can do it...

Any thoughts other than 'replace the SATA cables on the two 2TB drives and try another non-correcting check'?

smart.txt syslog.txt
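On the block-to-file aside above: for ext2/3/4 this can be done with debugfs's icheck and ncheck commands (I know of no direct ReiserFS equivalent). A hedged sketch; the sector number and device name are illustrative, and the 4096-byte filesystem block size is an assumption to verify with tune2fs -l:

```shell
# Convert a 512-byte sector number (the unit dd uses by default) to an
# ext filesystem block number, assuming 4096-byte blocks (check with
# "tune2fs -l /dev/sdX1 | grep 'Block size'").
SECTOR=1542896200                   # illustrative sector number
FSBLOCK=$(( SECTOR * 512 / 4096 ))
echo "fs block: $FSBLOCK"

# debugfs can then map block -> inode -> path (device name hypothetical):
#   debugfs -R "icheck $FSBLOCK" /dev/sdX1
#   debugfs -R "ncheck <inode_from_icheck>" /dev/sdX1
```

This only helps if the data lived on an ext filesystem, of course; it's no answer for the ReiserFS array members themselves.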
nick5429 Posted July 23, 2012

Interesting. So I replaced all the SATA cables, to no avail.

I modified the script under 'how to troubleshoot recurring parity errors' from here: http://lime-technology.com/wiki/index.php/FAQ#Hard_Drives as follows, to allow greater flexibility and better logging.

#!/bin/bash
LOG_DIR=/root/hashes
DEVICE=sda
#COUNT=10000000    #5GB
COUNT=2000000      #1GB
SKIP=0             #start
MAX=2000000000     #end block -- make sure the drive is at least this big
#MAX=1017926000    #start
#COUNT=10000       #5MB
TIMES=9            #for each stride, repeat this many times

cd $LOG_DIR

if [ $# -ne 1 ]
then
    echo "Need 1 param, got $#"
    exit
else
    DEVICE=$1
    echo "Running hashes for device=$DEVICE skip=$SKIP count=$COUNT"
fi

INITIALRESULT=""
RESULT=""

while [ $SKIP -lt $MAX ]; do
    echo "Begin $DEVICE at block $SKIP size $COUNT."
    INITIALRESULT=`dd if=/dev/$DEVICE skip=$SKIP count=$COUNT | md5sum -b | awk '{print $1}'`
    echo "Block $SKIP: $INITIALRESULT initial" >> $DEVICE.log
    for i in `seq 1 $TIMES`
    do
        RESULT=`dd if=/dev/$DEVICE skip=$SKIP count=$COUNT | md5sum -b | awk '{print $1}'`
        echo "Block $SKIP: $RESULT" >> $DEVICE.log
        if [ "$RESULT" != "$INITIALRESULT" ]; then
            echo "!!!!ERRORERRORERROR Block $SKIP md5 $RESULT did not match expected $INITIALRESULT"
            echo "!!!!ERRORERRORERROR Block $SKIP md5 $RESULT did not match expected $INITIALRESULT" >> $DEVICE.log
        fi
    done
    let SKIP=$SKIP+$COUNT
done
exit

The script uses "dd" to read the raw contents of the same section of the disk ten times, computes the md5 sum of each read, and compares it to the initial read. An md5 sum will be the same for the same input data, so any mismatch means the drive returned different bytes on a re-read.

I ran this over the first 1.5TB of each of my disks in parallel 2-3 times, resulting in ~45TB of reads from each disk (plus a few subset runs). In all that, I found two data miscompares. Even re-reading these same block addresses with a higher count, I've not been able to reproduce this through anything but "luck".
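Tallying the results of a run is straightforward, since the script logs every read. A quick sketch against the log format the script above writes (sda.log being its per-device log):

```shell
# Count the explicit mismatch lines in the script's log:
grep -c 'ERRORERRORERROR' sda.log

# Each read is logged as "Block <skip>: <md5>". Tallying distinct
# (block, md5) pairs surfaces the odd-one-out reads: pairs with a
# count of 1 at the top of this list are the miscompares.
awk '/^Block/ {print $2, $3}' sda.log | sort | uniq -c | sort -n | head
```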
Both miscompares happened on sda, but I can't reliably determine anything from a sample size of 2.

Block 430000000: 3dc262a182f0031f0c92f6e3cf4811b6 initial
Block 430000000: 3dc262a182f0031f0c92f6e3cf4811b6
Block 430000000: 3dc262a182f0031f0c92f6e3cf4811b6
Block 430000000: 3dc262a182f0031f0c92f6e3cf4811b6
Block 430000000: 3d74be3d80e1f9710837e7149f299ef3
!!!!ERRORERRORERROR Block 430000000 md5 3d74be3d80e1f9710837e7149f299ef3 did not match expected 3dc262a182f0031f0c92f6e3cf4811b6
Block 430000000: 3dc262a182f0031f0c92f6e3cf4811b6

Block 436000000: 0e4597dda7af49bb13b9c647503ca856 initial
Block 436000000: 0e4597dda7af49bb13b9c647503ca856
Block 436000000: 0e4597dda7af49bb13b9c647503ca856
Block 436000000: 0e4597dda7af49bb13b9c647503ca856
Block 436000000: 0e4597dda7af49bb13b9c647503ca856
Block 436000000: 0e4597dda7af49bb13b9c647503ca856
Block 436000000: 0e4597dda7af49bb13b9c647503ca856
Block 436000000: 271a8b560f4911a74aec07a135c399b5
!!!!ERRORERRORERROR Block 436000000 md5 271a8b560f4911a74aec07a135c399b5 did not match expected 0e4597dda7af49bb13b9c647503ca856
Block 436000000: 0e4597dda7af49bb13b9c647503ca856
Block 436000000: 0e4597dda7af49bb13b9c647503ca856

In both cases, you can see that it returns to reading the 'correct' data after the erroneous read, so it's not like this was caused by a write to the disk in the middle of the process.

Aside: I'm not entirely confident that the "block number" reported by the unRAID parity checker correlates with the skip/count parameters to dd.

Whatever read pattern the unRAID parity checker uses seems to hit this more consistently; I've done a number of non-correcting parity checks in the course of testing this, and typically end up with 1-4 parity mismatches. In contrast, a full run of my script takes 10x as long, but found only 2 errors in 3 full runs. The errors above are both in the 430000000 area, though I've noticed no such pattern in the parity checks. Note, not all of these ran to completion.
Jul 5 12:42:38 nickserver kernel: mdcmd (29): check NOCORRECT
Jul 5 12:42:38 nickserver kernel: md: recovery thread woken up ...
Jul 5 12:42:38 nickserver kernel: md: recovery thread checking parity...
Jul 5 12:42:38 nickserver kernel: md: using 1152k window, over a total of 1953514552 blocks.
Jul 5 14:31:51 nickserver kernel: md: parity incorrect: 1180695504
Jul 5 15:09:36 nickserver kernel: md: parity incorrect: 1542896200
Jul 5 15:16:50 nickserver kernel: md: parity incorrect: 1608572368
Jul 16 11:17:30 nickserver kernel: mdcmd (25): check NOCORRECT
Jul 16 11:17:30 nickserver kernel: md: recovery thread woken up ...
Jul 16 11:17:30 nickserver kernel: md: recovery thread checking parity...
Jul 16 11:17:30 nickserver kernel: md: using 1152k window, over a total of 1953514552 blocks.
Jul 16 11:36:18 nickserver kernel: md: parity incorrect: 218250952
Jul 16 11:56:39 nickserver kernel: md: parity incorrect: 420616808
Jul 16 16:50:29 nickserver kernel: mdcmd (30): check NOCORRECT
Jul 16 16:50:29 nickserver kernel: md: recovery thread woken up ...
Jul 16 16:50:29 nickserver kernel: md: recovery thread checking parity...
Jul 16 16:50:29 nickserver kernel: md: using 1152k window, over a total of 1953514552 blocks.
Jul 16 18:24:34 nickserver kernel: md: parity incorrect: 1017876024
Jul 18 10:58:00 nickserver kernel: mdcmd (29): check NOCORRECT
Jul 18 10:58:00 nickserver kernel: md: recovery thread woken up ...
Jul 18 10:58:00 nickserver kernel: md: recovery thread checking parity...
Jul 18 10:58:00 nickserver kernel: md: using 1152k window, over a total of 1953514552 blocks.
Jul 18 11:34:35 nickserver kernel: md: parity incorrect: 421167040
Jul 18 13:23:10 nickserver kernel: md: parity incorrect: 1535345568
Jul 18 13:38:38 nickserver kernel: md: parity incorrect: 1676847576
Jul 18 16:07:22 nickserver kernel: md: parity incorrect: 2740517552

All drives have completed an extended offline SMART self-test without error. Currently attempting to reproduce the issue again.
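One hedged way to probe the block-number question above is to re-read a window around a reported offset directly. This sketch assumes unRAID's md driver reports positions in 1024-byte blocks (so each is two 512-byte dd sectors); that unit is a guess on my part, not something I can confirm:

```shell
# Re-read a ~2MB window around a reported "md: parity incorrect" offset
# several times; a changing md5 would reproduce the unstable read.
MDBLOCK=1542896200                # number from "md: parity incorrect"
SECTOR=$(( MDBLOCK * 2 ))         # ASSUMED 1K md blocks -> 512B sectors
for i in 1 2 3 4 5; do
    dd if=/dev/sda bs=512 skip=$(( SECTOR - 2048 )) count=4096 2>/dev/null | md5sum
done
```

Identical sums across the passes mean the region read stably this time; a differing sum reproduces the miscompare at that spot.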
Any other thoughts? Ideas?
Joe L. Posted July 23, 2012

Assuming you've ruled out a memory error by running memtest on it for several cycles, and have ruled out the disk controller by trying another port, then odds are it is the disk itself, and you might test the drive's suitability as a wheel-chock.

That wheel-chock test involves placing the disk behind the wheel of your car and repeatedly driving over it, testing its ability to stop the car. If, after several dozen attempts to get it to stop the car, it still returns a rare but occasional MD5 error, get a bigger car.

It will cause you far less hair-loss after going through those tests. (And, odds are, parity will consistently be failing once you are through flattening it.)

Joe L.

(Oh yeah... it might not qualify for an RMA afterwards... but you'll feel better regardless)
nick5429 Posted July 23, 2012

Joe,

Haha, thanks for the tips! I ran a brief memtest at the beginning of this, but you're right -- I do need to run a longer memtest check just to be sure.

I'd been hoping to reproduce the error more reliably before switching to another controller; otherwise, an error-free run on the new controller wouldn't prove anything to myself. Though now that I'm typing this out, I realize that I do have a semi-reliable way to reproduce it: unRAID's built-in parity check. That's what I get for trying to outsmart myself with all this fancy testing!

Will report back in a day or three, after I've had a chance to run things through several cycles.
bcbgboy13 Posted July 24, 2012

Nick,

1. There is a reason why commercial servers use at least ECC memory and are on a UPS.
2. The URE (unrecoverable read error) rating on consumer-level disks.

I suspect every unRAID user transferring huge amounts of data 24/7 like in your tests (this is not the usual way unRAID is used) will be bitten by the statistics regarding these two factors above. I cannot prove anything, but it is something for you to keep in mind since you are looking for ideas.

PS. Forgot to mention, but you have some Samsung hard drives -- some older ones had defective firmware, and you should check if yours are among those affected. You can search even here, as it was discussed at the time.
nick5429 Posted July 25, 2012

While these are valid points which may potentially explain a problem like what I'm seeing...

1. There is a reason why the commercial servers use at least ECC memory and are on UPS.

If the RAM were bad, it would show up in a memtest. Any memory error, ever, is indicative of a module that should be thrown out / RMA'd.

2. URE (unrecoverable read errors) rating on the consumer level disks.

This would show up in the syslog / SMART stats and, again, would be indicative of a drive that should be replaced.

I suspect every Unraid user transferring huge amount of data 24/7 like in your tests (this is not the usual way Unraid is used) will be bitten by the statistics regarding these two factors above.

I highly doubt it, and the test I describe is quite similar to the stress from a standard parity check. If this were true, all users of unRAID would be seeing random parity errors when doing subsequent parity checks. This issue can easily cause silent data corruption, which -- even on consumer hardware -- is unacceptable and unexpected.

PS. Forgot to mention but you have some Samsung hard drives - some older one had a defective firmware and you should check if yours are the one affected.

One of the Samsung drives is the drive I suspect of failures, but as far as I can tell, the firmware issue affects only the HD155UI and HD204UI drives; mine are HD154UI.

Anyway, an update:
* 12-hour memtest shows no errors (still running)
* The only other SATA controller I have lying around is sil3132-based, which is apparently known to be flaky. I've got two of the Syba cards from monoprice (which seem to be one of the few 'recommended' brands, and I think that's why I bought them), but switching the suspect drive onto either of them results in a drastically increased number of parity mismatches.
I'm curious to do a little more testing to see if switching a suspected-good drive onto the Syba cards has a similar effect (which would lead to me promptly discarding the cards...).
EddieA Posted July 25, 2012

While these are valid points which may potentially explain a problem like what I'm seeing...

1. There is a reason why the commercial servers use at least ECC memory and are on UPS.

If the RAM were bad, it would show up in a memtest. Any memory error, ever, is indicative of a module that should be thrown out / RMA'd.

I wouldn't be so sure of that. I was getting a lot of single-bit errors when calculating checksums using md5deep. I could run the same disk numerous times and get different files failing each time. memtest+ ran for 48 hours without error, but throwing prime95 onto my system and running the Blend test blew up within 5 minutes.

Cheers.
nick5429 Posted July 27, 2012

memtest+ ran for 48 hours without error, but throwing prime95 onto my system and running the Blend test blew up within 5 minutes.

::jawdrop:: I almost wouldn't have believed you if I hadn't just observed the same behavior. memtest+ stable for 36 hours; prime95 blend consistently fails within 15 minutes. prime95 'small' passes overnight (likely indicating, per the description in prime95, a 'problem with the memory or memory controller').

I'd always considered memtest the gold standard in memory subsystem stability testing. I guess not.

My RAM was even running underclocked -- autodetected 1333 @ 1.5V vs. the rated 1600 @ 1.5V. CPU and everything else running at stock/autodetected speeds. I increased the RAM voltage a bit (1.6-ish) and left it at 1333; so far prime95 blend has been running for ~3 hours without errors.
dgaschk Posted July 27, 2012

Where do you get prime95?
EddieA Posted July 27, 2012

memtest+ ran for 48 hours without error, but throwing prime95 onto my system and running the Blend test blew up within 5 minutes.

::jawdrop:: I almost wouldn't have believed you if I hadn't just observed the same behavior.

Neither did anyone on the hardware board I posted some questions on. I almost got laughed off it for suggesting that memtest didn't catch the errors. Luckily my memory was all Corsair, so I RMA'd the pair that I think were bad, after using prime95 to determine which.

Where do you get prime95?

Google it. It's the top entry.

Cheers.
nick5429 Posted July 31, 2012

Bumped up the RAM voltage. After many hours of testing and confirming, the system is totally prime95-stable and has passed 3 rounds of parity checks with no mismatches. I think I can finally call this closed. Thanks for everyone's help / tips / suggestions!
jumperalex Posted August 1, 2012

This would seem to imply that the current unRAID received wisdom of using memtest (indeed, its inclusion in the official download) should be reconsidered?
kenoka Posted August 1, 2012

This would seem to imply that the current unRAID received wisdom of using memtest (indeed, its inclusion in the official download) should be reconsidered?

I don't think so. It's still a good test. But now we have data to suggest that it's not 100% valid in all cases. So we'll need to remember that a flaky system that has passed memtest should also be checked with Prime95.
nick5429 Posted August 1, 2012

This would seem to imply that the current unRAID received wisdom of using memtest (indeed, its inclusion in the official download) should be reconsidered?

I don't think so. It's still a good test. But now we have data to suggest that it's not 100% valid in all cases. So we'll need to remember that a flaky system that has passed memtest should also be checked with Prime95.

Agreed. I think, as a grand overgeneralization:

* Memtest is best at finding actual bad locations in the RAM, and it does a much more thorough job of testing all memory locations with a variety of access patterns.
* Prime95 is better at testing stability under load. It can't know which physical memory locations it's testing; it just does a lot of memory transactions under heavy CPU and memory load. This is more likely to find problems with things like overclocked or overheating RAM, or RAM which isn't getting enough voltage.
Superorb Posted February 4, 2013

Which version of prime95 do I download, and how do I get it running on my unRAID server? I've been getting random parity mismatches, and memtest returned 0 errors after 23 hours.
nick5429 Posted February 4, 2013

Which version of prime95 do I download and how do I get it running on my unRAID server?

The 32-bit Linux one:

wget ftp://mersenne.org/gimps/p95v279.linux32.tar.gz
tar -xzvf p95v279.linux32.tar.gz
./mprime
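A note for running it unattended: mprime's -t switch starts the torture test without the interactive menu, and results get appended to results.txt in the working directory. A crude post-run check (the grep pattern is just a sketch; the exact failure wording varies by version):

```shell
# After an mprime torture run (./mprime -t), scan results.txt for
# reported failures; message wording differs between prime95 versions.
if grep -qiE 'error|failure' results.txt; then
    echo "instability detected -- see results.txt"
else
    echo "no errors logged"
fi
```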
Superorb Posted February 4, 2013

Ok, I ran prime95 and the process was killed instantly. Syslog:

Feb 4 17:09:43 unRAID login[4944]: ROOT LOGIN on `pts/0' from `Ken-Windows7' (Logins)
Feb 4 17:11:11 unRAID kernel: mprime invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=0 (Minor Issues)
Feb 4 17:11:11 unRAID kernel: Pid: 4992, comm: mprime Not tainted 2.6.32.9-unRAID #8 (Errors)
Feb 4 17:11:11 unRAID kernel: Call Trace: (Errors)
Feb 4 17:11:11 unRAID kernel: [<c104ab61>] oom_kill_process+0x59/0x1cd (Errors)
Feb 4 17:11:11 unRAID kernel: [<c104afb9>] __out_of_memory+0xef/0x102 (Errors)
Feb 4 17:11:11 unRAID kernel: [<c104b02a>] out_of_memory+0x5e/0x83 (Errors)
Feb 4 17:11:11 unRAID kernel: [<c104cfe9>] __alloc_pages_nodemask+0x375/0x42f (Errors)
Feb 4 17:11:11 unRAID kernel: [<c1059686>] handle_mm_fault+0x254/0x8f1 (Errors)
Feb 4 17:11:11 unRAID kernel: [<c129f124>] ? schedule+0x691/0x72f (Errors)
Feb 4 17:11:11 unRAID kernel: [<c1017050>] do_page_fault+0x17c/0x1e4 (Errors)
Feb 4 17:11:11 unRAID kernel: [<c1016ed4>] ? do_page_fault+0x0/0x1e4 (Errors)
Feb 4 17:11:11 unRAID kernel: [<c12a07ce>] error_code+0x66/0x6c (Errors)
Feb 4 17:11:11 unRAID kernel: [<c1016ed4>] ? do_page_fault+0x0/0x1e4 (Errors)
Feb 4 17:11:11 unRAID kernel: Mem-Info:
Feb 4 17:11:11 unRAID kernel: DMA per-cpu:
Feb 4 17:11:11 unRAID kernel: CPU 0: hi: 0, btch: 1 usd: 0
Feb 4 17:11:11 unRAID kernel: Normal per-cpu:
Feb 4 17:11:11 unRAID kernel: CPU 0: hi: 186, btch: 31 usd: 168
Feb 4 17:11:11 unRAID kernel: HighMem per-cpu:
Feb 4 17:11:11 unRAID kernel: CPU 0: hi: 186, btch: 31 usd: 164
Feb 4 17:11:11 unRAID kernel: active_anon:354816 inactive_anon:74677 isolated_anon:0
Feb 4 17:11:11 unRAID kernel: active_file:8 inactive_file:14 isolated_file:0
Feb 4 17:11:11 unRAID kernel: unevictable:54391 dirty:0 writeback:0 unstable:0
Feb 4 17:11:11 unRAID kernel: free:12293 slab_reclaimable:851 slab_unreclaimable:1810
Feb 4 17:11:11 unRAID kernel: mapped:1521 shmem:24 pagetables:965 bounce:0
Feb 4 17:11:11 unRAID kernel: DMA free:7984kB min:64kB low:80kB high:96kB active_anon:5576kB inactive_anon:2304kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15768kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:8kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Feb 4 17:11:11 unRAID kernel: lowmem_reserve[]: 0 867 1982 1982
Feb 4 17:11:11 unRAID kernel: Normal free:40568kB min:3732kB low:4664kB high:5596kB active_anon:706968kB inactive_anon:60672kB active_file:24kB inactive_file:36kB unevictable:17460kB isolated(anon):0kB isolated(file):0kB present:887976kB mlocked:0kB dirty:0kB writeback:0kB mapped:188kB shmem:0kB slab_reclaimable:3404kB slab_unreclaimable:7240kB kernel_stack:616kB pagetables:1484kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:100 all_unreclaimable? no
Feb 4 17:11:11 unRAID kernel: lowmem_reserve[]: 0 0 8924 8924
Feb 4 17:11:11 unRAID kernel: HighMem free:620kB min:512kB low:1712kB high:2912kB active_anon:706720kB inactive_anon:235732kB active_file:8kB inactive_file:20kB unevictable:200104kB isolated(anon):0kB isolated(file):0kB present:1142312kB mlocked:0kB dirty:0kB writeback:0kB mapped:5896kB shmem:96kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:2368kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:133 all_unreclaimable? no
Feb 4 17:11:11 unRAID kernel: lowmem_reserve[]: 0 0 0 0
Feb 4 17:11:11 unRAID kernel: DMA: 0*4kB 2*8kB 2*16kB 2*32kB 1*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 1*4096kB = 7984kB
Feb 4 17:11:11 unRAID kernel: Normal: 104*4kB 109*8kB 73*16kB 29*32kB 23*64kB 7*128kB 12*256kB 16*512kB 5*1024kB 3*2048kB 3*4096kB = 40568kB
Feb 4 17:11:11 unRAID kernel: HighMem: 17*4kB 5*8kB 2*16kB 1*32kB 1*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 620kB
Feb 4 17:11:11 unRAID kernel: 54460 total pagecache pages
Feb 4 17:11:11 unRAID kernel: 0 pages in swap cache
Feb 4 17:11:11 unRAID kernel: Swap cache stats: add 0, delete 0, find 0/0
Feb 4 17:11:11 unRAID kernel: Free swap = 0kB
Feb 4 17:11:11 unRAID kernel: Total swap = 0kB
Feb 4 17:11:11 unRAID kernel: 515649 pages RAM
Feb 4 17:11:11 unRAID kernel: 287827 pages HighMem
Feb 4 17:11:11 unRAID kernel: 5487 pages reserved
Feb 4 17:11:11 unRAID kernel: 4837 pages shared
Feb 4 17:11:11 unRAID kernel: 495608 pages non-shared
Feb 4 17:11:11 unRAID kernel: Out of memory: kill process 4990 (mprime) score 57106 or a child (Errors)
Feb 4 17:11:11 unRAID kernel: Killed process 4990 (mprime) (Errors)
nick5429 Posted February 4, 2013

You might have better luck starting a separate thread, since this one is marked as solved. Post the results of these two commands in your other thread:

ps -e -orss=,args= | sort -b -k1,1n | pr -TW$COLUMNS
free -m

Also, a list of what plugins you're running.
Superorb Posted February 4, 2013

You might have better luck starting a separate thread, since this one is marked as solved. Post the results of these two commands in your other thread:

ps -e -orss=,args= | sort -b -k1,1n | pr -TW$COLUMNS
free -m

Also, a list of what plugins you're running.

Here you go, I added it onto my ongoing thread. http://lime-technology.com/forum/index.php?topic=25672.msg224410#msg224410