al_uk Posted March 3, 2017

On 3/1/2017 at 11:49 PM, RobJ said:
You are loading a huge amount of stuff, some of it clear memory hogs, like multiple java apps and Plex and more. You would think that with 64GB you should not be having memory issues, but they started, as you said, about 24 hours after booting. At that point, page allocations were numerous but very slow, ranging from 10 to 20 seconds per process and, some time later, 10 to 50 seconds, an obvious latency issue. I believe they were caused by garbage collection efforts as the memory filled up, the attempt to reorganize memory chunks to satisfy the requests. They were clearly averaging longer and longer, so it was only a matter of time before an allocation request failed. The OOM (Out Of Memory) was the final straw. While the system did carry on valiantly for quite a while, you probably should have rebooted when the allocation issues first began.
My guess is that you have something with a serious memory leak. I can't say what it is; prime suspects would be java itself, a java app, makemkv, a Plex component, or a corrupted btrfs. I would look for updates for all apps, then check the file systems on all drives formatted with BTRFS. If at all possible, stop loading anything you aren't actually using. For example, do you really need so many of the NerdPack packages? If the problem continues, and I suspect it will, you will have to run without selected apps, trying different combinations, and figure out which apps are using up the memory. You do have Cadvisor; perhaps it could be used to monitor all resource usage and see what grows too large and never shrinks back. Right now, the java processes are enormous, and there are a number of them, so they could be suspect. This is a support issue and will probably be moved to the support board.

I closed down all VMs and dockers and ran another diagnostics, which is attached.

Yesterday I rebooted and did not start Crashplan or Dropbox. Today, about 12 hours after reboot, my problems started again.

Mar 3 12:08:51 Tower kernel: btrfs-transacti: page allocation stalls for 11007ms, order:2, mode:0x2404040(GFP_NOFS|__GFP_COMP)
Mar 3 12:08:51 Tower kernel: CPU: 9 PID: 14971 Comm: btrfs-transacti Not tainted 4.9.10-unRAID #1

I want to go back to 6.2.4. I tried this yesterday by swapping out the bz files on the flash. 6.2.4 booted OK, but my dockers did not start, and only one of 1 VMs showed up. The settings/dockers page said I needed to recreate my docker image, and libvirt.log showed the following:

3+0000: 16742: info : libvirt version: 1.3.1
2017-03-02 23:52:46.633+0000: 16742: info : hostname: Tower
2017-03-02 23:52:46.633+0000: 16742: error : virDomainDefParseXML:15455 : unsupported configuration: unsupported HyperV Enlightenment feature: vendor_id
2017-03-02 23:52:46.633+0000: 16742: error : virDomainDefParseXML:15455 : unsupported configuration: unsupported HyperV Enlightenment feature: vendor_id
2017-03-02 23:52:46.634+0000: 16742: error : virDomainDefParseXML:15455 : unsupported configuration: unsupported HyperV Enlightenment feature: vendor_id
2017-03-02 23:52:46.634+0000: 16742: error : virDomainDefParseXML:15455 : unsupported configuration: unsupported HyperV Enlightenment feature: vendor_id
2017-03-02 23:52:46.635+0000: 16742: error : virDomainDefParseXML:15455 : unsupported configuration: unsupported HyperV Enlightenment feature: vendor_id
2017-03-02 23:52:46.635+0000: 16742: error : virDomainDefParseXML:15455 : unsupported configuration: unsupported HyperV Enlightenment feature: vendor_id

How do I go back to 6.2.4 and fix the VMs and dockers? Thanks

tower-diagnostics-20170302-2313.zip

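The stall messages in the log above carry their duration in milliseconds, so one way to see whether the stalls really are getting longer over time is to pull those numbers out of the syslog. What follows is only a minimal sketch, assuming every stall line has the same shape as the single line quoted in this post; the function name `parse_stalls` is made up for illustration and is not part of unRAID.

```python
import re

# Matches kernel lines shaped like the one quoted in the post:
# "Mar 3 12:08:51 Tower kernel: btrfs-transacti: page allocation stalls for 11007ms, order:2, ..."
STALL_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+\S+\s+kernel:\s+"
    r"(?P<comm>.+?): page allocation stalls for (?P<ms>\d+)ms, order:(?P<order>\d+)"
)


def parse_stalls(lines):
    """Return (timestamp, process, milliseconds, order) for every stall line found."""
    found = []
    for line in lines:
        match = STALL_RE.match(line.strip())
        if match:
            found.append(
                (match["stamp"], match["comm"], int(match["ms"]), int(match["order"]))
            )
    return found


if __name__ == "__main__":
    sample = [
        "Mar 3 12:08:51 Tower kernel: btrfs-transacti: page allocation stalls "
        "for 11007ms, order:2, mode:0x2404040(GFP_NOFS|__GFP_COMP)",
        "Mar 3 12:08:51 Tower kernel: CPU: 9 PID: 14971 Comm: btrfs-transacti "
        "Not tainted 4.9.10-unRAID #1",
    ]
    for stamp, comm, ms, order in parse_stalls(sample):
        print(f"{stamp}  {comm}  {ms / 1000:.1f}s  order {order}")
```

Feeding it the whole syslog and sorting by timestamp would show whether the durations trend upward, which is the pattern RobJ described.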
richardsim7 Posted March 4, 2017

On 02/03/2017 at 8:17 PM, richardsim7 said:
I did a quick search but couldn't find anything: I upgraded from 6.2.4 to 6.3.2, and now my Windows 10 VM won't boot. SeaBIOS just says "No bootable device". Any ideas?
nas-diagnostics-20170302-2017.zip

Rolled back to 6.2.4 and the VM boots again. Any ideas why 6.3.2 isn't working?

thither Posted March 4, 2017

After upgrading from 6.3.1 I'm seeing some odd behavior where I can boot into GUI mode, but when I try to boot into regular (OS) mode the server freezes after it loads /bzImage and is not pingable. I run headless most of the time with my monitor plugged into a different GUI card for VMs, so this isn't ideal. Anything I can try to diagnose this?

itimpi Posted March 4, 2017

57 minutes ago, thither said:
After upgrading from 6.3.1 I'm seeing some odd behavior where I can boot into GUI mode, but when I try to boot into regular (OS) mode the server freezes after it loads /bzImage and is not pingable. I run headless most of the time with my monitor plugged into a different GUI card for VMs, so this isn't ideal. Anything I can try to diagnose this?

Quite a few people have reported that! In most cases it seems to occur with SuperMicro motherboards - what do you have?

thither Posted March 4, 2017

44 minutes ago, itimpi said:
Quite a few people have reported that! In most cases it seems to occur with SuperMicro motherboards - what do you have?

I've got an ASRock Z170 Extreme+ - this one.

al_uk Posted March 4, 2017

On 03/03/2017 at 6:06 PM, al_uk said:
I closed down all VMs and dockers and ran another diagnostics, which is attached.
Yesterday I rebooted and did not start Crashplan or Dropbox. Today, about 12 hours after reboot, my problems started again.
Mar 3 12:08:51 Tower kernel: btrfs-transacti: page allocation stalls for 11007ms, order:2, mode:0x2404040(GFP_NOFS|__GFP_COMP)
Mar 3 12:08:51 Tower kernel: CPU: 9 PID: 14971 Comm: btrfs-transacti Not tainted 4.9.10-unRAID #1
I want to go back to 6.2.4. I tried this yesterday by swapping out the bz files on the flash. 6.2.4 booted OK, but my dockers did not start, and only one of 1 VMs showed up. The settings/dockers page said I needed to recreate my docker image, and libvirt.log showed the following:
3+0000: 16742: info : libvirt version: 1.3.1
2017-03-02 23:52:46.633+0000: 16742: info : hostname: Tower
2017-03-02 23:52:46.633+0000: 16742: error : virDomainDefParseXML:15455 : unsupported configuration: unsupported HyperV Enlightenment feature: vendor_id
2017-03-02 23:52:46.633+0000: 16742: error : virDomainDefParseXML:15455 : unsupported configuration: unsupported HyperV Enlightenment feature: vendor_id
2017-03-02 23:52:46.634+0000: 16742: error : virDomainDefParseXML:15455 : unsupported configuration: unsupported HyperV Enlightenment feature: vendor_id
2017-03-02 23:52:46.634+0000: 16742: error : virDomainDefParseXML:15455 : unsupported configuration: unsupported HyperV Enlightenment feature: vendor_id
2017-03-02 23:52:46.635+0000: 16742: error : virDomainDefParseXML:15455 : unsupported configuration: unsupported HyperV Enlightenment feature: vendor_id
2017-03-02 23:52:46.635+0000: 16742: error : virDomainDefParseXML:15455 : unsupported configuration: unsupported HyperV Enlightenment feature: vendor_id
How do I go back to 6.2.4 and fix the VMs and dockers? Thanks
tower-diagnostics-20170302-2313.zip

Today I tried just having the VMs powered up. No dockers were started. Within 8 hours I was getting "tainted" problems again. Any suggestions on what to try next? I have started a separate thread on how to roll back to 6.2.4.

hgeorges Posted March 5, 2017

Hi,

Background: I upgraded yesterday to the new 6.3.2 version (and upgraded to a new motherboard - x10SLL-F-O - at the same time), apparently without issues (thank you!). However, after a short time I started receiving plugin errors (missing files, and an endless loop from a tenacious plugin wanting to send statistics from my system). I also read this in your note:

Limetech quote:
Plugin Authors: as posted earlier, your plugin may not function properly depending on how POST requests are handled, see: http://lime-technology.com/forum/index.php?topic=55986.0
Plugin Users: please post issues you find in the appropriate Plugin Support topic.
And to reiterate: true plugins (not Docker containers) run as the root user and have full access to everything on your server: Install 3rd party plugins at your own risk.
End Limetech quote.

I have decided to remove all the plugins from my server, as I don't have time to hunt for compatibility issues and resolve strange behaviors. I'm now running the barebone system, and want to use docker or VMs for additional functionality.

Now here is my question regarding plugins: I'm assuming there are no limetech-endorsed, version-controlled plugins? Is that right? Among those I had installed were a few which made sense to have close by (all were strictly tools: one to create and verify checksums, another adding an expanded tool set, etc). Perhaps you could create optional limetech add-on packages for tools which make sense to run as root and are safe to run (version controlled). Please comment.

Thanks again.

sambo Posted March 7, 2017

Hello,

Is it possible to add spinup information to dmesg in the next release, like we have for spindown?

Thanks for all this good work!

itimpi Posted March 8, 2017

12 hours ago, sambo said:
Hello, Is it possible to add spinup information to dmesg in the next release, like we have for spindown? Thanks for all this good work!

Although I would like to see such information, I suspect it is not available. I think the Spindown messages relate to specific events within unRAID, while the Spinups are likely to happen automatically when an access is made to the drive (without an explicit Spinup command being issued). The closest I could see is adding a message to the log during the periodic drive checks whenever the Spin state is found to be different from the last one logged. Although this may mean the log message is delayed relative to the actual event, it would still be useful information.

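The idea in the post above, logging only when a periodic check finds a spin state that differs from the last one logged, can be sketched in a few lines. This is purely illustrative and says nothing about how unRAID actually polls drives: the function name `spin_state_changes`, the tuple layout, and the state strings are all assumptions, and how the state is read from the drive is left out entirely.

```python
def spin_state_changes(polls):
    """Yield a log message whenever a drive's polled state differs from the last one logged.

    `polls` is an iterable of (timestamp, device, state) tuples, one per periodic check.
    All three values are hypothetical stand-ins for whatever the real poller would report.
    """
    last_logged = {}
    for stamp, device, state in polls:
        if last_logged.get(device) != state:
            # The first sighting of a device also counts as a change,
            # so the log always starts with a baseline entry.
            last_logged[device] = state
            yield f"{stamp} {device}: spin state is now {state}"


if __name__ == "__main__":
    checks = [
        (0, "sdb", "active"),
        (60, "sdb", "active"),    # unchanged, nothing logged
        (120, "sdb", "standby"),  # spun down between checks
        (180, "sdb", "active"),   # spun up between checks
    ]
    for message in spin_state_changes(checks):
        print(message)
```

As the post notes, the message can only be as timely as the polling interval: a spinup at second 121 would not be logged until the check at second 180.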
sambo Posted March 8, 2017

Since unRAID is able to spin down a disk after some inactivity time, I think it should be possible, at least I hope.

JonUKRed Posted March 8, 2017

On 04/03/2017 at 7:04 PM, thither said:
After upgrading from 6.3.1 I'm seeing some odd behavior where I can boot into GUI mode, but when I try to boot into regular (OS) mode the server freezes after it loads /bzImage and is not pingable. I run headless most of the time with my monitor plugged into a different GUI card for VMs, so this isn't ideal. Anything I can try to diagnose this?

On 04/03/2017 at 8:02 PM, itimpi said:
Quite a few people have reported that! In most cases it seems to occur with SuperMicro motherboards - what do you have?

This is happening to me also after the upgrade: I am only able to boot to GUI mode. Again, not ideal as I also run headless. MB is ASRock Z270 Pro4.

limetech (Author) Posted March 8, 2017

9 minutes ago, JonUKRed said:
This is happening to me also after the upgrade: I am only able to boot to GUI mode. Again, not ideal as I also run headless. MB is ASRock Z270 Pro4.

Please confirm for me that no corruption of the 'bzroot' file has occurred. From console or telnet/ssh please type this:

md5sum /boot/bzroot

Should return this for the 6.3.2 release:

c1a14a522656426fb9e20b66a5968d1a /boot/bzroot

JonUKRed Posted March 8, 2017

2 minutes ago, limetech said:
Please confirm for me that no corruption of the 'bzroot' file has occurred. From console or telnet/ssh please type this:
md5sum /boot/bzroot
Should return this for the 6.3.2 release:
c1a14a522656426fb9e20b66a5968d1a /boot/bzroot

Hi there. Yes, it does indeed return the above:

root@IronCloud:~# md5sum /boot/bzroot
c1a14a522656426fb9e20b66a5968d1a /boot/bzroot

Thanks,
Jon

limetech (Author) Posted March 8, 2017

I guess for completeness you can check 'em all:

5a4d270d192c0573bb78af92220e149b bzimage
c1a14a522656426fb9e20b66a5968d1a bzroot
f65c0917efe04edf5b91528c3c7eb1d1 bzroot-gui

JonUKRed Posted March 8, 2017

4 minutes ago, limetech said:
I guess for completeness you can check 'em all:
5a4d270d192c0573bb78af92220e149b bzimage
c1a14a522656426fb9e20b66a5968d1a bzroot
f65c0917efe04edf5b91528c3c7eb1d1 bzroot-gui

root@IronCloud:~# md5sum /boot/bzroot
c1a14a522656426fb9e20b66a5968d1a /boot/bzroot
root@IronCloud:~# md5sum /boot/bzimage
5a4d270d192c0573bb78af92220e149b /boot/bzimage
root@IronCloud:~# md5sum /boot/bzroot-gui
f65c0917efe04edf5b91528c3c7eb1d1 /boot/bzroot-gui

Yes - all present and correct. It isn't the end of the world - I just thought it very odd...

Edited March 8, 2017 by JonUKRed

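Checking the three files one `md5sum` call at a time works, but the comparison against the posted values can also be done in one pass. The sketch below is an assumption-laden illustration rather than an official tool: the expected sums are the ones limetech posted for 6.3.2, while the function names `md5_of` and `check_boot_files` and the `/boot` default are chosen here for the example.

```python
import hashlib
from pathlib import Path

# MD5 sums posted in this thread for the 6.3.2 release files.
EXPECTED_MD5 = {
    "bzimage": "5a4d270d192c0573bb78af92220e149b",
    "bzroot": "c1a14a522656426fb9e20b66a5968d1a",
    "bzroot-gui": "f65c0917efe04edf5b91528c3c7eb1d1",
}


def md5_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large images are never read into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_boot_files(boot_dir="/boot", expected=EXPECTED_MD5):
    """Return {file name: 'ok' | 'MISMATCH' | 'missing'} for each expected file."""
    results = {}
    for name, wanted in expected.items():
        path = Path(boot_dir) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of(path) == wanted:
            results[name] = "ok"
        else:
            results[name] = "MISMATCH"
    return results


if __name__ == "__main__":
    for name, status in check_boot_files().items():
        print(f"{name}: {status}")
```

A matching checksum only rules out a corrupted download or flash write; as the rest of the thread shows, the boot problem can still lie elsewhere.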
JonathanM Posted March 8, 2017

30 minutes ago, limetech said:
I guess for completeness you can check 'em all:

Would it hurt anything or be worth the labor to do a checksum early in the boot process and log the results in syslog?

limetech (Author) Posted March 8, 2017

1 hour ago, JonUKRed said:
Yes - all present and correct. It isn't the end of the world - I just thought it very odd...

Please post your syslinux.cfg file.

G2-91305 Posted March 8, 2017

Hey guys, just want to say that I was having this problem with an Asus Z-170 Maximus VIII Hero. Tried everything and finally fixed it by updating to the latest BIOS on my board. Not sure if that will help others, but it's worth a shot for those of us on Z-170.

JonUKRed Posted March 9, 2017

10 hours ago, limetech said:
Please post your syslinux.cfg file.

OK, here is my syslinux.cfg file.

default /syslinux/menu.c32
menu title Lime Technology, Inc.
prompt 0
timeout 50
label unRAID OS
  menu default
  kernel /bzimage
  append initrd=/bzroot
label unRAID OS GUI Mode
  kernel /bzimage
  append initrd=/bzroot,/bzroot-gui
label unRAID OS Safe Mode (no plugins, no GUI)
  kernel /bzimage
  append initrd=/bzroot unraidsafemode
label Memtest86+
  kernel /memtest

10 hours ago, G2-91305 said:
Hey guys, just want to say that I was having this problem with an Asus Z-170 Maximus VIII Hero. Tried everything and finally fixed it by updating to the latest BIOS on my board. Not sure if that will help others, but it's worth a shot for those of us on Z-170.

OK - issue fixed. I have a Z270 MB, and after reading the above I thought I would try something. I knew I was running the latest BIOS, as it was updated very recently and both the support site and the MB told me so. Anyhow, I reset the MB to its default settings and, lo and behold, I can now boot straight to unRAID OS without any issue. So it wasn't that I was running an outdated BIOS, but a setting within the BIOS that was causing the problem. I will run some trial and error on the MB settings to try to recreate the problem and see whether I can isolate it. If I find it I will let you all know.

Thanks for your help!
Jon.

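Reading the posted syslinux.cfg entry by entry makes the difference between the boot modes easier to see: the regular and GUI entries use the same kernel, and the GUI entry simply loads one extra initrd file. The parser below is a rough sketch written for this one file, not a general syslinux parser; `boot_entries` and `SAMPLE_CFG` are names invented for the example.

```python
SAMPLE_CFG = """\
default /syslinux/menu.c32
menu title Lime Technology, Inc.
prompt 0
timeout 50
label unRAID OS
  menu default
  kernel /bzimage
  append initrd=/bzroot
label unRAID OS GUI Mode
  kernel /bzimage
  append initrd=/bzroot,/bzroot-gui
label unRAID OS Safe Mode (no plugins, no GUI)
  kernel /bzimage
  append initrd=/bzroot unraidsafemode
label Memtest86+
  kernel /memtest
"""


def boot_entries(cfg_text):
    """Map each 'label' to its kernel, its list of initrd files and any other append flags."""
    entries = {}
    current = None
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        keyword, _, rest = line.partition(" ")
        keyword = keyword.lower()
        if keyword == "label":
            current = entries.setdefault(
                rest.strip(), {"kernel": None, "initrd": [], "flags": []}
            )
        elif current is None:
            continue  # global directives such as 'default', 'prompt' or 'timeout'
        elif keyword == "kernel":
            current["kernel"] = rest.strip()
        elif keyword == "append":
            for token in rest.split():
                if token.startswith("initrd="):
                    # syslinux accepts several initrd files separated by commas
                    current["initrd"] = token[len("initrd="):].split(",")
                else:
                    current["flags"].append(token)
    return entries


if __name__ == "__main__":
    for label, entry in boot_entries(SAMPLE_CFG).items():
        print(f"{label}: kernel={entry['kernel']} initrd={entry['initrd']} flags={entry['flags']}")
```

Since the kernel and /bzroot are shared by both entries and the checksums matched, the config itself gives no obvious reason for only GUI mode to boot, which fits the later finding that a BIOS setting was involved.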
thither Posted March 9, 2017

17 hours ago, limetech said:
I guess for completeness you can check 'em all:
5a4d270d192c0573bb78af92220e149b bzimage
c1a14a522656426fb9e20b66a5968d1a bzroot
f65c0917efe04edf5b91528c3c7eb1d1 bzroot-gui

Just to confirm, I also see these same checksums on my Asus Z170 board, and my syslinux.cfg is the same as the one @JonUKRed posted above (and I'm also not able to boot into non-GUI mode). I don't have time for a BIOS upgrade now, but I'll try it sometime in the next few days and report back.

G2-91305 Posted March 9, 2017

7 hours ago, JonUKRed said:
OK, here is my syslinux.cfg file.
default /syslinux/menu.c32
menu title Lime Technology, Inc.
prompt 0
timeout 50
label unRAID OS
  menu default
  kernel /bzimage
  append initrd=/bzroot
label unRAID OS GUI Mode
  kernel /bzimage
  append initrd=/bzroot,/bzroot-gui
label unRAID OS Safe Mode (no plugins, no GUI)
  kernel /bzimage
  append initrd=/bzroot unraidsafemode
label Memtest86+
  kernel /memtest
OK - issue fixed. I have a Z270 MB, and after reading the above I thought I would try something. I knew I was running the latest BIOS, as it was updated very recently and both the support site and the MB told me so. Anyhow, I reset the MB to its default settings and, lo and behold, I can now boot straight to unRAID OS without any issue. So it wasn't that I was running an outdated BIOS, but a setting within the BIOS that was causing the problem. I will run some trial and error on the MB settings to try to recreate the problem and see whether I can isolate it. If I find it I will let you all know. Thanks for your help! Jon.

Glad that helped a bit, man!

Enver Posted March 13, 2017

On 20/02/2017 at 0:18 AM, sakh1979 said:
After I upgraded from 6.2.4 -> 6.3.2 I started seeing this error message in the log:
Feb 18 20:53:08 Tower emhttp: err: handleRequest: getpeername: Transport endpoint is not connected
Feb 18 20:53:08 Tower emhttp: err: handleRequest: getpeername: Transport endpoint is not connected
I only see them if I am connecting to unRAID via a Windows 10 laptop; connecting to unRAID with any other OS (Linux or OSX) does not give me this error message. Is there something I can do to prevent this error from showing up?

I am having the same error; please see my post here:

Have you had any progress on this issue?

JorgeB Posted March 14, 2017

Can anyone from LT (or anyone else) look at this thread? It's the second time I've seen this issue: when assigning a disk as parity, it looks like the partition is successfully created, but right after there's an invalid partition error and the array won't start:

Mar 13 16:13:19 Tower emhttp: writing GPT on disk (sde), with partition 1 offset 64, erased: 0
Mar 13 16:13:19 Tower emhttp: shcmd (585): sgdisk -Z /dev/sde &> /dev/null
Mar 13 16:13:19 Tower kernel: sde: sde1
Mar 13 16:13:20 Tower emhttp: shcmd (586): sgdisk -o -a 64 -n 1:64:0 /dev/sde |& logger
Mar 13 16:13:21 Tower root: Creating new GPT entries.
Mar 13 16:13:21 Tower root: The operation has completed successfully.
Mar 13 16:13:21 Tower kernel: sde: sde1
Mar 13 16:13:21 Tower emhttp: shcmd (587): udevadm settle
Mar 13 16:13:21 Tower emhttp: invalid partition(s)

BoHiCa Posted March 14, 2017

I just ran through a successful upgrade from a very stable 6.1.9 version to 6.3.2 via the "Plugin" update method. Smooth as silk!

This machine has no dockers currently configured (but dockers are enabled) and no VMs (the hardware can't handle it). The CPU is an Atom quad-core with 4 GiB RAM, there are 19 devices in the array, a single parity drive (actually a hardware RAID 1 in an enclosure off an eSATA port = parity drive) and bonded ethernet for fault-tolerance. The drives are mostly re-purposed laptop drives, to reduce power consumption.

System logs look clean (as in no errors), and all shares appear to be present and functioning nominally when accessed from Win 10 machines and Linux machines.

The only "oddity" I've noticed is with the report of the last parity check in the Main tab and the Dashboard tab. I checked this right before performing the upgrade, and it reported 0 errors from the prior parity check, which completed yesterday. Right after coming up in 6.3.2 and starting the array the UI reports this:

Last checked on Sun 12 Mar 2017 07:09:49 PM CDT (yesterday), finding errors.
Duration: 21 hours, 9 minutes, 48 seconds. Average speed: 26.3 MB/s

That is the usual parity check time and speed for this tiny box (motherboard SATA (6 drives) + eSATA (parity) and LSI HBA SATA controller on PCIe X8 (12 drives) + 4 cache SSDs, also on the LSI controller); it varies by minutes +/- every week like clockwork. Data = xfs, cache = btrfs.

I checked the /config/parity-checks.log file and the last entry from the last parity check is:

Mar 12 19:09:49|76188|26.3 MB/s|0

I'm assuming that there have been some changes to the format of the entries in parity-checks.log that explain the odd phrasing in the UI, but wanted to make sure before I rely on the integrity of the box again.

Great work guys!

Frank1940 Posted March 14, 2017

4 minutes ago, BoHiCa said:
I'm assuming that there have been some changes to the format of the entries in parity-checks.log that explain the odd phrasing in the UI, but wanted to make sure before I rely on the integrity of the box again.

I ran a Correcting Parity Check yesterday and this is the report on the Array Operation tab:

Last check completed on Sun 12 Mar 2017 03:09:01 PM EDT (yesterday), finding 0 errors.
Duration: 7 hours, 13 minutes, 18 seconds. Average speed: 115.4 MB/sec

Not really sure why you are seeing a truncated report, but this is what I found in the parity-checks.log file:

2017 Mar 12 15:09:01|25998|115.4 MB/s|0|0

Did you by any chance terminate it before it completed?

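The two parity-checks.log lines quoted above differ in shape: the older entry has four pipe-separated fields and no year, the newer one has five fields and a leading year, which fits the guess that the format changed between releases. In both, the second field matches the reported duration when read as seconds. The sketch below only relies on that observation; what the trailing fields mean is not stated anywhere in this thread, so they are passed through untouched, and the name `parse_parity_line` is made up for the example.

```python
def parse_parity_line(line):
    """Split one parity-checks.log entry and render the duration the way the UI does.

    Assumes the layout seen in this thread: timestamp|seconds|speed|<one or more fields>.
    The meaning of the trailing fields is unknown here, so they are returned as-is.
    """
    fields = line.strip().split("|")
    if len(fields) < 3:
        raise ValueError(f"unexpected parity-checks.log entry: {line!r}")
    seconds = int(fields[1])
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return {
        "when": fields[0],
        "duration": f"{hours} hours, {minutes} minutes, {secs} seconds",
        "speed": fields[2],
        "extra": fields[3:],
    }


if __name__ == "__main__":
    for entry in (
        "Mar 12 19:09:49|76188|26.3 MB/s|0",
        "2017 Mar 12 15:09:01|25998|115.4 MB/s|0|0",
    ):
        print(parse_parity_line(entry))
```

If 6.3.2 expects the five-field layout, an older four-field entry would leave it one value short, which could plausibly explain a report that says "finding errors" with no count; that is a guess, not something confirmed in the thread.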