bigjme

Everything posted by bigjme

  1. Quote:

     "Nope... What type of events are you getting?

     root@tmp-unraid:~# egrep -i 'fail|error' /var/log/syslog
     Jul 9 18:01:29 tmp-unraid kernel: ACPI: \: failed to evaluate _DSM (0x1001)
     root@tmp-unraid:~#"

     I get around one of these every few seconds:

     Jul 9 23:46:23 Archangel kernel: pcieport 0000:00:02.0: AER: Multiple Corrected error received: id=0010
     Jul 9 23:46:23 Archangel kernel: pcieport 0000:00:02.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0010(Receiver ID)
     Jul 9 23:46:23 Archangel kernel: pcieport 0000:00:02.0: device [8086:2f04] error status/mask=00000040/00002000
     Jul 9 23:46:23 Archangel kernel: pcieport 0000:00:02.0: [ 6] Bad TLP

     The last time I saw this was when I had an nvidia driver issue on a VM, but the only things to change before these started were the unRAID update and this driver being added. These are from the main unRAID system log.

     Jamie
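     Edit: for anyone hitting the same flood, a quick way to gauge how fast these are piling up - a rough sketch, assuming the stock syslog location:

     root@Archangel:~# grep -c 'Bad TLP' /var/log/syslog                    # total events logged so far
     root@Archangel:~# tail -f /var/log/syslog | grep --line-buffered AER   # watch new ones arrive live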
  2. Are you getting any issues reported in your system log? I get multiple per second with my 7805H.

     Jamie
  3. Quote:

     "Others may have more expert opinions on this, here are my thoughts:
     * Probably harmless, in that the AER function (Advanced Error Reporting) is detecting and correcting an issue.
     * But indirectly not so harmless, because at the current rate you're going to explode the logging space due to these, syslog growing at a rate just under 1MB per hour.
     * "Bad TLP" means a bad Transaction Layer packet was detected, by CRC error or wrong sequence number. "Bad DLLP" means a bad Data Link Layer packet was detected, same reasons. So these sound similar to CRC errors on SATA cables, except it's on a PCI bus or connection.
     * Suggestions:
       - you've started a parity check; that doesn't seem like a good idea at the moment - until this is figured out, I'd cancel it
       - try isolating it down to a particular PCI device - with no other drive activity happening, see if these errors occur when you read or write to a single drive, and cycle through all of your drives - then try I/O through all other PCI-attached devices, graphics units, USB devices; experiment at will and see what affects the production of these messages, positively or negatively.
     I can't say you'll actually pin the cause down, but you may eliminate a number of possibilities."

     Having just been in a game on both systems, I can say the error is not caused by my GPUs or the USB controller (no connected devices). The actual error coming up is from one of the onboard PLX chips which manages the PCI devices - this was a driver issue the last time I saw these types of errors, so perhaps it is an issue with the new Adaptec driver?

     I haven't seen any slowdowns or issues on the system but, as you say, the log file is ballooning in size and the main system log is so long now that Chrome starts to lock up when I open the page.

     The controller was plugged in before I did the update to this version that added the drivers, so that is the only thing I can think of, unless the drive cables could be suspect? In this instance my server is moving to a new case in the next 2 weeks, so it will be using a different cable and backplane - this cable did come from a working system, however that was running Windows.

     Jamie
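     Edit: if anyone wants to follow RobJ's isolation suggestion, this is roughly how to see what sits behind that root port - just a sketch, nothing unRAID-specific:

     root@Archangel:~# lspci -tv              # tree view: shows which devices hang off bridge 00:02.0
     root@Archangel:~# lspci -vv -s 00:02.0   # full dump of the port, including its AER capability and status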
  4. Quote:

     "Others may have more expert opinions on this, here are my thoughts: [...] I can't say you'll actually pin the cause down, but you may eliminate a number of possibilities." (same post quoted in full above)

     So I didn't see this till now, so the parity check did finish; after finishing, things seem to have calmed down and the errors appear at a much slower rate.

     This system has been on the beta a while, and the last time I saw these types of errors was when I had a bad nvidia driver in a VM. Neither VM has had new drivers, so that shouldn't be causing it.

     The USB controller has been connected for a while but has nothing plugged into it, which leads me to believe it may be the new Adaptec HBA that we just got drivers for.
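     Edit: if these corrected errors do turn out to be benign, I believe the kernel's pci=noaer parameter would at least stop them flooding the log - a workaround only, as it hides the reporting rather than fixing anything. On unRAID that would mean editing the append line in syslinux.cfg on the flash drive, something like:

     append pci=noaer initrd=/bzroot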
  5. I just decided to check my system logs, only to find thousands of error lines all like this:

     Jul 9 01:57:51 Archangel kernel: pcieport 0000:00:02.0: AER: Multiple Corrected error received: id=0010
     Jul 9 01:57:51 Archangel kernel: pcieport 0000:00:02.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0010(Receiver ID)
     Jul 9 01:57:51 Archangel kernel: pcieport 0000:00:02.0: device [8086:2f04] error status/mask=00000040/00002000

     Attached is my system diagnostics; I get multiple of these errors every second or so.

     Jamie

     archangel-diagnostics-20160709-0201.zip
  6. Or if they don't exist and your system is new, then create them? Surely something could be stored somewhere to say it was a fresh install or newly registered with a trial key?
  7. I've had my nvme drive in the cache array for a while now. I plan to move it out and put in some SSDs, as I want my nvme for VMs and the cache for caching on SSDs.
  8. Quote:

     "Nice! Thank you for the report."

     Not a problem. I also have to say I'm getting better performance, even during parity checks, with this driver and card than with my onboard controller, so a huge thanks! I may have to upgrade to a Pro license just because of this.
  9. I deleted that folder off my cache as well, but I thought I'd created that one tbh.... Doesn't seem so; I wonder if that's for VM snapshots in a future release. It seems those two are the only ones I didn't expect - I have an appdata share, but that was already there, and system. I don't plan to do a reboot for a while, so it will be interesting to see if they stay deleted or not.
  10. I had it as well. I'm not sure if this one has been around a while or not, but I have a "domains" share which is labeled as "saved VM instances" as well.

      Jamie
  11. Quote:

      "Can you please define the precise mechanism for key validation? Has it changed from the way it worked in the betas? The Internet connection here is almost as unreliable as our power supply - and when the Internet is down we often revert to watching a movie. However, if the server won't start up without Internet, then that will no longer be possible. Will this requirement for Internet still be present in 'Final'? If so, I will, regrettably, have to remain on v6.1.9 or investigate alternative storage solutions."

      This is in the beta and rc versions so Lime Tech can prevent usage of new installs after the full release is out - in case someone installs a beta by accident. The final release won't have this.
  12. I've done a quick search and can't find out for sure, but are nvme drives working fine? I know support was added in May, but has anyone tested it yet?

      Jamie
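      Edit: in case it helps anyone else check, this is roughly what I'd look for from the console to confirm the kernel is at least seeing the drive - a sketch, assuming a single nvme device:

      root@Archangel:~# ls -l /dev/nvme*                  # expect a controller node (nvme0) and a namespace (nvme0n1)
      root@Archangel:~# cat /sys/class/nvme/nvme0/model   # prints the drive's model string if it's detected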
  13. I would like to confirm the Adaptec driver worked a charm and my 7805H is working perfectly - no more SATA for me! Thanks for putting that in at such a late point; user requests for the win!

      Everything else seems fine, and I have been able to update my dockers without deleting the main image file.

      Jamie
  14. Indeed we do. Now to derail this topic even more and relight the torches! Any news, LT?

      Jamie
  15. As an update to this: after talking to LT, the drivers for these cards should be implemented in rc1 for 6.2. I have been told it is imminent, so I will post back once I have been able to test this.

      Jamie

      Edit: Support is now in for the 7805H in 6.2.0-rc1.
  16. Hopefully rc1 will resolve a lot of these issues, I heard it was "imminent" but that was 24 hours ago so yeh.... soon™
  17. I increased the value mentioned below in beta 22, and going to 23 I never changed it back. I did a 4TB transfer yesterday from my array to a Windows VM to a USB drive with no issues. This may fix your issue instantly, although people have reported it as not needed now.

      Jamie
  18. Do you have one of these cards already, ezhik? I'm not sure if this one will work, but I plan to try it later: https://lime-technology.com/forum/index.php?topic=40684.0

      It's for an Adaptec RAID card, but something similar may need to be done - I'm going to try this later after backing up my server drives, as I don't have any spares to plug in and try.

      Jamie

      Edit: As it's my week off work, I took apart my array, tried booting unRAID with different settings on the controller, and tried running lsblk and a few others, and the drives do not appear as a device at all - in fact, where some people see the controller as a block-level device, I see nothing. This rules out a controller setting and only leaves drivers - I think - I've seen quite a few posts from people wanting Adaptec drivers, but the majority of them are never replied to at all.
  19. Hi everyone,

      While there is a feature request for this now, I wanted to see if anyone had been able to get one of these controllers working with unRAID and just not posted about it. As it is an HBA I expected it to work, but I was wrong on that front - the device shows up in system devices correctly:

      06:00.0 Serial Attached SCSI controller [0107]: Adaptec PMC-Sierra PM8018 SAS HBA [series 7H] [9005:8088] (rev 06)

      On the array screen no drives are shown, although the motherboard does pick them up, so I know the controller works. Attached is my system diagnostic in case that helps answer any questions.

      Help on this would be great, as I would imagine there may be a driver I need to install on boot, or perhaps an HBA setting that needs changing?

      Jamie

      archangel-diagnostics-20160705-1323.zip
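      Edit: for anyone else debugging a card like this, checking whether any kernel module actually claimed the device seems like the obvious first step - a sketch; I'm assuming pm80xx is the relevant upstream driver for the PM8018, and it may simply not be in the stock build:

      root@Archangel:~# lspci -k -s 06:00.0    # no "Kernel driver in use:" line means nothing has bound to it
      root@Archangel:~# lsmod | grep -i pm80   # look for a loaded pm80xx/pm8001 module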
  20. Having just had one of these delivered thinking it would work, I would like to request this as well.

      Edit: For additional information, this is how the 7805H shows in system devices:

      06:00.0 Serial Attached SCSI controller [0107]: Adaptec PMC-Sierra PM8018 SAS HBA [series 7H] [9005:8088] (rev 06)

      It seems to pick up the correct name/information but just won't show the drives in the array selector - I did not try checking system devices with a drive connected to see if they were detected there at all. I have also attached my system diagnostic in case it holds extra information that may help.

      archangel-diagnostics-20160705-1323.zip
  21. Quote:

      "No, memcached is a performance-boosting tool that can be added to an app with a database. By its nature, it uses up memory, potentially a LOT of memory, in order to provide that boost. I believe it's best used when the database server is remote, caching objects locally in RAM that would otherwise be stored on drives elsewhere and accessed over limited network bandwidth. So I'm not sure why it's used here when it's all on your server, except perhaps to cache in RAM what's on your drives. My only experience (and it's indirect) is with Perl and MySQL, client-server. It's not a drop-in booster; it needs to be carefully hand-tooled into the application. When I have time (I don't right now), I'll take a look and add additional figures to the table. What seems obvious though is that Jamie is your memory hog, demanding much more than it perhaps should. As I said, troubleshooting Jamie is probably the right focus. If you have a way to disable memcached usage, that might be an interesting test. Or switch to a different application provider that doesn't use memcached. Another option is to disable ballooning for Jamie, since Jamie seems to want to grab any memory it can. Another option is to drop Cat to 6GB and Jamie to 10GB, both of which already seem excessive to me (but I'm old-school). That would free 4GB. Let's see if Jamie still wants to grab that 4GB for itself. It's possible that memcached is better used when it isn't sharing the available memory with other VMs. Perhaps it needs hard limits, which is why I'm suggesting trying without ballooning. I'm sure there are others here with more relevant experience though."

      The VM Jamie at one point had 16GB allocated to it, and the more it has allocated, the more it tries to take from the system, so I dropped it to 12GB and then down to 8 as you can see now. My VM for CCTV wasn't around when it was 16GB, however. I'm not very familiar with the VM terminology or how to change anything not in the web panel - how can I disable ballooning, and what effect will it have on the systems? The VM Jamie sits at 2.4GB memory usage almost all the time, so 8GB is plenty unless I start doing large image/video editing (web developer by trade).

      For my usual use I would use memcache to cache webpages served from Apache or Nginx, mainly when the websites are dynamic. To my knowledge I haven't enabled it on my servers, as I have no access to the php config, but it seems that opcache was enabled as well (again, nothing I enabled). I have altered my system to disable opcache on every page load, since I cannot control it directly, so this will remove it from the loop - although this hasn't altered the memory usage at all, I don't want it on anyway (good spot!).

      My system is right now reporting 26.8G Mem and 1.33G Swap used, and it doesn't appear to be taking more for that VM, so it hasn't just taken everything it can get its hands on. One thing that may be worth a try: if I increase the memory for VM Cat to 12GB, does it show the same memory usage issue as VM Jamie? It may help to rule out whether it is a possible OVMF issue.

      Thanks for your help again!

      Jamie
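      Edit: for anyone searching later - ballooning can be turned off per-VM by editing the libvirt XML directly. A sketch, assuming "Jamie" is the libvirt domain name and the standard virsh tooling:

      root@Archangel:~# virsh edit Jamie
      # in the <devices> section, change the balloon device to: <memballoon model='none'/>
      root@Archangel:~# virsh shutdown Jamie
      root@Archangel:~# virsh start Jamie      # the change takes effect on the next start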
  22. Quote:

      "I agree that may be useful"

      So after posting last night, I changed my VM Jamie to be allocated 8GB memory, the same as the VM Cat. I've just checked htop for both machines and here are the memory usages:

      VM1 Jamie: 12.9G RES, uptime 16 hours
      VM2 Cat: 10.4G RES, uptime 93 hours

      Both VMs have 8GB allocated, both have a GPU passed through, both run the same install of Windows (same disc, same date installed, even down to updates and antivirus), and both have a USB controller passed through (same make and model). The only differences between them are:

      VM1 is OVMF and has 4 passed-through USBs
      VM2 is Seabios and has 2 passed-through USBs

      Yet there is a 2.5G usage difference between the two.

      Jamie
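      Edit: htop inside the guest and RES on the host measure different things, so it may be worth comparing from the host side too - a sketch, assuming the domain names Jamie and Cat:

      root@Archangel:~# virsh dommemstat Jamie    # balloon figures as libvirt sees them (actual, rss, etc.)
      root@Archangel:~# virsh dommemstat Cat
      root@Archangel:~# ps -C qemu-system-x86_64 -o rss,args   # resident size of each QEMU process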
  23. Hi RobJ,

      Thanks for the reply. By memcached do you mean the swap file that is running? I'm not aware of anything else that may be causing something like that to run. To my understanding memcached is a php extension?

      My system boots at around 26GB memory usage with everything loaded. I've posted a few logs on the previous page where the system would crash in less than 24 hours due to memory issues; these should be small enough to show everything from the time of boot. If I disable the swap file plugin I am using, the VMs get so large that the system shuts things down to save memory and the system locks up. I am using around 2.4GB swap at all times, with all physical memory used.

      With regards to memory usage per VM, I posted some figures about these on the previous page as well. For example, if I shut down the VM Jamie and restart it, it will be using around 13GB at boot, which is fine. Over the next day or so it will grow to use almost 19GB, where it tends to stay.

      I am using beta 23, in case something new for VMs is in this one that I haven't seen yet.

      Thanks again RobJ for your reply.

      Jamie
  24. Ok, so I had waited, but things are getting stupid now - my emby docker cannot stream files or even serve artwork for videos because the server has no memory left.

      Below is a link to my latest server diagnostics; it is too large to post here but shows an outline of the changes in my system over the last 8 days of running. So, as an update:

      Yes, I am reliant on the swapfile plugin to keep the system running.
      Yes, I have had to reduce my main VM's memory allocation down to 8GB from 12GB just so the rest of my server is usable.
      In total, out of 32GB I can now only allocate 18GB to VMs before the system just swallows the rest of it and becomes a mess.

      Diagnostic link: http://www.cudamining.co.uk/archangel-diagnostics-20160701-2210.zip

      I know I'm a pest, but I could really do with this one being looked into, jonp, as if the server keeps chewing up my memory for nothing I'm going to have to start considering changing my OS - after having just got my VMs working after months of issues.

      Regards,
      Jamie
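      Edit: for anyone comparing against their own system, the host-side numbers above can be pulled with the usual tools - a trivial sketch:

      root@Archangel:~# free -m                      # overall RAM and swap picture
      root@Archangel:~# ps aux --sort=-rss | head    # top resident-memory processes on the host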
  25. So my system hasn't shut down or crashed since my last report (it is using 600M swap at the moment).

      My plan is to host a few of my log files online, as they will be too large to post on the forum directly. Will this be of any use if I just post the links, jonp? As I have debug mode on, I get a diagnostic every 30 minutes, so I can post them at intervals if needed, or just the latest and you can work backwards through the logs. This should outline the change in VM memory and everything else over a number of days, so you can see what changes.

      Regards,
      Jamie