February 26, 2015
> On b14b, I got lost communications even when not doing pci-back and booting in Xen. Xen is taking over control of the hardware and I suspect that the apcups daemon is trying to attach a USB port and Xen is not ready for that.
I'm booting in Xen mode, but with no VMs running - apcups is running fine. Do I need to (auto?)start a VM in order to provoke this problem?
February 26, 2015 Author
I had the problem without auto start.
February 26, 2015
I upgraded to b14b and I cannot get UPS communications even when I just pass through the iGPU. On b14 I could get the UPS communications to work if I did not auto start a VM. Now, with all VMs set to not auto start and pci back set in the syslinux.cfg, I get no UPS communications. The USB keyboard and mouse connected to my server do not come on once unRAID is booted. It appears to me that the USB device drivers are not getting loaded to enable the USB ports, or are getting clobbered. Syslog attached.
EDIT: Apcupsd is currently started on the 'driver_loaded' event. Is that too soon? The plugin started apcupsd when the plugin was started. Would it be better to start apcupsd when the array is started?
Wouldn't it be better to start it as early as possible, but include some 'wait until safe' logic at the top of the installer/loader? Could be as simple as 'if xen sleep 10', or better 'wait while unready condition detected, then proceed'. Then others without the specific race conditions could be protected as early as possible.
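The 'wait while unready condition detected, then proceed' idea can be sketched in shell. This is only an illustration, not the unRAID startup code: the function name `wait_for`, the 10-second timeout, and the readiness check in the usage comment (looking for the APC ID 051d:0002 in `lsusb` output) are assumptions chosen for the example.

```shell
#!/bin/sh
# wait_for CHECK TIMEOUT
# Hypothetical helper: re-runs the CHECK command string once per second until
# it succeeds or TIMEOUT seconds have passed.
# Returns 0 as soon as CHECK succeeds, 1 if the timeout expires first.
wait_for() {
    check="$1"
    timeout="$2"
    waited=0
    until eval "$check"; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Example usage (assumption: the UPS shows up in lsusb once the port is ready):
#   if wait_for "lsusb | grep -q '051d:0002'" 10; then
#       /etc/rc.d/rc.apcupsd start
#   fi
```

Unlike a fixed 'sleep 10', a loop like this proceeds as soon as the condition is met, so systems without the race are not delayed, and it gives up after a bounded time instead of hanging the boot.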
February 26, 2015
dlandon, a few things to do for me:
1) After booting, navigate to the webGui under Tools -> System Devices. Please copy and paste the contents of the USB Devices section here.
2) Are you using any USB hubs / port replicators in your setup, or are all the USB devices you use directly attached?
3) Can you try changing the USB port you use for the APC UPS? Specifically, try switching between USB 3 and USB 2 (ports colored in blue are USB 3).
February 26, 2015 Author
> dlandon, a few things to do for me: 1) After booting, navigate to the webGui under Tools -> System Devices. Please copy and paste the contents of the USB Devices section here.

Normal boot - USB 3.0:
USB Devices
Bus 004 Device 002: ID 8087:8000 Intel Corp.
Bus 004 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 003 Device 002: ID 8087:8008 Intel Corp.
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 004: ID 174c:3074 ASMedia Technology Inc.
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 003: ID 0781:5530 SanDisk Corp. Cruzer
Bus 001 Device 005: ID 051d:0002 American Power Conversion Uninterruptible Power Supply
Bus 001 Device 004: ID 045e:00cb Microsoft Corp. Basic Optical Mouse v2.0
Bus 001 Device 002: ID 174c:2074 ASMedia Technology Inc.
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

Boot with no communications - USB 3.0:
USB Devices
Bus 004 Device 002: ID 8087:8000 Intel Corp.
Bus 004 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 003 Device 002: ID 8087:8008 Intel Corp.
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 004: ID 174c:3074 ASMedia Technology Inc.
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 002: ID 0781:5530 SanDisk Corp. Cruzer
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

Boot that would normally fail (but works) - USB 2.0:
USB Devices
Bus 004 Device 002: ID 8087:8000 Intel Corp.
Bus 004 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 003 Device 002: ID 8087:8008 Intel Corp.
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 004: ID 174c:3074 ASMedia Technology Inc.
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 003: ID 0781:5530 SanDisk Corp. Cruzer
Bus 001 Device 006: ID 045e:00cb Microsoft Corp. Basic Optical Mouse v2.0
Bus 001 Device 002: ID 174c:2074 ASMedia Technology Inc.
Bus 001 Device 005: ID 051d:0002 American Power Conversion Uninterruptible Power Supply
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

> 2) Are you using any USB hubs / port replicators in your setup or are all the USB devices you use directly attached?
All directly attached.

> 3) Can you try changing the USB port you use for the APC UPS? Specifically try switching between USB 3 and USB 2 (ports colored in blue are USB 3).
See above. USB 2.0 port works when it would normally fail.

Move UPS to USB 3.0 port after boot and works.
USB Devices
Bus 004 Device 002: ID 8087:8000 Intel Corp.
Bus 004 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 003 Device 002: ID 8087:8008 Intel Corp.
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 004: ID 174c:3074 ASMedia Technology Inc.
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 003: ID 0781:5530 SanDisk Corp. Cruzer
Bus 001 Device 010: ID 051d:0002 American Power Conversion Uninterruptible Power Supply
Bus 001 Device 011: ID 045e:00cb Microsoft Corp. Basic Optical Mouse v2.0
Bus 001 Device 002: ID 174c:2074 ASMedia Technology Inc.
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
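Comparing listings like these by eye is error-prone, so here is a small shell sketch that reports which vendor:product IDs appear in a listing from a good boot but not in one from a failing boot. The function names (`usb_ids`, `usb_present`, `missing_ids`) and the file names in the usage comment are hypothetical; the only assumption about the input is that each device line contains `ID xxxx:yyyy`, as in the pasted output.

```shell
#!/bin/sh
# usb_ids: read a USB device listing on stdin, print the unique vendor:product IDs.
usb_ids() {
    grep -oE 'ID [0-9a-fA-F]{4}:[0-9a-fA-F]{4}' | cut -d' ' -f2 | sort -u
}

# usb_present ID: succeed if the listing on stdin contains the given ID.
usb_present() {
    grep -q "ID $1"
}

# missing_ids GOOD_FILE BAD_FILE: print IDs found in GOOD_FILE but not in BAD_FILE.
missing_ids() {
    usb_ids < "$1" | while read -r id; do
        grep -q "ID $id" "$2" || echo "$id"
    done
}

# Example usage (file names are placeholders for saved copies of the listings):
#   lsusb > good-boot.txt        # after a boot where the UPS communicates
#   lsusb > bad-boot.txt         # after a boot with lost communications
#   missing_ids good-boot.txt bad-boot.txt
#   lsusb | usb_present 051d:0002 && echo "UPS is visible"
```

Run against the "Normal boot" and "Boot with no communications" listings, a comparison like this would flag the UPS (051d:0002), the mouse (045e:00cb), and the ASMedia device (174c:2074) as missing, which matches the report that more than just the UPS disappears.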
February 26, 2015
Can you repeat the test a few times, rebooting with the UPS attached to USB 2? Curious if the one time it worked was a fluke or if we are on to something.
February 26, 2015 Author
Test 1: 3 reboots with UPS plugged into USB 2.0 - works.
Test 2: Boot with UPS unplugged. Plug in UPS to USB 3.0 after unraid is booted - works.
February 26, 2015
> I'm booting in Xen mode, but with no VMs running - apcups is running fine. Do I need to (auto?)start a VM in order to provoke this problem?
> I had the problem without auto start.
... in which case I would say that it's not a simple Xen/apcupsd interaction - there must be something else involved.
Edit: I don't have any usb3 ports.
February 26, 2015 Author
I stand by my original thought. The apcupsd daemon is being started too early in the sequence.
February 26, 2015
> Test 1: 3 reboots with UPS plugged into USB 2.0 - works. Test 2: Boot with UPS unplugged. Plug in UPS to USB 3.0 after unraid is booted - works.
This suggests that there is a race condition with USB 3.0 ports getting initialised. For the particular issue of the UPS, a workaround may be to simply avoid USB 3 ports. However, I can see the day coming when USB 3.0 ports are all that new systems have, so a more generic solution is really required. I wonder if this is a known issue to upstream Linux developers?
February 26, 2015 Author
This was never a problem with the apcupsd plugin. The apcups daemon was started when the plugin was installed, which was late in the startup cycle. It is not a complex issue. The one test I did was to leave the UPS unplugged until the array started and then plug it into a USB 3.0 port. Communications worked fine. If there was an interaction, it would still fail.
February 26, 2015 Author
> This suggests that there is a race condition with USB 3.0 ports getting initialised. For the particular issue of the UPS, a workaround may be to simply avoid USB 3 ports. However, I can see the day coming when USB 3.0 ports are all that new systems have, so a more generic solution is really required. I wonder if this is a known issue to upstream Linux developers?
The problem is that the USB ports go offline when this problem occurs. Imagine if your flash is installed in a USB 3.0 port and it goes offline. Not good. Avoiding USB 3.0 ports is not the answer.
February 27, 2015
I would suggest, and I mean no disrespect to you, dlandon, that to move the invocation of apcupsd without fully understanding the real cause of this problem would be a knee-jerk reaction. It would seem that, for the time being at least, there is a simple work-around - until someone acquires a mobo which is only equipped with USB3 ports.
I don't like to argue against you, because you have done so much for the community, and it seems that you are passionate about this issue. However, until the cause is understood, we cannot know that simply moving the invocation of apcupsd will effect a permanent, and truly effective, cure.
February 27, 2015
So we have a short term solution then for dlandon and anyone else that runs into this issue until we have time to implement a more proper fix. Eric and I pored over your syslog today and found a few USB errors but nothing that pointed to a root cause. It's incredibly odd, to say the least, and it's going to take some more digging.
I agree with PeterB, though, that to change the daemon to initialize at array start is not the right solution. The entire purpose of this is to ensure the server is safely shut down, right? I realize that without the array started, power loss is less serious, but it's still an unclean shut down.
February 27, 2015
> I realize that without the array started, power loss is less serious, but it's still an unclean shut down.
The unclean shutdown is one thing but, perhaps more serious is that, without apcupsd active, the UPS will continue to run until its battery is depleted. This can be detrimental to battery health.
February 27, 2015
Excellent point! The battery is the most expensive part of the UPS, so extra wear and tear here should be avoided wherever possible.
February 27, 2015 Author
I ran my tests on my test server, which has the same motherboard as my main server but a different processor. The test server is a Pentium and the main server is a Xeon. Thinking that I could just move the UPS to a USB 2.0 port for the time being, I unplugged the UPS and booted b14b. The USB ports were lost even when the UPS was not plugged in. I cannot boot b14b without losing USB ports. The Xeon boots faster? So I would submit that the potential is there for anyone with USB 3.0 ports that doesn't use a UPS to also have problems. If the flash is plugged into a USB 3.0 port, the system will fail when the flash goes offline.

> So we have a short term solution then for dlandon and anyone else that runs into this issue until we have time to implement a more proper fix.
I really don't have a short term fix. I can't move off of b14.

> I agree with PeterB though that to change the daemon to initialize at array start is not the right solution. The entire purpose of this is to ensure the server is safely shut down, right?
Really? The concern is an unprotected array that is not started? This situation occurs when unraid is started and the array does not start. How often does that happen unintentionally? And aren't there bigger problems anyway if the array doesn't start?

> I realize that without the array started, power loss is less serious, but it's still an unclean shut down.
How do you get an unclean shutdown if the array has not been started? Disks are not mounted, a parity check will not be started on the next boot, and the ram file system is temporary and gets re-loaded on boot. Maybe not ideal, but what is unclean?

Ok, so maybe the choice of array_start is not the best choice, but please understand I am not privy to the inner workings of unraid. It's the best I could come up with. I'm sure LT can come up with the proper invocation of apcupsd so the USB ports are not clobbered, and it will start regardless of whether or not the array starts.

How about you test my theory and start apcupsd at a later time in the sequence? Maybe at the same time as Dockers are started? Create an image I can test and let's kill this bug. Spending hours to understand this problem may not be the best use of your time and you will probably not find the reason easily.

I know I'll sound like an old fart here, but I've been working with computers since 1974. I cut my teeth on an Intel 8008. I've worked with microcomputers, mainframes, and minicomputers. You develop a sense after a while of what is happening without necessarily having all the answers. I don't analyze things ad nauseam; I am more pragmatic and solve problems. There is an answer here that is pretty simple.

EDIT: It appears to be USB 3.0 ports that I lose.
February 27, 2015
> Thinking that I could just move the UPS to a USB 2.0 port for the time being, I unplugged the UPS and booted b14b. The USB ports were lost even when the UPS was not plugged in. I cannot boot b14b without losing USB ports. The Xeon boots faster?
Are you saying that you are losing both USB 2 and USB 3.0 ports? I was not sure from what you said.

> So I would submit that the potential is there for anyone with USB 3.0 ports that doesn't use a UPS to also have problems. If the flash is plugged into a USB 3.0 port, the system will fail when the flash goes offline.
I was wondering if the UPS issue is just a symptom of a deeper underlying issue? If so, and if exactly what it is can be identified, then it is likely that a decent short term fix/workaround can be found, and hopefully a robust long term one will also come along.
February 27, 2015 Author
Jonp,
Don't put any time into this right now. I am doing some more testing and think I have found the problem. I will post the results once I am convinced that I really have it. Right now I don't think it is an LT problem.
February 27, 2015 Author
Ok, after some more troubleshooting I have come up with "why is this happening" and the solution. This is one for the books!
First, I need to grovel a bit and say that I violated the first rule of troubleshooting issues in unraid - I did not strip out plugins. My bad. I did not think that there would be any issues involving the plugins since none of them changed.
So here is what I found. When the cache_dirs plugin is disabled on start up, there is no problem. When cache_dirs is enabled on start up, the problem occurs. I put a slight delay in the cache_dirs plugin just before cache_dirs is started and there is no problem.
I do not understand what cache_dirs has to do with the USB 3.0 ports, but there is an obvious interaction with apcupsd and cache_dirs. It probably did not occur with the apcupsd plugin because there was a different timing than with the package in the core unraid.
I have marked this as solved and will work with bonienl to come up with a final solution.
February 27, 2015
Thanks for tracking this one down and remaining vigilant in your testing!!
February 27, 2015
> I do not understand what cache_dirs has to do with the USB 3.0 ports, but there is an obvious interaction with apcupsd and cache_dirs. It probably did not occur with the apcupsd plugin because there was a different timing than with the package in the core unraid.
If cache_dirs is interacting with apcupsd, it suggests that they are being started at the same time. That is a little bizarre because I would expect cache_dirs to wait until the array is started - isn't it, after all, intended to cache the user share directories?
Hmmm, I've just looked at the cache_dirs settings and one of the options is 'Wait until array is online'. Now I'm confused!
February 27, 2015 Author
I believe you are correct about them starting at the same time. The invocation of apcupsd is in the background. I'm going to suggest that LT not invoke apcupsd in the background.
February 28, 2015 Author
I communicated with bonienl about a potential issue with the cache_dirs plugin and we confirmed that the plugin is starting at the array_mounted event as it should. The slight delay I added before cache_dirs started did solve the problem, but it was a fix for the symptom, not the cause. I decided to dig into it further and have found and tested the proper solution.

This issue has been a bit difficult, but I believe my hunch was partly correct on the start up of the apcups daemon. Currently the apcups daemon is started at the driver_loaded event and the invocation of the apcups daemon is sent to the background.

Currently:

    #!/bin/bash
    ( /usr/bin/php /usr/local/emhttp/plugins/apcupsd/apcupsdctl.php "autostart" ) &

I modified the bzroot of b14b and made the following change:

    #!/bin/bash
    /usr/bin/php /usr/local/emhttp/plugins/apcupsd/apcupsdctl.php "autostart"

Testing this on both my servers worked and there were no problems.

I would also suggest that LT modify the startup section of apcupsdctl.php to remove an unnecessary 5 second delay. This was added so the messages on the bottom of the UPS Settings web page would not scroll too fast. It is in fact not necessary and only adds 5 seconds to start up.

    function startapcupsd() {
      global $newline;
      echo("Starting apcupsd...$newline");
      exec_log("/etc/rc.d/rc.apcupsd start");
      sleep(5);  // <-------- remove this delay
      echo("Completed...$newline");
      sleep(1);
    }

I am marking this defect back to unsolved as a reminder to LT about this issue. I'm sure they are very involved right now. These are very straightforward changes and should require minimal testing by LT.

I do not understand how cache_dirs could have caused this problem to show up, but I surmise that while the apcups daemon was doing its job, cache_dirs stepped on it.
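The difference between the two invocations is whether the event script waits for the command to finish before anything else runs. A minimal shell sketch of that ordering effect follows; `slow_start` and `next_step` are hypothetical stand-ins (not part of unRAID or apcupsd), and the one-second sleep is an arbitrary placeholder for daemon startup time.

```shell
#!/bin/sh
# Hypothetical stand-in for a daemon start that takes a while to complete.
slow_start() {
    sleep 1
    echo "daemon ready"
}

# Hypothetical stand-in for whatever the boot sequence runs next.
next_step() {
    echo "next step"
}

# Background form, like "( ... ) &": the script moves on immediately,
# so next_step overlaps with (and here finishes before) slow_start.
run_background() {
    ( slow_start ) &
    next_step
    wait    # only so this demo does not exit before the background job prints
}

# Foreground form: next_step cannot begin until slow_start has returned.
run_foreground() {
    slow_start
    next_step
}

echo "--- background ---"
run_background
echo "--- foreground ---"
run_foreground
```

In the background form the two steps can overlap, which is the kind of window in which another startup task could run while the daemon is still initializing. The foreground form closes that window at the cost of a slightly longer event handler, but as noted in the thread it does not explain why the overlap is harmful.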
February 28, 2015
I'll make those changes, but realize this is only a band-aid solution. You really need to understand the timing interaction between these subsystems that is the root cause of the issue. Without knowing this and putting in a real fix, this problem will likely happen again.