dlandon

Community Developer
  • Posts

    10,398
  • Days Won

    20

Everything posted by dlandon

  1. I was finally able to reproduce some of the hangs people have experienced. I had to add code to handle some Linux commands that hang when a remote CIFS share server goes offline, including some commands that have nothing to do with the CIFS mounts. All Linux commands now have timeouts to prevent hangs. You'll see error messages in the log when a command times out. I may have to extend some timeouts for different systems. Anyone having the UD hang, please test and then give me diagnostics whether or not you have issues. I need to see if the timeouts are appropriate for all cases.
  2. Some log entries:

     Nov 28 18:09:56 Tower01 unassigned.devices: Unmount cmd: /bin/umount -fl '10.69.69.240://mnt/user/black_hole' 2>&1
     Nov 28 18:19:59 Tower01 kernel: nfs: server 10.69.69.240 not responding, still trying
     ### [PREVIOUS LINE REPEATED 1 TIMES] ###
     Nov 28 19:00:20 Tower01 unassigned.devices: Successfully unmounted '10.69.69.240://mnt/user/black_hole'

     It looks like the unmount of the NFS share is stuck. The '-fl' options ('-f' force, '-l' lazy) are supposed to force the unmount, but here it took about 50 minutes to complete. The new timeout on the 'umount' should help.
  3. When there is an unclean shutdown, a diagnostics file is created, if possible, at /flash/logs/. Please post it if there is one.
  4. Just copy it and that's all. If you reboot, the file will be replaced by the UD plugin when the array is started. The lib.php file is a test version I want tested by people who have issues with UD hanging. I'll release it in an updated plugin once I get some feedback on whether it is helping and not creating problems.
  5. Read the prior post. I'd like you to try that to see if that helps keep UD from hanging. After you copy the file, work with UD a bit to see if it hangs. Post diagnostics so I can see where things go wrong.
  6. For anyone having issues with UD appearing to hang, please extract and copy the attached lib.php file:

     cp lib.php /usr/local/emhttp/plugins/unassigned.devices/include/

     I've added timeouts to all shell_exec commands so that any command that hangs is terminated. You'll see a warning in the log when that happens. Hopefully UD won't hang any more.

     lib.zip
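The idea of putting a time limit on every shell command can be sketched roughly as below. This is not the code from lib.php (the plugin is written in PHP); it is only a small shell illustration built on the coreutils `timeout` command. The function name `run_with_timeout` and the log tag `ud-example` are made up for this example.

```shell
#!/bin/bash
# Hypothetical helper, not taken from the UD plugin.
# Runs a command with a time limit and logs a warning if the limit is hit.
run_with_timeout() {
    local limit="$1"   # seconds to wait before giving up
    shift
    # -s KILL: a hung command may ignore the default SIGTERM
    timeout -s KILL "$limit" "$@"
    local rc=$?
    # coreutils timeout exits with 137 (128+9) when it had to send KILL,
    # and with 124 when the default signal was enough
    if [ "$rc" -eq 137 ] || [ "$rc" -eq 124 ]; then
        if command -v logger >/dev/null 2>&1; then
            logger -t ud-example "Warning: '$*' timed out after ${limit}s"
        else
            echo "Warning: '$*' timed out after ${limit}s" >&2
        fi
    fi
    return "$rc"
}
```

A call such as `run_with_timeout 10 /bin/umount -fl <mountpoint>` would then give up after 10 seconds instead of blocking; the 10-second limit is an arbitrary value chosen for the example.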
  7. You mounted the Seagate disk. I don't see anything in the log, but I see issues in the SMART report:

     SATA Version is: SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
       7 Seek_Error_Rate        POSR-- 084 060 045 - 256581222
     195 Hardware_ECC_Recovered -O-RC- 079 064 000 - 87027788

     I am not an expert on disk drives, but this looks like a controller or cable issue. Disk problems can cause UD to hang if the Linux commands used to check the disk hang.
  8. Don't mount or unmount manually. Use the rc.unassigned script. It will manage things properly. The script runs when the drive is mounted, unmounted, or there are errors. The auto mount has nothing to do with the script execution.
  9. I'm still waiting for someone to supply some diagnostics so I can help troubleshoot. Can't offer any solutions until I can determine the problem.
  10. I don't understand your concern here. The $OWNER variable only tells you how the script was initiated. The UD script is just like any other bash script. The other variables are for your use in the script as you see fit. UD is open source, but it's written in PHP with some HTML for the web page. Linux commands are executed to get the device information needed to manage the UD devices.
  11. $OWNER is set by UD before the script is called. 'udev' means the device was physically plugged in or removed. 'user' means the device was mounted/unmounted from the UD GUI by clicking the 'Mount' or 'Unmount' button.
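A device script could branch on $OWNER along these lines. This is only a sketch: the function name `describe_trigger` and the messages are invented for the example, and only the two values 'udev' and 'user' come from the post above.

```shell
#!/bin/bash
# Sketch only: OWNER is set by UD before the script is called.
# describe_trigger is a made-up helper name for this example.
describe_trigger() {
    case "$1" in
        udev) echo "device was physically plugged in or removed" ;;
        user) echo "Mount/Unmount button was clicked in the UD GUI" ;;
        *)    echo "unknown trigger: ${1:-unset}" ;;
    esac
}

# ${OWNER:-} keeps the script from failing if it is run by hand without OWNER set
describe_trigger "${OWNER:-}"
```

Keeping the decision in one `case` statement makes it easy to do different work for a hot-plugged device than for a click in the GUI, for example skipping a backup when the mount was started manually.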
  12. This situation can be quite confusing. It is not necessarily a VM- or Docker-specific issue. It seems to relate to static IP addresses in VMs and Dockers. Everything appears to work, but the logging is extreme. I ran into this situation when I was testing the 6.8 beta. I was able to get around it by not using static IP addresses in Dockers. Others have been able to solve the issue with an added NIC or by setting up VLANs.

      When I first ran into this situation, I did a little research and the error comes from 'tun'. The developers implemented the log message because they felt that the network error that causes this message was important and the underlying error should be resolved, rather than just being ignored by turning off the logging.

      This issue came up for me on a specific version of the 6.8 beta. LT is researching this issue and what changed between the version that worked for me and the version where I ran into the problem. LT will eventually get it resolved. Just turning off the logging is probably not the right answer. Be patient while they work on it. They are not ignoring it! Don't get frustrated if LT does not respond to every post or PM. I'd rather they work on a solution and not spend a lot of time discussing the issue over and over.

      There are several options you have if you have this issue:
      • Do not use this release candidate and wait for a resolution.
      • Use another NIC for Dockers or VMs so they are separated.
      • Set up VLANs to isolate Dockers and VMs.
      • Remove static IP addresses from Dockers and/or VMs until the error logging stops.
  13. For those of you using Home Assistant, it looks like Home Assistant is now updated to work with this version of Zoneminder.
  14. Still happening:

      Nov 22 23:59:23 MediaServer kernel: tun: unexpected GSO type: 0x0, gso_size 125, hdr_len 179
      Nov 22 23:59:23 MediaServer kernel: tun: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
      Nov 22 23:59:23 MediaServer kernel: tun: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
      Nov 22 23:59:23 MediaServer kernel: tun: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
      Nov 22 23:59:23 MediaServer kernel: tun: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................

      I have two Windows VMs running with static IP addresses. I have nine Dockers running, and when I set one of the Dockers to a static IP, the errors start.
  15. Please remove the preclear plugin and post diagnostics as requested. Also change the disk with the duplicate UUID.
  16. I, as well as many other users, have remote mounted SMB shares. I have two to an out-of-state backup server and have not had any issues with UD hanging. There are some common things I have found with UD hanging:
      • The preclear plugin. The preclear plugin has a background task called rc.diskinfo that gathers information about all UD disks needed to perform preclears. There have been situations where having the preclear plugin installed causes issues.
      • Using Jumbo frames on a LAN when mounting remote SMB shares. Jumbo frames are for network experts and require a very particular setup to work properly. This involves NICs, switches, and routers on the LAN. Jumbo frames will not add any noticeable performance to your LAN and are not worth the headaches.
      • Remote mounted SMB shares seem to have more issues than NFS. If possible, use NFS for remote mounted shares.

      UD pings the remote server to see if it is online before performing any queries to the remote mounts, and does a lot to try to prevent hanging. Timeouts on commands are only a part of that. Let's concentrate on troubleshooting your particular situation rather than commenting on my programming skills.
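The "ping first, then query" approach described above could look roughly like this in shell. It is an illustration only and not UD's actual code; the function names `is_online` and `run_if_online` and the 2-second ping wait are assumptions made for the example.

```shell
#!/bin/bash
# Hypothetical helpers, not taken from the UD plugin.

# Returns 0 if the host answers a single ping within 2 seconds.
is_online() {
    ping -c 1 -W 2 "$1" >/dev/null 2>&1
}

# Runs the given command only if the server answered the ping.
run_if_online() {
    local server="$1"
    shift
    if is_online "$server"; then
        "$@"
    else
        echo "$server did not answer ping, skipping: $*" >&2
        return 1
    fi
}

# Example call (server name and mount point are placeholders):
#   run_if_online backupserver df -P /path/to/remote/mount
```

The point of the check is that a query against a mount whose server is offline is never started, so there is nothing left to hang.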
  17. You have multiple problems you need to sort out:

      Nov 22 00:49:04 Headless kernel: XFS (sdi1): Filesystem has duplicate UUID a2e64706-14cd-44b6-b7b2-e1ec18f0e97e - can't mount
      Nov 22 00:49:04 Headless unassigned.devices: Mount of '/dev/sdi1' failed. Error message: mount: /mnt/disks/WCC132184639: wrong fs type, bad option, bad superblock on /dev/sdi1, missing codepage or helper program, or other error.
      Nov 22 00:49:31 Headless rc.diskinfo[3938]: SIGHUP received, forcing refresh of disks info.

      Looks like preclear is installed. I've already asked you to remove it.

      NOTES "Filesystem has duplicate UUID" is not exactly true - the whole reason that the drive is mated with this system (as the 'source') is because if it was on the 'object' system IT WOULD have a duplicate UUID, this drive and the one that replaced it on that system.

      This is not a UD issue. You need to sort it out.

      Nov 21 20:55:04 Tower kernel: ata14.00: exception Emask 0x32 SAct 0x0 SErr 0x0 action 0xe frozen
      Nov 21 20:55:04 Tower kernel: ata14.00: irq_stat 0xffffffff, unknown FIS 00000000 00000000 00000000 00000000, host bus
      Nov 21 20:55:04 Tower kernel: ata14.00: failed command: READ DMA EXT
      Nov 21 20:55:04 Tower kernel: ata14.00: cmd 25/00:00:20:49:80/00:03:b6:00:00/e0 tag 29 dma 393216 in
      Nov 21 20:55:04 Tower kernel: res 50/00:00:1f:4c:80/00:00:b6:00:00/e6 Emask 0x32 (host bus error)
      Nov 21 20:55:04 Tower kernel: ata14.00: status: { DRDY }
      Nov 21 20:55:04 Tower kernel: ata14: hard resetting link
      Nov 21 20:55:05 Tower kernel: ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
      Nov 21 20:55:05 Tower kernel: ata14.00: configured for UDMA/133
      Nov 21 20:55:05 Tower kernel: ata14: EH complete

      You have a cable, controller, or disk problem.

      Nov 22 00:10:19 Tower kernel: CIFS VFS: BAD_NETWORK_NAME: \\HEADLESS\OCZ-AGILITY3
      Nov 22 00:10:19 Tower kernel: reconnect tcon failed rc = -2

      I have no idea what this is about. Reboot your servers and run them for a while, then post diagnostics. Don't go to the UD page until you gather diagnostics. It's better to post information such as log snippets as .txt and not .rtf.
  18. I really don't appreciate the capitalization emphasis - it appears you are yelling. You don't need to come across with that kind of attitude to get help with your problem. I volunteer my time in maintaining UD, and I respond the best I can to all requests. I appreciate it can be frustrating when things don't work as you'd like. It's frustrating trying to help everyone with an issue when I can't reproduce the issue and users don't post diagnostics so I can try to find the problem. I need diagnostics to help you.

      UD cannot be restarted. Just restarting UD, even if it were possible, would be a terrible solution. Let's find out why it "hangs" and fix that.

      UD is a GUI that displays information about unassigned devices, and mounts/unmounts disk drives. UD has been known to "hang" in the past when there were issues with getting device information. The worst are CIFS devices (NFS and SMB remote mounted devices). When they are offline, the 'df' command hangs and will not terminate, even with an error. Even 'df' commands on hard drives will hang if the remote CIFS mount is included and offline. I've had to limit the 'df' commands to specific devices and not use the generic 'df' on all devices. Timeouts have been implemented on all device operations that can potentially "hang". That's the best I can do because I did not write the 'df' command.

      If you have the preclear plugin installed, please remove it. Preclear also works with unassigned disks in the background and it can sometimes cause a hang when UD is also trying to work with the unassigned devices.

      CIFS mounts are dependent on solid network operation. Do not use Jumbo frames. If you are, change back to the default MTU values in all NICs, switches, routers, etc.

      If you continue to have problems, post your diagnostics. Please don't come across as yelling. It doesn't inspire me to help you, and will actually annoy me enough that I won't be interested in helping you.
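Limiting 'df' to specific mount points, each with its own time limit, might be sketched as follows. This is not UD's implementation; the function name `df_each` and the 5-second limit are assumptions for the example.

```shell
#!/bin/bash
# Hypothetical helper, not taken from the UD plugin.
# Queries each mount point on its own so that one unresponsive mount
# does not block the results for the others.
df_each() {
    local mp out
    for mp in "$@"; do
        # 5 seconds is an arbitrary limit chosen for this sketch
        if out=$(timeout -s KILL 5 df -P "$mp" 2>/dev/null); then
            # Print only the data line, not the header
            printf '%s\n' "$out" | tail -n 1
        else
            echo "df on $mp failed or timed out" >&2
        fi
    done
}
```

Calling `df_each` with two mount points prints one line per mount that responded, and a warning on stderr for any that failed or hit the limit.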
  19. You need to fix this. Look at this post and see if it applies to your situation: