XtremeOwnage

Everything posted by XtremeOwnage

  1. Sure - and to note, it has been rock solid since reconfiguring the networking.
  2. Since making those changes this morning, it's been rock solid. It appears my issue was related to networking, bridges, Docker, etc.
  3. Ran a full hardware diagnostics / memory scan, which came back clean. So, I re-updated to 6.12.4, moved Docker to a dedicated NIC following the instructions linked from the release notes, changed Docker from ipvlan to macvlan, and disabled bridging on the NICs. It has been running stable so far. Will post again this evening.
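     For anyone checking the same settings, something like this should do it (the custom network name "eth1" here is just a guess at your setup):

         # Confirm the custom Docker network is using the macvlan driver (network name is an assumption)
         docker network inspect eth1 --format '{{.Driver}}'   # expect: macvlan

         # Confirm bridging is off: the NICs should show up directly, with no br0 on top of them
         ip -brief link show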
  4. So far, it appears that may have resolved the issue. If it doesn't crop up again tomorrow, I will close this one.
  5. Rebooted the underlying host. Going to keep troubleshooting things on my side...
  6. Nope. I went and ate dinner after stopping the suspect containers. Tried to turn on Plex on the TV, and it's dead. Got back to observe: Unraid is completely unresponsive, using around 16 threads' worth of CPU. No services responding. No NFS, no SMB, no Docker containers responding. No SSH. Nothing.
  7. Issue still exists. It appears to be related to one of my Docker containers. I happened to have top running, and noticed a certain "java" process using a ton of CPU, so I stopped the container which uses Java on the backend. Will post updates if this turns out to be the problem.
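     For reference, here's roughly how to map a hot host-side process back to its container (12345 is a placeholder for whatever PID top shows):

         # Print each running container's host PID next to its name, then match against top's PID
         docker ps -q | xargs docker inspect --format '{{.State.Pid}} {{.Name}}' | grep 12345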
  8. I upgraded to 6.12.4 earlier this week. After upgrading, I went from a perfectly stable, reliable system to one that now just breaks randomly. The symptoms I notice include a maxed-out CPU. I am unable to SSH in, or even use the root console, while this is occurring. Disabled a few things, doubled the allocated resources. No VMs; only Docker, storage, NFS, and SMB. Going to downgrade to 6.12.3. Diagnostics are attached. tower-diagnostics-20230907-1750.zip
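     Since nothing is reachable once it locks up, something like this could be left running beforehand to keep evidence across a hard reset (logging to /boot is just my guess at a safe persistent spot; keep the interval long to spare the flash):

         # Append the top CPU consumers to the flash drive once a minute,
         # so there is a trail even after a hard reset (log path is an assumption)
         mkdir -p /boot/logs
         while true; do
             date >> /boot/logs/cpu-watch.log
             ps -eo pid,pcpu,comm --sort=-pcpu | head -5 >> /boot/logs/cpu-watch.log
             sleep 60
         done &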
  9. Marking this one as closed, as one of the recent RCs disabled exclusive shares when NFS is enabled, technically fixing the issue.
  10. Reproduced the issue.

          root@Tower:~# tail -n 40 /var/log/samba/log.winbindd-idmap
          [2023/06/11 21:30:34.502927, 0] ../../source3/winbindd/winbindd_dual.c:1957(winbindd_sig_term_handler)
            Got sig[15] terminate (is_parent=0)
          [2023/06/13 11:17:36.643885, 0] ../../source3/winbindd/winbindd_dual.c:1957(winbindd_sig_term_handler)
            Got sig[15] terminate (is_parent=0)
          [2023/06/13 11:17:49.141681, 0] ../../source3/winbindd/winbindd_dual.c:1957(winbindd_sig_term_handler)
            Got sig[15] terminate (is_parent=0)
          [2023/06/14 13:24:32.275543, 0] ../../source3/winbindd/winbindd_dual.c:1957(winbindd_sig_term_handler)
            Got sig[15] terminate (is_parent=0)
          [2023/06/14 13:26:44.716683, 0] ../../source3/winbindd/winbindd_dual.c:1957(winbindd_sig_term_handler)
            Got sig[15] terminate (is_parent=0)
          [2023/06/14 13:27:03.166807, 0] ../../source3/winbindd/winbindd_dual.c:1957(winbindd_sig_term_handler)
            Got sig[15] terminate (is_parent=0)
          [2023/06/14 13:27:26.754073, 0] ../../source3/passdb/pdb_smbpasswd.c:1251(build_sam_account)
            build_sam_account: smbpasswd database is corrupt! username media with uid 1003 is not in unix passwd database!

      I had just created a new user account and added it to a share.
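      Rough sketch of how to cross-check Samba's passdb against the unix passwd database (pdbedit ships with Samba; whether Unraid keeps these databases where it expects is an assumption):

          # List users known to Samba's passdb and flag any missing from the unix passwd database
          pdbedit -L | cut -d: -f1 | while read -r user; do
              getent passwd "$user" > /dev/null || echo "missing from unix passwd: $user"
          done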
  11. Yeah, I don't know how to explain this, but it only randomly spins up disks for a day or two after booting: a disk in the array (empty, unused), another disk in the array (also empty, unused), and a random, unmounted, unused disk not in any pool or array. This matches what I noticed before. To note, these images were taken on 6/14 @ 11:47am; the disks have not spun up in two days. Any ideas on that one?
  12. I... think I have solved the cause here. It appears restarting Samba "fixed" being able to su/ssh/sftp/etc.

          root@Tower:/etc/samba# /etc/rc.d/rc.samba restart
          Starting Samba:  /usr/sbin/smbd -D
                           /usr/sbin/nmbd -D
                           /usr/sbin/wsdd2 -d -4
                           /usr/sbin/winbindd -D
          root@Tower:/etc/samba# cd ..
          root@Tower:/etc# nano nsswitch.conf
          root@Tower:/etc# su root
          root@Tower:~# exit
          exit
          root@Tower:/etc# time id root
          uid=0(root) gid=0(root) groups=0(root),1(bin),2(daemon),3(sys),4(adm),6(disk),10(wheel),17(audio),281(docker)

          real    0m0.011s
          user    0m0.006s
          sys     0m0.003s
          root@Tower:/etc#

      I guess the remaining question is: what happened? Is it a bug? And should it be fixed? Edit: it appears some of the logs got cut off from earlier.
  13.     root@Tower:/etc/samba# tail -n 40 /var/log/samba/log.winbindd
          [2023/06/10 13:16:08.651602, 0] ../../source3/winbindd/winbindd.c:821(winbind_client_processed)
            winbind_client_processed: request took 61.024531 seconds
            [struct process_request_state] ../../source3/winbindd/winbindd.c:437 [2023/06/10 13:15:07.627039] ../../source3/winbindd/winbindd.c:618 [2023/06/10 13:16:08.651570] [61.024531] -> TEVENT_REQ_DONE (2 0))
            [struct winbindd_getpwnam_state] ../../source3/winbindd/winbindd_getpwnam.c:49 [2023/06/10 13:15:07.627044] ../../source3/winbindd/winbindd_getpwnam.c:108 [2023/06/10 13:16:08.651516] [61.024472] -> TEVENT_REQ_USER_ERROR (3 10483072397370982515))
            [struct wb_lookupname_state] ../../source3/winbindd/wb_lookupname.c:47 [2023/06/10 13:15:07.627053] ../../source3/winbindd/wb_lookupname.c:99 [2023/06/10 13:16:08.651513] [61.024460] -> TEVENT_REQ_USER_ERROR (3 10483072397370982515))
            [struct dcerpc_wbint_LookupName_state] librpc/gen_ndr/ndr_winbind_c.c:795 [2023/06/10 13:15:07.627069] librpc/gen_ndr/ndr_winbind_c.c:862 [2023/06/10 13:16:08.651511] [61.024442] -> TEVENT_REQ_DONE (2 0))
            [struct dcerpc_wbint_LookupName_r_state] librpc/gen_ndr/ndr_winbind_c.c:707 [2023/06/10 13:15:07.627073] librpc/gen_ndr/ndr_winbind_c.c:742 [2023/06/10 13:16:08.651508] [61.024435] -> TEVENT_REQ_DONE (2 0))
            [struct dcerpc_binding_handle_call_state] ../../librpc/rpc/binding_handle.c:370 [2023/06/10 13:15:07.627076] ../../librpc/rpc/binding_handle.c:520 [2023/06/10 13:16:08.651503] [61.024427] -> TEVENT_REQ_DONE (2 0))
            [struct dcerpc_binding_handle_raw_call_state] ../../librpc/rpc/binding_handle.c:148 [2023/06/10 13:15:07.627090] ../../librpc/rpc/binding_handle.c:203 [2023/06/10 13:16:08.651494] [61.024404] -> TEVENT_REQ_DONE (2 0))
            [struct wbint_bh_raw_call_state] ../../source3/winbindd/winbindd_dual_ndr.c:93 [2023/06/10 13:15:07.627093] ../../source3/winbindd/winbindd_dual_ndr.c:209 [2023/06/10 13:16:08.651490] [61.024397] -> TEVENT_REQ_DONE (2 0))
            [struct wb_domain_request_state] ../../source3/winbindd/winbindd_dual.c:506 [2023/06/10 13:15:07.627099] ../../source3/winbindd/winbindd_dual.c:744 [2023/06/10 13:16:08.651480] [61.024381] -> TEVENT_REQ_DONE (2 0))
            [struct wb_child_request_state] ../../source3/winbindd/winbindd_dual.c:202 [2023/06/10 13:16:08.646594] ../../source3/winbindd/winbindd_dual.c:305 [2023/06/10 13:16:08.651477] [0.004883] -> TEVENT_REQ_DONE (2 0))
            [struct tevent_queue_wait_state] ../../tevent_queue.c:351 [2023/06/10 13:16:08.646600] ../../tevent_queue.c:371 [2023/06/10 13:16:08.646604] [0.000004] -> TEVENT_REQ_DONE (2 0))
            [struct wb_simple_trans_state] ../../nsswitch/wb_reqtrans.c:375 [2023/06/10 13:16:08.646610] ../../nsswitch/wb_reqtrans.c:432 [2023/06/10 13:16:08.651469] [0.004859] -> TEVENT_REQ_DONE (2 0))
            [struct req_write_state] ../../nsswitch/wb_reqtrans.c:158 [2023/06/10 13:16:08.646612] ../../nsswitch/wb_reqtrans.c:194 [2023/06/10 13:16:08.646690] [0.000078] -> TEVENT_REQ_DONE (2 0))
            [struct writev_state] ../../lib/async_req/async_sock.c:267 [2023/06/10 13:16:08.646613] ../../lib/async_req/async_sock.c:373 [2023/06/10 13:16:08.646687] [0.000074] -> TEVENT_REQ_DONE (2 0))
            [struct resp_read_state] ../../nsswitch/wb_reqtrans.c:222 [2023/06/10 13:16:08.646694] ../../nsswitch/wb_reqtrans.c:275 [2023/06/10 13:16:08.651467] [0.004773] -> TEVENT_REQ_DONE (2 0))
            [struct read_packet_state] ../../lib/async_req/async_sock.c:480 [2023/06/10 13:16:08.646696] ../../lib/async_req/async_sock.c:568 [2023/06/10 13:16:08.651464] [0.004768] -> TEVENT_REQ_DONE (2 0))
            [struct resp_write_state] ../../nsswitch/wb_reqtrans.c:307 [2023/06/10 13:16:08.651528] ../../nsswitch/wb_reqtrans.c:344 [2023/06/10 13:16:08.651567] [0.000039] -> TEVENT_REQ_DONE (2 0))
            [struct writev_state] ../../lib/async_req/async_sock.c:267 [2023/06/10 13:16:08.651530] ../../lib/async_req/async_sock.c:373 [2023/06/10 13:16:08.651551] [0.000021] -> TEVENT_REQ_DONE (2 0))
          [2023/06/10 14:18:33.918514, 0] ../../source3/winbindd/winbindd.c:821(winbind_client_processed)
            winbind_client_processed: request took 60.084613 seconds
            [struct process_request_state] ../../source3/winbindd/winbindd.c:437 [2023/06/10 14:17:33.833877] ../../source3/winbindd/winbindd.c:618 [2023/06/10 14:18:33.918490] [60.084613] -> TEVENT_REQ_DONE (2 0))
            [struct winbindd_getpwnam_state] ../../source3/winbindd/winbindd_getpwnam.c:49 [2023/06/10 14:17:33.833881] ../../source3/winbindd/winbindd_getpwnam.c:108 [2023/06/10 14:18:33.918438] [60.084557] -> TEVENT_REQ_USER_ERROR (3 10483072397370982515))
            [struct wb_lookupname_state] ../../source3/winbindd/wb_lookupname.c:47 [2023/06/10 14:17:33.833888] ../../source3/winbindd/wb_lookupname.c:99 [2023/06/10 14:18:33.918435] [60.084547] -> TEVENT_REQ_USER_ERROR (3 10483072397370982515))
            [struct dcerpc_wbint_LookupName_state] librpc/gen_ndr/ndr_winbind_c.c:795 [2023/06/10 14:17:33.833898] librpc/gen_ndr/ndr_winbind_c.c:862 [2023/06/10 14:18:33.918432] [60.084534] -> TEVENT_REQ_DONE (2 0))
            [struct dcerpc_wbint_LookupName_r_state] librpc/gen_ndr/ndr_winbind_c.c:707 [2023/06/10 14:17:33.833901] librpc/gen_ndr/ndr_winbind_c.c:742 [2023/06/10 14:18:33.918429] [60.084528] -> TEVENT_REQ_DONE (2 0))
            [struct dcerpc_binding_handle_call_state] ../../librpc/rpc/binding_handle.c:370 [2023/06/10 14:17:33.833903] ../../librpc/rpc/binding_handle.c:520 [2023/06/10 14:18:33.918424] [60.084521] -> TEVENT_REQ_DONE (2 0))
            [struct dcerpc_binding_handle_raw_call_state] ../../librpc/rpc/binding_handle.c:148 [2023/06/10 14:17:33.833912] ../../librpc/rpc/binding_handle.c:203 [2023/06/10 14:18:33.918414] [60.084502] -> TEVENT_REQ_DONE (2 0))
            [struct wbint_bh_raw_call_state] ../../source3/winbindd/winbindd_dual_ndr.c:93 [2023/06/10 14:17:33.833914] ../../source3/winbindd/winbindd_dual_ndr.c:209 [2023/06/10 14:18:33.918411] [60.084497] -> TEVENT_REQ_DONE (2 0))
            [struct wb_domain_request_state] ../../source3/winbindd/winbindd_dual.c:506 [2023/06/10 14:17:33.833920] ../../source3/winbindd/winbindd_dual.c:744 [2023/06/10 14:18:33.918401] [60.084481] -> TEVENT_REQ_DONE (2 0))
            [struct wb_child_request_state] ../../source3/winbindd/winbindd_dual.c:202 [2023/06/10 14:18:33.913592] ../../source3/winbindd/winbindd_dual.c:305 [2023/06/10 14:18:33.918398] [0.004806] -> TEVENT_REQ_DONE (2 0))
            [struct tevent_queue_wait_state] ../../tevent_queue.c:351 [2023/06/10 14:18:33.913596] ../../tevent_queue.c:371 [2023/06/10 14:18:33.913599] [0.000003] -> TEVENT_REQ_DONE (2 0))
            [struct wb_simple_trans_state] ../../nsswitch/wb_reqtrans.c:375 [2023/06/10 14:18:33.913602] ../../nsswitch/wb_reqtrans.c:432 [2023/06/10 14:18:33.918389] [0.004787] -> TEVENT_REQ_DONE (2 0))
            [struct req_write_state] ../../nsswitch/wb_reqtrans.c:158 [2023/06/10 14:18:33.913603] ../../nsswitch/wb_reqtrans.c:194 [2023/06/10 14:18:33.913664] [0.000061] -> TEVENT_REQ_DONE (2 0))
            [struct writev_state] ../../lib/async_req/async_sock.c:267 [2023/06/10 14:18:33.913606] ../../lib/async_req/async_sock.c:373 [2023/06/10 14:18:33.913661] [0.000055] -> TEVENT_REQ_DONE (2 0))
            [struct resp_read_state] ../../nsswitch/wb_reqtrans.c:222 [2023/06/10 14:18:33.913666] ../../nsswitch/wb_reqtrans.c:275 [2023/06/10 14:18:33.918387] [0.004721] -> TEVENT_REQ_DONE (2 0))
            [struct read_packet_state] ../../lib/async_req/async_sock.c:480 [2023/06/10 14:18:33.913668] ../../lib/async_req/async_sock.c:568 [2023/06/10 14:18:33.918383] [0.004715] -> TEVENT_REQ_DONE (2 0))
            [struct resp_write_state] ../../nsswitch/wb_reqtrans.c:307 [2023/06/10 14:18:33.918448] ../../nsswitch/wb_reqtrans.c:344 [2023/06/10 14:18:33.918487] [0.000039] -> TEVENT_REQ_DONE (2 0))
            [struct writev_state] ../../lib/async_req/async_sock.c:267 [2023/06/10 14:18:33.918450] ../../lib/async_req/async_sock.c:373 [2023/06/10 14:18:33.918471] [0.000021] -> TEVENT_REQ_DONE (2 0))

      Well, I found the 60-second delay... Not sure what it's doing here, though.
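      If anyone wants to dig into what winbindd is actually waiting on, something like this should work (smbcontrol is standard Samba tooling; level 10 is a guess at what's informative):

          # Temporarily raise winbindd's log verbosity, reproduce the slow lookup, then lower it again
          smbcontrol winbindd debug 10
          time id root                              # reproduce the ~60 second lookup
          tail -n 100 /var/log/samba/log.winbindd   # inspect what it was doing during the stall
          smbcontrol winbindd debug 1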
  14. OK, got something useful.

          root@Tower:~# nano /etc/nsswitch.conf
          root@Tower:~# time id root
          uid=0(root) gid=0(root) groups=0(root),1(bin),2(daemon),3(sys),4(adm),6(disk),10(wheel),17(audio),281(docker)

          real    0m0.007s
          user    0m0.004s
          sys     0m0.003s
          root@Tower:~#

      I commented out "winbind" for passwd and group:

          passwd: files #winbind
          group: files # winbind

      Any ideas as to what that is, or why it is going so slowly? After making that change, SSH works normally. SU works normally...
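      If you'd rather toggle that without an editor, something like this (assuming the stock lines read "passwd: files winbind"; check your file first):

          # Back up, then comment winbind out of the passwd and group lines
          cp /etc/nsswitch.conf /etc/nsswitch.conf.bak
          sed -i -E 's/^(passwd|group):[[:space:]]+files[[:space:]]+winbind/\1: files #winbind/' /etc/nsswitch.conf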
  15.     root@Tower:/etc# time su NotAUser
          No passwd entry for user 'NotAUser'

          real    1m0.970s
          user    0m0.006s
          sys     0m0.010s

      So... it's taking one minute just to look up a user account.

          root@Tower:/etc# time id root
          uid=0(root) gid=0(root) groups=0(root),1(bin),2(daemon),3(sys),4(adm),6(disk),10(wheel),17(audio),281(docker)

          real    1m2.090s
          user    0m0.006s
          sys     0m0.003s
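      To isolate which NSS source eats the minute, glibc's getent can query a single service (how winbind answers this on Unraid is an assumption):

          # Query each NSS source separately; the slow one is the culprit
          time getent -s files passwd root     # should be near-instant
          time getent -s winbind passwd root   # suspect for the ~60 second hang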
  16. @KluthR Correction: after a few minutes, it logged in. Something appears to be going quite slowly.
  17. @KluthR

          root@Tower:/etc# cat ~/.bash_profile
          # console coloring for kool kids
          PS1='\[\033[01;32m\]\u@\h\[\033[00m\]:\[\033[01;34m\]\w\[\033[00m\]\$ '

          # if this is an xterm set the title to user@host:dir
          case "$TERM" in
          xterm*|rxvt*)
              PS1="\[\e]0;\u@\h: \w\a\]$PS1"
              ;;
          *)
              ;;
          esac

          # impersonate a user
          alias user="su -ls /bin/bash"
          alias v="ls -lA"

      Just kinda chilling there.
  18. It's not an SSH issue. I can't open ANY new sessions. Doesn't matter if it's from su root, trying to log into the console, anything. Restarting sshd was one of the first steps I tried.
  19. Another interesting thing I just noticed: I cannot log in through the root console either. It just kinda hangs there.
  20. Huh, guess my reply vanished... Nothing special in .profile or .bashrc:

          root@Tower:~# cat ~/.profile
          cat: /root/.profile: No such file or directory
          root@Tower:~# cat ~/.bashrc
          #!/bin/bash
          source /etc/profile
          source /root/.bash_profile

      Nothing odd in the passwd file either:

          root@Tower:~# cat /etc/passwd
          root:x:0:0:Console and webGui login account:/root:/bin/bash

      And bash has no issues opening new sessions:

          root@Tower:~# /bin/bash
          root@Tower:~# echo "hi"
          hi
          root@Tower:~# exit
          exit
          root@Tower:~#

      I did test using an account other than root earlier to rule out potential issues with the root account, with the same results produced.
  21. I found an odd bug, which I have so far reproduced twice. The symptoms: logging into SSH will just time out.

          Jun 10 10:07:18 Tower sshd[51491]: Connection from 10.100.10.3 port 2782 on 10.100.4.24 port 22 rdomain ""
          Jun 10 10:07:46 Tower sshd[53503]: Connection from 10.100.10.3 port 2790 on 10.100.5.2 port 22 rdomain ""
          Jun 10 10:08:46 Tower sshd[58123]: Connection from 10.100.10.3 port 2803 on 10.100.5.2 port 22 rdomain ""
          Jun 10 10:08:59 Tower sshd[58123]: Received disconnect from 10.100.10.3 port 2803:11: Session closed [preauth]
          Jun 10 10:08:59 Tower sshd[58123]: Disconnected from 10.100.10.3 port 2803 [preauth]
          Jun 10 10:09:15 Tower sshd[51491]: Received disconnect from 10.100.10.3 port 2782:11: Session closed [preauth]
          Jun 10 10:09:15 Tower sshd[51491]: Disconnected from authenticating user root 10.100.10.3 port 2782 [preauth]
          Jun 10 10:09:36 Tower sshd[62244]: Connection from 10.100.10.3 port 2812 on 10.100.5.2 port 22 rdomain ""
          Jun 10 10:09:43 Tower sshd[53503]: Connection closed by authenticating user root 10.100.10.3 port 2790 [preauth]
          Jun 10 10:09:54 Tower sshd[63481]: Connection from 10.100.10.3 port 2818 on 10.100.4.24 port 22 rdomain ""

      However, it will never connect. As well, "su root" (when run as root) also hangs. The logs, however, show a successful su:

          Jun 10 10:34:05 Tower su[51135]: Successful su for root by root
          Jun 10 10:34:05 Tower su[51135]: + /dev/pts/0 root:root

      When this issue occurs, there is nothing of interest in the dmesg output. The terminal from the web interface works fine, SMB and NFS work fine, and I am unable to identify anything else impacted aside from su and ssh/sftp. SFTP exhibits the same symptoms as SSH, which is expected. I can confirm restarting the server does correct this. Restarting the service using:

          root@Tower:~# /etc/rc.d/rc.sshd stop
          root@Tower:~# /etc/rc.d/rc.sshd start

      however, does not help. I will note, when this does occur, I am unable to su to ANY user account. The logs, however, do show success:

          Jun 10 10:41:06 Tower su[17603]: Successful su for scanner by root
          Jun 10 10:41:06 Tower su[17603]: + /dev/pts/0 root:scanner
          Jun 10 10:41:16 Tower su[18337]: Successful su for root by root
          Jun 10 10:41:16 Tower su[18337]: + /dev/pts/0 root:root
          Jun 10 10:41:19 Tower su[18640]: Successful su for root by root
          Jun 10 10:41:19 Tower su[18640]: + /dev/pts/0 root:root
          Jun 10 10:41:23 Tower su[18844]: Successful su for root by root
          Jun 10 10:41:23 Tower su[18844]: + /dev/pts/0 root:root

      Based on what I have seen, I don't think this is related to SSH itself; rather, it seems something is getting hung when creating a new session somewhere in Linux. I have attached diagnostics, and I don't have any need to reboot currently, so if you have any other steps you would like executed, please let me know, and I should be able to execute them while in this "broken" state. Diagnostics are attached. tower-diagnostics-20230610-1044.zip
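      If it would help while the box is still in this state, something like this should show where su actually blocks (strace isn't on stock Unraid; I'd pull it from NerdTools, so its availability is an assumption):

          # Follow forked children and timestamp every syscall, to see exactly where su stalls
          strace -f -tt -o /tmp/su-hang.trace su root
          # Afterwards, inspect the last syscalls before the hang
          tail -n 20 /tmp/su-hang.trace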
  22. The next time this issue reproduces, I'll grab diagnostics for ya. It seems to be an intermittent issue.
  23. So, here is something interesting. Nothing has changed, but after noon on May 21st, all of the disks stopped randomly spinning up (and everything is now working as expected?). Going to continue to monitor this, to see if I can figure out what exactly happened...
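      In case it starts again, a rough sketch I may leave running to timestamp spin-ups (/dev/sdb is a placeholder; hdparm -C checks power mode and shouldn't itself wake the drive):

          # Poll the drive's power state once a minute and log transitions,
          # so spin-ups can be correlated with other activity (/dev/sdb is a placeholder)
          prev=""
          while true; do
              state=$(hdparm -C /dev/sdb | awk '/drive state/ {print $NF}')
              [ "$state" != "$prev" ] && echo "$(date '+%F %T') /dev/sdb: $state" >> /tmp/spinup.log
              prev="$state"
              sleep 60
          done &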