P_K, posted January 21, 2017:

This morning I got a notification (from PlexPy) that my Plex server was down. I tried to open the unRAID web GUI to check, but it wouldn't load. I can SSH into the server, and top shows shfs using 100% CPU. Any ideas on how to bring my server back to normal? This is unRAID 6.2.4 with several Docker containers.

testdasi, posted January 21, 2017:

Do you have the cache_dirs plugin installed with its user share option turned on? I vaguely remember shfs using CPU every time it ran, but nowhere near 100%. What's your server spec?

P_K, posted January 21, 2017:

cache_dirs is installed, but I turned it off a long time ago as I had some issues with it. This is an i5 processor with 16 GB of memory.

trurl, posted January 21, 2017:

Tools - Diagnostics, post complete zip

P_K, posted January 21, 2017:

I restarted the server as the load in top went up to 50. The diagnostics files were taken after the reboot (before the reboot I couldn't access the GUI to take them). Hopefully they are still useful.

Attachment: tower-diagnostics-20170121-2053.zip

JonathanM, posted January 21, 2017:

Do you have any ReiserFS formatted drives?

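For anyone unsure how to answer that from a shell, here is a minimal sketch that lists mounted ReiserFS filesystems by reading /proc/mounts. The function name reiserfs_mounts is made up for illustration and is not an unRAID tool; the optional file argument is only there so the function can be pointed at a saved copy of the mount table.

```shell
# reiserfs_mounts [FILE]: print the mount point of every ReiserFS filesystem
# listed in FILE (default: /proc/mounts).
# Hypothetical helper for illustration, not part of unRAID.
reiserfs_mounts() {
    # /proc/mounts fields: device, mount point, filesystem type, options, ...
    awk '$3 == "reiserfs" { print $2 }' "${1:-/proc/mounts}"
}
```

Running `reiserfs_mounts` with no arguments on the server should print one mount point per ReiserFS disk, and nothing at all if there are none.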
P_K, posted January 22, 2017:

Yes, I do indeed. See attachment.

Marvel, posted February 13, 2017:

I have this issue as well. How can I kill the process and restart? Running kill on the PID is not working.

fireplex, posted February 15, 2017:

I have this issue also; I've never seen it before.

    top - 13:52:17 up 8 days, 14:25, 2 users, load average: 12.60, 11.40, 7.22
    Tasks: 303 total, 1 running, 302 sleeping, 0 stopped, 0 zombie
    %Cpu(s): 0.7 us, 25.7 sy, 0.0 ni, 73.6 id, 0.0 wa, 0.0 hi, 0.1 si, 0.0 st
    KiB Mem : 12065688 total, 265184 free, 4316796 used, 7483708 buff/cache
    KiB Swap: 0 total, 0 free, 0 used. 6720088 avail Mem

    PID  USER PR NI VIRT    RES   SHR S %CPU  %MEM TIME+     COMMAND
    2723 root 20 0  1371464 22564 816 S 101.0 0.2  103:54.83 shfs

GUI unresponsive, shares failing to respond. Nothing in the logs:

    Feb 15 12:56:35 Tower kernel: mdcmd (178): spindown 3
    Feb 15 12:57:20 Tower kernel: mdcmd (179): spindown 5
    Feb 15 13:11:12 Tower kernel: mdcmd (180): spindown 0
    Feb 15 13:37:39 Tower emhttp: shcmd (113218): mkdir '/mnt/user/Home Videos' |& logger
    Feb 15 13:37:39 Tower emhttp: shcmd (113219): chmod 0777 '/mnt/user/Home Videos'
    Feb 15 13:37:39 Tower emhttp: shcmd (113220): chown 'nobody':'users' '/mnt/user/Home Videos'
    Feb 15 13:37:52 Tower emhttp: shcmd (113248): smbcontrol smbd close-share 'Home Videos'
    Feb 15 13:38:05 Tower emhttp: shcmd (113276): smbcontrol smbd close-share 'Home Videos'
    Feb 15 13:42:17 Tower sshd[18890]: Accepted password for root from 192.168.1.31 port 49922 ssh2
    Feb 15 13:47:44 Tower sshd[21497]: Accepted password for root from 192.168.1.31 port 49998 ssh2

As you can see, I had just created a new share and was moving data from an existing share to the new share.

deadsoulz, posted February 16, 2017:

This is happening to me as well, 2 days after the latest update.

deadsoulz, posted February 16, 2017:

Seeing this in syslog.

    Feb 15 20:49:00 uNAS shfs/user: err: get_key_info: get_message: /boot/config/._Plus.key (-3)

Is there an easy way to roll back to 6.3 or 6.2.x from the command line?

John_M, posted February 16, 2017:

> Seeing this in syslog.
> Feb 15 20:49:00 uNAS shfs/user: err: get_key_info: get_message: /boot/config/._Plus.key (-3)

That's a red herring. I'd lay money on the fact that you prepared your boot device on a Mac and copied the licence key using the Finder. That created the ._Plus.key file alongside the original Plus.key file. unRAID just sees the file as a spurious alternative key file and rejects it. You might want to delete it:

    rm /boot/config/._Plus.key

> Is there an easy way to roll back to 6.3 or 6.2.x from the command line?

Yes:

    cp /boot/previous/bz* /boot

and then reboot.

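If the Finder left more than one of these "._" companion files behind, it may help to list them before deleting anything. The following is only a sketch: list_appledouble is a hypothetical helper name, and it assumes the stray files sit directly in /boot/config, as the one in the syslog message does.

```shell
# list_appledouble [DIR]: print the "._*" companion files that the macOS
# Finder leaves next to copied files, looking only directly inside DIR
# (default: /boot/config). Hypothetical helper; it lists and never deletes.
list_appledouble() {
    find "${1:-/boot/config}" -maxdepth 1 -type f -name '._*'
}
```

Running `list_appledouble` shows what is there; each file can then be removed with rm as above, once you have confirmed that the matching real file (for example Plus.key) still exists.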
DavejaVu, posted February 16, 2017:

I'm seeing the same thing, no cache_dirs plugin here, and yes there are ReiserFS shares. Seemed to happen when I tried to delete a bunch of files, but that may be coincidental.

    top - 16:15:19 up 4 days, 2:11, 1 user, load average: 7.02, 7.16, 7.17
    Tasks: 420 total, 2 running, 417 sleeping, 0 stopped, 1 zombie
    %Cpu(s): 0.5 us, 13.3 sy, 0.0 ni, 86.2 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
    KiB Mem : 16410152 total, 921220 free, 1721092 used, 13767840 buff/cache
    KiB Swap: 0 total, 0 free, 0 used. 13508808 avail Mem

    PID  USER PR NI VIRT    RES  SHR S %CPU %MEM TIME+     COMMAND
    6464 root 20 0  1031988 7600 744 S 99.7 0.0  150:20.69 shfs

Nothing interesting in syslog, GUI unresponsive:

    Feb 16 01:20:32 Tower kernel: mdcmd (151): spindown 6
    Feb 16 01:20:33 Tower kernel: mdcmd (152): spindown 7
    Feb 16 01:20:34 Tower kernel: mdcmd (153): spindown 8
    Feb 16 01:20:34 Tower kernel: mdcmd (154): spindown 9
    Feb 16 01:20:35 Tower kernel: mdcmd (155): spindown 10
    Feb 16 01:20:41 Tower kernel: mdcmd (156): spindown 11
    Feb 16 03:30:28 Tower kernel: mdcmd (157): spindown 3
    Feb 16 03:30:57 Tower kernel: mdcmd (158): spindown 4
    Feb 16 03:31:05 Tower kernel: mdcmd (159): spindown 11
    Feb 16 03:31:08 Tower kernel: mdcmd (160): spindown 0
    Feb 16 03:31:08 Tower kernel: mdcmd (161): spindown 2
    Feb 16 03:31:08 Tower kernel: mdcmd (162): spindown 6
    Feb 16 03:31:09 Tower kernel: mdcmd (163): spindown 7
    Feb 16 03:31:10 Tower kernel: mdcmd (164): spindown 8
    Feb 16 03:31:10 Tower kernel: mdcmd (165): spindown 9
    Feb 16 03:31:11 Tower kernel: mdcmd (166): spindown 10
    Feb 16 03:31:13 Tower kernel: mdcmd (167): spindown 1
    Feb 16 03:31:14 Tower kernel: mdcmd (168): spindown 5
    Feb 16 10:42:14 Tower in.telnetd[7649]: connect from 192.168.1.10 (192.168.1.10)
    Feb 16 10:42:21 Tower login[7650]: ROOT LOGIN on '/dev/pts/0' from '192.168.1.10'
    Feb 16 11:49:40 Tower kernel: mdcmd (169): spindown 1
    Feb 16 11:49:41 Tower kernel: mdcmd (170): spindown 5
    Feb 16 12:16:15 Tower kernel: mdcmd (171): spindown 4
    Feb 16 12:16:15 Tower kernel: mdcmd (172): spindown 6
    Feb 16 12:16:16 Tower kernel: mdcmd (173): spindown 7
    Feb 16 12:16:17 Tower kernel: mdcmd (174): spindown 8
    Feb 16 12:16:17 Tower kernel: mdcmd (175): spindown 9
    Feb 16 12:16:18 Tower kernel: mdcmd (176): spindown 10
    Feb 16 12:16:18 Tower kernel: mdcmd (177): spindown 11
    Feb 16 13:18:43 Tower kernel: mdcmd (178): spindown 1
    Feb 16 13:18:44 Tower kernel: mdcmd (179): spindown 5
    Feb 16 14:11:27 Tower in.telnetd[27488]: connect from 192.168.1.10 (192.168.1.10)
    Feb 16 14:11:31 Tower login[27489]: ROOT LOGIN on '/dev/pts/3' from '192.168.1.10'

DavejaVu, posted February 16, 2017:

Also - any suggestions on how to cleanly reboot at this point? Nothing seems to do anything: poweroff, powerdown, shutdown, reboot just give syslog messages:

    Feb 16 16:21:24 Tower shutdown[31150]: shutting down for system halt
    Feb 16 16:22:04 Tower in.telnetd[31777]: connect from 192.168.1.10 (192.168.1.10)
    Feb 16 16:22:10 Tower login[31779]: ROOT LOGIN on '/dev/pts/0' from '192.168.1.10'
    Feb 16 16:22:15 Tower shutdown[31948]: shutting down for system halt
    Feb 16 16:22:31 Tower in.telnetd[32156]: connect from 192.168.1.10 (192.168.1.10)
    Feb 16 16:22:35 Tower login[32158]: ROOT LOGIN on '/dev/pts/1' from '192.168.1.10'
    Feb 16 16:22:38 Tower root: /usr/local/sbin/powerdown has been deprecated
    Feb 16 16:22:38 Tower shutdown[32240]: shutting down for system halt
    Feb 16 16:23:36 Tower in.telnetd[659]: connect from 192.168.1.10 (192.168.1.10)
    Feb 16 16:23:39 Tower login[660]: ROOT LOGIN on '/dev/pts/4' from '192.168.1.10'
    Feb 16 16:23:49 Tower root: /usr/local/sbin/powerdown has been deprecated
    Feb 16 16:23:49 Tower shutdown[946]: shutting down for system reboot
    Feb 16 16:24:42 Tower in.telnetd[1878]: connect from 192.168.1.10 (192.168.1.10)
    Feb 16 16:24:46 Tower login[1879]: ROOT LOGIN on '/dev/pts/5' from '192.168.1.10'
    Feb 16 16:24:47 Tower shutdown[1947]: shutting down for system reboot

Unless someone has an idea on how to deal with this, I'll have to go in via IPMI and reset the server, which I'm sure will mean a nice day-long parity check. Starting to regret this upgrade...

trurl, posted February 16, 2017:

> Also - any suggestions on how to cleanly reboot at this point? Nothing seems to do anything: poweroff, powerdown, shutdown, reboot just give syslog messages
> [...]
> Unless someone has an idea on how to deal with this, I'll have to go in via IPMI and reset the server, which I'm sure will mean a nice day-long parity check. Starting to regret this upgrade...

There is a 90-second timeout before it kills everything if it has to. Not sure if that timeout gets reset whenever you keep banging on it as you have done here.

DavejaVu, posted February 16, 2017:

> There is a 90-second timeout before it kills everything if it has to. Not sure if that timeout gets reset whenever you keep banging on it as you have done here.

Appreciate the quick response. I issued a 'poweroff' that has been sitting in a window untouched (unbanged?) for 5 minutes now, and it isn't doing anything. The server doesn't want to die, apparently. Any other ideas?

Just another (minor) data point: even though the primary GUI is unresponsive, I have half a dozen containers that are all responding as if nothing is happening.

John_M, posted February 16, 2017:

Are there any ReiserFS errors mentioned in the syslog?

DavejaVu, posted February 16, 2017:

No reiser messages in syslog after the initial boot stuff some days ago. But your question adds to my suspicion that I need to go down the path of migrating to XFS rather soon. I did a reiserfsck on all the disks after a recent kernel oops (another thread), and it came back clean.

Looks like the system decided for me. While troubleshooting I ran 'lsof', which hung, and I wasn't able to log back in from either telnet or the console, so I've reset and it's now checking that dirty, dirty filesystem. So much for that, I guess. I would still like to understand why shfs decided to go crazy.

John_M, posted February 16, 2017:

I don't think anyone knows at the moment, but Lime Tech are well aware of the problem. Since the user file systems are presented as an aggregation of several different filesystems (/mnt/user/share = /mnt/cache/share + /mnt/disk1/share + /mnt/disk2/share + ...), an issue affecting one of the components is going to have a bad effect on that user share. There are known problems with ReiserFS in recent kernels; in particular, its handling of extended attributes seems to be broken. I don't have any Reiser-formatted disks, but if I did I would be looking to migrate the data to XFS.

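To make the aggregation concrete, here is a rough sketch that prints which underlying directories contribute to a given user share, following the /mnt/cache plus /mnt/diskN layout described above. share_locations is a hypothetical helper name, and the optional second argument (the root to search, /mnt by default) exists only for illustration. It says nothing about how shfs itself merges the directories.

```shell
# share_locations SHARE [ROOT]: print every underlying directory that holds
# part of the user share SHARE, checking ROOT/cache and ROOT/disk1, disk2, ...
# (ROOT defaults to /mnt). Hypothetical helper, not an unRAID command.
share_locations() {
    share=$1
    root=${2:-/mnt}
    for dir in "$root"/cache "$root"/disk[0-9]*; do
        # An unmatched glob stays literal and simply fails the -d test.
        if [ -d "$dir/$share" ]; then
            printf '%s\n' "$dir/$share"
        fi
    done
}
```

For example, `share_locations "Home Videos"` would list each disk-level directory behind /mnt/user/Home Videos. If one of those sits on a ReiserFS disk, a problem on that disk can stall the whole user share.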
the_larizzo, posted February 17, 2017:

This is happening to me also, but all my filesystems are XFS. My server will run for about 2 days, then the load goes through the roof and I can no longer run any commands or shut down. I have to hold the power button to restart.

John_M, posted February 17, 2017:

> This is happening to me also, but all my filesystems are XFS. My server will run for about 2 days, then the load goes through the roof and I can no longer run any commands or shut down. I have to hold the power button to restart.

You may well have a different problem with similar symptoms, so start a new thread and attach your diagnostics zip (Tools -> Diagnostics).

zeroryu, posted February 17, 2017:

I'm having a similar issue: shfs at 100% and all my shares/drives are ReiserFS. poweroff and reboot are unresponsive. The only option I have is to manually turn off my server, which triggers a 10+ hour parity check. On top of that, I'm also getting some errors when the parity check finishes.

Attachment: tower-diagnostics-20170216-2125.zip

lionelhutz, posted February 18, 2017:

I upgraded to 6.3.1 and had the same thing happen last night. Definitely a new problem introduced since 6.2.4 never did this.

unRAID pushed ReiserFS as the only filesystem for years so it's BS to now say that existing disks won't work and need to be changed to XFS.

Squid, posted February 18, 2017:

> I upgraded to 6.3.1 and had the same thing happen last night. Definitely a new problem introduced since 6.2.4 never did this.
> unRAID pushed ReiserFS as the only filesystem for years so it's BS to now say that existing disks won't work and need to be changed to XFS.

Not too much different from Windows XP effectively forcing you to change hard drives from FAT32 to NTFS.

John_M, posted February 18, 2017:

Didn't Microsoft provide a tool that allowed FAT32 file systems to be converted to NTFS with files in situ without losing very many of them (meaning that quite a sizeable amount of free working space had to be present on the disk for the process to be successful)? OTOH it's hardly fair to blame LT for the fact that ReiserFS is no longer maintained.