Release: Folding@Home Docker



Found this post on LTT

 

root@fold8:~# cd /var/lib/fahclient/work/
root@fold8:/var/lib/fahclient/work# ls -alh
total 72K
drwxrwxrwx 3 fahclient root 4.0K Mar  9 19:00 .
drwxrwxr-x 6 fahclient root 4.0K Feb 28 20:36 ..
drwxrwxrwx 3 fahclient root 4.0K Mar  9 19:17 00
-rw-r--r-- 1 fahclient root  40K Mar  9 19:17 client.db
-rw-r--r-- 1 fahclient root  17K Mar  9 19:17 client.db-journal

The permissions should look like this, so I'm not sure why it creates them as drwxr-xr-x.
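For reference, the two permission strings differ only in the group and other write bits. Here is a minimal Python sketch of one way that difference can arise, assuming (a guess, not something the post confirms) that the directories are created under the common default umask of 022:

```python
import stat

def mode_string(octal_mode: int) -> str:
    """Render a directory's permission bits the way `ls -l` prints them."""
    return stat.filemode(stat.S_IFDIR | octal_mode)

wanted = 0o777               # drwxrwxrwx, as in the listing above
umask = 0o022                # assumption: a typical default umask
created = wanted & ~umask    # bits left after the umask is applied

print(mode_string(wanted))   # drwxrwxrwx
print(mode_string(created))  # drwxr-xr-x
print(oct(created))          # 0o755
```

If that guess is right, the fix is either a wider umask for the process that creates the directories or a chmod afterwards. Which one applies depends on how the container is started.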

Link to comment
*** Running /etc/my_init.d/00_regen_ssh_host_keys.sh...
*** Running /etc/my_init.d/10_syslog-ng.init...
Mar 17 01:53:56 e65ca64ccba1 syslog-ng[13]: syslog-ng starting up; version='3.13.2'
*** Running /etc/my_init.d/firstrun.sh...
Using existing config file.
*** Booting runit daemon...
*** Runit started as PID 26
Mar 17 01:53:57 e65ca64ccba1 cron[30]: (CRON) INFO (pidfile fd = 3)
Mar 17 01:53:57 e65ca64ccba1 cron[30]: (CRON) INFO (Running @reboot jobs)
01:53:57:INFO(1):Read GPUs.txt
01:53:57:************************* Folding@home Client *************************
01:53:57: Website: https://foldingathome.org/
01:53:57: Copyright: (c) 2009-2018 foldingathome.org
01:53:57: Author: Joseph Coffland <joseph@cauldrondevelopment.com>
01:53:57: Args: --config /config/config.xml
01:53:57: Config: /config/config.xml
01:53:57:******************************** Build ********************************
01:53:57: Version: 7.5.1
01:53:57: Date: May 11 2018
01:53:57: Time: 19:59:04
01:53:57: Repository: Git
01:53:57: Revision: 4705bf53c635f88b8fe85af7675557e15d491ff0
01:53:57: Branch: master
01:53:57: Compiler: GNU 6.3.0 20170516
01:53:57: Options: -std=gnu++98 -O3 -funroll-loops
01:53:57: Platform: linux2 4.14.0-3-amd64
01:53:57: Bits: 64
01:53:57: Mode: Release
01:53:57:******************************* System ********************************
01:53:57: CPU: Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
01:53:57: CPU ID: GenuineIntel Family 6 Model 45 Stepping 7
01:53:57: CPUs: 32
01:53:57: Memory: 125.93GiB
01:53:57:Free Memory: 829.36MiB
01:53:57: Threads: POSIX_THREADS
01:53:57: OS Version: 4.19
01:53:57:Has Battery: false
01:53:57: On Battery: false
01:53:57: UTC Offset: 0
01:53:57: PID: 31
01:53:57: CWD: /config
01:53:57: OS: Linux 4.19.98-Unraid x86_64
01:53:57: OS Arch: AMD64
01:53:57: GPUs: 1
01:53:57: GPU 0: Bus:66 Slot:0 Func:0 NVIDIA:6 GM200 [GeForce GTX Titan X] 6144
01:53:57: CUDA: Not detected: cuInit() returned 100
01:53:57: OpenCL: Not detected: clGetPlatformIDs() returned -1001
01:53:57:***********************************************************************
01:53:57:<config>
01:53:57: <!-- Client Control -->
01:53:57: <fold-anon v='true'/>

:
01:53:57: <!-- HTTP Server -->
01:53:57: <allow v='10.240.100.0/24'/>

:
01:53:57: <!-- Remote Command Server -->
01:53:57: <password v='*******'/>

:
01:53:57: <!-- User Information -->
01:53:57: <team v='227802'/>
01:53:57: <user v='joker169'/>

:
01:53:57: <!-- Web Server -->
01:53:57: <web-allow v='10.240.100.0/24'/>

:
01:53:57: <!-- Folding Slots -->
01:53:57: <slot id='0' type='CPU'/>
01:53:57: <slot id='1' type='GPU'/>
01:53:57:</config>
01:53:57:Trying to access database...
01:53:57:Successfully acquired database lock
01:53:57:Enabled folding slot 00: READY cpu:30
01:53:57:Enabled folding slot 01: READY gpu:0:GM200 [GeForce GTX Titan X] 6144
01:53:57:ERROR:No compute devices matched GPU #0 NVIDIA:6 GM200 [GeForce GTX Titan X] 6144. You may need to update your graphics drivers.
01:53:57:WU01:FS01:Starting
01:53:57:ERROR:WU01:FS01:Failed to start core: OpenCL device matching slot 1 not found, try setting 'opencl-index' manually
01:53:57:WU00:FS00:Starting
01:53:57:WU00:FS00:Running FahCore: /opt/fah/usr/bin/FAHCoreWrapper /config/cores/cores.foldingathome.org/v7/lin/64bit/avx/Core_a7.fah/FahCore_a7 -dir 00 -suffix 01 -version 705 -lifeline 31 -checkpoint 15 -np 30
01:53:57:WU00:FS00:Started FahCore on PID 40
01:53:57:WU00:FS00:Core PID:44
01:53:57:WU00:FS00:FahCore 0xa7 started
01:53:57:WU01:FS01:Starting
01:53:57:ERROR:WU01:FS01:Failed to start core: OpenCL device matching slot 1 not found, try setting 'opencl-index' manually
01:53:58:WU00:FS00:0xa7:*********************** Log Started 2020-03-17T01:53:57Z ***********************
01:53:58:WU00:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
01:53:58:WU00:FS00:0xa7: Type: 0xa7
01:53:58:WU00:FS00:0xa7: Core: Gromacs
01:53:58:WU00:FS00:0xa7: Args: -dir 00 -suffix 01 -version 705 -lifeline 40 -checkpoint 15 -np 30
01:53:58:WU00:FS00:0xa7:************************************ CBang *************************************
01:53:58:WU00:FS00:0xa7: Date: Nov 5 2019
01:53:58:WU00:FS00:0xa7: Time: 06:06:57
01:53:58:WU00:FS00:0xa7: Revision: 46c96f1aa8419571d83f3e63f9c99a0d602f6da9
01:53:58:WU00:FS00:0xa7: Branch: master
01:53:58:WU00:FS00:0xa7: Compiler: GNU 8.3.0
01:53:58:WU00:FS00:0xa7: Options: -std=c++11 -O3 -funroll-loops -fno-pie -fPIC
01:53:58:WU00:FS00:0xa7: Platform: linux2 4.19.0-5-amd64
01:53:58:WU00:FS00:0xa7: Bits: 64
01:53:58:WU00:FS00:0xa7: Mode: Release
01:53:58:WU00:FS00:0xa7:************************************ System ************************************
01:53:58:WU00:FS00:0xa7: CPU: Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
01:53:58:WU00:FS00:0xa7: CPU ID: GenuineIntel Family 6 Model 45 Stepping 7
01:53:58:WU00:FS00:0xa7: CPUs: 32
01:53:58:WU00:FS00:0xa7: Memory: 125.93GiB
01:53:58:WU00:FS00:0xa7:Free Memory: 635.04MiB
01:53:58:WU00:FS00:0xa7: Threads: POSIX_THREADS
01:53:58:WU00:FS00:0xa7: OS Version: 4.19
01:53:58:WU00:FS00:0xa7:Has Battery: false
01:53:58:WU00:FS00:0xa7: On Battery: false
01:53:58:WU00:FS00:0xa7: UTC Offset: 0
01:53:58:WU00:FS00:0xa7: PID: 44
01:53:58:WU00:FS00:0xa7: CWD: /config/work
01:53:58:WU00:FS00:0xa7:******************************** Build - libFAH ********************************
01:53:58:WU00:FS00:0xa7: Version: 0.0.18
01:53:58:WU00:FS00:0xa7: Author: Joseph Coffland <joseph@cauldrondevelopment.com>
01:53:58:WU00:FS00:0xa7: Copyright: 2019 foldingathome.org
01:53:58:WU00:FS00:0xa7: Homepage: https://foldingathome.org/
01:53:58:WU00:FS00:0xa7: Date: Nov 5 2019
01:53:58:WU00:FS00:0xa7: Time: 06:13:26
01:53:58:WU00:FS00:0xa7: Revision: 490c9aa2957b725af319379424d5c5cb36efb656
01:53:58:WU00:FS00:0xa7: Branch: master
01:53:58:WU00:FS00:0xa7: Compiler: GNU 8.3.0
01:53:58:WU00:FS00:0xa7: Options: -std=c++11 -O3 -funroll-loops -fno-pie
01:53:58:WU00:FS00:0xa7: Platform: linux2 4.19.0-5-amd64
01:53:58:WU00:FS00:0xa7: Bits: 64
01:53:58:WU00:FS00:0xa7: Mode: Release
01:53:58:WU00:FS00:0xa7:************************************ Build *************************************
01:53:58:WU00:FS00:0xa7: SIMD: avx_256
01:53:58:WU00:FS00:0xa7:********************************************************************************
01:53:58:WU00:FS00:0xa7:Project: 14303 (Run 0, Clone 10, Gen 31)
01:53:58:WU00:FS00:0xa7:Unit: 0x000000219bf7a4d55e655fa9eda521bf
01:53:58:WU00:FS00:0xa7:Digital signatures verified
01:53:58:WU00:FS00:0xa7:Calling: mdrun -s frame31.tpr -o frame31.trr -cpi state.cpt -cpt 15 -nt 30
01:53:58:WU00:FS00:0xa7:Steps: first=15500000 total=500000
01:54:00:WU00:FS00:0xa7:Completed 106942 out of 500000 steps (21%)
01:54:36:WU00:FS00:0xa7:Completed 110000 out of 500000 steps (22%)
01:54:58:WU01:FS01:Starting
01:54:58:ERROR:WU01:FS01:Failed to start core: OpenCL device matching slot 1 not found, try setting 'opencl-index' manually
01:55:30:WU00:FS00:0xa7:Completed 115000 out of 500000 steps (23%)
01:56:22:WU00:FS00:0xa7:Completed 120000 out of 500000 steps (24%)
01:56:35:WU01:FS01:Starting
01:56:35:ERROR:WU01:FS01:Failed to start core: OpenCL device matching slot 1 not found, try setting 'opencl-index' manually
01:57:15:WU00:FS00:0xa7:Completed 125000 out of 500000 steps (25%)
01:58:08:WU00:FS00:0xa7:Completed 130000 out of 500000 steps (26%)
01:58:59:WU00:FS00:0xa7:Completed 135000 out of 500000 steps (27%)
01:59:12:WU01:FS01:Starting
01:59:12:ERROR:WU01:FS01:Failed to start core: OpenCL device matching slot 1 not found, try setting 'opencl-index' manually

I have been trying for over 30 minutes to get this to use the GPU. I know it's "old", but...

Do I need to use opencl-index? If so, where?
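Before reaching for opencl-index, it may help to decode the two "Not detected" lines in the log above. The sketch below maps the return codes to their names in the CUDA driver API and the OpenCL ICD extension; the function name `explain` and the lookup table are illustrative, not part of FAHClient.

```python
import re

# Code meanings come from the CUDA driver API and OpenCL ICD extension headers.
KNOWN_CODES = {
    ("cuInit", 100): "CUDA_ERROR_NO_DEVICE: the driver found no CUDA-capable device",
    ("clGetPlatformIDs", -1001): "CL_PLATFORM_NOT_FOUND_KHR: no OpenCL platform is visible",
}

PATTERN = re.compile(r"(CUDA|OpenCL): Not detected: (\w+)\(\) returned (-?\d+)")

def explain(log_text: str) -> list[str]:
    """Return a readable note for each 'Not detected' line in a FAHClient log."""
    notes = []
    for api, func, code in PATTERN.findall(log_text):
        meaning = KNOWN_CODES.get((func, int(code)), "unknown return code")
        notes.append(f"{api}: {func}() returned {code} -> {meaning}")
    return notes

sample = """\
01:53:57: CUDA: Not detected: cuInit() returned 100
01:53:57: OpenCL: Not detected: clGetPlatformIDs() returned -1001
"""

notes = explain(sample)
for note in notes:
    print(note)
```

If both lines decode this way, the container probably cannot see the NVIDIA driver at all. In that case an index setting is unlikely to help until the GPU is actually exposed to the container.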

Link to comment
1 hour ago, SpaceInvaderOne said:

Made a quick video that may help people having difficulty setting up 

 

Hey spaceinvader, I didn't know about that GPU statistics plugin. Tried it, and it doesn't seem to work for me. I seem to remember blacklisting my GPUs at boot so that they can be passed through to VMs. Does that affect the plugin's ability to recognize them? My cards are a GTX-970 and a GTX-760.

 

Right now, I'm folding@home with CPU only. But I intend to bring my GPUs into it as well, as neither is used for the majority of most days.

Edited by cyberspectre
Link to comment
On 3/15/2020 at 12:29 PM, Bigpawpaww said:

I have two GPUs in my server. How do I pick which GPU to pass to the docker? No matter what I do, it picks the wrong one.

I'm also using two GPUs (a 1080 Ti and a 1050). The GPU UUID I specify in the F@H docker template is the 1080 Ti's, but the container logs list GPU0 as the 1050 and GPU1 as the 1080 Ti. Funny thing is that setup seems to work fine with the 1050 (I'm guessing, because it doesn't throw errors at first) until GPU0 (the 1050) is assigned work and tries to run it. The trouble is that the 1050 is already given to the Plex docker container. The F@H logs then start throwing out a lot of warnings in hex, so I shut the container down. Any ideas why it's using the 1050 instead of the 1080 Ti even though I'm using the right UUID in the docker template? Logs below; I didn't capture the parts in hex, this is from before all that.

07:10:14: Author: Joseph Coffland <joseph@cauldrondevelopment.com>
07:10:14: Args: --config /config/config.xml
07:10:14: Config: /config/config.xml
07:10:14:******************************** Build ********************************
07:10:14: Version: 7.5.1

07:10:14: Date: May 11 2018
07:10:14: Time: 19:59:04
07:10:14: Repository: Git
07:10:14: Revision: 4705bf53c635f88b8fe85af7675557e15d491ff0
07:10:14: Branch: master
07:10:14: Compiler: GNU 6.3.0 20170516
07:10:14: Options: -std=gnu++98 -O3 -funroll-loops
07:10:14: Platform: linux2 4.14.0-3-amd64
07:10:14: Bits: 64
07:10:14: Mode: Release
07:10:14:******************************* System ********************************
07:10:14: CPU: Intel(R) Core(TM) i7-6850K CPU @ 3.60GHz

07:10:14: CPU ID: GenuineIntel Family 6 Model 79 Stepping 1
07:10:14: CPUs: 12
07:10:14: Memory: 31.39GiB
07:10:14: Free Memory: 7.94GiB
07:10:14: Threads: POSIX_THREADS
07:10:14: OS Version: 4.19

07:10:14: Has Battery: false
07:10:14: On Battery: false
07:10:14: UTC Offset: 0
07:10:14: PID: 33
07:10:14: CWD: /config
07:10:14: OS: Linux 4.19.107-Unraid x86_64
07:10:14: OS Arch: AMD64
07:10:14: GPUs: 2
07:10:14: GPU 0: Bus:2 Slot:0 Func:0 NVIDIA:7 GP107 [GeForce GTX 1050 LP] 1862
07:10:14: GPU 1: Bus:1 Slot:0 Func:0 NVIDIA:8 GP102 [GeForce GTX 1080 Ti] 11380
07:10:14: CUDA Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:6.1 Driver:10.2

07:10:14:OpenCL Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:1.2 Driver:440.59

07:10:14:***********************************************************************
07:10:14:<config>
07:10:14: <!-- Client Control -->
07:10:14: <fold-anon v='true'/>

:
07:10:14: <!-- HTTP Server -->
07:10:14: <allow v='IP/24'/>

:
07:10:14: <!-- Remote Command Server -->
07:10:14: <password v='***************************'/>

:
07:10:14: <!-- User Information -->
07:10:14: <team v='227802'/>
07:10:14: <user v='usernamegoeshere'/>

:
07:10:14: <!-- Web Server -->
07:10:14: <web-allow v='IP/24'/>

:
07:10:14: <!-- Folding Slots -->
07:10:14: <slot id='0' type='CPU'/>
07:10:14: <slot id='1' type='GPU'/>
07:10:14:</config>
07:10:14:Trying to access database...
07:10:14:Successfully acquired database lock
07:10:14:Enabled folding slot 00: READY cpu:10
07:10:14:Enabled folding slot 01: READY gpu:0:GP107 [GeForce GTX 1050 LP] 1862
07:10:14:WU00:FS00:Starting
07:10:14:WU00:FS00:Running FahCore: /opt/fah/usr/bin/FAHCoreWrapper /config/cores/cores.foldingathome.org/v7/lin/64bit/avx/Core_a7.fah/FahCore_a7 -dir 00 -suffix 01 -version 705 -lifeline 33 -checkpoint 15 -np 10

07:10:14:WU00:FS00:Started FahCore on PID 44
07:10:14:WU00:FS00:Core PID:48
07:10:14:WU00:FS00:FahCore 0xa7 started
07:10:15:WU01:FS01:Connecting to 65.254.110.245:8080
07:10:15:WU00:FS00:0xa7:*********************** Log Started 2020-03-17T07:10:14Z ***********************
07:10:15:WU00:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
07:10:15:WU00:FS00:0xa7: Type: 0xa7
07:10:15:WU00:FS00:0xa7: Core: Gromacs
07:10:15:WU00:FS00:0xa7: Args: -dir 00 -suffix 01 -version 705 -lifeline 44 -checkpoint 15 -np 10

07:10:15:WU00:FS00:0xa7:************************************ CBang *************************************
07:10:15:WU00:FS00:0xa7: Date: Nov 5 2019
07:10:15:WU00:FS00:0xa7: Time: 06:06:57
07:10:15:WU00:FS00:0xa7: Revision: 46c96f1aa8419571d83f3e63f9c99a0d602f6da9
07:10:15:WU00:FS00:0xa7: Branch: master
07:10:15:WU00:FS00:0xa7: Compiler: GNU 8.3.0
07:10:15:WU00:FS00:0xa7: Options: -std=c++11 -O3 -funroll-loops -fno-pie -fPIC
07:10:15:WU00:FS00:0xa7: Platform: linux2 4.19.0-5-amd64
07:10:15:WU00:FS00:0xa7: Bits: 64
07:10:15:WU00:FS00:0xa7: Mode: Release
07:10:15:WU00:FS00:0xa7:************************************ System ************************************
07:10:15:WU00:FS00:0xa7: CPU: Intel(R) Core(TM) i7-6850K CPU @ 3.60GHz

07:10:15:WU00:FS00:0xa7: CPU ID: GenuineIntel Family 6 Model 79 Stepping 1
07:10:15:WU00:FS00:0xa7: CPUs: 12
07:10:15:WU00:FS00:0xa7: Memory: 31.39GiB
07:10:15:WU00:FS00:0xa7:Free Memory: 7.91GiB
07:10:15:WU00:FS00:0xa7: Threads: POSIX_THREADS
07:10:15:WU00:FS00:0xa7: OS Version: 4.19

07:10:15:WU00:FS00:0xa7:Has Battery: false
07:10:15:WU00:FS00:0xa7: On Battery: false
07:10:15:WU00:FS00:0xa7: UTC Offset: 0
07:10:15:WU00:FS00:0xa7: PID: 48
07:10:15:WU00:FS00:0xa7: CWD: /config/work
07:10:15:WU00:FS00:0xa7:******************************** Build - libFAH ********************************
07:10:15:WU00:FS00:0xa7: Version: 0.0.18

07:10:15:WU00:FS00:0xa7: Author: Joseph Coffland <joseph@cauldrondevelopment.com>
07:10:15:WU00:FS00:0xa7: Copyright: 2019 foldingathome.org
07:10:15:WU00:FS00:0xa7: Homepage: https://foldingathome.org/
07:10:15:WU00:FS00:0xa7: Date: Nov 5 2019
07:10:15:WU00:FS00:0xa7: Time: 06:13:26
07:10:15:WU00:FS00:0xa7: Revision: 490c9aa2957b725af319379424d5c5cb36efb656
07:10:15:WU00:FS00:0xa7: Branch: master
07:10:15:WU00:FS00:0xa7: Compiler: GNU 8.3.0
07:10:15:WU00:FS00:0xa7: Options: -std=c++11 -O3 -funroll-loops -fno-pie
07:10:15:WU00:FS00:0xa7: Platform: linux2 4.19.0-5-amd64
07:10:15:WU00:FS00:0xa7: Bits: 64
07:10:15:WU00:FS00:0xa7: Mode: Release
07:10:15:WU00:FS00:0xa7:************************************ Build *************************************
07:10:15:WU00:FS00:0xa7: SIMD: avx_256
07:10:15:WU00:FS00:0xa7:********************************************************************************
07:10:15:WU00:FS00:0xa7:Project: 14328 (Run 4, Clone 1358, Gen 3)
07:10:15:WU00:FS00:0xa7:Unit: 0x000000059bf7a4d65e6d10919d4b5148
07:10:15:WU00:FS00:0xa7:Digital signatures verified
07:10:15:WU00:FS00:0xa7:Calling: mdrun -s frame3.tpr -o frame3.trr -cpi state.cpt -cpt 15 -nt 10
07:10:15:WU00:FS00:0xa7:Steps: first=750000 total=250000
07:10:16:WU01:FS01:Assigned to work server 128.252.203.10
07:10:16:WU01:FS01:Requesting new work unit for slot 01: READY gpu:0:GP107 [GeForce GTX 1050 LP] 1862 from 128.252.203.10
07:10:16:WU01:FS01:Connecting to 128.252.203.10:8080
07:10:17:WU00:FS00:0xa7:Completed 2482 out of 250000 steps (0%)
07:10:18:WU00:FS00:0xa7:Completed 2500 out of 250000 steps (1%)

Edit: When looking at the Unraid-Nvidia settings page I was bothered that it listed GPU0 as the 1080ti and GPU1 as the 1050. So I thought I might try changing the NVIDIA_VISIBLE_DEVICES value inside the docker container to 0 (for GPU0 instead of the UUID) but it didn't change anything. How is it that the container has access to devices that are not listed in the template?
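One way to read the log above is to line up the GPU entries with the CUDA and OpenCL entries by PCI bus and slot. The sketch below does that for the four relevant lines; the regular expressions and the function name `match_devices` are illustrative and only cover the log format shown in this post.

```python
import re

GPU_RE = re.compile(r"GPU (\d+): Bus:(\d+) Slot:(\d+) Func:\d+ \S+ (.+)")
DEV_RE = re.compile(r"(CUDA|OpenCL) Device (\d+): Platform:\d+ Device:\d+ Bus:(\d+) Slot:(\d+)")

def match_devices(log_text: str) -> dict[int, list[str]]:
    """Map each listed GPU index to the compute APIs found at the same bus/slot."""
    gpus = {(int(bus), int(slot)): int(idx)
            for idx, bus, slot, _name in GPU_RE.findall(log_text)}
    matches = {idx: [] for idx in gpus.values()}
    for api, _dev, bus, slot in DEV_RE.findall(log_text):
        idx = gpus.get((int(bus), int(slot)))
        if idx is not None:
            matches[idx].append(api)
    return matches

sample = """\
07:10:14: GPU 0: Bus:2 Slot:0 Func:0 NVIDIA:7 GP107 [GeForce GTX 1050 LP] 1862
07:10:14: GPU 1: Bus:1 Slot:0 Func:0 NVIDIA:8 GP102 [GeForce GTX 1080 Ti] 11380
07:10:14: CUDA Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:6.1 Driver:10.2
07:10:14:OpenCL Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:1.2 Driver:440.59
"""

print(match_devices(sample))  # {0: [], 1: ['CUDA', 'OpenCL']}
```

Read this way, only the 1080 Ti (GPU 1, bus 1) has CUDA and OpenCL inside the container, which suggests the UUID setting is doing its job. The client still lists both cards, presumably because it enumerates PCI devices rather than compute devices, and slot 01 was bound to gpu:0, the 1050. Pointing the GPU slot at the other index may be worth trying, though that is untested here.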

Edited by ClintWilkenson
additional information
Link to comment
3 hours ago, cyberspectre said:

Hey spaceinvader, I didn't know about that GPU statistics plugin. Tried it, and it doesn't seem to work for me. I seem to remember blacklisting my GPUs at boot so that they can be passed through to VMs. Does that affect the plugin's ability to recognize them? My cards are a GTX-970 and a GTX-760.

 

Right now, I'm folding@home with CPU only. But I intend to bring my GPUs into it as well, as neither is used for the majority of most days.

 

Not sure if those cards are still supported by the driver. You also need to install the nvidia plugin and download the nvidia build.

For the driver to work, you have to remove the blacklisting so the driver can load.

Link to comment

 

On 3/16/2020 at 8:38 PM, tjb_altf4 said:

Passkey will come eventually, I had mine turn up 12-24 hours later... servers are being crushed with requests from new folders... which is fantastic.

Same applies with handing out WU, they simply are struggling to keep up.

I wonder if the software trying to get WUs very often is also causing issues. If I leave it running I get nothing, but if I stop it overnight I get a new WU straight away when it is restarted.

Link to comment

I keep getting this error

Exception: Failed reading core package header.

 

I don't know what to make of it or how to fix it. Any ideas or suggestions?

 

Full log below.

12:18:56:Trying to access database...
12:18:56:Successfully acquired database lock
12:18:56:Enabled folding slot 00: READY cpu:11
12:18:56:Enabled folding slot 01: READY gpu:0:GP107GL [Quadro P620]
12:18:56:WU01:FS01:Downloading core from http://cores.foldingathome.org/v7/lin/64bit/Core_22.fah
12:18:56:WU01:FS01:Connecting to cores.foldingathome.org:80
12:18:56:WU00:FS00:Downloading core from http://cores.foldingathome.org/v7/lin/64bit/avx/Core_a7.fah
12:18:56:WU00:FS00:Connecting to cores.foldingathome.org:80
12:18:57:ERROR:WU00:FS00:Exception: Failed reading core package header.
12:18:57:ERROR:WU01:FS01:Exception: Failed reading core package header.
12:19:52:114:192.168.1.40:New Web connection
12:19:56:WU01:FS01:Downloading core from http://cores.foldingathome.org/v7/lin/64bit/Core_22.fah
12:19:56:WU01:FS01:Connecting to cores.foldingathome.org:80
12:19:57:WU00:FS00:Downloading core from http://cores.foldingathome.org/v7/lin/64bit/avx/Core_a7.fah
12:19:57:WU00:FS00:Connecting to cores.foldingathome.org:80
12:19:57:ERROR:WU00:FS00:Exception: Failed reading core package header.
12:19:59:ERROR:WU01:FS01:Exception: Failed reading core package header.
12:21:34:WU01:FS01:Downloading core from http://cores.foldingathome.org/v7/lin/64bit/Core_22.fah
12:21:34:WU01:FS01:Connecting to cores.foldingathome.org:80
12:21:34:ERROR:WU01:FS01:Exception: Failed reading core package header.
12:21:34:WU00:FS00:Downloading core from http://cores.foldingathome.org/v7/lin/64bit/avx/Core_a7.fah
12:21:34:WU00:FS00:Connecting to cores.foldingathome.org:80
12:21:34:ERROR:WU00:FS00:Exception: Failed reading core package header.

Link to comment

Watch out when restricting memory for the boinc docker container: if you specify too low a memory constraint (2GB in my case), the OOM killer will kick in, so the process keeps getting killed and re-created, which in turn fills up your syslog.

TL;DR: don't set memory lower than 4GB.

Edited by binhex
Link to comment

An FYI for anyone else who didn't know, and is afraid this is going to turn their CPU into a toaster... you can limit CPU usage using

--cpus=x

in the "extra parameters" field (found in the advanced view). x is the number of CPUs' worth of processing time the container may use, and it accepts decimals as well as integers: higher means more CPU, lower means less. Either way, toying with this seems to help greatly with keeping down the temperature and fans...

https://docs.docker.com/config/containers/resource_constraints/#cpu
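Per the Docker documentation linked above, --cpus is shorthand for a CFS quota over the default 100 ms period, so --cpus=1.5 equals --cpu-period=100000 --cpu-quota=150000. A small worked example of that arithmetic follows; the helper names are made up for illustration, and the 12-thread host is borrowed from a log earlier in the thread.

```python
def cfs_quota_us(cpus: float, period_us: int = 100_000) -> int:
    """Convert a --cpus value to the CFS quota for the default 100 ms period."""
    return round(cpus * period_us)

def host_fraction(cpus: float, host_threads: int) -> float:
    """Fraction of the host's total CPU capacity the container may use."""
    return min(cpus, host_threads) / host_threads

# Example: --cpus=1.5 on a 12-thread host
print(cfs_quota_us(1.5))       # 150000
print(host_fraction(1.5, 12))  # 0.125
```

So on that host, --cpus=1.5 would cap the container at one eighth of the machine, however many threads the client itself decides to start.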

Edited by scottpk
Link to comment

So I wanted to contribute some of my CPU/GPU time, but all I seem to get is this error. Am I missing something?

 

Quote

Mar 17 21:17:01 3025539a4c71 CRON[52]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
22:16:52:WU00:FS00:Connecting to 65.254.110.245:8080
22:16:52:WU01:FS01:Connecting to 65.254.110.245:8080
22:16:53:WARNING:WU00:FS00:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
22:16:53:WU00:FS00:Connecting to 18.218.241.186:80
22:16:53:WARNING:WU01:FS01:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
22:16:53:WU01:FS01:Connecting to 18.218.241.186:80
22:16:53:WARNING:WU01:FS01:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
22:16:53:ERROR:WU01:FS01:Exception: Could not get an assignment
22:16:54:WARNING:WU00:FS00:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
22:16:54:ERROR:WU00:FS00:Exception: Could not get an assignment

 

Link to comment
37 minutes ago, JaseNZ said:

So I wanted to contribute some of my CPU/GPU time, but all I seem to get is this error. Am I missing something?

 

 

I think there are a lot of us getting that same error.  I had everything up and running on Sunday, and processed several WUs on my two computers (including my Unraid box).  After Sunday evening, I haven't had any new WUs.  I gather that the influx of new contributors has outpaced the FaH infrastructure's ability to generate and process new WUs.  Essentially, there isn't enough work to go around.

 

I think the Rosetta project on BOINC is still providing work to users, you might look into that.

Link to comment
37 minutes ago, JaseNZ said:

So I wanted to contribute some of my CPU/GPU time, but all I seem to get is this error. Am I missing something?

If I'm not mistaken, that just means it wasn't able to retrieve a unit of work. You might just need to wait; their infrastructure is under heavier load with the spike in interest from COVID-19.

Link to comment
8 hours ago, JaseNZ said:

So I wanted to contribute some of my CPU/GPU time, but all I seem to get is this error. Am I missing something?

 

7 hours ago, C4RBON said:

I gather that the influx of new contributors has outpaced the FaH infrastructure's ability to generate and process new WUs.  Essentially, there isn't enough work to go around.

 

To both: if you have waited more than 8 hours and still have no work, restarting the docker (or the app, if you use Windows) may help. The servers are indeed overwhelmed by the public response, and the matter is complicated by a bug that can leave F@H stuck in a loop if it receives an HTTP error.

Obviously, don't restart the docker all the time, because that only adds unnecessary load to the already overwhelmed servers.

 

And of course, it's even better if you can run both BOINC and f@h at the same time.

I recommend 2 BOINC projects: Rosetta and World Community Grid.

Between the 2 of them, you should have plenty of work to do (not necessarily everything contributing to COVID-19 but certainly to other good medical causes).

Rosetta has already participated in COVID-19 research (and released a news article about their contribution).

WCG has also released a statement that they are reviewing COVID-19 related projects to add.

 

Link to comment

After having been folding most of the week, I'm now getting the following errors in my logs:

Any ideas?

22:37:52:WU01:FS00:0xa7:ERROR:

22:37:52:WU01:FS00:0xa7:ERROR:-------------------------------------------------------

22:37:52:WU01:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown

22:37:52:WU01:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902

22:37:52:WU01:FS00:0xa7:ERROR:

22:37:52:WU01:FS00:0xa7:ERROR:Fatal error:

22:37:52:WU01:FS00:0xa7:ERROR:There is no domain decomposition for 20 ranks that is compatible with the given box and a minimum cell size of 1.37225 nm

22:37:52:WU01:FS00:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings

22:37:52:WU01:FS00:0xa7:ERROR:Look in the log file for details on the domain decomposition

22:37:52:WU01:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS

22:37:52:WU01:FS00:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors

22:37:52:WU01:FS00:0xa7:ERROR:-------------------------------------------------------

22:37:57:WU01:FS00:0xa7:WARNING:Unexpected exit() call

22:37:57:WU01:FS00:0xa7:WARNING:Unexpected exit from science code
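The GROMACS message itself suggests changing the number of ranks, which here corresponds to the CPU thread count given to the slot. A rule of thumb sometimes repeated among folders is that counts whose prime factors are only 2 and 3 decompose more readily than counts containing 5 or larger. That rule is an assumption here, not a documented guarantee, and the helper names below are made up for illustration.

```python
def prime_factors(n: int) -> list[int]:
    """Return the prime factorisation of n in ascending order."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

def friendly_counts(max_threads: int, largest_ok: int = 3) -> list[int]:
    """Thread counts up to max_threads with no prime factor above largest_ok."""
    return [n for n in range(1, max_threads + 1)
            if all(p <= largest_ok for p in prime_factors(n))]

print(prime_factors(20))    # [2, 2, 5]
print(friendly_counts(20))  # [1, 2, 3, 4, 6, 8, 9, 12, 16, 18]
```

Under that assumption, dropping the slot from 20 threads to 18 or 16 would be the first thing to try. Whether a particular work unit then decomposes still depends on its box size.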

 

Link to comment
14 hours ago, testdasi said:

 

 

To both: if you have waited more than 8 hours and still have no work, restarting the docker (or the app, if you use Windows) may help. The servers are indeed overwhelmed by the public response, and the matter is complicated by a bug that can leave F@H stuck in a loop if it receives an HTTP error.

Obviously, don't restart the docker all the time, because that only adds unnecessary load to the already overwhelmed servers.

 

And of course, it's even better if you can run both BOINC and f@h at the same time.

I recommend 2 BOINC projects: Rosetta and World Community Grid.

Between the 2 of them, you should have plenty of work to do (not necessarily everything contributing to COVID-19 but certainly to other good medical causes).

Rosetta has already participated in COVID-19 research (and released a news article about their contribution).

WCG has also released a statement that they are reviewing COVID-19 related projects to add.

 

My desktop GPU has been working all day, but still nothing on my Unraid server.  Restarted the docker and after a few attempts it finally got some work to do.  Thanks for the tip.

Link to comment

I managed to get my gpu passed through without breaking the web gui, however one of my cpu cores is pegged.  Is there a way to force the computations to be GPU-only?  My server only has four cores, and I'd rather not impact its primary functionality.  Here's my configuration:

 

Quote

<config>
  <!-- Client Control -->
  <fold-anon v='false'/>

  <!-- Folding Slot Configuration -->
  <gpu v='TRUE'/>

  <!-- HTTP Server -->
  <allow v='192.168.1.2/24'/>

  <!-- Remote Command Server -->
  <password v='PASSWORD'/>

  <!-- Slot Control -->
  <power v='MEDIUM'/>

  <!-- User Information -->
  <passkey v=''/>
  <team v='227802'/>
  <user v='my_name'/>

  <!-- Web Server -->
  <web-allow v='192.168.1.2/24'/>

  <!-- Folding Slots -->
  <slot id='1' type='GPU'/>
</config>

 

Thanks for your help!

Link to comment
6 hours ago, joebot said:

I managed to get my gpu passed through without breaking the web gui, however one of my cpu cores is pegged.  Is there a way to force the computations to be GPU-only?  My server only has four cores, and I'd rather not impact its primary functionality.  Here's my configuration:

You can't. That pegged core is feeding data to the GPU.

You can add --cpu-shares=64 to the fah docker's extra parameters box (advanced view) and pin a single core to the docker. That way the core will still run at 100%, but when other dockers need the CPU, they will be prioritised over fah.

If you have an important VM that needs to run, isolate that VM's cores and don't let fah use them.
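To put a number on the --cpu-shares=64 suggestion: Docker treats the value as a relative weight (the default is 1024), and the weights only matter while containers are competing for the same CPU. A rough sketch of the split when every container is busy follows; the container names are hypothetical.

```python
def contended_share(weights: dict[str, int]) -> dict[str, float]:
    """Split CPU time in proportion to --cpu-shares when all containers are busy."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

# Hypothetical containers: 1024 is Docker's default weight, 64 is the value suggested above.
shares = contended_share({"fah": 64, "plex": 1024, "other": 1024})
for name, frac in shares.items():
    print(f"{name}: {frac:.1%}")
```

When the other containers are idle, fah can still use the whole pinned core, which is why the core keeps showing 100%.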

 

 

 

 

Link to comment
6 hours ago, joebot said:

I managed to get my gpu passed through without breaking the web gui, however one of my cpu cores is pegged.  Is there a way to force the computations to be GPU-only?  My server only has four cores, and I'd rather not impact its primary functionality.  Here's my configuration:

 

 

Thanks for your help!

It needs the CPU to pass work to the GPU.

You can limit which CPU cores the docker container uses; just assign one core to it.

You can also set this in the 'extra parameters': --memory=Xg --cpu-shares=1  (X = how many GB are allowed)

so you can limit its RAM use as well ;)

Link to comment
On 3/17/2020 at 3:39 AM, ClintWilkenson said:

I'm also using two GPUs (a 1080 Ti and a 1050). The GPU UUID I specify in the F@H docker template is the 1080 Ti's, but the container logs list GPU0 as the 1050 and GPU1 as the 1080 Ti. Funny thing is that setup seems to work fine with the 1050 (I'm guessing, because it doesn't throw errors at first) until GPU0 (the 1050) is assigned work and tries to run it. The trouble is that the 1050 is already given to the Plex docker container. The F@H logs then start throwing out a lot of warnings in hex, so I shut the container down. Any ideas why it's using the 1050 instead of the 1080 Ti even though I'm using the right UUID in the docker template? Logs below; I didn't capture the parts in hex, this is from before all that.


07:10:14: Author: Joseph Coffland <joseph@cauldrondevelopment.com>
07:10:14: Args: --config /config/config.xml
07:10:14: Config: /config/config.xml
07:10:14:******************************** Build ********************************
07:10:14: Version: 7.5.1

07:10:14: Date: May 11 2018
07:10:14: Time: 19:59:04
07:10:14: Repository: Git
07:10:14: Revision: 4705bf53c635f88b8fe85af7675557e15d491ff0
07:10:14: Branch: master
07:10:14: Compiler: GNU 6.3.0 20170516
07:10:14: Options: -std=gnu++98 -O3 -funroll-loops
07:10:14: Platform: linux2 4.14.0-3-amd64
07:10:14: Bits: 64
07:10:14: Mode: Release
07:10:14:******************************* System ********************************
07:10:14: CPU: Intel(R) Core(TM) i7-6850K CPU @ 3.60GHz

07:10:14: CPU ID: GenuineIntel Family 6 Model 79 Stepping 1
07:10:14: CPUs: 12
07:10:14: Memory: 31.39GiB
07:10:14: Free Memory: 7.94GiB
07:10:14: Threads: POSIX_THREADS
07:10:14: OS Version: 4.19

07:10:14: Has Battery: false
07:10:14: On Battery: false
07:10:14: UTC Offset: 0
07:10:14: PID: 33
07:10:14: CWD: /config
07:10:14: OS: Linux 4.19.107-Unraid x86_64
07:10:14: OS Arch: AMD64
07:10:14: GPUs: 2
07:10:14: GPU 0: Bus:2 Slot:0 Func:0 NVIDIA:7 GP107 [GeForce GTX 1050 LP] 1862
07:10:14: GPU 1: Bus:1 Slot:0 Func:0 NVIDIA:8 GP102 [GeForce GTX 1080 Ti] 11380
07:10:14: CUDA Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:6.1 Driver:10.2

07:10:14:OpenCL Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:1.2 Driver:440.59

07:10:14:***********************************************************************
07:10:14:<config>
07:10:14: <!-- Client Control -->
07:10:14: <fold-anon v='true'/>

:
07:10:14: <!-- HTTP Server -->
07:10:14: <allow v='IP/24'/>

:
07:10:14: <!-- Remote Command Server -->
07:10:14: <password v='***************************'/>

:
07:10:14: <!-- User Information -->
07:10:14: <team v='227802'/>
07:10:14: <user v='usernamegoeshere'/>

:
07:10:14: <!-- Web Server -->
07:10:14: <web-allow v='IP/24'/>

:
07:10:14: <!-- Folding Slots -->
07:10:14: <slot id='0' type='CPU'/>
07:10:14: <slot id='1' type='GPU'/>
07:10:14:</config>
07:10:14:Trying to access database...
07:10:14:Successfully acquired database lock
07:10:14:Enabled folding slot 00: READY cpu:10
07:10:14:Enabled folding slot 01: READY gpu:0:GP107 [GeForce GTX 1050 LP] 1862
07:10:14:WU00:FS00:Starting
07:10:14:WU00:FS00:Running FahCore: /opt/fah/usr/bin/FAHCoreWrapper /config/cores/cores.foldingathome.org/v7/lin/64bit/avx/Core_a7.fah/FahCore_a7 -dir 00 -suffix 01 -version 705 -lifeline 33 -checkpoint 15 -np 10

07:10:14:WU00:FS00:Started FahCore on PID 44
07:10:14:WU00:FS00:Core PID:48
07:10:14:WU00:FS00:FahCore 0xa7 started
07:10:15:WU01:FS01:Connecting to 65.254.110.245:8080
07:10:15:WU00:FS00:0xa7:*********************** Log Started 2020-03-17T07:10:14Z ***********************
07:10:15:WU00:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
07:10:15:WU00:FS00:0xa7: Type: 0xa7
07:10:15:WU00:FS00:0xa7: Core: Gromacs
07:10:15:WU00:FS00:0xa7: Args: -dir 00 -suffix 01 -version 705 -lifeline 44 -checkpoint 15 -np 10
07:10:15:WU00:FS00:0xa7:************************************ CBang *************************************
07:10:15:WU00:FS00:0xa7: Date: Nov 5 2019
07:10:15:WU00:FS00:0xa7: Time: 06:06:57
07:10:15:WU00:FS00:0xa7: Revision: 46c96f1aa8419571d83f3e63f9c99a0d602f6da9
07:10:15:WU00:FS00:0xa7: Branch: master
07:10:15:WU00:FS00:0xa7: Compiler: GNU 8.3.0
07:10:15:WU00:FS00:0xa7: Options: -std=c++11 -O3 -funroll-loops -fno-pie -fPIC
07:10:15:WU00:FS00:0xa7: Platform: linux2 4.19.0-5-amd64
07:10:15:WU00:FS00:0xa7: Bits: 64
07:10:15:WU00:FS00:0xa7: Mode: Release
07:10:15:WU00:FS00:0xa7:************************************ System ************************************
07:10:15:WU00:FS00:0xa7: CPU: Intel(R) Core(TM) i7-6850K CPU @ 3.60GHz
07:10:15:WU00:FS00:0xa7: CPU ID: GenuineIntel Family 6 Model 79 Stepping 1
07:10:15:WU00:FS00:0xa7: CPUs: 12
07:10:15:WU00:FS00:0xa7: Memory: 31.39GiB
07:10:15:WU00:FS00:0xa7:Free Memory: 7.91GiB
07:10:15:WU00:FS00:0xa7: Threads: POSIX_THREADS
07:10:15:WU00:FS00:0xa7: OS Version: 4.19
07:10:15:WU00:FS00:0xa7:Has Battery: false
07:10:15:WU00:FS00:0xa7: On Battery: false
07:10:15:WU00:FS00:0xa7: UTC Offset: 0
07:10:15:WU00:FS00:0xa7: PID: 48
07:10:15:WU00:FS00:0xa7: CWD: /config/work
07:10:15:WU00:FS00:0xa7:******************************** Build - libFAH ********************************
07:10:15:WU00:FS00:0xa7: Version: 0.0.18
07:10:15:WU00:FS00:0xa7: Author: Joseph Coffland <joseph@cauldrondevelopment.com>
07:10:15:WU00:FS00:0xa7: Copyright: 2019 foldingathome.org
07:10:15:WU00:FS00:0xa7: Homepage: https://foldingathome.org/
07:10:15:WU00:FS00:0xa7: Date: Nov 5 2019
07:10:15:WU00:FS00:0xa7: Time: 06:13:26
07:10:15:WU00:FS00:0xa7: Revision: 490c9aa2957b725af319379424d5c5cb36efb656
07:10:15:WU00:FS00:0xa7: Branch: master
07:10:15:WU00:FS00:0xa7: Compiler: GNU 8.3.0
07:10:15:WU00:FS00:0xa7: Options: -std=c++11 -O3 -funroll-loops -fno-pie
07:10:15:WU00:FS00:0xa7: Platform: linux2 4.19.0-5-amd64
07:10:15:WU00:FS00:0xa7: Bits: 64
07:10:15:WU00:FS00:0xa7: Mode: Release
07:10:15:WU00:FS00:0xa7:************************************ Build *************************************
07:10:15:WU00:FS00:0xa7: SIMD: avx_256
07:10:15:WU00:FS00:0xa7:********************************************************************************
07:10:15:WU00:FS00:0xa7:Project: 14328 (Run 4, Clone 1358, Gen 3)
07:10:15:WU00:FS00:0xa7:Unit: 0x000000059bf7a4d65e6d10919d4b5148
07:10:15:WU00:FS00:0xa7:Digital signatures verified
07:10:15:WU00:FS00:0xa7:Calling: mdrun -s frame3.tpr -o frame3.trr -cpi state.cpt -cpt 15 -nt 10
07:10:15:WU00:FS00:0xa7:Steps: first=750000 total=250000
07:10:16:WU01:FS01:Assigned to work server 128.252.203.10
07:10:16:WU01:FS01:Requesting new work unit for slot 01: READY gpu:0:GP107 [GeForce GTX 1050 LP] 1862 from 128.252.203.10
07:10:16:WU01:FS01:Connecting to 128.252.203.10:8080
07:10:17:WU00:FS00:0xa7:Completed 2482 out of 250000 steps (0%)
07:10:18:WU00:FS00:0xa7:Completed 2500 out of 250000 steps (1%)

Edit: When looking at the Unraid-Nvidia settings page, I was bothered that it listed GPU0 as the 1080 Ti and GPU1 as the 1050. So I tried changing the NVIDIA_VISIBLE_DEVICES value for the Docker container to 0 (the index of GPU0, instead of the UUID), but it didn't change anything. How is it that the container has access to devices that are not listed in the template?

I think I figured out how to run the 2nd GPU.
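
One way this can be done, sketched under assumptions rather than taken from the post: the log above reports two GPUs but the config has only one GPU slot, so a second GPU slot could be added to /config/config.xml. The slot id '2' is an assumed value, not something shown in the log. The container would probably also need both cards exposed through NVIDIA_VISIBLE_DEVICES (for example both UUIDs separated by a comma, or all).

```xml
<config>
  <!-- Folding Slots -->
  <slot id='0' type='CPU'/>
  <slot id='1' type='GPU'/>
  <!-- Assumed addition: a second GPU slot for the other card -->
  <slot id='2' type='GPU'/>
</config>
```

If the client picks up the change after a container restart, the log should list one more "Enabled folding slot" line alongside the existing two.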

Link to comment
