Release: Folding@Home Docker


Recommended Posts

On 3/17/2020 at 2:15 AM, saarg said:

 

Not sure if those cards are still supported by the driver. You also need to install the nvidia plugin and download the nvidia build.

For the driver to work, you have to remove the blacklisting so the driver can load.

Oh, okay. I'll keep running it with CPU only, then.

 

There's no way to only get COVID19 work units, is there?

Link to comment
10 minutes ago, cyberspectre said:

Oh, okay. I'll keep running it with CPU only, then.

 

There's no way to only get COVID19 work units, is there?

F@H (and probably BOINC now) are prioritizing COVID-19 WUs. (With F@H, you select that you're folding for "Any disease".) But, due to the massive surge in people folding and doing their part, COVID-19 WUs aren't always available on either platform. BOINC (which has lower adoption internationally than F@H) never seems to run out of WUs in the interim, whereas F@H seems to be constantly running short of everything.

 

Net result is that I'm running both, as they are complementary research efforts and not mutually exclusive.

Link to comment

<!-- Folding Slots -->
  <slot id='0' type='CPU'>
    <paused v='true'/>
  </slot>
  <slot id='3' type='GPU'>
    <paused v='true'/>
  </slot>
  <slot id='2' type='GPU'/>
 

This is the entry I put into the Folding Slots. It shows both of my cards, but only my 1070 works. I can't tell you why it does that or how to choose one card or the other. All I know is it works now... Good luck.

Link to comment
3 hours ago, IGOBYD said:

<!-- Folding Slots -->
  <slot id='0' type='CPU'>
    <paused v='true'/>
  </slot>
  <slot id='3' type='GPU'>
    <paused v='true'/>
  </slot>
  <slot id='2' type='GPU'/>
 

This is the entry I put into the Folding Slots. It shows both of my cards, but only my 1070 works. I can't tell you why it does that or how to choose one card or the other. All I know is it works now... Good luck.

Definitely worked! Check the log below:

 

05:08:55: <!-- Folding Slots -->
05:08:55: <slot id='0' type='CPU'>
05:08:55: <paused v='true'/>
05:08:55: </slot>
05:08:55: <slot id='3' type='GPU'>
05:08:55: <paused v='true'/>
05:08:55: </slot>
05:08:55: <slot id='2' type='GPU'/>
05:08:55:</config>
05:08:55:Trying to access database...
05:08:55:Successfully acquired database lock
05:08:55:Enabled folding slot 00: PAUSED cpu:9 (by user)
05:08:55:Enabled folding slot 03: PAUSED gpu:0:GP107 [GeForce GTX 1050 LP] 1862 (by user)
05:08:55:Enabled folding slot 02: READY gpu:1:GP102 [GeForce GTX 1080 Ti] 11380
 

Link to comment

[screenshot attachment]

 

2 GPUs assigned to F@H, both are working, no CPU slot enabled (it still uses some CPU cores to pass work to the GPUs; for my setup it's using 4 cores at roughly 75%).

config.xml:

  <slot id='0' type='GPU'/>
  <slot id='1' type='GPU'/>

Docker setting, NVIDIA_VISIBLE_DEVICES:

all

If you have an Nvidia GPU assigned to a VM, disable it for F@H, or the system will crash.

I'm not really sure how to do that (I have an AMD card), but if I understand it correctly you can change the syslinux configuration:

pci-stub.ids=xxxx:xxxx,xxxx:xxxx (the IOMMU IDs of the graphics/VGA device and its sound device!), then reboot the system after the change.

Link to comment
On 3/18/2020 at 6:39 PM, DayspringGaming said:

After having been folding most of the week, I'm now getting the following errors in my logs:

Any ideas?


22:37:52:WU01:FS00:0xa7:ERROR:

22:37:52:WU01:FS00:0xa7:ERROR:-------------------------------------------------------

22:37:52:WU01:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown

22:37:52:WU01:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902

22:37:52:WU01:FS00:0xa7:ERROR:

22:37:52:WU01:FS00:0xa7:ERROR:Fatal error:

22:37:52:WU01:FS00:0xa7:ERROR:There is no domain decomposition for 20 ranks that is compatible with the given box and a minimum cell size of 1.37225 nm

22:37:52:WU01:FS00:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings

22:37:52:WU01:FS00:0xa7:ERROR:Look in the log file for details on the domain decomposition

22:37:52:WU01:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS

22:37:52:WU01:FS00:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors

22:37:52:WU01:FS00:0xa7:ERROR:-------------------------------------------------------

22:37:57:WU01:FS00:0xa7:WARNING:Unexpected exit() call

22:37:57:WU01:FS00:0xa7:WARNING:Unexpected exit from science code

 

I have the same issue. 

Link to comment

So I have set everything up; the docker claims 40k+ points, but only the first 9k are registered.

 

Also, if I restart the container, it gets reassigned and starts folding; if I leave it be, nothing happens.

Very different behavior compared to folding on my Windows machine.

 

Getting a few warnings and errors:

ERROR:WU03:FS01:Exception: Failed to connect to 40.114.52.201:80: Connection timed out

ERROR:WU03:FS01:Exception: Could not get an assignment

WU02:FS01:Sending unit results: id:02 state:SEND error:NO_ERROR project:11779 run:0 clone:8658 gen:5 core:0x22 unit:

 

is this expected?

 

 

Link to comment

How is this possible? 

 

[screenshot attachment]

 

when set up for an Nvidia GPU? There is an AMD Radeon RX580 in the server as well. The log file shows both GPUs, but it is grabbing GPU:0, not GPU:1, which is the Nvidia.

 

Any thoughts for fixing this?

 

 

 

[screenshot attachment]

 

 

 

<config>
  <!-- Client Control -->
  <fold-anon v='true'/>

  <!-- Folding Slot Configuration -->
  <gpu v='true'/>

  <!-- HTTP Server -->
  <!-- The following allows access from the local network -->
  <allow v='10.X.X.0/24'/>

  <!-- Remote Command Server -->
  <!-- Change the password for remote access -->
  <password v='PASSWORD'/>

  <!-- User Information -->
  <!-- Change team number and username if desired. Currently folding for UnRAID team! -->
  <team v='227802'/> <!-- Your team number (Team UnRAID is # 227802)-->
  <user v='frodm'/> <!-- Enter your user name here -->
  <passkey v='xxxxxxxxxxxxxxxxxxxxx'/> <!-- 32 hexadecimal characters if provided (Get one here: http://fah-web.stanford.edu/cgi-bin/getpasskey.py)-->

  <!-- Web Server -->
  <!-- The following allows access from the local network -->
  <web-allow v='10.X.X.0/24'/>

  <!-- CPU Use -->
  <power v='medium'/> 
  
  <!-- Folding Slots -->
  <slot id='0' type='CPU'/>
  <slot id='1' type='GPU'/>
</config>

 

 

FaH logfile.rtf

Edited by frodr
Link to comment

I was able to get both GPUs and the CPU up in the control panel. I added to the Folding Slots:

 

 <!-- Folding Slots -->
  <slot id='0' type='GPU'/>
  <slot id='1' type='GPU'/>
  <slot id='2' type='CPU'/>
</config>

 

Two questions:

 

1) How to disable GPU:0? 

Removing slot id=0 is not possible. If I remove id=1, only the AMD card is visible in Web Control.

 

2) The Folding keeps stopping, and like now, it is not starting again. Any thoughts?

 

 

[screenshot attachment]

 

Logg:

Mar 26 15:17:01 a26d1e0fc4ed CRON[40]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
15:39:00:WU01:FS02:Connecting to 65.254.110.245:8080
15:39:00:WARNING:WU01:FS02:Failed to get assignment from '65.254.110.245:8080': No WUs available for this configuration
15:39:00:WU01:FS02:Connecting to 18.218.241.186:80
15:39:01:WARNING:WU01:FS02:Failed to get assignment from '18.218.241.186:80': No WUs available for this configuration
15:39:01:ERROR:WU01:FS02:Exception: Could not get an assignment

Link to comment
On 3/25/2020 at 6:51 AM, frodr said:

Any thoughts for fixing this?

The AMD card won't work in the docker; the drivers aren't present in the Unraid OS.

The Nvidia card can work with the LSIO Nvidia build of Unraid (which adds the drivers and the Nvidia runtime). However, if you want both cards working, create a VM for them with a couple of cores and retain the docker for CPU-only folding.

Edited by tjb_altf4
Link to comment
On 3/29/2020 at 10:41 AM, tjb_altf4 said:

The AMD card won't work in the docker; the drivers aren't present in the Unraid OS.

The Nvidia card can work with the LSIO Nvidia build of Unraid (which adds the drivers and the Nvidia runtime). However, if you want both cards working, create a VM for them with a couple of cores and retain the docker for CPU-only folding.

I'm aware of the AMD card not working. But the Folding@Home docker is picking up this GPU for unknown reasons. If I set up according to SIO's video with the correct nvidia plugin setup, the Folding@Home docker picks up only the AMD GPU. Only when I added another GPU slot did the Nvidia GPU show in the Control Panel.

 

Now I want to get rid of the AMD card in the Control Panel, and to get the docker working.

Link to comment
  • 2 weeks later...

Hi!

 

I've been running the F@H docker for a few weeks now and have hit some issues... the first is probably related to the docker, the second probably not. Any help on the matter is appreciated! :)


1. Impossible to cleanup work folders

From time to time F@H has trouble cleaning up the work folder since a "fuse" file there and it's not removable by the application...

 

Here the error:

15:52:51:WU00:FS00:Cleaning up
15:52:51:ERROR:WU00:FS00:Exception: Failed to remove directory './work/00': boost::filesystem::remove: Directory not empty: "./work/00"

Here the content:

# v work/00/
total 6868
-rw-r--r-- 0 nobody users 7029760 Apr  9 17:36 .fuse_hidden0000a8d90000004b

Here the "lsof":

# lsof work/00/.fuse_hidden0000a8d90000004b 
COMMAND     PID   USER   FD   TYPE DEVICE SIZE/OFF              NODE NAME
FAHCoreWr 11585 nobody    8r   REG   0,41  7029760 10977524093294902 work/00/.fuse_hidden0000a8d90000004b
FahCore_a 11589 nobody    8r   REG   0,41  7029760 10977524093294902 work/00/.fuse_hidden0000a8d90000004b

I can manually force the deletion, but it would be preferable for the system to handle it on its own... :)

Any idea?

 

2. WU not compatible with CPU?

At the beginning I used to get a lot of "No WUs available for this configuration" and some "gromacs" errors...

 Here an example of the "gromacs" error:

05:39:44:WU01:FS00:0xa7:ERROR:-------------------------------------------------------
05:39:44:WU01:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
05:39:44:WU01:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
05:39:44:WU01:FS00:0xa7:ERROR:
05:39:44:WU01:FS00:0xa7:ERROR:Fatal error:
05:39:44:WU01:FS00:0xa7:ERROR:There is no domain decomposition for 20 ranks that is compatible with the given box and a minimum cell size of 1.4227 nm
05:39:44:WU01:FS00:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
05:39:44:WU01:FS00:0xa7:ERROR:Look in the log file for details on the domain decomposition
05:39:44:WU01:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
05:39:44:WU01:FS00:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
05:39:44:WU01:FS00:0xa7:ERROR:-------------------------------------------------------

After some not-so-deep research on various forums, I gathered that the number of available "cpus" can determine whether the application is able to organize the job... there were also comments about using only multiples of 6 for the "cpus" parameter... maybe I misunderstood something, but I applied that workaround and limited the number of "cpus" F@H can use directly from within config.xml:

<config>
  <!-- Folding Slot Configuration -->
  <cpus v='18'/>

  <!-- Slot Control -->
  <power v='FULL'/>

  <!-- User Information -->
  <passkey v='***********'/>
  <team v='***********'/>
  <user v='***********'/>

  <!-- Folding Slots -->
  <slot id='0' type='CPU'/>
  <!-- slot id='1' type='GPU'/ -->
</config>

This enabled me to start folding.

 

A note: the UnRaid machine is an AMD 3950X (16C/32T).

Another note: I also have a "service" GPU (GeForce GT 730) that never received a job, so I disabled it to remove it from the UI.


Can anyone confirm this behavior?

Edited by sirfaber
Link to comment
On 4/9/2020 at 12:29 PM, sirfaber said:

2. WU not compatible with CPU?

At the beginning I used to get a lot of "No WUs available for this configuration" and some "gromacs" errors...

 Here an example of the "gromacs" error:


05:39:44:WU01:FS00:0xa7:ERROR:-------------------------------------------------------
05:39:44:WU01:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
05:39:44:WU01:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
05:39:44:WU01:FS00:0xa7:ERROR:
05:39:44:WU01:FS00:0xa7:ERROR:Fatal error:
05:39:44:WU01:FS00:0xa7:ERROR:There is no domain decomposition for 20 ranks that is compatible with the given box and a minimum cell size of 1.4227 nm
05:39:44:WU01:FS00:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
05:39:44:WU01:FS00:0xa7:ERROR:Look in the log file for details on the domain decomposition
05:39:44:WU01:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
05:39:44:WU01:FS00:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
05:39:44:WU01:FS00:0xa7:ERROR:-------------------------------------------------------

After some not-so-deep research on various forums, I gathered that the number of available "cpus" can determine whether the application is able to organize the job... there were also comments about using only multiples of 6 for the "cpus" parameter... maybe I misunderstood something, but I applied that workaround and limited the number of "cpus" F@H can use directly from within config.xml:


<config>
  <!-- Folding Slot Configuration -->
  <cpus v='18'/>

  <!-- Slot Control -->
  <power v='FULL'/>

  <!-- User Information -->
  <passkey v='***********'/>
  <team v='***********'/>
  <user v='***********'/>

  <!-- Folding Slots -->
  <slot id='0' type='CPU'/>
  <!-- slot id='1' type='GPU'/ -->
</config>

This enabled me to start folding.

 

A note: the UnRaid machine is an AMD 3950X (16C/32T).

Another note: I also have a "service" GPU (GeForce GT 730) that never received a job, so I disabled it to remove it from the UI.


Can anyone confirm this behavior?

 

 

I tried this and it did not work for me. I'm still getting the following in my logs for this particular work unit; all others worked fine:


ERROR:There is no domain decomposition for 10 ranks that is compatible with the given box and a minimum cell size of 1.4227 nm

 

I'm on an AMD Ryzen 9 3900X (12C/24T). I edited the /config/config.xml file in the docker container with vim while it was running. I tried restarting after editing the file; still the same error.

Link to comment
On 4/9/2020 at 12:29 PM, sirfaber said:

1. Impossible to cleanup work folders

From time to time F@H has trouble cleaning up the work folder because a "fuse" file is left there and it's not removable by the application...

 

Here the error:


15:52:51:WU00:FS00:Cleaning up
15:52:51:ERROR:WU00:FS00:Exception: Failed to remove directory './work/00': boost::filesystem::remove: Directory not empty: "./work/00"

Here the content:


# v work/00/
total 6868
-rw-r--r-- 0 nobody users 7029760 Apr  9 17:36 .fuse_hidden0000a8d90000004b

Here the "lsof":


# lsof work/00/.fuse_hidden0000a8d90000004b 
COMMAND     PID   USER   FD   TYPE DEVICE SIZE/OFF              NODE NAME
FAHCoreWr 11585 nobody    8r   REG   0,41  7029760 10977524093294902 work/00/.fuse_hidden0000a8d90000004b
FahCore_a 11589 nobody    8r   REG   0,41  7029760 10977524093294902 work/00/.fuse_hidden0000a8d90000004b

I can manually force the deletion, but it would be preferable for the system to handle it on its own... :)

Any idea?

 

Did you ever discover how to correct this?

 

I installed the docker container today and this is happening on every WU.

 

EDIT: They eventually clean themselves up. I did nothing, and after a few more WUs it was able to clean up after itself.

 

Edited by draeh
Link to comment
  • 2 months later...
On 4/22/2020 at 9:48 PM, draeh said:

 

Did you ever discover how to correct this?

 

I installed the docker container today and this is happening on every WU.

 

EDIT: They eventually clean themselves up. I did nothing, and after a few more WUs it was able to clean up after itself.

 

Nope.

Actually I stopped looking at F@H logs altogether :P

It's working and crunching data and that's what matters.

Link to comment

Now that there is an option to fold specifically for COVID-19 in the menus, is there an update that can push this functionality out to docker instances? It is available in the client on Windows, but when I try to manage the docker instance and change it to COVID-19 through the dropdown menu in the basic view, it doesn't show as an option. I also get an error when attempting the same change via the "advanced view" (running on my main Windows box), for which I enabled remote access so I could change my docker settings on my Unraid machine.

Link to comment

Does anybody have an idea why my F@H will not use more than 2 CPU cores when I have 24 available? I can't figure out why it is not using the rest of them. I am using the latest LSIO docker.

 

I have tried all of the following below with no success:

  • New installation with default configuration, no CPU pinning
  • Changing power from MEDIUM to FULL
  • CPU pinning to specific cores (CPU 2 - 10 HT; 18 total)
  • Specifying the number of cores (18) in config.xml

When I look in the F@H container log I can see that it recognizes my CPU correctly as it shows the following:

Quote

01:30:28:******************************* System ********************************
01:30:28: CPU: Intel(R) Xeon(R) CPU X5680 @ 3.33GHz
01:30:28: CPU ID: GenuineIntel Family 6 Model 44 Stepping 2
01:30:28: CPUs: 24
01:30:28: Memory: 47.23GiB
01:30:28:Free Memory: 275.46MiB
01:30:28: Threads: POSIX_THREADS

 

Here is what my config.xml file looks like. Any advice or ideas would be very much appreciated; I am pretty stumped.

Quote

01:30:28:***********************************************************************
01:30:28:<config>
01:30:28: <!-- Folding Slot Configuration -->
01:30:28: <cause v='COVID_19'/>
01:30:28: <cpus v='18'/>

:
01:30:28: <!-- HTTP Server -->
01:30:28: <allow v='172.16.1.0/24'/>

:
01:30:28: <!-- Slot Control -->
01:30:28: <power v='FULL'/>

:
01:30:28: <!-- User Information -->
01:30:28: <passkey v='*****'/>
01:30:28: <team v='*****'/>
01:30:28: <user v='*****'/>

:
01:30:28: <!-- Web Server -->
01:30:28: <web-allow v='172.16.1.0/24'/>

:
01:30:28: <!-- Folding Slots -->
01:30:28: <slot id='0' type='CPU'>
01:30:28: <paused v='true'/>
01:30:28: </slot>
01:30:28:</config>

 

[screenshot attachment]

Link to comment
  • 5 weeks later...

I've been using the docker container to fold with 4 GPUs / no CPU. Everything seems to be working well, but I've noticed that the container uses a CPU core for each GPU slot, and each core it uses is pinned at 100% utilization. Is anyone else seeing similar behavior?

 

I realize the GPUs have to be fed data to fold, but it seems like that shouldn't take 100% of a core. The CPUs are Xeon X5690s, so not exactly new, but not slouches either. Can anyone offer any thoughts? Am I misunderstanding something about how all this works?

Link to comment
1 hour ago, Execut1ve said:

I've been using the docker container to fold with 4 GPUs / no CPU. Everything seems to be working well, but I've noticed that the container uses a CPU core for each GPU slot, and each core it uses is pinned at 100% utilization. Is anyone else seeing similar behavior?

 

I realize the GPUs have to be fed data to fold, but it seems like that shouldn't take 100% of a core. The CPUs are Xeon X5690s, so not exactly new, but not slouches either. Can anyone offer any thoughts? Am I misunderstanding something about how all this works?

That's normal. The CPU thread is used to move data to and from the GPU, and that's a substantial amount of data.

That's why it's important to pin the right cores for the F@H docker, to prevent lag in the important stuff.

Link to comment
12 hours ago, testdasi said:

That's normal. The CPU thread is used to load data to and from the GPU and it's a substantial amount of data to load.

That's why it's important to ensure you pin the right cores for the F@H docker to prevent lag to the important stuff.

Hm, I wonder if I'd notice a hit to folding performance if I assigned the container 2 cores and 2 hyperthreads instead of 4 cores? Time for some experimentation!

Link to comment

After some informal experimentation, I'm not seeing much difference (if any) in total PPD between allocating the container 4 cores (with nothing on their HTs) vs. allocating 2 cores with their 2 HTs.

 

For reference, I'm folding on 4 GPUs: 3 Zotac 1060 mining variants and 1 GTX 960. They are connected to the mainboard via powered PCIe riser cables; two are in x8 slots and two are in x4 slots, all PCIe Gen 2. The computer is a PowerEdge R710 server with dual Xeon X5690 processors. I'm averaging 800k-1M total PPD, with each card sitting in the 200k-250k range. I don't notice any substantial difference between the cards in the x4 slots vs. the x8 slots.

 

Can anyone else with a hyperthreaded CPU offer any observations?

Link to comment
