Posts posted by sirfaber

  1. On 4/22/2020 at 9:48 PM, draeh said:

     

    Did you ever discover how to correct this?

     

    I installed the docker container today and this is happening on every WU.

     

    EDIT: They eventually clean themselves up. I did nothing and after a few more WU it was able to clean up after itself.

     

    Nope.

    Actually I stopped looking at F@H logs altogether :P

    It's working and crunching data and that's what matters.

  2. Hi!

     

    For a few weeks now I've been running the F@H Docker container and I've run into a couple of issues... the first is probably related to the Docker setup, the second probably not. Any help on the matter is appreciated! :)


    1. Unable to clean up work folders

    From time to time F@H has trouble cleaning up the work folder because a ".fuse_hidden" file is left in there and the application can't remove it...

     

    Here is the error:

    15:52:51:WU00:FS00:Cleaning up
    15:52:51:ERROR:WU00:FS00:Exception: Failed to remove directory './work/00': boost::filesystem::remove: Directory not empty: "./work/00"
    

    Here is the content of the work folder:

    # v work/00/
    total 6868
    -rw-r--r-- 0 nobody users 7029760 Apr  9 17:36 .fuse_hidden0000a8d90000004b
    

    Here is the "lsof" output:

    # lsof work/00/.fuse_hidden0000a8d90000004b 
    COMMAND     PID   USER   FD   TYPE DEVICE SIZE/OFF              NODE NAME
    FAHCoreWr 11585 nobody    8r   REG   0,41  7029760 10977524093294902 work/00/.fuse_hidden0000a8d90000004b
    FahCore_a 11589 nobody    8r   REG   0,41  7029760 10977524093294902 work/00/.fuse_hidden0000a8d90000004b
    

    I can force the deletion manually (see the sketch below), but it would be preferable if the system could do it on its own... :)

    Any idea?
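
    For reference, here is a minimal sketch of what the manual cleanup could look like (not part of the F@H client; WORKDIR is just a placeholder for wherever the work folder lives, and it only deletes files that lsof reports as no longer open by any process):

    WORKDIR=./work   # placeholder: point this at your F@H work folder
    for f in "$WORKDIR"/*/.fuse_hidden*; do
        [ -e "$f" ] || continue
        # lsof exits non-zero when no process has the file open anymore
        lsof "$f" >/dev/null 2>&1 || rm -f "$f"
    done

    Of course this only hides the symptom; the lsof output above suggests the old file is still held open by the FahCore processes at the moment the client tries its cleanup.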

     

    2. WU not compatible with CPU?

    At the beginning I was getting a lot of "No WUs available for this configuration" messages and some GROMACS errors...

    Here is an example of the GROMACS error:

    05:39:44:WU01:FS00:0xa7:ERROR:-------------------------------------------------------
    05:39:44:WU01:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
    05:39:44:WU01:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
    05:39:44:WU01:FS00:0xa7:ERROR:
    05:39:44:WU01:FS00:0xa7:ERROR:Fatal error:
    05:39:44:WU01:FS00:0xa7:ERROR:There is no domain decomposition for 20 ranks that is compatible with the given box and a minimum cell size of 1.4227 nm
    05:39:44:WU01:FS00:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
    05:39:44:WU01:FS00:0xa7:ERROR:Look in the log file for details on the domain decomposition
    05:39:44:WU01:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
    05:39:44:WU01:FS00:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
    05:39:44:WU01:FS00:0xa7:ERROR:-------------------------------------------------------
    

    After some not-so-deep research on various forums I gathered that the number of available "cpus" can determine whether the application manages to split up the job... there were also comments about using only multiples of 6 for the "cpus" parameter (see the quick check after the config below)... maybe I misunderstood something, but I applied that workaround and limited the number of "cpus" F@H can use directly from within config.xml:

    <config>
      <!-- Folding Slot Configuration -->
      <cpus v='18'/>
    
      <!-- Slot Control -->
      <power v='FULL'/>
    
      <!-- User Information -->
      <passkey v='***********'/>
      <team v='***********'/>
      <user v='***********'/>
    
      <!-- Folding Slots -->
      <slot id='0' type='CPU'/>
      <!-- slot id='1' type='GPU'/ -->
    </config>

    This enabled me to start folding.
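
    For what it's worth, here is the quick check mentioned above; it just uses coreutils' factor to print the prime factors of a candidate CPU count. The "small prime factors / multiples of 6" advice comes from forum comments rather than anything official, so treat it as an assumption:

    factor 18   # 18: 2 3 3  -> only 2s and 3s
    factor 20   # 20: 2 2 5  -> contains a 5; 20 is the rank count in the error above

    With <cpus v='18'/> the count factors as 2 x 3 x 3, which at least matches that rule of thumb.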

     

    A note: the UnRaid machine runs an AMD 3950X (16C/32T).

    Another note: I also have a "service" GPU (GeForce GT 730) that never received a job, so I disabled its slot to remove it from the UI.


    Can anyone confirm this behavior?