GlideinWMS The Glidein-based Workflow Management System

Glidein Recipes

Overview

Description

This recipe shows how to configure a factory and frontend to submit user jobs to a batch cluster via BOSCO.

Requirement Description
A functioning glideinwms factory

The factory should be completely configured and functioning for Grid submissions. This ensures that the factory is running and working before any configuration is done for BOSCO.

A functioning glideinwms frontend

The frontend should be completely configured and functioning for Grid submissions. The same reasoning as for the factory applies here.

A valid BOSCO resource

A valid, current, enabled account that can access a submit host and submit to the cluster. Specifically, the private and public ssh keys are needed for submission. You can then add the resource by invoking the "bosco_cluster --add" command. This can be invoked from any host, but we suggest doing it from the Frontend so that you don't need to transfer the ssh keys. See the BOSCO manual for more information on adding a BOSCO resource.

Example BOSCO Factory Entry

<entry name="BOSCO_TEST_carver" 
    auth_method="key_pair" 
    enabled="True" 
    gatekeeper="cmsuser@carvergrid.nersc.gov"
    gridtype="batch pbs" 
    schedd_name="fermicloud199.fnal.gov" 
    trust_domain="bosco" 
    verbosity="std" 
    work_dir="AUTO">

    <config>
        <max_jobs glideins="3" held="2" idle="1">
            <max_job_frontends></max_job_frontends>
        </max_jobs>
        <release max_per_cycle="20" sleep="0.2"/>
        <remove max_per_cycle="5" sleep="0.2"/>
        <restrictions require_glidein_glexec_use="False" require_voms_proxy="False"/>
        <submit cluster_size="10" max_per_cycle="100" sleep="0.2"/>
    </config>
    <allow_frontends></allow_frontends>

    <attrs>
        <attr name="CONDOR_ARCH" const="True" glidein_publish="False" job_publish="False"
              parameter="True" publish="False" type="string" value="default"/>
        <attr name="CONDOR_OS" const="True" glidein_publish="False"  job_publish="False" 
              parameter="True" publish="False" type="string" value="default"/>
        <attr name="GLEXEC_BIN" const="True" glidein_publish="False" job_publish="False" 
              parameter="True" publish="True" type="string" value="NONE"/>
        <attr name="GLIDEIN_Site" const="True" glidein_publish="True" job_publish="True" 
              parameter="True" publish="True" type="string" value="BOSCO_PBS"/>
        <attr name="USE_CCB" const="False" glidein_publish="True" job_publish="False" 
              parameter="True" publish="True" type="string" value="True"/>
        <attr name="X509_CERT_DIR" const="True" glidein_publish="False" job_publish="True" 
              parameter="True" publish="True" type="string" value="/osg/certificates"/>
    </attrs>
    <files></files>
    <submit_attrs>
        <submit_attr name="+remote_queue" value='"serial"'/>
    </submit_attrs>
    <infosys_refs></infosys_refs>
    <monitorgroups></monitorgroups>
</entry>
            

The important pieces of the entry stanza above are described below:

Name Type Value Description
auth_method Element attribute for <entry> "key_pair"

The key pair in this case refers to the ssh keypair installed to access the BOSCO resource (remote cluster submit host).

See Factory Configuration for a complete description.

gatekeeper Element attribute for <entry> "cmsuser@carvergrid.nersc.gov"

The gatekeeper attribute in the BOSCO case is the username and hostname used by the user to log in to the cluster and submit jobs.

See Factory Configuration for a complete description.

gridtype Element attribute for <entry> "batch pbs"

It must be the keyword "batch" followed by the batch system used in the cluster. The batch system must be one supported by HTCondor/BOSCO, e.g. pbs, condor, lsf, sge.

See Factory Configuration for a complete description.
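The shape of the gridtype value can be checked before the entry is added to the factory configuration. The following is a minimal sketch, not part of GlideinWMS: the function name check_gridtype and the KNOWN_BATCH_SYSTEMS set are hypothetical, and the set only contains the batch systems named in this recipe, so the list actually supported by HTCondor/BOSCO may differ.

```python
# Batch systems named in this recipe. This set is an assumption for the
# sketch; consult the HTCondor/BOSCO documentation for the supported list.
KNOWN_BATCH_SYSTEMS = {"pbs", "condor", "lsf", "sge"}


def check_gridtype(gridtype):
    """Return the batch system name from a 'batch <system>' gridtype string.

    Raises ValueError if the string does not have that shape or names a
    batch system outside KNOWN_BATCH_SYSTEMS.
    """
    parts = gridtype.split()
    if len(parts) != 2 or parts[0] != "batch":
        raise ValueError(f"expected 'batch <system>', got {gridtype!r}")
    system = parts[1]
    if system not in KNOWN_BATCH_SYSTEMS:
        raise ValueError(f"batch system {system!r} is not in the known set")
    return system


print(check_gridtype("batch pbs"))  # prints: pbs
```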

trust_domain Element attribute for <entry> "bosco"

The trust domain can be any arbitrary value. Both the factory and the frontend must be configured to use the same value of the trust_domain. In this example, "bosco" is the arbitrary value.

See Factory Configuration for a complete description.

work_dir Element attribute for <entry> "AUTO"

The working directory that the pilot starts up in can be any one supported by the remote cluster or batch system.

See Factory Configuration for a complete description.

glideins Element attribute for <max_jobs> "3"

This is a hard limit for the number of glideins that the factory will submit to the remote batch system. For testing purposes this example was restricted to 3 running glideins.

See Factory Configuration for a complete description.

held Element attribute for <max_jobs> "2"

This is a limit for the number of glidein requests that can be in held state. If the number of held requests matches this number, the factory will stop asking for more. For purposes of testing, this number was set extremely low.

See Factory Configuration for a complete description.

idle Element attribute for <max_jobs> "1"

This is a limit for the number of glidein requests that can be in idle state. Ordinarily, this attribute is used to determine "pressure" at a grid site.

See Factory Configuration for a complete description.
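To see how the three max_jobs limits interact, the sketch below combines them into a single "how many more glideins may be requested" calculation. It is a simplified reading of the descriptions above and an assumption throughout: the MaxJobs class and allowed_new_glideins function are hypothetical names, and the factory's real submission logic is more involved than this.

```python
from dataclasses import dataclass


@dataclass
class MaxJobs:
    """The three limits from the <max_jobs> element of the example entry."""
    glideins: int
    held: int
    idle: int


def allowed_new_glideins(limits, total, idle, held):
    """Simplified illustration, not the factory's actual algorithm.

    total, idle and held are the current glidein counts for the entry.
    """
    # Once the held count reaches the held limit, stop asking for more.
    if held >= limits.held:
        return 0
    # Otherwise stay under both the overall limit and the idle limit.
    room_total = limits.glideins - total
    room_idle = limits.idle - idle
    return max(0, min(room_total, room_idle))


# Values taken from the <max_jobs> element in the example entry.
limits = MaxJobs(glideins=3, held=2, idle=1)
print(allowed_new_glideins(limits, total=1, idle=0, held=0))  # prints: 1
print(allowed_new_glideins(limits, total=1, idle=0, held=2))  # prints: 0
```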

submit_attr Element <submit_attr> -

This element is used to specify the equivalent of RSL information for gt2/gt5. The name and value of each configured submit attribute are put in the glidein's JDL before submission. For example, the configuration above submits glideins to a specific remote queue and results in the following line in the glidein's JDL:

+remote_queue = "serial"

See Factory Configuration for a complete description.
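The mapping from submit_attr elements to JDL lines described above can be previewed with a few lines of Python. This is only a sketch of the "name = value" rendering, assuming the element layout shown in the example entry; the function name jdl_lines is hypothetical and the factory's own code is not reproduced here.

```python
import xml.etree.ElementTree as ET

# The <submit_attrs> block from the example entry, with its closing tag.
SUBMIT_ATTRS_XML = """
<submit_attrs>
    <submit_attr name="+remote_queue" value='"serial"'/>
</submit_attrs>
"""


def jdl_lines(submit_attrs_xml):
    """Render each <submit_attr> as a 'name = value' line."""
    root = ET.fromstring(submit_attrs_xml.strip())
    return [
        f"{attr.get('name')} = {attr.get('value')}"
        for attr in root.iter("submit_attr")
    ]


for line in jdl_lines(SUBMIT_ATTRS_XML):
    print(line)  # prints: +remote_queue = "serial"
```

Note that the double quotes are part of the attribute value, which is why the XML wraps the value in single quotes.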

Example BOSCO Frontend Configuration

The only frontend configuration needed in this example is the credential setup. The credential setup can be included in the group credential definition or in the global credential definition.

<credential absfname="/path/to/grid_proxy"
            security_class="frontend"
            trust_domain="OSG"
            type="grid_proxy"/>
<credential absfname="/path/to/bosco_key.rsa.pub"
            keyabsfname="/path/to/bosco_key.rsa"
            security_class="frontend"
            trust_domain="bosco"
            type="key_pair"/>

The important pieces of the credential stanza above are described below:

Name Type Value Description
absfname Element attribute for <credential> "/path/to/grid_proxy"

This is the full path to the file containing the grid proxy used to identify the glidein with the Frontend.

See Frontend Configuration for a complete description.

absfname Element attribute for <credential> "/path/to/bosco_key.rsa.pub"

This is the full path to the file containing the public key installed on the BOSCO resource to allow ssh access.

See Frontend Configuration for a complete description.

keyabsfname Element attribute for <credential> "/path/to/bosco_key.rsa"

This is the full path to the file containing the secret key used to access the BOSCO resource via ssh.

See Frontend Configuration for a complete description.

security_class Element attribute for <credential> "frontend"

This is the security class that is defined for the other credentials on this frontend.

See Frontend Configuration for a complete description.

trust_domain Element attribute for <credential> "bosco"

The trust domain can be any arbitrary value. Both the factory and the frontend must be configured to use the same value of the trust_domain. In this example, "bosco" is the arbitrary value.

See Frontend Configuration for a complete description.

type Element attribute for <credential> "key_pair"

The key pair in this case refers to the public and secret keys that can be used to ssh to the BOSCO resource submit host.

This must match the auth_method value specified in the factory entry for the credentials to be matched properly.

See Frontend Configuration for a complete description.
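Because the credential type and trust_domain on the frontend have to agree with auth_method and trust_domain in the factory entry, a quick cross-check of the two configurations can catch mismatches early. The sketch below is an illustration only: matching_credentials is a hypothetical helper, the entry is reduced to the two attributes being compared, and the <credentials> wrapper element is there just to give the parser a single root.

```python
import xml.etree.ElementTree as ET

# Factory entry reduced to the attributes compared here.
FACTORY_ENTRY = (
    '<entry name="BOSCO_TEST_carver" auth_method="key_pair" '
    'trust_domain="bosco"/>'
)

# Frontend credentials from the example, wrapped in one root element.
FRONTEND_CREDENTIALS = """
<credentials>
    <credential absfname="/path/to/grid_proxy"
                security_class="frontend"
                trust_domain="OSG"
                type="grid_proxy"/>
    <credential absfname="/path/to/bosco_key.rsa.pub"
                keyabsfname="/path/to/bosco_key.rsa"
                security_class="frontend"
                trust_domain="bosco"
                type="key_pair"/>
</credentials>
"""


def matching_credentials(entry_xml, credentials_xml):
    """Return credentials whose type and trust_domain match the entry."""
    entry = ET.fromstring(entry_xml.strip())
    creds = ET.fromstring(credentials_xml.strip())
    return [
        cred for cred in creds.iter("credential")
        if cred.get("type") == entry.get("auth_method")
        and cred.get("trust_domain") == entry.get("trust_domain")
    ]


for cred in matching_credentials(FACTORY_ENTRY, FRONTEND_CREDENTIALS):
    print(cred.get("absfname"))  # prints: /path/to/bosco_key.rsa.pub
```

An empty result would suggest that no frontend credential can be matched to the BOSCO entry.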

pilotabsfname Element attribute for <credential> "/path/to/pilot_proxy"

A proxy for the pilot is required in all cases, even if proxies are not used to authenticate on the gatekeeper. This is because the proxy is used to establish secure communication between the pilot and the user collector.

See Frontend Configuration for a complete description.