SOAR is a framework which automates the execution of groups of jobs. These jobs form a DAG (directed acyclic graph) and are run under Condor, and SOAR assembles the results within a single directory. For information on Condor visit: http://www.cs.wisc.edu/condor
The system is flexible: it can run all data sets at once, or it can keep track of completions and run new data sets on a periodic basis. It reports on progress both by logging information and by progressively plotting the run's progress in terms of how many jobs are running and how many are ready to run. The system becomes adapted and personalized for each project.
Tying a program in so that it runs under SOAR does several things for you. It maps a location for data sources to a program. It makes it easy for you to either feed in new data sources or create a new version of the program to run against the existing data sources. The third primary location (the tie to SOAR) is a set of scripts which fetch the data sources to the location needed for each job within the DAG and scrape the expected results to the results location.
SOAR is currently intended and written to be managed by one person, or possibly several through a shared account. The product is young but reliable, and the focus has been on tools to easily automate research. In time, work will be done to explore making the various pieces of software play nicely in the area of multiple users and groups.
Most projects, once adapted to SOAR, sit in a mode where additional jobs are automatically run as additional data sets are placed in that project's data area. Multiple people can run the same research by simply having a derived project, thus establishing their own data and result locations via their own portion of the web interface. SOAR uses Condor's ability to run jobs on any periodic basis needed. If the project needs a sweep for new data every 4 hours, that is trivial.
SOAR is generally managed by a single person who watches over disk consumption, adapts new research to SOAR projects, assists with special runs, and expedites software changes for projects that rapidly change the science they are doing on the data.
No matter how one starts the runs, the web interface allows four things:
There are some distinct advantages to using SOAR. It is very similar to having a folder with data (each set in a distinct location) and a single submit file. The following are provided by SOAR only:
Each project will have its own unique steps leading to automation and maintenance of the automated jobs. Accounting for the unique nature of jobs, each initial SOAR setup will follow these steps:
Steps 3, 4, and 5 will be repeated for each new project to be automated.
The first part of installing SOAR is deciding where to place things. SOAR requires a minimum of two installation locations and one URL, and up to four locations and three URLs. This combination limits what the web interface can access while allowing large amounts of disk to be consumed between the initial data sets for projects and the copious resulting data from a system that makes research easier than was previously possible. See Configurable locations below.
NOTE: SOAR is designed to leverage Condor_dagman within a Condor environment. The run location must be on local disk to avoid file locking issues with most shared file systems.
To effectively run multiple data sets in any system, you want your application to accept command-line arguments and/or read parameters from one or more files. Anything hard-coded into the program you are running cannot be easily varied.
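As a minimal sketch (the wrapper name and parameters here are hypothetical, not part of SOAR), a SOAR-friendly job takes its variable inputs from the command line instead of hard-coding them:

```shell
# Hypothetical job wrapper: everything that varies between data sets
# arrives as an argument, so SOAR can feed it different inputs.
run_job() {
  input_dir=$1    # per-data-set input directory
  threshold=$2    # per-run tunable parameter
  echo "processing $input_dir with threshold $threshold"
}

run_job /data/job001 0.5
```

A program structured this way can be pointed at each new data set with no code changes.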
The SOAR system utilizes and organizes various components related to a project's runs. As these components may be kept in a variety of locations, the file fsconfig identifies the component locations. Each component listed is also the directory name. Within the directory, each project may have its own subdirectory for project-specific versions of the component.
Additionally, each project can have its data location specified in this file in the format:
Project_name,Location_holding_data_in_project_name_folder
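For example, a hypothetical project redapple whose data lives under a user's home directory might have the entry:

```
redapple,/home/me/rundata
```

With this entry, SOAR would look for redapple's job data in /home/me/rundata/redapple.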
SOAR is told where to find data sets for your jobs. These will be folders with unique names containing the variable data for your jobs, or folders named datasetXXXXXXX which contain the unique job folders. The code and jobs folders under sources contain the unchanging parts in the first and the glue scripts in the second.
This directory gets all the sources needed to make your job run. It also gets the results of compiling, if that is something your job needs, be it Matlab or some regular compiled language. For security purposes it must have an .htaccess file, or the code will not be placed where it needs to be for the job to find it. This ensures that the sources on the web are only accessible by authorized persons.
Normally all files in the code directory are copied to the submit location where the job is started. However, any file listed in the file SKIP will not be moved.
Another file, called BLACKLIST, must exist. Each entry contains a name which starts with a number, then a colon, the word blacklist, and then a reason within [ ]. Here follows an example:
1000: blacklist [ condensation ]
This directory holds the glue scripts which adapt SOAR to handling the data sets and the code of your research. The glue scripts tie together the data sets to whatever processing you want to do.
A basic job consists of a single node which submits to a pool, and then another analysis job which can be run if desired based on results. A faulty start can have us execute a null piece of work for the first node, and we usually run a null follow-up node. After all the jobs have run, we have a report node, an optional clean node, an after-the-report node which preps the data collected if we are delivering it, and a push-the-data node.
There are a number of scripts that run before or after which can or should be customized:
All the template files are filled in with variable data.
There are some additional files which allow extra features.
The most convenient way to do production with SOAR is to place entries in a file whose run frequency Condor manages. Two are included (per_runs and per_plotsandreports). per_runs fires off the commands in continuous.cron once a day, and per_plotsandreports fires off the commands in checkprogress.cron every 5 minutes. Setting this up is as easy as placing what you want done, similar to the sample files, and submitting them with condor_submit (condor_submit per_runs). The first usually contains a use of the control.pl script with --kind=new, so tracking of data sets already done means only new data sets are run. The second, as of version 0.7.5, is done for you. Every run gets an entry added when the run is started, which is removed when the run completes. This allows us to use the information in the report system to accurately move jobs which were running to complete. This information also allows the next run to search out all currently running jobs and not start them again.
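A per_runs submit file might look roughly like the following sketch; the cron_* attributes are standard Condor submit-language cron commands, but the exact contents here are an assumption, not the shipped sample:

```
# Hypothetical per_runs submit file: re-runs continuous.cron once a day.
universe        = local
executable      = continuous.cron
cron_hour       = 3          # fire at 03:00...
cron_minute     = 0          # ...every day
on_exit_remove  = false      # stay in the queue between firings
log             = per_runs.log
queue
```

Submitting this once with condor_submit per_runs leaves it in the queue, firing on the cron schedule.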
If you need to remove a run, you must use the soar_rm.pl script to extract this set of reports from the recheck interval, or you waste cycles on reports for a completed/removed run.
This way, once you tie a project to a particular job, you or the person doing the research only need to worry about placing more data sets into the image location for the project and pulling results from the result location.
Input data for jobs (in folders, or folders in datasets) is located either in the directory specified by IMAGERUNS in control/fsconfig, or in a location specified by the project name in the same file. Let's say your data is expected to be in a directory rundata in your home. Then job data for project redapple would be in /home/me/rundata/redapple.
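Continuing the hypothetical redapple example, the expected layout can be sketched as follows (the paths are illustrative):

```shell
# Each subdirectory of the project's data directory is one job's input.
base=/tmp/soar_layout_demo        # stand-in for /home/me
rm -rf "$base"
mkdir -p "$base/rundata/redapple/job001" \
         "$base/rundata/redapple/job002"
ls "$base/rundata/redapple"
```

Dropping a new job folder into the project's data directory is all that is needed for a --kind=new run to pick it up.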
The way code replacement works is that anything placed in /home/me/rundata/redapple_objs is compared against the age of the current files for the currently requested version of your workflow code. Newer code from that location is inserted into your workflow. You have some control, though it is minimal at this point. The following four attributes can be set in /home/me/rundata/redapple_objs/objconfig. The contents of this file are only active if files are found to update.
(Maybe you usually run with --kind=new, but you want to run all your data with the new code. This will run every data set you have.)
You want to set up for rerunning all of your data but want to inspect results from just 2 runs first.
The changes are substantial enough that you want v5 to now become v6, and you will run it as v6 (SOAR does that for you).
If, when doing a code update, one wants to run only a few jobs, they can
be listed separated by commas. If the jobs are not in a dataset, they would
be listed like this:
whitelist = job1,job2,job3,job4
If the jobs are in a dataset, it would be like this:
whitelist = dataset_XXXXXXX-job1,dataset_XXXXXXX-job2
This will fire off one job for each job in the dataset. (USING WHITELIST AND DATASET AT THE SAME TIME WILL BRING UNEXPECTED RESULTS. USE ONE OR THE OTHER)
SOAR does not build software, but updates work because we replace a binary with a newer binary, an R script with a newer one, or even Matlab scripts/programs. In the last case, all the scripts used by these two M files will be compiled with Matlab before your jobs are run. At this point this is the only processing we do: SOAR is not a build system, but Matlab jobs cannot be run unless we compile them.
DATASET: A file called DATASET is located in the input location for the project. The dataset named in this file will be entered as if "--datasets=name" had been given, and --kind will be set to "new".
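For instance, a DATASET file containing only the line below (the dataset name is hypothetical) makes the run behave as if --datasets=dataset_0000123 and --kind=new had been given:

```
dataset_0000123
```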
./control.pl --project=gravitropism --version=v3 --limit=10 --kind=oneoff --clean=CL
The above will run an arbitrary 10 jobs from the available data sets for project gravitropism version v3, and afterwards will clean the run directory some and clear any use of the tarcache space. Results and data sets are not touched.
./control.pl --project=gravitropism --version=v3 --kind=white --white=/tmp/whitelist
The above will run a predefined set of your data against a particular version of a particular project.
./control.pl --project=gravitropism --preversion=v3 --version=v4 --kind=install --code=/tmp/code.tar.gz
This script ...
./control.pl --project=gravitropism --version=v3 --kind=nightly # switches ENVVARS file
This script ...
./control.pl --project=gravitropism --version=v3 --kind=new
This is the most basic use. Run only data sets of this project and version which have not been run yet. If run once a day, new data can be prepared and placed in the project's data location and the new data sets will all be run.
./control.pl --project=gravitropism --version=v3 --kind=white --white=/tmp/whitelist
This script ...
./control.pl --preproject=natehighres --preversion=v1 --project=DanLewisArabidopsis --version=v1 --kind=install
./control.pl --project=gravitropism --version=v3 --kind=install --type=param.dat
This script ...
Each project has a master file ENVVARS in the run directory, which is used for various things. It is changed out only for --kind=nightly and --kind=new. However, if it gets out of sync, or to make it match the last "oneoff", the command is
./control.pl --project=gravitropism --update
with one of the following additional command-line options:
--kind=install        to set up a new version of code and compile
--kind=nightly        to change the production code
--kind=new            to do all the new data sets since the ones run by --kind=nightly
--kind='other text'   to do a full run (unless limited), with 'Other Text' labeling the web page for the run
The framework usually runs this script at the end of a job run.
Its command line options are
./status.pl --project=gravitropism --kind=profile --last
Create the profile plot for the last or currently running set of jobs for this project.
./status.pl --project=gravitropism --kind=profile --env=run15523_8_24_2008
Create a profile plot for a particular run.
./status.pl --project=gravitropism --kind=summary --last
Create the report for the last or currently running set of jobs for this project.
./status.pl --kind=summary --project=gravitropism --env=run6823_8_26_200
Create a report for a particular run.
./status.pl --kind=whitelist --project=med_mc2 --env=run12700_10_10_2009 --match="fail \[No result files\]"
Create a white list from the section of the report labeled "fail [No result files]"
When a run is started you are given an environment string which describes where the data for that particular run is. soar_rm.pl uses this both to find the actual job id to use with condor_rm, and to remove the run-specific periodic status.pl runs which update the plots and reports at some interval.
You probably have a project running in SOAR but you want one of the following changes:
Base a new project on the current project.
Let's say you have a project positioning and the best version of it is v3. Your new user is sam and you want it to be called sams_liquids.
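Following the --preproject/--preversion pattern shown in the control.pl examples above, the derivation might look like this sketch (the exact flag usage is an assumption based on that pattern):

```
./control.pl --preproject=positioning --preversion=v3 --project=sams_liquids --version=v1 --kind=install
```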
Generate a new version of the code.
This is very easy to do. If you make a new version, you can name it something meaningful reflecting why you created it. The new version has access to all the project data, allowing you to change the science you are doing. If the project is positioning, the old version is v1, and you want the new version to be surfaces, then go to "sources/positioning" and enter this command: cp -r v1 surfaces.
Now simply place new binaries in "sources/positioning/surfaces/code".
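The two steps above can be sketched as shell commands; the paths are illustrative and the binary name is hypothetical:

```shell
# Clone version v1 of project "positioning" as the new version "surfaces",
# then drop a new binary into its code directory.
base=/tmp/soar_version_demo       # stand-in for the real sources location
rm -rf "$base"
mkdir -p "$base/positioning/v1/code"
cd "$base/positioning"
cp -r v1 surfaces
touch surfaces/code/new_binary    # hypothetical replacement binary
ls surfaces/code
```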
NOTE: if you change the behavior and the files needed or created, you'll likely need to change the scripts prejob.pl, postjob.pl, and pushdata.pl.
The goal of the web interface is to inspect a run of data sets done as a single Condor DAG. One gets access to the run directory for the DAG (which has a subdirectory for each job), to the plot showing the current progress of that DAG, and to a report which breaks out aspects of each job which has ended; the interface also allows access to results for each job in the DAG when it completes.
This file sits at the topmost level and has sample links to projects.php. These two files make it possible to have links created from the file RUNS at the top of the project's run directory, which allow the links and access described above.