Overview
Each supported computational engine within WebMO utilizes a series of files including:
- Data files (engine configuration, job templates)
- HTML files (web pages for using and administering the engine)
- Perl scripts (parse HTML forms, run engine, parse engine output)
Adding a new computational engine to WebMO simply requires creating the requisite set of files. No modifications of the 'main' WebMO source code are required. WebMO automatically detects the additional computational engine from the presence of its engine-specific configuration files (details below). These files are dynamically loaded and used by WebMO whenever engine-specific functions are required.
clone_engine.pl
The script clone_interface.pl in the WebMO.install/scripts directory is designed to help create the set of files for another engine. It clones an existing computational chemistry engine under a new name, e.g., it uses "gaussian" to create "gaussian2".
Select the existing engine most similar to the new engine for which you wish to create an interface, and clone it under the new name. Follow the instructions to hand-edit the new engine's *.int file and update the text as appropriate. You should then see entries for both engines (e.g., "gaussian" and "gaussian2") everywhere, from "Choose Computational Engine" to the "Interface Manager".
This script is useful both for installing two versions of the same engine and for providing a nice starting point for interfaces to new programs.
Conventions
For the remainder of this document, the following conventions are used:
- $engine - The short non-descriptive name of the engine (lowercase, <= 10 characters)
- $cgiBase - The WebMO cgi-bin directory
- $htmlBase - The WebMO HTML source directory
- $userBase - The WebMO user directory tree, where user jobs are stored
Required Files
Before giving a detailed description of each of the various files, it is useful to list and give a brief description of each of the required elements.
File type | File name | Location | Description |
Data | $engine.int | $cgiBase/interfaces | The computational engine configuration file; contains information specific to that particular computational engine |
Data | $engine.tmpl | $cgiBase/interfaces | The job template file, from which calculation types are defined, and input files are generated |
HTML | $engine.html | $htmlBase | Defines the various HTML form elements that determine the available job options |
HTML (v12.1+) | $engine.js | $htmlBase/javascript | Associated javascript specific to the job options |
HTML | ${engine}mgr_admin.html | $htmlBase | For the web-based configuration of the engine |
Script | $engine.cgi | $cgiBase | Parses the HTML form data associated with $engine.html |
Script | run_$engine.cgi | $cgiBase | Handles the details of executing the computational engine |
Script | ${engine}mgr_admin.cgi | $cgiBase | Facilitates the web-based configuration of the computational engine |
Script | parse_$engine.cgi | $cgiBase | Parses the raw text output of the engine to a standard WebMO format |
Although this list looks rather intimidating, most of the required files are trivial to generate through simple modifications of the corresponding files from a pre-existing computational engine. Usually only minor modifications are required, e.g., changing variable names or command-line arguments to the computational engine. These modifications are usually possible even for someone with only a basic understanding of HTML and Perl.
The exception is the parse_$engine.cgi file, which handles the (sometimes complex) parsing of the text output generated by the computational engine. This often requires a detailed understanding of Perl and regular expressions. However, it is frequently possible to find an existing computational engine that structures its output in a similar manner and to work from that starting point. Sometimes, though, it is necessary to start from scratch.
Data Files
$engine.int
This configuration file, located in the $cgiBase/interfaces directory, holds all of the saved configuration options that WebMO uses to run the computational engine. The existence of this file is also how WebMO dynamically determines that an engine is supported; if this file is removed or its extension is changed, WebMO will not use the computational engine.
In particular, this file may store the directory of the engine, the path to the executable, the version number, etc. Each variable is stored in the format variable=value. For example,
nwchemVersion="5.0"
The value must be surrounded by quotation marks if it is a string (and can optionally be quoted for numeric values as well). The contents of this file are arbitrary, but should be restricted to engine-specific configuration variables. Although the variable names are also arbitrary, convention dictates that they be prefixed with the name of the computational engine, to prevent conflicts among engines.
Along with these arbitrary user-defined variables, the configuration file also defines a few 'required' variables that are used throughout WebMO.
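To make the variable=value format concrete, the following is a minimal Perl sketch of reading such lines into a hash. It is not WebMO's own loader: the helper name read_int_file is hypothetical, and it assumes one assignment per line, skipping any line without an '=' and stripping surrounding double quotes from the value.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WebMO): read "name=value" lines from an
# open filehandle into a hash. Surrounding double quotes are stripped; lines
# without an '=' (including blank lines) are skipped.
sub read_int_file {
    my ($fh) = @_;
    my %config;
    while (my $line = <$fh>) {
        chomp $line;
        my ($name, $value) = $line =~ /^\s*([^=]+?)\s*=\s*(.*?)\s*$/ or next;
        $value =~ s/^"(.*)"$/$1/;    # quoted string -> bare value
        $config{$name} = $value;
    }
    return \%config;
}

# Usage: parse an in-memory string instead of a real $engine.int file.
my $text = qq{nwchemVersion="5.0"\nnodesMin=1\n};
open(my $fh, '<', \$text) or die "cannot open string: $!";
my $config = read_int_file($fh);
print "$config->{nwchemVersion}\n";    # prints 5.0
```

Reading from a string filehandle keeps the example self-contained; in practice the handle would be opened on the engine's .int file.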
- interfacesDescription{'$engine'} - Defines the description of the engine as it appears on the 'Choose Computational Engine' page.
- interfacesDescription{'${engine}_admin'} - Defines the description of the engine as it appears in the administrative 'Interface Manager'.
- interfacesDescription{'${engine}_name'} - The "full" name of the engine, displayed in the Job Manager, etc. In contrast to the "short" name defined in the $engine variable, this name can be arbitrarily long, mixed case, etc.
- nodesMin, nodesMax, nodeDefault, ppnMin, ppnMax, ppnDefault - All of these variables should be set to 1 by default; they determine the minimum, maximum, and default numbers of nodes and processors per node that can be allocated to a job. Unless parallel jobs are supported, these values will always be 1. Even if parallel jobs are supported, these values should only be changed from 1 by the end user.
- nodeTypeDefault - This value should be set to "" (blank string) by default; it determines the default node type (in the PBS sense) to which a job is allocated. It is to be configured, if applicable, by the end user.
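As a starting point for a new engine, the following Perl sketch generates the 'required' entries listed above for a hypothetical engine named "myengine", using the variable=value format. The helper name is hypothetical, and the exact syntax WebMO expects should be checked against an existing .int file before relying on it.

```perl
use strict;
use warnings;

# Sketch: build the 'required' .int lines for a new engine. Variable names
# follow the list above; quoting and ordering are assumptions.
sub required_int_lines {
    my ($engine, $desc, $admin_desc, $full_name) = @_;
    my @lines = (
        sprintf(q{interfacesDescription{'%s'}="%s"},       $engine, $desc),
        sprintf(q{interfacesDescription{'%s_admin'}="%s"}, $engine, $admin_desc),
        sprintf(q{interfacesDescription{'%s_name'}="%s"},  $engine, $full_name),
    );
    # Serial defaults: one node and one processor per node.
    push @lines, "$_=1" for qw(nodesMin nodesMax nodeDefault ppnMin ppnMax ppnDefault);
    push @lines, q{nodeTypeDefault=""};    # blank: no default PBS node type
    return @lines;
}

print "$_\n" for required_int_lines('myengine', 'MyEngine', 'MyEngine (admin)', 'My Engine');
```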
$engine.tmpl
The job template file, located in the $cgiBase/interfaces directory, defines the various calculation types supported by this particular computational engine. The file is divided into sections, each corresponding to a particular calculation type supported by the engine. WebMO uses this file to build the Job Options page and to create the appropriate input file. The format of the file is as follows:
- The first line of each section of the file gives the job name, as it will appear in the $engine 'Job Options' calculation drop-down.
- The second and subsequent lines of each section give the contents of the corresponding input file for this type of calculation. This section can contain various conditional statements and variable substitutions, as explained below.
- Each section of the template file is then terminated with a line of equal signs ('=')
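The section layout above can be illustrated with a short Perl sketch that splits a template into (calculation name, input-file body) pairs. It assumes a separator line consists only of '=' characters; the subroutine name and the toy input lines are placeholders, not real syntax of any engine.

```perl
use strict;
use warnings;

# Sketch: split a job template into sections. The first line of each section
# is the calculation name; the remaining lines are the input-file template.
sub split_template {
    my ($text) = @_;
    my (@sections, @lines);
    for my $line (split /\n/, $text) {
        if ($line =~ /^=+\s*$/) {             # separator: end of current section
            if (@lines) {
                my $name = shift @lines;       # calculation name for the drop-down
                push @sections, { name => $name, body => join("\n", @lines) . "\n" };
            }
            @lines = ();
        } else {
            push @lines, $line;
        }
    }
    return @sections;
}

my $tmpl = join "\n",
    'Geometry Optimization', 'task optimize', '$geometry', '=====',
    'Molecular Energy',      'task energy',   '$geometry', '=====', '';
my @s = split_template($tmpl);
print "$_->{name}\n" for @s;
```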
The bulk of the template file consists of input files for each calculation type. Obviously, the exact contents of the input file vary from job to job (e.g., different geometry, charge, basis set, etc.). For this reason, the input file contains a variety of variables, each of which begins with a dollar sign. The presence of such a variable in the template triggers a variable expansion, replacing the variable name with its contents.
Many such variables are defined by WebMO, and a list of the standard ones can be found in the standard WebMO documentation. In general, each variable in the template corresponds to an HTML form variable of the same name, defined in the corresponding 'Job Options' page ($engine.html). Thus, you can add variables specific to your computational engine by defining a field on the HTML form of the job options page; a corresponding variable of the same name will automatically be made available in the job template.
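The idea of variable expansion can be shown with a few lines of Perl. This is an illustration only: WebMO itself performs the expansion through the Template Toolkit, and the field names charge and multiplicity here are hypothetical. Unknown variables are left untouched, which is an assumption of this sketch.

```perl
use strict;
use warnings;

# Illustration only: replace each $name with the value of the form field of
# the same name; variables with no matching field are left as-is.
sub expand_vars {
    my ($template, $vars) = @_;
    $template =~ s/\$(\w+)/exists $vars->{$1} ? $vars->{$1} : "\$$1"/ge;
    return $template;
}

my %form = (charge => 0, multiplicity => 1);    # hypothetical form fields
print expand_vars("charge \$charge mult \$multiplicity\n", \%form);
```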
The input file can also contain conditional expressions, which greatly enhance the power and flexibility of the template system. As of the current version of WebMO, the Perl 'Template Toolkit' package is used. This package is powerful and flexible, yet simple to use. The syntax for conditional statements is straightforward, and the reader is referred to any of the existing template files for a large number of examples. Complete documentation for the Template Toolkit is available online at:
http://template-toolkit.org/
HTML Files
$engine.html
This HTML file contains the source of the 'Job Options' page associated with the computational engine. It is strongly suggested that you simply copy / modify the source from an existing engine, both for reasons of consistency and expediency.
Change all occurrences of the engine name to reflect that of the new engine. Also change the list of available job options to reflect the capabilities of the engine. The VALUES of the options in the drop-down boxes may need to be altered to match the keywords used by that particular program.
In general, no other changes should be required.
$engine.js
This JavaScript file contains the JavaScript needed by the 'Job Options' page associated with the computational engine. It is strongly suggested that you simply copy / modify the JavaScript from an existing engine, both for reasons of consistency and expediency.
There are only a few elements which likely need to be modified:
- Change all occurrences of the engine name to reflect that of the new engine.
- In the 'SubmitJob' JavaScript subroutine, change the name of the geometry format to one compatible with your engine. For example, if your program reads Gaussian-like z-matrices, you could specify 'GaussianFormat', etc. The best format is determined by examining the existing computational engines to find one that matches the geometry format expected by the new engine. If an appropriate format cannot be found, you can either modify an existing format via Perl (not ideal) or contact WebMO to have a translator written for the format (as the Editor source code is not publicly available).
- If special 'dynamic' features are required (e.g., enabling a drop-down box only under certain circumstances), JavaScript event handlers may need to be added or changed.
In general, no other changes should be required.
${engine}mgr_admin.html
This HTML file contains the source of the 'Interface Manager' page associated with the computational engine. It is strongly suggested that you simply copy / modify the source from an existing engine, both for reasons of consistency and expediency.
There are only a few elements which likely need to be modified:
- Change all occurrences of the engine name to reflect that of the new engine.
- Change the list of available fields to reflect the entries in the engine configuration files ($engine.int).
- You may (optionally) implement a 'suggest' button to help with the configuration of the program (as done with Gaussian and several other engines). This is not recommended.
Script Files
The following Perl scripts define the code necessary for WebMO to interface with the new computational engine. Each file contains a variety of engine-specific subroutines that WebMO uses to run the engine's jobs and process the output. Make sure to follow the naming conventions EXACTLY, as WebMO dynamically loads the required modules BY NAME; deviations from the convention will result in errors.
$engine.cgi
As above, it is recommended to copy / modify an existing example, changing any appearances of the engine name (including in subroutine names!) to reflect the new engine. The following subroutines are required:
- process_${engine}_output - Defines a 'properties' array listing the properties that will be parsed for this job; each property corresponds to a parsing subroutine in the parse_${engine}.cgi file. It then calls the 'super-class' function process_engine_output with the following arguments: job number, engine name, output file suffix, and array of properties.
- submit_${engine}_job - Calls the 'super-class' function submit_engine_job with the following arguments: engine name, input file suffix, output file suffix.
- do_${engine}_presubmit - This subroutine is normally blank, but can be used to handle any items that need to occur immediately before the job is submitted for execution (for example, copying the checkpoint file from a previous job to the job directory).
- import_${engine}_job - Calls the 'super-class' function import_engine_job with the following arguments: job number, path to imported job file, engine name, output file suffix (the output file will be created from the given imported file).
- read_${engine}_form_data - Calls the 'super-class' function read_engine_form_data. By default, this reads EVERY field from the HTML form on the job options page into a variable of the same name. It is also possible to define NEW variables (i.e., variables that did not appear on the HTML form) at this point. To make those new variables available for use in the job templates, you must call add_sandboxed_var with the name of the variable and a reference to the desired contents of the variable.
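The required subroutines can be sketched as a skeleton for a hypothetical engine named "myengine". The 'super-class' functions are replaced here by recording stand-ins so the sketch runs outside WebMO; their argument conventions (for example, passing the properties array by reference, and the arguments each engine subroutine receives) are assumptions, as are the property names and the scratchDir variable.

```perl
use strict;
use warnings;

# --- Stand-ins for WebMO's 'super-class' functions (assumed signatures).
our @calls;
our %sandbox;
sub process_engine_output { push @calls, ['process', @_] }
sub submit_engine_job     { push @calls, ['submit',  @_] }
sub import_engine_job     { push @calls, ['import',  @_] }
sub read_engine_form_data { push @calls, ['read'] }
sub add_sandboxed_var     { my ($name, $ref) = @_; $sandbox{$name} = $ref }

# --- Hypothetical myengine.cgi subroutines.
sub process_myengine_output {
    my ($jobNumber) = @_;
    # Each name needs a matching parser in parse_myengine.cgi (names hypothetical).
    my @properties = qw(energy dipole_moment);
    process_engine_output($jobNumber, 'myengine', 'log', \@properties);
}

sub submit_myengine_job {
    submit_engine_job('myengine', 'inp', 'log');    # input.inp / output.log
}

sub do_myengine_presubmit {
    # Normally empty; e.g. copy a checkpoint file into the job directory here.
}

sub import_myengine_job {
    my ($jobNumber, $importPath) = @_;
    import_engine_job($jobNumber, $importPath, 'myengine', 'log');
}

sub read_myengine_form_data {
    read_engine_form_data();
    # A NEW template variable that is not a field on the HTML form.
    my $scratchDir = '/tmp';
    add_sandboxed_var('scratchDir', \$scratchDir);
}
```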
run_${engine}.cgi
As above, it is recommended to copy / modify an existing example, changing any appearances of the engine name (including in subroutine names!) to reflect the new engine.
The most important point to mention here is that environment variables can be set by modifying the Perl hash %ENV. This is often necessary when setting up the environment in which the engine executes.
Engines are run by first fork()ing a new process, writing the PID of the child process (which eventually will contain the engine) to a file, and then exec()ing the computational engine. This process allows WebMO to obtain the PID of the computational engine process so that it can be monitored by the WebMO daemon.
When the engine is executed, it is vital to direct the program to read from the appropriate input file and write to the appropriate output file. By convention, the input file is 'input.inp' (the extension can be modified, as determined in $engine.cgi) and the output file is 'output.log'; both are located in the $userBase directory, in the tree corresponding to the current user. These file locations are typically passed to the computational program on the command line, but conventions vary.
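The fork / PID file / exec sequence can be sketched in Perl as follows. This is a standalone illustration, not WebMO's run_$engine.cgi: the subroutine name, option names, and PID-file location are hypothetical, and it assumes the engine takes the input path as a command-line argument and writes to standard output, which is redirected to the output file.

```perl
use strict;
use warnings;
use IO::Handle ();
use POSIX ();

# Sketch of the fork / PID-file / exec pattern. Option names are hypothetical.
sub launch_engine {
    my (%opt) = @_;    # exe, input, output, pidfile, env (hash ref)
    STDOUT->flush;     # avoid duplicating buffered output in the child
    STDERR->flush;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: exec() keeps the PID, so $$ is also the engine's PID.
        open(my $pf, '>', $opt{pidfile}) or POSIX::_exit(125);
        print $pf "$$\n";
        close($pf);
        # Engine-specific environment variables, set through %ENV.
        my %env = %{ $opt{env} || {} };
        @ENV{ keys %env } = values %env;
        # Send the engine's standard output and error to the output file.
        open(STDOUT, '>', $opt{output}) or POSIX::_exit(126);
        open(STDERR, '>&', \*STDOUT)    or POSIX::_exit(126);
        # Replace the child with the engine; exec() returns only on failure.
        exec { $opt{exe} } $opt{exe}, $opt{input} or POSIX::_exit(127);
    }
    return $pid;    # parent: this PID can now be monitored
}
```

POSIX::_exit is used in the child so that a failed exec() does not run the parent's END blocks or destructors. A caller would typically record or monitor the returned PID, or call waitpid on it.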
${engine}mgr_admin.cgi
As above, it is recommended to copy / modify an existing example, changing any appearances of the engine name (including in subroutine names!) to reflect the new engine.
Beyond updating appearances of the engine name, almost no changes should be required. The default minimalist implementation simply updates the variables in the engine configuration file to reflect the changes made through the 'Interface Manager' web-based configuration tool.
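The update step can be pictured with a small Perl sketch that applies new values to the text of an .int file, rewriting matching lines and appending variables not yet present. It is a hypothetical helper, not the code in ${engine}mgr_admin.cgi; it quotes every value and does not escape embedded quotes, both of which are assumptions.

```perl
use strict;
use warnings;

# Sketch: apply new values (e.g. from the Interface Manager form) to the text
# of an .int file, leaving unrelated lines untouched.
sub update_int_text {
    my ($text, $updates) = @_;
    my %pending = %$updates;
    my @out;
    for my $line (split /\n/, $text) {
        if ($line =~ /^\s*([^=]+?)\s*=/ && exists $pending{$1}) {
            my $name = $1;
            $line = qq{$name="} . delete($pending{$name}) . q{"};
        }
        push @out, $line;
    }
    # Variables that did not exist yet are appended in sorted order.
    push @out, qq{$_="$pending{$_}"} for sort keys %pending;
    return join("\n", @out) . "\n";
}

print update_int_text(qq{myengineVersion="1.0"\nnodesMax=1\n},
                      { myengineVersion => '2.0', myengineBase => '/opt/myengine' });
```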
parse_${engine}.cgi
As above, it is recommended to copy / modify an existing example. However, unlike most of the above cases, dramatic changes are likely to be required for each computational engine.
This file contains a sequence of subroutines, which are called in turn when parsing the text output created by the computational engine. In general, these subroutines write the value of the parsed property, in a well-defined format, to a file handle that is passed to the subroutine. This information is later used on the 'View Job' page to visualize the results.
For each property defined in the 'properties' array of the process_${engine}_output subroutine, a corresponding process_${engine}_$property subroutine is called. This subroutine is responsible for determining whether the corresponding property exists in the output file and, if so, parsing the results to a well-defined format. With a few exceptions, the subroutine is passed the handle of the properties file (to which the parsed results are to be written) and an array of the output file contents (one line per array entry).
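A property subroutine typically finds the last occurrence of a property, parses it, and writes the result to the properties handle. The Perl sketch below assumes a hypothetical engine "myengine" whose output contains lines like "Total energy = -76.0268", and assumes the output lines arrive as an array reference. The line it writes is a placeholder; the real format should be copied from an existing parse_*.cgi file.

```perl
use strict;
use warnings;

# Sketch of a property parser for a hypothetical engine. Arguments are
# assumed to be (properties filehandle, reference to output lines).
sub process_myengine_energy {
    my ($propFH, $lines) = @_;
    # WebMO convention: use the LAST occurrence in the output.
    for (my $i = $#$lines; $i >= 0; $i--) {
        if ($lines->[$i] =~ /Total energy\s*=\s*(-?\d+\.\d+)/) {
            print $propFH "energy $1\n";    # placeholder output format
            return 1;                        # found (return value: sketch only)
        }
    }
    return 0;    # property not present in this output
}
```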
In addition to the user-specified properties, WebMO requires some special 'properties' to be parsed for each job. Thus, the following subroutines MUST exist:
- parse_${engine}_version - Parses the output file to determine whether it is valid (i.e., it came from the specified computational engine) and which VERSION of the program generated it. Returns "" (blank string) for an invalid file; otherwise returns the program version.
- parse_${engine}_normal_termination - Returns true (non-zero) if the output file reflects the successful termination of the job, and false (zero) for an error condition. WebMO uses this information to mark the job as successful or failed in the Job Manager.
- parse_${engine}_failure_code - Parses (if possible) and returns the CAUSE of the job failure. A minimalist implementation can simply return $FailureCodes::UnknownFailure. This information is shown in the mouse-over of the 'Failure' message in the Job Manager, as well as on the 'View Job' page for failed jobs.
- parse_${engine}_cpu_time - Parses and returns the cumulative CPU time (in seconds) for the job. In the case of a multi-step job, the last reported CPU time may or may not be cumulative, depending on how the engine reports it. In the latter case, you must iterate through the file to locate all of the reported CPU times and add them up.
- parse_${engine}_geometry - This subroutine parses and stores (to disk) the last occurrence of the full molecular geometry stored in the output file. It is assumed the final geometry corresponds to the final or optimized structure. This is the geometry that is displayed in the 'View Job' page and for restarting future calculations. THIS SUBROUTINE IS CALLED EVEN FOR FAILED JOBS, i.e. to extract the last geometry for a failed optimization so that the optimization can be restarted at the last geometry.
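Three of the required subroutines can be sketched for a hypothetical engine "myengine" whose output contains a banner "MyEngine version 3.1", per-step lines "CPU time: 12.5 s" (assumed NOT cumulative), and a final "Normal termination" line. The output text and the convention of passing a reference to the output lines are assumptions; failure codes and geometry parsing are omitted because their formats are engine- and WebMO-specific.

```perl
use strict;
use warnings;

# Sketches of required parsers; each takes a reference to the output lines.
sub parse_myengine_version {
    my ($lines) = @_;
    for my $line (@$lines) {
        return $1 if $line =~ /^\s*MyEngine version\s+(\S+)/;
    }
    return "";    # blank string: not a MyEngine output file
}

sub parse_myengine_normal_termination {
    my ($lines) = @_;
    return scalar(grep { /Normal termination/ } @$lines) ? 1 : 0;
}

sub parse_myengine_cpu_time {
    my ($lines) = @_;
    # Per-step times are not cumulative here, so add up every reported value.
    my $total = 0;
    for my $line (@$lines) {
        $total += $1 if $line =~ /CPU time:\s*([\d.]+)\s*s/;
    }
    return $total;
}
```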
The parsing process is aided by a variety of pre-defined parsing functions, which can be used to search through the file to locate various strings, etc. These functions are defined in parse_output.cgi, and that file is normally require()d at the top of the parse_${engine}.cgi file. Of particular interest is
search_from_beginning($regexp, \@logfileText)
which can be used to search for the first match to the given regular expression, starting from the first line in the output file (i.e., the first element of the array). When passing a constant string, use ' delimiters around the string rather than " to avoid having to escape the common \ in your regexp. Also,
search_from_end($regexp, \@logfileText)
does the same thing, but searches backward starting from the END of the file and returns the first match found. By convention, WebMO usually parses the LAST occurrence of a property from the file.
In both cases, the functions return the array index (i.e. line number - 1, since the array is zero based) corresponding to the first match, OR -1 IF NO MATCH WAS FOUND.
Also potentially useful are
search_forward($regexp, $start, \@logfileText)
search_backward($regexp, $start, \@logfileText)
which accomplish the same thing, but do not start the search until the specified starting index. This can be used to iterate a search until the last occurrence is found, after which the function returns -1.
Beyond this advice, the parsing of properties is difficult to generalize. Progress can be made by modifying the parsing of the corresponding property from a different computational engine in which the property (e.g., a table of normal modes) is formatted in a similar manner. Other times one must start from scratch. No attempt is made here to fully document the format of each parsed property (since MANY properties are parsed by WebMO), but the format can easily be deduced from the existing examples.
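To experiment with these search semantics outside WebMO, the following Perl stand-ins mirror the behaviour described above. They are NOT the code in parse_output.cgi, which should be require()d and used inside a real parse_${engine}.cgi. Whether the starting index itself is examined is an assumption (inclusive here), so the usage loop restarts one line past each hit to collect every match.

```perl
use strict;
use warnings;

# Stand-ins mirroring the described behaviour: each returns a zero-based
# index into the output lines, or -1 if nothing matches.
sub search_forward {
    my ($regexp, $start, $text) = @_;
    for my $i ($start .. $#$text) {
        return $i if $text->[$i] =~ /$regexp/;
    }
    return -1;
}

sub search_backward {
    my ($regexp, $start, $text) = @_;
    for (my $i = $start; $i >= 0; $i--) {
        return $i if $text->[$i] =~ /$regexp/;
    }
    return -1;
}

sub search_from_beginning { my ($re, $t) = @_; return search_forward($re, 0, $t) }
sub search_from_end       { my ($re, $t) = @_; return search_backward($re, $#$t, $t) }

# Usage: collect every match by restarting one line past the previous hit.
my @logfileText = ('SCF energy = -1.0', 'geometry step', 'SCF energy = -1.5');
my @hits;
for (my $i = search_from_beginning('SCF energy', \@logfileText);
     $i != -1;
     $i = search_forward('SCF energy', $i + 1, \@logfileText)) {
    push @hits, $i;
}
print "@hits\n";    # prints 0 2
```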