2          The Virtual Data Language

Revision $Revision: 1.4 $ of file $RCSfile: VDSUG_VDLReference.xml,v $.

The Virtual Data Language (VDL) comes in two flavors, textual and XML. This document deals with the textual version of VDL, also known as VDLt.

The language syntax is shown using railroad diagrams. In these diagrams, rectangular boxes show terminals of the language, that is, tokens that must appear exactly as shown in the box. Rounded elements show non-terminals of the language, which are placeholders for another rule.

2.1         Definitions

Each Definition is either a Transformation or a Derivation. These two elements may be repeated any number of times, in any order. The term Definition applies when talking about either element.

Figure 1: Definition, Transformation and Derivation.

2.2         Transformations

A transformation starts with the keyword TR followed by a fully qualified definition identifier (fqdi). This identifier comes in various shapes, and is explained in the next section.

The transformation identifier is followed by an open parenthesis, a list of formal arguments (farg-list), and a closing parenthesis. This list of formal arguments is optional. The parentheses are mandatory. These elements constitute the header of a transformation.

The body of a transformation consists of an opening curly brace, an optional list of transformation body elements, and a closing curly brace. The transformation body itself may be empty; however, the curly braces are mandatory.

Transformations come in two flavors, simple or compound, and the body elements determine the flavor. If there is a call statement inside the body, the transformation is a compound transformation (tr-comp-body). The presence of an argument statement signals a simple transformation (tr-simple-body). While the profile body element may appear in both flavors of transformations, the argument and call elements are mutually exclusive. The absence of both implies a simple transformation.

Figure 2: Make-up of a Transformation.

Note: Future versions may permit an empty transformation body by specifying a semicolon instead of an empty set of curly braces. The gray dashed line denotes this fact.

2.3         The Fully-qualified Definition Identifier

The fqdi uniquely identifies a definition. It consists of three parts:

1.       a namespace,

2.       a name (sometimes called identifier) within this namespace, and

3.       a version number.

The namespace and version portions of the fully qualified definition identifier are optional. If a namespace is present, two colons[1] separate the namespace from the name. If a version is present, a single colon separates it from the name. There must not be any whitespace between the different parts of a fully qualified definition identifier, nor around the colons.

 

Figure 3: A fully qualified definition identifier.

It is highly recommended to restrict the namespace and name portions to characters that are also permissible in C identifiers. The following set of regular expressions shows the characters currently permissible in a TR or DV identifier:

# currently permissible characters

namespace   :== [a-zA-Z_./-][a-zA-Z0-9_./-]*

name        :== [a-zA-Z_./-][a-zA-Z0-9_./-]*

version     :== [0-9][.0-9]*

The following set of regular expressions shows the recommended characters in a TR or DV identifier:

# recommended characters to actually use

namespace   :== [a-zA-Z_][a-zA-Z0-9_]*

name        :== [a-zA-Z_][a-zA-Z0-9_]*

version     :== [0-9]+

Please also avoid periods, hyphens and slashes in identifiers. The period or the hyphen may be used in future releases to partition a namespace hierarchy, and would then become a forbidden character. Hyphens in identifiers are also problematic for the scanner, and are thus slated for removal in future releases.

/some/name/space::mydv   # namespace::name

some.other.scheme::tr2   # namespace::name

mimi:12                  # name:version

dv2                      # just name
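To make the identifier rules concrete, here is a minimal Python sketch that splits an fqdi into its three parts using the "currently permissible" regular expressions, and separately checks it against the "recommended" ones. The names parse_fqdi and is_recommended are hypothetical; this is an illustration, not the VDS scanner.

```python
import re

# Hypothetical helper (not VDS code): [namespace::]name[:version].
# Character classes mirror the "currently permissible" expressions above.
PERMISSIBLE = re.compile(
    r"^(?:(?P<ns>[a-zA-Z_./-][a-zA-Z0-9_./-]*)::)?"
    r"(?P<name>[a-zA-Z_./-][a-zA-Z0-9_./-]*)"
    r"(?::(?P<version>[0-9][.0-9]*))?$"
)
# The "recommended" character sets, used only for a style check.
RECOMMENDED_ID = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]*$")
RECOMMENDED_VERSION = re.compile(r"^[0-9]+$")


def parse_fqdi(text):
    """Return (namespace, name, version); absent parts are None."""
    m = PERMISSIBLE.match(text)
    if m is None:
        # Also rejects any whitespace, which the fqdi forbids.
        raise ValueError(f"not a valid fqdi: {text!r}")
    return m.group("ns"), m.group("name"), m.group("version")


def is_recommended(text):
    """True if every present part uses only the recommended characters."""
    ns, name, version = parse_fqdi(text)
    return (
        (ns is None or RECOMMENDED_ID.match(ns) is not None)
        and RECOMMENDED_ID.match(name) is not None
        and (version is None or RECOMMENDED_VERSION.match(version) is not None)
    )


for example in ["/some/name/space::mydv", "some.other.scheme::tr2", "mimi:12", "dv2"]:
    print(example, parse_fqdi(example), is_recommended(example))
```

Note that the namespace is matched greedily up to the first "::", which works because colons are not permissible inside a namespace or name.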

2.3.1        The Namespace

The namespace must start with an alphabetical character, a slash, or an underscore. Alphanumerical characters, slashes, underscores, periods and hyphens may follow it. Slashes and periods allow users to create a self-imposed hierarchy of namespaces; there is no explicit support for namespace hierarchies yet.

Warning: Only such characters as are also permissible in C language identifiers should be used. The use of hyphens is strongly discouraged! Periods and slashes are discouraged unless used to denote a namespace hierarchy.

It is recommended to always use a namespace with any fully qualified definition identifier.

2.3.2        The Name

The name, sometimes also called identifier, must start with an alphabetical character. Alphanumerical characters, underscores or hyphens may follow it. The name is the minimal and mandatory member of any fully qualified definition identifier.

Warning: The use of periods, slashes and hyphens in the name is strongly discouraged!

2.3.3        The Version

The version must start with a numerical digit. Further digits or periods may follow it. Unfortunately, there are no user callbacks to compare versions correctly: versions are currently compared using string comparison. Version comparison becomes important when mapping a derivation to its transformation during the abstract planning process.
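The following Python snippet, an illustration rather than VDS code, shows why string comparison is a pitfall: unpadded integer versions sort differently as strings than as numbers, while fixed-width values sort the same either way.

```python
# Illustration (not VDS code): version strings compared character by
# character versus compared as integers.
versions = ["9", "10", "2"]

as_strings = sorted(versions)            # lexicographic order
as_numbers = sorted(versions, key=int)   # numeric order

print(as_strings)  # ['10', '2', '9']
print(as_numbers)  # ['2', '9', '10']

# Fixed-width values, such as the timestamps suggested below, sort the
# same way under both orderings.
stamps = ["20050828112233", "20041231235959"]
print(sorted(stamps) == sorted(stamps, key=int))  # True
```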

It is recommended to use only non-negative integers for the version number. Future releases are slated to make the version a true natural number. There are a couple of useful practices for the version number:

·         Monotonically increasing integers from 0 or 1 upwards.

·         A Unix timestamp (seconds since the epoch, UTC), e.g. 1112031789.

·         An ISO 8601 style timestamp, e.g. 20050828112233.

2.4         The Formal Argument List

The formal argument list is optional. If formal arguments exist, each formal argument consists of an optional type followed by the identifier of the formal argument (varname). The type may be specified in a long or an alternative short form. Permissible values for the type field are:

long    short   meaning
------  ------  -------------------------------------------------------
none    -       A symbolic argument, either a string or a number, that is passed verbatim. Omitting the type also denotes a symbolic argument.
input   in      Filenames that are consumed, at least logically.
output  out     Filenames that are produced.
inout   io      Reserved for compound transformations to denote temporary glue filenames. Glue filenames are produced by one call in a compound transformation and consumed by another call in the same compound. The inout type is only permissible for argument declarations, not for LFNs themselves.

The formal argument identifiers follow the syntax of C identifiers. Formal arguments are scoped to the transformation that declares them. If an identifier is associated with a list value, it must be followed by the open bracket and close bracket tokens.

Any formal argument may be assigned a default value. Unlike in C++, default arguments are not limited to the trailing formal arguments; they may occur wherever needed.

Figure 4: List of Formal Arguments

The rules for default arguments are similar to those for actual arguments in a derivation. If a simple scalar formal argument is of type none, the default argument must be a text element. If the scalar formal argument has any other type, the default argument must be an LFN. Similar rules apply to list formal arguments: the list of default arguments is enclosed in brackets, and each element within the list must be of the correct kind. The list may be empty, denoted by a closing bracket immediately following an opening bracket.

Commas separate multiple formal arguments in the formal argument list. The following example just illustrates various ways to specify formal arguments, omitting the transformation body:

TR t1:1( foo, out bar ) {..}

TR t1:2( foo = "2.0", out bar ) {..}

TR t1:3( foo="2.0", out bar=@{output:"f1"} ) {..}

TR t1:4( foo, out bar = @{output:"f1"} ) {..}

The previous examples all illustrate passing a none and an output argument. At the time of writing, the order of the arguments is of no concern. In the first case, a caller must supply values for both arguments, foo and bar: if no default exists, the derivation's actual arguments must always provide a binding.

In the second case, a caller may omit the binding of text for foo, in which case the supplied default value is taken. To override the default value, the caller supplies a value (binding) for foo. In the third case, both arguments have defaults, and in the fourth case only bar has one.
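The binding rules above can be modelled in a few lines of Python. This is a hypothetical sketch, not the VDS implementation: formal arguments are a dict from name to default value, with None standing for "no default", and actual values are kept as plain strings.

```python
# Hypothetical model (not VDS code) of binding a derivation's actual
# arguments to a transformation's formal arguments. Each formal argument
# name maps to its default value, or to None when it has no default.
def bind(formals, actuals):
    bound = {}
    for name, default in formals.items():
        if name in actuals:
            bound[name] = actuals[name]   # the caller's binding overrides
        elif default is not None:
            bound[name] = default         # otherwise the default is taken
        else:
            raise ValueError(f"no binding for formal argument {name!r}")
    return bound


# Formal arguments of t1:2 above: foo has a default, bar does not.
t1_2 = {"foo": '"2.0"', "bar": None}
print(bind(t1_2, {"bar": '@{output:"f2"}'}))
```

With t1:1, where neither argument has a default, omitting either binding raises an error in this model, matching the first case described above.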

2.5         The Simple Transformation Body

The body of a simple transformation consists of any number (including zero) of argument and profile statements. Each statement must be terminated by a semicolon, Pascal style.

Figure 5: Body of a Simple Transformation.

An identifier may follow the argument keyword; however, it is optional in almost all cases. Since the identifier currently serves no purpose, it is best left out. The equals character is mandatory and is followed by one or more tr-leaf non-terminals. Permissible tr-leaf elements are text constants and references to formal argument identifiers (use), which are defined in a later section.

The profile keyword is followed by the identifier of a profile namespace. A period or a double colon separates the namespace of the profile from the key within the namespace. The mandatory equals character introduces one or more tr-leaf non-terminals. Permissible are text constants and references to formal argument identifiers, which are defined in a later section.

TR t1() { }     # simplest

TR t2( in f1 ) {

  argument = "-i " f1;

}

TR t3( in f1, out f2 ) {

  argument = "-i " f1;    # short

  argument = "-o " ${f2}; # or longer

  profile env.HOME = "/home/snej";

}
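The right-hand side of an argument statement such as argument = "-i " f1; is a concatenation of tr-leaf elements. The Python sketch below models that concatenation under stated assumptions: the leaf representation and the bindings dict are hypothetical, and it assumes (as the trailing space in "-i " suggests) that no separator is inserted between leaves.

```python
# Hypothetical model (not VDS code) of the right-hand side of an argument
# or profile statement: a concatenation of tr-leaf elements. A leaf is
# either ("text", s) for a quoted string or ("use", name) for a reference
# to a formal argument; bindings maps each argument name to the string it
# is assumed to render as.
def render_statement(leaves, bindings):
    parts = []
    for kind, value in leaves:
        if kind == "text":
            parts.append(value)             # verbatim text
        elif kind == "use":
            parts.append(bindings[value])   # value bound to the argument
        else:
            raise ValueError(f"unknown tr-leaf kind: {kind!r}")
    return "".join(parts)                   # assumed: no implicit separator


# argument = "-i " f1;   with f1 hypothetically bound to data.in
print(render_statement([("text", "-i "), ("use", "f1")], {"f1": "data.in"}))
```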

2.6         The Compound Transformation Body

A compound transformation consists of at least one call statement and any number of profile statements. Each statement must be terminated by a semicolon, Pascal style.

Figure 6: Body of a Compound Transformation.

The call statement is effectively an anonymous derivation: it behaves like a derivation, with some minor differences. A call has no fully qualified definition identifier of its own. Since a call applies to a transformation, it offers soft binding to matching transformations through the tr-map non-terminal described below.

Mandatory parentheses follow the call reserved word. The list of call actual arguments (carg-list) is optional, and commas separate multiple entries. The carg-list non-terminal permits either string constants (text) or references to the enclosing transformation's formal argument identifiers (use).

The profile keyword is followed by the identifier of a profile namespace. A period or a double colon separates the namespace of the profile from the key within the namespace. The mandatory equals character introduces one or more tr-leaf non-terminals. Permissible are text constants and references to formal argument identifiers, which are defined in a later section.

TR t4( in f1, io f2, out f3 ) { # compound

  call t3( f1=${f1}, f2=${out:f2} );

  call t3( f1=${in:f2}, f2=${f3} );

}
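In t4, the glue file f2 is produced by the first call (cast to out) and consumed by the second (cast to in). The following Python sketch checks that pattern. It is a hypothetical illustration: each call is reduced to a dict of the enclosing transformation's argument names and the direction in which the call uses them, and the rule checked (every io argument has at least one producer and one consumer) is an assumption drawn from the description of the inout type.

```python
# Hypothetical check (not VDS code) for glue files in a compound
# transformation. Each call is modelled as a dict from the enclosing
# transformation's argument names to the direction ("in" or "out") in
# which that call uses them after any type cast.
def check_glue(io_args, calls):
    for arg in io_args:
        produced = any(call.get(arg) == "out" for call in calls)
        consumed = any(call.get(arg) == "in" for call in calls)
        if not produced:
            raise ValueError(f"glue file {arg!r} is never produced")
        if not consumed:
            raise ValueError(f"glue file {arg!r} is never consumed")


# t4 above: the first call produces f2, the second call consumes it.
t4_calls = [{"f1": "in", "f2": "out"}, {"f2": "in", "f3": "out"}]
check_glue(["f2"], t4_calls)  # passes silently
```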

2.6.1        The Compound Argument List

The actual argument binding inside a call statement only allows elements that are permissible inside a transformation: Either verbatim strings (text) or bound formal argument references (use).

Figure 7: Argument list of Call Statements.

Each argument inside a call names the formal argument varname in the called transformation; this process is called binding. The equals character is followed by either a list or a scalar. If the called formal argument is a scalar, only the lower path may be taken. If the formal argument in the called transformation is a list, either the elements are enumerated between brackets, or a bound variable reference from the enclosing transformation may be used to pass the list.

2.7         Permissible Elements Inside a TR

The tr-leaf specifies the permissible arguments inside any transformation. The tr-leaf element permits either verbatim text or references to bound variables (use). Wherever tr-leaf appears in the railroad diagrams, note whether a single tr-leaf element, a concatenation of tr-leaf elements, or a comma-separated list of simple tr-leaf elements is permitted.

Figure 8: The tr-leaf non-terminal.

2.7.1        The Text Element

The text element is a C-style string in double quotes. A backslash escapes both the double quote and the backslash itself.

Figure 9: The text element.

""

"some text"

"\"quoted\" quote"

"\\\\W2K\\C:\\WINNT"

The first example shows an empty string. The next example shows a string with some content. The 3rd example shows how to escape quotes inside a string. The final example shows how to escape backslashes inside a string.
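As a sketch of these escaping rules, the Python function below decodes a quoted text element. It is hypothetical and deliberately narrow: it handles only the two escapes described above (\" and \\) and rejects any other backslash sequence, since this section does not define one.

```python
# Hypothetical decoder (not VDS code) for a VDL text element. Only the two
# escapes described above are handled: \" and \\. Any other backslash
# sequence is rejected, since its meaning is not defined here.
def unquote_text(token):
    if len(token) < 2 or token[0] != '"' or token[-1] != '"':
        raise ValueError("text element must be enclosed in double quotes")
    body = token[1:-1]
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            if i + 1 >= len(body) or body[i + 1] not in '"\\':
                raise ValueError(f"unsupported escape at position {i}")
            out.append(body[i + 1])   # keep the escaped character
            i += 2
        elif ch == '"':
            raise ValueError("unescaped quote inside text element")
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(unquote_text(r'"\\\\W2K\\C:\\WINNT"'))
```

Applied to the four examples above, it yields an empty string, some text, "quoted" quote, and \\W2K\C:\WINNT respectively.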

2.7.2        The Use Element

The use element is a reference to a formal argument. The typical old-style use element starts with the dollar sign. Its argument, the identifier of the formal argument, is set in curly braces. A simplified version uses just the bound variable identifier, as commonly used in procedural programming languages.

Figure 10: The use element.

The simplified reference to formal argument identifiers (also known as bound variables) is shown in the top path. If the bound variable is used with the same input/output type as declared in the formal argument type, only the identifier needs to be specified. If type casting is required, C-style type casts can be used for simple casts. A C-style type cast puts the target type in parentheses in front of the identifier.

While list type variables may be used with the simple notation, any special rendering requires the old-style notation.

The optional rendering, marked green, changes the default appearance of a list value in command-line arguments and profile components. If a rendering is present, the vertical bar character separates it from the other parts of the use element. A rendering consists of either one or three text elements, each of which is a quoted string.

If your rendering only needs to specify the separator string between list elements, the 1-string case applies. In this case, the string is the only rendering element.

If you also need to specify a prefix placed before the list or a suffix placed after it, use the 3-string case, in which colons separate the three strings. The contents of the first string are placed before the first list element, the contents of the second string between adjacent list elements, and the contents of the third string after the last list element.

The type is described in the formal argument list. If a type is used, it must be followed by a colon character. A type cast to either input or output is usually necessary if the bound variable's type is inout.

The final and mandatory part is the name of the formal argument identifier (varname) that is referenced.

simple

${simple}

(out) more

${out:more}

 

${"-"|list1}

${"-"|out:list2}

${" [ ":", ":" ] "|list3}

The first case shows the simplest reference to a bound variable by just using its name. The next line shows an equivalent form, using old-style dollar-brace notation. The 3rd line shows a simple type-cast variable reference, and the 4th line shows the equivalent old-style form.

The first list case renders the list items separated by just a hyphen, with no spaces. The second list case also casts the type of the list. The final list case renders the list between brackets, separating the elements with a comma and a space. If list3 contained the three elements a, b, and c, the output would look like this:

 [ a, b, c ]
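The rendering rules can be summarized in a small Python sketch. The function render_list is hypothetical, not part of VDS; for the 1-string case it assumes an empty prefix and suffix, which the text above implies but does not state outright.

```python
# Hypothetical sketch (not VDS code) of list rendering in a use element
# such as ${" [ ":", ":" ] "|list3}. A 1-string rendering supplies only
# the separator; this sketch assumes an empty prefix and suffix then.
def render_list(items, rendering):
    if len(rendering) == 1:
        prefix, sep, suffix = "", rendering[0], ""
    elif len(rendering) == 3:
        prefix, sep, suffix = rendering
    else:
        raise ValueError("a rendering has either one or three strings")
    return prefix + sep.join(items) + suffix


print(render_list(["a", "b", "c"], ["-"]))               # a-b-c
print(render_list(["a", "b", "c"], [" [ ", ", ", " ] "]))  # " [ a, b, c ] "
```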

2.8         Derivations

A derivation starts with the reserved keyword DV followed by a fully qualified definition identifier (see above). The identifier uniquely describes the derivation.

The arrow operator, which consists of a hyphen immediately followed by a greater-than character (->), separates the derivation identifier from the transformation mapping (tr-map). The transformation mapping allows a flexible binding of the given derivation to a set of matching transformations by their versions. However, the abstract planner does not yet support this mapping in depth.

Mandatory parentheses enclose the actual argument list (aarg-list), which is itself optional. It binds formal arguments to actual values. A derivation specification is terminated by a semicolon.

Figure 11: The Derivation.

2.9         The Transformation mapping tr-map

The tr-map non-terminal maps the current derivation or call onto a set of matching transformations. It is very similar to a fully qualified definition identifier, but differs in the version specification. As with the fully qualified definition identifier, whitespace anywhere within the tr-map non-terminal is forbidden.

Figure 12: Mapping to a Transformation.

The namespace and name rules are the same as for the fqdi. The version left of the comma is an inclusive minimum version, and the version to the right of the comma is an inclusive maximum version. Both follow the same syntax as the fqdi version.

The syntax diagram looks more complicated than the matter actually is. There are essentially four cases to distinguish:

1.       If a colon does not follow the name, there is no version specification, and no version matching will be done. In this case, all versions match.

2.       If the name is followed by a colon, one of the following mappings must be applied:

a.       The regular case specifies two version numbers: a minimum inclusive version, followed by a comma, followed by a maximum inclusive version. An example is "name:10,20".

b.       The case with an open maximum version only specifies the minimum version followed by a comma, e.g. "name:10,".

3.       The case with an open minimum version only specifies a comma followed by the maximum permissible version, e.g. "name:,20".

name          #  1: unrestricted versioning

name:10,20    # 2a: version range requirement

name:10,      # 2b: minimum version requirement

name:,20      #  3: maximum version requirement
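The four cases can be captured by a small matcher. The Python sketch below is hypothetical, not the abstract planner's code; it compares versions as strings, mirroring the string comparison described for the version field, so that the pitfall stays visible.

```python
import re

# Hypothetical tr-map matcher (not VDS code): [namespace::]name[:[min],[max]]
# with both bounds inclusive. Versions are compared as strings, mirroring
# the string comparison described for the current implementation.
TR_MAP = re.compile(
    r"^(?:(?P<ns>[^:\s]+)::)?(?P<name>[^:,\s]+)"
    r"(?::(?P<lo>[0-9][.0-9]*)?,(?P<hi>[0-9][.0-9]*)?)?$"
)


def tr_map_matches(tr_map, namespace, name, version):
    m = TR_MAP.match(tr_map)
    if m is None:
        raise ValueError(f"not a valid tr-map: {tr_map!r}")
    if m.group("ns") is not None and m.group("ns") != namespace:
        return False
    if m.group("name") != name:
        return False
    lo, hi = m.group("lo"), m.group("hi")
    if lo is not None and version < lo:     # inclusive minimum
        return False
    if hi is not None and version > hi:     # inclusive maximum
        return False
    return True                             # case 1: no version restriction


print(tr_map_matches("name:10,20", None, "name", "15"))
print(tr_map_matches("name:10,20", None, "name", "100"))
```

The second call also matches, because "100" lies between "10" and "20" in string order. This is one more reason to follow the version recommendations given earlier.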

2.10     The Actual Argument List

The list of actual arguments in a derivation binds values to the formal arguments of a transformation. The syntax is similar to the argument list of a call; the difference is in the permissible elements. Being inside a derivation, only quoted strings (text) and logical filenames (LFN) may be passed.

Figure 13: Actual Arguments in Derivations.

Each actual argument names the formal argument in the called transformation to bind to (varname). Either a list or a scalar follows the mandatory equals character. If the bound formal argument is a scalar, only the lower path may be taken. If the formal argument in the called transformation is a list, the elements are enumerated between brackets.

2.10.1    Permissible Elements Inside a DV

The dv-leaf specifies the permissible arguments inside a derivation. Inside any derivation, only verbatim textual elements (text) and LFNs are permitted. Wherever dv-leaf appears in the railroad diagrams, note whether a single dv-leaf element, a concatenation of dv-leaf elements, or a comma-separated list of dv-leaf elements is permitted.

Figure 14: Elements permitted in dv-leaf.

2.10.2    The Text Element

The text element is a C-style string in double quotes. A backslash escapes both the double quote and the backslash itself. Please refer to section 2.7.1 for details.

2.10.3    The LFN Element

The LFN element describes a logical filename. Any LFN is introduced by the "at" character, with further arguments in curly braces.

Figure 15: The LFN element.

The mandatory type describes a logical file as either input, output, or inout. The latter is usually reserved for transient files. A colon separates the type from the logical filename itself. Note that the filename is a text element and must therefore be enclosed in quotes.

For transient filenames, a pattern to construct temporary filenames can be specified: a colon separates the pattern string, again a text element, from the filename. The pattern is optional. If a pattern is present, the filename is deemed transient. Transient files are neither transferred nor cataloged, and they do not participate in the staging process, except for inter-pool transfers as necessary. If the pattern is absent, the file is assumed to be a fully tracked, transferable and registrable file. However, this behavior can be modified as shown below.

@{in:"lfn1"}

@{io:"lfn2":"tmp-XXXXXX"}

The first file specification shows a typical input file. The second specifies a transient file, which will be neither registered nor transferred to the final staging area. However, it may be transferred between sites if consuming jobs require it.

Recent additions allow finer control over the handling of a file. The options are the last part of the logical filename syntax. A vertical bar separates the options from any preceding items, such as the filename or pattern. Zero or more one-character codes flag how a concrete planner is supposed to deal with the file.

flag  meaning           present                       absent
----  ----------------  ----------------------------  --------------------------
r     register file     register file in RC           do not register file
t     transfer file     transfer file to output       only inter-pool transfers
T     no-fail transfer  don't fail if transfer fails  mutually exclusive with 't'
o     optional file     don't fail if not in RC       fail if not found in RC

The following equivalences apply to short-cuts and compatibility with previous versions of VDL:

@{in:"lfn1"}      <=>  @{in:"lfn1"|rt}

@{in:"lfn2":"x"}  <=>  @{in:"lfn2":"x"|}
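Putting the LFN syntax, the flag table and the equivalences together, the Python sketch below parses an LFN element and fills in the implied flags. It is a hypothetical illustration, not a VDS parser: for brevity the quoted strings may not contain escaped quotes, and the defaults ("rt" without a pattern, no flags with a pattern) are taken directly from the two equivalences above.

```python
import re

# Hypothetical parser (not VDS code) for @{type:"lfn"[:"pattern"][|flags]}.
# For brevity, the quoted strings may not contain escaped quotes.
LFN = re.compile(
    r'^@\{(?P<type>input|in|output|out|inout|io):"(?P<lfn>[^"]*)"'
    r'(?::"(?P<pattern>[^"]*)")?'
    r'(?:\|(?P<flags>[rtTo]*))?\}$'
)


def parse_lfn(text):
    """Return (type, lfn, pattern, set of flags)."""
    m = LFN.match(text)
    if m is None:
        raise ValueError(f"not a valid LFN element: {text!r}")
    flags = m.group("flags")
    if flags is None:
        # Defaults from the equivalences above: a tracked file is registered
        # and transferred; a transient file (with pattern) gets no flags.
        flags = "rt" if m.group("pattern") is None else ""
    if "t" in flags and "T" in flags:
        raise ValueError("flags 't' and 'T' are mutually exclusive")
    return m.group("type"), m.group("lfn"), m.group("pattern"), set(flags)


print(parse_lfn('@{in:"lfn1"}'))
print(parse_lfn('@{io:"lfn2":"tmp-XXXXXX"}'))
```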


[1] The two colons must form a single token, with no space between them.