Task XML Reference

This document describes the XML format for task files (see "Creating a New Task"). Use cases are described here.

Element hierarchy

    <task> (and <tasks>)
       <tags>
          <tag>
             <ui>
             <attr_set>
                <attr>
                <ui>
          <tag_group>
             <ui>
       <workflows>
          <workflow>
             <ui_settings>
                <setting>
             <step>
                <create_settings>
                   <setting>
                <run_settings>
                   <setting>
                <ui_settings>
                   <setting>
       <settings>
          <setting>
       <doc_enhancement_class>
       <java_subprocess_parameters>
       <web_customization>
          <js>
          <css>
          <short_name>
          <long_name>
       <model_config>
          <build_settings>
             <setting>
       <default_model>
       <workspace>
          <operation>
             <settings>
                <setting>
       <step_implementations>
          <step>
             <create_settings>
                <setting>

<task>

The top-level element in the file. For historical reasons, some of its child elements are obligatory and some are not. Conceptually speaking, you always need to specify <tags>; <model_config>, <workflows> and <step_implementations> are required for using the engine and experiment infrastructure; and <workspace> is required if you're going to use workspace mode. The other elements are for advanced customizations.

If you want to define multiple tasks in the same task.xml file (if, for instance, you're defining a task and a set of child tasks), you can use <tasks> as your top-level element. This element has no attributes, and only one repeatable child: <task>.

Attributes

Attribute
Value
Obligatory?
Description
name a string
yes The name of the task. This name will appear in menus in the UI, and in help strings in the engine, so make it something mnemonic, distinctive and descriptive.
visible "no"
no If present, the task is not "visible" in the various lists of tasks the user will see. Typically, this is used if this task is not a leaf in the tree of tasks. You will seldom need this capability.
parent a string
no The name of the parent task in the hierarchy. If this is not specified, the system root task will be used. You will seldom need this capability. If you do, typically the parent will specify visible="no".
class a string, the name of a Python class
no If you've found a need to specialize the default task implementation, the value of this attribute should be "<file>.<classname>", where <file> corresponds to a file <taskdir>/python/<file>.py.

Children

Element
Obligatory?
Repeatable?
Description
<tags> yes no The tags which are used in this task.
<workflows> yes no The workflows that are used in the MAT engine.
<settings> no no The task-specific settings which may be viewed by specializations of the root task.
<doc_enhancement_class> no no If specified, this element should delimit a string "<file>.<classname>", where <file> corresponds to a file <taskdir>/python/<file>.py. The specified class is a class which contributes to specializations of this documentation for the task in question. This functionality is currently undocumented.

This element has no attributes or element children; its value is the text it delimits.
<java_subprocess_parameters> no no If present, defaults for various JVM parameters for all Java subprocesses (e.g., Java Carafe training and tagging).
<web_customization> no no Customizations of the Web UI.
<model_config> no yes Settings for the model building engine.
<default_model> no no If present, this element should delimit a pathname where models will be saved if MATModelBuilder is invoked with --save_as_default_model. If the pathname is relative, it will be interpreted as relative to the task directory. This value may be inherited from the parent task.
<workspace> no no Implementations of the operations in the workspaces.
<step_implementations> no no Implementations of the named steps in the MAT engine workflows.
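As an illustration, a task.xml file that defines an invisible parent task and a visible child task could be laid out as follows. This is a minimal sketch: the task names, the parent/child split and the model path are hypothetical, and the parent's tag and workflow declarations are elided to comments.

```xml
<tasks>
  <!-- Hypothetical intermediate task: hidden from task lists, shared by its children. -->
  <task name="Sample Base" visible="no">
    <tags inherit_structure="yes">
      <!-- content tags shared by all children would be declared here -->
    </tags>
    <workflows>
      <!-- workflows shared by all children would be declared here -->
    </workflows>
  </task>
  <!-- Hypothetical leaf task, listed in the UI and named in engine help strings. -->
  <task name="Sample Child" parent="Sample Base">
    <tags inherit_structure="yes" inherit_content="yes"/>
    <workflows inherit_all="yes"/>
    <!-- Relative path, interpreted relative to the task directory (hypothetical). -->
    <default_model>models/default_model</default_model>
  </task>
</tasks>
```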

<tags> (of <task>)

The <tags> element contains the various tag declarations, as well as instructions about whether to inherit annotations from the parent. Typically, a user will inherit the structural annotations, but not the content annotations, unless the user has defined a tree of tasks where the leaves inherit the content annotations from an intermediate task in the tree.

If you inherit the structure annotations, the root task behaves as if it's specified as follows:

    <tags>
      <tag name="lex" category="token">
        <ui css="border: 1px solid #CCCCCC"/>
      </tag>
      <tag name="untaggable" category="untaggable">
        <ui css="color: gray"/>
      </tag>
      <tag name="zone" category="zone"/>
    </tags>

Attributes

Attribute
Value
Obligatory?
Description
inherit_structure "yes"
no If specified, the structure annotations of the parent task will be used in this task as well. Specifying this option does not rule out specifying other structure tags explicitly in this task. For the purposes of this flag, structure annotations count as any annotations whose category is not "content".
inherit_content "yes"
no If specified, the content annotations of the parent task will be used in this task as well. Specifying this option does not rule out specifying other content tags explicitly in this task. For the purposes of this flag, content annotations count as any annotations whose category is "content".

Children

Element
Obligatory?
Repeatable?
Description
<tag> no yes An individual tag definition.
<tag_group> no yes A means of defining a tag hierarchy for the UI.

<tag> (of <tags>)

This element defines a single annotation tag. Each annotation has a category and a name, and may correspond to multiple visible tagging options if the <attr_set> child element is used.

Attributes

Attribute
Value
Obligatory?
Description
category a string
yes The category of the annotation. Recognized categories are currently "content", "token", "zone" and "untaggable".
name a string
yes The annotation label.

Children

Element
Obligatory?
Repeatable?
Description
<ui> no no The UI-relevant properties of the annotation
<attr_set> no yes An attribute set, specifying a named collection of attribute values for this annotation which will be treated by the UI and scorer as a separate "annotation".

<ui> (of <tag>)

The UI-relevant properties of the annotation, including the CSS that the Web UI uses to display the annotation, and keyboard accelerators for tagging.

Attributes

Attribute
Value
Obligatory?
Description
css a string of legal CSS
yes On a token-by-token basis, the UI will apply this CSS to any span labeled by this annotation.
accelerator a single-character string
no If specified, this accelerator will be available in the annotation selection menu in the Web UI; if the user presses this key, this annotation will be selected for the span, just as if the element in the menu had been chosen.
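For example, a single content tag whose spans are shown with a yellow background, and which can be chosen from the annotation menu by pressing "P", might be declared like this. The tag name, color and accelerator key are illustrative choices, not values MAT requires.

```xml
<tags inherit_structure="yes">
  <!-- A content annotation labeled PERSON (hypothetical label). -->
  <tag name="PERSON" category="content">
    <!-- css is applied token by token to each span with this label;
         the accelerator selects this label from the annotation menu. -->
    <ui css="background-color: yellow" accelerator="P"/>
  </tag>
</tags>
```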

<attr_set> (of <tag>)

In some cases, what the user thinks of as "things to annotate" don't correspond one-to-one with actual defined annotations. One example is found in a commonly-used scheme for named entity annotation, where all named entities are assigned the ENAMEX tag, and they are distinguished by the value of their "type" attribute (type=PERSON, type=ORGANIZATION, type=LOCATION). An <attr_set> is intended to map the actual annotations onto the "things to annotate" the user thinks of.

Concretely, when a <tag> has an <attr_set>, it's intended to represent a situation where the user intends to annotate something named by the "name" attribute of the <attr_set>, which is actually implemented by the <tag>'s label together with the attribute-value pairs specified in the <attr_set>'s <attr> children. If there are any annotations which match the tag but don't fall into any attr set, they'll be reported as belonging to the tag itself in the scorer. In the UI, the tag itself will be presented as an annotation option only if the <tag> has an immediate <ui> child; but we don't recommend enabling this possibility.

Attributes

Attribute
Value
Obligatory?
Description
name a string
yes the name of the attr set, as it will be known in the scorer and the Web UI.

Children

Element
Obligatory?
Repeatable?
Description
<attr> yes yes A pair of attribute and value.
<ui> no no The UI-relevant properties of the attr set. Identical to the <ui> element immediately within the <tag> element.

<attr> (of <attr_set>)

An attribute-value pair which contributes to the definition of the attr_set. Attribute-value pairs represent conjoined requirements (that is, there's no "or"), and literal values (no ranges, pattern matching, etc.).

Attributes

Attribute
Value
Obligatory?
Description
name a string
yes the name of an annotation attribute
value a string
yes the required value of the attribute
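The ENAMEX scheme described above might be declared as follows. This is a sketch: the CSS values and accelerator keys are assumptions, and a real scheme might include more types.

```xml
<tags inherit_structure="yes">
  <!-- One underlying tag; the user sees three "things to annotate". -->
  <tag name="ENAMEX" category="content">
    <attr_set name="PERSON">
      <!-- Literal match: ENAMEX with type="PERSON" -->
      <attr name="type" value="PERSON"/>
      <ui css="background-color: #CCFF66" accelerator="P"/>
    </attr_set>
    <attr_set name="ORGANIZATION">
      <attr name="type" value="ORGANIZATION"/>
      <ui css="background-color: #99CCFF" accelerator="O"/>
    </attr_set>
    <attr_set name="LOCATION">
      <attr name="type" value="LOCATION"/>
      <ui css="background-color: #FF99CC" accelerator="L"/>
    </attr_set>
    <!-- No immediate <ui> child here, so bare ENAMEX is not offered in the UI. -->
  </tag>
</tags>
```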

<tag_group> (of <tags>)

Under some circumstances, you might want to create cascaded annotation menus in the MAT UI, perhaps in order to group together similar annotations, or provide options for more or less general annotations, or to compress the screen real estate taken up by the annotation popup menu. You can use the <tag_group> element to accomplish this.

Each tag group has a name. This might correspond to the name attribute of a <tag> or <attr_set> element (in which case it refers to that annotation), or it might be a previously unknown name, in which case it serves merely as a group. The children of each tag group can be actual annotation names, or other known tag groups. The annotations the tag group refers to must be content annotations. If the tag group is not otherwise known, it has the option of declaring CSS styling for the menu entry.

Tag groups are inherited from an available parent task if the inherit_content attribute is set on the <tags> element. Local tag groups override parent tag groups.

Attributes

Attribute
Value
Obligatory?
Description
name a string
yes the name of the tag group, either a new name or the name of an existing content annotation (matching the name attribute of a <tag> or <attr_set> element)
children a string
yes a comma-delimited sequence of names, either names of existing content annotations or of known tag groups

Children

Element
Obligatory?
Repeatable?
Description
<ui> no no Like the <ui> element for the <tag> and <attr_set> elements, except that only the css attribute is permitted on this version of the element. Used to assign styling to otherwise unknown tag groups.
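As a sketch, the following groups three content annotations under a new menu entry named "Entities", which may declare its own styling because it is not the name of any annotation. The annotation names and CSS values are hypothetical.

```xml
<tags inherit_structure="yes">
  <tag name="PERSON" category="content"><ui css="background-color: yellow"/></tag>
  <tag name="ORGANIZATION" category="content"><ui css="background-color: lightblue"/></tag>
  <tag name="LOCATION" category="content"><ui css="background-color: pink"/></tag>
  <!-- "Entities" is a previously unknown name, so it acts purely as a group
       and may declare CSS (only the css attribute is allowed here). -->
  <tag_group name="Entities" children="PERSON,ORGANIZATION,LOCATION">
    <ui css="font-weight: bold"/>
  </tag_group>
</tags>
```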

<workflows> (of <task>)

Workflows are ordered sets of steps, corresponding to a larger-scale activity the user may wish to apply to the documents.

Attributes

Attribute
Value
Obligatory?
Description
inherit a string, a comma-delimited sequence of workflow names
no If the task has a non-root parent task, you may use this attribute to inherit workflows from the parent task. The implementations of the step names will also be inherited. You can list multiple workflows, delimited by commas, e.g., "Demo,Hand annotation".
inherit_all "yes"
no If the task has a non-root parent, you may use this attribute to specify that all workflows should be inherited from the parent.
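For instance, a child task could inherit two of its parent's workflows by name, or all of them, as below. The workflow names come from the description of the inherit attribute; they must match workflows that the (hypothetical) parent task actually defines. The two forms are alternatives, not meant to appear together.

```xml
<!-- Inherit two named workflows (and their step implementations) from the parent. -->
<workflows inherit="Demo,Hand annotation"/>

<!-- Alternatively, inherit every workflow the parent defines. -->
<workflows inherit_all="yes"/>
```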

Children

Element
Obligatory?
Repeatable?
Description
<workflow> no yes An individual workflow

<workflow> (of <workflows>)

Each non-inherited workflow is specified by a <workflow> tag.

Attributes

Attribute
Value
Obligatory?
Description
name a string
yes The name of the workflow that the user can specify in the Web UI or the MAT engine.
hand_annotation_available_at_end "yes"
no If specified, the user will be able to add or correct hand annotations in the document after the last step of this workflow is completed.

Children

Element
Obligatory?
Repeatable?
Description
<ui_settings> no no These are settings that are intended to be passed unmodified to the UI. This is not currently used.
<step> no yes An individual step of a workflow.

<ui_settings> (of <workflow>)

These are settings that are intended to be passed unmodified to the UI, in order to declaratively configure UI customizations for particular workflows. At the moment, no tasks use this feature. You can configure these settings either with a child <setting> element, or with an attribute on the <ui_settings> element itself; they're interchangeable.

Attributes

Attribute
Value
Obligatory?
Description
<attr> a string
no The <ui_settings> tag supports arbitrary attribute-value pairs.

Children

Element
Obligatory?
Repeatable?
Description
<setting> no yes An attribute-value pair.

<setting> (of <ui_settings>)

An individual UI setting.

Children

Element
Obligatory?
Repeatable?
Description
<name> yes no The name of the setting. This element has no attributes or element children; its value is the text it delimits.
<value> yes no The value of the setting. This element has no attributes or element children; its value is the text it delimits.

<step> (of <workflow>)

The steps are the basic elements of workflows.

Attributes

Attribute
Value
Obligatory?
Description
name a string
yes The name of the step. These names must be matched in the step implementations.
hand_annotation_available "yes"
no If specified, hand annotation is available in the Web UI during this step.
by_hand "yes"
no If specified, this step is performed by the user by hand, not automatically. This step must be defined as a tagging step in the step implementation, and it implies hand_annotation_available="yes".
pretty_name a string
no The name of this step that the user will see in the UI.
proxy_for_steps a comma-delimited string of step names
no Steps can be sequences of other steps (i.e., they can be composite). You may want this in a workflow if two steps will always be done as a group, for instance. The names in the value for this attribute must be the values of the "name" attribute of other steps, not the values of the "pretty_name" attribute.

Children

Element
Obligatory?
Repeatable?
Description
<create_settings> no no Settings to pass to the initializer of the step.
<run_settings> no no Settings to pass to the execution of a step.
<ui_settings> no no Settings to pass to the UI for this step. Not currently used.
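Putting the <workflow> and <step> attributes together, a task might define one automatic workflow and one hand-annotation workflow roughly as follows. This is a sketch under assumptions: the workflow names, step names and pretty names are illustrative, and each step name would need a matching entry in <step_implementations>.

```xml
<workflows>
  <!-- Automatic pipeline; the user may correct annotations after the last step. -->
  <workflow name="Demo" hand_annotation_available_at_end="yes">
    <step name="zone" pretty_name="Zone"/>
    <step name="tokenize" pretty_name="Tokenize"/>
    <step name="tag" pretty_name="Tag"/>
  </workflow>
  <workflow name="Hand annotation">
    <step name="zone" pretty_name="Zone"/>
    <step name="tokenize" pretty_name="Tokenize"/>
    <!-- Done by the user; implies hand_annotation_available="yes".
         Its implementation must be declared as a tagging step. -->
    <step name="tag" pretty_name="Hand tag" by_hand="yes"/>
  </workflow>
</workflows>
```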

<create_settings> (of <step>)

These are settings that a step might pass to the initialization phase of its step class. These settings override the values in the <create_settings> element of the corresponding <step> in <step_implementations>. You can configure these settings either with a child <setting> element, or with an attribute on the <create_settings> element itself; they're interchangeable.

Attributes

Attribute
Value
Obligatory?
Description
<attr> a string
no The <create_settings> tag supports arbitrary attribute-value pairs.

Children

Element
Obligatory?
Repeatable?
Description
<setting> no yes An attribute-value pair.

<setting> (of <create_settings>)

An individual step creation setting.

Children

Element
Obligatory?
Repeatable?
Description
<name> yes no The name of the setting. This element has no attributes or element children; its value is the text it delimits.
<value> yes no The value of the setting. This element has no attributes or element children; its value is the text it delimits.

<run_settings> (of <step>)

These are settings which are passed to the do() or doBatch() method of the step (that's the method that actually performs the step). You can configure the settings either with a child <setting> element, or with an attribute on the <run_settings> element itself; they're interchangeable.

Attributes

Attribute
Value
Obligatory?
Description
<attr> a string
no The <run_settings> tag supports arbitrary attribute-value pairs.

Children

Element
Obligatory?
Repeatable?
Description
<setting> no yes An attribute-value pair.

Most predefined step implementations in MAT do not support any run settings. The two implementations which do are MAT.JavaCarafe.CarafeTokenizationStep and MAT.JavaCarafe.CarafeTagStep.

The MAT.JavaCarafe.CarafeTagStep step implements automatic tagging. Any step which implements automatic tagging can bear the following additional attribute-value pairs:

Key
Value
Description
tagger_local "yes"
By default, the MAT engine will contact the MAT Web server to tag a document, because the Web server can start up and monitor a long-lived tagger task. This is beneficial because the Carafe tagger, like many model-based taggers, has a fairly expensive startup cost. To prevent the engine from contacting the Web server, and force it to start up and shut down the tagger on its own, specify tagger_local="yes".
tagger_model a string, the filename of a tagging model
If the task does not have a default model, the user must specify the location of the tagger model.

In addition, the Carafe tagging and tokenization steps support other run settings, documented here.
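For example, a tagging step could be told to run the tagger itself rather than contacting the Web server, and to use a particular model. The sketch below sets one value as an attribute and the other as a child <setting>, to show that the two forms are interchangeable; the workflow name, step name and model path are hypothetical placeholders.

```xml
<workflow name="Demo">
  <step name="tag">
    <!-- Attribute form: start and stop the tagger locally. -->
    <run_settings tagger_local="yes">
      <!-- Child-element form: equivalent to a tagger_model="..." attribute.
           The path is a placeholder. -->
      <setting>
        <name>tagger_model</name>
        <value>/path/to/model</value>
      </setting>
    </run_settings>
  </step>
</workflow>
```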

<setting> (of <run_settings>)

An individual run setting.

Children

Element
Obligatory?
Repeatable?
Description
<name> yes no The name of the setting. This element has no attributes or element children; its value is the text it delimits.
<value> yes no The value of the setting. This element has no attributes or element children; its value is the text it delimits.

<ui_settings> (of <step>)

These are settings that are intended to be passed unmodified to the UI, in order to declaratively configure UI customizations for particular tasks. At the moment, no tasks use this feature. You can configure these settings either with a child <setting> element, or with an attribute on the <ui_settings> element itself; they're interchangeable.

Attributes

Attribute
Value
Obligatory?
Description
<attr> a string
no The <ui_settings> tag supports arbitrary attribute-value pairs.

Children

Element
Obligatory?
Repeatable?
Description
<setting> no yes An attribute-value pair.

<setting> (of <ui_settings>)

An individual UI setting.

Children

Element
Obligatory?
Repeatable?
Description
<name> yes no The name of the setting. This element has no attributes or element children; its value is the text it delimits.
<value> yes no The value of the setting. This element has no attributes or element children; its value is the text it delimits.

<settings> (of <task>)

These are settings that a specialized task might require which the user wishes to be able to configure in XML, rather than by modifying the source code for the specialized task. The chances that a normal user will use this are extremely slim. These settings are not inherited by task children.

You can configure the settings either with a child <setting> element, or with an attribute on the <settings> element itself; they're interchangeable.

Attributes

Attribute
Value
Obligatory?
Description
<attr> a string
no The <settings> tag supports arbitrary attribute-value pairs.

Children

Element
Obligatory?
Repeatable?
Description
<setting> no yes An attribute-value pair.

<setting> (of <settings>)

An individual task-level setting.

Children

Element
Obligatory?
Repeatable?
Description
<name> yes no The name of the task-level setting. This element has no attributes or element children; its value is the text it delimits.
<value> yes no The value of the task-level setting. This element has no attributes or element children; its value is the text it delimits.

<java_subprocess_parameters> (of <task>)

MAT has some built-in tools to control Java Carafe and other Java subprocesses. Using this element, you can declare default settings for Java heap and stack sizes. If not set locally, these settings are inherited from parent tasks.

Attributes

Attribute
Value
Obligatory?
Description
heap_size a string no The value here is a value for the heap size for the Java VM. It is passed to the Java VM using the -Xmx argument. Values like 512M or 2G are examples of expected values. This default value can be overridden by declaring the empty string ("") in any configuration context where the heap size can be specified (see the Java Carafe engine for examples).
stack_size a string no The value here is a value for the stack size for the Java VM. It is passed to the Java VM using the -Xss argument. Values like 4096k or 512k are examples of expected values. This default value can be overridden by declaring the empty string ("") in any configuration context where the stack size can be specified (see the Java Carafe engine for examples).
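As a sketch, a task that wants every Java subprocess to get a 2 GB heap and a 4096k stack by default might declare the following. The values are simply the example values given above, not recommendations.

```xml
<!-- Passed to the JVM as -Xmx2G and -Xss4096k respectively. -->
<java_subprocess_parameters heap_size="2G" stack_size="4096k"/>
```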

<web_customization> (of <task>)

Tasks can also customize the Web UI in a number of ways. This process is quite complicated; it's almost entirely code-oriented, and it's not documented at all. This section is here for reference only; users who aren't really, really brave shouldn't go anywhere near these customizations.

Attributes

Attribute
Value
Obligatory?
Description
inherit_css "no"
no If the parent task has CSS customizations, as specified in the <css> element below, they are inherited by default. Use this setting to block inheritance.
inherit_js "no"
no If the parent task has Javascript customizations, as specified in the <js> element below, they are inherited by default. Use this setting to block inheritance.
display_config a string
no Each Web customization set has a name, so that when the user selects a particular task, the UI knows which customization set to use. Can be inherited from parent tasks; a value of "" cancels the inheritance.
alphabetize_labels "no"
no By default, the MAT UI orders the annotation labels alphabetically in the legend and the tag popup menu. If this attribute is set, the UI will list the annotation labels in the order they are defined in the <tags> element. Can be inherited from parent tasks; a value of "" cancels the inheritance.
tokenless_autotag_delimiters a string
no By default, if you ask the MAT UI to autotag similar strings when you're annotating without tokens, the only edge conditions that the UI recognizes are whitespace and zone boundaries. If your match abuts a punctuation mark, it will not recognize it as a delimiter. If you want other edge conditions to be recognized, you can list them in the value of this attribute. (Remember, though, that you may have to use the XML entity character codes for those characters which are significant to XML syntax, so that the XML parsing doesn't fail.) This setting can be inherited from parent tasks; a value of "" cancels the inheritance.
default_tag_window_position x,y
no To set the default position of annotation windows in the MAT UI, use this attribute. X and y must be integers. Example: "100,200". Can be inherited from parent tasks; a value of "" cancels the inheritance.
default_tag_window_size width,height
no To set the default size of annotation windows in the MAT UI, use this attribute. Width and height must be integers. Example: "100,200". Can be inherited from parent tasks; a value of "" cancels the inheritance.
text_right_to_left "yes"
no If specified, documents viewed in this task in the MAT UI will be treated as right-to-left text (e.g., Arabic). Can be inherited from parent tasks; a value of "" cancels the inheritance.

Children

Element
Obligatory?
Repeatable?
Description
<js> no yes The relative pathname of the Javascript customizations. This path is relative to the task directory. By convention, this file should be in the "js" subdirectory.

This element has no attributes or element children; its value is the text it delimits.
<css> no yes The relative pathname of the CSS customizations. This path is relative to the task directory. By convention, this file should be in the "css" subdirectory.

This element has no attributes or element children; its value is the text it delimits.
<short_name> no no This is the name that the UI will display in the upper left corner if this customization is the only customization available. This setting will be inherited by child tasks.

This element has no attributes or element children; its value is the text it delimits.
<long_name> no no This is the name that the UI will use as the title of the Web page if this customization is the only customization available. This setting will be inherited by child tasks.

This element has no attributes or element children; its value is the text it delimits.
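For reference only, a <web_customization> element combining several of these options might look like the following. The configuration name, file names and display names are hypothetical, and the referenced JavaScript and CSS files would have to exist in the task directory.

```xml
<!-- Keep labels in <tags> order, treat text as right-to-left, and open
     annotation windows at (100,200) with size 400x300. -->
<web_customization display_config="MyTaskUI"
                   alphabetize_labels="no"
                   text_right_to_left="yes"
                   default_tag_window_position="100,200"
                   default_tag_window_size="400,300">
  <!-- Paths are relative to the task directory, in the conventional subdirectories. -->
  <js>js/my_task_customization.js</js>
  <css>css/my_task_customization.css</css>
  <short_name>My Task</short_name>
  <long_name>My Task Annotation Tool</long_name>
</web_customization>
```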

<model_config> (of <task>)

It's also possible to configure various dimensions of the model build process in the task.xml file. The settings for this config are identical to the command-line options available for the MATModelBuilder. There is no default model build engine in a task.xml file; if you want to build models, you must declare a model config.

MAT is delivered with a default Carafe model builder.

You can have multiple <model_config> entries, as long as they differ by the config_name attribute. If a named or default model config isn't found when requested by MATModelBuilder or the experiment engine, MAT will look for it in the parent task.

Attributes

Attribute
Value
Obligatory?
Description
class the name of a Python class
yes This attribute names the class which will be used as the model builder. The default Carafe model builder class is MAT.JavaCarafe.CarafeModelBuilder.
config_name a string
no If present, a config name to specify as the --config_name in MATModelBuilder, or for the config_name attribute in <build_settings> in the experiment engine. If omitted, this entry is the default model config. There can be only one default.

Children

Element
Obligatory?
Repeatable?
Description
<build_settings> no no The settings for this model config.

<build_settings> (of <model_config>)

The <build_settings> tag supports arbitrary attribute-value pairs which are passed to the model builder. See the documentation for the Carafe model builder to see which attributes should be supplied to that engine. You can configure these settings either with a child <setting> element, or with an attribute on the <build_settings> element itself; they're interchangeable.

Attributes

Attribute
Value
Obligatory?
Description
<attr> a string
no The <build_settings> tag supports arbitrary attribute-value pairs.

Children

Element
Obligatory?
Repeatable?
Description
<setting> no yes An attribute-value pair.

<setting> (of <build_settings>)

An individual build setting.

Children

Element
Obligatory?
Repeatable?
Description
<name> yes no The name of the setting. This element has no attributes or element children; its value is the text it delimits.
<value> yes no The value of the setting. This element has no attributes or element children; its value is the text it delimits.
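As an illustration, a default Carafe model config and a second, named config might be declared as below. The class name is the default Carafe builder named above; the config name and the build setting names are placeholders, since the actual options are documented with the Carafe model builder.

```xml
<!-- Default model config: no config_name attribute. -->
<model_config class="MAT.JavaCarafe.CarafeModelBuilder"/>

<!-- Named config, selected with --config_name alt_config (hypothetical name). -->
<model_config class="MAT.JavaCarafe.CarafeModelBuilder" config_name="alt_config">
  <!-- "some_option" and "other_option" are placeholders, not real Carafe options. -->
  <build_settings some_option="value1">
    <setting>
      <name>other_option</name>
      <value>value2</value>
    </setting>
  </build_settings>
</model_config>
```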

<workspace> (of <task>)

If you want to use workspace mode, you must declare how the various workspace operations are implemented. These operations are described here.

Attributes

Attribute
Value
Obligatory?
Description
inherit_operations "no"
no By default, workspace operation implementations are inherited from the task parent, if not available locally. Use this attribute to block inheritance.

Children

Element
Obligatory?
Repeatable?
Description
<operation> yes yes An individual operation.

<operation> (of <workspace>)

Specifies the implementation of a workspace operation. Although operations are associated with folders, they are referenced only by name, because operations should be named uniquely.

Attributes

Attribute
Value
Obligatory?
Description
name a string
yes The name of the operation

Children

Element
Obligatory?
Repeatable?
Description
<settings> no no The operation settings.

<settings> (of <operation>)

The settings for the operation. Which settings apply depends on what sort of operation it is. For instance, for operations which invoke the MAT engine, these settings will be the arguments to the MAT engine. For operations which invoke the MAT model builder, these settings will be the arguments to the MAT model builder. See the documentation on workspaces to find out what the options are for particular operations.

Attributes

Attribute
Value
Obligatory?
Description
<attr> a string
no The <settings> tag supports arbitrary attribute-value pairs.

Children

Element
Obligatory?
Repeatable?
Description
<setting> no yes An attribute-value pair.

<setting> (of <settings>)

An individual operation setting.

Children

Element
Obligatory?
Repeatable?
Description
<name> yes no The name of the setting. This element has no attributes or element children; its value is the text it delimits.
<value> yes no The value of the setting. This element has no attributes or element children; its value is the text it delimits.
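For instance, an operation that invokes the MAT engine might pass it a workflow and a list of steps. This is a sketch only: the operation name and the setting names below are assumptions (chosen to mirror MAT engine arguments), so check the workspace documentation for the names each operation actually expects.

```xml
<workspace>
  <!-- Hypothetical operation name; operations are referenced only by name. -->
  <operation name="autotag">
    <!-- Assumed engine arguments: which workflow and steps to run. -->
    <settings workflow="Demo" steps="zone,tokenize,tag"/>
  </operation>
</workspace>
```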

<step_implementations> (of <task>)

Step implementations associate a named step with an implementation for that step (i.e., a Python class), perhaps in the context of particular workflows. The effect of each named step in a task is global; e.g., the "tag" step might add content annotations. However, the way that effect is achieved may differ among step implementations; e.g., one implementation of the tag step may involve hand annotation, or there may be multiple possibilities for adding the tags automatically. By default, step implementations are inherited from the parent.

Children

Element
Obligatory?
Repeatable?
Description
<step> no yes An individual step implementation.

<step> (of <step_implementations>)

Each individual step implementation specifies the Python class, at least.

Attributes

Attribute
Value
Obligatory?
Description
name a string
yes The name of a step as it is used in workflows. These are values of the "name" attribute for the <workflow> <step> element, not the "pretty_name" attribute.
class a string, the name of a Python class
yes The Python class, including its module name, which implements this step.
tagging_step "yes"
no If this step involves adding content annotations, this attribute should be specified.
workflows a comma-delimited string of workflow names
no The workflow contexts in which this implementation holds. Different workflows can have different implementations for the same named step.
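A sketch of <step_implementations> for a hypothetical "Demo" workflow and a hypothetical "Hand annotation" workflow follows. The two Carafe classes are the predefined implementations mentioned earlier in this document; the mytask module and its classes are hypothetical names standing in for whatever classes your task uses.

```xml
<step_implementations>
  <!-- Hypothetical module and class name. -->
  <step name="zone" class="mytask.MyZoneStep"/>
  <step name="tokenize" class="MAT.JavaCarafe.CarafeTokenizationStep"/>
  <!-- Automatic tagging, used only in the Demo workflow. -->
  <step name="tag" class="MAT.JavaCarafe.CarafeTagStep"
        tagging_step="yes" workflows="Demo"/>
  <!-- A different (hypothetical) implementation of the same step name
       for the hand annotation workflow. -->
  <step name="tag" class="mytask.MyHandTagStep"
        tagging_step="yes" workflows="Hand annotation"/>
</step_implementations>
```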

Children

Element
Obligatory?
Repeatable?
Description
<create_settings> no no Default settings for initializing the step.

<create_settings> (of <step>)

These are settings that a step might pass to the initialization phase of its step class. These settings can be overridden by the values in the <create_settings> element for <step> in the <workflow> element. You can configure these settings either with a child <setting> element, or with an attribute on the <create_settings> element itself; they're interchangeable.

Attributes

Attribute
Value
Obligatory?
Description
<attr> a string
no The <create_settings> tag supports arbitrary attribute-value pairs.

Children

Element
Obligatory?
Repeatable?
Description
<setting> no yes An attribute-value pair.

<setting> (of <create_settings>)

An individual step creation setting.

Children

Element
Obligatory?
Repeatable?
Description
<name> yes no The name of the setting. This element has no attributes or element children; its value is the text it delimits.
<value> yes no The value of the setting. This element has no attributes or element children; its value is the text it delimits.
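The precedence between the two <create_settings> elements can be sketched as follows: the step implementation supplies a default, and a workflow step overrides it. The setting name and values are hypothetical placeholders; real names depend on the step class.

```xml
<workflows>
  <workflow name="Demo">
    <step name="tag">
      <!-- Overrides the implementation's default, for this workflow only. -->
      <create_settings>
        <setting>
          <name>example_setting</name>
          <value>override_value</value>
        </setting>
      </create_settings>
    </step>
  </workflow>
</workflows>

<step_implementations>
  <step name="tag" class="MAT.JavaCarafe.CarafeTagStep" tagging_step="yes">
    <!-- Default used by every workflow that runs this step. -->
    <create_settings example_setting="default_value"/>
  </step>
</step_implementations>
```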