The XML format for the task files (see "Creating a New Task") is described in this
document. Use cases are described here.
<task> (and <tasks>)
<workflows>
<workflow>
<ui_settings>
<setting>
<step>
<create_settings>
<setting>
<run_settings>
<setting>
<ui_settings>
<setting>
<settings>
<setting>
<doc_enhancement_class>
<java_subprocess_parameters>
<web_customization>
<js>
<css>
<short_name>
<long_name>
<model_config>
<build_settings>
<setting>
<default_model>
<workspace>
<operation>
<settings>
<setting>
<step_implementations>
<step>
<create_settings>
<setting>
<annotation_set_descriptors>
<annotation_set_descriptor>
<annotation_display>
<label>
<attribute>
<label_group>
<similarity_profile>
<stratum>
<tag_profile>
<attr_equivalences>
<dimension>
<score_profile>
<aggregation>
<attr_decomposition>
<partition_decomposition>
<label_limitation>
The toplevel element in the file. For historical reasons, some of
the tags are obligatory and some are not. Conceptually speaking, you
always need to specify <annotation_set_descriptors>;
<model_config>, <workflows> and
<step_implementations> are required for using the engine and
experiment infrastructure; and <workspace> is required if you're
going to use workspace mode. The other elements are for advanced
customizations.
If you want to define multiple tasks in the same task.xml file
(if, for instance, you're defining a task and a set of child
tasks), you can use <tasks> as your toplevel element. This
element has no attributes, and only one repeatable child:
<task>.
Attribute | Value | Obligatory? | Description |
---|---|---|---|
name | a string | yes | The name of the task. This name will appear in menus in the UI, and in help strings in the engine, so make it something mnemonic, distinctive and descriptive. |
visible | "no" | no | If present, the task is not "visible" in the various lists of tasks the user will see. Typically, this is used if this task is not a leaf in the tree of tasks. You will seldom need this capability. |
parent | a string | no | The name of the parent task in the hierarchy. If this is not specified, the system root task will be used. You will seldom need this capability. If you do, typically the parent will specify visible="no". |
class | a string, the name of a Python class | no | If you've found a need to specialize the default task implementation, the value of this attribute should be "<file>.<classname>", where <file> corresponds to a file <taskdir>/python/<file>.py. |
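As a rough sketch of how these attributes combine, the fragment below declares a hidden parent task and one child task under a <tasks> toplevel element. The task names and the class name are hypothetical; "MyTask.MyChildTask" assumes a file <taskdir>/python/MyTask.py that defines a class MyChildTask.

```xml
<tasks>
  <!-- Hypothetical parent task, hidden from the task lists in the UI. -->
  <task name="My Base Task" visible="no">
    <workflows>
      <!-- workflow declarations go here -->
    </workflows>
  </task>
  <!-- Hypothetical child task; "parent" names the task above. -->
  <task name="My Child Task" parent="My Base Task"
        class="MyTask.MyChildTask">
    <workflows inherit_all="yes"/>
  </task>
</tasks>
```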
Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<workflows> | yes | no | The workflows that are used in the MAT engine. |
<settings> | no | no | The task-specific settings which may be viewed by specializations of the root task. |
<doc_enhancement_class> | no | no | If specified, this element should delimit a string "<file>.<classname>", where <file> corresponds to a file <taskdir>/python/<file>.py. The specified class is a class which contributes to specializations of this documentation for the task in question. This functionality is currently undocumented. This element has no attributes or element children; its value is the text it delimits. |
<java_subprocess_parameters> | no | no | If present, defaults for various JVM parameters for all Java subprocesses (e.g., Java Carafe training and tagging). |
<web_customization> | no | no | Customizations of the Web UI. |
<model_config> | no | yes | Settings for the model building engine. |
<default_model> | no | no | If present, this element should delimit a pathname where models will be saved if MATModelBuilder is invoked with --save_as_default_model. If the pathname is relative, it will be interpreted as relative to the task directory. This value may be inherited from the parent task. |
<workspace> | no | no | Implementations of the operations in the workspaces. |
<step_implementations> | no | no | Implementations of the named steps in the MAT engine workflows. |
<annotation_set_descriptors> | no | no | The labels and attributes which are used in this task. |
<annotation_display> | no | no | The display-related properties of the labels and attributes in this task. |
<similarity_profile> | no | yes | The methods for comparing annotations for scoring and visual comparison. |
<score_profile> | no | yes | The methods for decomposing and aggregating annotation labels for scoring. |
Workflows are ordered sets of steps, corresponding to a
larger-scale activity the user may wish to apply to the documents.
Attribute | Value | Obligatory? | Description |
---|---|---|---|
inherit | a string, a comma-delimited sequence of workflow names | no | If the task has a non-root parent task, you may use this attribute to inherit workflows from the parent task. The implementations of the step names will also be inherited. You can list multiple workflows, delimited by commas, e.g., "Demo,Hand annotation". |
inherit_all | "yes" | no | If the task has a non-root parent, you may use this attribute to specify that all workflows should be inherited from the parent. |

Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<workflow> | no | yes | An individual workflow. |
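For illustration, a child task might inherit two workflows from its parent and add one of its own, roughly as follows. The inherited names come from the example value above; the local workflow name and its step name are hypothetical, and the sketch assumes the parent task actually defines "Demo" and "Hand annotation".

```xml
<workflows inherit="Demo,Hand annotation">
  <!-- A workflow defined locally, in addition to the inherited ones. -->
  <workflow name="Local workflow">
    <step name="zone"/>
  </workflow>
</workflows>
```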
Each non-inherited workflow is specified by a <workflow>
tag.
Attribute | Value | Obligatory? | Description |
---|---|---|---|
name | a string | yes | The name of the workflow that the user can specify in the Web UI or the MAT engine. |
hand_annotation_available_at_end | "yes" | no | If specified, the user will be able to add or correct hand annotations in the document after the last step of this workflow is completed. If one of the steps in the workflow is implemented as a tag step, this attribute will be ignored; similarly, if hand_annotation_available_at_beginning is specified, this attribute will be ignored. |
hand_annotation_available_at_beginning | "yes" | no | If specified, the user will be able to add or correct hand annotations in the document before the first step of this workflow. |

Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<ui_settings> | no | no | These are settings that are intended to be passed unmodified to the UI. This is not currently used. |
<step> | no | yes | An individual step of a workflow. |
These are settings that are intended to be passed unmodified to
the UI, in order to declaratively configure UI customizations for
particular workflows. At the moment, no tasks use this feature.
You can configure these settings either with a child
<setting> element, or with an attribute on the
<ui_settings> element itself; they're interchangeable.
Attribute | Value | Obligatory? | Description |
---|---|---|---|
<attr> | a string | no | The <ui_settings> tag supports arbitrary attribute-value pairs. |

Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<setting> | no | yes | An attribute-value pair. |

An individual UI setting.

Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<name> | yes | no | The name of the setting. This element has no attributes or element children; its value is the text it delimits. |
<value> | yes | no | The value of the setting. This element has no attributes or element children; its value is the text it delimits. |
The steps are the basic elements of workflows.
Attribute | Value | Obligatory? | Description |
---|---|---|---|
name | a string | yes | The name of the step. These names must be matched in the step implementations. |
hand_annotation_available | "yes" | no | If specified, hand annotation is available in the Web UI during this step. |
by_hand | "yes" | no | If specified, this step is performed by the user by hand, not automatically. This step must be defined as a tagging step in the step implementation, and it implies hand_annotation_available="yes". |
pretty_name | a string | no | The name of this step that the user will see in the UI. |
proxy_for_steps | a comma-delimited string of step names | no | Steps can be sequences of other steps (i.e., they can be composite). You may want this in a workflow if two steps will always be done as a group, for instance. The names in the value for this attribute must be the values of the "name" attribute of other steps, not the values of the "pretty_name" attribute. |

Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<create_settings> | no | no | Settings to pass to the initializer of the step |
<run_settings> | no | no | Settings to pass to the execution of a step |
<ui_settings> | no | no | Settings to pass to the UI for this step. Not currently used. |
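As a minimal sketch of how <workflows>, <workflow> and <step> fit together, a task might declare something like the following. The workflow names are borrowed from the inherit example above; the step names and the pretty name are hypothetical, and each step name would need a matching entry in <step_implementations>.

```xml
<workflows>
  <!-- An automatic workflow: each step runs without user intervention. -->
  <workflow name="Demo">
    <step name="zone"/>
    <step name="tokenize"/>
    <step name="tag" pretty_name="Tag entities"/>
  </workflow>
  <!-- A workflow whose final step is performed by the user. -->
  <workflow name="Hand annotation">
    <step name="zone"/>
    <step name="tokenize"/>
    <step name="tag" by_hand="yes"/>
  </workflow>
</workflows>
```

Because by_hand="yes" implies hand_annotation_available="yes", the second workflow does not need to set that attribute separately.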
These are settings that a step might pass to the initialization
phase of its step class. These settings override the values in the
<create_settings> element of the corresponding <step> in
<step_implementations>.
You can configure these settings either with a child
<setting> element, or with an attribute on the
<create_settings> element itself; they're interchangeable.
Attribute | Value | Obligatory? | Description |
---|---|---|---|
<attr> | a string | no | The <create_settings> tag supports arbitrary attribute-value pairs. |

Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<setting> | no | yes | An attribute-value pair. |

An individual step creation setting.

Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<name> | yes | no | The name of the setting. This element has no attributes or element children; its value is the text it delimits. |
<value> | yes | no | The value of the setting. This element has no attributes or element children; its value is the text it delimits. |
These are settings which are passed to the do() or doBatch()
method of the step (that's the method that actually performs the
step). You can configure the settings either with a child
<setting> element, or with an attribute on the
<run_settings> element itself; they're interchangeable.
Attribute | Value | Obligatory? | Description |
---|---|---|---|
<attr> | a string | no | The <run_settings> tag supports arbitrary attribute-value pairs. |

Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<setting> | no | yes | An attribute-value pair. |
Most predefined step implementations in MAT do not support any
run settings. The two implementations which do are
MAT.JavaCarafe.CarafeTokenizationStep and
MAT.JavaCarafe.CarafeTagStep.
The MAT.JavaCarafe.CarafeTagStep step implements automatic
tagging. Any step which implements automatic tagging can bear the
following additional attribute-value pairs:
Key | Value | Description |
---|---|---|
tagger_local | "yes" | By default, the MAT engine will contact the MAT Web server to tag a document, because the Web server has the capability of starting up and monitoring a long-living tagger task. The reason this is beneficial is that the Carafe tagger, like many model-based taggers, has a fairly expensive startup cost. To block the engine from contacting the Web server, and force it to start up and shut down the tagger on its own, specify tagger_local="yes". |
tagger_model | a string, a filename of a tagging model | If the task does not have a default model, the user must specify the location of the tagger model. |
In addition, the Carafe tagging and tokenization steps support other run settings, documented here.
An individual run setting.
Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<name> | yes | no | The name of the setting. This element has no attributes or element children; its value is the text it delimits. |
<value> | yes | no | The value of the setting. This element has no attributes or element children; its value is the text it delimits. |
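To illustrate the two interchangeable forms, here is a sketch of the same run settings for an automatic tagging step, written first as attributes and then as child <setting> elements. The step name and the model path are hypothetical; tagger_local and tagger_model are the keys described above.

```xml
<!-- Form 1: settings as attributes on <run_settings>. -->
<step name="tag">
  <run_settings tagger_local="yes" tagger_model="models/my_model"/>
</step>

<!-- Form 2: the same settings as child <setting> elements. -->
<step name="tag">
  <run_settings>
    <setting>
      <name>tagger_local</name>
      <value>yes</value>
    </setting>
    <setting>
      <name>tagger_model</name>
      <value>models/my_model</value>
    </setting>
  </run_settings>
</step>
```

The attribute form is more compact; the element form may be easier to read when a value is long.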
These are settings that are intended to be passed unmodified to
the UI, in order to declaratively configure UI customizations for
particular tasks. At the moment, no tasks use this feature. You
can configure these settings either with a child <setting>
element, or with an attribute on the <ui_settings> element
itself; they're interchangeable.
Attribute | Value | Obligatory? | Description |
---|---|---|---|
<attr> | a string | no | The <ui_settings> tag supports arbitrary attribute-value pairs. |

Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<setting> | no | yes | An attribute-value pair. |

An individual UI setting.

Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<name> | yes | no | The name of the setting. This element has no attributes or element children; its value is the text it delimits. |
<value> | yes | no | The value of the setting. This element has no attributes or element children; its value is the text it delimits. |
These are settings that a specialized task might require which
the user wishes to be able to configure in XML, rather than by
modifying the source code for the specialized task. The chances
that a normal user will use this are extremely slim. These
settings are not inherited by task children.
You can configure the settings either with a child
<setting> element, or with an attribute on the
<settings> element itself; they're interchangeable.
Attribute | Value | Obligatory? | Description |
---|---|---|---|
<attr> | a string | no | The <settings> tag supports arbitrary attribute-value pairs. |

Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<setting> | no | yes | An attribute-value pair. |

An individual task-level setting.

Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<name> | yes | no | The name of the task-level setting. This element has no attributes or element children; its value is the text it delimits. |
<value> | yes | no | The value of the task-level setting. This element has no attributes or element children; its value is the text it delimits. |
MAT has some built-in tools to control Java Carafe and other Java
subprocesses. Using this element, you can declare default settings
for Java heap and stack sizes. If not set locally, these settings
are inherited from parent tasks.
Attribute | Value | Obligatory? | Description |
---|---|---|---|
heap_size | a string | no | The value here is a value for the heap size for the Java VM. It is passed to the Java VM using the -Xmx argument. Values like 512M or 2G are examples of expected values. This default value can be overridden by declaring the empty string ("") in any configuration context where the heap size can be specified (see the Java Carafe engine for examples). |
stack_size | a string | no | The value here is a value for the stack size for the Java VM. It is passed to the Java VM using the -Xss argument. Values like 4096k or 512k are examples of expected values. This default value can be overridden by declaring the empty string ("") in any configuration context where the stack size can be specified (see the Java Carafe engine for examples). |
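A minimal sketch of this element, using sizes taken from the examples of expected values in the table above (choose values appropriate to your own data and machine):

```xml
<!-- Passed to Java subprocesses as -Xmx2G and -Xss512k respectively. -->
<java_subprocess_parameters heap_size="2G" stack_size="512k"/>
```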
One of the ways a task can be customized is through the Web UI,
which can itself be customized in a number of ways. This process is
quite complicated; it's almost entirely code-oriented, and it's not
documented at all. This section is here for reference only; users
who aren't really, really brave shouldn't go anywhere near most of
these customizations.
Attribute | Value | Obligatory? | Description |
---|---|---|---|
inherit_css | "no" | no | If the parent task has CSS customizations, as specified in the <css> element below, they are inherited by default. Use this setting to block inheritance. |
inherit_js | "no" | no | If the parent task has Javascript customizations, as specified in the <js> element below, they are inherited by default. Use this setting to block inheritance. |
display_config | a string | no | Each Web customization set has a name, so that when the user selects a particular task, the UI knows which customization set to use. Can be inherited from parent tasks; a value of "" cancels the inheritance. |
alphabetize_labels | "no" | no | By default, the MAT UI orders the annotation labels alphabetically in the legend and the tag popup menu. If this attribute is set, the UI will list the annotation labels in the order they are defined in the <tags> element. Can be inherited from parent tasks; a value of "" cancels the inheritance. |
tokenless_autotag_delimiters | a string | no | By default, if you ask the MAT UI to autotag similar strings when you're annotating without tokens, the only edge conditions that the UI recognizes are whitespace and zone boundaries. If your match abuts a punctuation mark, it will not recognize it as a delimiter. If you want other edge conditions to be recognized, you can list them in the value of this attribute. (Remember, though, that you may have to use the XML entity character codes for those characters which are significant to XML syntax, so that the XML parsing doesn't fail.) This setting can be inherited from parent tasks; a value of "" cancels the inheritance. |
text_right_to_left | "yes" | no | If specified, documents viewed in this task in the MAT UI will be treated as right-to-left text (e.g., Arabic). Can be inherited from parent tasks; a value of "" cancels the inheritance. |
Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<js> | no | yes | The relative pathname of the Javascript customizations. This path is relative to the task directory. By convention, this file should be in the "js" subdirectory. This element has no attributes or element children; its value is the text it delimits. |
<css> | no | yes | The relative pathname of the CSS customizations. This path is relative to the task directory. By convention, this file should be in the "css" subdirectory. This element has no attributes or element children; its value is the text it delimits. |
<short_name> | no | no | This is the name that the UI will display in the upper left corner if this customization is the only customization available. This setting will be inherited by child tasks. This element has no attributes or element children; its value is the text it delimits. |
<long_name> | no | no | This is the name that the UI will use as the title of the Web page if this customization is the only customization available. This setting will be inherited by child tasks. This element has no attributes or element children; its value is the text it delimits. |
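For reference, a <web_customization> element using the attributes and children above might look roughly like this. The display config name, file names and display names are all hypothetical; the paths follow the "js" and "css" subdirectory convention and are relative to the task directory.

```xml
<web_customization display_config="my_task" alphabetize_labels="no">
  <!-- Paths are relative to the task directory. -->
  <js>js/my_task.js</js>
  <css>css/my_task.css</css>
  <short_name>My Task</short_name>
  <long_name>My Annotation Task</long_name>
</web_customization>
```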
It's also possible to configure various dimensions of the model
build process in the task.xml file. The settings for this config
are identical to the command-line options available for the MATModelBuilder. There is no
default model build engine in a task.xml file; if you want to
build models, you must declare a model config.
MAT is delivered with a default Carafe
model builder.
You can have multiple <model_config> entries, as long as
they differ by the config_name attribute. If a named or default
model config isn't found when requested by MATModelBuilder or the
experiment engine, MAT will look for it in the parent task.
Attribute | Value | Obligatory? | Description |
---|---|---|---|
class | the name of a Python class | yes | This attribute names the class which will be used as the model builder. The default Carafe model builder class is MAT.JavaCarafe.CarafeModelBuilder |
config_name | a string | no | If present, a config name to specify as the --config_name in MATModelBuilder, or for the config_name attribute in <build_settings> in the experiment engine. If omitted, this entry is the default model config. There can be only one default. |

Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<build_settings> | no | no | The settings for this model config |
The <build_settings> tag supports arbitrary attribute-value pairs which are passed to the model builder. See the documentation for the Carafe model builder to see which attributes should be supplied to that engine. You can configure these settings either with a child <setting> element, or with an attribute on the <build_settings> element itself; they're interchangeable.
Attribute | Value | Obligatory? | Description |
---|---|---|---|
<attr> | a string | no | The <build_settings> tag supports arbitrary attribute-value pairs. |

Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<setting> | no | yes | An attribute-value pair. |

An individual build setting.

Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<name> | yes | no | The name of the setting. This element has no attributes or element children; its value is the text it delimits. |
<value> | yes | no | The value of the setting. This element has no attributes or element children; its value is the text it delimits. |
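The sketch below shows a default model config alongside a second, named one. The class name is the default Carafe model builder named above; the config name "alt" is hypothetical, and max_iterations is only an assumed example of a build setting, so consult the Carafe model builder documentation for the settings it actually accepts.

```xml
<!-- Default model config: no config_name attribute. -->
<model_config class="MAT.JavaCarafe.CarafeModelBuilder">
  <!-- max_iterations is an assumed setting name, for illustration only. -->
  <build_settings max_iterations="6"/>
</model_config>

<!-- A second config, requested by name (e.g., with --config_name alt). -->
<model_config config_name="alt" class="MAT.JavaCarafe.CarafeModelBuilder">
  <build_settings>
    <setting>
      <name>max_iterations</name>
      <value>10</value>
    </setting>
  </build_settings>
</model_config>
```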
If you want to use workspace mode, you must declare how the
various workspace operations are implemented. These operations are
described here.
Attribute | Value | Obligatory? | Description |
---|---|---|---|
inherit_operations | "no" | no | By default, workspace operation implementations are inherited from the task parent, if not available locally. Use this attribute to block inheritance. |

Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<operation> | yes | yes | An individual operation. |
Specifies the implementation of a workspace operation. Note that
in spite of the fact that operations are associated with folders,
these operations are referenced only by name, because the
operations should be named uniquely.
Attribute | Value | Obligatory? | Description |
---|---|---|---|
name | a string | yes | The name of the operation |

Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<settings> | no | no | The operation settings. |
The settings for the operation. What these settings are depends on what sort of operation it is. For instance, for operations which invoke the MAT engine, these settings will be the arguments to the MAT engine. For operations which invoke the MAT model builder, these settings will be the arguments to the MAT model builder. See the documentation on workspaces to find out what the options are for particular operations.
Attribute | Value | Obligatory? | Description |
---|---|---|---|
<attr> | a string | no | The <settings> tag supports arbitrary attribute-value pairs. |

Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<setting> | no | yes | An attribute-value pair. |

An individual operation setting.

Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<name> | yes | no | The name of the setting. This element has no attributes or element children; its value is the text it delimits. |
<value> | yes | no | The value of the setting. This element has no attributes or element children; its value is the text it delimits. |
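A workspace declaration might be sketched as follows. The operation name "autotag" and the settings "workflow" and "steps" are assumptions made for illustration (an operation that invokes the MAT engine and is passed engine arguments); check the workspace documentation for the operation names and options that actually apply.

```xml
<workspace>
  <!-- Assumed operation name; settings are passed on as engine arguments. -->
  <operation name="autotag">
    <settings workflow="Demo" steps="tag"/>
  </operation>
</workspace>
```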
Step implementations associate a named step with an
implementation for that step (i.e., a Python class), perhaps in
the context of particular workflows. The effect of each named step
in a task is global; e.g., the "tag" step might add content
annotations. However, the way that effect is achieved may differ
among step implementations; e.g., one implementation of the tag
step may involve hand annotation, or there may be multiple
possibilities for adding the tags automatically. By default, step
implementations are inherited from the parent.
Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<step> | no | yes | An individual step implementation. |
Each individual step implementation specifies the Python class,
at least.
Attribute | Value | Obligatory? | Description |
---|---|---|---|
name | a string | yes | The name of a step as it is used in workflows. These are values of the "name" attribute for the <workflow> <step> element, not the "pretty_name" attribute. |
class | a string, the name of a Python class | yes | The Python class, including its module name, which implements this step. |
workflows | a comma-delimited string of workflow names | no | The workflow contexts in which this implementation holds. Different workflows can have different implementations for the same named step. |

Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<create_settings> | no | no | Default settings for initializing the step. |
These are settings that a step might pass to the initialization
phase of its step class. These settings can be overridden by the
values in the <create_settings> element for <step> in
the <workflow> element. You can configure these settings
either with a child <setting> element, or with an attribute
on the <create_settings> element itself; they're interchangeable.
Attribute | Value | Obligatory? | Description |
---|---|---|---|
<attr> | a string | no | The <create_settings> tag supports arbitrary attribute-value pairs. |

Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<setting> | no | yes | An attribute-value pair. |

An individual step creation setting.

Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<name> | yes | no | The name of the setting. This element has no attributes or element children; its value is the text it delimits. |
<value> | yes | no | The value of the setting. This element has no attributes or element children; its value is the text it delimits. |
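As a sketch, the fragment below binds the step name "tag" to different implementations in different workflows. The two MAT.JavaCarafe classes are the ones named earlier in this document; the step names, the class MyTask.MyHandTagStep and the create setting example_option are hypothetical placeholders, not settings these classes are known to accept.

```xml
<step_implementations>
  <step name="tokenize" class="MAT.JavaCarafe.CarafeTokenizationStep"/>
  <!-- Automatic tagging, used only in the "Demo" workflow. -->
  <step name="tag" workflows="Demo" class="MAT.JavaCarafe.CarafeTagStep">
    <!-- Placeholder default; a workflow <step> can override it. -->
    <create_settings example_option="example value"/>
  </step>
  <!-- A different (hypothetical) implementation for another workflow. -->
  <step name="tag" workflows="Hand annotation"
        class="MyTask.MyHandTagStep"/>
</step_implementations>
```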
The <annotation_set_descriptors> element allows you to
define multiple annotation sets. In the current implementation,
you should have only one, which should have its attributes set as
shown immediately below. You can inherit annotations from other
tasks.
Attribute | Value | Obligatory? | Description |
---|---|---|---|
all_annotations_known | "yes" | no | By default, the task leaves its annotation sets "open"; i.e., if the task encounters an unknown annotation label, it won't raise an error. If you provide the value "yes" for this attribute, an error will be raised if the task encounters an unknown annotation. |
inherit | a comma-separated list of labels to inherit | no | You can inherit annotations from other tasks, either by label or by category (see the "category" attribute of <annotation_set_descriptor>). To inherit an annotation by label, simply list it; to inherit a category, list "category:" + the category name. A typical value for this attribute is "category:zone,category:token", which inherits the annotations for the zone and token categories from the parent (usually root) task. |

Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<annotation_set_descriptor> | no | yes | |
The <annotation_set_descriptor> element is described in
detail elsewhere (eventually,
you'll be able to specify it in its own file, and share these
files among tasks). The only elements of this type you should be
declaring should have category="content" and name="content".
Attribute | Value | Obligatory? | Description |
---|---|---|---|
category | | no | The category of the annotation set descriptor. Eventually, we intend for these values to be user-definable (aside from a few predetermined values like "zone" and "token"), but for now, the value for this attribute for those descriptors you define should be "content". |
name | | yes | The name of the annotation set descriptor. This attribute is distinguished from the category attribute in that, eventually, we'll treat the category attribute as a functional one, which can specify values which different descriptors can fill in different tasks. Eventually, we intend for the value of the "name" attribute to be user-definable, but for now, the value for this attribute for those descriptors you define should be "content". |
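Putting the recommended values together, the outer structure would look roughly like this. The contents of <annotation_set_descriptor> are left as a comment because that element's format is described elsewhere.

```xml
<annotation_set_descriptors all_annotations_known="yes"
                            inherit="category:zone,category:token">
  <annotation_set_descriptor category="content" name="content">
    <!-- Label and attribute declarations go here; see the separate
         documentation for the annotation set descriptor format. -->
  </annotation_set_descriptor>
</annotation_set_descriptors>
```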
This element defines all the display-related properties in the
MAT UI of the elements defined in the
<annotation_set_descriptors> element. Most of what you can
do here is define the display-related properties of labels,
although you can also define some of the properties of attributes,
and also define groups for hierarchical annotation displays.
Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<label> | no | yes | Defines the display-related properties for a true or effective label |
<attribute> | no | yes | Defines the display-related properties for an attribute of a particular label |
<label_group> | no | yes | Defines groups for hierarchical annotation displays |
This element defines the display-related properties of true or
effective labels.
Attribute | Value | Obligatory? | Description |
---|---|---|---|
name | a string | yes | The true or effective label to which this definition applies. |
accelerator | a single-character string | no | If specified, this accelerator will be available in the annotation selection menu in the Web UI; if the user presses this key, this annotation will be selected for the span, just as if the element in the menu had been chosen. |
edit_immediately | "yes" | no | If a label has attributes, or if it can be the value of an annotation-valued attribute, it is possible to edit the annotation in the UI, either in a popup dialog or in a detail tab. If the annotation is spanless, this editor will appear automatically when the annotation is created; if it is spanned, it will not. If you provide this attribute-value pair, the editor will appear automatically when the annotation is created, whether or not it's spanned. |
presented_name | a format string | no | In the UI, there are many places (e.g., in the annotation tables) where the annotation can be described, and by default, the description for spanned annotations is the covered text, while the description for spanless annotations is the annotation ID. Sometimes, this name isn't what you would prefer it to be, and you can use this attribute to define the name you prefer. The syntax for the value of this string is described immediately below. |
css | a string of legal CSS | no | If the label is a spanned label, the UI will apply this CSS on a token-by-token basis to any span labeled by this annotation. If the label is a spanless label, the CSS will be applied to its icon in the spanless sidebar. Because there's no text in the spanless sidebar, and because annotations can overlap and be displayed in an exploded, stacked representation in spanned hand annotation and in comparison, it's probably best to ensure that the primary aspect of this CSS is background styling. |
The presented_name attribute is a simple string containing format
directives of the form $(...). The value within the parentheses
can be any of the attributes of the label (in which case the
format directive will be replaced by the attribute value), or one
of the special values listed below. The format directive can also
contain key-value pairs, as in $(val:a=b,c=d). We describe the
possible key-value pairs below as well.
special value |
interpretation |
---|---|
_start |
The start index of the spanned annotation |
_end |
The end index of the spanned annotation |
_parent |
The parent annotation(s) (i.e., the
annotation(s) which have this annotation as a value of an
annotation-valued attribute) |
_label |
The true or effective label of the annotation |
_text |
The spanned text of the spanned annotation |
key |
available for |
possible value |
interpretation |
---|---|---|---|
truncate |
_text |
integer |
If specified, the UI will ensure that the
spanned text is no longer than n characters long. This
value cannot be less than 5. The UI will show the
beginning and end of the text, and show the truncated
medial text with ellipses (...). |
truncate |
_parent |
integer |
If specified, the UI will limit the number
of parent annotations listed, and indicate the remainder
of the list with ellipses (...). |
showLabel |
_parent, any annotation-valued attribute |
"no" |
By default, when the UI displays an
annotation attribute value as part of the name of an
annotation, it contains the label of the annotation value.
If you don't want the label displayed, provide this
key-value pair. |
showIndices |
_parent, any annotation-valued attribute | "yes" |
By default, when the UI displays an
annotation attribute value as part of the name of an
annotation,it does not display its start and end indices.
If you want these indices displayed, provide this
key-value pair. |
showFormattedName |
_parent, any annotation-valued attribute | "yes" |
By default, when the UI displays an
annotation attribute value as part of the name of an
annotation, it does not display the value's formatted
name. If you want the formatted name displayed, provide
this key-value pair. |
showFeatures |
_parent, any annotation-valued attribute | "yes" |
By default, when the UI displays an
annotation attribute value as part of the name of an
annotation, it does not display the value's attribute-value
pairs. If you want these pairs displayed, provide this
key-value pair. |
So, for example, if you want the presented name of your
annotation to contain the text truncated to 20 characters, your
value for presented_name would be "$(_text:truncate=20)".
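The directive syntax described above can be illustrated with a short Python sketch. This is not MAT's own parser: the function names are hypothetical, only the truncate key for _text is modeled, and counting the ellipsis toward the length limit is an assumption.

```python
import re

# Matches one $(...) directive; the body is everything up to the closing parenthesis.
DIRECTIVE = re.compile(r"\$\(([^)]*)\)")

def parse_directive(body):
    """Split 'val:a=b,c=d' into ('val', {'a': 'b', 'c': 'd'})."""
    value, _, options = body.partition(":")
    pairs = {}
    if options:
        for item in options.split(","):
            key, _, val = item.partition("=")
            pairs[key.strip()] = val.strip()
    return value.strip(), pairs

def truncate_middle(text, limit):
    """Keep the beginning and end of text, replacing the middle with '...'."""
    if limit < 5:
        raise ValueError("truncate cannot be less than 5")
    if len(text) <= limit:
        return text
    keep = limit - 3        # characters left once the ellipsis is counted (assumption)
    head = (keep + 1) // 2  # any odd character goes to the beginning
    tail = keep - head
    return text[:head] + "..." + text[len(text) - tail:]

def render(template, values):
    """Expand every directive in template using the values mapping."""
    def expand(match):
        name, options = parse_directive(match.group(1))
        text = str(values.get(name, ""))
        if name == "_text" and "truncate" in options:
            text = truncate_middle(text, int(options["truncate"]))
        return text
    return DIRECTIVE.sub(expand, template)

example = render("$(_text:truncate=20)",
                 {"_text": "The quick brown fox jumps over the lazy dog"})
```

In this sketch, the values mapping stands in for the label's attributes plus the special values such as _text and _start.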
This element defines the display-related properties for specific
annotation attributes.
Attribute |
Value |
Obligatory? |
Description |
---|---|---|---|
name | a string |
yes | The name of a known attribute |
of_annotation | a string |
yes | The name of a known true or
effective label |
editor_style | "long" |
no | There are two ways of
providing attribute values for non-choice string attributes
in the UI: via a short typein window or via a multi-line
typein window. By default, a short typein window will be
used. If you provide this attribute-value pair, a long
typein window will be used. Ignored if the attribute is not
a string attribute, or if it's a choice attribute. |
read_only |
"yes" |
no |
Attributes are typically editable. If for
some reason you don't want to be able to edit an attribute
directly in the annotation editor (e.g., the attribute value
is automatically populated by a custom editor, as described
below), use this setting. |
custom_editor | a string |
no | You may have a string attribute which is actually a date, and which you want to populate using a calendar widget; or you might want to look up the annotation text in a database, and use the results to populate the attribute value. If you're willing to do some programming, you can use this attribute to specify an arbitrary JavaScript function for a string, int, or float attribute, to use as its editor. You can define your function in your task directory in your JavaScript customization file. Unfortunately, we don't really have the resources to document the API this function has to conform to; either dig through the source code yourself, or ask us for help. |
custom_editor_is_multiattribute | "yes" |
no | If you've associated a
custom_editor with this attribute, this attribute-value pair
tells the UI that the editor will fill multiple attributes. |
custom_editor_button_label |
a string |
no |
If you have a custom editor, but you want the
button label to be something other than "Edit" (let's say the
value is automatically calculated when you press the button),
use this. |
url_link |
a string |
no |
When the annotation attribute is displayed in
the annotation editor or the annotation table, the
annotation will be used to construct a URL link. The syntax
is identical to that of presented_name above, except that
of the special values, only _text is recognized, and no
key-value pairs are recognized within each directive. So,
e.g., if you want the link on an annotation attribute to
search for the spanned text in Google, the value of url_link
would be http://www.google.com/search?q=$(_text) |
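As a rough illustration of how a url_link value could be expanded, here is a minimal Python sketch. The function name is hypothetical, and percent-encoding the substituted text is an assumption of the sketch rather than documented MAT behavior.

```python
import re
from urllib.parse import quote

# url_link directives carry no key-value pairs, so the body is a bare name.
DIRECTIVE = re.compile(r"\$\(([^):]+)\)")

def expand_url_link(template, spanned_text, attributes=None):
    """Replace $(_text) and $(attribute) directives with percent-encoded values."""
    values = dict(attributes or {})
    values["_text"] = spanned_text  # the only special value recognized in url_link
    def expand(match):
        return quote(str(values.get(match.group(1), "")))
    return DIRECTIVE.sub(expand, template)

link = expand_url_link("http://www.google.com/search?q=$(_text)", "New York")
```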
Under some circumstances, you might want to create cascaded
annotation menus in the MAT UI, perhaps in order to group together
similar annotations, or provide options for more or less general
annotations, or to compress the screen real estate taken up by the
annotation popup menu. You can use the <label_group> element
to accomplish this.
Each label group has a name. This might correspond to a known
true or effective label (in which case it refers to that
annotation), or it might be a previously unknown name, in which
case it serves merely as a group. The children of each label group
can be actual annotation names, or other known label groups. The
annotations the label group refers to must be content annotations.
If the label group is not otherwise known, it has the option of
declaring CSS styling for the menu entry.
Attribute |
Value |
Obligatory? |
Description |
---|---|---|---|
name | a string |
yes | the name of the label group, either a new name or the name of a true or effective content annotation |
children | a string |
yes | a comma-delimited sequence of names, either names of existing content annotations or of known label groups |
css | a string |
no | Like the css attribute on the <label> element above. Used to assign styling to otherwise unknown label groups. |
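The following Python sketch shows one way two label groups might be declared and how they nest into a cascaded menu. The label names (PERSON, ORGANIZATION, LOCATION) and group names are made up for illustration, placing <label_group> directly under <annotation_display> is inferred from the outline of this document, and the helper functions are hypothetical rather than part of MAT.

```python
import xml.etree.ElementTree as ET

# Hypothetical task fragment: ENTITIES refers to the NAMES group as one of its children.
FRAGMENT = """
<annotation_display>
  <label_group name="NAMES" children="PERSON,ORGANIZATION"/>
  <label_group name="ENTITIES" children="NAMES,LOCATION"/>
</annotation_display>
"""

def read_groups(xml_text):
    """Map each label group name to its list of child names."""
    root = ET.fromstring(xml_text)
    return {g.get("name"): [c.strip() for c in g.get("children").split(",")]
            for g in root.iter("label_group")}

def build_menu(name, groups):
    """Return (name, children) for a group, or the bare name for a plain label."""
    if name not in groups:
        return name
    return (name, [build_menu(child, groups) for child in groups[name]])

groups = read_groups(FRAGMENT)
menu = build_menu("ENTITIES", groups)
```

Here the ENTITIES entry opens a submenu containing the NAMES submenu and the LOCATION label. The sketch does not guard against groups that refer to each other cyclically.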
When you run the MATScore engine, or
produce a visual comparison of
annotations, MAT uses a set of heuristics to determine the best
pairing of annotations. You can affect this process using the
<similarity_profile> element.
Similarity profiles are not inherited.
Attribute |
Value |
Obligatory? |
Description |
---|---|---|---|
name | a string |
no | The name of the profile, for
use when creating comparison documents or scoring. If no
name is provided, this is the default profile for the task.
There can be only one unnamed profile. |
Element |
Obligatory? |
Repeatable? |
Description |
---|---|---|---|
<stratum> | no | yes | The comparison algorithm is stratified
(see the algorithm for
more details). You can use this element to define the
strata, rather than allowing them to be inferred. |
<tag_profile> | no | yes | There's a default similarity
profile for spanned and spanless annotations. If you want to
declare your own profile explicitly, you can do that with
this element. |
The comparison algorithm is stratified (see the algorithm for more details). You can use this element to define the strata, rather than allowing them to be inferred.
Attribute |
Value |
Obligatory? |
Description |
---|---|---|---|
true_labels | a comma-separated string of
labels |
yes | The labels in this stratum.
Note that these labels must be true labels, not effective
labels. |
There's a default similarity profile for spanned and spanless
annotations. If you want to declare your own profile explicitly,
you can do that with this element. See the algorithm for details on how to
use these.
Attribute |
Value |
Obligatory? |
Description |
---|---|---|---|
true_labels | a comma-separated string of
labels |
yes | The labels to which this
profile applies. Note that these labels must be true labels,
not effective labels. |
Element |
Obligatory? |
Repeatable? |
Description |
---|---|---|---|
<attr_equivalences> |
no |
yes |
Equivalences for attributes among the various
labels in the profile. |
<dimension> | yes | yes | One dimension of the profile. |
The true labels in your tag profile may vary in their attribute
names, but you may still want these attributes to be comparable.
This element allows you to declare your equivalences. See the algorithm for details about the
various dimensions.
Attribute |
Value |
Obligatory? |
Description |
---|---|---|---|
name | a string |
yes | The name of the equivalence.
See the algorithm for a list of legal names and their
interpretations. |
attrs | a comma-separated string |
yes | All attributes which stand in
this equivalence. Each label in your profile must have at
least one of these attributes, and no attribute name can
appear more than once among the equivalences in the profile. |
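The two constraints on attrs can be checked mechanically. The sketch below is a hypothetical validator, not MAT code; the label names, attribute names, and equivalence names in the example are invented for illustration.

```python
def check_equivalences(label_attrs, equivalences):
    """Check the documented constraints on <attr_equivalences>.

    label_attrs: {label: set of attribute names defined on that label}
    equivalences: {equivalence name: list of attribute names}
    Returns a list of problem descriptions (empty when both constraints hold).
    """
    problems = []
    seen = set()
    for name, attrs in equivalences.items():
        # No attribute name may appear more than once among the equivalences.
        for attr in attrs:
            if attr in seen:
                problems.append(f"attribute {attr!r} appears more than once")
            seen.add(attr)
        # Every label in the profile needs at least one attribute from each equivalence.
        for label, available in label_attrs.items():
            if not available.intersection(attrs):
                problems.append(f"label {label!r} has none of the attributes in {name!r}")
    return problems

labels = {"PERSON": {"kind"}, "ORG": {"category"}}
ok = check_equivalences(labels, {"equiv_a": ["kind", "category"]})
bad = check_equivalences(labels, {"equiv_a": ["kind"]})
```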
Each profile consists of a number of dimensions, which define
some aspect of the annotation to use in comparison, along with the
method to be used for comparison and the relative weight of the
dimension. See the algorithm
for details about the various dimensions.
Attribute |
Value |
Obligatory? |
Description |
---|---|---|---|
name | a string |
yes | The name of the dimension.
See the algorithm for a list of legal names and their
interpretations. |
weight | a number |
yes | The relative weight of the
dimension. The weights of all the dimensions will be
normalized. |
param_digester_method | a Python function name |
no | In rare circumstances, the
dimension method may accept parameters (see <attr>
below) and these parameters may need to be interpreted
(e.g., "yes" -> True). The full name of the function,
including the module it's in, must be specified. |
aggregator_method | a Python function name |
no | If special handling is required for a dimension which has an aggregation value, this option allows you to declare the handler. The full name of the function, including the module it's in, must be specified. |
method | a string |
no | The method associated with
the dimension, if not the default method. See the algorithm
for a list of legal names. |
<attr> | a string | no | the <dimension> element supports arbitrary attribute-value pairs |
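Because the weights are relative, only their ratios matter. The Python sketch below assumes that normalization means dividing each weight by the sum of all weights, which is the natural reading but is not spelled out here; the dimension names are placeholders, not legal MAT dimension names.

```python
def normalize_weights(weights):
    """Scale relative dimension weights so that they sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return {name: weight / total for name, weight in weights.items()}

# Placeholder dimension names; see the algorithm description for the legal ones.
normalized = normalize_weights({"dimension_a": 1, "dimension_b": 3})
```

Under this assumption, weights of 1 and 3 behave the same as weights of 0.25 and 0.75.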
When you run the MATScore engine,
you can control how the scored elements are aggregated,
decomposed, or filtered in the scoring output. See the algorithm for details on how to
use this.
Score profiles are not inherited.
Attribute |
Value |
Obligatory? |
Description |
---|---|---|---|
name | a string |
no | The name of the profile, for use when scoring. If no name is provided, this is the default profile for the task. There can be only one unnamed profile. |
Element |
Obligatory? |
Repeatable? |
Description |
---|---|---|---|
<aggregation> | no | yes | A set of labels to aggregate
as a separate entry. |
<attr_decomposition> | no | yes | An attribute-based
decomposition of particular labels to report as a separate
entry. |
<partition_decomposition> | no | yes | A function-based
decomposition of particular labels to report as a separate
entry. |
<label_limitation> |
no |
no |
A list of labels to restrict the overall
reporting to. |
Under normal circumstances, annotations are aggregated per
document and per run by effective label (if available) or true
label, or by equivalence classes passed to MATScore, and then all
together into a single heap. You can add other aggregations of
true labels using this element.
Attribute |
Value |
Obligatory? |
Description |
---|---|---|---|
name | a string |
yes | The name of the aggregation
as it will appear in the output spreadsheet |
true_labels | a comma-separated string of
labels |
yes | The true labels in this
aggregation. |
Under normal circumstances, the only way to decompose true labels
in the score output is by effective label. If you want to
decompose them by a particular attribute (e.g., you want to see
the score for ENAMEXes when type = NOM), you can use this element.
Decompositions can overlap with each other.
Attribute |
Value |
Obligatory? |
Description |
---|---|---|---|
true_labels | a comma-separated string of
labels |
yes | The true labels to which this
decomposition applies. |
attrs | a comma-separated string of
attribute names |
yes | The names of attributes
defined for all the listed labels. There will be a separate
decomposition for each tuple of values for these attrs.
The name of the decomposition in the score output will
be <attr1>=<val1> <attr2>=<val2>... |
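To make the naming scheme concrete, the following Python sketch groups annotations by their tuple of attribute values and names each group in the <attr1>=<val1> <attr2>=<val2> form. Annotations are modeled as plain dictionaries, which is an assumption made for illustration; the helper names are hypothetical.

```python
from collections import defaultdict

def decomposition_name(annotation, attrs):
    """Build the '<attr1>=<val1> <attr2>=<val2>' name for one annotation."""
    return " ".join(f"{attr}={annotation.get(attr)}" for attr in attrs)

def decompose(annotations, attrs):
    """Group annotations by the tuple of values they have for attrs."""
    buckets = defaultdict(list)
    for annotation in annotations:
        buckets[decomposition_name(annotation, attrs)].append(annotation)
    return dict(buckets)

# Dictionaries stand in for ENAMEX annotations with a type attribute.
annotations = [{"type": "NOM"}, {"type": "NAM"}, {"type": "NOM"}]
by_type = decompose(annotations, ["type"])
```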
Under normal circumstances, the only way to decompose true labels
in the score output is by effective label. If you want to
decompose them by a Python function, you can use this element.
Decompositions can overlap with each other.
Attribute |
Value |
Obligatory? |
Description |
---|---|---|---|
true_labels | a comma-separated string of
labels |
yes | The true labels to which this
decomposition applies. |
method | a Python function name |
yes | This function must take a single argument, which will be an annotation, and return a value. For instance, if you're evaluating a geotagger, and the tagger provides a country attribute for the location, and you want to decompose location scores by US and non-US, you'd define a function which returns "US" if the country attribute is "US", and "non-US" otherwise. The full name of the function, including the module it's in, must be specified. The name of the decomposition in the score output will be <bare function name>=<val>. |
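The geotagger example above might look like the following Python sketch. The function name is hypothetical, and reading the attribute with annotation.get(...) is an assumption about the annotation interface; substitute whatever attribute lookup your annotations actually provide.

```python
def by_us_location(annotation):
    """Partition function: takes one annotation, returns its partition value."""
    # annotation.get(...) is an assumed accessor, modeled here on a dictionary.
    return "US" if annotation.get("country") == "US" else "non-US"

# Dictionaries stand in for location annotations in this sketch.
partitions = [by_us_location(a) for a in ({"country": "US"}, {"country": "FR"}, {})]
```

With a function of this name, the score output would contain entries named by_us_location=US and by_us_location=non-US, and the method attribute would hold the function's full dotted name including its module.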
The scorer will pair all annotations which are not specified as
being ignored. Sometimes, you might need to pair some annotations
as part of the scoring process (let's say they're arguments of
relations, for instance), but you don't want them in the final
output, even though you can't ignore them. You can use this
element to provide that filter.
Attribute |
Value |
Obligatory? |
Description |
---|---|---|---|
true_labels | a comma-separated string of
labels |
yes | Only these true labels (and
the effective labels that are defined on them) will be
included in the scoring output. |