This document describes the XML format for the annotation set descriptors in the task files (see "Creating a New Task"). Use cases are documented separately.
<annotation_set_descriptor>
<annotation>
<attribute>
<range>
<choice>
<label_restriction>
<attributes>
The <annotation_set_descriptor> element is the top-level element. Each annotation set descriptor corresponds to a subset of the annotations used in your task. At the moment, there's no reason to define more than one set of content annotations, and we recommend that you define all your content annotations in a single set.
XML attribute | Value | Obligatory? | Description |
---|---|---|---|
name | a string | yes | The name of the annotation set descriptor. For content annotations, this should be "content". In a future release, there will be no restrictions on this name. |
Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<annotation> | no | yes | An annotation type defined in the descriptor. |
<attribute> | no | yes | An attribute for some annotation type(s) in the descriptor. |
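As a rough sketch of the overall shape, a content descriptor with two annotation types and one shared attribute might look like the following. The labels and the "nominal" attribute are hypothetical, and the surrounding task file markup is omitted.

```xml
<!-- Hypothetical content descriptor; labels and attribute names are illustrative. -->
<annotation_set_descriptor name="content">
  <annotation label="PERSON"/>
  <annotation label="LOCATION"/>
  <!-- Attributes are siblings of the annotations, not children. -->
  <attribute name="nominal" of_annotation="PERSON,LOCATION" type="boolean"/>
</annotation_set_descriptor>
```

Because of_annotation takes a comma-separated list, a single <attribute> element can serve both annotation types.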
Each annotation type in the set descriptor is defined by an <annotation> element. Note that the attributes for the annotation are defined as sibling <attribute> elements, not as children.
XML attribute | Value | Obligatory? | Description |
---|---|---|---|
label | a string | yes | The label for the annotation type. |
all_attributes_known | "yes" | no | Under normal circumstances, MAT does not object to finding "undeclared" attributes of annotations in the documents it reads. If this XML attribute/value pair is present, the annotation type is "locked"; in other words, when MAT reads in a document in the context of a task corresponding to this annotation set descriptor, it will object to any "stray" attributes which appear on annotations of this type. |
span | "no" | no | By default, annotation types define spanned annotations. If this XML attribute/value pair is present, this type will be spanless. |
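The two optional XML attributes can be used independently. In this sketch (both labels are hypothetical), the first type is locked against stray attributes and the second is spanless.

```xml
<annotation_set_descriptor name="content">
  <!-- Spanned (the default) and locked: since no attributes are declared
       for PERSON, MAT will object to any attribute found on a PERSON annotation. -->
  <annotation label="PERSON" all_attributes_known="yes"/>
  <!-- Spanless: annotations of this hypothetical type cover no text. -->
  <annotation label="DOCUMENT_TOPIC" span="no"/>
</annotation_set_descriptor>
```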
Attributes are defined by <attribute> elements, separately from the annotations that bear them. The reason for this is that in a future release of MAT, there will be a tight coupling between annotation set descriptors and workflow steps, and there may be workflow steps which only add an attribute to an existing annotation (e.g., part-of-speech tagging implemented as adding a "POS" attribute to the "lex" tag). The default value of every attribute is null unless otherwise specified.
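For instance, the part-of-speech case just described might be declared as the fragment below, shown without its enclosing descriptor. Only the "lex" and "POS" names come from the text; the rest is a sketch.

```xml
<!-- The "lex" annotation type itself. -->
<annotation label="lex"/>
<!-- Declared as a sibling, not a child, so a workflow step could add just
     this attribute. No "type" is given, so it defaults to "string"; no
     "default" is given, so its value is null until set. -->
<attribute name="POS" of_annotation="lex"/>
```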
XML attribute | Value | Obligatory? | Description |
---|---|---|---|
name | a string | yes | The name of the attribute. |
of_annotation | a comma-separated sequence of strings | yes | The annotation label or labels which bear this attribute. |
type | one of "string", "int", "float", "boolean", "annotation" | no | The type of the attribute value. Default is "string". |
aggregation | one of "set", "list" | no | By default, the values of attributes are singletons. If you want the value to be a set or list of objects of the specified type, indicate it with this XML attribute. |
distinguishing_attribute_for_equality | "yes" | no | Deprecated. Chances are that you'll never need this XML attribute. The current reconciliation tool requires instructions about which attributes to use when comparing annotations. By default, MAT doesn't treat any attributes as discriminative; only the label is examined. So a "PERSON" annotation in the reference document and one in the hypothesis document count as equal if and only if they span the same text; no additional attributes are examined. The only automatic exception is attributes whose <choice> children define effective labels. So if you define a task with a single "ENAMEX" annotation label, with <choice> elements that define effective labels for the values of a "TYPE" attribute, the "TYPE" attribute will be treated as discriminative. If that's not enough (and it's enough for almost all applications), you can use this XML attribute in your annotation set descriptor to make further distinctions. This XML attribute is deprecated because the future reconciliation tool will use the same pairing algorithm used by the scorer. |
default | various | no | If present, a default value for this attribute. The value must be of the appropriate type. For "boolean" attributes, use "yes" or "no". Defaults are not available for "annotation" attributes. This XML attribute can't be used together with "default_is_text_span". |
default_is_text_span | "yes" | no | If present, the default value for this attribute is digested from the span of the annotation (so this option is only available for attributes of spanned annotations). If the span can't be digested into a value of the appropriate type, no default will be set. This XML attribute can't be used together with "default", and is available only for "int", "string", and "float" attributes. |
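A few hedged examples of the XML attributes above follow; every label and attribute name in them is hypothetical.

```xml
<!-- An int attribute with an explicit default value. -->
<attribute name="confidence_level" of_annotation="PERSON" type="int" default="3"/>
<!-- A boolean attribute; boolean defaults are written "yes" or "no". -->
<attribute name="is_plural" of_annotation="PERSON,ORGANIZATION" type="boolean" default="no"/>
<!-- A float attribute whose default is digested from the annotation's span;
     if the span text can't be read as a float, no default is set. -->
<attribute name="amount" of_annotation="MONEY" type="float" default_is_text_span="yes"/>
<!-- A string attribute (the default type) whose value is a set of strings. -->
<attribute name="aliases" of_annotation="PERSON" aggregation="set"/>
```

Note that "default" and "default_is_text_span" each appear on separate attributes here, since the two can't be combined.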
Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<range> | no | no | For "int" and "float" attributes, the range of permitted values. |
<choice> | no | yes | For "string" and "int" attributes, one of the possible choices for values. If the attribute has at least one <choice> child, the choices listed are the only permitted values. |
<label_restriction> | no | yes | For "annotation" attributes, a restriction on the attribute value. For any "annotation" attribute, there must be at least one label restriction. Every value must satisfy at least one of the restrictions. |
The <range> element applies to "int" and "float" attributes: you can use it to define a range of possible attribute values, giving both endpoints of the range or only one.
For "int" attributes, <choice> and <range> cannot co-occur.
XML attribute | Value | Obligatory? | Description |
---|---|---|---|
from | an int or float | no | The minimum (inclusive) endpoint of the range. The required type of the value depends on the type of the attribute that bears it. |
to | an int or float | no | The maximum (inclusive) endpoint of the range. The required type of the value depends on the type of the attribute that bears it. |
For "string" and "int" attributes, you may provide a list of explicit choices as <choice> elements. If the attribute has at least one <choice> child, the choices listed are the only permitted values. You can further specify choices as "effective labels": notional labels which are actually implemented as a label plus an attribute/value pair, e.g., "PERSON" implemented as "ENAMEX" + "type" = "PER". You can refer to these effective labels in various places in the task and annotation set descriptor specifications, as marked.
If one <choice> child of an attribute has an effective label, all of that attribute's <choice> children must have one; and no more than one attribute for a given annotation type may define effective labels. Each effective label must be distinct from all other effective labels in the set, and from all annotation labels.
For "int" attributes, <choice> and <range> cannot co-occur.
XML attribute | Value | Obligatory? | Description |
---|---|---|---|
effective_label | a string | no | If present, the effective label corresponding to the combination of this annotation with this attribute value. |
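The ENAMEX example from the text might be sketched as follows. This assumes each choice's value is written as the element's text content; the ORG and LOC choices are illustrative additions.

```xml
<annotation label="ENAMEX"/>
<attribute name="type" of_annotation="ENAMEX">
  <!-- Assumption: the choice value is the element's text content.
       Since one choice has an effective label, all of them must. -->
  <choice effective_label="PERSON">PER</choice>
  <choice effective_label="ORGANIZATION">ORG</choice>
  <choice effective_label="LOCATION">LOC</choice>
</attribute>
```

With this in place, PERSON can stand for an ENAMEX annotation whose type is PER wherever the specifications accept effective labels. None of the effective labels may also be declared as an ordinary annotation label.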
For "annotation" attributes, you must define at least one <label_restriction> element. Every value of the attribute must satisfy at least one of the restrictions. Each restriction names a label or effective label, and may additionally bear attribute/value restrictions.
XML attribute | Value | Obligatory? | Description |
---|---|---|---|
label | a string | yes | A label or effective label which the attribute value must match in order to satisfy this restriction. |
Element | Obligatory? | Repeatable? | Description |
---|---|---|---|
<attributes> | no | no | Arbitrary attribute/value pairs which must also be satisfied. |
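For example, a hypothetical spanless MEETING annotation whose "participants" attribute holds a set of PERSON or ORGANIZATION annotations could be declared like this; all names are illustrative.

```xml
<annotation label="PERSON"/>
<annotation label="ORGANIZATION"/>
<!-- A spanless, relation-style annotation type (hypothetical). -->
<annotation label="MEETING" span="no"/>
<attribute name="participants" of_annotation="MEETING" type="annotation" aggregation="set">
  <!-- Each member of the set must satisfy at least one restriction,
       i.e., it must be either a PERSON or an ORGANIZATION annotation. -->
  <label_restriction label="PERSON"/>
  <label_restriction label="ORGANIZATION"/>
</attribute>
```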
A label restriction may also contain an <attributes> element, whose attribute/value pairs must be satisfied in order to satisfy the restriction.
XML attribute | Value | Obligatory? | Description |
---|---|---|---|
<attr> | a string | no | The <attributes> element supports arbitrary XML attribute/value pairs; <attr> stands for any attribute name. Each such attribute must be defined for the label of the enclosing label restriction, and must be a choice attribute (i.e., a "string" or "int" attribute which has choices associated with it). |
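Building on the ENAMEX + "type" = "PER" example from the choice description, a restriction that accepts only ENAMEX annotations whose type is PER might look like the sketch below. The MEETING annotation and "attendees" attribute are hypothetical, and choice values are again assumed to be written as element text.

```xml
<annotation label="ENAMEX"/>
<attribute name="type" of_annotation="ENAMEX">
  <!-- Assumption: the choice value is the element's text content. -->
  <choice effective_label="PERSON">PER</choice>
  <choice effective_label="ORGANIZATION">ORG</choice>
</attribute>
<annotation label="MEETING" span="no"/>
<attribute name="attendees" of_annotation="MEETING" type="annotation" aggregation="set">
  <label_restriction label="ENAMEX">
    <!-- "type" is a choice attribute defined for ENAMEX, as required. -->
    <attributes type="PER"/>
  </label_restriction>
</attribute>
```

Because PER carries the effective label PERSON, a plain <label_restriction label="PERSON"/> would presumably express the same constraint more compactly; the <attributes> form is more useful when the attribute's choices have no effective labels.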