Annotation set descriptor XML reference

This document describes the XML format for the annotation set descriptors in the task files (see "Creating a New Task"). Use cases are documented separately.

Element hierarchy

   <annotation_set_descriptor>
     <annotation>
     <attribute>
       <range>
       <choice>
       <label_restriction>
         <attributes>

<annotation_set_descriptor>

The top-level element. Each annotation set descriptor corresponds to a subset of the annotations used in your task. At the moment, there's no reason to define more than one set of content annotations, and we recommend that you define all your content annotations in a single set.

XML attributes

XML attribute | Value | Obligatory? | Description
name | a string | yes | The name of the annotation set descriptor. For content annotations, this should be "content". In a future release, there will be no restrictions on this name.

Children

Element | Obligatory? | Repeatable? | Description
<annotation> | no | yes | A possible annotation in the descriptor.
<attribute> | no | yes | An attribute for some annotation(s) in the descriptor.

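As an illustration, here is a minimal sketch of a complete descriptor for content annotations. The labels "PERSON" and "LOCATION" and the attribute "nominal" are hypothetical and not taken from any particular task. Note that the <attribute> element is a sibling of the <annotation> elements it applies to, not a child.

```xml
<!-- Hypothetical content descriptor: labels and attribute name are examples only -->
<annotation_set_descriptor name="content">
  <!-- Two spanned annotation types (spanned is the default) -->
  <annotation label="PERSON"/>
  <annotation label="LOCATION"/>
  <!-- One boolean attribute borne by both labels; declared as a sibling -->
  <attribute name="nominal" of_annotation="PERSON,LOCATION" type="boolean"/>
</annotation_set_descriptor>
```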
<annotation> (of <annotation_set_descriptor>)

Each annotation type in the set descriptor is defined by one of these elements. Note that the attributes for the annotation are defined as sibling elements, not children.

XML attributes

XML attribute | Value | Obligatory? | Description
label | a string | yes | The label for the annotation type.
all_attributes_known | "yes" | no | Under normal circumstances, MAT does not object to finding "undeclared" attributes of annotations in the documents it reads. If this XML attribute/value pair is present, the annotation type is "locked": when MAT reads a document in the context of a task corresponding to this annotation set descriptor, it will object to any "stray" attributes that appear on annotations of this type.
span | "no" | no | By default, annotation types define spanned annotations. If this XML attribute/value pair is present, the type is spanless.

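The fragment below sketches both optional XML attributes of <annotation>. It would sit inside an <annotation_set_descriptor> element; the "lex" label and "POS" attribute echo the part-of-speech example in the next section, and "DOC_NOTE" is a purely hypothetical spanless type.

```xml
<!-- Hypothetical fragment for the inside of <annotation_set_descriptor name="content"> -->

<!-- Locked type: MAT should object to any attribute on "lex" other than the declared "POS" -->
<annotation label="lex" all_attributes_known="yes"/>
<attribute name="POS" of_annotation="lex"/>

<!-- Spanless type: annotations with this label cover no text -->
<annotation label="DOC_NOTE" span="no"/>
```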
<attribute> (of <annotation_set_descriptor>)

Attributes are defined separately from the annotations that bear them. This is because a future release of MAT will tightly couple annotation set descriptors and workflow steps, and some workflow steps may only add an attribute to an existing annotation (e.g., part-of-speech tagging implemented as adding a "POS" attribute to the "lex" tag).

The default value of every attribute is null unless otherwise specified.

XML attributes

XML attribute | Value | Obligatory? | Description
name | a string | yes | The name of the attribute.
of_annotation | a comma-separated sequence of strings | yes | The annotation label or labels that bear this attribute.
type | one of "string", "int", "float", "boolean", "annotation" | no | The type of the attribute value. The default is "string".
aggregation | one of "set", "list" | no | By default, attribute values are singletons. If you want the value to be a set or list of objects of the specified type, indicate it with this XML attribute.
distinguishing_attribute_for_equality | "yes" | no | Deprecated. You will almost certainly never need this XML attribute.

    The current reconciliation tool needs to be told which attributes to use when comparing annotations. By default, MAT doesn't treat any attributes as discriminative; only the label is examined. So a "PERSON" annotation in the reference document and a "PERSON" annotation in the hypothesis document count as equal if and only if they span the same text; no attributes are examined.

    The only automatic exceptions are attributes whose <choice> children define effective labels. So if you define a task with a single "ENAMEX" annotation label, and use <choice> elements with effective labels for the values of a "TYPE" attribute, the "TYPE" attribute will be treated as discriminative.

    If that's not enough (and it's enough for almost all applications), you can use this XML attribute in your annotation set descriptor to make further distinctions.

    This XML attribute is deprecated because the future reconciliation tool will use the same pairing algorithm as the scorer.
default | various | no | If present, a default value for this attribute. The value must be of the appropriate type; for "boolean" attributes, use "yes" or "no". Defaults are not available for "annotation" attributes. This XML attribute can't be combined with "default_is_text_span".
default_is_text_span | "yes" | no | If present, the default value for this attribute is digested from the span of the annotation (so this option is only available for attributes of spanned annotations). If the span can't be digested into a value of the appropriate type, no default is set. This XML attribute can't be combined with "default".

    This XML attribute is available for "int", "string", and "float" attributes.

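A hedged sketch combining several of the XML attributes above. All attribute names are hypothetical, and the labels "PERSON", "ORGANIZATION" and "NUMBER" are assumed to be declared as spanned <annotation> elements elsewhere in the same descriptor.

```xml
<!-- Hypothetical attributes; their labels are assumed to be declared as sibling <annotation> elements -->

<!-- "string" (the default type) borne by two labels, with a default value -->
<attribute name="confidence" of_annotation="PERSON,ORGANIZATION" default="high"/>

<!-- Boolean with a default: boolean values are written "yes" or "no" -->
<attribute name="generic" of_annotation="PERSON" type="boolean" default="no"/>

<!-- Int whose default is digested from the annotation's span, e.g. the text "42" -->
<attribute name="value" of_annotation="NUMBER" type="int" default_is_text_span="yes"/>

<!-- Set-valued string attribute -->
<attribute name="aliases" of_annotation="PERSON" aggregation="set"/>
```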
Children

Element | Obligatory? | Repeatable? | Description
<range> | no | no | For "int" and "float" attributes, the range of permitted values.
<choice> | no | yes | For "string" and "int" attributes, one of the possible choices for values. If the attribute has at least one <choice> child, the choices listed are the only permitted values.
<label_restriction> | no | yes | For "annotation" attributes, a restriction on the attribute value. Every "annotation" attribute must have at least one label restriction, and every value must satisfy at least one of them.

<range> (of <attribute>)

For "int" and "float" attributes, you can define a range for the possible attribute values. You can define both endpoints of the range, or only one.

For "int" attributes, <choice> and <range> cannot co-occur.

XML attributes

XML attribute | Value | Obligatory? | Description
from | an int or float | no | The minimum (inclusive) endpoint of the range. The required type of the value depends on the type of the attribute that bears it.
to | an int or float | no | The maximum (inclusive) endpoint of the range. The required type of the value depends on the type of the attribute that bears it.

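A sketch of <range> with hypothetical attribute names, assuming a "PERSON" annotation is declared in the same descriptor. Here "from" gives the lower bound and "to" the upper bound, and either may be omitted.

```xml
<!-- Hypothetical int attribute: both endpoints given, both inclusive -->
<attribute name="age" of_annotation="PERSON" type="int">
  <range from="0" to="150"/>
</attribute>

<!-- Hypothetical float attribute: lower bound only -->
<attribute name="score" of_annotation="PERSON" type="float">
  <range from="0.0"/>
</attribute>
```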
<choice> (of <attribute>)

For "string" and "int" attributes, you may provide a list of explicit choices. If the attribute has at least one <choice> child, the choices listed are the only permitted values. You can further specify choices as "effective labels": notional labels that are actually implemented as a label + attribute/value pair, e.g., "PERSON" implemented as "ENAMEX" + "type" = "PER". You can refer to these effective labels in various places in the task and annotation set descriptor specifications, as marked.

If one <choice> element has an effective label, they all must, and no more than one attribute for a given annotation type may define effective labels. Each effective label must be distinct from all other effective labels in the set and from all annotation labels.

For "int" attributes, <choice> and <range> cannot co-occur.

XML attributes

XML attribute | Value | Obligatory? | Description
effective_label | a string | no | If present, the effective label corresponding to the combination of this annotation with this attribute value.

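The sketch below shows choices with and without effective labels, using the ENAMEX / "PER" / "PERSON" example from the text. It assumes, since the table above doesn't say so explicitly, that the choice value is given as the text content of the <choice> element; the "LOC" and "ORG" values and the "priority" attribute are hypothetical.

```xml
<annotation label="ENAMEX"/>
<!-- The one attribute of ENAMEX that defines effective labels: every choice must have one -->
<attribute name="type" of_annotation="ENAMEX">
  <!-- ENAMEX + type="PER" can be referred to as the effective label PERSON -->
  <choice effective_label="PERSON">PER</choice>
  <choice effective_label="LOCATION">LOC</choice>
  <choice effective_label="ORGANIZATION">ORG</choice>
</attribute>

<!-- Hypothetical int choices without effective labels; no <range> is allowed alongside -->
<attribute name="priority" of_annotation="ENAMEX" type="int">
  <choice>1</choice>
  <choice>2</choice>
  <choice>3</choice>
</attribute>
```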
<label_restriction> (of <attribute>)

For "annotation" attributes, you must define at least one label restriction. Every value of the attribute must satisfy at least one of the restrictions. Each restriction names a simple label or an effective label, and may additionally bear attribute/value restrictions.

XML attributes

XML attribute | Value | Obligatory? | Description
label | a string | yes | A label or effective label that the attribute value must match in order to satisfy this restriction.

Children

Element | Obligatory? | Repeatable? | Description
<attributes> | no | no | Arbitrary attribute/value pairs that must also be satisfied.

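A sketch of "annotation" attributes with label restrictions. The spanless "LOCATED_IN" relation and its "arg1" and "arg2" attributes are hypothetical, and "PERSON", "ORGANIZATION" and "LOCATION" are assumed to be labels or effective labels defined elsewhere in the descriptor.

```xml
<!-- Hypothetical spanless relation whose arguments point at other annotations -->
<annotation label="LOCATED_IN" span="no"/>

<!-- A value of arg1 must match at least one of these two restrictions -->
<attribute name="arg1" of_annotation="LOCATED_IN" type="annotation">
  <label_restriction label="PERSON"/>
  <label_restriction label="ORGANIZATION"/>
</attribute>

<attribute name="arg2" of_annotation="LOCATED_IN" type="annotation">
  <label_restriction label="LOCATION"/>
</attribute>
```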
<attributes> (of <label_restriction>)

Label restrictions may also bear attribute/value pairs, which must hold for the restriction to be satisfied.

XML attributes

XML attribute | Value | Obligatory? | Description
<attr> (placeholder for any attribute name) | a string | no | The <attributes> element accepts arbitrary XML attribute/value pairs. Each <attr> must be an attribute defined for the label of the label restriction, and must be a choice attribute (i.e., a "string" or "int" attribute that has <choice> children).
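A sketch of <attributes> inside a label restriction. It assumes a hypothetical "ENAMEX" annotation with a choice attribute "type" whose permitted values include "LOC", and a hypothetical spanless "LOCATED_IN" annotation bearing an "arg2" attribute.

```xml
<attribute name="arg2" of_annotation="LOCATED_IN" type="annotation">
  <!-- Matches only ENAMEX annotations whose choice attribute "type" is "LOC".
       If "LOC" carries the effective label LOCATION, this should behave
       like label="LOCATION" with no <attributes> child. -->
  <label_restriction label="ENAMEX">
    <attributes type="LOC"/>
  </label_restriction>
</attribute>
```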