This document describes use cases for the XML format of the annotation set descriptors in task files (see "Creating a New Task"). The full format is specified in a separate reference document.
At the moment, most of the task XML customizations are quite
complex and not yet documented. Here, we focus on the ways the
user can define their content annotations. Examples of how to
customize the UI display of your annotations are documented
separately.
The simplest customization of the annotations in your task.xml
file is to inherit all the structural annotations and add your
own content annotations. The roles of the different annotation
categories are described separately.
<annotation_set_descriptors inherit="category:zone,category:token">
  <annotation_set_descriptor name="content" category="content">
    <annotation label="TAG1"/>
    <annotation label="TAG2"/>
  </annotation_set_descriptor>
</annotation_set_descriptors>
Here, we've inherited the structural annotations and defined
two content annotations, TAG1 and TAG2. Both content annotations
are spanned annotations, which is the default.
Not all content annotations are spanned annotations; some
annotations aren't anchored directly to the text. You can find
examples of such annotations in Tutorial 8. It's easy to define
these annotations:
<annotation_set_descriptors inherit="category:zone,category:token">
  <annotation_set_descriptor name="content" category="content">
    ...
    <annotation label="SPANLESS1" span="no"/>
    ...
  </annotation_set_descriptor>
</annotation_set_descriptors>
The UI effects of defining spanless annotations are described separately.
Annotations, spanned or spanless, can have attributes. These
attributes can be strings (the default), floats, integers,
Booleans, or other annotations; they can also be sets or lists of
any of these types. Here's how to define a simple string attribute:
<annotation_set_descriptors inherit="category:zone,category:token">
  <annotation_set_descriptor name="content" category="content">
    ...
    <attribute of_annotation="TAG1" name="string_attr"/>
    ...
  </annotation_set_descriptor>
</annotation_set_descriptors>
String attributes can have default values, choices, or both:
<annotation_set_descriptors inherit="category:zone,category:token">
  <annotation_set_descriptor name="content" category="content">
    ...
    <attribute of_annotation="TAG1" name="string_attr" default="Pronoun">
      <choice>Pronoun</choice>
      <choice>Nominal</choice>
      <choice>Proper name</choice>
    </attribute>
    ...
  </annotation_set_descriptor>
</annotation_set_descriptors>
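The two features can presumably also be used independently. The sketch below assumes that a default without choices, and choices without a default, are both accepted; the text above implies this, but the fragment above only shows them together. The attribute names "comment" and "number" are hypothetical.

```xml
<annotation_set_descriptors inherit="category:zone,category:token">
  <annotation_set_descriptor name="content" category="content">
    ...
    <!-- hypothetical attribute: a default value, otherwise free text -->
    <attribute of_annotation="TAG1" name="comment" default="none"/>
    <!-- hypothetical attribute: a closed set of choices, no default -->
    <attribute of_annotation="TAG1" name="number">
      <choice>Singular</choice>
      <choice>Plural</choice>
    </attribute>
    ...
  </annotation_set_descriptor>
</annotation_set_descriptors>
```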
Integer and float attributes can be defined with accepted ranges:
<annotation_set_descriptors inherit="category:zone,category:token">
  <annotation_set_descriptor name="content" category="content">
    ...
    <attribute of_annotation="TAG1" type="int" name="int_attr">
      <range from="10" to="20"/>
    </attribute>
    ...
  </annotation_set_descriptor>
</annotation_set_descriptors>
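The same range pattern should carry over to float attributes, and a Boolean attribute would need only its type. In this sketch the type names "float" and "boolean" are assumptions (only type="int" and type="annotation" appear in the examples here), so check them against the reference document before relying on them. The attribute names are hypothetical.

```xml
<annotation_set_descriptors inherit="category:zone,category:token">
  <annotation_set_descriptor name="content" category="content">
    ...
    <!-- assumed type name "float"; range bounds as in the int example -->
    <attribute of_annotation="TAG1" type="float" name="confidence">
      <range from="0.0" to="1.0"/>
    </attribute>
    <!-- assumed type name "boolean"; no choices or range needed -->
    <attribute of_annotation="TAG1" type="boolean" name="is_generic"/>
    ...
  </annotation_set_descriptor>
</annotation_set_descriptors>
```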
Annotation-valued attributes must have label restrictions that
specify which annotation labels can fill the attribute value (more
examples appear later in this document):
<annotation_set_descriptors inherit="category:zone,category:token">
  <annotation_set_descriptor name="content" category="content">
    ...
    <attribute of_annotation="TAG1" type="annotation" name="annot_attr">
      <label_restriction label="TAG2"/>
    </attribute>
    ...
  </annotation_set_descriptor>
</annotation_set_descriptors>
Any of these attributes can also be a set or list aggregation:
<annotation_set_descriptors inherit="category:zone,category:token">
  <annotation_set_descriptor name="content" category="content">
    ...
    <attribute of_annotation="TAG1" type="annotation" aggregation="set" name="mentions">
      <label_restriction label="TAG2"/>
    </attribute>
    ...
  </annotation_set_descriptor>
</annotation_set_descriptors>
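Aggregations combine naturally with spanless annotations: for example, a spanless annotation can collect a set of spanned mentions. The sketch below assumes that aggregation="list" is the list counterpart of the aggregation="set" shown above; the ENTITY label and the "aliases" attribute are hypothetical.

```xml
<annotation_set_descriptors inherit="category:zone,category:token">
  <annotation_set_descriptor name="content" category="content">
    ...
    <!-- hypothetical spanless annotation grouping coreferent mentions -->
    <annotation label="ENTITY" span="no"/>
    <attribute of_annotation="ENTITY" type="annotation" aggregation="set" name="mentions">
      <label_restriction label="TAG2"/>
    </attribute>
    <!-- assumed: aggregation="list" holds an ordered list of strings -->
    <attribute of_annotation="ENTITY" aggregation="list" name="aliases"/>
    ...
  </annotation_set_descriptor>
</annotation_set_descriptors>
```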
In some situations, you may want to define a single content
annotation whose instances are distinguished by the value of one
of its attributes. A common example in language processing is the
tagging of so-called named entities (people, locations,
organizations). One common tagging scheme assigns a single ENAMEX
tag to all these entities and distinguishes among them by the
value of the "type" attribute. Each label + attribute/value pair is
assigned a notional name, for use in the UI, the scorer, etc. We
call these names effective labels.
Effective labels must be defined on choice restrictions of string
or integer attributes. If an effective label is declared for one
of the choices, there must be a declaration for all of them. In
other words, the choices must completely partition the label.
<annotation_set_descriptors inherit="category:zone,category:token">
  <annotation_set_descriptor name="content" category="content">
    ...
    <annotation label="ENAMEX"/>
    <attribute of_annotation="ENAMEX" name="type">
      <choice effective_label="PERSON">PER</choice>
      <choice effective_label="ORGANIZATION">ORG</choice>
      <choice effective_label="LOCATION">LOC</choice>
    </attribute>
    ...
  </annotation_set_descriptor>
</annotation_set_descriptors>
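Since effective labels can also be declared on integer choice attributes, here is a sketch of that case. The MENTION label, the "level" attribute, and the effective label names are hypothetical; as the rule above requires, every choice gets an effective label.

```xml
<annotation_set_descriptors inherit="category:zone,category:token">
  <annotation_set_descriptor name="content" category="content">
    ...
    <annotation label="MENTION"/>
    <!-- hypothetical integer choice attribute; all choices are covered -->
    <attribute of_annotation="MENTION" type="int" name="level">
      <choice effective_label="NAMED">0</choice>
      <choice effective_label="NOMINAL">1</choice>
      <choice effective_label="PRONOMINAL">2</choice>
    </attribute>
    ...
  </annotation_set_descriptor>
</annotation_set_descriptors>
```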
You can define complex restrictions on annotation-valued
attributes in a number of ways. These restrictions consist of a
label plus required values for some of its attributes; those
attributes must be choice attributes (i.e., string or integer
attributes with choices defined). The availability of these
restrictions is independent of whether an effective label is
defined for the attribute.
Here's an example fragment. It starts with the effective label
attribute definition from the previous example, but defines a
second (nonsensical) integer choice attribute:
<annotation_set_descriptors inherit="category:zone,category:token">
  <annotation_set_descriptor name="content" category="content">
    ...
    <annotation label="ENAMEX"/>
    <attribute of_annotation="ENAMEX" name="type">
      <choice effective_label="PERSON">PER</choice>
      <choice effective_label="ORGANIZATION">ORG</choice>
      <choice effective_label="LOCATION">LOC</choice>
    </attribute>
    <attribute of_annotation="ENAMEX" type="int" name="size">
      <choice>0</choice>
      <choice>1</choice>
    </attribute>
    <!-- and now, the annotation-valued attribute -->
    <annotation label="LOCATED"/>
    <attribute of_annotation="LOCATED" name="who" type="annotation">
      <label_restriction label="ENAMEX">
        <attributes type="PER" size="1"/>
      </label_restriction>
    </attribute>
    ...
  </annotation_set_descriptor>
</annotation_set_descriptors>
The label restriction itself can refer either to a true or an
effective label, and the effective label can be combined with
additional attribute restrictions:
<annotation label="LOCATED"/>
<attribute of_annotation="LOCATED" name="who" type="annotation">
  <label_restriction label="PERSON">
    <attributes size="1"/>
  </label_restriction>
</attribute>
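An effective label can also serve as a label restriction on its own, with no additional attribute restrictions. The sketch below assumes the ENAMEX definitions from the previous examples and adds a hypothetical "where" attribute, so that LOCATED relates a PERSON to a LOCATION.

```xml
<annotation label="LOCATED"/>
<attribute of_annotation="LOCATED" name="who" type="annotation">
  <label_restriction label="PERSON"/>
</attribute>
<!-- hypothetical second argument, restricted to the LOCATION effective label -->
<attribute of_annotation="LOCATED" name="where" type="annotation">
  <label_restriction label="LOCATION"/>
</attribute>
```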
As the previous example shows, you can use annotation-valued
attributes and label restrictions to create relations among
annotations. This is the only facility that MAT provides for
making these connections, and we acknowledge that this approach
has limitations.
For instance, how might you represent an array of temporal
restrictions (before, after, etc.) on an event? Here are three
different strategies. The first makes each kind of temporal
restriction an annotation-valued attribute of the event:
<annotation_set_descriptors inherit="category:zone,category:token">
  <annotation_set_descriptor name="content" category="content">
    ...
    <annotation label="TIME"/>
    <!-- this annotation can be spanned or spanless -->
    <annotation label="EVENT"/>
    <!-- if you're anticipating multiple times for a restriction type,
         make these set aggregations -->
    <attribute of_annotation="EVENT" type="annotation" name="before">
      <label_restriction label="TIME"/>
    </attribute>
    <attribute of_annotation="EVENT" type="annotation" name="after">
      <label_restriction label="TIME"/>
    </attribute>
    ...
  </annotation_set_descriptor>
</annotation_set_descriptors>
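Following the comment in the first strategy's fragment, here is a sketch of the same two attributes declared as set aggregations, so that an event can be before or after several times at once. It uses only the aggregation="set" syntax shown earlier.

```xml
<attribute of_annotation="EVENT" type="annotation" aggregation="set" name="before">
  <label_restriction label="TIME"/>
</attribute>
<attribute of_annotation="EVENT" type="annotation" aggregation="set" name="after">
  <label_restriction label="TIME"/>
</attribute>
```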
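The obvious problem with this strategy is that you might have
many, many temporal relations you care about, and/or you may want
to provide attributes for the temporal relations themselves. The
second strategy makes each temporal relation an annotation of its
own, with annotation-valued attributes for the event and the time: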
<annotation_set_descriptors inherit="category:zone,category:token">
  <annotation_set_descriptor name="content" category="content">
    ...
    <annotation label="TIME"/>
    <!-- this annotation can be spanned or spanless -->
    <annotation label="EVENT"/>
    <!-- so can this annotation -->
    <annotation label="BEFORE"/>
    <attribute of_annotation="BEFORE" type="annotation" name="event">
      <label_restriction label="EVENT"/>
    </attribute>
    <attribute of_annotation="BEFORE" type="annotation" name="time">
      <label_restriction label="TIME"/>
    </attribute>
    ...
  </annotation_set_descriptor>
</annotation_set_descriptors>
You could generalize this strategy by having a single temporal
relation with an attribute to indicate what kind of temporal
relation it is:
<annotation_set_descriptors inherit="category:zone,category:token">
  <annotation_set_descriptor name="content" category="content">
    ...
    <annotation label="TIME"/>
    <!-- this annotation can be spanned or spanless -->
    <annotation label="EVENT"/>
    <!-- so can this annotation -->
    <annotation label="TEMPORAL"/>
    <attribute of_annotation="TEMPORAL" type="annotation" name="event">
      <label_restriction label="EVENT"/>
    </attribute>
    <attribute of_annotation="TEMPORAL" type="annotation" name="time">
      <label_restriction label="TIME"/>
    </attribute>
    <attribute of_annotation="TEMPORAL" name="type">
      <choice>BEFORE</choice>
      <choice>AFTER</choice>
      ...
    </attribute>
    ...
  </annotation_set_descriptor>
</annotation_set_descriptors>
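Because "type" on TEMPORAL is a string choice attribute, you could also declare effective labels on its choices, so that the UI and the scorer can treat each kind of temporal relation as a notional label of its own. A sketch with hypothetical effective label names; since every choice needs an effective label, this version limits the choices to BEFORE and AFTER instead of eliding the rest.

```xml
<attribute of_annotation="TEMPORAL" name="type">
  <!-- hypothetical effective labels; every choice must have one -->
  <choice effective_label="TEMPORAL_BEFORE">BEFORE</choice>
  <choice effective_label="TEMPORAL_AFTER">AFTER</choice>
</attribute>
```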
The obvious problem with this strategy is that the temporal
relations are separated from the events they modify (because
there's no way of showing or representing relations as subordinate
attributes).
The third strategy is a combination of the first two:
<annotation_set_descriptors inherit="category:zone,category:token">
  <annotation_set_descriptor name="content" category="content">
    ...
    <annotation label="TIME"/>
    <!-- this annotation can be spanned or spanless -->
    <annotation label="EVENT"/>
    <!-- so can this annotation -->
    <annotation label="TEMPORAL"/>
    <attribute of_annotation="TEMPORAL" type="annotation" name="time">
      <label_restriction label="TIME"/>
    </attribute>
    <attribute of_annotation="TEMPORAL" name="type">
      <choice>BEFORE</choice>
      <choice>AFTER</choice>
      ...
    </attribute>
    <attribute of_annotation="EVENT" type="annotation" aggregation="set" name="temporal">
      <label_restriction label="TEMPORAL"/>
    </attribute>
    ...
  </annotation_set_descriptor>
</annotation_set_descriptors>
The distinction here is subtle: instead of TEMPORAL being a
two-place relation between an event and a time, it has only one
argument, the time, and its relation to the EVENT is represented
by its presence in the EVENT's "temporal" attribute, a
set-aggregation annotation-valued attribute.
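Because TEMPORAL remains an annotation in its own right, it can still carry attributes of its own, which addresses one of the problems with the first strategy. A sketch with a hypothetical "certainty" choice attribute, added to the third strategy's definitions:

```xml
<!-- hypothetical attribute describing the temporal relation itself -->
<attribute of_annotation="TEMPORAL" name="certainty">
  <choice>CERTAIN</choice>
  <choice>PROBABLE</choice>
</attribute>
```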
The obvious disadvantage to this strategy is that it doesn't
correspond trivially to what we'd think of as the "correct event
logic". However, given that we're talking about annotations, not
objects in a knowledge representation, it might ultimately be the
proper compromise.