This document describes use cases for the XML format of the
annotation set descriptors in task files (see "Creating a New
Task"). The reference document is found here.
At the moment, most task XML customizations are quite complex
and not yet documented. Here, we focus on the ways you can
define your content annotations. For examples of how to customize
the UI display of your annotations, see here.
The simplest example of customizing your annotations in your
task.xml file is inheriting all your structural annotations and
adding your own content annotations. The role of the different
annotation categories is described here.
<annotation_set_descriptors inherit="category:zone,category:token">
  <annotation_set_descriptor name="content" category="content">
    <annotation label="TAG1"/>
    <annotation label="TAG2"/>
  </annotation_set_descriptor>
</annotation_set_descriptors>
Here, we've inherited the structural annotations and defined
two content annotations, TAG1 and TAG2. Both content annotations
are spanned annotations by default.
Not all content annotations are spanned annotations; some
annotations aren't anchored directly to the text. You can find
examples of such annotations in Tutorial
8. It's easy to define these annotations:
<annotation_set_descriptors inherit="category:zone,category:token">
  <annotation_set_descriptor name="content" category="content">
    ...
    <annotation label="SPANLESS1" span="no"/>
    ...
  </annotation_set_descriptor>
</annotation_set_descriptors>
The UI effects of defining spanless annotations are described here.
Annotations, spanned or spanless, can have attributes. These
attributes can be strings (the default), floats, integers,
Booleans, or other annotations, or sets or lists of these types.
Here's how to define a simple string attribute:
<annotation_set_descriptors inherit="category:zone,category:token">
  <annotation_set_descriptor name="content" category="content">
    ...
    <attribute of_annotation="TAG1" name="string_attr"/>
    ...
  </annotation_set_descriptor>
</annotation_set_descriptors>
String attributes can have default values or a restricted set of choices:
<annotation_set_descriptors inherit="category:zone,category:token">
  <annotation_set_descriptor name="content" category="content">
    ...
    <attribute of_annotation="TAG1" name="string_attr" default="Pronoun">
      <choice>Pronoun</choice>
      <choice>Nominal</choice>
      <choice>Proper name</choice>
    </attribute>
    ...
  </annotation_set_descriptor>
</annotation_set_descriptors>
Integer and float attributes can be defined with accepted ranges:
<annotation_set_descriptors inherit="category:zone,category:token">
  <annotation_set_descriptor name="content" category="content">
    ...
    <attribute of_annotation="TAG1" type="int" name="int_attr">
      <range from="10" to="20"/>
    </attribute>
    ...
  </annotation_set_descriptor>
</annotation_set_descriptors>
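A float attribute with a range would presumably follow the same pattern as the integer case. The sketch below is an illustration under assumptions, not a confirmed example from the reference: it assumes the type value is spelled "float" by analogy with "int", and that range bounds may be written as decimals. The attribute name float_attr and the bounds are hypothetical.

```xml
<annotation_set_descriptors inherit="category:zone,category:token">
  <annotation_set_descriptor name="content" category="content">
    <annotation label="TAG1"/>
    <!-- Assumption: type="float" is spelled by analogy with type="int".
         The name float_attr and the decimal bounds are illustrative only. -->
    <attribute of_annotation="TAG1" type="float" name="float_attr">
      <range from="0.0" to="1.0"/>
    </attribute>
  </annotation_set_descriptor>
</annotation_set_descriptors>
```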
Annotation-valued attributes must have label restrictions, which
specify what annotation labels can fill the attribute value:
<annotation_set_descriptors inherit="category:zone,category:token">
  <annotation_set_descriptor name="content" category="content">
    ...
    <attribute of_annotation="TAG1" type="annotation" name="annot_attr">
      <label_restriction label="TAG2"/>
    </attribute>
    ...
  </annotation_set_descriptor>
</annotation_set_descriptors>
Any of these attributes can also be aggregated as a set or a list:
<annotation_set_descriptors inherit="category:zone,category:token">
  <annotation_set_descriptor name="content" category="content">
    ...
    <attribute of_annotation="TAG1" type="annotation" aggregation="set" name="mentions">
      <label_restriction label="TAG2"/>
    </attribute>
    ...
  </annotation_set_descriptor>
</annotation_set_descriptors>
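The example above shows only the set case. A list aggregation might look like the following sketch, which assumes the aggregation value is spelled "list" by analogy with "set". The attribute name ordered_mentions is hypothetical. A list would be the natural choice when the order of the values matters or when duplicates should be kept.

```xml
<annotation_set_descriptors inherit="category:zone,category:token">
  <annotation_set_descriptor name="content" category="content">
    <annotation label="TAG1"/>
    <annotation label="TAG2"/>
    <!-- Assumption: aggregation="list" is spelled by analogy with
         aggregation="set". The name ordered_mentions is illustrative only. -->
    <attribute of_annotation="TAG1" type="annotation" aggregation="list" name="ordered_mentions">
      <label_restriction label="TAG2"/>
    </attribute>
  </annotation_set_descriptor>
</annotation_set_descriptors>
```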
In some situations, you may want to define a single content
annotation whose subtypes are distinguished by the value of one
attribute. One common example of this in language processing arises
in tagging for so-called named entities (people, locations,
organizations). One common tagging scheme assigns a single ENAMEX
tag to these entities, and distinguishes among them using the value
of the "type" attribute. Each label plus attribute/value pair is
assigned a notional name, for use in the UI, scorer, etc. We call
these names effective labels.
Effective labels must be defined on choice restrictions of string
or integer attributes. If an effective label is declared for one
of the choices, there must be a declaration for all of them. In
other words, the choices must completely partition the label.
<annotation_set_descriptors inherit="category:zone,category:token">
  <annotation_set_descriptor name="content" category="content">
    ...
    <annotation label="ENAMEX"/>
    <attribute of_annotation="ENAMEX" name="type">
      <choice effective_label="PERSON">PER</choice>
      <choice effective_label="ORGANIZATION">ORG</choice>
      <choice effective_label="LOCATION">LOC</choice>
    </attribute>
    ...
  </annotation_set_descriptor>
</annotation_set_descriptors>
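Since effective labels may also be declared on the choices of integer attributes, an integer-valued variant might look like the sketch below. It assumes integer attributes accept choice children written the same way as for strings. The label RATING, the attribute name polarity, and the effective labels are all hypothetical. Every choice carries an effective label, so the choices completely partition the label.

```xml
<annotation_set_descriptors inherit="category:zone,category:token">
  <annotation_set_descriptor name="content" category="content">
    <annotation label="RATING"/>
    <!-- Assumption: int attributes take choice children in the same form
         as string attributes. All names and values here are illustrative. -->
    <attribute of_annotation="RATING" type="int" name="polarity">
      <choice effective_label="NEGATIVE">0</choice>
      <choice effective_label="POSITIVE">1</choice>
    </attribute>
  </annotation_set_descriptor>
</annotation_set_descriptors>
```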