Annotation set descriptor XML use cases

This document describes use cases for the XML format of the annotation set descriptors in task files (see "Creating a New Task"). The reference document is found here.

At the moment, most task XML customizations are quite complex and not yet documented. Here, we focus on the different ways you can define your content annotations. For examples of how to customize the UI display of your annotations, see here.

Defining content annotations

The simplest way to customize the annotations in your task.xml file is to inherit all your structural annotations and add your own content annotations. The role of the different annotation categories is described here.

  <annotation_set_descriptors inherit="category:zone,category:token">
    <annotation_set_descriptor name="content" category="content">
      <annotation label="TAG1"/>
      <annotation label="TAG2"/>
    </annotation_set_descriptor>
  </annotation_set_descriptors>

Here, we've inherited the structural annotations and defined two content annotations, TAG1 and TAG2. Both content annotations are spanned annotations by default.

Defining spanless content annotations

Not all content annotations are spanned annotations; some annotations aren't anchored directly to the text. You can find examples of such annotations in Tutorial 8. It's easy to define these annotations:

  <annotation_set_descriptors inherit="category:zone,category:token">
    <annotation_set_descriptor name="content" category="content">
      ...
      <annotation label="SPANLESS1" span="no"/>
      ...
    </annotation_set_descriptor>
  </annotation_set_descriptors>

The UI effects of defining spanless annotations are described here.

Defining attributes

Annotations, spanned or spanless, can have attributes. These attributes can be strings (the default), floats, integers, Booleans, or other annotations, or sets or lists of these types. Here's how to define a simple string attribute:

  <annotation_set_descriptors inherit="category:zone,category:token">
    <annotation_set_descriptor name="content" category="content">
      ...
      <attribute of_annotation="TAG1" name="string_attr"/>
      ...
    </annotation_set_descriptor>
  </annotation_set_descriptors>
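
Because an attribute is attached to an annotation by label through of_annotation, the same pattern should carry over to spanless annotations. The following is a minimal sketch that reuses the SPANLESS1 label from the previous section; the attribute name note_attr is hypothetical and chosen only for illustration.

```xml
<annotation_set_descriptors inherit="category:zone,category:token">
  <annotation_set_descriptor name="content" category="content">
    <!-- Spanless annotation, declared as in the previous section -->
    <annotation label="SPANLESS1" span="no"/>
    <!-- String attribute (the default type); "note_attr" is a hypothetical name -->
    <attribute of_annotation="SPANLESS1" name="note_attr"/>
  </annotation_set_descriptor>
</annotation_set_descriptors>
```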

String attributes can have default values, or choices:

  <annotation_set_descriptors inherit="category:zone,category:token">
    <annotation_set_descriptor name="content" category="content">
      ...
      <attribute of_annotation="TAG1" name="string_attr" default="Pronoun">
        <choice>Pronoun</choice>
        <choice>Nominal</choice>
        <choice>Proper name</choice>
      </attribute>
      ...
    </annotation_set_descriptor>
  </annotation_set_descriptors>

Integer and float attributes can be defined with accepted ranges:

  <annotation_set_descriptors inherit="category:zone,category:token">
    <annotation_set_descriptor name="content" category="content">
      ...
      <attribute of_annotation="TAG1" type="int" name="int_attr">
        <range from="10" to="20"/>
      </attribute>
      ...
    </annotation_set_descriptor>
  </annotation_set_descriptors>
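
A float attribute presumably follows the same pattern as the integer case. The sketch below assumes that the type value is spelled "float", by analogy with "int"; this spelling is not confirmed in this document, so check the reference document before relying on it. The attribute name float_attr is hypothetical.

```xml
<annotation_set_descriptors inherit="category:zone,category:token">
  <annotation_set_descriptor name="content" category="content">
    <annotation label="TAG1"/>
    <!-- ASSUMPTION: type="float" mirrors type="int"; "float_attr" is a hypothetical name -->
    <attribute of_annotation="TAG1" type="float" name="float_attr">
      <!-- Accepted range, written the same way as for the integer attribute -->
      <range from="0" to="1"/>
    </attribute>
  </annotation_set_descriptor>
</annotation_set_descriptors>
```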

Annotation-valued attributes must have label restrictions, which specify which annotation labels can fill the attribute value:

  <annotation_set_descriptors inherit="category:zone,category:token">
    <annotation_set_descriptor name="content" category="content">
      ...
      <attribute of_annotation="TAG1" type="annotation" name="annot_attr">
        <label_restriction label="TAG2"/>
      </attribute>
      ...
    </annotation_set_descriptor>
  </annotation_set_descriptors>

Any of these attributes can also be declared as a set or list aggregation:

  <annotation_set_descriptors inherit="category:zone,category:token">
    <annotation_set_descriptor name="content" category="content">
      ...
      <attribute of_annotation="TAG1" type="annotation" aggregation="set" name="mentions">
        <label_restriction label="TAG2"/>
      </attribute>
      ...
    </annotation_set_descriptor>
  </annotation_set_descriptors>
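
The example above shows a set of annotations. A list of plain strings might be declared as in the sketch below. It assumes that the aggregation value for lists is spelled "list", by analogy with "set"; that spelling is not confirmed in this document. The attribute name aliases is hypothetical.

```xml
<annotation_set_descriptors inherit="category:zone,category:token">
  <annotation_set_descriptor name="content" category="content">
    <annotation label="TAG1"/>
    <!-- ASSUMPTION: aggregation="list" mirrors aggregation="set" -->
    <!-- No type is given, so the attribute holds strings (the default) -->
    <attribute of_annotation="TAG1" aggregation="list" name="aliases"/>
  </annotation_set_descriptor>
</annotation_set_descriptors>
```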

Defining a single content annotation, partitioned by attribute values

In some situations, you may want to define a single content annotation that is subdivided by the value of a distinguished attribute. A common example in language processing arises in tagging so-called named entities (people, locations, organizations). One widely used tagging scheme assigns a single ENAMEX tag to these entities and distinguishes among them using the value of the "type" attribute. Each label + attribute/value pair is assigned a notional name, for use in the UI, scorer, etc. We call these names effective labels.

Effective labels must be defined on choice restrictions of string or integer attributes. If an effective label is declared for one of the choices, there must be a declaration for all of them. In other words, the choices must completely partition the label.

  <annotation_set_descriptors inherit="category:zone,category:token">
    <annotation_set_descriptor name="content" category="content">
      ...
      <annotation label="ENAMEX"/>
      <attribute of_annotation="ENAMEX" name="type">
        <choice effective_label="PERSON">PER</choice>
        <choice effective_label="ORGANIZATION">ORG</choice>
        <choice effective_label="LOCATION">LOC</choice>
      </attribute>
      ...
    </annotation_set_descriptor>
  </annotation_set_descriptors>
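
As noted above, effective labels can also be defined on the choices of an integer attribute. The following is a sketch only: it assumes that choices on an int attribute are written the same way as string choices, and the RATING label, the level attribute, and the effective label names are all hypothetical. Every choice carries an effective_label, so the choices completely partition the label.

```xml
<annotation_set_descriptors inherit="category:zone,category:token">
  <annotation_set_descriptor name="content" category="content">
    <!-- Hypothetical annotation, partitioned by an integer attribute -->
    <annotation label="RATING"/>
    <attribute of_annotation="RATING" type="int" name="level">
      <!-- ASSUMPTION: int choices use the same <choice> syntax as string choices -->
      <choice effective_label="LOW">1</choice>
      <choice effective_label="MEDIUM">2</choice>
      <choice effective_label="HIGH">3</choice>
    </attribute>
  </annotation_set_descriptor>
</annotation_set_descriptors>
```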