Task XML Use Cases

This document describes use cases for the XML format of the task files (see "Creating a New Task"). A separate reference document describes the format in full.

At the moment, most task XML customizations are quite complex and not yet documented. Here, we focus on the different ways you can define your content annotations.

Defining content annotations

The simplest way to customize the annotations in your task.xml file is to inherit all the structural annotations and add your own content annotations. The roles of the different annotation categories are described elsewhere in the documentation.

When you define your content annotations, you should assign some CSS to distinguish them in the Web UI. Right now, the most appropriate way to do this is to use background colors.

  <tags inherit_structure="yes">
    <tag name="TAG1" category="content">
      <ui css="background-color: blue"/>
    </tag>
    <tag name="TAG2" category="content">
      <ui css="background-color: red"/>
    </tag>
  </tags>

So here, we've inherited the structure annotations using the inherit_structure attribute, and defined two content annotations, TAG1 and TAG2. We've assigned TAG1 a blue background color, and TAG2 a red background color. Since you're using CSS, you can assign colors using hexadecimal designations as well (or, if you prefer, set a background image, or other wacky things).
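
As a minimal sketch of the hexadecimal form mentioned above (the tag name and the particular color value are arbitrary choices for illustration), note that CSS expects a leading "#" on a hexadecimal color:

```xml
<tags inherit_structure="yes">
  <tag name="TAG1" category="content">
    <!-- Hexadecimal color; CSS requires the leading "#" -->
    <ui css="background-color: #99CCFF"/>
  </tag>
</tags>
```

Because the css value is passed through as ordinary CSS, a value without the "#" would most likely be ignored by the browser as an invalid declaration, leaving the annotation unstyled.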

One caveat: at the moment, annotation spans are styled on a token-by-token basis. So if, for instance, you want to have a left bracket at the left end of an annotation, and a right bracket at the right end, you can't do that quite yet; you'd end up with each token bracketed.

Defining a single content annotation, partitioned by attribute values

In other situations, you may want to define a single content annotation whose variants are distinguished by an attribute value. A common example in language processing is tagging so-called named entities (people, locations, organizations): one widely used scheme assigns a single ENAMEX tag to these entities and distinguishes among them using the value of the "type" attribute.

MAT supports this through the <attr_set> sub-element of <tag>. Each <attr_set> has a name, by which the alternative is known in the tagging menu in the UI and in the scoring engine. Within each <attr_set> are one or more <attr> elements, each with a name and a value; any annotation that has the appropriate tag, and also the appropriate attributes and values, is considered to be in this attr set. So for the user, the specification below will look exactly like three separate tags: PERSON, LOCATION, and ORGANIZATION. Internally, however, the rich annotated document will contain only ENAMEX annotations.

  <tags inherit_structure="yes">
    <tag name="ENAMEX" category="content">
      <attr_set name="PERSON">
        <attr name="type" value="PERSON"/>
        <ui css="background-color: #CCFF66"/><!-- light green -->
      </attr_set>
      <attr_set name="LOCATION">
        <attr name="type" value="LOCATION"/>
        <ui css="background-color: #FF99CC"/><!-- pink -->
      </attr_set>
      <attr_set name="ORGANIZATION">
        <attr name="type" value="ORGANIZATION"/>
        <ui css="background-color: #99CCFF"/><!-- light blue -->
      </attr_set>
    </tag>
  </tags>

Defining keyboard accelerators

The <ui> element also supports keyboard accelerators. These are keys that the user can press when the tagging menu is visible in the UI; pressing one is equivalent to selecting the corresponding menu item. You add an accelerator using the accelerator attribute on the <ui> element:

  <tags inherit_structure="yes">
    <tag name="TAG1" category="content">
      <ui css="background-color: blue" accelerator="A"/>
    </tag>
    <tag name="TAG2" category="content">
      <ui css="background-color: red" accelerator="B"/>
    </tag>
  </tags>

It's probably a good idea to choose the accelerators mnemonically (the first letter of the menu item name is always a good mnemonic, unless of course more than one item starts with the same letter). Be careful, though; MAT doesn't yet ensure that there are no clashes among accelerators.
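
Accelerators can presumably be combined with attribute sets as well, since each <attr_set> carries its own <ui> element. The fragment below is a sketch under that assumption, which this document does not confirm. The keys P, L and O simply follow the first-letter mnemonic suggested above and do not clash with one another.

```xml
<tags inherit_structure="yes">
  <tag name="ENAMEX" category="content">
    <attr_set name="PERSON">
      <attr name="type" value="PERSON"/>
      <!-- Assumption: accelerator is honored on a <ui> inside <attr_set> -->
      <ui css="background-color: #CCFF66" accelerator="P"/>
    </attr_set>
    <attr_set name="LOCATION">
      <attr name="type" value="LOCATION"/>
      <ui css="background-color: #FF99CC" accelerator="L"/>
    </attr_set>
    <attr_set name="ORGANIZATION">
      <attr name="type" value="ORGANIZATION"/>
      <ui css="background-color: #99CCFF" accelerator="O"/>
    </attr_set>
  </tag>
</tags>
```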

Changing the annotation foreground color

Sometimes the background color you choose is too dark for the text to be readable, in which case you can use CSS to change the text color:

  <tags inherit_structure="yes">
    <tag name="TAG1" category="content">
      <ui css="background-color: black; color: white" accelerator="A"/>
    </tag>
  </tags>

Remember, the value of the css attribute is really CSS; it's not converted or processed in any way before it's inserted into the CSS rules in the Web UI. The one caveat is that the CSS is applied to each token in the annotated phrase, not to the phrase as a whole.

Using cascaded menus for more and less specialized tags

Let's say that you have the following annotations:

  <tags inherit_structure="yes">
    <tag name="PERSON" category="content">
      <ui css="background-color: blue"/>
    </tag>
    <tag name="MAN" category="content">
      <ui css="background-color: pink"/>
    </tag>
    <tag name="WOMAN" category="content">
      <ui css="background-color: orange"/>
    </tag>
    <tag name="US-LOCATION" category="content">
      <ui css="background-color: gray"/>
    </tag>
    <tag name="FOREIGN-LOCATION" category="content">
      <ui css="background-color: yellow"/>
    </tag>
  </tags>

Your annotator is instructed to label people, using PERSON when she can't tell whether MAN or WOMAN applies. You'd like to arrange these tags in a visual hierarchy for the annotator's convenience, and to do the same with US-LOCATION and FOREIGN-LOCATION, even though they have no common, less specific annotation. Here's what you do:

  <tags>
    ...
    <tag_group name="PERSON" children="MAN,WOMAN"/>
    <tag_group name="LOCATION" children="US-LOCATION,FOREIGN-LOCATION"/>
  </tags>

The tag group can reference an existing annotation (as in the PERSON case) or create its own group (as in the LOCATION case). The effect of these groups will be to create submenus in the annotation popup in the MAT UI.
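
Putting the two fragments together, a complete <tags> element might look like the sketch below. It assumes that the "..." above stands for the five tag definitions listed earlier and that inherit_structure="yes" is retained; neither is stated explicitly here.

```xml
<tags inherit_structure="yes">
  <tag name="PERSON" category="content">
    <ui css="background-color: blue"/>
  </tag>
  <tag name="MAN" category="content">
    <ui css="background-color: pink"/>
  </tag>
  <tag name="WOMAN" category="content">
    <ui css="background-color: orange"/>
  </tag>
  <tag name="US-LOCATION" category="content">
    <ui css="background-color: gray"/>
  </tag>
  <tag name="FOREIGN-LOCATION" category="content">
    <ui css="background-color: yellow"/>
  </tag>
  <!-- PERSON is itself a tag, so the group reuses it as the parent entry -->
  <tag_group name="PERSON" children="MAN,WOMAN"/>
  <!-- LOCATION is not a tag; it exists only as a submenu label -->
  <tag_group name="LOCATION" children="US-LOCATION,FOREIGN-LOCATION"/>
</tags>
```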