This document describes use cases for the XML format of the task files
(see "Creating a New Task"). The reference document is found here.
At the moment, most of the task XML customizations are quite
complex and not yet documented. Here, we focus on the ways that the
user can define their content annotations.
The simplest example of customizing your annotations in your
task.xml file is inheriting all your structural annotations and adding
your own content annotations. The role of the different annotation
categories is described here.
When you define your content annotations, you should assign some CSS
to distinguish them in the Web UI. Right now, the most appropriate way
to do this is to use background colors.
<tags inherit_structure="yes">
<tag name="TAG1" category="content">
<ui css="background-color: blue"/>
</tag>
<tag name="TAG2" category="content">
<ui css="background-color: red"/>
</tag>
</tags>
So here, we've inherited the structure annotations using the
inherit_structure attribute, and defined two content annotations, TAG1
and TAG2. We've assigned TAG1 a blue background color, and TAG2 a red
background color. Since you're using CSS, you can assign colors using
hexadecimal designations as well (or, if you prefer, set a background
image, or other wacky things).
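As a minimal sketch of the hexadecimal option, the fragment below restates the example above with hex color values. The specific color codes are arbitrary choices for illustration, not values prescribed by MAT.

```xml
<tags inherit_structure="yes">
  <!-- Same two content annotations as above, with illustrative hex colors. -->
  <tag name="TAG1" category="content">
    <ui css="background-color: #99CCFF"/><!-- light blue -->
  </tag>
  <tag name="TAG2" category="content">
    <ui css="background-color: #FF9999"/><!-- light red -->
  </tag>
</tags>
```

Note that CSS hex colors take a leading "#"; without it, the value is not a valid CSS color.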
One caveat: at the moment, annotation spans are styled on a
token-by-token basis. So if, for instance, you want to have a left
bracket at the left end of an annotation, and a right bracket at the
right end, you can't do that quite yet; you'd end up with each token
bracketed.
In other situations, you may want to define a single content
annotation whose subtypes are distinguished by an attribute value. A
common example in language processing is tagging so-called named
entities (people, locations, organizations). One common tagging
scheme assigns a single ENAMEX tag to these entities, and distinguishes
among them using the value of the "type" attribute.
MAT supports this directly, through the <attr_set> sub-element of
<tag>. Each <attr_set> has a name, by which the alternative is known in
the tagging menu in the UI and in the scoring engine. Each <attr_set>
contains one or more <attr> elements, each with a name and a value; any
annotation which has the appropriate tag, along with the appropriate
attributes and values, is considered to be in this attr set. So for the
user, the specification below will look exactly like having defined
three separate tags: PERSON, LOCATION, ORGANIZATION. Internally,
however, the rich annotated document will contain only ENAMEX
annotations.
<tags inherit_structure="yes">
<tag name="ENAMEX" category="content">
<attr_set name="PERSON">
<attr name="type" value="PERSON"/>
<ui css="background-color: #CCFF66"/><!-- light green -->
</attr_set>
<attr_set name="LOCATION">
<attr name="type" value="LOCATION"/>
<ui css="background-color: #FF99CC"/><!-- pink -->
</attr_set>
<attr_set name="ORGANIZATION">
<attr name="type" value="ORGANIZATION"/>
<ui css="background-color: #99CCFF"/><!-- light blue -->
</attr_set>
</tag>
</tags>
The <ui> element also supports keyboard accelerators. These are keys
that the user can press while the tagging menu is visible in the UI;
pressing one is equivalent to selecting the corresponding menu item.
You can add an accelerator using an attribute on the <ui> element:
<tags inherit_structure="yes">
<tag name="TAG1" category="content">
<ui css="background-color: blue" accelerator="A"/>
</tag>
<tag name="TAG2" category="content">
<ui css="background-color: red" accelerator="B"/>
</tag>
</tags>
It's probably a good idea to choose the accelerators mnemonically
(the first letter of the menu item name is always a good mnemonic,
unless of course more than one item starts with the same letter). Be
careful, though; MAT doesn't yet ensure that there are no clashes among
accelerators.
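The accelerator attribute can presumably be combined with the <attr_set> approach described earlier, since each <attr_set> carries its own <ui> element. The following is only a sketch under that assumption (the examples in this document show accelerator only on a <ui> directly inside <tag>). The letters are chosen mnemonically and checked by hand to be distinct, since MAT will not catch a clash.

```xml
<tags inherit_structure="yes">
  <tag name="ENAMEX" category="content">
    <!-- Assumption: accelerator works on a <ui> nested in <attr_set>. -->
    <attr_set name="PERSON">
      <attr name="type" value="PERSON"/>
      <ui css="background-color: #CCFF66" accelerator="P"/>
    </attr_set>
    <attr_set name="LOCATION">
      <attr name="type" value="LOCATION"/>
      <ui css="background-color: #FF99CC" accelerator="L"/>
    </attr_set>
    <attr_set name="ORGANIZATION">
      <attr name="type" value="ORGANIZATION"/>
      <ui css="background-color: #99CCFF" accelerator="O"/>
    </attr_set>
  </tag>
</tags>
```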
Sometimes the background color you choose is too dark for the text to
be readable, in which case you can use CSS to change the text color:
<tags inherit_structure="yes">
<tag name="TAG1" category="content">
<ui css="background-color: black; color: white" accelerator="A"/>
</tag>
</tags>
Remember, the value of the css attribute is really CSS; it's not
converted or processed in any way before it's inserted into the CSS
rules in the Web UI. The one caveat is that the CSS is applied to each
token in the annotated phrase, not to the phrase as a whole.
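Because the value is passed through unchanged, several declarations can be combined in one css attribute, separated by semicolons as in any CSS rule. A sketch, with property choices that are purely illustrative:

```xml
<tags inherit_structure="yes">
  <tag name="TAG1" category="content">
    <!-- Every declaration here is applied to each token of the span separately. -->
    <ui css="background-color: #003366; color: white; font-weight: bold"/>
  </tag>
</tags>
```

Properties that affect only the text or its background, like these, read well under token-by-token styling. A property such as border would draw a box around every token rather than around the phrase.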
Let's say that you have the following annotations:
<tags inherit_structure="yes">
<tag name="PERSON" category="content">
<ui css="background-color: blue"/>
</tag>
<tag name="MAN" category="content">
<ui css="background-color: pink"/>
</tag>
<tag name="WOMAN" category="content">
<ui css="background-color: orange"/>
</tag>
<tag name="US-LOCATION" category="content">
<ui css="background-color: gray"/>
</tag>
<tag name="FOREIGN-LOCATION" category="content">
<ui css="background-color: yellow"/>
</tag>
</tags>
Your annotator is instructed to label people, using PERSON as the
annotation if she can't tell which of MAN or WOMAN is applicable. Your
preference is to arrange these in a visual hierarchy for the
annotator's convenience; you wish to do the same with US-LOCATION and
FOREIGN-LOCATION, even though they don't have a common, less specific
annotation. Here's what you do:
<tags>
...
<tag_group name="PERSON" children="MAN,WOMAN"/>
<tag_group name="LOCATION" children="US-LOCATION,FOREIGN-LOCATION"/>
</tags>
The tag group can reference an existing annotation (as in the PERSON
case) or create its own group (as in the LOCATION case). The effect of
these groups will be to create submenus in the annotation popup in the
MAT UI.
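Putting the pieces together for the LOCATION case, a complete fragment might look like the sketch below. It assumes that <tag_group> elements sit alongside the <tag> elements in the same <tags> block, as the abbreviated example above suggests, and that inherit_structure is used as in the earlier examples.

```xml
<tags inherit_structure="yes">
  <tag name="US-LOCATION" category="content">
    <ui css="background-color: gray"/>
  </tag>
  <tag name="FOREIGN-LOCATION" category="content">
    <ui css="background-color: yellow"/>
  </tag>
  <!-- LOCATION is not itself an annotation here; it only names the submenu. -->
  <tag_group name="LOCATION" children="US-LOCATION,FOREIGN-LOCATION"/>
</tags>
```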