The basic unit of document enrichment in MAT is the annotation. There are two
types of annotations in MAT: span(ned) annotations and spanless
annotations. Span annotations are anchored to a
particular contiguous span in the document, and make some implicit
assertion about it; e.g., the span from character 10 to character
15 is a noun phrase. Spanless annotations are not anchored to a
span, and are used to make assertions about the entire document,
or to make assertions about other annotations; e.g., annotation 1
and annotation 2 refer to the same entity, or they stand in some
relation to each other.
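To make the span/spanless distinction concrete, here is a minimal Python sketch. The `Annotation` class, its fields, and the `GENRE` label are hypothetical illustrations, not MAT's actual Python API: a span annotation carries start and end character offsets, and a spanless one carries none.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Annotation:
    """Hypothetical annotation record (an illustration, not MAT's API)."""
    label: str
    start: Optional[int] = None  # offset of the first character; None if spanless
    end: Optional[int] = None    # offset just past the last character; None if spanless
    attrs: dict = field(default_factory=dict)

    @property
    def is_spanned(self) -> bool:
        # Anchored to a contiguous span only when both offsets are present.
        return self.start is not None and self.end is not None

    def covered_text(self, signal: str) -> str:
        # Only span annotations cover text; spanless ones make assertions
        # about the whole document or about other annotations.
        if not self.is_spanned:
            raise ValueError(f"{self.label} is spanless and covers no text")
        return signal[self.start:self.end]


signal = "Yesterday Susan left."
np_ann = Annotation("NP", start=10, end=15)           # span annotation: characters 10 to 15
genre = Annotation("GENRE", attrs={"value": "news"})  # hypothetical spanless, document-level

print(np_ann.covered_text(signal))  # Susan
print(genre.is_spanned)             # False
```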
Annotations have labels, which are strings. Annotations can also
have attributes, whose values are restricted to particular types.
Your task maintainer will define all your annotations, attributes,
and attribute value types and restrictions for you; each task
defines the annotations and attributes available for that task.
You'll learn more about tasks in a minute.
While you don't need to know most of the details of how
annotations are constructed or defined, you do need to
know that among the possible attribute value types are
"annotation" and "list or set of annotations"; in other words, the
way MAT implements relation, event and coreference annotation is
via attributes whose values are other annotations. The annotations
which "host" these annotation-valued attributes may be spanned or
spanless. You've already seen examples of some of these in tutorial 8, and you'll see UI
examples in more detail later. If you want all the
gory details, you can look here and here.
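The idea that relations and coreference are just attributes whose values are other annotations can be sketched as follows. Everything here is hypothetical: the class, the `ATTR_TYPES` declarations (standing in for what a task maintainer would define), and the `WORKS_FOR` and `COREF` labels. Only sets are modeled for the "list or set of annotations" type, to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass(eq=False)  # eq=False keeps identity hashing, so annotations can go in sets
class Annotation:
    """Hypothetical annotation record (an illustration, not MAT's API)."""
    label: str
    span: Optional[Tuple[int, int]] = None  # (start, end), or None if spanless
    attrs: dict = field(default_factory=dict)


# Hypothetical attribute declarations, standing in for the task definition.
ATTR_TYPES = {
    "WORKS_FOR": {"employee": "annotation", "employer": "annotation"},
    "COREF": {"mentions": "set of annotations"},
}


def check_attrs(ann: Annotation) -> None:
    """Raise if an attribute is undeclared or its value has the wrong type."""
    declared = ATTR_TYPES.get(ann.label, {})
    for name, value in ann.attrs.items():
        if name not in declared:
            raise KeyError(f"{ann.label} has no attribute {name!r}")
        kind = declared[name]
        if kind == "annotation" and not isinstance(value, Annotation):
            raise TypeError(f"{name} must be an annotation")
        if kind == "set of annotations" and not (
            isinstance(value, (set, frozenset))
            and all(isinstance(v, Annotation) for v in value)
        ):
            raise TypeError(f"{name} must be a set of annotations")


signal = "Bill works at MITRE. He likes it."
bill = Annotation("PERSON", (0, 4))
mitre = Annotation("ORGANIZATION", (14, 19))
he = Annotation("PERSON", (21, 23))

# Both hosts here are spanless: they make assertions about other annotations.
works_for = Annotation("WORKS_FOR", attrs={"employee": bill, "employer": mitre})
coref = Annotation("COREF", attrs={"mentions": {bill, he}})
for ann in (works_for, coref):
    check_attrs(ann)
```

A spanned host would work the same way; only the `span` field would differ.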
In most circumstances, the name that you'll be given or shown for
an annotation is the annotation's label; e.g., if you're applying
or looking at a PERSON annotation, the label of that annotation
will be "PERSON". However, in some cases, your task maintainer
will choose to make this notional label an effective label,
which corresponds to some combination of annotation label +
attribute value (e.g., ENAMEX type="PER"). As the annotator,
you'll have access to the actual label and attribute/value
information in the UI, but you'll be shown the effective label in
the relevant circumstances (e.g., when you're choosing an
annotation to create from your annotation menu in the UI).
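The relationship between an effective label and the true label plus attribute value it stands for can be pictured as a two-way lookup. This is a sketch under assumptions: the `EFFECTIVE_LABELS` table and both helper functions are hypothetical, not MAT's task configuration format.

```python
# Hypothetical task configuration (not MAT's actual task format):
# effective label -> (true label, attribute, value).
EFFECTIVE_LABELS = {
    "PERSON": ("ENAMEX", "type", "PER"),
    "ORGANIZATION": ("ENAMEX", "type", "ORG"),
    "LOCATION": ("ENAMEX", "type", "LOC"),
}


def to_true_label(effective: str) -> tuple:
    """Expand an effective label into the true label plus its defining attribute."""
    label, attr, value = EFFECTIVE_LABELS[effective]
    return label, {attr: value}


def to_effective_label(label: str, attrs: dict) -> str:
    """Return the name shown to the annotator for a true label and its attributes."""
    for effective, (true_label, attr, value) in EFFECTIVE_LABELS.items():
        if true_label == label and attrs.get(attr) == value:
            return effective
    # No effective label matches, so the annotator just sees the true label.
    return label


print(to_true_label("PERSON"))                        # ('ENAMEX', {'type': 'PER'})
print(to_effective_label("ENAMEX", {"type": "ORG"}))  # ORGANIZATION
print(to_effective_label("DATE", {}))                 # DATE
```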
MAT currently allows you to apply all of its tools to what we'll
call simple span annotations: span annotations which have
either no attributes, or a single attribute/value pair which
defines the effective label. The engine which comes with MAT, Carafe, can be used with such
annotations to perform the tag-a-little, learn-a-little loop, run
experiments, etc. For annotations of greater complexity (i.e.,
spanless annotations, or annotations with more attributes), Carafe
can't currently add them automatically; for an annotation set that
contains such annotations, Carafe can build and apply models only for
its simple span subset. This restriction affects the MAT processing
engine and experiment harness, and it means that all your more complex
attributes and annotations will have to be added by hand. For more
details about what MAT can and can't do with complex annotations, see here.
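The definition of a simple span annotation can be written as a predicate. The sketch below mirrors that definition only; the class, the `EFFECTIVE_LABEL_ATTR` table, and `automatable_subset` are hypothetical, and this is not how Carafe or MAT actually decides which annotations it can model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Annotation:
    """Hypothetical annotation record (an illustration, not MAT's API)."""
    label: str
    start: Optional[int] = None  # None for spanless annotations
    end: Optional[int] = None
    attrs: dict = field(default_factory=dict)


# Hypothetical: for each true label, the one attribute that defines its
# effective labels (e.g., ENAMEX type="PER" is shown as PERSON).
EFFECTIVE_LABEL_ATTR = {"ENAMEX": "type"}


def is_simple_span(ann: Annotation) -> bool:
    """Span annotation with no attributes, or only the effective-label attribute."""
    if ann.start is None or ann.end is None:
        return False  # spanless annotations are never simple span annotations
    if not ann.attrs:
        return True
    if len(ann.attrs) == 1:
        (name,) = ann.attrs  # the single attribute name
        return EFFECTIVE_LABEL_ATTR.get(ann.label) == name
    return False


def automatable_subset(anns: List[Annotation]) -> List[Annotation]:
    """Keep the annotations a simple-span engine could learn; the rest are hand-added."""
    return [a for a in anns if is_simple_span(a)]


doc = [
    Annotation("PERSON", 0, 4),                                       # no attributes
    Annotation("ENAMEX", 14, 19, {"type": "ORG"}),                    # effective label only
    Annotation("ENAMEX", 21, 23, {"type": "PER", "nomtype": "PRO"}),  # extra attribute
    Annotation("GENRE", attrs={"value": "news"}),                     # spanless
]
print([a.label for a in automatable_subset(doc)])  # ['PERSON', 'ENAMEX']
```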
There's one more concept you'll need. MAT divides the annotations
in a task into different categories. All the annotations you'll be
hand-adding or editing are called content annotations.
Other categories include:
If this documentation ever talks about annotations without
specifying a category, it's almost certainly talking about content
annotations. As a user, you really won't need to know much, if
anything, about the other categories.