Tasks, workflows, and steps

For the most part, you can't do anything substantial with the MAT toolkit without defining a task. A task is a set of activities, called workflows, which can be broken down into steps. Each task has a set of annotations that its activities share; each step in a task can participate in multiple workflows, and each step makes the "same" contribution to each workflow it participates in. All these concepts are interrelated, and it's difficult to discuss one without the others, but we'll try to describe them in the most sensible order.

Steps

Steps are atomic actions in your workflows. The most common type of step adds a category of annotation; e.g., a tokenization step adds token annotations. Each step has an implementation (either self-contained, or a wrapper for an external tool) which is a Python class. We provide a number of useful step implementations which you can use in your workflow. If you want to define your own steps, you'll have to consult the advanced topics.

Here are the step implementations, along with their common step names, that MAT provides "out of the box":

Step implementation name: MAT.PluginMgr.WholeZoneStep
Common step name: zone
Description: This step assigns a single zone annotation, with label "zone" and attribute "region_type" with value "body", to the entire document. This step also adds administrative SEGMENT annotations to track annotation progress. The options for this step are described below.

Step implementation name: MAT.JavaCarafe.CarafeTokenizationStep
Common step name: tokenize
Description: This step runs the Carafe tokenizer on the relevant document, generating token annotations with label "lex" in such a way that the zone boundaries are not crossed. The options for this step are described here.

Step implementation name: MAT.JavaCarafe.CarafeTagStep
Common step name: tag
Description: This step runs the Carafe tagger, adding content tags to the document. The options for this step are described here.

Step implementation name: MAT.PluginMgr.TagStep
Common step name: hand tag
Description: This step is the parent of all tag steps. It serves as a placeholder implementation for hand annotation in those workflows that do not have automated content tagging, and it implements "undo" for automated tag steps. Any steps with this implementation must be designated by_hand="yes" in the task.xml file. This step has no available options.

Step implementation name: MAT.PluginMgr.AlignStep
Common step name: align
Description: This step is intended to work with documents which have been imported from other formats (e.g., XML inline) and whose content annotations may not align with token boundaries. This step aligns the content annotation boundaries with the token boundaries by expanding the annotations to the nearest token boundaries. This alignment is expected by the UI annotation tool (and, in fact, by many trainable tagging engines, including Carafe). Insert a step with this implementation in any workflow which is intended to manage imported documents. This step has no available options.

See the sample 'Named Entity' task for a detailed example of how these steps are used in workflows.
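
The boundary expansion performed by the align step can be illustrated with a minimal sketch. This is not MAT's implementation: the function name align_to_tokens and the representation of annotations and tokens as (start, end) character offsets are assumptions made for illustration. The sketch also assumes that "nearest token boundaries" means the outer edges of the tokens that the annotation overlaps.

```python
def align_to_tokens(span, tokens):
    """Snap a (start, end) span to the tokens it overlaps.

    `tokens` is a list of non-overlapping (start, end) offsets, sorted by
    start. Hypothetical helper; MAT's AlignStep may differ in detail.
    """
    start, end = span
    # A token overlaps the span if the two ranges share at least one character.
    overlapping = [(s, e) for (s, e) in tokens if s < end and e > start]
    if not overlapping:
        # Nothing to align against; leave the span untouched.
        return span
    # Use the start of the first overlapping token and the end of the last.
    return (overlapping[0][0], overlapping[-1][1])


# Tokens for the text "John Smith visited".
tokens = [(0, 4), (5, 10), (11, 18)]
# An imported annotation covering "hn Smi" grows to cover "John Smith".
aligned = align_to_tokens((2, 8), tokens)
```

An annotation that already sits on token boundaries is returned unchanged, so under these assumptions it is harmless to run such a step on documents that are already aligned.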

The options these step implementations accept can be specified in the task.xml file or in the invocation of the MAT engine. The one general-purpose step which has options is MAT.PluginMgr.WholeZoneStep:

Command line option: --mark_gold
XML attribute: mark_gold
Value: "yes" (XML)
Description: If present, mark the document segments as gold-standard data (annotator = "GOLD_STANDARD", status = "reconciled").

The UI also makes available a separate "mark gold" step, which has no backend implementation.
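
To make the effect of the zone step and its mark_gold option concrete, here is a minimal sketch. The function whole_zone and the dictionary layout are hypothetical, and the annotator and status values used when mark_gold is absent are placeholders. Only the "zone" label, the "region_type" attribute with value "body", the SEGMENT annotation, and the gold-standard values are taken from the description above.

```python
def whole_zone(text, mark_gold=False):
    """Return a zone and a SEGMENT annotation spanning the whole document.

    Illustrative only; this is not the MAT.PluginMgr.WholeZoneStep code.
    """
    span = {"start": 0, "end": len(text)}
    zone = {"label": "zone", "region_type": "body", **span}
    segment = {"label": "SEGMENT", **span}
    if mark_gold:
        # Values documented for the mark_gold option.
        segment["annotator"] = "GOLD_STANDARD"
        segment["status"] = "reconciled"
    else:
        # Placeholder values, assumed for this sketch.
        segment["annotator"] = None
        segment["status"] = "non-gold"
    return [zone, segment]


zone, segment = whole_zone("Hello world", mark_gold=True)
```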

Workflows

Once you have a set of step implementations to draw from, you can create mnemonic names for them and assemble them into workflows. Four extremely common and obvious workflows are

If you create other, custom steps, you may have other workflows.

One quirk of the mnemonic names for steps is that they're global to the task. The implementation of, say, "tokenize" can differ from workflow to workflow, but when you apply different workflows to a document, the document knows what's already been done by virtue of the named steps that have been applied. So it's not a good idea for the effect of step implementations to differ among workflows. The implementations can provide different methods for achieving the same effect (e.g., different automated taggers, or hand vs. automated tagging), but they should not vary any further; the tags which are added by any implementation of a named step should be the same.
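
The bookkeeping described here can be sketched as follows. This is a hypothetical model, not MAT's engine: the workflow names, the Document class, and the run_workflow function are invented for illustration. It shows how a document that records step names lets a second workflow skip the steps already done under the first, which is why every implementation of a named step should have the same effect.

```python
from dataclasses import dataclass, field

# Hypothetical workflows: each maps task-global step names, in order,
# to the implementation that workflow uses for the step.
WORKFLOWS = {
    "automated": {
        "zone": "MAT.PluginMgr.WholeZoneStep",
        "tokenize": "MAT.JavaCarafe.CarafeTokenizationStep",
        "tag": "MAT.JavaCarafe.CarafeTagStep",
    },
    "by hand": {
        "zone": "MAT.PluginMgr.WholeZoneStep",
        "tokenize": "MAT.JavaCarafe.CarafeTokenizationStep",
        "tag": "MAT.PluginMgr.TagStep",
    },
}


@dataclass
class Document:
    text: str
    steps_done: list = field(default_factory=list)


def run_workflow(doc, workflow_name):
    """Apply the steps of a workflow that the document has not yet recorded."""
    applied = []
    for step, implementation in WORKFLOWS[workflow_name].items():
        if step in doc.steps_done:
            # The document only remembers the step name, not which
            # implementation produced it.
            continue
        # A real engine would invoke the implementation here.
        doc.steps_done.append(step)
        applied.append((step, implementation))
    return applied


doc = Document("John Smith visited")
first = run_workflow(doc, "by hand")
second = run_workflow(doc, "automated")
```

In this sketch the second call applies nothing, because "tag" was already recorded when the document was tagged by hand. If the two "tag" implementations added different kinds of tags, the document would be treated as tagged under both workflows even though its contents matched only one of them.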

Other things you may find in tasks

In general, tasks provide a customization bundle for your use of MAT. In this document, we've described two of the most prominent customizations: defining steps and defining workflows (we've discussed annotations elsewhere). There are many other things you can customize:

For relevant examples of these, please consult "The sample tasks", "Creating a new task", "Creating a new demo", and the documentation for the task XML and the demo XML.