Tasks, workflows, and steps

For the most part, you can't do anything substantial with the MAT toolkit without defining a task. A task is a set of activities, called workflows, which can be broken down into steps. Each task has a set of annotations that its activities share; each step in a task can participate in multiple workflows, and each step makes the "same" contribution to each workflow it participates in. All these concepts are interrelated, and it's difficult to discuss one without the others, but we'll try to describe them in the most sensible order.

Steps

Steps are atomic actions in your workflows. The most common type of step adds a category of annotation; e.g., a tokenization step adds token annotations. Each step has an implementation (either self-contained, or a wrapper for an external tool) which is a Python class. We provide a number of useful step implementations which you can use in your workflow. If you want to define your own steps, you'll have to consult the advanced topics.

Here are the step implementations, along with their common step names, that MAT provides "out of the box":

Step implementation name	common step name	Description
MAT.PluginMgr.WholeZoneStep	zone	This step assigns a single zone annotation with label "zone" and attribute "region_type" with value "body", to the entire document. This step also adds administrative SEGMENT annotations to track annotation progress. The options for this step are described immediately below.
MAT.JavaCarafe.CarafeTokenizationStep	tokenize	This step runs the Carafe tokenizer on the relevant document, generating token annotations with label "lex" in such a way that the zone boundaries are not crossed. The options for this step are described here.
MAT.JavaCarafe.CarafeTagStep	tag	This step runs the Carafe tagger, adding content tags to the document. The options for this step are described here.
MAT.PluginMgr.TagStep	hand tag	This step is the parent of all tag steps. It serves as a placeholder implementation for hand annotation in those workflows that do not have automated content tagging, and to implement "undo" in automated tag steps. Any steps with this implementation must be designated by_hand="yes" in the task.xml file. This step has no available options.
MAT.PluginMgr.AlignStep	align	This step is intended to work with documents which have been imported from other formats (e.g., XML inline), and whose content annotations may not align with token boundaries. This step aligns the content annotation boundaries with the token boundaries by expanding the annotations to the nearest token boundaries. This alignment is expected in the UI annotation tool (and, in fact, by many trainable tagging engines, including Carafe). Insert a step with this implementation in those workflows which are intended to manage imported documents. This step has no available options.

See the sample 'Named Entity' task for a detailed example of how these steps are used in workflows.
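
To make the behavior of the align step more concrete, here is a minimal sketch of the boundary expansion it is described as performing. This is not the MAT implementation: the function name align_to_tokens and the representation of tokens and annotations as (start, end) character offsets are assumptions made for illustration only.

```python
from bisect import bisect_left, bisect_right

def align_to_tokens(annotations, tokens):
    """Expand (start, end) spans outward to the enclosing token boundaries.

    Hypothetical helper, not part of MAT. `tokens` must be a sorted list of
    non-overlapping (start, end) character offsets.
    """
    starts = [s for s, _ in tokens]
    ends = [e for _, e in tokens]
    aligned = []
    for a_start, a_end in annotations:
        # Find the last token that starts at or before the annotation start.
        i = bisect_right(starts, a_start) - 1
        # If the start falls inside that token, move it left to the token start.
        if i >= 0 and a_start < ends[i]:
            new_start = starts[i]
        else:
            new_start = a_start
        # Find the first token that ends at or after the annotation end.
        j = bisect_left(ends, a_end)
        # If the end falls inside that token, move it right to the token end.
        if j < len(tokens) and a_end > starts[j]:
            new_end = ends[j]
        else:
            new_end = a_end
        aligned.append((new_start, new_end))
    return aligned

# Tokens for the text "John Smith lives in Boston".
tokens = [(0, 4), (5, 10), (11, 16), (17, 19), (20, 26)]
# Two imported annotations that cut through the middle of tokens.
print(align_to_tokens([(2, 8), (20, 24)], tokens))  # [(0, 10), (20, 26)]
```

In this sketch, an offset that falls in the whitespace between two tokens is left where it is; how the real step treats that case is not covered here.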

The options these step implementations accept can be specified in the task.xml file or in the invocation of the MAT engine. The one general-purpose step which has options is MAT.PluginMgr.WholeZoneStep:

Command line option	XML attribute	Value	Description
--mark_gold	mark_gold	"yes" (XML)	If present, mark the document segments as gold-standard data (annotator = "GOLD_STANDARD", status = "reconciled")

The UI also makes available a separate "mark gold" step, which has no backend implementation.
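
As a rough illustration of what the zone step and its mark_gold option are described as doing, consider the sketch below. It is not MAT code: the Annotation class and the whole_zone function are hypothetical stand-ins, and only the labels and attribute values ("zone", "region_type", "body", SEGMENT, "GOLD_STANDARD", "reconciled") come from the descriptions above.

```python
from dataclasses import dataclass, field

@dataclass
class Annotation:
    """Hypothetical stand-in for an annotation: a label, a span, and attributes."""
    label: str
    start: int
    end: int
    attrs: dict = field(default_factory=dict)

def whole_zone(text, mark_gold=False):
    """Sketch of a whole-document zone step (assumed shape, not the MAT API)."""
    # A single zone covering the entire document.
    zone = Annotation("zone", 0, len(text), {"region_type": "body"})
    # An administrative segment used to track annotation progress. The default
    # attribute values the real step assigns are not modeled here.
    segment = Annotation("SEGMENT", 0, len(text))
    if mark_gold:
        # Corresponds to --mark_gold on the command line, or mark_gold="yes" in XML.
        segment.attrs.update(annotator="GOLD_STANDARD", status="reconciled")
    return [zone, segment]

for ann in whole_zone("John Smith lives in Boston.", mark_gold=True):
    print(ann)
```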

Workflows

Once you have a set of step implementations to draw from, you can create mnemonic names for them and assemble them into workflows. Four extremely common and obvious workflows are:

a workflow that zones a document, tokenizes it, and then automatically tags it (called "Demo" in the "Named Entity" task)
a workflow which zones a document, tokenizes it, and then provides a placeholder for human annotation (called "Hand annotation" in the "Named Entity" task)
a workflow for hand annotation without tokenization (called "Tokenless hand annotation" in the "Named Entity" task)
a workflow which has no steps, but which allows a final phase of human review and hand-correction (called "Review/repair" in the "Named Entity" task)

If you create other, custom steps, you may have other workflows.

One quirk of the mnemonic names for steps is that they're global to the task. The implementation of, say, "tokenize" can differ from workflow to workflow, but when you apply different workflows to a document, the document knows what's already been done by virtue of the named steps that have been applied. So it's not a good idea for the effect of step implementations to differ among workflows. The implementations can provide different methods for achieving the same effect (e.g., different automated taggers, or hand vs. automated tagging), but they should not vary any further; the tags which are added by any implementation of a named step should be the same.
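
The following sketch shows why task-global step names matter. It is purely illustrative: the WORKFLOWS table and the remaining_steps function are hypothetical, and the step lists are our reading of the workflow descriptions above (the steps of "Tokenless hand annotation" in particular are an assumption), not a copy of the sample task's configuration.

```python
# Hypothetical workflow table: workflow name -> ordered list of step names.
WORKFLOWS = {
    "Demo": ["zone", "tokenize", "tag"],
    "Hand annotation": ["zone", "tokenize", "hand tag"],
    "Tokenless hand annotation": ["zone", "hand tag"],  # assumed step list
    "Review/repair": [],
}

def remaining_steps(workflow_name, steps_done):
    """Return the steps of a workflow that a document has not yet undergone.

    `steps_done` is the set of step names already recorded on the document.
    Because step names are global to the task, a step recorded under one
    workflow counts as done under every other workflow that uses that name.
    """
    return [s for s in WORKFLOWS[workflow_name] if s not in steps_done]

# A document that was zoned and tokenized under the "Demo" workflow...
steps_done = {"zone", "tokenize"}
# ...needs only the final step when a different workflow is applied to it.
print(remaining_steps("Demo", steps_done))             # ['tag']
print(remaining_steps("Hand annotation", steps_done))  # ['hand tag']
print(remaining_steps("Review/repair", steps_done))    # []
```

If the "tokenize" step in one workflow produced different annotations than the "tokenize" step in another, the second workflow would skip tokenization here and then operate on tokens it did not expect. This is the reason the effect of a named step should be the same everywhere it is used.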

Other things you may find in tasks

In general, tasks provide a customization bundle for your use of MAT. In this document, we've described two of the most prominent customizations: defining steps and defining workflows (we've discussed annotations elsewhere). There are many other things you can customize:

You can provide specialized Javascript and CSS code for the MAT UI. These customizations are very complex, and we will leave them undocumented in this release.
You can specify default settings for building models in the task.xml file. These settings are essentially the flags described in the documentation for MATModelBuilder.
You can declare settings for the operations in your workspaces in the task.xml file. These settings are essentially the flags described in the documentation for MATEngine and MATModelBuilder (depending on which one the operation uses).
You can customize how annotations are compared for scoring and comparison.
You can define custom steps in Python, and refer to them in the task.xml file. For hints on how to do this, see the section "Creating Your Own Steps" in the advanced task customization docs.
You can define new workspace folders, and refer to them and customize their behavior in the task.xml file. These customizations are still evolving, and we will leave them undocumented in this release.
You can customize the documentation that's visible via the MAT UI. These customizations are very complex, and we will leave them undocumented in this release.
You can create a demo.xml file which uses your task to field a Web demo.

For relevant examples of these, please consult "The sample tasks", "Creating a new task", "Creating a new demo", and the documentation for the task XML and the demo XML.