Creating a New Task

If you haven't received an already-customized version of MAT, and you want to do something besides the default named entity task, you're going to want to define your own task. This document describes how to do that for simple tasks.

Step 1: Set up your directory

Create a directory. This directory might ultimately have various subdirectories; for instance, custom Python code must live in files in a python/ subdirectory, and custom JavaScript code should live in files in a js/ subdirectory. But you don't need to know about those right now.

Step 2: Create a task.xml file

Most of what we're going to talk about in this document is the task.xml file. You can get a better idea of what this file consists of by looking at the task documentation, the documentation on the sample task, and the documentation on the task XML itself.

For now, just create an empty file named task.xml and save it in the directory you created in step 1.

Step 3: Name your task and set up your template

Create a top-level <task> element, and give your task a name:

<task name="Widget Annotation">
  <tags>
  </tags>
  <workflows>
  </workflows>
</task>

If you had a need for customizing the task class, you'd add a class attribute to the task; one reason you might do this is to add a new folder to a workspace. See the advanced documentation for a hint.

For historical reasons, the three subelements of <task> shown here are the obligatory elements. We've set up <tags> and <workflows> with separate closing tags because you'll almost certainly be populating these elements.
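Before going further, it can help to confirm that the skeleton is well-formed XML. The following is a minimal sketch using only the Python standard library; it is not part of MAT, and the function name check_skeleton is made up for illustration. It checks only the name attribute and the two populated elements shown in the skeleton above.

```python
import xml.etree.ElementTree as ET

# Skeleton from this step; in practice you would read your own task.xml.
TASK_XML = """<task name="Widget Annotation">
  <tags>
  </tags>
  <workflows>
  </workflows>
</task>"""


def check_skeleton(xml_text):
    """Return a list of problems found in a task.xml skeleton (empty if none)."""
    # ET.fromstring raises xml.etree.ElementTree.ParseError on malformed XML.
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag != "task":
        problems.append("top-level element is <%s>, not <task>" % root.tag)
    if not root.get("name"):
        problems.append("<task> has no name attribute")
    for child in ("tags", "workflows"):
        if root.find(child) is None:
            problems.append("missing <%s> element" % child)
    return problems


print(check_skeleton(TASK_XML))  # prints []
```

An empty list means the skeleton parsed and has the expected shape; it says nothing about whether MAT itself will accept the file.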

Step 4: Declare your annotations

Let's assume that you're going to use the default MAT automated tools (tagger, trainer, tokenizer). Then you'll want to inherit the structural annotations from the core task, and define your content annotations:

  <tags inherit_structure="yes">
    <tag name="TAG1" category="content">
      <ui css="background-color: blue"/>
    </tag>
    <tag name="TAG2" category="content">
      <ui css="background-color: red"/>
    </tag>
  </tags>

You can customize your annotation declaration in a number of ways. See the task XML use cases for examples.
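As a quick check on the declaration above, the sketch below lists the content tags and the CSS attached to each. It relies only on the Python standard library and on the attribute names that appear in the snippet (name, category, css); the helper name content_tags is hypothetical and not part of MAT.

```python
import xml.etree.ElementTree as ET

# The <tags> block from this step, copied verbatim.
TAGS_XML = """<tags inherit_structure="yes">
  <tag name="TAG1" category="content">
    <ui css="background-color: blue"/>
  </tag>
  <tag name="TAG2" category="content">
    <ui css="background-color: red"/>
  </tag>
</tags>"""


def content_tags(xml_text):
    """Map each content tag name to its display CSS (None if it has no <ui>)."""
    tags = ET.fromstring(xml_text)
    result = {}
    for tag in tags.findall("tag"):
        if tag.get("category") != "content":
            continue  # skip anything that is not a content annotation
        ui = tag.find("ui")
        result[tag.get("name")] = ui.get("css") if ui is not None else None
    return result


for name, css in content_tags(TAGS_XML).items():
    print(name, "->", css)
```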

Step 5: Declare your model build settings

Another top-level element in <task> holds the settings that specify how the model is built.

  <model_config class="MAT.JavaCarafe.CarafeModelBuilder"/> 
  <default_model>default_model</default_model>

This uses the Carafe engine with the default feature specification that Carafe provides, and saves default models to the file "default_model" in your task directory. If you want to use the faster, but possibly less-well-performing (and slightly less reliable) PSA training method, use this model_config specification:

  <model_config class="MAT.JavaCarafe.CarafeModelBuilder">
    <build_settings training_method="psa" max_iterations="6"/>
  </model_config>

This specifies 6 iterations of periodic stepsize adjustment.

There are lots of ways of customizing the Carafe model builder. See MATModelBuilder and the Carafe engine documentation for more details about these settings.

There are some circumstances under which you don't need to configure model building; e.g., if you only intend to use the MAT tool for hand annotation or to score documents.

Step 6: Define your steps and workflows

Other top-level elements in <task> describe the workflows and steps you'll use. Right now, MAT is somewhat limited in its default steps and flexibility: without customization, only a limited number of steps are available, and they can be organized into only a limited range of workflows. We believe that these steps and workflows are sufficient for the most common tasks a user might have; unfortunately, at the moment it's quite difficult to describe in any great detail how to extend these options. See the advanced customization notes for what's available.

We recommend the following workflow and step blocks, as described in the sample task.

  <workflows>
    <workflow name="Hand annotation">
      <step name="zone"/>
      <step name="tokenize"/>
      <step name="tag" pretty_name="hand tag" by_hand="yes"/>
    </workflow>
    <workflow name="Review/repair" hand_annotation_available_at_end="yes"/>
    <workflow name="Demo" hand_annotation_available_at_end="yes">
      <step name="zone"/>
      <step name="tokenize"/>
      <step name="tag"/>
    </workflow>
    <workflow name="Align">
      <step name="zone"/>
      <step name="tokenize"/>
      <step name="align"/>
    </workflow>
  </workflows>
  <step_implementations>
    <step name="tokenize" class="MAT.JavaCarafe.CarafeTokenizationStep"/>
    <step name="zone" class="MAT.PluginMgr.WholeZoneStep"/>
    <step name="align" class="MAT.PluginMgr.AlignStep"/>
    <step name="tag" tagging_step="yes" workflows="Demo" class="MAT.JavaCarafe.CarafeTagStep"/>
    <!-- for undo -->
    <step name="tag" tagging_step="yes" class="MAT.PluginMgr.TagStep"/>
  </step_implementations>
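Because workflows refer to steps by name, a misspelled step name is an easy mistake to make. The sketch below, which is not part of MAT, reports every workflow step that has no matching entry in <step_implementations>. It assumes that an implementation with no workflows attribute applies to every workflow, and that the workflows attribute holds a comma-separated list of workflow names; both are guesses based on the block above. The XML is abbreviated and wrapped in <task> so that it parses as one document.

```python
import xml.etree.ElementTree as ET

TASK_XML = """<task name="Widget Annotation">
  <workflows>
    <workflow name="Hand annotation">
      <step name="zone"/>
      <step name="tokenize"/>
      <step name="tag" pretty_name="hand tag" by_hand="yes"/>
    </workflow>
    <workflow name="Demo" hand_annotation_available_at_end="yes">
      <step name="zone"/>
      <step name="tokenize"/>
      <step name="tag"/>
    </workflow>
  </workflows>
  <step_implementations>
    <step name="tokenize" class="MAT.JavaCarafe.CarafeTokenizationStep"/>
    <step name="zone" class="MAT.PluginMgr.WholeZoneStep"/>
    <step name="tag" tagging_step="yes" workflows="Demo" class="MAT.JavaCarafe.CarafeTagStep"/>
    <step name="tag" tagging_step="yes" class="MAT.PluginMgr.TagStep"/>
  </step_implementations>
</task>"""


def applies_to(impl, workflow_name):
    """Assumed rule: no workflows attribute means 'all workflows'."""
    scope = impl.get("workflows")
    if scope is None:
        return True
    return workflow_name in [name.strip() for name in scope.split(",")]


def unimplemented_steps(xml_text):
    """Return (workflow, step) pairs that have no usable implementation."""
    root = ET.fromstring(xml_text)
    impls = root.findall("step_implementations/step")
    missing = []
    for wf in root.findall("workflows/workflow"):
        for step in wf.findall("step"):
            found = any(
                impl.get("name") == step.get("name")
                and applies_to(impl, wf.get("name"))
                for impl in impls
            )
            if not found:
                missing.append((wf.get("name"), step.get("name")))
    return missing


print(unimplemented_steps(TASK_XML))  # prints []
```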

Step 7 (optional): Define your workspace implementations

If you intend to use workspace mode, you should also define your workspace implementations. The workspace block that corresponds to the workflows and steps described immediately above looks like this:

  <workspace>
    <operation name="autotag">
      <settings workflow="Demo" steps="zone,tokenize,tag"/>
    </operation>
    <operation name="modelbuild">
      <settings/>
    </operation>
    <operation name="tagprep">
      <settings workflow="Hand annotation" steps="zone,tokenize"/>
    </operation>
  </workspace>
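Each workspace operation names a workflow and a comma-separated list of steps, so the workspace block has to stay in sync with the workflows block. The following standard-library sketch, which is not part of MAT, checks that every named workflow exists and that every listed step belongs to it. The helper name check_operations is hypothetical, and operations whose <settings> name no workflow (such as modelbuild above) are simply skipped.

```python
import xml.etree.ElementTree as ET

# Abbreviated workflows plus the workspace block from this step.
TASK_XML = """<task name="Widget Annotation">
  <workflows>
    <workflow name="Hand annotation">
      <step name="zone"/>
      <step name="tokenize"/>
      <step name="tag" pretty_name="hand tag" by_hand="yes"/>
    </workflow>
    <workflow name="Demo" hand_annotation_available_at_end="yes">
      <step name="zone"/>
      <step name="tokenize"/>
      <step name="tag"/>
    </workflow>
  </workflows>
  <workspace>
    <operation name="autotag">
      <settings workflow="Demo" steps="zone,tokenize,tag"/>
    </operation>
    <operation name="modelbuild">
      <settings/>
    </operation>
    <operation name="tagprep">
      <settings workflow="Hand annotation" steps="zone,tokenize"/>
    </operation>
  </workspace>
</task>"""


def check_operations(xml_text):
    """Return a list of mismatches between <workspace> and <workflows>."""
    root = ET.fromstring(xml_text)
    workflows = {
        wf.get("name"): [s.get("name") for s in wf.findall("step")]
        for wf in root.findall("workflows/workflow")
    }
    problems = []
    for op in root.findall("workspace/operation"):
        settings = op.find("settings")
        if settings is None or settings.get("workflow") is None:
            continue  # nothing to cross-check for this operation
        wf_name = settings.get("workflow")
        if wf_name not in workflows:
            problems.append("%s: unknown workflow %r" % (op.get("name"), wf_name))
            continue
        for step in (settings.get("steps") or "").split(","):
            step = step.strip()
            if step and step not in workflows[wf_name]:
                problems.append(
                    "%s: step %r is not in workflow %r"
                    % (op.get("name"), step, wf_name)
                )
    return problems


print(check_operations(TASK_XML))  # prints []
```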

Step 8: Install the task

Use the MATManagePluginDirs tool to ensure that MAT knows about your task directory. If <dir> is your task directory:

Unix:

% $MAT_PKG_HOME/bin/MATManagePluginDirs install <dir>

Windows native:

> %MAT_PKG_HOME%\bin\MATManagePluginDirs.cmd install <dir>
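
If you install tasks from a script, the two command lines above can be built from one helper. This is only a sketch: install_command is a hypothetical name, and the only facts it relies on are the tool paths and the install subcommand shown above.

```python
import ntpath
import posixpath


def install_command(mat_pkg_home, task_dir, windows=False):
    """Build the MATManagePluginDirs argument list for the given platform."""
    if windows:
        # Windows native: the tool is a .cmd script under bin\
        tool = ntpath.join(mat_pkg_home, "bin", "MATManagePluginDirs.cmd")
    else:
        # Unix: the tool is an executable under bin/
        tool = posixpath.join(mat_pkg_home, "bin", "MATManagePluginDirs")
    return [tool, "install", task_dir]


print(install_command("/opt/mat", "/home/me/widget_task"))
```

The resulting list can be passed to subprocess.run, with mat_pkg_home taken from the MAT_PKG_HOME environment variable; the paths in the example call are placeholders.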