Creating a New Task

If you haven't received an already-customized version of MAT, and you want to do something besides the default named entity task, you're going to want to define your own task. This document describes how to do that for simple tasks.

Step 1: Set up your directory

Create a directory for your task. It may eventually contain various subdirectories; for instance, custom Python code must live in files in a python/ subdirectory, and custom JavaScript code should live in files in a js/ subdirectory. You don't need to worry about those right now.

Step 2: Create a task.xml file

Most of what we're going to talk about in this document is the task.xml file. You can get a better idea of what this file consists of by looking at the task documentation, the documentation on the sample 'Named Entity' task, and the documentation on the task XML and annotation set descriptor XML itself.

For now, just create an empty file named task.xml and save it in the directory you created in Step 1.

Step 3: Name your task and set up your template

Create a top-level <task> element, give your task a name, and add the template elements shown here:

<task name="Widget Annotation">
  <annotation_set_descriptors>
    <annotation_set_descriptor name="content" category="content">
    </annotation_set_descriptor>
  </annotation_set_descriptors>
  <annotation_display>
  </annotation_display>
  <workflows>
  </workflows>
</task>

If you need to customize the task class, add a class attribute to the <task> element; one reason you might do this is to add a new folder to a workspace. See the advanced documentation for a hint.

For historical reasons, the <workflows> element shown here is obligatory. We've added <annotation_set_descriptors> because you're definitely going to be defining them. Notice that the descriptor has the name and category attributes both set to "content"; at the moment, these are the only settings you should use when declaring your annotations. We've also added <annotation_display>, because you're going to want that too.
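Before going further, it can be handy to confirm that the skeleton is well-formed XML and contains the pieces mentioned above. The following is a minimal sketch using Python's standard xml.etree.ElementTree module. It is not part of MAT; the helper name check_task_skeleton and its messages are made up for this illustration, and it checks only the points this step calls out.

```python
import xml.etree.ElementTree as ET

# The skeleton from this step, inlined so the sketch is self-contained.
# In practice you would load your own file: ET.parse("task.xml").getroot().
SKELETON = """
<task name="Widget Annotation">
  <annotation_set_descriptors>
    <annotation_set_descriptor name="content" category="content">
    </annotation_set_descriptor>
  </annotation_set_descriptors>
  <annotation_display>
  </annotation_display>
  <workflows>
  </workflows>
</task>
"""

def check_task_skeleton(xml_text):
    """Return a list of problems found in a task skeleton (empty if none).

    Hypothetical helper, not a MAT API.
    """
    problems = []
    # fromstring raises xml.etree.ElementTree.ParseError if the text
    # is not well-formed XML.
    root = ET.fromstring(xml_text)
    if root.tag != "task":
        problems.append("top-level element is %s, not task" % root.tag)
    if not root.get("name"):
        problems.append("task element has no name attribute")
    if root.find("workflows") is None:
        problems.append("missing obligatory workflows element")
    path = "annotation_set_descriptors/annotation_set_descriptor"
    for desc in root.findall(path):
        if desc.get("name") != "content" or desc.get("category") != "content":
            problems.append("descriptor name and category should both be content")
    return problems

print(check_task_skeleton(SKELETON))
```

For the skeleton shown above, the list printed is empty.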

Step 4: Declare your annotations

Let's assume that you're going to use the default MAT automated tools (tagger, trainer, tokenizer). Then you'll want to inherit the zone and token annotations from the core task:

  <annotation_set_descriptors inherit="category:zone,category:token">
    ...
  </annotation_set_descriptors>

Next, you should define your labels:

  <annotation_set_descriptors inherit="category:zone,category:token">
    <annotation_set_descriptor name="content" category="content">
      <annotation label="TAG1"/>
      <annotation label="TAG2"/>
    </annotation_set_descriptor>
  </annotation_set_descriptors>

You can customize your annotation declaration in a number of ways. See the annotation set descriptor XML use cases for examples.

Next, you'll want to tell the UI how to display your annotations, using <annotation_display>:

  <annotation_display>
    <label name="TAG1" accelerator="1" css="background-color: blue"/>
    <label name="TAG2" accelerator="2" css="background-color: green"/>
  </annotation_display>

Examples of customizing your annotation display can be found in the task XML use cases.
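Because the label names appear in two places (the annotation declarations and the display block), a typo in one of them is easy to miss. The sketch below compares the two sets using Python's standard xml.etree.ElementTree module. It is an illustration only, not a MAT tool; the function name compare_labels is hypothetical, and whether MAT itself requires every declared label to have a display entry is not something this document states.

```python
import xml.etree.ElementTree as ET

# The declarations and display block from this step, wrapped in a <task>
# element so the sketch is self-contained.
TASK_FRAGMENT = """
<task name="Widget Annotation">
  <annotation_set_descriptors inherit="category:zone,category:token">
    <annotation_set_descriptor name="content" category="content">
      <annotation label="TAG1"/>
      <annotation label="TAG2"/>
    </annotation_set_descriptor>
  </annotation_set_descriptors>
  <annotation_display>
    <label name="TAG1" accelerator="1" css="background-color: blue"/>
    <label name="TAG2" accelerator="2" css="background-color: green"/>
  </annotation_display>
</task>
"""

def compare_labels(xml_text):
    """Return (declared but not displayed, displayed but not declared).

    Hypothetical helper, not a MAT API. Both results are sorted lists.
    """
    root = ET.fromstring(xml_text)
    # Labels declared in any <annotation> element.
    declared = {a.get("label") for a in root.iter("annotation")}
    # Labels given display behavior in <annotation_display>.
    displayed = {l.get("name") for l in root.findall("annotation_display/label")}
    return sorted(declared - displayed), sorted(displayed - declared)

not_displayed, not_declared = compare_labels(TASK_FRAGMENT)
print("declared but not displayed:", not_displayed)
print("displayed but not declared:", not_declared)
```

Both lists are empty for the fragment above; a misspelled name in either block would show up in one list or the other.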

Step 5: Declare your model build settings

Further top-level elements in the <task> specify how the model is built and where the default model is saved:

  <model_config class="MAT.JavaCarafe.CarafeModelBuilder"/>
  <default_model>default_model</default_model>

This uses the Carafe engine with the default feature specification that Carafe provides, and saves default models to the file "default_model" in your task directory. If you want to use the periodic stepsize adjustment (PSA) training method, which is faster but may perform less well and is slightly less reliable, use this <model_config> specification instead:

  <model_config class="MAT.JavaCarafe.CarafeModelBuilder">
    <build_settings training_method="psa" max_iterations="6"/>
  </model_config>

This specifies 6 iterations of periodic stepsize adjustment.

There are lots of ways of customizing the Carafe model builder. See MATModelBuilder and the Carafe engine documentation for more details about these settings.

There are some circumstances under which you don't need to configure model building; e.g., if you only intend to use the MAT tool for hand annotation or to score documents.

Step 6: Define your steps and workflows

Other top-level elements in the <task> describe the workflows and steps you'll use. MAT is currently somewhat limited in its default steps and flexibility: without customization, only a limited number of steps are available, and they can be organized into only a limited range of workflows. The MAT documentation provides a summary of the available steps and workflows, along with additional details. We believe that these steps and workflows are sufficient for the most common tasks a user might have; unfortunately, it is currently quite difficult to describe in any great detail how to extend these options. See the advanced customization notes for what's available.

We recommend the following workflow and step blocks, as described in the sample 'Named Entity' task.

  <workflows>
    <workflow name="Hand annotation">
      <step name="zone"/>
      <step name="tokenize"/>
      <step name="tag" pretty_name="hand tag" by_hand="yes"/>
    </workflow>
    <workflow name="Review/repair" hand_annotation_available_at_end="yes"/>
    <workflow name="Demo" hand_annotation_available_at_end="yes">
      <step name="zone"/>
      <step name="tokenize"/>
      <step name="tag"/>
    </workflow>
    <workflow name="Align">
      <step name="zone"/>
      <step name="tokenize"/>
      <step name="align"/>
    </workflow>
  </workflows>
  <step_implementations>
    <step name="tokenize" class="MAT.JavaCarafe.CarafeTokenizationStep"/>
    <step name="zone" class="MAT.PluginMgr.WholeZoneStep"/>
    <step name="align" class="MAT.PluginMgr.AlignStep"/>
    <step name="tag" workflows="Demo" class="MAT.JavaCarafe.CarafeTagStep"/>
    <!-- for undo -->
    <step name="tag" class="MAT.PluginMgr.TagStep"/>
  </step_implementations>
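Every step named in a workflow needs a matching entry in the step implementations. The sketch below checks that for the blocks above, using Python's standard xml.etree.ElementTree module. It is an illustration, not a MAT tool, and the function name unimplemented_steps is hypothetical. It also rests on an assumption about the workflows attribute: that it holds a comma-separated list of workflow names and restricts the implementation to those workflows, while an implementation without the attribute applies to any workflow.

```python
import xml.etree.ElementTree as ET

# The workflow and step blocks from this step, wrapped in a <task> element
# so the sketch is self-contained.
TASK_FRAGMENT = """
<task name="Widget Annotation">
  <workflows>
    <workflow name="Hand annotation">
      <step name="zone"/>
      <step name="tokenize"/>
      <step name="tag" pretty_name="hand tag" by_hand="yes"/>
    </workflow>
    <workflow name="Review/repair" hand_annotation_available_at_end="yes"/>
    <workflow name="Demo" hand_annotation_available_at_end="yes">
      <step name="zone"/>
      <step name="tokenize"/>
      <step name="tag"/>
    </workflow>
    <workflow name="Align">
      <step name="zone"/>
      <step name="tokenize"/>
      <step name="align"/>
    </workflow>
  </workflows>
  <step_implementations>
    <step name="tokenize" class="MAT.JavaCarafe.CarafeTokenizationStep"/>
    <step name="zone" class="MAT.PluginMgr.WholeZoneStep"/>
    <step name="align" class="MAT.PluginMgr.AlignStep"/>
    <step name="tag" workflows="Demo" class="MAT.JavaCarafe.CarafeTagStep"/>
    <!-- for undo -->
    <step name="tag" class="MAT.PluginMgr.TagStep"/>
  </step_implementations>
</task>
"""

def unimplemented_steps(xml_text):
    """Return (workflow name, step name) pairs with no implementation.

    Hypothetical helper, not a MAT API.
    """
    root = ET.fromstring(xml_text)
    impls = []  # (step name, set of workflow names, or None for "any")
    for step in root.findall("step_implementations/step"):
        wf_attr = step.get("workflows")
        # ASSUMPTION: comma-separated workflow names restrict the scope.
        scope = {w.strip() for w in wf_attr.split(",")} if wf_attr else None
        impls.append((step.get("name"), scope))
    missing = []
    for workflow in root.findall("workflows/workflow"):
        wf_name = workflow.get("name")
        for step in workflow.findall("step"):
            name = step.get("name")
            covered = any(impl_name == name and (scope is None or wf_name in scope)
                          for impl_name, scope in impls)
            if not covered:
                missing.append((wf_name, name))
    return missing

print(unimplemented_steps(TASK_FRAGMENT))
```

The recommended blocks produce an empty list, since each of zone, tokenize, tag and align has at least one applicable implementation.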

Step 7 (optional): Define your workspace implementations

If you intend to use workspace mode, you should also define your workspace implementations. The workspace block that corresponds to the workflows and steps described immediately above looks like this:

  <workspace>
    <operation name="autotag">
      <settings workflow="Demo" steps="tag"/>
    </operation>
    <operation name="modelbuild">
      <settings/>
    </operation>
    <operation name="import">
      <settings workflow="Hand annotation" steps="zone,tokenize"/>
    </operation>
  </workspace>
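The workspace operations refer back to workflows and steps by name, so they need to stay in sync with the blocks from Step 6. As a rough sketch (not a MAT tool; the function name check_workspace and its messages are hypothetical), the following uses Python's standard xml.etree.ElementTree module to confirm that each operation's settings name an existing workflow and steps that belong to it. It assumes the steps attribute is a comma-separated list, as the example above suggests.

```python
import xml.etree.ElementTree as ET

# The two workflows referenced by the workspace block, plus the workspace
# block itself, wrapped in a <task> element so the sketch is self-contained.
TASK_FRAGMENT = """
<task name="Widget Annotation">
  <workflows>
    <workflow name="Hand annotation">
      <step name="zone"/>
      <step name="tokenize"/>
      <step name="tag" pretty_name="hand tag" by_hand="yes"/>
    </workflow>
    <workflow name="Demo" hand_annotation_available_at_end="yes">
      <step name="zone"/>
      <step name="tokenize"/>
      <step name="tag"/>
    </workflow>
  </workflows>
  <workspace>
    <operation name="autotag">
      <settings workflow="Demo" steps="tag"/>
    </operation>
    <operation name="modelbuild">
      <settings/>
    </operation>
    <operation name="import">
      <settings workflow="Hand annotation" steps="zone,tokenize"/>
    </operation>
  </workspace>
</task>
"""

def check_workspace(xml_text):
    """Return a list of problems with workspace settings (empty if none).

    Hypothetical helper, not a MAT API.
    """
    root = ET.fromstring(xml_text)
    # Map each workflow name to the list of step names it contains.
    workflows = {wf.get("name"): [s.get("name") for s in wf.findall("step")]
                 for wf in root.findall("workflows/workflow")}
    problems = []
    for op in root.findall("workspace/operation"):
        settings = op.find("settings")
        wf_name = settings.get("workflow") if settings is not None else None
        if wf_name is None:
            continue  # e.g. modelbuild, whose settings name no workflow
        if wf_name not in workflows:
            problems.append("%s: unknown workflow %s" % (op.get("name"), wf_name))
            continue
        # ASSUMPTION: steps is a comma-separated list of step names.
        for step in settings.get("steps", "").split(","):
            step = step.strip()
            if step and step not in workflows[wf_name]:
                problems.append("%s: step %s is not in workflow %s"
                                % (op.get("name"), step, wf_name))
    return problems

print(check_workspace(TASK_FRAGMENT))
```

The workspace block above passes with an empty list: autotag uses the tag step of the Demo workflow, and import uses the zone and tokenize steps of the Hand annotation workflow.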

Step 8: Install the task

Use the MATManagePluginDirs tool to ensure that MAT knows about your task directory. If <dir> is your task directory:

Unix:

% $MAT_PKG_HOME/bin/MATManagePluginDirs install <dir>

Windows native:

> %MAT_PKG_HOME%\bin\MATManagePluginDirs.cmd install <dir>