The central technology underlying MAT is Carafe, a MITRE-built
trainable conditional random field text tagger, implemented in Java.
This engine creates models from annotated documents and annotates
documents based on those models. The tokenizer that serves as a
preprocessor for Carafe is also implemented in Java and is distributed
with Carafe.
In many annotation systems, the main focus is on tuning the tagger
to get the best possible results. Carafe supports this kind of tuning,
but it is not the focus of MAT; MAT's focus is to build an annotation
infrastructure around an existing tagging feature set and tokenizer.
MAT makes Carafe available both as a command-line tool and as a
server; the server avoids repeatedly incurring the startup cost of
loading the annotation model.
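The difference between the two modes can be sketched as follows. This is a minimal illustration of the load-once pattern, not MAT's or Carafe's actual interface: ToyModel, tag_once, TaggerService, and the path "model.bin" are all hypothetical names, and the "model" here simply labels every token "O".

```python
class ToyModel:
    """Hypothetical stand-in for a trained tagging model (not Carafe)."""

    load_count = 0  # counts how many times a model has been "loaded"

    def __init__(self, path):
        # A real model load would read and decode a large file here;
        # this sketch only records that a load happened.
        ToyModel.load_count += 1
        self.path = path

    def tag(self, text):
        # Placeholder tagging: label every whitespace-separated token "O".
        return [(token, "O") for token in text.split()]


def tag_once(path, text):
    """Command-line style: load the model, tag one document, then exit."""
    return ToyModel(path).tag(text)


class TaggerService:
    """Server style: load the model once and reuse it for every request."""

    def __init__(self, path):
        self.model = ToyModel(path)

    def handle(self, text):
        return self.model.tag(text)


docs = ["first document", "second document", "third document"]

# One model load per document when each call starts from scratch.
for d in docs:
    tag_once("model.bin", d)
cli_loads = ToyModel.load_count

# One model load in total when a long-lived service handles the documents.
ToyModel.load_count = 0
service = TaggerService("model.bin")
results = [service.handle(d) for d in docs]
server_loads = ToyModel.load_count

print(cli_loads, server_loads)
```

With a real model, each load can dominate the cost of tagging a short document, which is why a long-lived server process pays off when many documents are processed.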
MAT is distributed with a UI that is a novel combination of two
capabilities required for managing annotated documents. On the one
hand, it is a tool for annotating documents by hand and for displaying
annotations. On the other hand, it allows the user to control the
automated steps that the document goes through. This UI is written
entirely in JavaScript, and MAT runs its own Web server to make the UI
available.
MAT maintains its annotated documents in its own simple standoff
annotation format, which is based on the JavaScript Object Notation
(JSON). JSON is especially convenient for passing documents back and
forth via AJAX.
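To illustrate the general idea of standoff annotation in JSON, here is a minimal sketch. The field names ("signal", "annotations", "type", "start", "end") and the sample sentence are assumptions made for illustration; they are not taken from MAT's actual document format.

```python
import json

# Hypothetical standoff document: the text is stored untouched, and each
# annotation refers to it by character offsets instead of inline markup.
# The field names are illustrative assumptions, not MAT's schema.
doc = {
    "signal": "John Smith works for MITRE in Bedford.",
    "annotations": [
        {"type": "PERSON", "start": 0, "end": 10},
        {"type": "ORGANIZATION", "start": 21, "end": 26},
        {"type": "LOCATION", "start": 30, "end": 37},
    ],
}


def annotated_spans(document):
    """Resolve each annotation's offsets against the signal text."""
    signal = document["signal"]
    return [
        (a["type"], signal[a["start"]:a["end"]])
        for a in document["annotations"]
    ]


# Serialize as the document might travel between backend and browser,
# then parse it back and check that nothing was lost.
wire = json.dumps(doc)
restored = json.loads(wire)

print(annotated_spans(restored))
```

Because the annotations live apart from the text, the text itself never has to be modified or escaped, and the whole document serializes to a single JSON string that a browser can parse directly.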
The core engine which controls the automated document processing is
written in Python. This includes both the command-line capabilities and
the Web backend.
For its Web frontend, MAT relies on the Yahoo! YUI toolkit, a
BSD-licensed library for building rich interactive Web applications.
For its service infrastructure, including its Web server, MAT relies
on CherryPy, a BSD-licensed, lightweight, flexible, threaded Web
protocol engine written in Python.
UIMA, the Unstructured Information Management Architecture, is an
open-source infrastructure for managing automated annotations. MAT is
a different animal: it's designed to provide simple tools for a wide
range of tasks associated with annotating documents. MAT's workflow
engine could have been built on top of UIMA, but the pieces we chose
(Python instead of Java, JSON instead of XML) are lighter-weight and
have made development significantly faster and lower-cost.