The central technology underlying MAT is Carafe, a MITRE-built trainable
conditional random field (CRF) text tagger. Carafe is implemented in
Scala, a Java-compatible programming language that compiles to Java
bytecode (class files). This engine creates models from annotated
documents and annotates documents based on those models. The tokenizer
that serves as a preprocessor for Carafe is also implemented in Scala,
and is distributed with Carafe.
In many annotation systems, the main focus is on tuning the
tagger to get the best possible results. This is certainly
possible with Carafe, but it is not the focus of MAT. Instead,
MAT aims to build an annotation infrastructure around an existing
tagging feature set and tokenizer.
MAT makes Carafe available both as a command-line tool and as a
server. The server avoids repeatedly incurring the startup cost of
loading the annotation model. Server mode is the default in MAT;
to disable it, pass the --tagger_local option to MATEngine.
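The cost difference between the two modes can be illustrated with a toy sketch. Nothing here is Carafe or MAT code: ToyTagger, tag_local and tag_via_server are hypothetical names, and the "model" is a trivial lookup table standing in for a model that is expensive to load.

```python
class ToyTagger:
    """Hypothetical stand-in for a tagger whose model is costly to load."""

    loads = 0  # class-wide count of how many times a model was loaded

    def __init__(self):
        # Loading the model is the expensive step we want to avoid repeating.
        ToyTagger.loads += 1
        self.model = {"MITRE": "ORGANIZATION", "Bedford": "LOCATION"}

    def tag(self, tokens):
        # Label each token; "O" marks tokens outside any annotation.
        return [(token, self.model.get(token, "O")) for token in tokens]


def tag_local(documents):
    """Local style: every document pays for a fresh model load."""
    return [ToyTagger().tag(tokens) for tokens in documents]


def tag_via_server(documents):
    """Server style: load the model once and reuse it for every document."""
    tagger = ToyTagger()
    return [tagger.tag(tokens) for tokens in documents]
```

Both functions return the same tags for the same input; only the number of model loads differs, and that difference is the cost the server mode is meant to avoid.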
MAT is distributed with a UI that combines two capabilities
required for managing annotated documents. On the one hand, it
is a hand annotation tool, as well as an annotation display
tool. On the other hand, it allows the user to control the
automated steps that the document goes through. This UI is
written entirely in JavaScript, and MAT runs its own Web server
to make the UI available.
MAT maintains its annotated documents in its own simple standoff
annotation format, which is based on the JavaScript Object
Notation (JSON). JSON is especially convenient for passing
documents back and forth via AJAX.
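The general idea of JSON-based standoff annotation can be sketched as follows. The field names used here (signal, annotations, label, start, end) and the helper functions are illustrative assumptions, not MAT's actual document schema. The point is only that the text is stored once, unmodified, and each annotation refers to it by character offsets.

```python
import json


def make_document(signal, spans):
    """Build a standoff document from raw text and (label, start, end) spans.

    The layout is hypothetical and chosen for illustration only.
    """
    return {
        "signal": signal,
        "annotations": [
            {"label": label, "start": start, "end": end}
            for (label, start, end) in spans
        ],
    }


def annotated_text(doc):
    """Recover the text covered by each annotation from its offsets."""
    return [
        (a["label"], doc["signal"][a["start"]:a["end"]])
        for a in doc["annotations"]
    ]


signal = "Alice Smith joined MITRE in Bedford."
doc = make_document(
    signal,
    [("PERSON", 0, 11), ("ORGANIZATION", 19, 24), ("LOCATION", 28, 35)],
)

# Serialize for transport (for example, to a browser) and parse it back.
wire = json.dumps(doc)
restored = json.loads(wire)
```

Because the annotations never alter the text itself, a document of this kind survives a round trip through JSON unchanged, and any consumer can recover the annotated spans by slicing the signal.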
The core engine which controls the automated document processing
is written in Python. This includes both the command-line
capabilities and the Web backend.
For its Web frontend, MAT relies on the Yahoo! YUI toolkit, a
BSD-licensed library for building rich interactive Web
applications.
For its service infrastructure, including its Web server, MAT
relies on CherryPy, a BSD-licensed, lightweight, flexible
threaded Web protocol engine written in Python.