Core technologies

Carafe

The central technology underlying MAT is Carafe, a MITRE-built trainable conditional random field text tagger, implemented in Java. This engine builds models from annotated documents and uses those models to annotate new documents. The tokenizer that serves as a preprocessing step for Carafe is also implemented in Java, and is distributed with Carafe.

In many annotation systems, the main focus is on tweaking the tagger to get the best possible results. This is definitely possible with Carafe, but it's not the focus of MAT, which is to build an annotation infrastructure around an existing tagging feature set and tokenizer.

MAT makes Carafe available as a command-line tool, but also as a server, which avoids repeatedly incurring the startup cost of loading the annotation model.
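
The difference between the two modes can be illustrated with a minimal Python sketch. Everything here is hypothetical: ToyModel, tag_command_line_style and TaggerService are stand-ins invented for illustration, not part of Carafe or MAT, and the "tagger" is a toy rule rather than a conditional random field. The sketch only shows why a long-running server pays the model-loading cost once, while one-shot command-line runs pay it every time.

```python
import time


class ToyModel:
    """Hypothetical stand-in for a trained tagging model (not Carafe's API)."""

    load_count = 0  # how many times a model has been loaded in this process

    def __init__(self, load_seconds=0.01):
        # Simulate the startup cost of reading a model from disk.
        time.sleep(load_seconds)
        ToyModel.load_count += 1

    def tag(self, text):
        # Toy rule: label capitalized tokens "CAP", everything else "O".
        return [(tok, "CAP" if tok[:1].isupper() else "O") for tok in text.split()]


def tag_command_line_style(texts):
    """Load the model afresh for every document, as one-shot runs would."""
    return [ToyModel().tag(text) for text in texts]


class TaggerService:
    """Load the model once and reuse it, as a long-running server would."""

    def __init__(self):
        self.model = ToyModel()

    def handle(self, text):
        return self.model.tag(text)


if __name__ == "__main__":
    docs = ["MITRE built Carafe", "the tagger runs as a server"]

    ToyModel.load_count = 0
    tag_command_line_style(docs)
    print("command-line style loads:", ToyModel.load_count)

    ToyModel.load_count = 0
    service = TaggerService()
    for doc in docs:
        service.handle(doc)
    print("server style loads:", ToyModel.load_count)
```

Both styles produce the same tags; only the number of model loads differs, and that gap grows with the number of documents processed.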

JavaScript and AJAX

MAT is distributed with a UI that combines two capabilities required for managing annotated documents. On the one hand, it's a tool for hand annotation and for displaying annotations. On the other hand, it allows the user to control the automated steps that a document goes through. This UI is written entirely in JavaScript, and MAT runs its own Web server to make the UI available.

JSON

MAT maintains its annotated documents in its own simple standoff annotation format, which is based on JavaScript Object Notation (JSON). JSON is especially convenient for passing documents back and forth via AJAX.
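
To make the idea of standoff annotation concrete, here is a minimal Python sketch. The field names ("signal", "annotations", "label", "start", "end") and the helper functions are assumptions chosen for illustration; they are not MAT's actual document schema. The point is only that the text is stored untouched, the annotations refer to it by character offsets, and the whole structure survives a round trip through JSON.

```python
import json


def make_document(signal, annotations):
    """Build a hypothetical standoff document from text and (label, start, end) spans."""
    for label, start, end in annotations:
        if not (0 <= start < end <= len(signal)):
            raise ValueError("span out of range for %r: %d-%d" % (label, start, end))
    return {
        # The raw text is kept unmodified; annotations never alter it.
        "signal": signal,
        # Each annotation points into the signal by character offsets.
        "annotations": [
            {"label": label, "start": start, "end": end}
            for label, start, end in annotations
        ],
    }


def annotated_spans(doc):
    """Recover (label, covered text) pairs by slicing the signal."""
    return [
        (a["label"], doc["signal"][a["start"]:a["end"]])
        for a in doc["annotations"]
    ]


if __name__ == "__main__":
    doc = make_document(
        "MITRE is based in Bedford.",
        [("ORGANIZATION", 0, 5), ("LOCATION", 18, 25)],
    )
    wire = json.dumps(doc)       # what would travel between browser and backend
    restored = json.loads(wire)  # what the receiving side reconstructs
    print(annotated_spans(restored))
```

Because the document is plain dictionaries, lists, strings and integers, the same structure can be read directly by JavaScript in the browser and by Python on the backend.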

Python

The core engine which controls the automated document processing is written in Python. This includes both the command-line capabilities and the Web backend.

Open-source packages used by MAT

YUI

For its Web frontend, MAT relies on the Yahoo! YUI toolkit, a BSD-licensed library for building rich interactive Web applications.

CherryPy

For its service infrastructure, including its Web server, MAT relies on CherryPy, a BSD-licensed, lightweight, flexible threaded Web protocol engine written in Python.

What's the difference between MAT and UIMA?

UIMA, the Unstructured Information Management Architecture, is an open-source infrastructure for managing automated annotations. MAT is a different animal: it's designed to provide simple tools for a wide range of tasks associated with annotating documents. MAT's workflow engine could have been built on top of UIMA, but the pieces we chose (Python instead of Java, JSON instead of XML) are lighter-weight and have made development significantly faster and cheaper.