The experiment engine

To find out how well your system is performing, you may want to conduct an experiment. A typical experiment of this sort involves the following steps:

Select an annotated training corpus to build a model from
Select an annotated test corpus to test the output of the tagging engine
Build your model
Tag an unannotated version of the test corpus using the constructed model
Score the tagging output against the annotated test corpus
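As a rough illustration of these five steps, here is a minimal Python sketch. It is not MAT code: the toy "most frequent label" model, the function names (build_model, tag, score), and the token-level scoring are all assumptions chosen only to make the workflow concrete.

```python
from collections import Counter, defaultdict


def build_model(training_corpus):
    """Toy model builder (hypothetical): remember the most frequent label per token."""
    counts = defaultdict(Counter)
    for document in training_corpus:
        for token, label in document:
            counts[token][label] += 1
    return {token: c.most_common(1)[0][0] for token, c in counts.items()}


def tag(model, tokens, default="O"):
    """Tag an unannotated token list; unseen tokens get the default label."""
    return [(token, model.get(token, default)) for token in tokens]


def score(hypothesis, reference, ignore="O"):
    """Token-level precision and recall over labels other than `ignore`."""
    tp = fp = fn = 0
    for hyp_doc, ref_doc in zip(hypothesis, reference):
        for (_, hyp), (_, ref) in zip(hyp_doc, ref_doc):
            if hyp != ignore and hyp == ref:
                tp += 1
            else:
                if hyp != ignore:
                    fp += 1  # tagger produced a label the reference lacks
                if ref != ignore:
                    fn += 1  # reference label the tagger missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Steps 1 and 2: select annotated training and test corpora (invented toy data).
train = [
    [("John", "PERSON"), ("visited", "O"), ("Boston", "LOCATION")],
    [("Mary", "PERSON"), ("left", "O"), ("Boston", "LOCATION")],
]
test = [[("John", "PERSON"), ("left", "O"), ("Paris", "LOCATION")]]

# Step 3: build the model.
model = build_model(train)

# Step 4: tag an unannotated version of the test corpus.
unannotated = [[token for token, _ in doc] for doc in test]
output = [tag(model, tokens) for tokens in unannotated]

# Step 5: score the tagging output against the annotated test corpus.
precision, recall = score(output, test)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy run the tagger finds "John" but misses "Paris", which it never saw in training, so precision is perfect while recall is not.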

You may also want to select your training and test corpora by partitioning a single corpus of documents, or to iterate through a variety of settings for the model builder or tagger to compare their performance (e.g., different bias values for recall vs. precision).
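Partitioning a single corpus can be pictured as a seeded shuffle followed by a split. The sketch below is only an illustration in Python; the function name partition_corpus, the 80/20 fraction, and the file names are hypothetical and do not reflect how MAT itself partitions corpora.

```python
import random


def partition_corpus(documents, train_fraction=0.8, seed=0):
    """Split a list of document names into (train, test) reproducibly."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    # Sort first so the result does not depend on the input order.
    shuffled = sorted(documents)
    # A private Random instance keeps the split repeatable for a given seed.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


documents = [f"doc{i:02d}.json" for i in range(10)]  # invented file names
train_docs, test_docs = partition_corpus(documents, train_fraction=0.8, seed=42)
print(len(train_docs), "training documents,", len(test_docs), "test documents")
```

Fixing the seed matters when comparing settings: every run in the comparison then sees the same training and test documents, so score differences come from the settings rather than from the split.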

MAT provides an engine to do all this for you, guided by an extensive, declarative XML specification for your experiment. The experiment XML file consists of three types of information:

descriptions of corpora
descriptions of model sets
descriptions of experiment runs

The experiment engine runs the experiment in a directory specified either in the XML file (the "dir" attribute of the <experiment> element) or on the command line (the --exp_dir option). It prepares the corpora, builds the models, and performs the experiment runs in this directory. The experiment XML file is copied into the experiment directory unless a file with the same name is already present there.
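The directory handling described above might look roughly like the following Python sketch. This is not MAT's implementation: the function names are hypothetical, and two behaviors are assumptions the text does not state, namely that a command-line directory takes precedence over the "dir" attribute and that a relative "dir" value is resolved against the current working directory.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def resolve_experiment_dir(xml_path, cli_exp_dir=None):
    """Pick the experiment directory from the command line or the XML file."""
    # Assumption: the command-line value wins when both are supplied.
    if cli_exp_dir is not None:
        return Path(cli_exp_dir)
    root = ET.parse(xml_path).getroot()
    dir_attr = root.get("dir")
    if root.tag != "experiment" or dir_attr is None:
        raise ValueError("no experiment directory in the XML file or on the command line")
    return Path(dir_attr)


def prepare_experiment_dir(xml_path, cli_exp_dir=None):
    """Create the experiment directory and copy the XML file into it if absent."""
    xml_path = Path(xml_path)
    exp_dir = resolve_experiment_dir(xml_path, cli_exp_dir)
    exp_dir.mkdir(parents=True, exist_ok=True)
    target = exp_dir / xml_path.name
    # Never overwrite a file of the same name that is already present.
    if not target.exists():
        shutil.copy(xml_path, target)
    return exp_dir
```

Skipping the copy when a same-named file already exists means that rerunning against an existing experiment directory leaves the earlier copy of the specification untouched.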

The wide range of ways in which this engine can be used, and the rich structure of its output directory, are each documented separately.