- Fixed a bug which corrupted line endings in CSV files on Windows.
- Upgraded to Java Carafe 0.9.7RC4 to address a problem with
serializing mat-json files in the tokenizer.
- Introduced a check to ensure that MAT 1.2 fails intelligently when
presented with future versions of the mat-json serialization.
- Minor documentation updates.
- Upgraded to Java Carafe 0.9.7RC3. This version of Carafe is not
compatible with models built with previous versions of Java Carafe.
- Fixed two small bugs where ENAMEX-style tags were not being
ordered correctly when labels were not being alphabetized in the UI.
- Fixed a small bug in the scorer where documents with no zones
(e.g., XML documents which contain only annotations) were not being
scored correctly when a task was specified.
- Fixed a small bug in setting up the tabbed terminal in Windows.
- Fixed a small bug where Java Carafe model training might do the
wrong thing with duplicate file names.
- Reorganized documentation links to create a better initial
experience when unpacking the distribution.
- Fixed a small bug in the initialization sequence when interacting
with the Java Carafe tagger server.
- Fixed a small bug in CSS displays of annotations with labels
whose names contain a dash.
- Small documentation updates.
- Bumped Java requirement to 1.6 update 4 due to issues with Java
- Bumped Java requirement to 1.6.
- Extended the capabilities of the property caches in the
experiment engine to handle a wider range of data types.
- Removed the option to make smaller zones for PSA training because
it's no longer needed. Removed the no_random_segments configuration
option as part of this change.
- Removed option for SGD training because it's being removed from
- Insulated the MAT Web server against stray ill-formed cookies
that the browser might deliver to the server by accident.
- Retokenized sample data and updated Java Carafe bindings for
- Fixed a bug where MATReport and MATScore spreadsheets were being
created as ASCII rather than UTF-8.
- Expanded optional subprocess monitoring to include children and
remote children of subprocesses.
- Fixed a bug with restart in MATExperimentEngine.
- Small documentation updates.
- Updated to Java Carafe 0.9.5. All previous models will have to be
rebuilt.
- Clarified a few error messages, updated some comments.
- Fixed an obscure bug in Windows 7 which was resulting in a
Windows memory error.
- Added the optional ability to monitor subprocess memory image
sizes via the open-source psutil package. Enabled MATEngine,
MATModelBuilder, MATExperimentEngine and MATWorkspaceEngine with this
- Fixed a bug in the test suite which limited its ability to run on
- Bug 19392: Ensured that the scorer doesn't include annotations
outside zones in the reference document, if zones are known to be
present.
- Extended MATReport to provide file-level statistics.
- Improved error message when character encoding is wrong.
- Added the "default_tag_window_size" and
"default_tag_window_position" attributes in task.xml to provide the
ability to control the annotation windows in the MAT UI.
- Added the MATReport tool to generate concordance-style annotation
reports in CSV and text formats.
- Fixed a small installer bug which arose when Java could not be
- Fixed a bug where deleting annotations in workspace mode in the
UI wasn't enough by itself to mark a document as needing to be saved.
- Fixed a bug where the new keyboard accelerator for repeating the
last annotation wasn't working on Windows.
- Extended the (undocumented) mechanism for task-specific
the presentation of untaggable regions.
- Added the ability to add a diff file against a previous
distribution to a distribution tarball.
- Documentation updates.
- Tiny bug fix in error reporting from Java Carafe.
- Enhanced Java Carafe wrapper with control for Java stack size.
- Fixed some small bugs in error reporting from Java Carafe.
- Improved the description of corpora in CSV output files in the
- Added the error text description to the logging CSV files in the
- Enhanced task.xml files with defaults for the Java subprocess
- Improved documentation and record-keeping for MAT builds.
- Minor documentation improvements.
- Added the "alphabetize_labels" attribute in task.xml to provide
the ability to control how the annotations are ordered in the UI legend
- Fixed a typo bug in the tagging service.
- Updated some documentation as a result of user feedback.
- Added the ability to add last tag in the UI, with a consistent
- Fixed two more small command-line argument processing bugs, one
in MATModelBuilder and one in MATRetokenize.
- Fixed a bug in the scorer, where the task was not being used as a
fallback source of tag label metadata.
- Modified MATScore to allow the user to provide lists of content
annotations and token annotations directly, in case neither task nor
tag label metadata is available.
- Fixed a bug in the processing of command line arguments in
- Fixed a bug in MATRetokenize that arose when only a single task
- Added internal APIs to make it easier to make use of taggers and
tokenizers other than Carafe.
- Fixed a lurking bug where CRLF was not being handled correctly
when inline XML was being read.
- Further cleanup of documentation for Windows port.
- Improvements and simplifications to the distribution and plugin
- Fixed minor bugs in constructing cascaded annotation menu
- Whitespace in tag labels is now supported correctly in the MAT UI.
- The annotation window in the MAT UI now displays the content
annotations under the mouse.
- The MAT UI now supports cascaded annotation menus, via the
<tag_group> element in the task.xml file.
- The task.xml file now provides a means to control text direction
for individual workflows, using the text_right_to_left attribute.
- The task.xml file syntax has been modified to clarify how
attributes are handled by MAT. For advanced users, this necessitates a
number of changes in the task.xml file (see the upgrade
- Fixed a bug where the scorer was breaking if files had no tags to
- It is now possible to use workspace operations directly on the
command-line, without the "operate" operation.
- Replaced the OCaml tokenizer and Carafe tagger with the Java
reimplementation. This necessitates a number of changes, including
retokenizing your documents and rebuilding your models (see the upgrade
- Command-line options for steps which are used in more than one
step are now appropriately cross-referenced in the help string.
- Workspace operations which don't affect any pathnames now raise
- The system now appears to work in Python 2.6 and in MacOS X 10.6.
- The experiment engine XML file now allows you to define bindings
for commonly repeated values, and also supports explicitly referring to
the experiment directory and pattern directory.
- The system now works in native Windows, without Cygwin.
- The system is now distributed in a single zip file for all
- The MAT UI has been improved to support hand annotation without
tokens (although this is not recommended).
- MacOS X no longer requires a special installation of Python, even
though select.poll is still missing on that platform.
- Bug 24205: due to a bug in the experiment engine, non-default
model configurations weren't being handled correctly. Fixed.
- Bugs 15815, 19669: Made sure that the whole system does the right
thing with spaces in filenames. Fixed.
- Extended and modified how corpora are specified in experiment
XML, introducing considerably greater flexibility. It's now possible to
specify n-way corpus splits, and group them arbitrarily in training and
- Fixed an obscure bug where child tasks of visible tasks weren't
being reported appropriately to the UI.
- Fixed a minor bug in formatting XML output.
- Fixed a bug in workspace locking.
- Minor documentation updates.
- Enabled confidence interval reporting. Added reporting of
- Added general workspace locking for operations, importing and
removing basenames, opening a workspace file in the UI, and listing the
contents of the workspace folders.
- Initial support for Python 2.6.
- Upgraded to CherryPy 3.1.2.
- Added XML reader to MATWorkspaceEngine import; expanded
documentation on MATExperimentEngine to describe how to prepare corpora
with XML documents.
- Added support for multiple model build settings in task.xml.
- Added PluginMgr.AlignStep to the core to support alignment of
externally-generated content tags with token boundaries.
- Converted calls to the MATEngine command-line tool in the experiment
engine to invocations of the MATEngine object.
- Expanded Java service API to cover checking for the existence of
a workspace, listing the contents of a workspace folder, and opening a
- Bug 24020: due to a bug in the demo infrastructure, it was not
possible to reprocess a document reliably. Fixed.
- Bug 23988: proxies for multiple steps in task.xml weren't doing
the right thing when the proxy set didn't exactly match between
proxies. You shouldn't do this, and the UI still exhibits some unusual
behavior, but it's fixed in the engine.
- Bug 21112: the experiment engine insisted on converting test
documents to raw form before processing for the test run phase. It's
now possible to specify other default preprocessing (e.g., just undo
- Bug 20835: global maintenance of annotation type objects was not
thread-safe. Fixed by making annotation type objects local to a
- Added XML reader and writer to MATEngine, MATModelBuilder,
- Added ability to define arbitrary readers and writers.
- Added the ability to use your own training engine in your own
- Added feedback for workspace import.
- Added option to enable workspace access from remote clients via
the Web server.
- Added splash screen to UI.
- Enhanced scorer to provide file-level count data to support
computing confidence intervals (not enabled yet).
- Enhanced load and save in workspace mode in UI by adding the
document basename to the Web service result.
- Added the "rich, incoming" folder to
the core workspace, for importing files which are in rich format but
not yet prepared for hand tagging.
- Enabled rich document readers to infer the processing state of
documents which lack the appropriate metadata.
- Added "remove" operation to MATWorkspaceEngine.
- Fixed a subtle bug where tagging engines which fail on startup
were not notifying the UI client properly about the failure.
- Fixed an obscure bug where MAT JSON documents which share the
same annotation label but have different attribute orders were being
- Fixed a small bug in MATWeb which prevented logs from rotating
- Fixed a small bug in the Java library which didn't correspond to
the (correct) documentation.
- Bug 18427: in some situations where the UI viewport is small, the
menu bar would disappear. Fixed.
- Bug 20757: ENAMEX-style tags and attributes didn't work with
Carafe training on JSON documents. Fixed.
- Bug 23672: the MIME type of log spreadsheets was incorrect when
saved from the MAT UI, causing Excel to fail to digest the logs
properly on Windows. Fixed.
- Bug 23668: "Update workspace key" in the UI was failing to
percolate appropriately to "Open workspace...". Fixed.
- Minor documentation updates.
- Fixed a bug where command-line arguments of MATEngine weren't
overriding step attributes in the task.xml file. Added appropriate
- Fixed a bug where step attribute defaults weren't being
- Fixed a bug in the Java bindings where it was possible to create
an annotation type with null attributes.
- Fixed an infelicity in the initial README when users first unpack
- Minor documentation updates.
- Fixed minor omission in experiment infrastructure.
- Fixed a bug where extra task paths were not being canonicalized
when task directories were being computed.
- Added a temporary fix, to be backed out in 1.1, to use the Carafe
engine for the anonymization task in the core distribution.
- Fixed a bug in branding in the UI.
- Minor documentation updates.
- Minor documentation updates.
- Vast tracts of documentation updates.
- Minor modifications of the UI log action names to improve
- Bug 19970: because of a deep bug in the interaction between
Firefox and the UI toolkit when the backend server was down, no
feedback was being provided to the user about the failure. Fixed.
- Bug 17989: the experiment engine was saving its raw gold files as
ASCII, rather than a Unicode-compatible encoding like UTF-8. Fixed.
- Bug 17867: HTML and HTTP escapes were not being inserted
appropriately in the routines that generate task-specific
- Bug 17778: various operations that could be performed in the UI
weren't blocking out rerequests while the operation was being
- Bug 16039: left click (not swipe) on an annotation while hand
annotating selected just that token, not the entire annotation. Fixed.
- Extended branding to documentation.
- Yet more documentation updates, including Java client library
- Changed the "inherit_actions" attribute of workspace operations
in task.xml to "inherit_operations", for consistency.
- Fixed a bug in the plugin manager which was making demo files
inaccessible for tasks without Python customizations.
- Even more documentation updates.
- Corrected terminology for the deidentification task.
- Fixed MATWeb so that it preserves its workspace key across
- Expanded the Java client library to support workspace operations.
- Added branding capability to UI and tasks.
- Vast documentation updates.
- Added a Java client library (not documented yet).
- Bug 19772: install.bat was failing with links in Cygwin
- The settings for the Carafe model builder are now specified in
the task.xml file, and can be overridden in the experiment XML or in
the workspace settings. As part of this change, the "task" attribute to
<build_settings> in the experiment XML file is no longer
- Added the MATModelBuilder tool.
- Added the MATManagePluginDirs tool.
- Expanded the documentation.
- The tabbed terminal is now optional. Build, install and runtime
have been appropriately updated.
- The visible name of the rich JSON format in the UI, command line,
and all configuration and experiment files has been changed to
"mat-json". This change is in support of future additions of other
readers and writers for the MAT system.
- The URL with which MAT can be accessed in the Web UI has been
changed to "http://<host>:<port>/MAT/desktop".
- For any task directory ending with basename <name>, you can
now access a desktop restricted to that task at
- Initial support for user-friendly demos has been introduced. The
configuration file for this capability will be documented when the
capability matures a little more.
- All XML configuration file layouts are now defined by a simple,
user-readable templating system.
- It is now possible to define user-visible workflow steps which
consist of sequences of implementations of other steps (e.g., "prep"
might implement "zone,tokenize").
- The Web UI is no longer tied as closely to the names of the steps
in the MAT engine.
- Tasks now support the option of defining "attribute sets" for
annotations, so that, e.g., an ENAMEX tag with different values for the
"type" attribute can map to different CSS configurations, and be
treated differently in the scorer.
- Bug 19240: if a core task was installed using
MATInstallApplication after a dependent task, and both had
documentation, the documentation rendering was broken. This has been
fixed by moving to live, on-line generation of the customized
- Bugs 14466, 16090: The tarball distribution contained multiple copies of
the MAT executables, only one of which was configured correctly. All
duplications have now been eliminated. As a result, all MAT executables
in the tarball distribution should be accessed from src/MAT instead of
- Bugs 14507, 19391: Previously, there was a separate step to
Web documents in a separate location. That step has now been eliminated.
- MATServer has been removed from the system, and replaced by a
thread in the CherryPy Web application.
- The underlying version of Carafe has been upgraded, and old
training models are not forward-compatible with the new engine. Models
will have to be rebuilt.
- Bug 14508: The Web server could not be limited to localhost. Now
it can be.
- Apache has been removed from the system, and replaced by the
Python-based CherryPy Web application infrastructure.
- The MAT scoring engine now defaults to writing CSV files with
spreadsheet-interpretable equations for computed values. This behavior
can be controlled on the command line for both the scorer and the
- Conversion of ad-hoc mechanisms for Carafe customization to
consistent command-line updates. Note:
- Bug 16450: the -prior-adjst argument of Carafe was not available
for customization in any useful way, even though it controls
recall/precision bias. Fixed.
- Bug 16225: the random zoning option to optimize PSA training was
available only to resynthesized documents, because it was part of the
de-identification task. It is now a feature of the trainer itself, and
zone tags are temporarily inserted into the document immediately before
training if PSA training is requested.
- Major refactor of internal management of steps to support step
rollback in the backend and address a large number of bugs.
- Bug 18960: switching between workflows wasn't undoing all the
appropriate UI configuration steps. Fixed.
- Bug 18898: After cleaning up the way transformation and
nomination should work in the de-identification task, changing the
between the two steps could screw up the application of the
transformation. Fixed by cleaning up the way metadata is handled in the
UI and transported to the backend.
- Bug 18849: experiment directory is created incorrectly if it
doesn't exist when the experiment is started. Fixed.
- Bug 18848: Transforming a document in the de-identification task
would strip all the tags if tokenization had been skipped. Fixed by
doing a smarter job of figuring out how to postprocess transformed
- Bug 18534: hand annotations were not showing up when hand
annotated documents were reloaded, because of a logic error in the
relationship between steps done and steps visible. Fixed.
- Bug 18430: hand annotation was erroneously available during
resynthesis. No longer.
- Bug 18426: the clean step can't be undone, but nothing was
preventing users from trying to undo it. Attempting to do so now raises
an error.
- Bug 18425: the clean step was not a core step. It is now.
- Bug 17680: the psaTransform step was not being rolled back
appropriately in one implementation of the de-identification task,
it wasn't "really" a transform step. Refactoring the step management
- Bug 17610: rollback was not supported in the backend engine.
Fixed. Engine now accepts an --undo_through argument.
- Bugs 16603, 16604: newlines are preserved during transformation in
the de-identification task, but the extent of the tags needed to be
adjusted. In non-clear replacement, the right thing was happening by
accident, because those documents were being tokenized when they
shouldn't have been. These documents are now (properly) not tokenized,
and the extent adjustment is now done correctly.
- Bug 14512: if there's only a single task or a single workflow,
they should not need to be provided. Fixed.
- Major new capability to manage workspaces of documents, including
support for iterative model creation.
- New "Save/Hide" menu to allow better UI desktop management, as
well as a hide widget in the window panes.
- Bug 17687: the UI was showing steps which weren't in the current
workflow, because they were in the document. Fixed.
- Bug 16747: New windows appear directly on top of each other in
the new UI. This has been fixed.
- Bug 18457: a number of closely related bugs in the
task conspired to cause document-level caching of names and name
components to fail.
- Added Windows batch scripts to run the various MAT tools outside
of the MAT controller.
- A tiny bug fix to address a problem managing absent Carafe models
in tasks in delivered tarballs.
- Made entire system Unicode-aware. Added guard to ensure that
Carafe does not currently see non-ASCII-compatible files. Added option
to pass character encodings to the engine in the UI and on the command
- Improved error reporting in MATEngine.
This version was not released due to subtle Unicode bugs introduced by
the migration to simplejson.
- Upgraded to Yahoo UI Toolkit version 2.6.0.
- Added ability to customize scorer to some degree for various
- Added --debug flag to experiment engine for debugging support.
- Fixed an unreported bug where rerunning the experiment engine to
regenerate scores when source_corpus_dir was present was failing.
- Bug 17863: errors in the command line tool weren't reporting the
file that caused the error for tokenization. Fixing this in all cases
isn't really possible at the moment, because of the way batch
processing works, but it's possible in non-batch, and should be now
- Corrected a subtle error in rich document encoding in Python
where the sequence "\/" wasn't getting decoded correctly by the
python-cjson library. Migrated to simplejson.
- In the de-identification task, added a simple Java client example
the MAT CGI script.
- Enabled chains of source corpora dependencies in experiment
engine. This corrected an error where the engine would fail if a corpus
pointed to a source corpus which itself pointed to a source corpus. The
only visible consequence of this change is that <prep>
instructions in these chains are layered; if a corpus has a
<prep> instruction and a source_corpus_dir attribute, the input
to the <prep> instruction will be the output of any <prep>
instructions in the source corpus, rather than ignoring the
<prep> instruction in the source corpus as the system previously
did. We know of no system deployments which exploited this previous
- Bug 17681: the psaTransform step was being recorded by mistake in
the transformed document in the de-identification task. The system now
imposes a more general requirement that steps are only recorded on
documents modified by side effect, rather than freshly produced
- Added the ability to pass a lexicon to the training engine.
- Added the ability to pass an experiment directory to the
- Relaxed the requirement that the source_corpus_dir attribute in
the experiment configuration be an absolute pathname. Now, like other
paths in the experiment configuration, if it is not absolute, the
experiment directory will be prepended.
- The experiment engine now copies the experiment XML file into the
experiment directory, if a file by that name is not already present.
- Bug 17613: in the de-identification task, a small bug in the
engine was causing the
training to fail if annotations already contained a replacement
- Bug 17612: a repeat of bug 16099, where trailing newlines aligned
tokens with tags incorrectly after replacement. The previous fix was
slightly incorrect. Fixed.
- Fixed an element of inflexibility in the Carafe trainer, which
didn't permit the command line in the <prep> experiment XML tag
to accept rich documents as input. This command line no longer supplies the
--input_file_type argument automatically, and it must now be provided
explicitly in the experiment XML file.
- Added the beginnings of an improved test suite (not yet
- Bug 16095: Carafe trainer couldn't handle empty lines in the file
specifying the available tags. Fixed in the experiment engine by
automatically generating the tag file. As a result, the --task
arguments are no longer supported in the <prep> and
<run_settings> command lines in the experiment XML; the toplevel
<experiment> tag now requires a "task" attribute; and the
<build_settings> tag no longer accepts the "tag_set" attribute.
- Bug 16749: deleting a document window while the document was
- Bug 17027: in the UI, it was possible to include trailing or
leading whitespace in an annotation during hand tagging. Fixed.
- Bug 17028: in the UI, it was possible to swipe from a taggable
point across an untaggable point to another taggable point. This now
generates an error to the user.
- Bug 17029: bug 16096 was not completely fixed in 0.9pre6; trying
to repeat some errorful calls caused the system to hang. Fixed.
- Fixed minor score accumulation bugs in experiment engine.
- Fixed bug where distributions with only a single task defined
caused an error in the UI.
- Cleaned up computation of file save paths so the application
never adds more than one pathname suffix.
- Bug 14509: to simplify task descriptions, display configuration,
parent tasks. Fixed.
- Bug 14393: modification check was not performed when the task
changed. This bug was mostly applicable to the old interface, but in
the new interface, all client changes are tracked, and rollback
confirmation is presented when any change is undone, or when a document
is closed. Fixed.
- Bug 15856: modification check was not performed when steps were
undone. Same issue as bug 14393 above. Fixed.
- Bug 16096: various Carafe errors were not being percolated back
to the client appropriately. Fixed many, many tiny bugs.
- Added functionality to support customization of core
documentation to allow incorporation of application-specific
documentation and application-specific branding.
- Bug 14650: mat_controller.sh was starting up even if the ports
that its servers needed were taken. Fixed.
- Experiment engine now does the right thing when corpora are
reused and modified in the consuming experiment.
- Bug 16099: bracket redaction spans were mistakenly expanded over
subsequent tokens in some cases. Fixed.
- Bug 16478: zone step in interface was not rolled back correctly.
- Bug 16098: drop-down menus were obscured by document windows.
Partially fixed; menu bar can still be partially obscured.
- Feature request 16097: Added token-level scoring and better
alignment for error detail to the scoring engine.
- Bug 16255: the experiment engine should switch to noninterleaved
mode when it detects that some models have already been built. Fixed.
- Bug 16256: the documentation for training_increment in the experiment
XML was incomplete. Fixed.
- New user interface, based on the Yahoo! UI toolkit, featuring a
desktop-style interaction with a top menubar and multiple moveable,
resizable document panes.
- Bug 15870: annotation popups in Firefox 3 were popping up in the
lower left corner. Fixed in the new UI.
- Expanded UI log facility to record seconds since log start and
more details about UI annotation popup interaction.
- Feature request 15803: Initial support for keyboard accelerators
in UI annotation popup.
- Bug 15826: the UI was mistakenly recording some steps that it
shouldn't be recording. Fixed.
- Bug 15810: steps in the UI were not changing when the user
changed the workflow. Fixed.
- Bug 15809: hand annotation was available before zoning was
applied. In the current system, this could result in undigestible data.
- Bug 15808: the "Save raw" button for raw documents made no sense,
and was misleading. Now disabled.
- Bug 15806: tag names were not alphabetized in the annotation
- Bug 15807: some annotation names should be white foreground text.
Easily supported through task specification; tasks updated.
- Bug 15814: annotations weren't visible when reloading saved rich
documents. This bug revealed a number of issues with handling the
global order of steps across workflows, which have hopefully all been
- Bug 14394: steps update wasn't working when workflow changed.
- Bug 15815: Unix installation path cannot contain spaces.
Temporarily fixed by aborting installation, with warning, if spaces are
in the path.