- Fixed an ordering bug in MAT-JSON deserialization in Java,
where annotations which have annotation-valued attributes and are
values of other annotation-valued attributes might not have their
non-annotation-valued attributes decoded in time to satisfy the
restrictions of the attribute they're a value of.
- Updated build script to update version number in JAR manifest.
- Fixed a couple small bugs in MATAnnotationInfoToJSON.
- Fixed a bug where the incorrect content type was being
reported to the browser on UI document save, causing an
incorrect file extension to be added in Safari.
- Fixed a suboptimal behavior where XML inline output was
producing extraneous (but harmless) toplevel XML tags to ensure
- Expanded the options available in the UI for the xml-inline
reader, and improved the associated documentation.
- Fixed a bug in the Java core library where annotation IDs were
being processed incorrectly.
- Fixed a bug where right-to-left languages were incorrectly
- Compensated for a bug in YUI which caused buttons to be
attached at unreliable times, making it impossible to center the
editor popups vertically.
- Fixed a small bug in the pairer where the similarity
comparison was returning incorrect values in one specific case.
- Removed duplicate definitions of inChooseMode() in the
- Compensated for a z-index bug in YUI which causes annotation
menu popups to appear behind annotation editor popups in some
cases, and for newer popups to appear behind older ones.
- Compensated for a series of YUI bugs which caused menu
scrolling to behave strangely in annotation menu popups which
exceeded the height of the viewport.
- Fixed a bug where menu scrolling in long annotation menu
popups was too slow.
- Disabled the "Unset" button in annotation editors for
attributes in choose mode, to avoid mischief.
- Fixed a bug where pressing the close button on a tab while the
user is being asked whether to reload or close would cause an
- Fixed a bug where it was possible to close tabs while waiting
for a step result from the Web server.
- Made all question dialogs modal.
- Fixed bug in keypress handling where modern browsers were no
longer sending what we previously expected for
<Enter>/<Return>. Also changed <Tab> to "-"
for deleting annotations because of cross-browser
- Fixed a bug in the UI where annotations without any display
entry could be selected by clicking.
- Fixed a bug where some of the gestures in annotation context
menus in annotation editors were not bound correctly.
- Fixed a number of bugs relating to how assigned files are
processed from the UI.
- Fixed a bug where setting MAT_PKG_HOME as an environment
variable possibly caused MAT to use the wrong value internally.
- It is no longer possible to open the same file more than once
in the workspace UI. Previously, it was possible, but the older
buffers couldn't be saved or operated on.
- Enabled busy wait image for workspace folder refresh in UI.
- Minor additional improvements to workspace panel file
information display in UI.
- Fixed a bug where the UI panel which displayed unexpected
errors from the backend was not draggable.
- Fixed a bug in the standalone viewer where overlapping
annotations weren't being stacked on initial display when hand
annotation was enabled.
- Fixed a bug where tokenized documents containing annotations
whose indices aren't aligned with the tokens caused a failure in
collecting corpus statistics during model building.
- Added version information to Java JAR manifest.
- Fixed a small bug in UI autotagging involving matching a
pattern which doesn't fall on token boundaries.
- Fixed a bug with error handling in toolchain execution in
- Minor documentation updates.
- Fixed a bug in the pairer which was failing to deal with
spanless annotations when there was no task.
- Removed an outdated reference to
--mark_gold_standard_reference in the MATWorkspace
- Fixed a typo in the toolchain which would cause an error in
the case where the task has exactly one workflow.
- Fixed a typo in a Windows command in the MATEngine
- Improved documentation about the --lexicon_dir option to
- Fixed a bug where edit_immediately was not being respected for
- Fixed a bug where true labels could not be used as a shorthand
for effective labels for adding annotations in choose mode.
- Documented the MAT runtime environment variables and added a
common command-line option to report where they're read from.
- Fixed a bug where the display properties of annotations with
effective labels were not consistently updated when the
effective label attribute value changes.
- Fixed an error in the MAT UI in segmenting documents which
contain lines at the end which are wrapped.
- Fixed a bug in the MAT UI where highlighting an annotation's
parents or children was mispositioning the attribute name
- Fixed a bug in the MAT UI where the attribute name tooltips
were mistakenly permitted to scroll off the right side of the
- Fixed a bug in the MAT UI where the annotation editor popup
max size was not being computed correctly.
- Fixed a bug in the MAT UI where annotation editor popups, if
moved, would jump back to the center of the viewport when the
viewport was resized.
- Undid Java 1.7 requirement.
- Fixed a small bug in the toolchain where jCarafe was being
executed even if there were no documents to tag.
- Disabled copying annotation-valued attributes during
autotagging in the UI.
- Removed some old code from the Java MAT engine client API.
- Final documentation tweaks.
- Fixed a series of Windows bugs relating to situations where
the workspace DB was not appropriately closed.
- Added access to score and similarity profiles to experiment
- Upgraded to jCarafe 0.9.8.5b-06.
- Raised Java requirement to 1.7.
- Fixed a typo in MATReport.
- Made the MAT-JSON Python reader a bit more forgiving.
- Fixed a bug in MATTransducer where the record of steps applied
to the document was not retained when a convertor was provided.
- Fixed a comparable bug in the reader/writer subsystem.
- Fixed a bug in the document reader where compound steps
weren't being consulted correctly when inferring the steps
applied to a document.
- Fixed a bug where the tokenizer was not anticipating the
unlikely case where a document had been tokenized but the record
of steps applied did not mention the tokenization.
- Fixed a bug where the scorer was failing to use the new
comparison mechanisms for tokens.
- Fixed a bug where comparison sets in the scorer created
because of transitive overlaps could have resulted in pairing
annotations which don't overlap at all.
- Fixed a subtle bug in the experiment engine which arose when
labels which are present in the training corpus are not present
in the test corpus.
- Documentation formatting tweaks.
- Small enhancement of document mapping to preserve XML
- Partial updates for possible future support of Internet
- Ensured that annotation editor titles don't grow without
- Ensured that annotations without CSS display instructions
aren't rendered in any way.
- Fixed a bug where it was possible for documents in the
standalone viewer to lack a global annotation type repository.
- Added new functionality to the XML template digester to allow
tokenless_autotag_delimiters to be Unicode.
- Fixed some tiny bugs in the Web UI templates.
- Inserted some missing documentation pages.
- Fixed a subtle but dangerous bug in the document convertor
which was causing the wrong annotations to be converted.
- Fixed a couple of tiny bugs in the scoring pairer.
- Fixed a bug in the UI where documents which were identified as
reconciliation documents but were not were causing the UI to
- Tightened the restrictions on int attributes: choice
restrictions and range restrictions cannot cooccur.
- Documentation updates.
- Fixed a bug where the function that aligns annotations with
new tokens didn't work with overlapping annotations, and failed
to forcibly detach annotations it deleted.
- Fixed a bug where annotations without visible displays were
taking up space in the UI.
- Added a warning to notify users that MAT currently can't
distinguish in UI presentation between annotation labels which
differ only in case.
- Documentation updates.
- Implemented a workaround for a limitation on numbers of query
parameters in sqlite3.
- Enhanced the behavior of MATTransducer for reporting errors,
capturing transduction history.
- Extended pair comparison to handle multi-attribute dimensions,
- Fixed an insidious bug where the endpoints of swipes in the UI
during annotation were being misinterpreted in rare
- Fixed a bug where annotation popups and informative dialogs
were not necessarily being removed when documents were
deselected (via tabs, e.g.) in the UI.
- Added more operations to the document conversion.
- Modified the workspace_configuration operation in workspaces
to print out the task.
- Demoted the support level of Windows XP, which we no longer
have access to.
- Fixed a bug where documents which had hand annotation
available when they were loaded into the UI were not being
displayed in stacked mode.
- Fixed a bug where it was not possible to modify the extent of an
annotation in the UI if the swipe overlapped with multiple
annotations on the same layer.
- Fixed a bug where closing dialogs that ask a question left the
UI in a strange state.
- Fixed a rare bug where HTML characters in the annotation
tables were not displayed correctly.
- Updated to jCarafe 0.9.8.4.RC14 to fix an off-by-one error.
- Documentation updates.
- Minor bug fixes.
- Documentation updates, including Java docs.
- Minor extension to document conversion.
- Added more customization options to attribute entries in UI
tables and editor/viewers: custom editor label customization,
- Minor bug fix.
- Minor updates to score configurability.
- Fixed bug in UI where workspace listing was not retaining its
- Added the ability to designate an attribute as read-only for
UI editing purposes.
- Ensured that sidebar redisplay happens appropriately for
- Minor bug fixes.
- Implemented case-insensitive autotagging UI option.
- Expanded the score profile options.
- Fixed a bug in workspace logging from the UI.
- Added highlighting and scrolling improvements to the
- More documentation updates.
- Other minor bug fixes.
- Updated Java Carafe bindings for latest version.
- More documentation updates.
- Updated standalone viewer to support new comparison paradigm.
- More documentation updates.
- Minor bug fixes.
- More documentation updates.
- Added description columns to the detail spreadsheet output of
- A variety of tiny bug fixes.
- Ensured compatibility with, and upgraded to, jCarafe
- More documentation updates.
- Updates to Java bindings to deal with annotation attribute
- Added an (undocumented) capability to add Web operations to
- Workspace dialog in the UI now focuses on the next input
element when you tab.
- Fixed a number of small bugs.
- Lots of documentation updates.
- Modified the way effective labels are declared.
- Fixed a number of small bugs.
- Imported document mapping XML capability (not yet documented).
- Scrubbed the current state of the UI for logging updates.
- Fixed a bug where double-clicking on a word at the end of a
line in non-tokenized hand annotation was selecting the wrong
- Extended MATManagePluginDirs to list the task names in the
- Enhanced MATTransducer with the document mapping capability,
along with a reporting capability based on the same tools as
those used in MATReport.
- Fixed a bug in the distribution code related to Mac
configuration of Terminator.
- Added utility to create comparison documents.
- Extended hand annotation tool to cover complex label
- Implemented workspace logging awareness in UI.
- Made all annotation capabilities work in standalone UI.
- Improved scoring algorithm to do a better job of pairing
- Removed obsolete default_tag_window_size and
- Improved tagging feedback in the UI.
- Initial documentation updates.
- Hand annotation now supports adding overlapping annotations.
- Reorganized annotation pane to force span breaks at line wrap
boundaries, in order to have the display behave properly with
annotations which are longer than a single line and to line up
annotations properly with spanless annotation icons.
- Cleaned up implementation of resize in annotation panes.
- MATReport now supports per-label reports.
- Introduced new document comparison pane, using output from the
pairing algorithm, including relation comparison.
- Implemented read-only appropriately across the UI for
annotation editor popups.
- Fixed a variety of small bugs.
- Added ability to report spanless annotations in MATReport.
- Added ability to register convertors for tasks in
- Added "scroll to" action in UI.
- Added annotation highlighting to annotation table mouse hover.
- Fixed a couple of small bugs.
- Imported workspace logger and reconciliation workflows from
- Added ability to treat all annotations as content annotations
- Initial implementation of expanded annotation ontology in
- Added stubs for new comparison window.
- Added --version option to common command-line options, and
documented common options separately.
- Added "About MAT" option to UI, with version number.
- Implemented defaults in annotation task descriptors.
- Implemented new comparison documents based on similarity
- Added spanless sidebar and annotation highlighting in UI.
- Introduced improvements to feedback for hand annotation and
choose mode in UI.
- Improved plumbing for presenting results of document and
annotation modifications in UI.
- Fixed a variety of small errors.
- Implemented similarity profiles for Kuhn-Munkres-based
scoring. Enabled scoring of annotation-valued attributes and
- Added similarity and scoring profiles to the scorer, including
tag aggregations and tag decompositions in the scoring profiles.
- Enhanced MATTransducer to work on files one at a time.
libraries, including the ability to use the text span as the
default value for spanned annotations.
- Fixed two small bugs in the MAT UI.
- Added MATAnnotationInfoToJSON
to support standalone viewers.
- Fixed a small bug in computing tag orders for the UI in the
new annotation specification regime.
- Initial implementation of the capability of stratified scoring
of annotations with annotation-valued attributes (not yet
- Separated the annotation pairing capability from the scorer.
- Added the ability to create custom attribute editors in the
- Re-enabled popup annotation editors in standalone viewer.
- Fixed some minor bugs in how the string-valued attributes
appear in the annotation editor.
- Incorporated Kuhn-Munkres alignment algorithm for scoring
- Fixed bug with deleting annotations in the UI.
- Fixed a couple minor UI bugs.
- Re-introduced the overlay manager to handle popup
backgrounding and foregrounding.
- Reworked the UI annotation manipulation plumbing to provide
more general support for redisplay.
- Fixed a bug where the annotation table was always being shown,
even when it wasn't needed.
- Disabled the "Show annotation tables" menu item when a
document shows tables regardless (because it has spanless
- Added the ability to delete annotations from various context
- Added the ability to choose an annotation in "choose mode"
from the annotation editor action menu.
- Discovered and corrected inconsistencies in how the annotation
editors were invoked in the standalone viewer.
- Fixed a Safari bug relating to the menu update for the
- Fixed bugs in the standalone viewer relating to the upgrade to
the new annotation set descriptors.
- Numerous tiny UI bug fixes.
- Expanded UI support for annotation-valued attributes,
including actions to add annotations to parents.
- Added annotation tables and the ability to create spanless
- Added support for a global "choose mode" in the UI.
- Expanded range of editable attribute types in annotation
- Removed old desktop UI.
- Migrated all tasks and plugin manager to new well-formedness
- Implemented global and local annotation type managers for
well-formedness condition support.
- Added utility to update task.xml files.
- Improvements to initial well-formedness condition checking in
- Implemented initial UI support for annotation-valued
- Enhanced presentation of annotations and synchronization
between annotation views in UI.
- Improved global management of view settings in UI.
- Added ability to edit annotations in tabs.
- Migrated to new workbench UI.
- Added annotation capability to standalone document viewer.
- Added support for lists and sets of attribute values in
MAT-JSON and Python, as well as float, string, int, and boolean
- Initial Python implementation of well-formedness conditions
for annotation attribute values.
- Fixed a tiny bug in the UI where unzoned documents couldn't do
autotag because no content intervals were found.
- Updated the workspace documentation with troubleshooting tips.
- Other minor doc updates.
- Fixed bugs in Java bindings where annotation IDs were being
incremented incorrectly, and spanless Asets were being created
- Fixed a couple small bugs in the build script related to
missing executables on MacOS X.
- Fixed a bug in CherryPyService where images from task
documentation had the wrong MIME type.
Version 2.0pre4 is the current frozen version for the
Callisto-compatible version of TooCAAn.
- Added confidence capturing to the jCarafe wrapper.
- Added MATTransducer.
- Added the --xml_translate_all option to the XML inline reader.
- Transitioned to new workspace structure. Introduced conversion
- Initial reorganization of extensible command-line arguments.
- Removed Cygwin support.
- Imposed requirement of Python 2.6 or later.
- Introduced reconciliation capability in UI.
- Removed the legacy "operate" keyword in workspace commands.
- Fixed bug 21049: XML inline rendering and reading should skip
the untaggable tags, which typically cause crossing dependency
problems. Fixed by removing untaggables except in the UI.
- Added overlap handling to the scorer.
- Fixed a bug in ordering annotations in the UI by making it
impossible to reuse annotation names as attribute set names in
- Made it possible to restrict scoring to gold segments in
either the hypothesis or the reference.
- Made the confusability matrix in the scorer more well-behaved
- Changed recall, precision, and f-measure in the scorer to
return scores between 0 and 1, rather than 0 and 100.
- Fixed a set of interrelated bugs that was blocking Carafe from
acting as a partial tagger.
- Extended scorer to make use of all recorded distinguishing
attributes in annotations.
- Added global management of debug flags, subprocess
information, temporary directories.
- Modified scorer and experiment engine to generate multiple
spreadsheet types in the same run.
- Enhanced fake XML reader to recognize zero-length tags and nested
tags, and to recover properly from "misformed" tags.
- Removed mat_controller.sh and accoutrements. Refactored MATWeb
to take responsibility for starting up the tabbed terminal.
- Updated to new version of jCarafe.
- Added support for running experiments against workspaces to
both MATExperimentEngine and MATWorkspaceEngine.
- Added closing checks for the MAT UI to warn about unsaved
document and log changes, and to free workspace locks.
- Added initial support for adding hooks to present annotation
- Added indication of when a document is modified to the
document pane title.
- Fixed a subtle bug involving figuring out whether a document
is dirty or not.
- Extended mat-json format.
- Provided new comparison window in UI.
- Provided some final documentation updates to stress the lack
of security in the MATWeb server.
- Fixed a final bug in the demo UI.
- Backported standalone document viewer wrapper from MAT 2.0
- Fixed a tiny CSS bug.
- Updated MATScore documentation to clarify that only documents
with nonoverlapping content annotations can be scored.
- Fixed a conceptual bug in the detail spreadsheet in the
- Fixed a bug in MATWeb where the server wasn't enforcing that
the workspace directory appear beneath the workspace container
- Fixed a bug in the confusability matrices in the scorer.
- Fixed a bug in handling forced recomputation in the experiment
- Fixed a bug in the UI where workspace document windows were
not being tiled appropriately.
- Added a plugin hook for tasks to modify the UI metadata.
- Fixed a small bug in handling autotagging in the UI.
- Introduced further granularity into the scorer to describe
various subtypes of span clashes. Introduced an option to enable
this granularity (disabled by default).
- Fixed a small scorer bug which arose when reference documents
contained no annotations.
- Fixed a tiny but important bug in counting characters in XML
inline input where the background signal is itself XML.
- Improved debug handling in the model trainer.
- Added --preserve_tempfiles to preserve tempfiles during model
- Improved error checking for color interpretation for Java
- Added the ability to ignore labels to the scorer.
- Provided a workaround for managing zones to compensate for a
jCarafe bug in tokenization.
- Fixed small bug in scorer which raised an error when tokens
were present in one, but not both, of the reference and
- Added equivalence classes and confusability matrix generation
to the scorer.
- Expanded the core scorer columns to reflect the various
subtypes of clashes directly.
- Small documentation tweak.
- Internal API modifications to the document writer.
- Improved the handling of command line help messages for
- Added character and pseudo-token scoring to the MAT scorer.
- Fixed a bug in managing the Web service headers which caused
spurious errors to be generated when running the demos.
- Modified remove operation in workspaces not to remove all
basenames by default.
- Added the fake-xml-inline reader.
- Fixed two security issues in workspace access. First, if
workspace container directories are specified to MATWeb, they
must be used; no absolute pathnames are permitted. Second,
workspace keys are no longer passed on the command line when
MATWeb restarts.
- Fixed a tiny typo in the document reader.
- Fixed a small bug in the MAT toolchain where it was not
possible to access errors from the document reader.
- Fixed a small bug in the MAT UI which arose while implementing
- Added the ability to specify character encodings to MATScore.
- Fixed a bug where input-only or output-only document formats
were not being treated appropriately in the MAT UI.
- Added the ability to insert document convertors for reading
and writing in the document IO subsystem, to, e.g., convert from
PERSON to ENAMEX TYPE=PERSON when reading or writing.
- Added capability to autotag similar text strings in the MAT
- Moved the text_right_to_left property from the workflows to
the Web customization, since it's essentially task-global.
- Fixed a bug in the new redistribute.py utility which prevented
it from working with file paths with spaces in them.
- Added to the redistribute.py utility the ability to remove
tasks from the target distribution.
- Enhanced MATWeb with the ability to supersede existing MAT Web
servers, to capture random output and error, and to bundle a
number of related features under the new --as_service option.
- Added more restrictive color format recognition in the Java
bindings to support Callisto color interpretation.
- Improved the encapsulation of the Web frontend core pieces to
- Fixed incorrect and overly literal MIME type in Web server
- Removed all attempts to access annotation category information
via the document. All such accesses now happen via the task.
- Added the redistribute.py utility.
- Fixed a bug in the MAT UI where annotation attributes and
values were not being shown when the mouse hovers over an
- Fixed a bug in the experiment engine where the order of
documents in training corpora was not preserved across restarts
of the experiment engine (important when corpus size iterators
- Bug fixes for Java API improvements.
- Introduced notion of iterators to experiment engine.
- Extended experiment score summary files to include more
information about runs and models.
- Enhanced Java API slightly to support better introspection
about task information.
- Fixed a bug which corrupted line endings in CSV files on
- Upgraded to Java Carafe 0.9.7RC4 to address a problem with
serializing mat-json files in the tokenizer.
- Introduced a check to ensure that MAT 1.2 fails intelligently
when presented with future versions of the mat-json
- Minor documentation updates.
- Upgraded to Java Carafe 0.9.7RC3. This version of Carafe is
not compatible with models built with previous versions of Java
- Fixed two small bugs where ENAMEX-style tags were not being
ordered correctly when labels were not being alphabetized in the
- Fixed a small bug in the scorer where documents with no zones
(e.g., XML documents which contain only annotations) were not
being scored correctly when a task was specified.
- Fixed a small bug in setting up the tabbed terminal in
- Fixed a small bug where Java Carafe model training might do
the wrong thing with duplicate file names.
- Reorganized documentation links to create a better initial
experience when unpacking the distribution.
- Fixed a small bug in the initialization sequence when
interacting with the Java Carafe tagger server.
- Fixed a small bug in CSS displays of annotations with labels
whose names contain a dash.
- Small documentation updates.
- Bumped Java requirement to 1.6 update 4 due to issues with
- Bumped Java requirement to 1.6.
- Extended the capabilities of the property caches in the
experiment engine to handle a wider range of data types.
- Removed the option to make smaller zones for PSA training
because it's no longer needed. Removed the no_random_segments
configuration option as part of this change.
- Removed option for SGD training because it's being removed
- Insulated the MAT Web server against stray ill-formed cookies
that the browser might deliver to the server by accident.
- Retokenized sample data and updated Java Carafe bindings for
- Fixed a bug where MATReport and MATScore spreadsheets were
being created as ASCII rather than UTF-8.
- Expanded optional subprocess monitoring to include children
and remote children of subprocesses.
- Fixed a bug with restart in MATExperimentEngine.
- Small documentation updates.
- Updated to Java Carafe 0.9.5. All previous models will have to
- Clarified a few error messages, updated some comments.
- Fixed an obscure bug in Windows 7 which was resulting in a
Windows memory error.
- Added the optional ability to monitor subprocess memory image
sizes via the open-source psutil package. Enabled MATEngine,
MATModelBuilder, MATExperimentEngine and MATWorkspaceEngine with
- Fixed a bug in the test suite which limited its ability to run
- Bug 19392: Ensured that scorer doesn't include annotations
outside zones in the reference document, if zones are known to
- Extended MATReport to provide file-level statistics.
- Improved error message when character encoding is wrong.
- Added the "default_tag_window_size" and
"default_tag_window_position" attributes in task.xml to provide
the ability to control the annotation windows in the MAT UI.
- Added the MATReport tool to generate concordance-style
annotation reports in CSV and text formats.
- Fixed a small installer bug which arose when Java could not be
- Fixed a bug where deleting annotations in workspace mode in
the UI wasn't enough by itself to mark a document as needing to
- Fixed a bug where the new keyboard accelerator for repeating
the last annotation wasn't working on Windows.
- Extended the (undocumented) mechanism for task-specific
customize the presentation of untaggable regions.
- Added the ability to add a diff file against a previous
distribution to a distribution tarball.
- Documentation updates.
- Tiny bug fix in error reporting from Java Carafe.
- Enhanced Java Carafe wrapper with control for Java stack size.
- Fixed some small bugs in error reporting from Java Carafe.
- Improved the description of corpora in CSV output files in the
- Added the error text description to the logging CSV files in
the MAT UI.
- Enhanced task.xml files with defaults for the Java subprocess
- Improved documentation and record-keeping for MAT builds.
- Minor documentation improvements.
- Added the "alphabetize_labels" attribute in task.xml to
provide the ability to control how the annotations are ordered
in the UI legend and popup.
- Fixed a typo bug in the tagging service.
- Updated some documentation as a result of user feedback.
- Added the ability to add last tag in the UI, with a consistent
- Fixed two more small command-line argument processing bugs,
one in MATModelBuilder and one in MATRetokenize.
- Fixed a bug in the scorer, where the task was not being used
as a fallback source of tag label metadata.
- Modified MATScore to allow the user to provide lists of
content annotations and token annotations directly, in case
neither task nor tag label metadata is available.
- Fixed a bug in the processing of command line arguments in
- Fixed a bug in MATRetokenize that arose when only a single
task was defined.
- Added internal APIs to make it easier to make use of taggers
and tokenizers other than Carafe.
- Fixed a lurking bug where CRLF was not being handled correctly
when XML inline was being read.
- Further cleanup of documentation for Windows port.
- Improvements and simplifications to the distribution and
plugin installation code.
- Fixed minor bugs in constructing cascaded annotation menu
- Whitespace in tag labels is now supported correctly in the MAT
- The annotation window in the MAT UI now displays the content
annotations under the mouse.
- The MAT UI now supports cascaded annotation menus, via the
<tag_group> element in the task.xml file.
- The task.xml file now provides a means to control text
direction for individual workflows, using the text_right_to_left
- The task.xml file syntax has been modified to clarify how
attributes are handled by MAT. For advanced users, this
necessitates a number of changes in the task.xml file (see the upgrade
- Fixed a bug where the scorer was breaking if files had no tags
- It is now possible to use workspace operations directly on the
command-line, without the "operate" operation.
- Replaced the OCaml tokenizer and Carafe tagger with the Java
reimplementation. This necessitates a number of changes,
including retokenizing your documents and rebuilding your models
(see the upgrade
- Command-line options for steps which are used in more than one
step are now appropriately cross-referenced in the help string.
- Workspace operations which don't affect any pathnames now
raise an error.
- The system now appears to work in Python 2.6 and in MacOS X
- The experiment engine XML file now allows you to define
bindings for commonly repeated values, and also supports
explicitly referring to the experiment directory and pattern
- The system now works in native Windows, without Cygwin.
- The system is now distributed in a single zip file for all
- The MAT UI has been improved to support hand annotation
without tokens (although this is not recommended).
- MacOS X no longer requires a special installation of Python,
even though select.poll is still missing on that platform.
- Bug 24205: due to a bug in the experiment engine, non-default
model configurations weren't being handled correctly. Fixed.
- Bugs 15815, 19669: Made sure that the whole system does the
right thing with spaces in filenames. Fixed.
- Extended and modified how corpora are specified in experiment
XML, introducing considerably greater flexibility. It's now
possible to specify n-way corpus splits, and group them
arbitrarily in training and test runs.
- Fixed an obscure bug where child tasks of visible tasks
weren't being reported appropriately to the UI.
- Fixed a minor bug in formatting XML output.
- Fixed a bug in workspace locking.
- Minor documentation updates.
- Enabled confidence interval reporting. Added reporting of
- Added general workspace locking for operations, importing and
removing basenames, opening a workspace file in the UI, and
listing the contents of the workspace folders.
- Initial support for Python 2.6.
- Upgraded to CherryPy 3.1.2.
- Added XML reader to MATWorkspaceEngine import; expanded
documentation on MATExperimentEngine to describe how to prepare
corpora with XML documents.
- Added support for multiple model build settings in task.xml.
- Added PluginMgr.AlignStep to the core to support alignment of
externally-generated content tags with token boundaries.
- Converted calls to MATEngine command-line tool in the
experiment engine to invocations of the MATEngine object.
- Expanded Java service API to cover checking for the existence
of a workspace, listing the contents of a workspace folder, and
opening a workspace file.
- Bug 24020: due to a bug in the demo infrastructure, it was not
possible to reprocess a document reliably. Fixed.
- Bug 23988: proxies for multiple steps in task.xml weren't
doing the right thing when the proxy set didn't exactly match
between proxies. You shouldn't do this, and the UI still
exhibits some unusual behavior, but it's fixed in the engine.
- Bug 21112: the experiment engine insisted on converting test
documents to raw form before processing for the test run phase.
It's now possible to specify other default preprocessing (e.g.,
just undo tagging).
- Bug 20835: global maintenance of annotation type objects was
not thread-safe. Fixed by making annotation type objects local
to a document.
- Added XML reader and writer to MATEngine, MATModelBuilder,
- Added ability to define arbitrary readers and writers.
- Added the ability to use your own training engine in your own
- Added feedback for workspace import.
- Added option to enable workspace access from remote clients
via the Web server.
- Added splash screen to UI.
- Enhanced scorer to provide file-level count data to support
computing confidence intervals (not enabled yet).
- Enhanced load and save in workspace mode in UI by adding the
document basename to the Web service result.
- Added the "rich, incoming" folder to the core workspace,
for importing files which are in rich format but not yet
prepared for hand tagging.
- Enabled rich document readers to infer the processing state of
documents which lack the appropriate metadata.
- Added "remove" operation to MATWorkspaceEngine.
- Fixed a subtle bug where tagging engines which fail on startup
were not notifying the UI client properly about the failure.
- Fixed an obscure bug where MAT JSON documents which share the
same annotation label but have different attribute orders were
being digested inconsistently.
- Fixed a small bug in MATWeb which prevented logs from rotating
- Fixed a small bug in the Java library which didn't correspond
to the (correct) documentation.
- Bug 18427: in some situations where the UI viewport is small,
the menu bar would disappear. Fixed.
- Bug 20757: ENAMEX-style tags and attributes didn't work with
Carafe training on JSON documents. Fixed.
- Bug 23672: the MIME type of log spreadsheets was incorrect
when saved from the MAT UI, causing Excel to fail to digest the
logs properly on Windows. Fixed.
- Bug 23668: "Update workspace key" in the UI was failing to
percolate appropriately to "Open workspace...". Fixed.
- Minor documentation updates.
- Fixed a bug where command-line arguments of MATEngine weren't
overriding step attributes in the task.xml file. Added
- Fixed a bug where step attribute defaults weren't being
- Fixed a bug in the Java bindings where it was possible to
create an annotation type with null attributes.
- Fixed an infelicity in the initial README when users first
unpack the distribution.
- Minor documentation updates.
- Fixed minor omission in experiment infrastructure.
- Fixed a bug where extra task paths were not being
canonicalized when task directories were being computed.
- Added a temporary fix, to be backed out in 1.1, to use the
Carafe engine for the anonymization task in the core
- Fixed a bug in branding in the UI.
- Minor documentation updates.
- Minor documentation updates.
- Vast tracts of documentation updates.
- Minor modifications of the UI log action names to improve
- Bug 19970: because of a deep bug in the interaction between
Firefox and the UI toolkit when the backend server was down, no
feedback was being provided to the user about the failure.
- Bug 17989: the experiment engine was saving its raw gold files
as ASCII, rather than a Unicode-compatible encoding like UTF-8.
- Bug 17867: HTML and HTTP escapes were not being inserted
appropriately in the routines that generate task-specific
- Bug 17778: various operations that could be performed in the
UI weren't blocking out rerequests while the operation was being
- Bug 16039: left click (not swipe) on an annotation while hand
annotating selected just that token, not the entire annotation.
- Extended branding to documentation.
- Yet more documentation updates, including Java client library
- Changed the "inherit_actions" attribute of workspace
operations in task.xml to "inherit_operations", for consistency.
- Fixed a bug in the plugin manager which was making demo files
inaccessible for tasks without Python customizations.
- Even more documentation updates.
- Corrected terminology for the deidentification task.
- Fixed MATWeb so that it preserves its workspace key across
- Expanded the Java client library to support workspace
- Added branding capability to UI and tasks.
- Vast documentation updates.
- Added a Java client library (not documented yet).
- Bug 19772: install.bat was failing with links in Cygwin
- The settings for the Carafe model builder are now specified in
the task.xml file, and can be overridden in the experiment XML
or in the workspace settings. As part of this change, the "task"
attribute to <build_settings> in the experiment XML file
is no longer recognized.
- Added the MATModelBuilder tool.
- Added the MATManagePluginDirs tool.
- Expanded the documentation.
- The tabbed terminal is now optional. Build, install and
runtime have been appropriately updated.
- The visible name of the rich JSON format in the UI, command
line, and all configuration and experiment files has been
changed to "mat-json". This change is in support of future
additions of other readers and writers for the MAT system.
- The URL with which MAT can be accessed in the Web UI has been
changed to "http://<host>:<port>/MAT/desktop".
- For any task directory ending with basename <name>, you
can now access a desktop restricted to that task at
- Initial support for user-friendly demos has been introduced.
The configuration file for this capability will be documented
when the capability matures a little more.
- All XML configuration file layouts are now defined by a
simple, user-readable templating system.
- It is now possible to define user-visible workflow steps which
consist of sequences of implementations of other steps (e.g.,
"prep" might implement "zone,tokenize").
- The Web UI is no longer tied as closely to the names of the
steps in the MAT engine.
- Tasks now support the option of defining "attribute sets" for
annotations, so that, e.g., an ENAMEX tag with different values
for the "type" attribute can map to different CSS
configurations, and be treated differently in the scorer.
- Bug 19240: if a core task was installed using
MATInstallApplication after a dependent task, and both had
documentation, the documentation rendering was broken. This has
been fixed by moving to live, on-line generation of the
- Bugs 14466, 16090: The tarball distribution contained multiple
copies of the MAT executables, only one of which was configured
correctly. All duplications have now been eliminated. As a
result, all MAT executables in the tarball distribution should
be accessed from src/MAT instead of build/MAT.
- Bugs 14507, 19391: Previously, there was a separate step to
install the Web documents in a separate location. That step has
now been eliminated.
- MATServer has been removed from the system, and replaced by a
thread in the CherryPy Web application.
- The underlying version of Carafe has been upgraded, and old
training models are not forward-compatible with the new engine.
Models will have to be rebuilt.
- Bug 14508: The Web server could not be limited to localhost.
Now it can be.
- Apache has been removed from the system, and replaced by the
Python-based CherryPy Web application infrastructure.
- The MAT scoring engine now defaults to writing CSV files with
spreadsheet-interpretable equations for computed values. This
behavior can be controlled on the command line for both the
scorer and the experiment engine.
- Converted ad-hoc mechanisms for Carafe customization to
consistent command-line updates. Note: due to this change, the
way to add a permanent Carafe model to a task file has changed.
- Bug 16450: the -prior-adjst argument of Carafe was not
available for customization in any useful way, even though it
controls recall/precision bias. Fixed.
- Bug 16225: the random zoning option to optimize PSA training
was available only to resynthesized documents, because it was
part of the de-identification task. It is now a feature of the
trainer itself, and the zone tags are temporarily inserted into
the document immediately before training if PSA training is
- Major refactor of internal management of steps to support step
rollback in the backend and address a large number of bugs.
- Bug 18960: switching between workflows wasn't undoing all the
appropriate UI configuration steps. Fixed.
- Bug 18898: After cleaning up the way transformation and
nomination should work in the de-identification task, changing
the replacer between the two steps could screw up the
application of the transformation. Fixed by cleaning up the way
metadata is handled in the UI and transported to the backend.
- Bug 18849: the experiment directory was created incorrectly if
it didn't exist when the experiment was started. Fixed.
- Bug 18848: Transforming a document in the de-identification
task would strip all the tags if tokenization had been skipped.
Fixed by doing a smarter job of figuring out how to postprocess
- Bug 18534: hand annotations were not showing up when hand
annotated documents were reloaded, because of a logic error in
the relationship between steps done and steps visible. Fixed.
- Bug 18430: hand annotation was erroneously available during
resynthesis. No longer.
- Bug 18426: the clean step can't be undone, but nothing was
preventing this. Now raises an error.
- Bug 18425: the clean step was not a core step. It is now.
- Bug 17680: the psaTransform step was not being rolled back
appropriately in one implementation of the de-identification
task, because it wasn't "really" a transform step. Refactoring
the step management fixed this.
- Bug 17610: rollback was not supported in the backend engine.
Fixed. Engine now accepts an --undo_through argument.
- Bugs 16603, 16604: newlines are preserved during transformation
in the de-identification task, but the extent of the tags needed
to be adjusted. In non-clear replacement, the right thing was
happening by accident, because those documents were being
tokenized when they shouldn't have been. These documents are now
(properly) not tokenized, and the extent adjustment is now done
- Bug 14512: if there's only a single task or a single workflow,
they should not need to be provided. Fixed.
- Major new capability to manage workspaces of documents,
including support for iterative model creation.
- New "Save/Hide" menu to allow better UI desktop management, as
well as a hide widget in the window panes.
- Bug 17687: the UI was showing steps which weren't in the
current workflow, because they were in the document. Fixed.
- Bug 16747: New windows appear directly on top of each other in
the new UI. This has been fixed.
- Bug 18457: a number of closely related bugs in the
de-identification task conspired to cause document-level
caching of names and name components to fail.
- Added Windows batch scripts to run the various MAT tools
outside of the MAT controller.
- A tiny bug fix to address a problem managing absent Carafe
models in tasks in delivered tarballs.
- Made entire system Unicode-aware. Added guard to ensure that
Carafe does not currently see non-ASCII-compatible files. Added
option to pass character encodings to the engine in the UI and
on the command line.
- Improved error reporting in MATEngine.
This version was not released due to subtle Unicode bugs introduced
by the migration to simplejson.
- Upgraded to Yahoo UI Toolkit version 2.6.0.
- Added ability to customize scorer to some degree for various
- Added --debug flag to experiment engine for debugging support.
- Fixed an unreported bug where rerunning the experiment engine
to regenerate scores when source_corpus_dir was present was
- Bug 17863: errors in the command line tool weren't reporting
the file that caused the error for tokenization. Fixing this in
all cases isn't really possible at the moment, because of the
way batch processing works, but it's possible in non-batch mode,
and should now be fixed.
- Corrected a subtle error in rich document encoding in Python
where the sequence "\/" wasn't getting decoded correctly by the
python-cjson library. Migrated to simplejson.
- In the de-identification task, added a simple Java client
example for the MAT CGI script.
- Enabled chains of source corpora dependencies in experiment
engine. This corrected an error where the engine would fail if a
corpus pointed to a source corpus which itself pointed to a
source corpus. The only visible consequence of this change is
that <prep> instructions in these chains are layered; if a
corpus has a <prep> instruction and a source_corpus_dir
attribute, the input to the <prep> instruction will be the
output of any <prep> instructions in the source corpus,
rather than ignoring the <prep> instruction in the source
corpus as the system previously did. We know of no system
deployments which exploited this previous configuration.
- Bug 17681: the psaTransform step was being recorded by mistake
in the transformed document in the de-identification task. The
system now imposes a more general requirement that steps are
only recorded on documents modified by side effect, rather than
freshly produced documents.
- Added the ability to pass a lexicon to the training engine.
- Added the ability to pass an experiment directory to the
- Relaxed the requirement that the source_corpus_dir attribute
in the experiment configuration be an absolute pathname. Now,
like other paths in the experiment configuration, if it is not
absolute, the experiment directory will be prepended.
- The experiment engine now copies the experiment XML file into
the experiment directory, if a file by that name is not already
- Bug 17613: in the de-identification task, a small bug in the
training engine was causing the training to fail if annotations
already contained a replacement nomination. Fixed.
- Bug 17612: a repeat of bug 16099, where trailing newlines
aligned tokens with tags incorrectly after replacement. The
previous fix was slightly incorrect. Fixed.
- Fixed an element of inflexibility in the Carafe trainer, which
didn't permit the command line in the <prep> experiment
XML tag to support input documents which were rich documents.
This command line no longer supplies the --input_file_type
argument automatically, and it must now be provided explicitly
in the experiment XML file.
- Added the beginnings of an improved test suite (not yet
- Bug 16095: Carafe trainer couldn't handle empty lines in the
file specifying the available tags. Fixed in the experiment
engine by automatically generating the tag file. As a result,
the --task arguments are no longer supported in the <prep>
and <run_settings> command lines in the experiment XML;
the toplevel <experiment> tag now requires a "task"
attribute; and the <build_settings> tag no longer accepts
the "tag_set" attribute.
- Bug 16749: deleting a document window while the document was
- Bug 17027: in the UI, it was possible to include trailing or
leading whitespace in an annotation during hand tagging. Fixed.
- Bug 17028: in the UI, it was possible to swipe from a taggable
point across an untaggable point to another taggable point. This
now generates an error to the user.
- Bug 17029: bug 16096 was not completely fixed in 0.9pre6;
trying to repeat some erroneous calls caused the system to hang.
- Fixed minor score accumulation bugs in experiment engine.
- Fixed bug where distributions with only a single task defined
caused an error in the UI.
- Cleaned up computation of file save paths so the application
never adds more than one pathname suffix.
- Bug 14509: to simplify task descriptions, display
inherited from their parent tasks. Fixed.
- Bug 14393: modification check was not performed when the task
changed. This bug was mostly applicable to the old interface,
but in the new interface, all client changes are tracked, and
rollback confirmation is presented when any change is undone, or
when the document is closed. Fixed.
- Bug 15856: modification check was not performed when steps
were undone. Same issue as bug 14393 above. Fixed.
- Bug 16096: various Carafe errors were not being percolated
back to the client appropriately. Fixed many, many tiny bugs.
- Added functionality to support customization of core
documentation to allow incorporation of application-specific
documentation and application-specific branding.
- Bug 14650: mat_controller.sh was starting up even if the ports
that its servers needed were taken. Fixed.
- Experiment engine now does the right thing when corpora are
reused and modified in the consuming experiment.
- Bug 16099: bracket redaction spans were mistakenly expanded
over subsequent tokens in some cases. Fixed.
- Bug 16478: zone step in interface was not rolled back
- Bug 16098: drop-down menus were obscured by document windows.
Partially fixed; menu bar can still be partially obscured.
- Feature request 16097: Added token-level scoring and better
alignment for error detail to the scoring engine.
- Bug 16255: the experiment engine should switch to
noninterleaved mode when it detects that some models have
already been built. Fixed.
- Bug 16256: documentation for training_increment in experiment
XML documentation was incomplete. Fixed.
- New user interface, based on the Yahoo! UI toolkit, featuring
a desktop-style interaction with a top menubar and multiple
moveable, resizeable document panes.
- Bug 15870: annotation popups in Firefox 3 were popping up in
the lower left corner. Fixed in the new UI.
- Expanded UI log facility to record seconds since log start and
more details about UI annotation popup interaction.
- Feature request 15803: Initial support for keyboard
accelerators in UI annotation popup.
- Bug 15826: the UI was mistakenly recording some steps that it
shouldn't be recording. Fixed.
- Bug 15810: steps in the UI were not changing when the user
changed the workflow. Fixed.
- Bug 15809: hand annotation was available before zoning was
applied. In the current system, this could result in
undigestible data. Fixed.
- Bug 15808: the "Save raw" button for raw documents made no
sense, and was misleading. Now disabled.
- Bug 15806: tag names were not alphabetized in the annotation
- Bug 15807: some annotation names should be white foreground
text. Easily supported through task specification; tasks
- Bug 15814: annotations weren't visible when reloading saved
rich documents. This bug revealed a number of issues with
handling the global order of steps across workflows, which have
hopefully all been addressed.
- Bug 14394: steps update wasn't working when workflow changed.
- Bug 15815: Unix installation path cannot contain spaces.
Temporarily fixed by aborting installation, with warning, if
spaces are in the path.