Using workspaces

Workspaces provide a guided, structured way of managing and processing your documents. Make sure that this is what you want. Workspace mode is provided by MATWorkspaceEngine on the command line, and via "File -> Open workspace..." in the Web UI. You can find a summary of the highlights about using workspaces here; this document provides the details.

The structure of the workspace directory

Workspaces are just directories. The structure of these directories looks like this:

Workspace users

One of the innovations in workspaces in MAT 2.0 is their close connection with segments and annotation progress. All documents in workspaces are now closely tracked for their annotation state (including, ultimately, annotation of subsections of documents), which includes tracking who modified the annotations in the various document portions. As a result, every document edit in workspaces is linked to a workspace user.

The inventory of users of a workspace is entirely up to its creators and managers. Every workspace must be created with at least one initial user. The names of these users are not bound to any external resource; they're not required to be the same as login names, for instance. They're merely there to provide a way of attributing document changes. There's no account management or passwords; you can "claim" to be any registered user you want to claim to be when you edit a workspace. We're assuming that you're using MAT workspaces in a cooperative environment in which this sort of inappropriate behavior won't arise.

Although there's no requirement that registered user names correspond to external resources like login names, you may find it easiest to use login names anyway, so that your workspace annotators don't have to remember a different name when they open a workspace.

Workspace users are assigned roles, which indicate what they can do within the workspace. By default, all users can annotate documents in the core folder. If workspace reconciliation is configured, workspace users are also assigned reconciliation roles. By default, each user has all reconciliation roles except "human_decision" (the ability to make an enforceable judgment about a reconciliation choice).

Workspace operations

The available operations are:

topic                             operation                   availability            folder
creation                          create                      command line            (global)
file management                   import                      command line            (global)
                                  remove                      command line            (global)
                                  assign                      command line            (global)
                                  open_file                   UI, command line debug  (global)
                                  markgold                    UI, command line debug  core
                                  unmarkgold                  UI, command line debug  core
                                  save                        UI, command line debug  core, reconciliation
inspection                        list                        UI, command line        (global)
                                  workspace_configuration     command line            (global)
                                  dump_database               command line            (global)
logging                           enable_logging              command line            (global)
                                  disable_logging             command line            (global)
                                  rerun_log                   command line            (global)
users                             register_users              command line            (global)
                                  list_users                  command line            (global)
                                  add_roles                   command line            (global)
                                  remove_roles                command line            (global)
automated tagging                 modelbuild                  command line            core
                                  autotag                     UI, command line        core
experimentation                   list_basename_sets          command line            (global)
                                  add_to_basename_set         command line            (global)
                                  remove_from_basename_set    command line            (global)
                                  run_experiment              command line            (global)
reconciliation (not yet enabled)  configure_reconciliation    command line            (global)
                                  submit_to_reconciliation    command line            core
                                  remove_from_reconciliation  command line            reconciliation
administration                    force_unlock                command line            core

There are also internal operations which are not publicly visible (release_lock, update_ui_log).

We'll review each of these operations in turn.

Creation

create

The create operation creates a workspace. It requires a task and an initial user.

This operation is available only on the command line.

File management

import

The import operation ingests documents into the workspace. The documents are all converted to MAT JSON format, and are prepared for annotation. You can optionally assign documents to users.

This operation is only available on the command line.

Historically, the import operation could target multiple folders, but in MAT 2.0, only the core folder is eligible for import.

Configuring the import operation in task.xml

In task.xml, you can specify the default process by which documents are prepared for annotation when they're imported. Here's an example:

  <workspace>
    ...
    <operation name="import">
      <settings workflow="Demo" steps="zone,tokenize"/>
    </operation>
    ...
  </workspace>

As described here, these settings can be overridden using the --workflows and --steps options described in MATWorkspaceEngine.
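
To make the shape of this fragment concrete, here is a minimal Python sketch that reads the per-operation settings out of a <workspace> element with the standard library. The helper name operation_settings is hypothetical, and this is not how MAT itself loads task.xml; it only shows where the attributes live.

```python
import xml.etree.ElementTree as ET

# A <workspace> fragment shaped like the example above (without the elisions).
FRAGMENT = """
<workspace>
  <operation name="import">
    <settings workflow="Demo" steps="zone,tokenize"/>
  </operation>
</workspace>
"""

def operation_settings(workspace_xml, operation_name):
    """Return the <settings> attributes of one operation as a dict.

    Hypothetical helper for illustration only; returns {} when the
    operation (or its <settings> child) is absent.
    """
    root = ET.fromstring(workspace_xml)
    for op in root.findall("operation"):
        if op.get("name") == operation_name:
            settings = op.find("settings")
            return dict(settings.attrib) if settings is not None else {}
    return {}

settings = operation_settings(FRAGMENT, "import")
# The steps attribute is a comma-separated list of step names.
steps = settings["steps"].split(",")
```

The same lookup applies to any other operation configured this way, such as modelbuild.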

remove

The remove operation removes all copies of the specified basenames from the workspace. Warning: this operation will remove all traces of the basenames from the workspace folders and the database. Do not use it unless you really want them removed.

This operation is only available on the command line.

assign

This operation assigns the specified basenames to the specified users. Each user gets his or her own copy of the document to annotate. If the document's annotations have already been altered by a human, the basename cannot be assigned.

This operation is only available on the command line.

open_file

This operation opens a workspace file and returns its contents. It also locks the workspace file in the workspace database. This lock is typically released when a file is closed in the UI, using the private release_lock operation. If this document is "stranded" - if, for instance, a user forgets to close the document - you can use the force_unlock operation to fix this.

This operation is available in the MAT UI, or on the command line if --debug is provided.
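
The interplay between open_file, release_lock and force_unlock can be sketched with a toy in-memory lock table. This is only a model of the behavior described above, not MAT's implementation: the class LockTable, its method signatures, and the use of random hex strings as lock IDs are all assumptions made for illustration.

```python
import uuid

class LockTable:
    """Toy stand-in for the document locks kept in the workspace database."""

    def __init__(self):
        # (folder, basename) -> (lock_id, user)
        self._locks = {}

    def open_file(self, folder, basename, user):
        """Lock the document and return a lock ID for the later release."""
        key = (folder, basename)
        if key in self._locks:
            raise RuntimeError("%s/%s is already locked" % key)
        lock_id = uuid.uuid4().hex  # assumption: any unique token would do
        self._locks[key] = (lock_id, user)
        return lock_id

    def release_lock(self, lock_id):
        """Normal path: the UI releases the lock when the file is closed."""
        for key, (held_id, _user) in list(self._locks.items()):
            if held_id == lock_id:
                del self._locks[key]
                return True
        return False

    def force_unlock(self, folder, basename):
        """Administrative path: clear a stranded lock without its lock ID."""
        return self._locks.pop((folder, basename), None) is not None
```

A stranded lock is simply an entry whose lock ID has been lost (for example, because the UI crashed), which is why only force_unlock can clear it.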

markgold

This operation marks all of the "non-gold" segments in a document "human gold".

This operation is available in the MAT UI, or indirectly on the command line via the import operation, or on the command line if --debug is provided. When used in the UI, it will trigger a save operation first if the document has unsaved changes.

unmarkgold

This operation marks all of the "human gold" or "reconciled" segments in a document "non-gold".

This operation is available in the MAT UI, or on the command line if --debug is provided. When used in the UI, it will trigger a save operation first if the document has unsaved changes.
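
The effect of markgold and unmarkgold on segment status can be summarized in a few lines of Python. This is a sketch of the transitions described above, assuming the status strings are exactly the quoted names ("non-gold", "human gold", "reconciled") and representing a document as a plain list of statuses; MAT's own data structures will differ.

```python
NON_GOLD = "non-gold"
HUMAN_GOLD = "human gold"
RECONCILED = "reconciled"

def markgold(statuses):
    """Every non-gold segment becomes human gold; other segments are unchanged."""
    return [HUMAN_GOLD if s == NON_GOLD else s for s in statuses]

def unmarkgold(statuses):
    """Every human gold or reconciled segment goes back to non-gold."""
    return [NON_GOLD if s in (HUMAN_GOLD, RECONCILED) else s for s in statuses]

# One document with three segments, one in each state.
doc = [NON_GOLD, RECONCILED, HUMAN_GOLD]
marked = markgold(doc)
unmarked = unmarkgold(doc)
```

Note that the two operations are not exact inverses: markgold leaves reconciled segments alone, while unmarkgold resets them.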

save

This operation saves the contents of a workspace file.

This operation is available in the MAT UI, or on the command line if --debug is provided.

Logging

MAT provides a rich and extensive logging infrastructure specifically for workspaces. When logging is enabled, MAT workspace operations log every action and data modification, so that the activities in the workspace can be rerun from the point that logging was enabled, exactly as they were originally performed.

Workspace logging is distinct from UI logging. The MAT UI has the capability of capturing all the user gestures, and saving these gestures to a CSV file at the user's request. If workspace logging is enabled, the UI turns on this capability specifically for the current workspace, and uploads the log fragments to the MAT server with every save operation in the "core" folder. The format of this log is identical to the format of the UI logger. Unlike general UI logging, this logging cannot be configured or controlled from the UI. Finally, this logging does not interfere with general UI logging; if you choose to enable UI logging, you'll still get all the user gestures, including those that are captured for workspace logging.

enable_logging

This operation enables logging. The log will be saved in the _checkpoint subdirectory of the workspace directory.

This operation is available on the command line.

disable_logging

This operation disables logging. If a log is being collected, by default it is moved to the first available _checkpoint_<n> path. However, the user can force the log to be disabled if she chooses. In either case, this ensures that _checkpoint never contains a discontinuous log.

This operation is available on the command line.
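
The "first available _checkpoint_<n> path" rule can be illustrated with a short Python sketch. The function names are hypothetical, and the numbering is assumed to start at 1, which the text above does not state; the point is only that an existing log is moved aside rather than appended to.

```python
import os
import tempfile

def first_available_checkpoint(workspace_dir, start=1):
    """Return the first _checkpoint_<n> path that does not exist yet.

    Assumption: numbering starts at `start` (1 here).
    """
    n = start
    while True:
        candidate = os.path.join(workspace_dir, "_checkpoint_%d" % n)
        if not os.path.exists(candidate):
            return candidate
        n += 1

def archive_log(workspace_dir):
    """Move _checkpoint aside so it never holds a discontinuous log."""
    current = os.path.join(workspace_dir, "_checkpoint")
    if not os.path.isdir(current):
        return None  # no log is being collected
    target = first_available_checkpoint(workspace_dir)
    os.rename(current, target)
    return target

# Demonstration in a throwaway directory, not a real workspace.
with tempfile.TemporaryDirectory() as ws:
    os.mkdir(os.path.join(ws, "_checkpoint"))
    os.mkdir(os.path.join(ws, "_checkpoint_1"))
    archived = os.path.basename(archive_log(ws))
    still_has_checkpoint = os.path.exists(os.path.join(ws, "_checkpoint"))
```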

rerun_log

This operation allows you to rerun the log. It will use the _checkpoint/_rerun subdirectory of the workspace directory to store the rerun state. You can use this capability to recreate any intermediate state of your workspace, e.g., for experiment analysis.

This operation is available on the command line.

Inspection

list

This operation shows you the contents of the folders in the workspace. The listing shows you the status of the document, as well as who it's assigned to.

It is available both on the command line, and in the MAT UI as part of the workspace interface.

workspace_configuration

This operation describes a number of properties of the workspace. Most of these properties are capabilities of MAT which are currently in development, but not yet publicly released. We've included the infrastructure for supporting these emerging capabilities in order to ensure that users of MAT will not have to update their workspaces when these capabilities are released. The properties reported are:

dump_database

This operation describes all the tables in the workspace database. It is a useful debugging tool for the technically inclined.

This operation is only available on the command line.

Users

Workspace users have roles which say what they can do in the workspace, but unless workspace reconciliation is enabled, users have only one available role, "core_annotation", which means the user is eligible to perform annotation. If reconciliation is enabled, each reconciliation phase is also recognized as a role. The role "all" is a shorthand for all available roles.

You can explicitly specify user roles when you register the users, or afterward. You may want to vary the available roles for annotators because, e.g., you may want only some of them to participate in particular reconciliation phases; say, you might want only some annotators to be able to perform the decisive human_decision reconciliation step.
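
The role rules above (one core role, one role per reconciliation phase, "all" as a shorthand, and human_decision withheld by default) can be modeled as follows. This is a sketch under stated assumptions: the function names are hypothetical, and the list of reconciliation phases is passed in by the caller rather than taken from MAT.

```python
def available_roles(reconciliation_phases=()):
    """core_annotation plus one role per enabled reconciliation phase."""
    return ["core_annotation"] + list(reconciliation_phases)

def expand_roles(requested, reconciliation_phases=()):
    """Resolve a list of requested roles, treating "all" as every available role."""
    available = available_roles(reconciliation_phases)
    if "all" in requested:
        return available
    unknown = [r for r in requested if r not in available]
    if unknown:
        raise ValueError("unknown roles: " + ", ".join(unknown))
    return [r for r in available if r in requested]

def default_roles(reconciliation_phases=()):
    """Default assignment: everything except the decisive human_decision role."""
    return [r for r in available_roles(reconciliation_phases)
            if r != "human_decision"]

# Two phase names that appear in this document, used here as sample input.
phases = ["crossvalidation_challenge", "human_decision"]
everything = expand_roles(["all"], phases)
defaults = default_roles(phases)
```

Without reconciliation configured, "all" and "core_annotation" resolve to the same single role.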

register_users

This operation allows you to add registered users to your workspace. Perhaps you want to be able to track the contributions of multiple annotators, or you might want to actually assign documents to multiple annotators and do multiple annotation. You may also want to assign roles to your users. You cannot unregister users once they're registered, although you can remove all their roles.

This operation is only available on the command line.

list_users

This operation lists the users in a workspace. It is also available as part of the workspace_configuration operation.

This operation is only available on the command line.

add_roles

The add_roles operation adds roles to existing users.

This operation is only available on the command line.

remove_roles

The remove_roles operation removes roles from existing users.

This operation is only available on the command line.

Automated tagging

modelbuild

This operation builds a model which can be used to autotag other documents. Every document segment in the workspace which has been touched by a human annotator is used to build this model. If there are multiple copies of a document because the document is multiply assigned, all copies will be used (so that document will be overrepresented in the model, and all conflicting annotations will be used as well). You can optionally ask the workspace to autotag documents after the model is built.

Note: the workspace model is completely distinct from the default task model.

This operation is only available on the command line.

Configuring the modelbuild operation in task.xml

If you want to customize your modelbuild operation, e.g., restrict it to just the gold segments, you can do so in task.xml. You can use any setting that's available to the training engine.

  <workspace>
    ...
    <operation name="modelbuild">
      <settings partial_training_on_gold_only="yes"/>
    </operation>
    ...
  </workspace>

autotag

This operation automatically tags documents using the current workspace model. You can specify individual basenames to tag, or tag all documents. The tagging engine will only tag those document segments which have not yet been touched by a human annotator. Existing (machine-generated) annotations in those segments will be discarded and new ones added.

Note: this operation does not use the Carafe tagging server, even in the UI. So the startup cost of the tagging engine is incurred each time the autotag operation is executed. This operation also does not use the default task model, ever; it only uses models constructed using the modelbuild operation.

This operation is available in the MAT UI (for individual documents) and on the command line. When used in the UI, it will trigger a save operation first if the document has unsaved changes.
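
The division of labor between modelbuild and autotag comes down to which segments each one looks at. The sketch below models that split in Python; the dictionary layout and the touched_by_human flag are hypothetical stand-ins, since this document does not describe how MAT records human involvement in a segment.

```python
def training_segments(documents):
    """Segments modelbuild would learn from: every human-touched segment,
    taken from every copy of every document (so multiply assigned
    documents contribute once per copy)."""
    return [seg
            for doc in documents
            for seg in doc["segments"]
            if seg["touched_by_human"]]

def autotag_targets(document):
    """Segments autotag would retag: only those no human has touched."""
    return [seg for seg in document["segments"] if not seg["touched_by_human"]]

# Two copies of the same basename, assigned to different users.
copy_a = {"basename": "doc1", "segments": [
    {"id": 0, "touched_by_human": True},
    {"id": 1, "touched_by_human": False},
]}
copy_b = {"basename": "doc1", "segments": [
    {"id": 0, "touched_by_human": True},
    {"id": 1, "touched_by_human": True},
]}

train = training_segments([copy_a, copy_b])
retag = autotag_targets(copy_a)
```

Here segment 0 of doc1 appears twice in the training data, which is the overrepresentation mentioned under modelbuild.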

Experimentation

We can establish basename sets which we can reference when we run experiments.

list_basename_sets

This operation lists the basename sets and their contents. This operation is only available on the command line.

add_to_basename_set

This operation adds basenames to a given basename set (and implicitly creates the set if necessary). This operation is only available on the command line.

remove_from_basename_set

This operation removes basenames from a given basename set (and implicitly removes the set if necessary). This operation is only available on the command line.
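
The implicit creation and removal of basename sets can be modeled with a small Python class. This is an illustrative sketch, not MAT code: the class and method names are hypothetical, and it assumes that "removes the set if necessary" means the set disappears once it becomes empty.

```python
class BasenameSets:
    """Toy model of named basename sets with implicit create and remove."""

    def __init__(self):
        self._sets = {}  # set name -> set of basenames

    def add(self, set_name, basenames):
        # The set is created on first use.
        self._sets.setdefault(set_name, set()).update(basenames)

    def remove(self, set_name, basenames):
        members = self._sets.get(set_name)
        if members is None:
            return
        members.difference_update(basenames)
        # Assumption: an empty set is dropped rather than kept around.
        if not members:
            del self._sets[set_name]

    def listing(self):
        """Set names and their sorted contents, as list_basename_sets might show."""
        return {name: sorted(members)
                for name, members in sorted(self._sets.items())}

sets = BasenameSets()
sets.add("test", ["basename1", "basename2"])
sets.remove("test", ["basename1"])
after_partial_remove = sets.listing()
sets.remove("test", ["basename2"])
after_full_remove = sets.listing()
```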

run_experiment

This operation allows you to run an experiment based on this workspace, either using an experiment file or by specifying the properties of the test set in terms of properties of the workspace basenames.

Reconciliation

Each MAT workspace has the ability to support reconciliation, which is the process by which the consistency of annotations is checked, and conflicts are possibly resolved. This process is not yet available due to UI limitations, but will be in our next release. As part of this process, you'll be able to perform cross-validation of the input documents, to help identify inconsistencies in human annotation.

You can submit any document to reconciliation at any point in the annotation process (as long as it isn't being annotated by someone). You must configure reconciliation before you submit any documents.

configure_reconciliation

Use this operation to establish the active reconciliation phases for your workspace.

This operation is only available on the command line.

submit_to_reconciliation

Use this operation to submit documents for reconciliation. The phases that are assigned to the documents will be the phases provided in the most recent configure_reconciliation operation.

This operation is only available on the command line.

remove_from_reconciliation

If, for some reason, a document fails to exit reconciliation naturally (if some of the users fail to complete their reconciliation steps, for example), you can use this operation to remove the document forcibly from reconciliation. You have the option of discarding the reconciliation decisions that were made.

This operation is only available on the command line.

Administration

force_unlock

This operation forces a basename in the named folder to be unlocked. Warning: be very certain that you apply the force_unlock operation only to basenames whose locks have been stranded. If you unlock a basename which is being annotated, the annotator will not be able to save her changes.

This operation is only available on the command line.

Workspace security

Unlike file mode, workspace mode is stateful from the point of view of the UI. It is the server, rather than the client, which loads and saves the files. However, we don't want just anybody to be able to cause the server to perform these stateful operations, so the MAT web server implements some security mechanisms.

Note, however, that the MAT workspace functionality is not an enterprise-secure implementation, and will never be one. It does not use SSL; it does not perform any sort of user authentication beyond the workspace key; it does not provide any security logging or traceability; and it does not currently implement transactions. You should assume that anyone who has access to your network can see your workspace traffic, and overwrite your data.

Note that workspace users play no role in workspace security.

Workspace locking

Workspaces maintain an internal lock to ensure that any operations which change the state of the workspace are exclusive. This locking mechanism is quite simple - it relies on the presence or absence of the "opLockfile" file. If something goes horribly wrong, it's possible that the workspace may get into a stranded state, where it fails to remove "opLockfile" at the end of the operation. If you're getting a notification that the workspace is in use, and you're sure it's not, you can remove the file by hand. As an added bonus, the file contents will tell you what operation was being performed by which user, and what time the lock was established.
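
Before removing the lock file by hand, it can be worth reading it. The following Python sketch checks for the file and returns its text; it assumes that "opLockfile" sits at the top level of the workspace directory and is readable as text, neither of which is spelled out above. The helper name is hypothetical, and the demonstration uses a throwaway directory with placeholder contents, not a real lock file.

```python
import os
import tempfile

def read_workspace_lock(workspace_dir, lock_name="opLockfile"):
    """Return the lock file's text if the workspace is locked, else None.

    Assumption: the lock file lives directly inside workspace_dir.
    """
    path = os.path.join(workspace_dir, lock_name)
    if not os.path.isfile(path):
        return None
    with open(path, "r", errors="replace") as fp:
        return fp.read()

# Demonstration against a temporary directory.
with tempfile.TemporaryDirectory() as ws:
    before = read_workspace_lock(ws)
    with open(os.path.join(ws, "opLockfile"), "w") as fp:
        fp.write("placeholder lock contents")  # not the real file format
    after = read_workspace_lock(ws)
```

The sketch deliberately does not delete anything; removing the file remains a manual decision once you are sure no operation is running.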

Troubleshooting

Failed import

You may realize, once you've completed an import operation, that you didn't import the basenames the way you'd wanted; perhaps you'd intended to strip a suffix, or you assigned them to the wrong workspace user. You can use the remove operation to remove the basenames from the workspace in preparation for re-importing. Warning: this operation will remove all traces of the basenames from the workspace folders and the database. Do not use it unless you really want them removed.

% $MAT_PKG_HOME/bin/MATWorkspaceEngine <dir> remove basename1...

If you're not sure what basenames are available, the --help option will list them:

% $MAT_PKG_HOME/bin/MATWorkspaceEngine <dir> remove --help

More on the remove operation here.

Locked files

The workspaces do not permit documents to be edited by more than one annotator at a time. The workspaces achieve this exclusivity through the use of file locks, which are recorded in the workspace database. When an annotator opens a document for annotation, the annotation UI is given a lock ID which it can use to release the document when the editing session is over. In some circumstances, unfortunately, the document is not unlocked; for instance, if the UI encounters an unexpected error and crashes before unlocking the document. You can use the force_unlock operation to clear this lock from the database.

% $MAT_PKG_HOME/bin/MATWorkspaceEngine <dir> force_unlock --user user1 core basename1

If you just want to unlock everything, don't specify any basenames. If you want to know what's locked, use the dump_database operation:

% $MAT_PKG_HOME/bin/MATWorkspaceEngine <dir> dump_database

This will show you the content of the workspace database tables.

Warning: be very certain that you apply the force_unlock operation only to basenames whose locks have been stranded. If you unlock a basename which is being annotated, the annotator will not be able to save her changes.

More on force_unlock here.

Error "workspace is currently unavailable (processing another request)"

If you get this error message, and you're absolutely certain that no one else is working on the workspace, something horrible has happened: a previous operation has failed in such a way that the "opLockfile" file was never removed. More on how to deal with this here.

Advanced topic: workspace reconciliation

Each MAT workspace has the ability to support reconciliation, which is the process by which the consistency of annotations is checked, and conflicts are possibly resolved. This facility is not yet available due to UI limitations, but it will be in our next release. This section describes the behavior that we plan to make available.

You can submit any document to reconciliation at any point in the annotation process (as long as it isn't being annotated by someone). You'll use the submit_to_reconciliation operation to submit the documents. This operation will lock the basenames in the core folder (so no one can open those documents for annotation) and prepare a document, called a reconciliation document, which contains all the annotations in all the documents that correspond to that basename, sorted into "votes" which indicate, for each segment of the document in conflict, which annotator produced which pattern of annotations. The workspace annotators will then follow the reconciliation steps which are configured at the time that the documents are submitted for reconciliation.
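
The idea of sorting annotations into "votes" can be sketched as grouping annotators by the annotation pattern each one produced for a segment. The Python below is a hypothetical model, not the reconciliation document format: patterns are represented as lists of (start, end, label) tuples, and all names and sample values are invented for illustration.

```python
from collections import defaultdict

def collect_votes(patterns_by_annotator):
    """Group annotators by the annotation pattern they produced for one segment.

    Returns a dict mapping each distinct pattern (as a tuple) to the list
    of annotators who produced it.
    """
    votes = defaultdict(list)
    for annotator, pattern in patterns_by_annotator.items():
        votes[tuple(pattern)].append(annotator)
    return dict(votes)

def in_conflict(votes):
    """A segment is in conflict when more than one distinct pattern exists."""
    return len(votes) > 1

# Hypothetical segment annotated by three users; two of them agree.
segment = {
    "user1": [(0, 5, "PERSON")],
    "user2": [(0, 5, "PERSON")],
    "user3": [(0, 5, "ORGANIZATION")],
}
votes = collect_votes(segment)
conflict = in_conflict(votes)
```

Annotators who agree end up behind the same vote, so a reviewer only has to choose among distinct patterns rather than among individual documents.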

Reconciliation steps

There are three possible reconciliation steps currently supported. You can set up your workspace to use any or all of these reconciliation steps, and you can change what steps are enabled for future submissions at any given time. The available steps, in order, are:

The system will advance documents through these steps automatically if possible (so, for instance, if an annotator makes a choice during the crossvalidation_challenge step, and no annotator adds any new annotation patterns, the system assumes that the annotator's vote will not change). Once all segments have been marked as reconciled, or the document has passed through all assigned annotators and steps, it exits reconciliation, and the agreed-upon changes are folded back into the documents in the core annotation folder which were submitted to reconciliation. So if the same document is assigned to two annotators, and it passes through reconciliation and the conflicts are resolved, those assigned documents will be altered to reflect the reconciliations.

SEGMENTs in reconciliation

The use of the SEGMENTs in reconciliation differs slightly from its use in core annotation, especially with respect to the value of its "status" attribute. The three significant "status" attribute values in reconciliation are:

There's also additional administrative information on the segment that records the state of the reconciliation.

Stranded reconciliation documents

If you submit a document to reconciliation, it may remain in reconciliation because, e.g., an annotator who was registered with one of the relevant roles is no longer working on the project. Or you may have submitted it to reconciliation in error. You can use the remove_from_reconciliation operation to remove the document.

Keep in mind that the document may already be partially reconciled. If you want to remove the document and preserve the decisions already made, you can use the operation as follows:

% $MAT_PKG_HOME/bin/MATWorkspaceEngine <dir> remove_from_reconciliation reconciliation basename1

This will migrate the agreed-upon document segments back into the documents which were used to create the reconciliation document. If you do not want to preserve those decisions, and simply want to stop the document from being reconciled, do this instead:

% $MAT_PKG_HOME/bin/MATWorkspaceEngine <dir> remove_from_reconciliation --dont_reintegrate reconciliation basename1

More on this operation here.

Advanced topic: the workspace database

The workspace database is an SQLite database which tracks the status of documents, users, and the workspace itself. The schema can be found in MAT_PKG_HOME/lib/mat/python/MAT/ws_db.sql. The tables are:

There are other tables and columns which relate to workspace features we have yet to enable. We will document those features of the database as the corresponding workspace features are enabled.
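
Because the database is plain SQLite, its tables can also be inspected with Python's standard sqlite3 module. The sketch below lists each table and its column names; the helper name is hypothetical, and since the database file's name and location inside the workspace are not given here, the path is left as a parameter. The demonstration builds a throwaway database with an invented table rather than touching a real workspace, and it is safest to run this kind of inspection on a copy.

```python
import os
import sqlite3
import tempfile

def list_tables(db_path):
    """Return {table name: [column names]} for an SQLite database file."""
    conn = sqlite3.connect(db_path)
    try:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        return {name: [col[1] for col in
                       conn.execute('PRAGMA table_info("%s")' % name)]
                for name in names}
    finally:
        conn.close()

# Demonstration with a temporary database and an invented table.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.db")
    conn = sqlite3.connect(demo_path)
    conn.execute("CREATE TABLE example_table (basename TEXT, locked_by TEXT)")
    conn.commit()
    conn.close()
    tables = list_tables(demo_path)
```

For routine use, the dump_database operation remains the supported way to see the table contents.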