Files and workspaces

You can work with documents in MAT either in file mode or in workspace mode. In this section, we describe each mode and the differences between the two.

File mode

In file mode, you work with documents on an individual basis. MAT doesn't care where they're loaded from, or where they're saved to. If they're in MAT's rich standoff annotation format, they'll know what steps have already been applied to them, but other than that, the user must specify all the other parameters of any file mode operation:

File mode is provided by MATEngine on the command line, and via "File -> Open file..." in the Web UI.

From the point of view of the UI, file mode in the Web server is stateless. Files are loaded from the client, and saved to the client, and the Web server has no access to the file system to load and save the files.

Workspace mode

A workspace is a directory that contains a set of predefined subdirectories for storing documents. We call these subdirectories folders. Each folder has a set of operations that you can perform on documents in that folder; these operations may create versions of the file in other folders, or move the file to another folder. Unlike file mode, the way you interact with a workspace is almost entirely defined for you.

Workspace mode is provided by MATWorkspaceEngine on the command line, and via "File -> Open workspace..." in the Web UI. Unlike file mode, workspace mode is stateful from the point of view of the UI. It is the server, rather than the client, which loads and saves the files. However, we don't want just anybody to be able to cause the server to perform these stateful operations, so the MAT web server implements some security mechanisms.

Note, however, that the MAT workspace functionality is not an enterprise-secure implementation, and will never be one. It does not use SSL; it does not perform any sort of user authentication beyond the workspace key; it does not provide any security logging or traceability; and it does not currently implement transactions. You should assume that anyone who has access to your network can see your workspace traffic, and overwrite your data.

Workspace locking

Workspaces maintain an internal lock to ensure that any operations which change the state of the workspace are exclusive. This locking mechanism is quite simple: it relies on the presence or absence of the "opLockfile" file. If something goes horribly wrong, it's possible that the workspace may end up in a stranded state, where it fails to remove "opLockfile" at the end of the operation. If you're getting a notification that the workspace is in use, and you're sure it's not, you can remove the file by hand. As an added bonus, the file contents will tell you what operation was being performed, by which user, and at what time the lock was established.
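To make the mechanism concrete, here is a minimal Python sketch of a presence-based lock of the kind described above. It is not MAT's implementation. The function names, the assumption that "opLockfile" sits at the top level of the workspace directory, and the text written into the file are all illustrative assumptions.

```python
import os
import time

LOCK_NAME = "opLockfile"  # file name from the text; its location is assumed


def acquire_lock(workspace_dir, operation, user):
    """Create the lock file atomically; return False if it already exists."""
    lock_path = os.path.join(workspace_dir, LOCK_NAME)
    try:
        # O_EXCL makes creation fail if the file is already present.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        # Hypothetical contents: the real file format is not described here.
        f.write("%s %s %s\n" % (operation, user, time.ctime()))
    return True


def clear_stale_lock(workspace_dir):
    """Return the lock file's contents and remove it; None if no lock exists."""
    lock_path = os.path.join(workspace_dir, LOCK_NAME)
    if not os.path.exists(lock_path):
        return None
    with open(lock_path) as f:
        contents = f.read()
    os.remove(lock_path)
    return contents
```

A helper like clear_stale_lock should only be used once you're sure no operation is actually running; reading the contents first tells you what was holding the lock.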

The structure of the workspace directory

As we said above, workspaces are just directories. The structure of these directories looks like this:

With this background, let's see how you can use workspaces. Tutorial 6 presents examples of most of the steps below, and more examples can be found in the documentation for MATWorkspaceEngine.

Step 1: create the workspace

First, you create the workspace. The workspace must have an assigned task, which you specify when you create it. Creating the workspace creates the directory, the folder subdirectories, a place to store the models, and some administrative information.

Workspace creation is currently only available on the command line.

Step 2: import documents

Next, you import documents into the workspace. You'll import documents into any one of a number of predefined folders:

There are other predefined folders (e.g., "raw, processed" contains raw versions of documents which have already been processed), but these are the only folders you can import documents into. Your task may also define additional folders.

When a document is imported, it is assigned a unique basename, which is usually the basename of the path of the imported file (i.e., the final path component). All versions of this file in the various workspace folders have the identical basename.
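As a small illustration of what "the final path component" means, the Python sketch below derives a basename from a path. The helper name and the example paths are hypothetical. Note that two files imported from different directories can yield the same basename, which is presumably why the basename is only "usually" taken from the path; how MAT resolves such a clash is not covered here.

```python
import os


def import_basename(path):
    """Return the final path component, the usual basename for an import."""
    return os.path.basename(os.path.normpath(path))


print(import_basename("/corpus/batch1/report01.txt"))  # report01.txt
print(import_basename("/corpus/batch2/report01.txt"))  # report01.txt (same basename)
```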

You can import documents as many times as you like, and at any point while you work with your workspace. For instance, you can import some documents, hand annotate them, build a model, and then import more raw documents to autotag.

File import is currently only available on the command line.

Step 3: perform operations on documents

The vast majority of your time in the workspace will be spent interacting with your documents. Each folder has predefined operations which you can perform on documents in the folder.

Folder: raw, unprocessed

  Operation: autotag
  Availability: UI, command line
  Description: Automatically tag documents with the current model. Deposit the results in the "autotagged" folder. If no specific basenames are specified, all eligible documents are autotagged, including those which have already been autotagged and those in the "rich, incoming" directory. Already autotagged documents will be unwound according to the engine settings for the autotag operation in the task.xml file.
  Note: this operation does not use the Carafe server, even in the UI. So the startup cost is incurred each time the autotag step is executed.

  Operation: tagprep
  Availability: UI, command line
  Description: Prepare the documents for hand tagging. Deposit the results in the "in process" folder.

Folder: rich, incoming

  Operation: autotag
  Availability: UI, command line
  Description: Automatically tag documents with the current model. Deposit the results in the "autotagged" folder. If no specific basenames are specified, all eligible documents are autotagged, including those which have already been autotagged and those in the "raw, unprocessed" directory. Already autotagged documents will be unwound according to the engine settings for the autotag operation in the task.xml file.
  Note: this operation does not use the Carafe server, even in the UI. So the startup cost is incurred each time the autotag step is executed.

  Operation: tagprep
  Availability: UI, command line
  Description: Prepare the documents for hand tagging. Deposit the results in the "in process" folder.

Folder: in process

  Operation: markcompleted
  Availability: UI, command line
  Description: Move the documents into the "completed" folder. In the UI, save the document if hand tagging has been done.

  Operation: save
  Availability: UI
  Description: Save the current hand tagging.
  Flags:
    mark_completed: if present and the value is "yes", the markcompleted operation will be applied immediately after the save.

Folder: completed

  Operation: modelbuild
  Availability: command line
  Description: Create a model based on the specified files in the folder (all of them, by default). Optionally, perform the autotag step on other documents after the model is built.
  Flags:
    do_autotag: if present and the value is "yes", the autotag operation will be applied in the "raw, unprocessed" folder immediately afterward.
    autotag_basenames: if do_autotag is specified, a space-separated sequence of basenames which are in "raw, unprocessed" to autotag, rather than the entire contents of the "raw, unprocessed" folder.
    autotag_basename: if do_autotag is specified, a basename which is in "raw, unprocessed" to autotag, rather than the entire contents of the "raw, unprocessed" folder.

  Operation: markincomplete
  Availability: UI, command line
  Description: Move the documents into the "in process" folder.

Folder: autotagged

  Operation: handcorrect
  Availability: UI, command line
  Description: Move the documents into the "in process" folder.

On the command line, these operations are applied by default to all the files in the folder, and optionally to a specified subset. In the UI, on the other hand, these operations are only available on a file-by-file basis. We haven't yet tackled managing the more time-consuming folder-level operations in the UI.
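The difference between the two interfaces can be sketched in Python as follows. This is an illustrative model, not MAT code: the OPERATIONS table restates the folder operations listed in this section, and select_targets, its parameters, and its error messages are hypothetical names chosen for the example. The result folders recorded for "save" and "modelbuild" are an assumption (the documents are taken to stay where they are).

```python
# (folder, operation) -> (interfaces offering it, folder holding the result)
OPERATIONS = {
    ("raw, unprocessed", "autotag"): ({"UI", "command line"}, "autotagged"),
    ("raw, unprocessed", "tagprep"): ({"UI", "command line"}, "in process"),
    ("rich, incoming", "autotag"): ({"UI", "command line"}, "autotagged"),
    ("rich, incoming", "tagprep"): ({"UI", "command line"}, "in process"),
    ("in process", "markcompleted"): ({"UI", "command line"}, "completed"),
    ("in process", "save"): ({"UI"}, "in process"),              # assumed: no move
    ("completed", "modelbuild"): ({"command line"}, "completed"),  # assumed: no move
    ("completed", "markincomplete"): ({"UI", "command line"}, "in process"),
    ("autotagged", "handcorrect"): ({"UI", "command line"}, "in process"),
}


def select_targets(folder, operation, interface, folder_contents, basenames=None):
    """Return the basenames an operation would apply to."""
    interfaces, _destination = OPERATIONS[(folder, operation)]
    if interface not in interfaces:
        raise ValueError("%s is not available in the %s" % (operation, interface))
    if interface == "UI":
        # The UI works file by file, so exactly one basename is required.
        if basenames is None or len(basenames) != 1:
            raise ValueError("the UI applies operations to one file at a time")
    if basenames is None:
        # Command-line default: every file in the folder.
        return sorted(folder_contents)
    missing = set(basenames) - set(folder_contents)
    if missing:
        raise ValueError("not in folder: %s" % ", ".join(sorted(missing)))
    return list(basenames)
```

For example, calling select_targets("completed", "modelbuild", "command line", docs) with no basenames selects every document in the folder, while the same call with "UI" as the interface raises an error, because modelbuild is only available on the command line.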

A typical interaction

Because interacting with the workspace means switching between longer-duration batch operations (e.g., model building) and quicker file-level operations (e.g., hand tagging), the user will end up moving back and forth between the UI and the terminal. This is currently unavoidable. Here's what a typical interaction might look like.

  1. Command line: Create a workspace
  2. Command line: Import a batch of documents
  3. Command line: Prepare the documents for hand tagging
  4. UI: Hand annotate some documents
  5. Command line: Build a model and autotag the raw documents you haven't hand annotated
  6. Command line: Make the autotagged documents available for hand correction

(Alternatively, steps 3 and 6 can happen, per document, in the UI.) Steps 5 and 6 can be repeated with newly imported documents, so you can iteratively expand the model and your supply of hand-corrected documents.
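The flow of documents through the folders during such an interaction can be traced with a small Python simulation. This is a deliberately simplified sketch, not a use of the MAT tools: it tracks a single current folder per document (ignoring the versions that operations may leave in other folders), it does not model the model building itself, and it assumes that only two of the four documents are prepared for hand tagging in step 3. The document names and the apply_all helper are hypothetical.

```python
# Folder a document ends up in after each operation used in the walkthrough.
NEXT_FOLDER = {
    ("raw, unprocessed", "tagprep"): "in process",
    ("raw, unprocessed", "autotag"): "autotagged",
    ("in process", "markcompleted"): "completed",
    ("autotagged", "handcorrect"): "in process",
}


def apply_all(workspace, folder, operation, basenames=None):
    """Move documents (all in the folder by default) to the result folder."""
    destination = NEXT_FOLDER[(folder, operation)]
    for name, current in workspace.items():
        if current == folder and (basenames is None or name in basenames):
            workspace[name] = destination


# Steps 1-2: a new workspace with four imported raw documents.
workspace = {name: "raw, unprocessed" for name in ["d1", "d2", "d3", "d4"]}
# Step 3: prepare two documents for hand tagging (a subset, by assumption).
apply_all(workspace, "raw, unprocessed", "tagprep", ["d1", "d2"])
# Step 4: hand annotate them in the UI, then mark them completed.
apply_all(workspace, "in process", "markcompleted")
# Step 5: build a model (not modelled here) and autotag the remaining raw documents.
apply_all(workspace, "raw, unprocessed", "autotag")
# Step 6: make the autotagged documents available for hand correction.
apply_all(workspace, "autotagged", "handcorrect")
print(workspace)
```

At the end of the run, the two hand-annotated documents are in "completed" and the two autotagged documents are in "in process", ready for hand correction.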

Comparing the two modes

File mode requires more of the user at each step, but is also significantly more flexible than workspace mode. Workspace mode, on the other hand, provides considerably more structured support and bookkeeping for the user, at the sacrifice of flexibility. For instance:

It's important to stress that file mode and workspace mode cannot be freely mixed. You can invoke the file mode engine on a file in a workspace, but you'll likely make a mess of things if you save it back to the workspace. Similarly, you can't invoke the workspace engine on any file that hasn't been imported into it. You can, for instance, process some documents in file mode, and then import them into the workspace, but you can make a mess of things by importing them into the wrong folder in the workspace. Ideally, you'll load raw documents into the "raw, unprocessed" folder in the workspace and do all your operations on those documents starting from there.
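If you script file mode operations yourself, one way to avoid accidentally saving output into a workspace is to check the output path first. The Python sketch below shows such a guard under stated assumptions: is_inside_workspace is a hypothetical helper, not part of MAT, and it only compares paths; it knows nothing about the workspace's internal structure.

```python
import os


def is_inside_workspace(path, workspace_dir):
    """Return True if path lies within workspace_dir (after resolving symlinks)."""
    workspace = os.path.realpath(workspace_dir)
    target = os.path.realpath(path)
    try:
        return os.path.commonpath([workspace, target]) == workspace
    except ValueError:
        # Raised, for example, when the paths are on different Windows drives.
        return False
```

A script could call this on its intended output path and refuse to write when the result is True.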