You can work with documents in MAT either in file mode or in workspace mode. In this
section, we describe each mode and the differences between the
two.
In file mode, you work with documents on an individual basis. MAT
doesn't care where they're loaded from, or where they're saved to.
If they're in MAT's rich standoff annotation format, they'll know
what steps have already been applied to them, but other than that,
the user must specify all the other parameters of any file mode
operation:
File mode is provided by MATEngine
on the command line, and via "File -> Open file..." in the Web
UI.
From the point of view of the UI, file mode in the Web server is
stateless. Files are
loaded from the client, and saved to the client, and the Web
server has no access to the file system to load and save the
files.
A workspace is a directory that contains a set of predefined
subdirectories for storing documents. We call these subdirectories folders. Each folder has a set
of operations that you can perform on documents in that folder;
these operations may create versions of the file in other folders,
or move the file to another folder as a result of the operation.
Unlike file mode, the way you interact with a workspace is almost
entirely defined for you.
Workspace mode is provided by MATWorkspaceEngine on the command
line, and via "File -> Open workspace..." in the Web UI. Unlike
file mode, workspace mode is stateful
from the point of view of the UI. It is the server, rather than
the client, which loads and saves the files. However, we don't
want just anybody to be able to cause the server to perform these
stateful operations, so the MAT web server
implements some security
mechanisms.
Note, however, that the MAT workspace functionality is not an enterprise-secure
implementation, and will never be one. It does not use
SSL; it does not perform any sort of user authentication beyond
the workspace key; it does not provide any security logging or
traceability; and it does not currently implement transactions.
You should assume that anyone who has access to your network can
see your workspace traffic, and overwrite your data.
Workspaces maintain an internal lock to ensure that any
operations which change the state of the workspace are exclusive.
This locking mechanism is quite simple - it relies on the presence
or absence of the "opLockfile" file. If something goes badly
wrong, the workspace can be left in a
stranded state, where "opLockfile" was never removed at the end
of the operation. If you're getting a notification that the
workspace is in use, and you're sure it's not, you can remove the
file by hand. As an added bonus, the file contents will tell you
what operation was being performed by which user, and what time
the lock was established.
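The manual recovery described above can be sketched in a few lines of Python. This is only an illustration, not part of MAT: the helper names `read_stale_lock` and `clear_stale_lock` are hypothetical, the sketch assumes "opLockfile" sits directly in the workspace directory, and it treats the file contents as opaque text because their exact layout is not described here.

```python
from pathlib import Path
from typing import Optional

LOCK_NAME = "opLockfile"  # file name as given in the text


def read_stale_lock(workspace_dir: str) -> Optional[str]:
    """Return the lock file's contents, or None if no lock file exists."""
    # Assumption: the lock file lives at the top level of the workspace.
    lock_path = Path(workspace_dir) / LOCK_NAME
    if not lock_path.is_file():
        return None
    # The contents are returned as-is; no particular format is assumed.
    return lock_path.read_text(errors="replace")


def clear_stale_lock(workspace_dir: str) -> bool:
    """Remove the lock file. Return True if a file was actually removed."""
    lock_path = Path(workspace_dir) / LOCK_NAME
    try:
        lock_path.unlink()
        return True
    except FileNotFoundError:
        return False
```

Reading the file before removing it keeps the record of which operation, user, and time were involved. Only call `clear_stale_lock` once you are sure no operation is still running in the workspace.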
As we said above, workspaces are just directories. The structure
of these directories looks like this:
With this background, let's see how you can use workspaces. Tutorial 6 presents examples of most
of the steps below, and more examples can be found in the
documentation for MATWorkspaceEngine.
First, you create the workspace. The workspace must have an
assigned task, which you specify when you create it. Creating the
workspace creates the directory, the folder subdirectories, a
place to store the models, and some administrative information.
Workspace creation is currently only available on the command
line.
Next, you import documents into the workspace. You'll import
documents into any one of a number of predefined folders:
You can import documents as many times as you like, and at any point
while you work with your workspace. For instance, you can import
some documents, hand annotate them, and then build a model, and
then import more raw documents to autotag.
File import is currently only available on the command line.
The vast majority of your time in the workspace will be spent
interacting with your documents. Each folder has predefined
operations which you can perform on documents in the folder.
| folder | operation | availability | description | flag | value |
|---|---|---|---|---|---|
| raw, unprocessed | autotag | UI, command line | Automatically tag documents with the current model. Deposit the results in the "autotagged" folder. If no specific basenames are specified, all eligible documents are autotagged, including those which have already been autotagged and those in the "rich, incoming" directory. Already autotagged documents will be unwound according to the engine settings for the autotag operation in the task.xml file. Note: this operation does not use the Carafe server, even in the UI. So the startup cost is incurred each time the autotag step is executed. | | |
| | tagprep | UI, command line | Prepare the documents for hand tagging. Deposit the results in the "in process" folder. | | |
| rich, incoming | autotag | UI, command line | Automatically tag documents with the current model. Deposit the results in the "autotagged" folder. If no specific basenames are specified, all eligible documents are autotagged, including those which have already been autotagged and those in the "raw, unprocessed" directory. Already autotagged documents will be unwound according to the engine settings for the autotag operation in the task.xml file. Note: this operation does not use the Carafe server, even in the UI. So the startup cost is incurred each time the autotag step is executed. | | |
| | tagprep | UI, command line | Prepare the documents for hand tagging. Deposit the results in the "in process" folder. | | |
| in process | markcompleted | UI, command line | Move the documents into the "completed" folder. In the UI, save the document if hand tagging has been done. | | |
| | save | UI | Save the current hand tagging. | mark_completed | if present and the value is "yes", the markcompleted operation will be applied immediately after the save. |
| completed | modelbuild | command line | Create a model based on the specified files in the folder (all of them, by default). Optionally, perform the autotag step on other documents after the model is built. | do_autotag | if present and the value is "yes", the autotag operation will be applied in the "raw, unprocessed" folder immediately afterward. |
| | | | | autotag_basenames | if do_autotag is specified, a space-separated sequence of basenames which are in "raw, unprocessed" to autotag, rather than the entire contents of the "raw, unprocessed" folder. |
| | | | | autotag_basename | if do_autotag is specified, a basename which is in "raw, unprocessed" to autotag, rather than the entire contents of the "raw, unprocessed" folder. |
| | markincomplete | UI, command line | Move the documents into the "in process" folder. | | |
| autotagged | handcorrect | UI, command line | Move the documents into the "in process" folder. | | |
On the command line, these operations are applied by default to
all the files in the folder, and optionally to a specified subset.
In the UI, on the other hand, these operations are only available
on a file-by-file basis. We haven't yet tackled managing the more
time-consuming folder-level operations in the UI.
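To make the folder-to-folder flow in the table easier to follow, here is a small Python sketch that models it as a lookup table. It is not MAT code: the names `DESTINATION`, `operations_for`, and `destination` are hypothetical, and the sketch assumes that "save" and "modelbuild" leave documents in their current folder, which the table implies but does not state outright.

```python
# (folder, operation) -> folder where the resulting document ends up.
# None marks operations assumed to leave the document where it is.
DESTINATION = {
    ("raw, unprocessed", "autotag"): "autotagged",
    ("raw, unprocessed", "tagprep"): "in process",
    ("rich, incoming", "autotag"): "autotagged",
    ("rich, incoming", "tagprep"): "in process",
    ("in process", "markcompleted"): "completed",
    ("in process", "save"): None,          # assumption: saved in place
    ("completed", "modelbuild"): None,     # assumption: documents are not moved
    ("completed", "markincomplete"): "in process",
    ("autotagged", "handcorrect"): "in process",
}


def operations_for(folder):
    """List the operations the table defines for a folder, sorted by name."""
    return sorted(op for (f, op) in DESTINATION if f == folder)


def destination(folder, operation):
    """Return the folder a document is in after applying an operation."""
    key = (folder, operation)
    if key not in DESTINATION:
        raise ValueError(f"{operation!r} is not defined for folder {folder!r}")
    target = DESTINATION[key]
    return folder if target is None else target


# Follow one document through hand tagging to completion.
folder = "raw, unprocessed"
for op in ("tagprep", "markcompleted"):
    folder = destination(folder, op)
print(folder)
```

A lookup table like this also makes it plain that every hand-annotation path passes through the "in process" folder before reaching "completed".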
Because interacting with the workspace means switching between
longer-duration batch operations (e.g., model building) and
quicker file-level operations (e.g., hand tagging), the user will
end up moving back and forth between the UI and the terminal. This
is currently unavoidable. Here's what a typical interaction might
look like.
(Alternatively, steps 3 and 6 can happen, per document, in the
UI.) Steps 5 and 6 can be repeated with newly imported documents,
so you can iteratively expand the model and your supply of
hand-corrected documents.
File mode requires more of the user at each step, but is also
significantly more flexible than workspace mode. Workspace mode,
on the other hand, provides considerably more structured support
and bookkeeping for the user, at the sacrifice of flexibility. For
instance:
It's important to stress that file mode and workspace mode cannot be freely mixed. You
can invoke the file mode engine on a file in a workspace, but
you'll likely make a mess of things if you save it back to the
workspace. Similarly, you can't invoke the workspace engine on any
file that hasn't been imported into it. You can, for instance,
process some documents in file mode, and then import them into the
workspace, but you can make a mess of things by importing them
into the wrong folder in the workspace. Ideally, you'll load raw
documents into the "raw, unprocessed" folder in the workspace and
do all your operations on those documents starting from there.