Tutorial 6: Workspaces

Now that we've covered file mode in the first five tutorials, we're going to address workspace mode. In workspace mode, you don't have nearly as much control over

what your documents are named
how their annotation status is managed
where they live in the file system
where models are stored

On the other hand, you don't need to worry about any of those things, either.

We're going to use the same simple named entity task that comes with MAT, and we'll assume that the task is installed. This tutorial involves both the UI and the command line, so make sure you're familiar with the "Conventions" section of your platform-specific instructions in the "Getting Started" section of the documentation.

Step 1: Create your workspace

The only way to create a workspace is on the command line, using MATWorkspaceEngine. The first argument of MATWorkspaceEngine is the path of the affected workspace, and the second argument is the operation. Options and arguments for the chosen operation follow.

Creating a workspace requires a task, so we provide the --task directive:

Unix:

% $MAT_PKG_HOME/bin/MATWorkspaceEngine /tmp/ne_workspace create --task 'Named Entity'

Windows native:

> %MAT_PKG_HOME%\bin\MATWorkspaceEngine.cmd %TMP%\ne_workspace create --task "Named Entity"

Created workspace for task 'Named Entity' in directory /tmp/ne_workspace.

You now have a workspace in the specified directory. If you're interested in the structure of a workspace, see the workspace documentation.
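If you plan to script workspace operations, it can help to build the argument list programmatically. The following is a minimal Python sketch, not part of MAT. The helper name `workspace_command` is hypothetical, and the sketch assumes the Unix layout `$MAT_PKG_HOME/bin/MATWorkspaceEngine` (on Windows the script is `MATWorkspaceEngine.cmd`, which this sketch does not handle). It follows the argument order described above: workspace path, then operation, then the operation's options and arguments, with options placed first as in this tutorial's examples.

```python
import os
from pathlib import Path


def workspace_command(workspace, operation, *args, pkg_home=None, **options):
    """Build an argument list for MATWorkspaceEngine (hypothetical helper).

    Order follows the tutorial: workspace path, operation, then the
    operation's options, then its positional arguments.
    """
    home = Path(pkg_home if pkg_home is not None else os.environ["MAT_PKG_HOME"])
    argv = [str(home / "bin" / "MATWorkspaceEngine"), str(workspace), operation]
    for name, value in options.items():
        if value is None or value is False:
            continue  # option not requested
        if value is True:
            argv.append(f"--{name}")  # bare flag, e.g. --do_autotag
        else:
            argv.extend([f"--{name}", str(value)])
    argv.extend(str(arg) for arg in args)
    return argv
```

You could then run the command with `subprocess.run(workspace_command("/tmp/ne_workspace", "create", task="Named Entity"), check=True)`. Passing a list instead of a single string avoids shell quoting problems for values that contain spaces or commas, such as "Named Entity" and "raw, unprocessed".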

Step 2: Import files into your workspace

Workspaces organize files by putting them in folders. The four folders we'll be concerned with in this tutorial are:

"raw, unprocessed" - raw files
"in process" - annotated files which are partially hand-tagged
"autotagged" - annotated files which have been automatically tagged
"completed" - annotated files which are believed to be correct

We'll begin by importing a single raw file.

Unix:

% cd $MAT_PKG_HOME
% bin/MATWorkspaceEngine /tmp/ne_workspace import --strip_suffix ".txt" \
"raw, unprocessed" sample/ne/resources/data/raw/voa2.txt 

Windows native:

> cd %MAT_PKG_HOME%
> bin\MATWorkspaceEngine.cmd %TMP%\ne_workspace import --strip_suffix ".txt" \
"raw, unprocessed" %CD%\sample\ne\resources\data\raw\voa2.txt

Here we use the "import" operation, which takes two arguments: the folder name ("raw, unprocessed") and the file to import.

We've also used the --strip_suffix directive to modify the name by which the workspace knows the file. We can see the contents of the workspace (or of a single folder) with the "list" operation:

Unix:

% $MAT_PKG_HOME/bin/MATWorkspaceEngine /tmp/ne_workspace list "raw, unprocessed"

Windows native:

> %MAT_PKG_HOME%\bin\MATWorkspaceEngine.cmd %TMP%\ne_workspace list "raw, unprocessed"

raw, unprocessed:
voa2

If you try to import the file again, you'll get an error:

Unix: 

% cd $MAT_PKG_HOME
% bin/MATWorkspaceEngine /tmp/ne_workspace import --strip_suffix ".txt" \
"raw, unprocessed" sample/ne/resources/data/raw/voa2.txt 

Windows native:

> cd %MAT_PKG_HOME%
> bin\MATWorkspaceEngine.cmd %TMP%\ne_workspace import --strip_suffix ".txt" \
"raw, unprocessed" sample\ne\resources\data\raw\voa2.txt 

Basename for sample/ne/resources/data/raw/voa2.txt already exists in workspace; not importing

In other words, once you create a particular basename in the workspace using the "import" operation, you can't do it again.
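To make the naming and duplicate rules concrete, here is a toy Python model of them. It is a sketch under stated assumptions, not MAT's implementation. The function names are hypothetical, and the sketch assumes that --strip_suffix removes the given string only when the file name ends with it, and that basenames must be unique across the whole workspace, as the error message suggests.

```python
from pathlib import PurePath


def workspace_basename(path, strip_suffix=None):
    """Name the workspace would use for an imported file (assumed behavior)."""
    name = PurePath(path).name
    if strip_suffix and name.endswith(strip_suffix):
        name = name[: -len(strip_suffix)]
    return name


def try_import(known_basenames, path, strip_suffix=None):
    """Record a new basename, refusing duplicates anywhere in the workspace."""
    basename = workspace_basename(path, strip_suffix)
    if basename in known_basenames:
        # Mimics the message shown above; MAT's own checks may differ.
        print(f"Basename for {path} already exists in workspace; not importing")
        return False
    known_basenames.add(basename)
    return True
```

Under this model, importing `sample/ne/resources/data/raw/voa2.txt` with `.txt` stripped registers `voa2`, and a second import of the same file is refused because `voa2` is already taken.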

Step 3: Open the workspace in the UI

In this step, we're going to learn about the UI aspects of the workspace.

First, see the documentation on starting the Web server and starting the UI. We'll assume that you're running one of the tabbed terminal applications. In the first pane, you should see something like:

Web server started on port 7801.

Web server command loop. Commands are:

exit       - exit the command loop and stop the Web server
loopexit   - exit the command loop, but leave the Web server running
taggerexit - shut down the tagger service, if it's running
restart    - restart the Web server
ws_key     - show the workspace key
help, ?    - this message

Workspace key is XJ9dGBaCNveYHk9CZzw6wTM5WH8x05y1
Command:

Note the workspace key. This key is randomly generated and known only to the user who starts the Web server, and it must be provided to the UI when the user opens the workspace. This simple security feature ensures that even though the Web server will be modifying the workspace, it does so only if the UI user has proved that s/he has the appropriate access.
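The workspace key acts as a shared secret between the Web server and the UI user. As an illustration only (this is not MAT's code), the Python sketch below generates a key of the same shape as the example above, assumed here to be 32 letters and digits, and checks a supplied key in constant time. The names `generate_workspace_key` and `key_is_valid` are hypothetical.

```python
import hmac
import secrets
import string

# Assumption: a 32-character alphanumeric key, like the example above.
KEY_ALPHABET = string.ascii_letters + string.digits


def generate_workspace_key(length=32):
    """Produce a random key using a cryptographically secure generator."""
    return "".join(secrets.choice(KEY_ALPHABET) for _ in range(length))


def key_is_valid(expected, supplied):
    """Compare keys in constant time so response timing does not leak the key."""
    return hmac.compare_digest(expected.encode("utf-8"), supplied.encode("utf-8"))
```

The design point is that the key never needs to be stored in the workspace itself: only someone who can see the Web server's output can supply it.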

In the UI, select File -> Open workspace... . You'll see a popup window.
Copy the workspace key from the Web server output, and paste it into the "Workspace key" field. Press <tab>.
In the "Directory:" field, type "/tmp/ne_workspace". Press <tab>.
Press the "Open" button.

You should see a window that looks like this:

Select "raw, unprocessed" from the folder menu. You should now see this:

Step 4: Open a document

A single left click on the file name in the workspace window should open the file:

Note how this file window differs from the one in file mode:

The workspace is listed in the title bar, instead of the task.
The workflow menu is missing, and the folder is listed instead.
The status fields and forward and backward buttons are missing, and there's an operation menu instead.
There's no reload or save button.

Step 5: Prepare the document for hand annotation

Operations make changes to files and move them around the workspace. For instance, the "Prepare for hand tagging" operation removes a document from the "raw, unprocessed" folder, applies the appropriate engine steps, and saves it in the "in process" folder, at which point the document is ready for hand tagging.

Make sure that the operations menu says "Prepare for hand tagging" and press "Go". Your display should now look like this:

Note that the folder name in the file window has changed, and so has the list of available operations. Note, too, that the workspace pane now shows that the "raw, unprocessed" folder is empty. If you used the folder menu to switch to "in process", you'd find this document there.

Step 6: Hand annotate

At this point, you can annotate your document as you did in Tutorial 1. If you want to leave the workspace without finishing your annotation, just select the Save operation in the operations menu and press Go; you can always return to the document. Once you're satisfied with your annotations, select "Mark completed" in the operations menu and press Go; your document will be saved and moved to the completed folder.
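The folder moves performed by the operations in Steps 5 and 6 can be summarized as a small transition table. This Python sketch is a simplification with hypothetical names. It records only which folder a document ends up in, not the engine steps that the operations also apply, and it covers only the operations used so far in this tutorial.

```python
# Folder -> {operation name: destination folder}, covering only the
# operations used in Steps 5 and 6 (assumed; real menus may offer more).
OPERATIONS = {
    "raw, unprocessed": {"Prepare for hand tagging": "in process"},
    "in process": {"Save": "in process", "Mark completed": "completed"},
}


def available_operations(folder):
    """Operations the menu would offer for a document in this folder."""
    return sorted(OPERATIONS.get(folder, {}))


def apply_operation(folder, operation):
    """Return the folder a document ends up in after the operation."""
    choices = OPERATIONS.get(folder, {})
    if operation not in choices:
        raise ValueError(f"{operation!r} is not available in {folder!r}")
    return choices[operation]
```

Keying the table by folder mirrors the UI behavior noted in Step 5: when a document changes folders, the list of operations offered for it changes too.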

Step 7: Import more documents

You'd typically annotate several documents in the first round before building a model, but we want to move directly to that step. Since we have only one hand-annotated document at the moment, we're going to import some other documents into the workspace: several of the annotated documents that come with the Named Entity task go into the "completed" folder, and one raw document goes into the "raw, unprocessed" folder.

Unix:

% cd $MAT_PKG_HOME
% bin/MATWorkspaceEngine /tmp/ne_workspace import --strip_suffix ".txt" \
"raw, unprocessed" sample/ne/resources/data/raw/voa1.txt
% bin/MATWorkspaceEngine /tmp/ne_workspace import --strip_suffix ".txt.json" \
"completed" sample/ne/resources/data/json/voa[3-9].txt.json

Windows native:

> cd %MAT_PKG_HOME%
> bin\MATWorkspaceEngine.cmd %TMP%\ne_workspace import --strip_suffix ".txt" \
"raw, unprocessed" sample\ne\resources\data\raw\voa1.txt
> bin\MATWorkspaceEngine.cmd %TMP%\ne_workspace import --strip_suffix ".txt.json" \
"completed" sample\ne\resources\data\json\voa3.txt.json \
sample\ne\resources\data\json\voa4.txt.json \
sample\ne\resources\data\json\voa5.txt.json \
sample\ne\resources\data\json\voa6.txt.json \
sample\ne\resources\data\json\voa7.txt.json \
sample\ne\resources\data\json\voa8.txt.json \
sample\ne\resources\data\json\voa9.txt.json
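On Unix, the shell expands `voa[3-9].txt.json`. Here `[3-9]` matches exactly one character from 3 to 9, so the command imports voa3 through voa9. The Windows version lists each file explicitly instead. The Python sketch below uses hypothetical helper names. It shows which names the pattern selects, and how you might expand the pattern against the files on disk to produce such a list.

```python
import fnmatch
import glob
import os

PATTERN = "voa[3-9].txt.json"


def matching_names(names, pattern=PATTERN):
    """Names the pattern selects; [3-9] matches a single character."""
    return sorted(fnmatch.filter(names, pattern))


def expand_on_disk(directory, pattern=PATTERN):
    """Expand the pattern against real files, e.g. to build an argument list."""
    return sorted(glob.glob(os.path.join(directory, pattern)))
```

Note that a file named `voa10.txt.json` would not match, because `[3-9]` consumes only the `1` and the rest of the name must then be exactly `.txt.json`.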

Now, let's list the workspace to see what we have:

Unix:

% $MAT_PKG_HOME/bin/MATWorkspaceEngine /tmp/ne_workspace list

Windows native:

> %MAT_PKG_HOME%\bin\MATWorkspaceEngine.cmd %TMP%\ne_workspace list

rich, incoming:

raw, processed:
voa6 voa7 voa4 voa5 voa2 voa3 voa8 voa9

completed:
voa6 voa7 voa4 voa5 voa2 voa3 voa8 voa9

in process:

raw, unprocessed:
voa1

autotagged:

You can see that the document you tagged is in "completed", along with the documents you just imported. You can also see that for each annotated document, there's a raw copy of the document in "raw, processed" (you can mostly ignore these). And finally, you can see that there is one document in "raw, unprocessed" waiting to be annotated.
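If you want to check workspace contents from a script, you could parse the output of the "list" operation. The Python sketch below assumes the format shown above: a folder name ending with a colon, followed by space-separated basenames. It also assumes that basenames contain no spaces. `parse_listing` is a hypothetical name, not part of MAT.

```python
def parse_listing(text):
    """Parse `list` output into {folder: [basenames]} (format assumed from the sample)."""
    folders = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines separate folders
        if line.endswith(":"):
            current = line[:-1]  # folder header, e.g. "raw, unprocessed:"
            folders[current] = []
        elif current is not None:
            # Basenames appear space-separated after the folder header.
            folders[current].extend(line.split())
    return folders
```

For example, you could pass it the `stdout` attribute of the result of `subprocess.run(..., capture_output=True, text=True)` and then check `parse_listing(out)["raw, unprocessed"]`.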

Step 8: Build a model

Now, we build a model. This is a command line operation only. We're going to ask the workspace to autotag afterwards, which should move "voa1" into the "autotagged" folder. Each time we build a model and autotag, any documents that aren't in process or completed are autotagged; documents which have already been autotagged are returned to "raw, unprocessed" first.

Unix:

% $MAT_PKG_HOME/bin/MATWorkspaceEngine /tmp/ne_workspace modelbuild \
--do_autotag "completed"

Windows native:

> %MAT_PKG_HOME%\bin\MATWorkspaceEngine.cmd %TMP%\ne_workspace modelbuild \
--do_autotag "completed"

Once this is done, we can look at the contents of the workspace again:

Unix:

% $MAT_PKG_HOME/bin/MATWorkspaceEngine /tmp/ne_workspace list

Windows native:

> %MAT_PKG_HOME%\bin\MATWorkspaceEngine.cmd %TMP%\ne_workspace list

rich, incoming:

raw, processed:
voa6 voa7 voa4 voa5 voa2 voa3 voa1 voa8 voa9

completed:
voa6 voa7 voa4 voa5 voa2 voa3 voa8 voa9

in process:

raw, unprocessed:

autotagged:
voa1

So you can see that there's no longer anything in "raw, unprocessed", but now there's one document in autotagged.
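The autotagging rule described at the start of this step can be sketched as follows. This is a simplified Python model with a hypothetical function name. It tracks only the folder of each basename, ignores the copies in "raw, processed", and assumes that the only folders involved are "raw, unprocessed", "autotagged", "in process", and "completed".

```python
def autotag_round(locations):
    """Return a new {basename: folder} mapping after one build-and-autotag round.

    The input mapping is not modified.
    """
    # Stage 1: previously autotagged documents return to "raw, unprocessed".
    staged = {
        doc: ("raw, unprocessed" if folder == "autotagged" else folder)
        for doc, folder in locations.items()
    }
    # Stage 2: everything now in "raw, unprocessed" is autotagged with the
    # new model; "in process" and "completed" documents are left alone.
    return {
        doc: ("autotagged" if folder == "raw, unprocessed" else folder)
        for doc, folder in staged.items()
    }
```

Returning autotagged documents to "raw, unprocessed" first means that each round's autotagged folder reflects only the most recent model.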

Step 9: Hand correct

Now, you'll want to hand-correct the autotagged document.

If the Web server was running while you performed the last two steps, the UI won't know that the state of the workspace has changed. The safest course is to close all open workspace documents and press the "Refresh" button in the workspace folder window. The state of the UI and the state of the workspace will then be synchronized.

Select the autotagged folder from the folder menu. You should see "voa1". Open the document. If you want to hand correct it, select the "Hand correct" operation and press Go, and the document will be moved into the "in process" folder; if the document is correct, choose "Mark completed" and press Go, and the document will be moved into the "completed" folder.

Once the document is in the "in process" folder, its status is identical to that of the document at the end of Step 5 above. From this point on, you can produce completed documents either by full hand annotation or by correcting automated annotation, and you can repeat the cycle of model building and automated tagging.

Step 10: Clean up (optional)

In the next tutorial, we'll learn about the experiment engine. If you want to learn how to use the experiment engine with workspaces, don't remove your workspace.

If you're not planning on doing any other tutorials, remove the workspace:

Unix:

% rm -rf /tmp/ne_workspace

Windows native:

> rd /s /q %TMP%\ne_workspace

If you don't want the "Named Entity" task hanging around, remove it as shown in the final step of Tutorial 1.

This concludes Tutorial 6.