To get our feet wet with MAT, let's load and hand tag a document.
We're going to use a simple named entity tagging task that comes with
MAT (identifying people, organizations and locations).
Make sure you're familiar with the "Conventions" section in your
platform-specific instructions in the "Getting Started" section of the
documentation.
We're going to do this tutorial in file mode.
While the named entity task is included in the distribution, it is
not installed yet. The named entity task implementation is in the
sample/ne subdirectory of MAT_PKG_HOME. Install it as follows:
Unix:
% cd $MAT_PKG_HOME
% bin/MATManagePluginDirs install $PWD/sample/ne
Windows native:
> cd %MAT_PKG_HOME%
> bin\MATManagePluginDirs.cmd install %CD%\sample\ne
(If you received this distribution as a tarball, you won't have to
do this with any tasks you find in src/tasks inside the tarball; these
will have been installed as part of the overall installation procedure.)
Next, start the Web server and then the UI, following the
documentation on starting the Web server and starting the UI.
You're now ready to load a document.
You should now see a window that contains the document.
At the top, you'll see a menu where you can change the workflow. Immediately below it is a status line listing each of the steps in the workflow; steps that are finished will be grayed out, and none should be grayed out at the moment. To the right of the status line are a forward button (move one step forward), a backward button, and a reload button. Below and to the left is the document pane containing the document, with a tag legend to its right. At the bottom is a tagging status line, which should say "Hand annotation unavailable".
The document window has two icons at the right end of its top bar. The "-" will hide the document, and the "x" will close it. The Show/Hide menu at the top of the UI provides another way of hiding the document, and a way of showing it again once it's hidden. Try the various ways to hide and show the document. If you press the "x" by mistake, just load the document again as described above.
You're now ready to prepare the document for hand tagging. In order
for the document to be hand taggable, it must be tokenized; i.e., the
basic word elements must be identified. In addition, the regions of the
document which might contain interesting elements must be identified
(this is called "zoning").
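To make "identifying the basic word elements" concrete, here is a minimal Python sketch of what a tokenizer produces: a list of character offsets, one (start, end) pair per token. The regular expression and the `tokenize` function are hypothetical illustrations, not MAT's tokenizer, which may split text differently (for example, around hyphens or abbreviations).

```python
import re

# Hypothetical tokenization rule for illustration only: a token is either
# a run of word characters or a single punctuation character.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")

def tokenize(text):
    """Return a list of (start, end) character offsets, one per token."""
    return [(m.start(), m.end()) for m in TOKEN_RE.finditer(text)]

text = "John Smith visited Boston."
for start, end in tokenize(text):
    # Each token is recorded as offsets into the text, not as a copy of it.
    print(start, end, text[start:end])
```

Recording offsets rather than copying the words is what lets the UI draw a box around each token in the original text.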
Press the forward button. The zone step should be grayed out. You
shouldn't see any change in the document pane, because this particular
task treats the entire document as potentially interesting; if there
were uninteresting areas, they would be grayed out in the document text.
Press the forward button again. The tokenize step should be grayed
out, and all the words in the document should be surrounded by faint
boxes. These outlines show you where the system believes the word
boundaries are, which will be relevant in a moment.
The tagging status line at the bottom should now say "Hand annotation
available (swipe or left-click)".
You're now in the "hand tag" step. Note that this step is special,
in that the user performs it, not the MAT engine. The "hand tag" step
is mostly a placeholder for your annotation activity. Behind the
scenes, the document is marked as being tagged as soon as you insert
the first tag, but the UI doesn't advance you past the "hand tag" step
while the document is still open.
There are two ways to select text to tag. You can swipe using the
mouse (click left, hold,
and move), or click left on an individual word. The system will expand
the selection to the nearest word boundaries, and pop up a tagging
menu.
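The snapping behavior can be sketched as follows, assuming tokens are stored as (start, end) character offsets and that a left-click on a word behaves like a one-character selection. `expand_to_tokens` is a hypothetical name; this illustrates the idea and is not MAT's UI code.

```python
# Token offsets for "John Smith visited Boston." (as a tokenizer might produce)
TEXT = "John Smith visited Boston."
TOKENS = [(0, 4), (5, 10), (11, 18), (19, 25), (25, 26)]

def expand_to_tokens(sel_start, sel_end, tokens):
    """Grow the character selection [sel_start, sel_end) outward to the
    nearest token boundaries. Returns None if no token overlaps it."""
    covered = [(s, e) for s, e in tokens if s < sel_end and e > sel_start]
    if not covered:
        return None  # e.g., the selection contains only whitespace
    return min(s for s, _ in covered), max(e for _, e in covered)

# A swipe from the middle of "John" to the middle of "Smith"
start, end = expand_to_tokens(2, 7, TOKENS)
print(TEXT[start:end])  # John Smith
```

Snapping to token boundaries is why the tokenize step has to run before hand annotation becomes available.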
You can select the appropriate tag with the mouse, use the keyboard
accelerators (shown in parentheses in the menu), or navigate the menu
using the up and down arrow keys and select with <return>.
If you need to remove or change a tag, you can do one of two things.
You can swipe
some of the tokens in the tag to select just that portion of the tag,
or you can left-click on one of the tokens, and the entire tag will be
selected. Any tags your selection overlaps will be removed; if you
replace the tag with another one, only the selected region will be
reannotated.
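Here is a minimal sketch of that removal and replacement rule, assuming annotations are (start, end, label) tuples and that the selection has already been expanded to token boundaries. The `retag` function and the labels are illustrative only, not MAT's implementation.

```python
def retag(annotations, sel_start, sel_end, new_label=None):
    """Remove every annotation that overlaps [sel_start, sel_end).
    If new_label is given, annotate exactly the selected span with it."""
    kept = [(s, e, lab) for s, e, lab in annotations
            if e <= sel_start or s >= sel_end]  # keep only non-overlapping tags
    if new_label is not None:
        kept.append((sel_start, sel_end, new_label))
    return sorted(kept)

# Annotations over "John Smith visited Boston."
anns = [(0, 10, "PERSON"), (19, 25, "LOCATION")]

# Swipe just "Smith" and choose a different tag: the whole PERSON tag
# is removed, and only "Smith" receives the new tag.
print(retag(anns, 5, 10, "ORGANIZATION"))  # [(5, 10, 'ORGANIZATION'), (19, 25, 'LOCATION')]

# Click on "Boston" and remove its tag without replacing it.
print(retag(anns, 19, 25))  # [(0, 10, 'PERSON')]
```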
Press the "Save" button and select "mat-json". You should be prompted
with a file save dialog. Put this file somewhere you can find it again;
we'll come back to it in a bit. Give it a name like "annotated_doc.json".
What you're doing is saving your document along with its annotations. MAT uses standoff annotations, which record each annotation as offsets into the document, rather than in-line annotations, where the annotations are inserted directly into the document text (e.g., XML). MAT's standoff annotation format is our own format, built on top of JavaScript Object Notation (JSON).
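To illustrate the difference between the two styles, the sketch below stores a document as plain text plus a list of offset-based annotations, serializes it as JSON, and then renders the same annotations in-line as XML. The dictionary layout (`text`, `annotations`, `label`, `start`, `end`) is a hypothetical simplification for illustration, not MAT's actual mat-json schema.

```python
import json
from xml.sax.saxutils import escape

# Hypothetical, simplified standoff record; not MAT's real mat-json layout.
doc = {
    "text": "John Smith visited Boston.",
    "annotations": [
        {"label": "PERSON", "start": 0, "end": 10},
        {"label": "LOCATION", "start": 19, "end": 25},
    ],
}

def to_inline_xml(doc):
    """Render standoff annotations as in-line XML tags (assumes no overlaps)."""
    text = doc["text"]
    out, pos = [], 0
    for a in sorted(doc["annotations"], key=lambda a: a["start"]):
        out.append(escape(text[pos:a["start"]]))  # untagged text before the tag
        out.append("<%s>%s</%s>" % (a["label"], escape(text[a["start"]:a["end"]]), a["label"]))
        pos = a["end"]
    out.append(escape(text[pos:]))  # untagged text after the last tag
    return "".join(out)

print(json.dumps(doc, indent=2))
print(to_inline_xml(doc))  # <PERSON>John Smith</PERSON> visited <LOCATION>Boston</LOCATION>.
```

Note that the standoff version leaves the document text untouched, which is why it can also represent annotations that overlap, something the simple in-line rendering here cannot do.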
Finally, you'll close and reload this document, so you can see how
loading an annotated document differs from loading a raw document.
We'll also see how to start and stop the MAT UI logger.
To close the document, press the "x" in the upper-right corner of the
document pane.
Now, let's start the logger.
Now, let's reload the document. All our actions will be logged.
A window should appear which shows your annotated document, complete
with its previous state, including the automatic steps you performed
and the annotations you added.
In this open window, add and remove some annotations, then select
"Logging -> Stop". The browser will download a CSV file which
contains the contents of the log. Open the file in your favorite
spreadsheet application to see how your actions were logged, and see
the logger documentation for a description
of the logger output.
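If you'd rather inspect the log programmatically, a short script using Python's standard `csv` module will do. This sketch assumes the file is UTF-8 and that its first row is a header; it makes no assumptions about which columns the logger writes, so consult the logger documentation for their meaning. The script name and path in the usage note are hypothetical.

```python
import csv
import sys

def summarize_log(path):
    """Return (header, rows) for a CSV file, treating the first row as the header."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])  # empty list if the file is empty
        rows = list(reader)
    return header, rows

if __name__ == "__main__" and len(sys.argv) > 1:
    header, rows = summarize_log(sys.argv[1])
    print("Columns:", ", ".join(header))
    print("Data rows:", len(rows))
```

Usage: `python summarize_log.py path/to/downloaded_log.csv`.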
Shut down your Web server. You can find instructions on how to do
that in the documentation on starting the Web
server.
If you're not planning on doing any other tutorials, and you don't
want the "Named Entity" task hanging around, remove it as follows:
Unix:
% cd $MAT_PKG_HOME
% bin/MATManagePluginDirs remove $PWD/sample/ne
Windows native:
> cd %MAT_PKG_HOME%
> bin\MATManagePluginDirs.cmd remove %CD%\sample\ne
This concludes Tutorial 1.