To get our feet wet with MAT, let's load and hand tag a document.
We're going to use a task that comes with MAT, which is a simple
task for named entity tagging (identifying people, organizations
and locations). Make sure you're familiar with the "Conventions"
section in your platform-specific instructions in the "Getting
Started" section of the documentation. We're going to do this
tutorial in file mode.
While the named entity task is included in the distribution, it
is not installed yet. The named entity task implementation is in
the sample/ne subdirectory of MAT_PKG_HOME. Install it as follows:
Unix:
% cd $MAT_PKG_HOME
% bin/MATManagePluginDirs install $PWD/sample/ne
Windows native:
> cd %MAT_PKG_HOME%
> bin\MATManagePluginDirs.cmd install %CD%\sample\ne
(If you received this distribution as a tarball, you won't have
to do this with any tasks you find in src/tasks inside the
tarball; these will have been installed as part of the overall
installation procedure.)
Open another terminal and start the Web server (see the Web server documentation for more details):
Unix:
% $MAT_PKG_HOME/bin/MATWeb
Windows native:
> %MAT_PKG_HOME%\bin\MATWeb
Then open your Firefox browser and:
You're now ready to load a document.
You should now see a window with a tab which contains the
document:
At the right, you'll see a menu where you can change the
workflow. Immediately below it is a status line, which lists each
of the steps in the workflow; finished steps are grayed out, and
none should be grayed out at the moment. Below the status line are
a forward button (move one step forward), a backward button (move
one step back), and a reload button. Below this section, you'll
see a tag legend.
Within the document tab, you'll see a tagging status area at the
top which tells you that hand annotation is unavailable.
The document has two icons at the right end of its tab. The "-" will hide the document, and the "x" will close it. The Tabs menu at the top of the UI lets you show the document again once it's hidden. Try hiding and showing the document. If you press the "x" by mistake, just follow the instructions earlier in this step to load the document again.
You're now ready to prepare the document for hand tagging. In
order for the document to be hand taggable, it should usually be
tokenized; i.e., the basic word elements must be identified. In
addition, the regions of the document which might contain
interesting elements must be identified (this is called "zoning").
Press the forward button. The zone step should be grayed out. You
shouldn't see any change in the document tab, because this
particular task treats the entire document as potentially
interesting; if there were uninteresting areas, they would be
grayed out in the document text:
Press the forward button again. The tokenize step should be
grayed out, and all the words in the document should be surrounded
by faint boxes. These outlines show you where the system believes
the word boundaries are, which will be relevant in a moment:
In the tagging status area, it should now say "Hand annotation: available
(swipe or left-click)".
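To make the idea of word boundaries concrete, here is a minimal Python sketch of a tokenizer that records each token as a pair of character offsets. It is not MAT's tokenizer: the regular expression, the function name tokenize, and the sample sentence are all assumptions chosen for illustration.

```python
import re

def tokenize(text):
    """Return (start, end) character offsets for each word or punctuation mark.

    Hypothetical stand-in for a real tokenizer: a token is either a run of
    word characters or a single non-space, non-word character.
    """
    return [(m.start(), m.end()) for m in re.finditer(r"\w+|[^\w\s]", text)]

text = "John Smith works at MITRE."
tokens = tokenize(text)

# Each pair of offsets corresponds to one of the faint boxes around a word.
for start, end in tokens:
    print(start, end, repr(text[start:end]))
```

Storing offsets rather than the token strings themselves keeps the original text untouched, which matters once annotations are layered on top of it.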
You're now in the "hand tag" step. Note that this step is
special, in that the user performs it, not the MAT engine. The
"hand tag" step is a placeholder for your annotation activity.
There are two ways to select text to tag. You can swipe using the
mouse (click left, hold, and move), or click left on an individual
word. The system will expand the selection to the nearest word
boundaries, and pop up a tagging menu. You can select the
appropriate tag with the mouse, or use the keyboard accelerators
(in parentheses in the menu).
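The expansion of a selection to word boundaries can be sketched as follows. This is only an illustration of the idea, not the UI's actual code: the function snap_to_tokens, the overlap rule, and the hard-coded token offsets are assumptions.

```python
def snap_to_tokens(tokens, sel_start, sel_end):
    """Expand a half-open selection [sel_start, sel_end) to whole tokens.

    tokens is a list of (start, end) offsets in document order. Returns the
    (start, end) span covering every token the selection overlaps, or None
    if the selection touches no token (for example, only whitespace).
    """
    touched = [(s, e) for s, e in tokens if s < sel_end and e > sel_start]
    if not touched:
        return None
    return touched[0][0], touched[-1][1]

text = "John Smith works at MITRE."
# Hypothetical token offsets for the sentence above.
tokens = [(0, 4), (5, 10), (11, 16), (17, 19), (20, 25), (25, 26)]

swipe = snap_to_tokens(tokens, 2, 7)    # a swipe over "hn Sm"
click = snap_to_tokens(tokens, 22, 23)  # a click inside "MITRE"
print(text[swipe[0]:swipe[1]])
print(text[click[0]:click[1]])
```

In this sketch, a swipe that starts and ends mid-word still yields a span aligned with token edges, which is why the tokenize step has to run before hand annotation becomes available.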
If you need to remove or change a tag, just click on it. You'll
get a popup menu that will allow you to do what you want.
Select "File -> Save..." in the menu bar, and then select
"mat-json". You should be prompted with a file save dialog. Put
this file somewhere you can find it again; we'll come back to it
in a bit. Give it a name like "annotated_doc.json".
What you're doing is saving your document, along with its
annotations. MAT uses standoff annotations, which record each
annotation as offsets into the document, rather than in-line
annotations, which are inserted directly into the document text
(e.g., as XML tags). MAT's standoff annotation format is our own
format, built on top of JavaScript Object Notation (JSON).
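The difference between standoff and in-line annotation can be illustrated with a short Python sketch. The field names used here ("text", "annotations", "label", "start", "end") are hypothetical and are not the schema of the mat-json format; the sketch also assumes that annotations do not overlap.

```python
import json
from xml.sax.saxutils import escape

text = "John Smith works at MITRE."

# Standoff: the text is stored unchanged, and each annotation points into it
# by character offset. All field names are illustrative assumptions.
standoff = {
    "text": text,
    "annotations": [
        {"label": "PERSON", "start": 0, "end": 10},
        {"label": "ORGANIZATION", "start": 20, "end": 25},
    ],
}

def to_inline(doc):
    """Render a standoff document as in-line XML-style markup.

    Assumes the annotations do not overlap one another.
    """
    out, pos = [], 0
    for ann in sorted(doc["annotations"], key=lambda a: a["start"]):
        out.append(escape(doc["text"][pos:ann["start"]]))
        body = escape(doc["text"][ann["start"]:ann["end"]])
        out.append("<{0}>{1}</{0}>".format(ann["label"], body))
        pos = ann["end"]
    out.append(escape(doc["text"][pos:]))
    return "".join(out)

inline = to_inline(standoff)
print(json.dumps(standoff, indent=2))  # standoff form, serialized as JSON
print(inline)                          # in-line form
```

The standoff form leaves the original text byte-for-byte intact, whereas the in-line form interleaves markup with the text and has to escape any characters that are special in XML.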
Note: if you're having trouble finding the document you saved, please keep in mind that the browser, not the MAT UI, is responsible for saving your file. In particular, if you haven't configured your browser to prompt you for where to save your file, it will be saved to your browser's download directory. To fix this in Firefox, see the documentation on starting the UI.
Finally, you'll close and reload this document, so you can see
how loading an annotated document differs from loading a raw
document. We'll also see how to start and stop the MAT UI logger.
To close the document, press the "x" in the upper-right corner of
the document pane.
Now, let's start the logger.
Now, let's reload the document. All our actions will be logged.
A tab should appear which shows your annotated document, and the
automatic steps you've already performed on the document should be
visible on the right.
In this open window, add and remove some annotations, then press
the logging button again. The browser will download a CSV file
which contains the contents of the log. Open the file in your
favorite spreadsheet application to see how your actions were
logged, and see the logger documentation
for a description of the logger output.
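If you'd rather inspect the log from a script than in a spreadsheet, a minimal Python sketch follows. It makes no assumptions about which columns the logger writes; the only assumptions are that the first row is a header and that you pass the path to the downloaded file on the command line. The function name read_log is hypothetical.

```python
import csv
import sys

def read_log(fileobj):
    """Return (header, rows) from an open CSV file.

    Assumes the first row is a header; nothing is assumed about the columns.
    """
    rows = list(csv.reader(fileobj))
    if not rows:
        return [], []
    return rows[0], rows[1:]

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python read_log.py /path/to/downloaded_log.csv
    with open(sys.argv[1], newline="", encoding="utf-8") as f:
        header, rows = read_log(f)
    print("columns:", header)
    print("logged rows:", len(rows))
```

Printing the header first is a quick way to match what you see against the logger documentation before doing any further processing.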
Shut down your Web server by typing "exit" in the window where
you started it. See the Web server documentation for more details.
If you're not planning on doing any other tutorials, and you
don't want the "Named Entity" task hanging around, remove it as
follows:
Unix:
% cd $MAT_PKG_HOME
% bin/MATManagePluginDirs remove $PWD/sample/ne
Windows native:
> cd %MAT_PKG_HOME%
> bin\MATManagePluginDirs.cmd remove %CD%\sample\ne
This concludes Tutorial 1.