Workspace Engine

Description

The workspace engine is the command-line tool for creating and managing workspaces. Once you have created a workspace, you can perform top-level operations on it, such as importing documents into it or listing its contents, or you can perform an operation on one of its folders. There are core options, which apply to every invocation, plus options that are specific to each operation.

Usage

Unix:

% $MAT_PKG_HOME/bin/MATWorkspaceEngine

Windows native:

> %MAT_PKG_HOME%\bin\MATWorkspaceEngine.cmd

Usage: MATWorkspaceEngine [options] <dir> create ...
MATWorkspaceEngine [options] <dir> import ...
MATWorkspaceEngine [options] <dir> list ...
MATWorkspaceEngine [options] <dir> remove ...
MATWorkspaceEngine [options] <dir> operate ...
MATWorkspaceEngine [core options] <dir> tagprep ...
...

For more detailed help on an operation, provide the workspace directory and the operation name, followed by --help.

All workspace operations have a fairly consistent syntax: first the core options, then the workspace directory, then the operation name, then any operation options, and finally the operation arguments. For a folder operation, the first operation argument is the folder name.
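When driving the engine from a script, it can help to assemble the argument list in exactly this order. The following is a minimal Python sketch; the helper name `build_workspace_argv` is hypothetical and not part of MAT, and it only orders the arguments, it does not validate them.

```python
from typing import List, Sequence


def build_workspace_argv(
    workspace_dir: str,
    operation: str,
    core_options: Sequence[str] = (),
    operation_options: Sequence[str] = (),
    operation_args: Sequence[str] = (),
    program: str = "MATWorkspaceEngine",
) -> List[str]:
    """Hypothetical helper: order arguments as the workspace engine expects.

    Order: core options, workspace directory, operation name,
    operation options, operation arguments. For a folder operation,
    the first operation argument is the folder name.
    """
    return [
        program,
        *core_options,
        workspace_dir,
        operation,
        *operation_options,
        *operation_args,
    ]


argv = build_workspace_argv(
    "/home/user/myworkspace",
    "modelbuild",
    core_options=["--debug"],
    operation_options=["--do_autotag"],
    operation_args=["completed"],
)
print(argv)
```

On Unix, passing such a list to Python's `subprocess.run` (with `program` set to the full path of the script under $MAT_PKG_HOME/bin) sends each element as a separate argument, so folder names such as "raw, unprocessed" need no shell quoting.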

The "operate" operation is special: it is an umbrella tool for invoking any folder operation, and its second operation argument, after the folder name, is the actual folder operation. It is a legacy operation which is no longer needed, since folder operations can now be called directly, but it has been retained for backward compatibility.
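Converting a legacy "operate" call to the direct form is a simple reordering: in the legacy form the folder name comes before the operation name, while in the direct form the operation name comes first. A minimal Python sketch, with the hypothetical function name `operate_to_direct`, assuming the operation options are carried over unchanged:

```python
from typing import List, Sequence


def operate_to_direct(
    folder: str,
    operation: str,
    operation_options: Sequence[str] = (),
    basenames: Sequence[str] = (),
) -> List[str]:
    """Hypothetical helper: rewrite the arguments that follow the workspace
    directory in a legacy 'operate' call into the direct folder-operation form.

    Legacy: operate [operation_options] <folder> <operation> [<basename> ...]
    Direct: <operation> [operation_options] <folder> [<basename> ...]
    """
    return [operation, *operation_options, folder, *basenames]


# Legacy: ... <dir> operate "raw, unprocessed" tagprep file1
print(operate_to_direct("raw, unprocessed", "tagprep", basenames=["file1"]))
```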

Core options

--other_app_dir <dir>
If present, a directory to look in to find a MAT application specification. This directory must contain a task.xml file which describes the application. This is only necessary if 'MATManagePluginDirs install' has not been called on the application directory.
--help
Prints the core help message and exits.
--debug
Enable debug output.
--subprocess_debug <i>
Set the subprocess debug level to the value provided, overriding the global setting. 0 disables, 2 shows all subprocess activity.
--subprocess_statistics
Enable subprocess statistics (memory/time), if the capability is available and it isn't globally enabled.

Creation

Usage: MATWorkspaceEngine [options] <dir> create [create_options]

Create options

--task <task>
The name of the task to be associated with this workspace. Required if more than one task is available. The tasks are the same tasks as those available to MATEngine.
--max_old_models
Number of previous models to retain after model building. Default is 0.
--help
Prints the creation help message and exits.
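To make the retention count concrete: with the default of 0, only the newly built model is kept; with a value of N, up to N earlier models are kept as well. The Python sketch below assumes, purely for illustration, that models are tracked as a list ordered from oldest to newest; the function name is hypothetical and the sketch does not reflect how MAT actually stores models.

```python
from typing import List, Sequence, Tuple


def split_models(
    models: Sequence[str], max_old_models: int = 0
) -> Tuple[List[str], List[str]]:
    """Illustrative only: given models ordered oldest -> newest (the last one
    being the model just built), return (kept, discarded)."""
    if max_old_models < 0:
        raise ValueError("max_old_models must be non-negative")
    if not models:
        return [], []
    current = list(models[-1:])
    previous = list(models[:-1])
    # Keep the newest `max_old_models` previous models; discard the rest.
    keep_from = max(len(previous) - max_old_models, 0)
    return previous[keep_from:] + current, previous[:keep_from]


print(split_models(["m1", "m2", "m3", "m4"], max_old_models=2))
```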

Example 1

Unix:

% $MAT_PKG_HOME/bin/MATWorkspaceEngine /home/user/myworkspace create --task 'Sentence tagging'

Windows native:

> %MAT_PKG_HOME%\bin\MATWorkspaceEngine.cmd c:\home\user\myworkspace create --task "Sentence tagging"

Import

Usage: MATWorkspaceEngine [options] <dir> import [import_options] <folder> <file> ...

<folder>: The name of the folder to import documents into.
<file>: The file to import into the folder (can be repeated).

Available folders are:

completed: for rich files whose annotations are correct and complete
in process: for rich files being hand annotated
raw, unprocessed: for raw files that have not yet been processed in any way
autotagged: for rich files which have been autotagged but not reviewed or hand-corrected

Import options

--strip_suffix <suff>
Remove this suffix from the file name when determining the basename for the file in the workspace. By default, the original file basename is used.
--encoding <encoding>
For raw documents, input encoding. Default is ASCII. All imported raw documents will be converted to utf-8.
--file_type <type>
The file type of the document, which must be the name of one of the available readers. The default is raw for 'raw, unprocessed' and mat-json for all other folders.
--help
Prints the import help message and exits.

The reader referenced in the --file_type option may introduce additional options, which are described here.
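The encoding and file-type defaults can be restated as a short Python sketch. The function names are hypothetical; the sketch only mirrors the documented rules: raw documents are decoded using the --encoding value (ASCII by default) and re-encoded as UTF-8, and --file_type defaults to raw for "raw, unprocessed" and to mat-json for any other folder. It assumes encoding names are Python codec names such as "latin1".

```python
def default_file_type(folder: str) -> str:
    """Documented default for --file_type, by destination folder."""
    return "raw" if folder == "raw, unprocessed" else "mat-json"


def to_utf8(raw_bytes: bytes, encoding: str = "ascii") -> bytes:
    """Decode a raw document using the --encoding value, re-encode as UTF-8.

    Raises UnicodeDecodeError if the bytes are not valid in `encoding`,
    e.g. Latin-1 text read with the default ASCII encoding.
    """
    return raw_bytes.decode(encoding).encode("utf-8")


latin1_bytes = "caf\u00e9".encode("latin1")   # b'caf\xe9'
print(to_utf8(latin1_bytes, "latin1"))         # b'caf\xc3\xa9'
print(default_file_type("in process"))         # mat-json
```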

Example 2

Let's say the directory /home/user/myrandomfiles contains the files first.txt.json and second.txt.json, and these are both rich annotated files which you've partially hand-annotated. Then:

Unix:

% $MAT_PKG_HOME/bin/MATWorkspaceEngine /home/user/myworkspace import --encoding 'latin1' \
--strip_suffix '.txt.json' "in process" /home/user/myrandomfiles/*

Windows native:

> for %f in ("c:\home\user\myrandomfiles\*") do call %MAT_PKG_HOME%\bin\MATWorkspaceEngine.cmd \
c:\home\user\myworkspace import --encoding "latin1" \
--strip_suffix ".txt.json" "in process" %f

will import these two files, and name them "first" and "second".
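The names "first" and "second" follow from --strip_suffix: the suffix is removed from each file's basename, and without the option the original basename is kept. A minimal Python sketch of that rule; the function name is hypothetical, and it assumes that a file whose name lacks the suffix simply keeps its full basename.

```python
import os
from typing import Optional


def workspace_basename(path: str, strip_suffix: Optional[str] = None) -> str:
    """Hypothetical: derive the workspace basename for an imported file."""
    name = os.path.basename(path)
    # Assumption: names that do not end with the suffix are left unchanged.
    if strip_suffix and name.endswith(strip_suffix):
        return name[: -len(strip_suffix)]
    return name


for p in ["/home/user/myrandomfiles/first.txt.json",
          "/home/user/myrandomfiles/second.txt.json"]:
    print(workspace_basename(p, ".txt.json"))
```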

Example 3

Let's say the directory /home/user/myrandomfiles contains the files first.xml and second.xml, and these are both rich annotated files which you've partially hand-annotated and saved in XML inline format. Then:

Unix:

% $MAT_PKG_HOME/bin/MATWorkspaceEngine /home/user/myworkspace import --encoding 'latin1' \
--strip_suffix '.xml' --file_type xml-inline "in process" /home/user/myrandomfiles/*

Windows native:

> for %f in ("c:\home\user\myrandomfiles\*") do call %MAT_PKG_HOME%\bin\MATWorkspaceEngine.cmd \
c:\home\user\myworkspace import --encoding "latin1" \
--strip_suffix ".xml" --file_type xml-inline "in process" %f

will import these two files, and name them "first" and "second".

List

Usage: MATWorkspaceEngine [options] <dir> list ( <folder> ...)

<folder>: (optional) the name of a folder to list.
If no folders are named, all folders will be listed.

Available folders are:

raw, processed: for raw files which have been partially or completely processed
completed: for rich files whose annotations are correct and complete
in process: for rich files being hand annotated
raw, unprocessed: for raw files that have not yet been processed in any way
autotagged: for rich files which have been autotagged but not reviewed or hand-corrected

Your application may make other folders available.

Example 4

Unix:

% $MAT_PKG_HOME/bin/MATWorkspaceEngine /home/user/myworkspace list "raw, processed"

Windows native:

> %MAT_PKG_HOME%\bin\MATWorkspaceEngine.cmd c:\home\user\myworkspace list "raw, processed"

Remove

Usage: MATWorkspaceEngine [options] <dir> remove <basename> ...

<basename>...: the basename(s) to be removed from the workspace.

Available basenames are: ...

(In the actual help output, the basenames currently in the workspace are listed here.)

Example 5

Unix:

% $MAT_PKG_HOME/bin/MATWorkspaceEngine /home/user/myworkspace remove first

Windows native:

> %MAT_PKG_HOME%\bin\MATWorkspaceEngine.cmd c:\home\user\myworkspace remove first

Folder operations

Usage: MATWorkspaceEngine [options] <dir> <operation> [operation_options] <folder> [ <basename> ... ]

<folder>: The name of the folder to operate on.
<basename>: (optional) The basename or basenames to restrict the operation to.

The following folders support this operation: ...
This is the syntax for all folder operations. The possible operations are described in the workspace documentation. Your task may support operations in addition to those listed.

If no basenames are specified, the operation will be applied to all the documents in the specified folder.

Currently, the only operation which has any relevant options is modelbuild.

Modelbuild options

--do_autotag
If present, apply the autotag operation after the model is constructed.
--config_name <name>
If present, use a model settings configuration other than the default.
--autotag_basename <basename>
If --do_autotag is present, a single basename to autotag. This basename must be in "raw, unprocessed". This option can be repeated. If neither this option nor --autotag_basenames is present, all files in "raw, unprocessed" will be autotagged.
--autotag_basenames <basenames>
If --do_autotag is present, a space-separated sequence of basenames to autotag. These basenames must be in "raw, unprocessed". This option can be repeated. If neither this option nor --autotag_basename is present, all files in "raw, unprocessed" will be autotagged.
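The way these two options combine amounts to a union with a fallback: every basename named by either option is autotagged, and if neither option is given, everything in "raw, unprocessed" is. The Python sketch below restates that rule; the function and parameter names are hypothetical, it assumes duplicates are collapsed in first-seen order, and it does not check that the basenames actually exist in "raw, unprocessed".

```python
from typing import List, Sequence


def autotag_targets(
    autotag_basename: Sequence[str],
    autotag_basenames: Sequence[str],
    raw_unprocessed: Sequence[str],
) -> List[str]:
    """Hypothetical: which basenames modelbuild --do_autotag would autotag.

    autotag_basename:  values of each repeated --autotag_basename
    autotag_basenames: values of each repeated --autotag_basenames
                       (each a space-separated string)
    raw_unprocessed:   basenames currently in "raw, unprocessed"
    """
    requested: List[str] = list(autotag_basename)
    for group in autotag_basenames:
        requested.extend(group.split())
    if not requested:
        # Neither option given: autotag everything in "raw, unprocessed".
        return list(raw_unprocessed)
    # Collapse duplicates, keeping first-seen order (an assumption).
    return list(dict.fromkeys(requested))


folder = ["doc1", "doc2", "doc3"]
print(autotag_targets([], [], folder))                        # ['doc1', 'doc2', 'doc3']
print(autotag_targets(["doc3"], ["doc1 doc2", "doc1"], folder))  # ['doc3', 'doc1', 'doc2']
```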

Example 6

Let's say you've imported some raw documents into your workspace. Now, let's say you want to prepare one of them for hand tagging:

Unix:

% $MAT_PKG_HOME/bin/MATWorkspaceEngine /home/user/myworkspace \
tagprep "raw, unprocessed" file1

Windows native:

> %MAT_PKG_HOME%\bin\MATWorkspaceEngine.cmd c:\home\user\myworkspace \
tagprep "raw, unprocessed" file1

This will move the file into the "in process" folder.

Example 7

Let's say that you've done your hand annotation on file1, and you want to mark it as completed:

Unix:

% $MAT_PKG_HOME/bin/MATWorkspaceEngine /home/user/myworkspace \
markcompleted "in process" file1

Windows native:

> %MAT_PKG_HOME%\bin\MATWorkspaceEngine.cmd c:\home\user\myworkspace \
markcompleted "in process" file1

Example 8

Let's say you want to build a model using all the documents in the completed folder, and you want to autotag whatever's left in "raw, unprocessed":

Unix:

% $MAT_PKG_HOME/bin/MATWorkspaceEngine /home/user/myworkspace \
modelbuild --do_autotag "completed"

Windows native:

> %MAT_PKG_HOME%\bin\MATWorkspaceEngine.cmd c:\home\user\myworkspace \
modelbuild --do_autotag "completed"

Note that the options for the operation appear on the command line immediately after the operation name (here, modelbuild), before the folder name.

Example 9

Let's say you want to build a model using all the documents in the completed folder, but you want to use an alternative model settings configuration defined in your task.xml file:

Unix:

% $MAT_PKG_HOME/bin/MATWorkspaceEngine /home/user/myworkspace \
modelbuild --config_name alt_modelbuild "completed"

Windows native:

> %MAT_PKG_HOME%\bin\MATWorkspaceEngine.cmd c:\home\user\myworkspace \
modelbuild --config_name alt_modelbuild "completed"

Operate

As described above, the "operate" operation is an umbrella invocation operation for folder operations. It is no longer needed, but retained for backward compatibility.

Usage: MATWorkspaceEngine [options] <dir> operate [operation_options] <folder> <operation> [ <basename> ... ]

<folder>: The name of the folder to operate on.
<operation>: The operation to perform.
<basename>: (optional) The basename or basenames to restrict the operation to.

Folder 'completed' supports these operations: modelbuild markincomplete
Folder 'in process' supports these operations: markcompleted
Folder 'raw, unprocessed' supports these operations: autotag tagprep
Folder 'autotagged' supports these operations: handcorrect
Your application may support other operations.
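The folder/operation list above lends itself to a simple lookup, for example to check an operation before invoking the engine from a script. A Python sketch, with the mapping copied from the help output above; the names are hypothetical, folders with no listed operations (such as "raw, processed") are omitted, and since your application may add folders and operations, the mapping would need to be extended to match.

```python
from typing import Dict, FrozenSet

# Folder -> operations, as listed in the help output above.
FOLDER_OPERATIONS: Dict[str, FrozenSet[str]] = {
    "completed": frozenset({"modelbuild", "markincomplete"}),
    "in process": frozenset({"markcompleted"}),
    "raw, unprocessed": frozenset({"autotag", "tagprep"}),
    "autotagged": frozenset({"handcorrect"}),
}


def check_operation(folder: str, operation: str) -> None:
    """Raise ValueError if `operation` is not listed for `folder`."""
    supported = FOLDER_OPERATIONS.get(folder)
    if supported is None:
        raise ValueError(f"no operations listed for folder {folder!r}")
    if operation not in supported:
        raise ValueError(
            f"folder {folder!r} supports {sorted(supported)}, not {operation!r}"
        )


check_operation("raw, unprocessed", "tagprep")  # passes silently
```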