The annotation reporter produces concordance-style reports on the
content annotations in a given set of documents, in either CSV or text
form. The CSV file contains the following columns:
file | the name of the document from which the entry is drawn
start | the start index, in characters, of the span in the document
end | the end index, in characters, of the span in the document
left context | the context to the left of the start index
text | the text in between the start and end indices
label | the label on the span in the document. If the annotation contains attributes and values, these will be represented in the label
right context | the context to the right of the end index
It's also possible to omit the left and right contexts, if you
prefer. The text file contains the same columns, except that file,
start, and end are collapsed into a single location column. It's also
possible to interpolate document-level statistics such as file length
and number of annotations per label into these reports.
Because the CSV files contain language data, please consult this special note on how to view them.
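To make the column definitions concrete, here is a minimal Python sketch of how one concordance row could be derived from a span. It is not the reporter's implementation: the function name `concordance_row` and the dictionary keys are assumptions chosen to mirror the column names, and the real tool may trim or normalize the context differently. The default window of 32 characters is the documented default of --concordance_window.

```python
def concordance_row(filename, text, start, end, label, window=32):
    """Build one concordance-style row for the span text[start:end].

    Hypothetical helper for illustration only; the context is simply the
    `window` characters on each side, clipped at the document boundaries.
    """
    left_start = max(0, start - window)
    right_end = min(len(text), end + window)
    return {
        "file": filename,
        "start": start,
        "end": end,
        "left context": text[left_start:start],
        "text": text[start:end],
        "label": label,
        "right context": text[end:right_end],
    }


# Invented sample document and span, for illustration.
doc = "Maria Lopez flew to Boston on Monday."
row = concordance_row("example.json", doc, 20, 26, "LOCATION", window=10)
print(repr(row["left context"]), repr(row["text"]), repr(row["right context"]))
# prints 'z flew to ' 'Boston' ' on Monday'
```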
Unix:
% $MAT_PKG_HOME/bin/MATReport
Windows native:
> %MAT_PKG_HOME%\bin\MATReport.cmd
Usage: MATReport [options]
--task <task> | Name of the task to use. Obligatory if --content_annotations is not used and more than one task is registered.
--content_annotations ann,ann,ann... | Optional. If no task is specified, the reporter will try to use the metadata in the document to determine which annotations are content annotations. If this metadata is absent (e.g., if the 'metadata' slot in a mat-json document is unpopulated), the reporter requires additional, external information. Use this flag to provide a comma-separated sequence of annotation labels which should be treated as content annotations. Ignored if --task is present.
--input_files <file> | A glob-style pattern describing full pathnames to be reported on. May be specified with --input_dir. Can be repeated.
--input_dir <dir> | A directory, all of whose files will be reported on. Can be repeated. May be specified with --input_files.
--file_type <t> | The file type of the document(s). One of the readers. Default is mat-json.
--encoding <e> | The encoding of the input. The default is the appropriate default for the file type.
--output_dir <dir> | The output directory for the reports. Will be created if it doesn't exist. Required.
--csv | Generate a CSV file in the output directory, with concordance-style data: file, location, content, left and right context, annotation label. At least one of this option and --txt must be provided. The CSV file will be in UTF-8 encoding. See this special note on viewing CSV files containing natural language text.
--txt | Generate a text file in the output directory, with concordance-style data, sorted first by annotation label and then by content. At least one of this option and --csv must be provided. The output file will be in UTF-8 encoding.
--concordance_window <i> | Use the specified value as the window size on each side of the concordance. Default is 32.
--omit_concordance_context | Omit the left and right concordance context from the output.
--file_csv | Generate a separate CSV file consisting of file-level statistics such as file size in characters and number of annotations of each type.
--interpolate_file_info | Instead of a separate CSV file for the file-level statistics, interpolate them into the concordance.
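As a rough illustration of the file-level statistics mentioned for --file_csv (file size in characters and number of annotations of each type), the following Python sketch computes both for a single document. The function name `file_statistics`, the `(start, end, label)` tuple form, and the result keys are assumptions made for this example; they do not describe the reporter's internal data structures or its output layout.

```python
from collections import Counter


def file_statistics(text, annotations):
    """Return character length and per-label counts for one document.

    `annotations` is an iterable of (start, end, label) tuples, a
    hypothetical in-memory form chosen for this sketch.
    """
    counts = Counter(label for _start, _end, label in annotations)
    return {"length": len(text), "counts": dict(counts)}


# Invented sample document and spans, for illustration.
doc = "Maria Lopez flew to Boston on Monday."
spans = [(0, 11, "PERSON"), (20, 26, "LOCATION"), (30, 36, "DATE")]
print(file_statistics(doc, spans))
```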
The readers referenced in the --file_type
option may introduce additional options, which
are described here. These
additional options must follow the --file_type
option.
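If you drive the reporter from a script, the constraints above (--output_dir is required, at least one of --csv and --txt must be given, and reader options must follow --file_type) can be checked while assembling the argument list. The Python sketch below is one possible way to do that. The helper `build_report_args` and its parameter names are hypothetical; only the flags themselves come from the option table, and the launcher path assumes the Unix layout shown in the usage line.

```python
import os


def build_report_args(input_files, output_dir, csv=False, txt=False,
                      concordance_window=None, omit_context=False,
                      file_type=None, reader_options=()):
    """Assemble a MATReport argument list (hypothetical helper)."""
    if not (csv or txt):
        raise ValueError("at least one of --csv and --txt must be provided")
    # Unix launcher path; MAT_PKG_HOME is read from the environment.
    launcher = os.path.join(os.environ.get("MAT_PKG_HOME", ""), "bin", "MATReport")
    args = [launcher]
    for pattern in input_files:
        # --input_files can be repeated, once per pattern.
        args += ["--input_files", pattern]
    args += ["--output_dir", output_dir]
    if csv:
        args.append("--csv")
    if txt:
        args.append("--txt")
    if concordance_window is not None:
        args += ["--concordance_window", str(concordance_window)]
    if omit_context:
        args.append("--omit_concordance_context")
    if file_type is not None:
        # Reader-specific options must follow --file_type, so they go last.
        args += ["--file_type", file_type]
        args += list(reader_options)
    return args


print(build_report_args(["/path/to/files/*.json"], "/path/to/output",
                        csv=True, txt=True, concordance_window=10))
```

The resulting list could then be passed to `subprocess.run`, which hands each argument to the program without shell interpretation, so the glob pattern reaches the reporter unexpanded.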
Let's say you have a file, /path/to/file, whose annotations
you want to view in a spreadsheet. You want the results to be written
to /path/to/output.
Unix:
% $MAT_PKG_HOME/bin/MATReport --input_files /path/to/file --csv --output_dir /path/to/output
Windows native:
> %MAT_PKG_HOME%\bin\MATReport.cmd --input_files c:\path\to\file --csv --output_dir c:\path\to\output
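If you'd rather inspect the CSV report programmatically than in a spreadsheet, a short Python sketch like the following groups the text of each annotation by its label. It assumes the CSV has a header row whose names match the columns listed in this document (label, text); that is an assumption, not something confirmed here, and the sample rows are invented for illustration.

```python
import csv
import io
from collections import defaultdict


def texts_by_label(handle):
    """Group the 'text' column by 'label' from an open CSV stream.

    Assumes a header row with 'label' and 'text' columns (an assumption
    based on the column list, not verified against real output).
    """
    grouped = defaultdict(list)
    for row in csv.DictReader(handle):
        grouped[row["label"]].append(row["text"])
    return dict(grouped)


# Invented sample rows standing in for a real report.
sample = io.StringIO(
    "file,start,end,left context,text,label,right context\n"
    "file1.json,20,26,flew to ,Boston,LOCATION, on Monday\n"
    "file1.json,0,11,,Maria Lopez,PERSON, flew to\n"
)
print(texts_by_label(sample))
```

For a real report, open the file with `open(path, newline="", encoding="utf-8")`, since the CSV output is in UTF-8 encoding, and pass the resulting handle to the function.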
Let's say that you only want textual output, and you don't want the
concordance columns:
Unix:
% $MAT_PKG_HOME/bin/MATReport --input_files /path/to/file --txt \
--output_dir /path/to/output --omit_concordance_context
Windows native:
> %MAT_PKG_HOME%\bin\MATReport.cmd --input_files c:\path\to\file --txt ^
--output_dir c:\path\to\output --omit_concordance_context
Let's say you have a directory full of files. /path/to/files
contains files of the form file<n>.json. You want to view them
both in CSV and in text, and you want a smaller concordance window of
10 characters.
Unix:
% $MAT_PKG_HOME/bin/MATReport --input_files '/path/to/files/*.json' \
--csv --txt --output_dir /path/to/output --concordance_window 10
Windows native:
> %MAT_PKG_HOME%\bin\MATReport.cmd --input_files c:\path\to\files\*.json ^
--csv --txt --output_dir c:\path\to\output --concordance_window 10
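To get a feel for which pathnames a glob-style pattern such as /path/to/files/*.json selects, here is a small Python sketch using the standard `fnmatch` module. Whether the reporter uses the same matching rules internally is an assumption, and the candidate filenames are invented for illustration.

```python
from fnmatch import fnmatchcase

pattern = "/path/to/files/*.json"

# Invented candidate pathnames.
candidates = [
    "/path/to/files/file1.json",
    "/path/to/files/file2.json",
    "/path/to/files/notes.txt",
]

# fnmatchcase compares the whole string against the pattern, case-sensitively.
matched = [name for name in candidates if fnmatchcase(name, pattern)]
print(matched)
# prints ['/path/to/files/file1.json', '/path/to/files/file2.json']
```

On Unix, the pattern is quoted on the command line so that the shell passes it through unexpanded and the reporter receives the pattern itself.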