# Scoring Engine

## Description

The scoring engine compares two tagged files, or two directories of tagged files. Typically, one input is the hypothesis (an automatically tagged file) and the other is the reference (a gold-standard tagged file). But this tool can be used to compare any two inputs.

There are three spreadsheets which can be produced: tag-level scores, token-level scores, and details. By default, only the tag-level scores are produced.

### Tag-level scores

The tag-level score table has the following columns:

| Column | Description |
| --- | --- |
| tag | The label being scored in this row. The final row is a cumulative score, with the label "". If the --task option is specified (see below), the task.xml file may specify an "alias" for a tag plus some attribute subset (e.g., for named entity, an ENAMEX tag with attribute "type" = "PERSON", with an alias of "PERSON"). |
| test docs | The number of test (hypothesis) documents. This value is the same for all rows. |
| test toks | The number of tokens in the test documents. This value is the same for all rows. |
| match | The number of span annotations bearing this tag which occur with the same label and the same span extent in the hypothesis document and its corresponding reference document. |
| refclash | The number of span annotations bearing this tag in the reference document which overlap with a tag in the corresponding hypothesis document but do not match the tag, the span extent, or both. A count in this column may be mirrored by a corresponding count, from the point of view of the hypothesis document, in the hypclash column. |
| missing | The number of span annotations bearing this tag in the reference document which do not overlap with any tagged span in the corresponding hypothesis document. |
| refonly | refclash + missing |
| reftotal | refonly + match |
| hypclash | The number of span annotations bearing this tag in the hypothesis document which overlap with a tag in the corresponding reference document but do not match the tag, the span extent, or both. A count in this column may be mirrored by a corresponding count, from the point of view of the reference document, in the refclash column. |
| spurious | The number of span annotations bearing this tag in the hypothesis document which do not overlap with any tagged span in the corresponding reference document. |
| hyponly | hypclash + spurious |
| hyptotal | hyponly + match |
| precision | match / hyptotal |
| recall | match / reftotal |
| fmeasure | 2 * ((precision * recall) / (precision + recall)) |
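The derived columns above can be illustrated with a short sketch. The function and its count arguments simply mirror the column names; the numeric values in the example call are invented:

```python
# Sketch of how the derived tag-level columns are computed from the raw
# counts. Argument names mirror the spreadsheet columns; values are made up.
def tag_level_scores(match, refclash, missing, hypclash, spurious):
    refonly = refclash + missing
    reftotal = refonly + match
    hyponly = hypclash + spurious
    hyptotal = hyponly + match
    precision = match / hyptotal if hyptotal else 0.0
    recall = match / reftotal if reftotal else 0.0
    fmeasure = (2 * precision * recall / (precision + recall)
                if precision + recall else 0.0)
    return precision, recall, fmeasure

p, r, f = tag_level_scores(match=80, refclash=5, missing=15,
                           hypclash=5, spurious=10)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.842 0.8 0.821
```

Note that a clash counts against both precision (via hyptotal) and recall (via reftotal), while missing spans hurt only recall and spurious spans only precision.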

The user can also request confidence information. To compute it, the scorer produces 1000 alternative score sets. Each score set is created by making M random selections, with replacement, from the core set of M file scores. The scorer then computes the overall metrics for each alternative score set, and computes the mean and variance over the 1000 instances of each of the precision, recall, and fmeasure metrics. This "sampling with replacement" yields a more stable mean and variance. The procedure adds three columns (mean, variance, and standard deviation) to the spreadsheet for each of these metrics; the columns appear immediately to the right of the column for the corresponding metric.
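The resampling procedure can be sketched as follows. This is a hedged illustration, not the scorer's implementation: the per-file record fields mirror the spreadsheet columns, and pooling the counts by summation before computing metrics is an assumption about how the scorer aggregates files:

```python
import random
import statistics

# Sketch of the "sampling with replacement" (bootstrap) confidence procedure.
# Each alternative score set draws M file records, with replacement, from the
# M core file records, then computes the overall metric from the pooled counts.
def bootstrap_fmeasure(per_file_counts, iterations=1000, seed=0):
    rng = random.Random(seed)
    m = len(per_file_counts)
    samples = []
    for _ in range(iterations):
        picked = [rng.choice(per_file_counts) for _ in range(m)]
        match = sum(c["match"] for c in picked)
        hyptotal = sum(c["hyptotal"] for c in picked)
        reftotal = sum(c["reftotal"] for c in picked)
        p = match / hyptotal if hyptotal else 0.0
        r = match / reftotal if reftotal else 0.0
        samples.append(2 * p * r / (p + r) if p + r else 0.0)
    return statistics.mean(samples), statistics.variance(samples)

# Invented per-file counts for illustration.
files = [{"match": 8, "hyptotal": 10, "reftotal": 9},
         {"match": 5, "hyptotal": 7, "reftotal": 8}]
mean, var = bootstrap_fmeasure(files)
```

The standard deviation column in the spreadsheet is simply the square root of the variance computed here.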

### Token-level scores

The token-level score table has the same columns as the tag-level table, with some reinterpretations and additions. For all the columns in the tag-level score table which count span annotations, the corresponding columns in the token-level score table count the tokens in those annotations. Note that, as a result, refclash and hypclash can only reflect tag clashes, never extent clashes, because the tokens and their extents in the pair of documents are identical. The additional columns are:

| Column | Description |
| --- | --- |
| tag_sensitive_accuracy | (test toks - refclash - missing - spurious) / test toks (essentially, the fraction of tokens in the reference which were tagged correctly, including those which were not tagged at all) |
| tag_sensitive_error_rate | 1 - tag_sensitive_accuracy |
| tag_blind_accuracy | (test toks - missing - spurious) / test toks (essentially, the fraction of tokens in the reference which were properly assigned a tag - any tag) |
| tag_blind_error_rate | 1 - tag_blind_accuracy |

The user can also request confidence information. The confidence information is computed in the same way as it is for tag-level scores. Confidence information is reported for all four of these additional columns.
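The four token-level accuracy formulas above can be sketched directly. The function name and the example counts are invented; the arithmetic follows the formulas in the table:

```python
# Sketch of the four additional token-level columns. All arguments are
# token counts; the example values below are made up.
def token_accuracies(test_toks, refclash, missing, spurious):
    tag_sensitive = (test_toks - refclash - missing - spurious) / test_toks
    tag_blind = (test_toks - missing - spurious) / test_toks
    return {"tag_sensitive_accuracy": tag_sensitive,
            "tag_sensitive_error_rate": 1 - tag_sensitive,
            "tag_blind_accuracy": tag_blind,
            "tag_blind_error_rate": 1 - tag_blind}

acc = token_accuracies(test_toks=1000, refclash=20, missing=30, spurious=10)
print(acc["tag_sensitive_accuracy"], acc["tag_blind_accuracy"])  # 0.94 0.96
```

Tag-blind accuracy is always at least as high as tag-sensitive accuracy, since it forgives tag clashes on correctly located tokens.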

### Details

The detail spreadsheet is intended to provide a span-by-span assessment of the scoring inputs.

| Column | Description |
| --- | --- |
| file | the name of the hypothesis from which the entry is drawn |
| type | one of missing, spurious, spanclash, tagclash, bothclash, match (the meaning of these values should be clear from the preceding discussion) |
| reflabel | the label on the span in the reference document |
| refstart | the start index, in characters, of the span in the reference document |
| refend | the end index, in characters, of the span in the reference document |
| hyplabel | the label on the span in the hypothesis document |
| hypstart | the start index, in characters, of the span in the hypothesis document |
| hypend | the end index, in characters, of the span in the hypothesis document |
| refcontent | the text between the start and end indices in the reference document |
| hypcontent | the text between the start and end indices in the hypothesis document |

## Usage

### Example 2

Let's say that instead of printing a table to standard output, you want to produce CSV output with embedded formulas, and you want all three spreadsheets.

Unix:

```
% $MAT_PKG_HOME/bin/MATScore --file /path/to/hyp --ref_file /path/to/ref \
  --csv_output_dir $PWD --details --by_token
```

Windows native:

```
> %MAT_PKG_HOME%\bin\MATScore.cmd --file c:\path\to\hyp --ref_file c:\path\to\ref --csv_output_dir %CD% --details --by_token
```

This invocation will not produce any table on standard output, but will leave three files in the current directory: bytag.csv, bytoken.csv, and details.csv.

### Example 3

Let's say you have two directories full of files. /path/to/hyp contains files of the form file<n>.txt.json, and /path/to/ref contains files of the form file<n>.json. You want to compare the corresponding files to each other, and you want tag and token scoring, but not details, and you intend to view the spreadsheet in OpenOffice.

Unix:

```
% $MAT_PKG_HOME/bin/MATScore --dir /path/to/hyp --ref_dir /path/to/ref \
  --ref_fsuff_off '.txt.json' --ref_fsuff_on '.json' \
  --csv_output_dir $PWD --oo_separator --by_token
```

Windows native:

```
> %MAT_PKG_HOME%\bin\MATScore.cmd --dir c:\path\to\hyp --ref_dir c:\path\to\ref --ref_fsuff_off ".txt.json" --ref_fsuff_on ".json" --csv_output_dir %CD% --oo_separator --by_token
```

For each file in /path/to/hyp, this invocation will prepare a candidate filename to look for in /path/to/ref by removing the .txt.json suffix and adding the .json suffix. The current directory will contain bytag.csv and bytoken.csv.
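The suffix-based pairing can be sketched in a few lines. The function name is invented; it mimics what the --ref_fsuff_off and --ref_fsuff_on options imply:

```python
# Sketch of the filename pairing implied by --ref_fsuff_off / --ref_fsuff_on:
# remove one suffix from the hypothesis filename, then append the other to
# form the candidate reference filename.
def reference_candidate(hyp_name, fsuff_off=".txt.json", fsuff_on=".json"):
    if hyp_name.endswith(fsuff_off):
        hyp_name = hyp_name[:-len(fsuff_off)]
    return hyp_name + fsuff_on

print(reference_candidate("file3.txt.json"))  # file3.json
```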

### Example 4

Let's say that you're in the same situation as example 3, but you want confidence information included in the output spreadsheets:

Unix:

```
% $MAT_PKG_HOME/bin/MATScore --dir /path/to/hyp --ref_dir /path/to/ref \
  --ref_fsuff_off '.txt.json' --ref_fsuff_on '.json' \
  --csv_output_dir $PWD --oo_separator --by_token --compute_confidence_data
```

Windows native:

```
> %MAT_PKG_HOME%\bin\MATScore.cmd --dir c:\path\to\hyp --ref_dir c:\path\to\ref --ref_fsuff_off ".txt.json" --ref_fsuff_on ".json" --csv_output_dir %CD% --oo_separator --by_token --compute_confidence_data
```

### Example 5

Let's say that you're in the same situation as example 3, but your documents contain many tags and you're only interested in scoring the tags listed in the "Named Entity" task. Furthermore, you're going to import the data into a tool other than Excel, so you want the values calculated for you rather than having embedded equations:

Unix:

```
% $MAT_PKG_HOME/bin/MATScore --dir /path/to/hyp --ref_dir /path/to/ref \
  --ref_fsuff_off '.txt.json' --ref_fsuff_on '.json' \
  --csv_output_dir $PWD --no_csv_formulas --by_token --task "Named Entity"
```

Windows native:

```
> %MAT_PKG_HOME%\bin\MATScore.cmd --dir c:\path\to\hyp --ref_dir c:\path\to\ref --ref_fsuff_off ".txt.json" --ref_fsuff_on ".json" --csv_output_dir %CD% --no_csv_formulas --by_token --task "Named Entity"
```

### Example 6

Let's say you're in the same situation as example 3, but your reference documents are XML inline documents, and are of the form file<n>.xml. Do this:

Unix:

```
% $MAT_PKG_HOME/bin/MATScore --dir /path/to/hyp --ref_dir /path/to/ref \
  --ref_fsuff_off '.txt.json' --ref_fsuff_on '.xml' \
  --csv_output_dir $PWD --oo_separator --by_token --ref_file_type xml-inline
```

Windows native:

```
> %MAT_PKG_HOME%\bin\MATScore.cmd --dir c:\path\to\hyp --ref_dir c:\path\to\ref --ref_fsuff_off ".txt.json" --ref_fsuff_on ".xml" --csv_output_dir %CD% --oo_separator --by_token --ref_file_type xml-inline
```

Note that, in addition to the new --ref_file_type option, the value of --ref_fsuff_on has changed.