The MAT transducer supports a conversion language that covers a
wide range of operations on annotations: changing labels, demoting
and promoting labels, extracting spans, changing attribute values,
and so on. With this language, you can transform annotations in
many ways without writing any code.
The general format of the XML is:
<instructions>
  <...selector ...>
    <...operation.../>
    <...selector ...>
      ...
    </...selector...>
  </...selector...>
  <...operation.../>
  ...
</instructions>
The selectors or operations are processed in order, and each is
applied to the current state of the annotations. You can have as
many selectors or operations as you want. For instance, let's say
you want to change the name of the PER annotation to PERSON, and
the ORG annotation to ORGANIZATION, and you want to rename the
nomtype attribute on all of them to NOMTYPE:
<instructions>
  <labels source="PER">
    <map target="PERSON"/>
  </labels>
  <labels source="ORG">
    <map target="ORGANIZATION"/>
  </labels>
  <labels source_re="ORGANIZATION|PERSON">
    <map_attr source="nomtype" target="NOMTYPE"/>
  </labels>
</instructions>
There are three selectors: <labels>, which establishes an
annotation scope; <attrs>, which occurs only within the
<labels> scope and establishes an attribute scope; and
<values>, which occurs only within the <attrs> scope
and establishes an attribute value scope. Within the annotation
scope, the default object being modified is the annotation; within
the attribute scope, the default object being modified is the
attribute label; and within the value scope, the default object
being modified is the attribute value.
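The nesting of the three scopes can be sketched as follows. This is only an illustration built from elements documented on this page; the ENAMEX label and TYPE attribute are borrowed from later examples, and the XML comments are there for the reader (it is an assumption that the instruction parser ignores them, as standard XML parsers do).

```xml
<instructions>
  <!-- annotation scope: operators here act on ENAMEX annotations -->
  <labels source="ENAMEX">
    <!-- attribute scope: operators here act on the TYPE attribute -->
    <attrs source="TYPE">
      <!-- value scope: operators here act on the value "PER" -->
      <values source="PER">
        <map target_value="PERSON"/>
      </values>
    </attrs>
  </labels>
</instructions>
```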
Here's an index of all the selectors and their operators:
<labels>
  <discard>
  <discard_failed>
  <make_spanless>
  <of_attr>
  <with_attrs>
  <demote>
  <apply>
  <discard_if_null>
  <touch>
  <untouch>
  <map>
  <map_attr>
  <promote_attr>
  <discard_attrs>
  <split_attr>
  <join_attrs>
  <set_attr>
  <attrs>
    <promote>
    <discard>
    <split>
    <discard_annot_if_null>
    <discard_annot>
    <map>
    <values>
      <promote>
      <discard>
      <map>
Within the <instructions> element, there are only two elements: the <discard_untouched> operator and the <labels> selector.
The <labels> selector accepts the following attributes:
Attribute | Value | Obligatory? | Description |
---|---|---|---|
source_re | a Python regular expression | no | A regular expression which matches a full true label. Ignored if source is specified |
source | a string | no | The name of a true label |
excluding_re | a Python regular expression | no | A regular expression which matches a full true label. Ignored if excluding is specified |
excluding | a string | no | The name of a true label |
This selector identifies particular annotations for further
processing, and establishes an annotation scope. The annotation's
true label must match source or source_re (if specified) and not
match excluding or excluding_re (if specified). With no attributes
specified, this selector matches all annotations.
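As a rough illustration of how inclusion and exclusion combine, the sketch below discards every annotation whose label is neither PERSON nor ORGANIZATION. It relies only on the excluding_re attribute described above and the annotation-scope <discard> operator; the label names are borrowed from the earlier renaming example.

```xml
<instructions>
  <!-- No source or source_re is given, so the selector starts from all
       annotations and then excludes PERSON and ORGANIZATION. -->
  <labels excluding_re="PERSON|ORGANIZATION">
    <discard/>
  </labels>
</instructions>
```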
The <discard_untouched> operator deletes all annotations which have not been "touched" (i.e., modified by another operator).
Within the annotation scope, there is one selector,
<attrs>; two operators which further restrict the
annotations processed, <with_attrs> and <of_attr>; and
numerous operations.
The <attrs> selector accepts the following attributes:
Attribute | Value | Obligatory? | Description |
---|---|---|---|
source_re | a Python regular expression | no | A regular expression which matches the full attribute name. Ignored if source is specified |
source | a string | no | The name of an attribute |
excluding_re | a Python regular expression | no | A regular expression which matches the full attribute name. Ignored if excluding is specified |
excluding | a string | no | The name of an attribute |
This selector identifies particular attributes for further
processing, and establishes an attribute scope. The attribute name
must match source or source_re (if specified) and not match
excluding or excluding_re (if specified). With no attributes
specified, this selector matches all attributes of the annotation.
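For instance, the following sketch would strip every attribute except nomtype from PERSON annotations. It combines the excluding attribute described above with the attribute-scope <discard> operator; the PERSON label and nomtype attribute are taken from examples elsewhere on this page.

```xml
<labels source="PERSON">
  <!-- Select every attribute of PERSON except nomtype... -->
  <attrs excluding="nomtype">
    <!-- ...and delete the selected attributes. -->
    <discard/>
  </attrs>
</labels>
```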
The <of_attr> operator accepts the following attributes:
Attribute | Value | Obligatory? | Description |
---|---|---|---|
attr_re | a Python regular expression | no | A regular expression which matches the full attribute name. Ignored if attr is specified. |
attr | a string | no | The name of an attribute |
label_re | a Python regular expression | no | A regular expression which matches a true label name. Ignored if label is specified. |
label | a string | no | A true label |
This operator further restricts the annotations specified in
<labels>. An annotation which satisfies this restriction
must be the value of the specified attr within an annotation of
the specified label. For example:
<labels source="PERSON">
  <of_attr attr_re="arg[12]" label="LOCATED"/>
  ...
</labels>
selects those PERSON annotations which are the value of either
the arg1 or arg2 attribute of a LOCATED annotation.
If repeated, any annotation which matches one of the
<of_attr> declarations satisfies the restriction.
The <with_attrs> operator accepts the following attributes:
Attribute | Value | Obligatory? | Description |
---|---|---|---|
<attr> | a string | no | Any attribute name; the <with_attrs> element supports arbitrary attribute-value pairs |
This operator further restricts the annotations specified in
<labels>. Each attribute-value pair is matched to the
annotation under consideration, and only those annotations which
match all pairs satisfy the restriction. If the attribute is of
type string, int, or float, the value specified should be the
expected value; if the attribute is of type boolean, the
recognized values are "yes" and "no"; and if the attribute is of
type annotation, the value is the label of the attribute
value. For example, if the LOCATED annotation has a boolean
attribute "verbal", a string attribute "tense", and an annotation
attribute "arg1", this selector and operator:
<labels source="LOCATED">
  <with_attrs verbal="yes" tense="PAST" arg1="PERSON"/>
  ...
</labels>
selects those LOCATED annotations whose verbal value is true,
whose tense value is "PAST", and whose arg1 value is an annotation
with the "PERSON" label.
If repeated, any annotation which matches one of the
<with_attrs> declarations satisfies the restriction.
The <discard> operator discards the selected annotations.
The <make_spanless> operator accepts the following attributes:
Attribute | Value | Obligatory? | Description |
---|---|---|---|
demoted_label | a string | no | A label name |
demoted_attr | a string | no | An attribute name |
This operator applies to all selected annotations which are
spanned, and makes them spanless. You can use the optional
attributes to specify an attribute and label to demote the span
to; i.e., the following specification:
<labels source="LOCATED">
  <make_spanless demoted_label="span" demoted_attr="extent"/>
</labels>
converts all LOCATED annotations to spanless annotations,
introduces a new spanned annotation type "span", creates an
instance of it with the span of each original LOCATED annotation,
and stores that instance in the "extent" attribute of the LOCATED
annotation.
The <demote> operator accepts the following attributes:
Attribute | Value | Obligatory? | Description |
---|---|---|---|
target_attr | a string | yes | An attribute name |
target_label | a string | yes | A label name |
This operator takes the label of the current annotation, and
makes it the value of the attribute specified in target_attr, and
makes the label of the annotation the label specified in
target_label. For instance:
<labels source="PERSON">
  <demote target_attr="TYPE" target_label="ENAMEX"/>
</labels>
converts PERSON annotations to ENAMEX annotations with the
TYPE=PERSON attribute-value pair.
The <apply> operator accepts the following attributes:
Attribute | Value | Obligatory? | Description |
---|---|---|---|
fn | a string | yes | A function name |
This operation applies an arbitrary Python function to the
selected annotations. This capability has not been tested, and
documenting it further is beyond the scope of this documentation.
The <touch> operator "touches" the selected annotations, and
blocks them from being discarded by the global
<discard_untouched> operator.
The <untouch> operator "untouches" the selected annotations, and
makes them eligible to be discarded by the global
<discard_untouched> operator.
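A minimal sketch of how these operators might work together: keep PERSON and ORGANIZATION annotations unchanged and drop everything else. It assumes that <discard_untouched> should come after the selectors that do the touching, since instructions are processed in order.

```xml
<instructions>
  <!-- Mark PERSON and ORGANIZATION annotations as touched
       without otherwise changing them. -->
  <labels source_re="PERSON|ORGANIZATION">
    <touch/>
  </labels>
  <!-- Delete every annotation that is still untouched at this point. -->
  <discard_untouched/>
</instructions>
```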
The <discard_if_null> operator accepts the following attributes:
Attribute | Value | Obligatory? | Description |
---|---|---|---|
attrs | a comma-separated string | yes | A sequence of attribute names |
This operator discards the selected annotations if the named
attributes all have null values.
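For example, using the LOCATED annotation and its arg1 and arg2 attributes from the earlier examples, a sketch might look like this. The names are written without spaces after the comma; whether surrounding whitespace is tolerated is not stated here.

```xml
<labels source="LOCATED">
  <!-- Discard a LOCATED annotation only when both arg1 and arg2 are null. -->
  <discard_if_null attrs="arg1,arg2"/>
</labels>
```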
The <map> operator in the annotation scope accepts the following attributes:
Attribute | Value | Obligatory? | Description |
---|---|---|---|
target | a string | yes | A label name |
This operator changes the name of the selected annotations to the
name specified in the target.
The <map_attr> operator accepts the following attributes:
Attribute | Value | Obligatory? | Description |
---|---|---|---|
source | a string | yes | An attribute name |
target_aggregation | "singleton", "set", or "list" | no | An aggregation |
target | a string | no | An attribute name |
target_type | "int", "float", "string", or "boolean" | no | A type name, other than "annotation" |
This operator converts the attribute named in source. You can
convert the aggregation (from singleton to set, for instance), or
the type (from int to string, or string to int), or map the name
to the specified target, or some combination of these actions.
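A sketch of a combined conversion, renaming an attribute and changing its type in one step. The "age" attribute is hypothetical and used only for illustration; how values that cannot be converted are handled is not covered here.

```xml
<labels source="PERSON">
  <!-- Hypothetical attribute "age": rename it to AGE and
       convert its type to int in a single operation. -->
  <map_attr source="age" target="AGE" target_type="int"/>
</labels>
```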
The <promote_attr> operator accepts the following attributes:
Attribute | Value | Obligatory? | Description |
---|---|---|---|
source | a string | yes | The name of an attribute |
This operator changes the label of the selected annotations to
the value of the attribute specified by the source, and removes
the attribute. For example:
<labels source="ENAMEX">
  <promote_attr source="TYPE"/>
</labels>
relabels every ENAMEX annotation with the value of its TYPE
attribute and removes that attribute; those with TYPE=PER become
PER annotations, etc.
The <discard_attrs> operator accepts the following attributes:
Attribute | Value | Obligatory? | Description |
---|---|---|---|
attrs | a comma-separated string | yes | A sequence of attribute names |
This operator deletes the specified attributes for the selected
annotations.
The <split_attr> operator accepts the following attributes:
Attribute | Value | Obligatory? | Description |
---|---|---|---|
attr | a string | yes | An attribute name |
target_attrs | a comma-separated string | yes | A sequence of attribute names |
This operator takes the value of the attribute specified in attr
and splits it among the attributes specified in target_attrs. This
is useful if, e.g., the original attribute is a list aggregation,
and you need those values in separate attributes.
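As a hedged sketch, suppose LOCATED annotations carry a hypothetical list attribute "args" whose values should end up in arg1 and arg2. That the values are distributed in order over the target attributes is an assumption, not something stated above.

```xml
<labels source="LOCATED">
  <!-- Hypothetical list attribute "args": distribute its values
       over arg1 and arg2 (the ordering is an assumption). -->
  <split_attr attr="args" target_attrs="arg1,arg2"/>
</labels>
```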
The <join_attrs> operator accepts the following attributes:
Attribute | Value | Obligatory? | Description |
---|---|---|---|
value_aggregation | "list" or "set" | yes | An aggregation name |
source_attrs | a comma-separated string | yes | A sequence of attribute names |
attr | a string | yes | An attribute name |
This operator takes the values in the attributes specified in
source_attrs and unifies them in a single aggregation of the type
specified by the value_aggregation, and stores the value in the
attribute specified by attr. This is useful if, e.g., the original
values are spread among separate attributes, and you need those
values in a single list or set aggregation.
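The reverse of the splitting case can be sketched like this, collecting arg1 and arg2 into a hypothetical "args" attribute. Whether the source attributes are removed afterwards is not specified here, so the sketch makes no claim about it.

```xml
<labels source="LOCATED">
  <!-- Gather the values of arg1 and arg2 into one list-valued
       attribute; "args" is a hypothetical name. -->
  <join_attrs source_attrs="arg1,arg2" attr="args" value_aggregation="list"/>
</labels>
```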
The <set_attr> operator accepts the following attributes:
Attribute | Value | Obligatory? | Description |
---|---|---|---|
attr | a string | yes | An attribute name |
value | a string | yes | An attribute value |
value_aggregation | "list" or "set" | no | An aggregation name |
value_type | "int", "float", "boolean", or "string" | no | A type name |
This operation sets the attribute specified by attr to the value
specified by value, interpreted according to the value_aggregation
and value_type provided. The default is a singleton string.
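Two illustrative uses, one relying on the default interpretation and one with an explicit type. The "PRO" value and the "confidence" attribute are hypothetical, chosen only to show the shape of the element.

```xml
<labels source="PERSON">
  <!-- Default interpretation: a singleton string. -->
  <set_attr attr="NOMTYPE" value="PRO"/>
  <!-- Hypothetical attribute "confidence", stored as an int. -->
  <set_attr attr="confidence" value="1" value_type="int"/>
</labels>
```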
Within the attribute scope, there is one selector,
<values>; and numerous operators.
The <values> selector accepts the following attributes:
Attribute | Value | Obligatory? | Description |
---|---|---|---|
source_re | a Python regular expression | no | A regular expression matching the full attribute value. Ignored if source is specified. |
source | a string | no | An attribute value |
excluding_re | a Python regular expression | no | A regular expression matching the full attribute value. Ignored if excluding is specified |
excluding | a string | no | An attribute value |
This selector identifies particular values for further processing, and establishes a value scope. The value, when converted to a string, must match source or source_re (if specified) and not match excluding or excluding_re (if specified). Annotation values cannot be selected; boolean values are converted to "yes" or "no" before being compared. With no attributes specified, this selector matches all values of the selected attributes.
The <promote> operator in the attribute scope is the equivalent
of the <promote_attr> operator in the annotation scope. In other
words, the following two specifications are equivalent:
<labels source="ENAMEX">
  <promote_attr source="TYPE"/>
</labels>
<labels source="ENAMEX">
  <attrs source="TYPE">
    <promote/>
  </attrs>
</labels>
The <discard> operator in the attribute scope is the equivalent
of the <discard_attrs> operator in the annotation scope. In other
words, the following two specifications are equivalent:
<labels source="PERSON">
  <discard_attrs attrs="nomtype"/>
</labels>
<labels source="PERSON">
  <attrs source="nomtype">
    <discard/>
  </attrs>
</labels>
This operator is sometimes more convenient, since the selection
capabilities of the <attrs> selector are more flexible than
the attrs attribute of the <discard_attrs> operator.
The <split> operator accepts the following attributes:
Attribute | Value | Obligatory? | Description |
---|---|---|---|
target_attrs | a comma-separated string | yes | A sequence of attribute names |
This operator is essentially equivalent to the <split_attr>
operator in the annotation scope.
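By analogy with the equivalences shown for <promote> and <discard>, the attribute-scope form might be written as below. The list attribute "args" is hypothetical, and the assumption is that the attribute chosen by <attrs> plays the role of the attr argument of <split_attr>.

```xml
<labels source="LOCATED">
  <!-- The selected attribute stands in for attr="args". -->
  <attrs source="args">
    <split target_attrs="arg1,arg2"/>
  </attrs>
</labels>
```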
The <discard_annot_if_null> operator discards the annotation that
bears the selected attributes if the value of all the selected
attributes is null.
The <discard_annot> operator discards the annotation that bears
the selected attributes.
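A sketch of the conditional form, using a regular expression to pick the attributes. Based on the description above, this should discard a LOCATED annotation only when both arg1 and arg2 are null; the attribute names come from the earlier LOCATED examples.

```xml
<labels source="LOCATED">
  <!-- Select arg1 and arg2 by regular expression. -->
  <attrs source_re="arg[12]">
    <!-- Discard the bearing annotation if all selected values are null. -->
    <discard_annot_if_null/>
  </attrs>
</labels>
```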
The <map> operator in the attribute scope accepts the following attributes:
Attribute | Value | Obligatory? | Description |
---|---|---|---|
target_aggregation | "singleton", "list", or "set" | no | An aggregation name |
target | a string | no | An attribute name |
target_type | "int", "float", "boolean", or "string" | no | A type name |
This operation modifies the selected attributes in the specified
way: modifying their aggregation type, modifying their target
type, or changing the attribute name, or some combination of the
three.
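For instance, the attribute renaming from the first example on this page could presumably also be written in attribute scope. The equivalence with <map_attr> is inferred from the matching attribute lists rather than stated outright.

```xml
<labels source_re="ORGANIZATION|PERSON">
  <attrs source="nomtype">
    <!-- Rename the selected attribute to NOMTYPE. -->
    <map target="NOMTYPE"/>
  </attrs>
</labels>
```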
The value scope contains three operators.
The <promote> operator in the value scope is like the <promote>
operator in attribute scope, but applies to specific values. For
instance:
<labels source="ENAMEX">
  <attrs source="TYPE">
    <values source="PER">
      <promote/>
    </values>
  </attrs>
</labels>
promotes only ENAMEX TYPE=PER to PER.
The <discard> operator in the value scope is like the <discard>
operator in attribute scope, but discards only the attribute-value
pairs which match the selected values.
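A small sketch of a value-scoped discard. The "UNKNOWN" value is hypothetical; the intent is that ENAMEX annotations keep their TYPE attribute unless its value is "UNKNOWN".

```xml
<labels source="ENAMEX">
  <attrs source="TYPE">
    <!-- Only the TYPE=UNKNOWN attribute-value pairs are selected. -->
    <values source="UNKNOWN">
      <discard/>
    </values>
  </attrs>
</labels>
```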
The <map> operator in the value scope accepts the following attributes:
Attribute | Value | Obligatory? | Description |
---|---|---|---|
target_type | "int", "float", "boolean", or "string" | no | A type name |
target_aggregation | "singleton", "list", or "set" | no | An aggregation name |
target | a string | no | The name of an attribute |
target_value | a string | no | An attribute value |
This operator is like the <map> operator in attribute
scope, except you also have the option of mapping the value itself
to something else (specified by target_value). For instance, if
you want to convert ENAMEX TYPE=PER annotations into PERSON
annotations, you can do it in one of a number of ways, for
instance:
<labels source="ENAMEX">
  <attrs source="TYPE">
    <values source="PER">
      <map target_value="PERSON"/>
      <promote/>
    </values>
  </attrs>
</labels>
or:
<labels source="ENAMEX">
  <with_attrs TYPE="PER"/>
  <promote_attr source="TYPE"/>
  <map target="PERSON"/>
</labels>