The HIPAA task is a general-purpose deidentification task which
addresses deidentification of medical records. Make sure you've read
the documentation on general deidentification.
The name of this task, when you need to refer to it in MATEngine,
MATWorkspaceEngine, or the UI, is "HIPAA Deidentification".
Most of the workflows in the HIPAA task have an additional initial step, "clean", which cannot be undone; this step converts the document to ASCII with Unix line endings.
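As a rough illustration of what a step like "clean" involves, the following minimal sketch converts text to ASCII with Unix line endings. It is not the toolkit's implementation: the function name clean_text and the choice to replace unconvertible characters with "?" are assumptions made for this example. It also shows why such a step cannot be undone, since the original characters and line endings are discarded.

```python
import unicodedata


def clean_text(text: str) -> str:
    """Return an ASCII-only copy of text with Unix (LF) line endings.

    Hypothetical helper for illustration; not part of the toolkit.
    """
    # Normalize Windows (CRLF) and old Mac (CR) line endings to LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Decompose accented characters into base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks so that the base letter survives.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Assumption: anything still outside ASCII becomes "?".
    return stripped.encode("ascii", "replace").decode("ascii")
```

For example, clean_text("Caf\u00e9\r\nroom 4") yields "Cafe\nroom 4"; the accent and the CRLF cannot be recovered from the result.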
The HIPAA law which governs medical record privacy specifies 18
categories which the law requires to be obscured. We discuss their
implementation here. This implementation is informed by our
experiences so far with our research partners. The bold text for
each section is taken directly from 45 CFR 164.514, the law which governs
medical record privacy.
(A) Names;
The NAME tag (full and partial names) and
the INITIALS tag (initials).
(B) All geographic subdivisions
smaller than a State, including
street address, city, county, precinct, zip code, and their equivalent
geocodes, except for the initial three digits of a zip code if,
according to the current publicly available data from the Bureau of the
Census:
(1) The geographic unit formed by combining all zip codes with the
same three initial digits contains more than 20,000 people; and
(2) The initial three digits of a zip code for all such geographic
units containing 20,000 or fewer people is changed to 000.
The LOCATION tag. This tag does not
permit subdivision of ZIP codes. The state should be obscured as
well. All contiguous elements of a location should be included in a
single tag, e.g., "12 Mulberry Lane, Winston-Salem, NC, 52004". Locations
internal to a hospital, such as room numbers, should use the OTHER
tag.
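For reference, the zip code exception quoted in clause (B) can be sketched as follows. This illustrates the rule only; the LOCATION tag obscures the whole ZIP code and does not apply it. The function name safe_harbor_zip and the population table passed to it are hypothetical, and real figures would have to come from Census data.

```python
from typing import Mapping


def safe_harbor_zip(zip_code: str, population_by_prefix: Mapping[str, int]) -> str:
    """Return the three-digit zip code form allowed by clause (B).

    population_by_prefix is a hypothetical lookup table mapping a
    three-digit prefix to the population of the combined geographic unit.
    """
    prefix = zip_code.strip()[:3]
    if len(prefix) != 3 or not prefix.isdigit():
        raise ValueError(f"not a zip code: {zip_code!r}")
    # Assumption: unknown prefixes are treated as small units (conservative).
    population = population_by_prefix.get(prefix, 0)
    # More than 20,000 people: keep the prefix; otherwise change it to 000.
    return prefix if population > 20_000 else "000"
```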
(C) All elements of dates (except
year) for dates directly related
to an individual, including birth date, admission date, discharge date,
date of death; and all ages over 89 and all elements of dates (including
year) indicative of such age, except that such ages and elements may be
aggregated into a single category of age 90 or older
The DATE tag and the AGE tag. The DATE
tag should include the year, to support resynthesis of realistic
fillers (this process is significantly hampered by leaving the
year out). We recommend that all ages be tagged.
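The age aggregation permitted by clause (C) can be sketched as below. This is an illustration of the quoted rule under stated assumptions, not of how the AGE tag is rendered by this task; the function name aggregate_age and the output string are hypothetical.

```python
def aggregate_age(age: int) -> str:
    """Collapse ages over 89 into a single category, per clause (C).

    Hypothetical helper for illustration; not part of the toolkit.
    """
    if age < 0:
        raise ValueError(f"age cannot be negative: {age}")
    # Ages over 89 may be aggregated into one "90 or older" category.
    return "90 or older" if age > 89 else str(age)
```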
(D) Telephone numbers;
(E) Fax numbers;
The HIPAA task provides some special replacer implementations.
||clear -> clear
||A specialization of the clear -> clear replacer which provides some customizations for rendering some HIPAA-specific categories.
||clear -> DE-ID
||Maps clear text PIIs to a DE-id-style obscured pattern. For most tags, the pattern is, e.g., **HOSPITAL. However, AGE, DATE and NAME have subsequent patterns surrounded by square brackets.
||DE-ID -> clear
||Maps the DE-id-style pattern described above into clear text.
||[ ] -> clear
||A specialization of the [ ] -> clear replacer which provides some customizations for rendering some HIPAA-specific categories.
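To make the clear -> DE-ID mapping in the table above more concrete, here is a minimal sketch of the simple case, where a tagged span is replaced by "**" followed by the tag name. It is not the toolkit's replacer: the functions to_deid_pattern and obscure and the (start, end, tag) span format are assumptions made for this example. The additional bracketed patterns for AGE, DATE and NAME are deliberately left out, because their contents are not described here.

```python
from typing import List, Tuple

# Tags whose DE-id-style pattern carries an extra bracketed part (not modeled).
TAGS_WITH_DETAIL = {"AGE", "DATE", "NAME"}


def to_deid_pattern(tag: str) -> str:
    """Return the simple DE-id-style pattern for a tag, e.g. **HOSPITAL."""
    if tag in TAGS_WITH_DETAIL:
        raise NotImplementedError(f"{tag} needs a bracketed pattern, not modeled here")
    return "**" + tag


def obscure(text: str, spans: List[Tuple[int, int, str]]) -> str:
    """Replace each non-overlapping (start, end, tag) span with its pattern."""
    pieces = []
    pos = 0
    for start, end, tag in sorted(spans):
        pieces.append(text[pos:start])       # untouched text before the span
        pieces.append(to_deid_pattern(tag))  # obscured replacement
        pos = end
    pieces.append(text[pos:])                # untouched tail
    return "".join(pieces)
```

With the text "Seen at Mercy General in Winston-Salem." and spans covering "Mercy General" (HOSPITAL) and "Winston-Salem" (LOCATION), obscure returns "Seen at **HOSPITAL in **LOCATION.".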