The HIPAA task is a general-purpose deidentification task for medical records. Before going further, make sure you've read the documentation on general deidentification customizations.
The name of this task, when you need to refer to it in MATEngine,
MATWorkspaceEngine, or the UI, is "HIPAA Deidentification".
Most of the workflows in the HIPAA task have an additional initial step, "clean", which cannot be undone; this step converts the document to ASCII with Unix line endings.
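The effect of a "clean" step like this can be illustrated with a rough sketch. The function name clean_text and the approach (NFKD decomposition followed by dropping whatever is still non-ASCII) are assumptions for illustration, not MAT's actual implementation. The sketch does show why such a step is one-way: characters that are dropped or rewritten cannot be recovered from the output.

```python
import unicodedata


def clean_text(text: str) -> str:
    """Hypothetical sketch of a "clean" step: ASCII only, Unix line endings."""
    # Normalize Windows (\r\n) and old Mac (\r) line endings to Unix (\n).
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Decompose accented characters (e.g. "é" -> "e" + combining accent),
    # then silently drop anything that is still not ASCII. This loses
    # information, so the original text cannot be reconstructed.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(repr(clean_text("Caf\u00e9\r\nline2\rline3")))  # 'Cafe\nline2\nline3'
```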
The HIPAA law which governs medical record privacy specifies 18 categories of identifiers which must be obscured. We discuss their implementation here. This implementation is informed by our experience so far with our research partners. The quoted text for each category is taken directly from 45 CFR 164.514, the regulation which governs the privacy of protected health information (PHI).
(A) Names
The NAME tag (full and partial names) and the INITIALS tag (initials).
(B) All geographic subdivisions smaller than a State, including street address, city, county, precinct, zip code, and their equivalent geocodes, except for the initial three digits of a zip code if, according to the current publicly available data from the Bureau of the Census:
(1) The geographic unit formed by combining all zip codes with the same three initial digits contains more than 20,000 people; and
(2) The initial three digits of a zip code for all such geographic units containing 20,000 or fewer people is changed to 000.
The LOCATION tag. This tag does not permit subdivision of ZIP codes. The state should be obscured as well. All contiguous elements of a location should be included in a single tag, e.g., "12 Mulberry Lane, Winston-Salem, NC, 52004". Locations internal to a hospital, such as room numbers, should use the OTHER tag.
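Because the LOCATION tag never keeps part of a ZIP code, this task does not use the three-digit exception quoted above. For comparison, here is a minimal sketch of what rule (B) itself permits. The function safe_harbor_zip3 and its zip3_population input (a table mapping each three-digit prefix to the Census population of the combined area) are hypothetical, and the population figures in the usage lines are invented.

```python
from typing import Mapping


def safe_harbor_zip3(zip_code: str, zip3_population: Mapping[str, int]) -> str:
    """Hypothetical sketch of the Safe Harbor ZIP code exception in rule (B).

    zip3_population (assumed input) maps each three-digit ZIP prefix to the
    population of the area formed by all ZIP codes sharing that prefix.
    """
    prefix = zip_code.strip()[:3]
    if len(prefix) != 3 or not prefix.isdigit():
        raise ValueError(f"not a ZIP code: {zip_code!r}")
    # (1) The initial three digits may be kept only if the area has > 20,000 people.
    if zip3_population.get(prefix, 0) > 20_000:
        return prefix
    # (2) Otherwise they become 000. Prefixes missing from the table also
    # land here, a conservative default chosen for this sketch.
    return "000"


# Invented population figures, for illustration only.
populations = {"271": 850_000, "036": 15_000}
print(safe_harbor_zip3("27101", populations))  # 271
print(safe_harbor_zip3("03601-1234", populations))  # 000
```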
(C) All elements of dates (except year) for dates directly related to an individual, including birth date, admission date, discharge date, date of death; and all ages over 89 and all elements of dates (including year) indicative of such age, except that such ages and elements may be aggregated into a single category of age 90 or older;
The DATE tag and the AGE tag. The DATE tag should include the year, to support resynthesis of realistic fillers (leaving the year out significantly hampers this process). We recommend that all ages be tagged.
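Since the DATE tag covers the year and all ages are tagged, this task obscures more than the regulation's minimum. To show what that minimum looks like, here is a minimal sketch of the rule (C) generalizations. All function names are hypothetical, and the choice to return None when a birth year must be dropped is an assumption of the sketch.

```python
from datetime import date
from typing import Optional


def age_on(birth: date, on: date) -> int:
    """Completed years of age on a given date."""
    # Subtract one if the birthday has not yet occurred in year `on.year`.
    return on.year - birth.year - ((on.month, on.day) < (birth.month, birth.day))


def safe_harbor_age(age: int) -> str:
    """Ages over 89 are aggregated into a single category."""
    return "90 or older" if age > 89 else str(age)


def safe_harbor_date(d: date) -> str:
    """Keep only the year of a date directly related to an individual."""
    return str(d.year)


def safe_harbor_birth_date(birth: date, as_of: date) -> Optional[str]:
    """Year of birth, unless it indicates an age over 89 as of `as_of`.

    In that case even the year must be removed, so None is returned.
    """
    if age_on(birth, as_of) > 89:
        return None
    return safe_harbor_date(birth)


print(safe_harbor_date(date(2007, 5, 7)))  # 2007
print(safe_harbor_age(93))  # 90 or older
print(safe_harbor_birth_date(date(1925, 3, 1), as_of=date(2020, 1, 1)))  # None
```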
(D) Telephone numbers;
(E) Fax numbers;
The HIPAA task provides some special replacer implementations.
Implementation | UI name | Description |
---|---|---|
HIPAAResynthesis.HIPAAClearReplacementEngine | clear -> clear | A specialization of the general clear -> clear replacer which provides some customizations for rendering some HIPAA-specific categories. |
HIPAAResynthesis.HIPAADEIDStyleReplacementEngine | clear -> DE-ID | Maps clear-text PII to a DE-ID-style obscured pattern. For most tags, the pattern is, e.g., **HOSPITAL. AGE, DATE and NAME, however, are followed by an additional pattern enclosed in angle brackets. |
HIPAAResynthesis.HIPAADEIDStyleResynthesisEngine | DE-ID -> clear | Maps the DE-ID-style pattern described above into clear text. |
HIPAAResynthesis.HIPAABracketResynthesisEngine | [ ] -> clear | A specialization of the general [ ] -> clear replacer which provides some customizations for rendering some HIPAA-specific categories. |
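To make the DE-ID-style round trip concrete, here is a minimal sketch over plain (start, end, label) spans. The names to_deid_style, from_deid_style and the detail callback are hypothetical, not the HIPAAResynthesis API. This section does not spell out what goes inside the angle brackets for AGE, DATE and NAME, so the sketch leaves that to the caller. The reverse direction uses a simple lookup table of fillers, whereas a real resynthesis engine would generate realistic replacements.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

Span = Tuple[int, int, str]  # (start, end, label); assumed non-overlapping

# Labels that get an extra angle-bracketed pattern in DE-ID style.
BRACKETED = {"AGE", "DATE", "NAME"}

# Matches **LABEL, optionally followed by <...>.
DEID_TOKEN = re.compile(r"\*\*([A-Z]+)(?:<([^>]*)>)?")


def to_deid_style(text: str, spans: List[Span],
                  detail: Optional[Callable[[str, str], str]] = None) -> str:
    """Replace each tagged span with **LABEL (plus <...> for bracketed labels)."""
    out, pos = [], 0
    for start, end, label in sorted(spans):
        out.append(text[pos:start])
        token = "**" + label
        # The bracket contents are supplied by a hypothetical callback.
        if label in BRACKETED and detail is not None:
            token += "<" + detail(label, text[start:end]) + ">"
        out.append(token)
        pos = end
    out.append(text[pos:])
    return "".join(out)


def from_deid_style(text: str, fillers: Dict[str, str]) -> str:
    """Replace each DE-ID token with a clear-text filler for its label."""
    # Tokens whose label has no filler are left untouched.
    return DEID_TOKEN.sub(lambda m: fillers.get(m.group(1), m.group(0)), text)


text = "Seen by Dr. Smith at Mercy on 5/7/2007."
spans = [(12, 17, "NAME"), (21, 26, "HOSPITAL"), (30, 38, "DATE")]
print(to_deid_style(text, spans))
# Seen by Dr. **NAME at **HOSPITAL on **DATE.
```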