The HIPAA task is a general-purpose deidentification task that addresses the deidentification of medical records. Make sure you have read the documentation on general deidentification customizations.
Most of the workflows in the HIPAA task have an additional initial step, "clean", which cannot be undone; this step converts the document to ASCII with Unix line endings.
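The "clean" step is irreversible because converting to ASCII discards information. The following is a minimal sketch of what such a step might look like, assuming accented characters are reduced to their base letters and all other non-ASCII characters are dropped. The function name `clean` and this conversion strategy are illustrative assumptions, not MIST's actual implementation.

```python
import unicodedata


def clean(text: str) -> str:
    """Hypothetical approximation of a "clean" step (not MIST's code).

    Produces ASCII text with Unix line endings. The conversion is lossy,
    which is why a step like this cannot be undone.
    """
    # Normalize Windows (\r\n) and old Mac (\r) line endings to Unix (\n).
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Decompose accented characters (e.g. "é" -> "e" + combining accent),
    # then drop every remaining non-ASCII code point.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(repr(clean("Café visit\r\nJosé Núñez\r")))  # 'Cafe visit\nJose Nunez\n'
```

Once "José" has become "Jose", nothing in the output records that an accent was ever there.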
The HIPAA law, which governs medical record privacy, specifies 18 categories of identifiers that must be obscured. We discuss their implementation here; this implementation is informed by our experience so far with our research partners. The lettered text for each category is taken directly from 45 CFR 164.514, the regulation that governs PHI privacy.
(A) Names
The NAME tag (full and partial names) and the INITIALS tag (initials).
(B) All geographic subdivisions smaller than a State, including street address, city, county, precinct, zip code, and their equivalent geocodes, except for the initial three digits of a zip code if, according to the current publicly available data from the Bureau of the Census:
(1) The geographic unit formed by combining all zip codes with the same three initial digits contains more than 20,000 people; and
(2) The initial three digits of a zip code for all such geographic units containing 20,000 or fewer people is changed to 000.
The LOCATION tag. This tag does not permit subdivision of ZIP codes: the entire ZIP code is tagged, so the three-digit exception above is not used. The state should be obscured as well, even though the law does not require it. All contiguous elements of a location should be included in a single tag, e.g., "12 Mulberry Lane, Winston-Salem, NC, 52004". Locations internal to a hospital, such as room numbers, should use the OTHER tag.
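The LOCATION tag always covers the whole ZIP code, so MIST does not apply the three-digit exception in (B). The rule itself is easy to misread, however, so here is a minimal sketch of it. The function name `safe_harbor_zip` and the `zip3_population` lookup are hypothetical; in practice the populations would come from current Census Bureau data, and the numbers in the usage lines are invented.

```python
from __future__ import annotations


def safe_harbor_zip(zip_code: str, zip3_population: dict[str, int]) -> str:
    """Return what rule (B) lets a ZIP code keep (illustrative sketch only).

    zip3_population is a hypothetical mapping from a 3-digit ZIP prefix to
    the combined population of all ZIP codes sharing that prefix.
    """
    prefix = zip_code[:3]
    # Rule (1): the first three digits may stay only if the combined
    # geographic unit contains MORE than 20,000 people.
    if zip3_population.get(prefix, 0) > 20_000:
        return prefix
    # Rule (2): otherwise the three digits are changed to 000.
    # An unknown prefix is treated conservatively as too small.
    return "000"


pops = {"520": 48_000, "999": 7_500}  # made-up populations, NOT Census data
print(safe_harbor_zip("52004", pops))  # 520
print(safe_harbor_zip("99901", pops))  # 000
```

A unit of exactly 20,000 people still gets 000, because the law requires strictly more than 20,000.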
(C) All elements of dates (except year) for dates directly related to an individual, including birth date, admission date, discharge date, date of death; and all ages over 89 and all elements of dates (including year) indicative of such age, except that such ages and elements may be aggregated into a single category of age 90 or older
The DATE tag and the AGE tag. The DATE tag should include the year, even though the law allows the year to remain visible, because leaving the year out significantly hampers the resynthesis of realistic fillers. We recommend that all ages be tagged, not only those over 89.
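For comparison with these recommendations, here is a minimal sketch of the legal minimum in (C) on its own. MIST's guidance goes further, since it tags the year and all ages. Both function names and the `indicates_age_over_89` flag are hypothetical. The flag leaves it to the caller to decide whether a date reveals such an age, because the regulation makes that depend on context (for example, a birth date).

```python
from __future__ import annotations

from datetime import date
from typing import Optional


def safe_harbor_age(age: int) -> str:
    """Aggregate every age over 89 into the single category "90 or older"."""
    return "90 or older" if age > 89 else str(age)


def safe_harbor_date(d: date, indicates_age_over_89: bool) -> Optional[str]:
    """Return what rule (C) lets a date keep: at most its year.

    indicates_age_over_89 is a hypothetical, caller-supplied flag.
    """
    if indicates_age_over_89:
        # Even the year must go when it would reveal an age over 89.
        return None
    # Month and day are removed; only the year survives.
    return str(d.year)


print(safe_harbor_age(93))                         # 90 or older
print(safe_harbor_date(date(2004, 3, 12), False))  # 2004
```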
(D) Telephone numbers;
(E) Fax numbers;
The HIPAA task provides some special replacer implementations.
| Implementation | UI name | Description |
|---|---|---|
| HIPAAResynthesis.HIPAAClearReplacementEngine | clear -> clear | A specialization of the general clear -> clear replacer that provides customizations for rendering some HIPAA-specific categories. |
| HIPAAResynthesis.HIPAADEIDStyleReplacementEngine | clear -> DE-ID | Maps clear-text PII to a DE-ID-style obscured pattern. For most tags, the pattern is, e.g., `**HOSPITAL`. However, AGE, DATE, and NAME are followed by an additional pattern surrounded by angle brackets. |
| HIPAAResynthesis.HIPAADEIDStyleResynthesisEngine | DE-ID -> clear | Maps the DE-ID-style pattern described above back into clear text. |
| HIPAAResynthesis.HIPAABracketResynthesisEngine | [ ] -> clear | A specialization of the general [ ] -> clear replacer that provides customizations for rendering some HIPAA-specific categories. |
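The two DE-ID-style replacers work in opposite directions. This minimal sketch illustrates both: rendering a `**TAG` placeholder, and locating placeholders again so that they could be resynthesized. The exact syntax is an assumption: `**` plus an upper-case tag name, optionally followed by a pattern in angle brackets. The helpers `render` and `find_placeholders` and the bracket content `AAA` are hypothetical and do not reproduce the HIPAAResynthesis engines' real output.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical placeholder syntax: "**" + upper-case tag name, optionally
# followed by a pattern in angle brackets (used here for AGE, DATE, NAME).
DEID_PATTERN = re.compile(r"\*\*([A-Z]+)(?:<([^>]*)>)?")
BRACKETED_TAGS = {"AGE", "DATE", "NAME"}


def render(tag: str, detail: Optional[str] = None) -> str:
    """Clear -> DE-ID direction: build the placeholder for one tagged PII."""
    placeholder = "**" + tag
    if tag in BRACKETED_TAGS and detail is not None:
        placeholder += "<" + detail + ">"
    return placeholder


def find_placeholders(text: str) -> List[Tuple[str, Optional[str]]]:
    """First step of DE-ID -> clear: find each tag and its bracketed part."""
    return [(m.group(1), m.group(2)) for m in DEID_PATTERN.finditer(text)]


note = "Seen at " + render("HOSPITAL") + " by " + render("NAME", "AAA")
print(note)                     # Seen at **HOSPITAL by **NAME<AAA>
print(find_placeholders(note))  # [('HOSPITAL', None), ('NAME', 'AAA')]
```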