The MITRE Identification Scrubber Toolkit (MIST) is a suite of tools
for deidentifying free-text documents that contain personally
identifiable information (PII). MIST helps you replace PII either
with obscuring fillers, such as [NAME], or with artificial,
synthesized, but realistic English fillers. The transformed documents
you create with this toolkit are more likely to meet your
organization's requirements for protecting privacy in the documents
you distribute.
For example, assume you're given the following document:
Patient ID: P89474
Mary Phillips is a 45-year-old woman with a history of diabetes.
She arrived at New Hope Medical Center on August 5 complaining
of abdominal pain. Dr. Gertrude Philippoussis diagnosed her
with appendicitis and admitted her at 10 PM.
MIST provides facilities for converting this document to something
like this:
Patient ID: [ID]
[NAME] is a [AGE]-year-old woman with a history of diabetes.
She arrived at [HOSPITAL] on [DATE] complaining
of abdominal pain. Dr. [PHYSICIAN] diagnosed her
with appendicitis and admitted her at 10 PM.
or even to a resynthesized English replacement like this:
Patient ID: ID586
Sandy Parkinson is a 34-year-old woman with a history of diabetes.
She arrived at Mercy Hospital on July 10 complaining
of abdominal pain. Dr. Myron Prendergast diagnosed her
with appendicitis and admitted her at 10 PM.
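The two replacement styles shown above can be sketched in a few lines of Python. This is only an illustration, not MIST's implementation: the span format (start offset, end offset, label), the helper names find_span and replace_spans, and the fixed SYNTHETIC filler table are all assumptions made for this example.

```python
from typing import Callable, List, Tuple

# Hypothetical annotation format: (start offset, end offset, PII label).
Span = Tuple[int, int, str]


def find_span(text: str, phrase: str, label: str) -> Span:
    """Locate the first occurrence of phrase; stands in for hand annotation."""
    start = text.index(phrase)
    return (start, start + len(phrase), label)


def replace_spans(text: str, spans: List[Span],
                  make_filler: Callable[[str, str], str]) -> str:
    """Rebuild text, swapping each span for make_filler(label, original)."""
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        if start < cursor:
            raise ValueError("overlapping spans are not handled in this sketch")
        pieces.append(text[cursor:start])
        pieces.append(make_filler(label, text[start:end]))
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


# Hypothetical fixed fillers; a real resynthesizer would generate varied values.
SYNTHETIC = {
    "NAME": "Sandy Parkinson",
    "HOSPITAL": "Mercy Hospital",
    "DATE": "July 10",
}

text = "Mary Phillips arrived at New Hope Medical Center on August 5."
spans = [
    find_span(text, "Mary Phillips", "NAME"),
    find_span(text, "New Hope Medical Center", "HOSPITAL"),
    find_span(text, "August 5", "DATE"),
]

# Style 1: obscuring fillers built from the label itself.
obscured = replace_spans(text, spans, lambda label, original: f"[{label}]")
# Style 2: realistic fillers looked up by label.
resynthesized = replace_spans(text, spans, lambda label, original: SYNTHETIC[label])

print(obscured)       # [NAME] arrived at [HOSPITAL] on [DATE].
print(resynthesized)  # Sandy Parkinson arrived at Mercy Hospital on July 10.
```

Passing the filler as a function keeps the span-walking logic identical for both styles; only the choice of replacement text differs.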
MIST decomposes the deidentification task into two subtasks:
identifying the PII phrases in your documents, and replacing those
phrases with fillers.
The first subtask is addressed by the MITRE Annotation Toolkit (MAT),
which is a highly customizable suite of tools for natural language
processing upon which MIST is built. The customizations for MIST itself
address the second subtask. The documentation that follows uses the
terms annotation and tagging interchangeably for the task
of identifying, either by hand or automatically, the PII
phrases in your documents. The labels for your PII types (e.g., NAME,
PHYSICIAN, AGE, DATE) will be the tags that you'll be applying to your
documents.
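To make the idea of tagging concrete, here is a toy pattern-based tagger that emits labeled spans. It is a stand-in for illustration only and does not reflect how MAT actually identifies PII: the PATTERNS table, the tag function, and the (start, end, label) span format are assumptions, and the patterns cover only a few easy cases (names, for instance, cannot be found this way).

```python
import re
from typing import List, Tuple

# Hypothetical annotation format: (start offset, end offset, PII label).
Span = Tuple[int, int, str]

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")

# Hypothetical patterns, one per tag; real PII is far more varied than this.
PATTERNS = {
    "ID": re.compile(r"\bP\d{5}\b"),
    "AGE": re.compile(r"\b\d{1,3}(?=-year-old\b)"),
    "DATE": re.compile(r"\b(?:" + MONTHS + r") \d{1,2}\b"),
}


def tag(text: str) -> List[Span]:
    """Return every pattern match as a labeled span, sorted by position."""
    spans = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), label))
    return sorted(spans)


text = ("Patient ID: P89474. Mary Phillips is a 45-year-old woman "
        "who arrived on August 5.")

for start, end, label in tag(text):
    print(label, repr(text[start:end]))
```

Each span records where a PII phrase sits and which tag applies to it, which is the information a later replacement step needs.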
All of MITRE's contributions to MIST are licensed under the BSD
license, as described in the file LICENSE at the root of your
distribution.
Many third-party packages and utilities are distributed with MIST;
they and their licenses are all described in the license file.
The documentation that follows is arranged approximately in the order
in which you should review it.
Have fun!