MIST: The MITRE Identification Scrubber Toolkit

The MITRE Identification Scrubber Toolkit (MIST) is a suite of tools for deidentifying free-text documents that contain personally identifiable information (PII). MIST helps you replace PII either with obscuring fillers, such as [NAME], or with artificial, synthesized, but realistic English fillers. The transformed documents you create with this toolkit are more likely to meet your organization's requirements for protecting privacy in the documents you distribute.

For example, assume you're given the following document:

Patient ID: P89474

Mary Phillips is a 45-year-old woman with a history of diabetes.
She arrived at New Hope Medical Center on August 5 complaining
of abdominal pain. Dr. Gertrude Philippoussis diagnosed her
with appendicitis and admitted her at 10 PM.

MIST provides facilities for converting this document to something like this:

Patient ID: [ID]

[NAME] is a [AGE]-year-old woman with a history of diabetes.
She arrived at [HOSPITAL] on [DATE] complaining
of abdominal pain. Dr. [PHYSICIAN] diagnosed her
with appendicitis and admitted her at 10 PM.

or even for resynthesizing a realistic English replacement document like this:

Patient ID: ID586

Sandy Parkinson is a 34-year-old woman with a history of diabetes.
She arrived at Mercy Hospital on July 10 complaining
of abdominal pain. Dr. Myron Prendergast diagnosed her
with appendicitis and admitted her at 10 PM.
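
The two transformations above can be illustrated with a short Python sketch. It is not MIST's implementation or API: the Span type, the replace_spans and find_span helpers, and the SUBSTITUTES table are hypothetical names introduced only for this example. The sketch assumes the PII phrases have already been tagged as non-overlapping character spans, and it uses a fixed lookup table where MIST would synthesize realistic fillers.

```python
from typing import Callable, List, NamedTuple


class Span(NamedTuple):
    start: int  # offset of the first character of the PII phrase
    end: int    # offset one past the last character
    label: str  # PII type, e.g. "NAME"


def find_span(text: str, phrase: str, label: str) -> Span:
    """Tag the first occurrence of phrase (stands in for hand annotation)."""
    start = text.index(phrase)
    return Span(start, start + len(phrase), label)


def replace_spans(text: str, spans: List[Span],
                  filler_for: Callable[[Span, str], str]) -> str:
    """Rebuild text, swapping each non-overlapping span for a filler."""
    pieces = []
    cursor = 0
    for span in sorted(spans):  # NamedTuples sort by start offset first
        pieces.append(text[cursor:span.start])
        pieces.append(filler_for(span, text[span.start:span.end]))
        cursor = span.end
    pieces.append(text[cursor:])
    return "".join(pieces)


text = (
    "Patient ID: P89474\n"
    "\n"
    "Mary Phillips is a 45-year-old woman with a history of diabetes.\n"
    "She arrived at New Hope Medical Center on August 5 complaining\n"
    "of abdominal pain. Dr. Gertrude Philippoussis diagnosed her\n"
    "with appendicitis and admitted her at 10 PM.\n"
)

spans = [
    find_span(text, "P89474", "ID"),
    find_span(text, "Mary Phillips", "NAME"),
    find_span(text, "45", "AGE"),
    find_span(text, "New Hope Medical Center", "HOSPITAL"),
    find_span(text, "August 5", "DATE"),
    find_span(text, "Gertrude Philippoussis", "PHYSICIAN"),
]

# Obscuring fillers: each phrase becomes its bracketed label.
obscured = replace_spans(text, spans, lambda span, phrase: f"[{span.label}]")

# Resynthesis stand-in: a fixed table instead of generated fillers.
SUBSTITUTES = {
    "P89474": "ID586",
    "Mary Phillips": "Sandy Parkinson",
    "45": "34",
    "New Hope Medical Center": "Mercy Hospital",
    "August 5": "July 10",
    "Gertrude Philippoussis": "Myron Prendergast",
}
resynthesized = replace_spans(text, spans,
                              lambda span, phrase: SUBSTITUTES[phrase])

print(obscured)
print(resynthesized)
```

Keeping the filler choice in a separate function means the same tagged spans can drive either style of output.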

How it works

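MIST decomposes the deidentification task into two subtasks: identifying the PII phrases in your documents, and replacing those phrases with obscuring or resynthesized fillers.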

The first subtask is addressed by the MITRE Annotation Toolkit (MAT), a highly customizable suite of natural language processing tools on which MIST is built. The customizations for MIST itself address the second subtask. The documentation that follows uses the terms annotation and tagging interchangeably for the task of identifying, either by hand or automatically, the PII phrases in your documents. The labels for your PII types (e.g., NAME, PHYSICIAN, AGE, DATE) are the tags that you'll apply to your documents.
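
To make the idea of tagging concrete, here is a minimal Python sketch in which both hand and automatic annotation produce the same kind of result: a list of (start, end, label) character spans. Everything in it is an assumption made for illustration. The PATTERNS table and the auto_tag function are hypothetical, and the regular expressions are a toy stand-in for an automatic tagger, not a description of how MAT tags documents.

```python
import re

# Toy patterns for three PII labels; illustrative only.
PATTERNS = {
    "ID": re.compile(r"\bP\d{5}\b"),
    "AGE": re.compile(r"\b\d{1,3}(?=-year-old)"),
    "DATE": re.compile(
        r"\b(?:January|February|March|April|May|June|July|August|"
        r"September|October|November|December) \d{1,2}\b"
    ),
}


def auto_tag(text):
    """Return (start, end, label) spans found by the toy patterns."""
    annotations = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            annotations.append((match.start(), match.end(), label))
    return sorted(annotations)


text = (
    "Patient ID: P89474\n"
    "Mary Phillips is a 45-year-old woman who arrived on August 5."
)

# A hand annotation has the same shape as an automatic one.
name_start = text.index("Mary Phillips")
hand_tags = [(name_start, name_start + len("Mary Phillips"), "NAME")]

all_tags = sorted(auto_tag(text) + hand_tags)
tagged = [(text[start:end], label) for start, end, label in all_tags]

for phrase, label in tagged:
    print(f"{label}: {phrase}")
```

Because hand and automatic tags share one representation, the replacement step does not need to know which of the two produced a given span.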


All of MITRE's contributions to MIST are licensed under the BSD license, as described in the file LICENSE at the root of your distribution. Many third-party packages and utilities are distributed with MIST; they and their licenses are all described in the license file.

What you should do next

The documentation that follows is arranged approximately in the order in which you should review it.

Have fun!