Linking Federal Health Data While Protecting Privacy

Proof that data sets can be combined without exchanging sensitive information
  • Client
    National Center for Health Statistics (NCHS)
  • Dates
    2020 - 2022

Problem

Combining sets of health records while maintaining privacy is a challenge.

The government collects and stores data on each of the more than 80 million people enrolled in Medicare, Medicaid, and the Children's Health Insurance Program. On its own, this data can help policymakers and practitioners provide better health care, but it would be even more valuable if linked with other records that hold additional personal health information. The problem is that individual health records contain names, dates of birth, home addresses, and other identifiers that cannot be shared with other parties, including the custodian of the other file in the linkage.

Solution

NORC tested using “hashing” to link records while maintaining privacy.

The National Center for Health Statistics (NCHS) asked NORC at the University of Chicago to conduct a two-part test. The first part asked whether datasets could be linked using encrypted identifiers instead of names, a method known as hashing. The second part asked whether hashing accurately connected records belonging to the same individual, producing combined datasets that could be trusted. We used software from Datavant for the encryption. We then evaluated the accuracy of the hashed linkage by comparing it with a gold-standard linkage that we had previously conducted on the same data, before encryption, using enhanced linkage strategies.
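The general idea of linking on hashed identifiers can be illustrated with a minimal sketch. It is not Datavant's software or the procedure NORC used; it assumes a keyed hash (HMAC-SHA256) over normalized name and date-of-birth fields, and the key, field names, and helper functions (`normalize`, `make_token`, `tokenize`, `link`) are all hypothetical. Each custodian would run `tokenize` on its own file and share only tokens and non-identifying payloads.

```python
import hashlib
import hmac
import unicodedata

# Hypothetical secret agreed between the two custodians (an assumption for this sketch).
SHARED_KEY = b"example-key-agreed-out-of-band"


def normalize(value: str) -> str:
    """Lowercase, strip accents, and drop non-alphanumeric characters so that
    trivial formatting differences do not change the resulting token."""
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return "".join(ch for ch in ascii_only.lower() if ch.isalnum())


def make_token(first: str, last: str, dob: str, key: bytes = SHARED_KEY) -> str:
    """Return a keyed hash of the normalized identifiers."""
    message = "|".join(normalize(part) for part in (first, last, dob))
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).hexdigest()


def tokenize(records, key: bytes = SHARED_KEY):
    """Map token -> payload, leaving the raw identifiers behind.
    Records that produce the same token overwrite one another in this sketch."""
    return {
        make_token(r["first"], r["last"], r["dob"], key): r["payload"]
        for r in records
    }


def link(tokens_a, tokens_b):
    """Join two tokenized files on the tokens they have in common."""
    shared = tokens_a.keys() & tokens_b.keys()
    return {t: (tokens_a[t], tokens_b[t]) for t in shared}


if __name__ == "__main__":
    file_a = tokenize([
        {"first": "José", "last": "Smith", "dob": "1950-01-02",
         "payload": {"plan": "Medicare"}},
    ])
    file_b = tokenize([
        {"first": "jose ", "last": "SMITH", "dob": "1950/01/02",
         "payload": {"survey": "X"}},
    ])
    for token, (a, b) in link(file_a, file_b).items():
        print(token[:12], a, b)
```

An exact-match join like this only links records whose normalized identifiers agree character for character, which is one reason a comparison against a gold-standard linkage is needed to judge accuracy.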

Result

We showed that data sharing and personal privacy protection are not mutually exclusive.

The project demonstrated that encryption could be employed to link datasets reliably and efficiently without exchanging the linkage files between distinct custodians.
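One common way to quantify how reliably a hashed linkage reproduces a gold-standard linkage is to compare the two sets of matched record pairs. The sketch below assumes precision and recall as the measures; the source does not say which metrics the project used, and the function name `linkage_metrics` and the pair format are hypothetical.

```python
def linkage_metrics(candidate_pairs, gold_pairs):
    """Compare matched pairs from a candidate linkage against a gold standard.

    Each pair is a tuple (record_id_in_file_a, record_id_in_file_b).
    """
    candidate = set(candidate_pairs)
    gold = set(gold_pairs)
    true_pos = len(candidate & gold)
    return {
        "true_positives": true_pos,
        # Links the candidate made that the gold standard did not.
        "false_positives": len(candidate - gold),
        # Gold-standard links the candidate missed.
        "false_negatives": len(gold - candidate),
        "precision": true_pos / len(candidate) if candidate else 0.0,
        "recall": true_pos / len(gold) if gold else 0.0,
    }


if __name__ == "__main__":
    hashed = [(1, "a"), (2, "b"), (3, "x")]
    gold = [(1, "a"), (2, "b"), (4, "d"), (5, "e")]
    print(linkage_metrics(hashed, gold))
```

High precision means few incorrect links were created, and high recall means few true links were missed.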

Project Leads

“We’ve demonstrated a capability for building linked files in a way that protects privacy that can be used for evidence-based policy making.”

Principal Data Scientist

