Linking Federal Health Data While Protecting Privacy
Problem
Combining sets of health records while maintaining privacy is a challenge.
The government collects and stores data on each of the more than 80 million people enrolled in Medicare, Medicaid, and the Children's Health Insurance Program. On its own, this data can help policymakers and practitioners provide better health care. But the data would be even more valuable if linked with other records that hold additional personal health information. The problem is that these individual health records contain names, dates of birth, home addresses, and other identifiers that cannot be shared with other parties, including the custodian of the other file in the linkage.
Solution
NORC tested using “hashing” to link records while maintaining privacy.
The National Center for Health Statistics (NCHS) asked NORC at the University of Chicago to conduct a two-step test. The first test was whether datasets could be linked using encrypted identifiers instead of names, a method known as hashing. The second test was whether hashing accurately connected records belonging to the same individual, producing combined datasets that could be trusted. We used software from Datavant for the encryption. We then evaluated linkage efficacy by comparing the results with a gold-standard linkage we had previously conducted on the same data, before encryption, using enhanced linkage strategies.
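The general idea behind hashed linkage, and behind scoring it against a gold standard, can be sketched as follows. This is a minimal illustration under stated assumptions, not Datavant's method: the source does not describe Datavant's tokenization, so the sketch substitutes a keyed one-way hash (HMAC-SHA256 from Python's standard library) over normalized first name, last name, and date of birth. The shared key, the choice of fields, the normalization rules, and every function and variable name (`make_token`, `tokenize`, `link`, `evaluate`, `SECRET_KEY`) are hypothetical. Each custodian would hash its own identifiers locally, so only tokens and record IDs, never names or birth dates, would need to leave either site.

```python
import hashlib
import hmac
import unicodedata
from typing import Dict, Iterable, Set, Tuple

# Hypothetical shared secret. In a real deployment the key would be agreed on
# by the custodians (or held by a trusted party) and never published.
SECRET_KEY = b"hypothetical-shared-linkage-key"

Record = Tuple[str, str, str, str]  # (record_id, first_name, last_name, dob)
Pair = Tuple[str, str]              # (record_id in file A, record_id in file B)


def normalize(value: str) -> str:
    """Uppercase, strip accents, and drop punctuation so trivial
    formatting differences do not change the hash."""
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return "".join(ch for ch in ascii_only.upper() if ch.isalnum())


def make_token(first: str, last: str, dob: str, key: bytes = SECRET_KEY) -> str:
    """Keyed one-way hash of the normalized identifiers (assumed fields)."""
    message = "|".join(normalize(v) for v in (first, last, dob)).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def tokenize(records: Iterable[Record]) -> Dict[str, str]:
    """Run by each custodian locally: map token -> record_id.
    Note: if two records in one file share a token, the later one wins;
    a production system would need to handle such duplicates explicitly."""
    return {make_token(f, l, d): rid for rid, f, l, d in records}


def link(tokens_a: Dict[str, str], tokens_b: Dict[str, str]) -> Set[Pair]:
    """Pair record IDs whose tokens match exactly."""
    shared = tokens_a.keys() & tokens_b.keys()
    return {(tokens_a[t], tokens_b[t]) for t in shared}


def evaluate(predicted: Set[Pair], gold: Set[Pair]) -> Tuple[float, float]:
    """Precision and recall of the hashed linkage against a gold standard
    built earlier from the unhashed identifiers."""
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up example records; not real data.
    file_a = [("M1", "José", "García", "1980-02-14"), ("M2", "Ann", "Lee", "1975-07-01")]
    file_b = [("S9", "JOSE", "Garcia", "19800214"), ("S8", "Ann", "Li", "1975-07-01")]
    # Hypothetical gold standard in which clerical review also linked Lee/Li.
    gold = {("M1", "S9"), ("M2", "S8")}

    links = link(tokenize(file_a), tokenize(file_b))
    print(links)                  # {('M1', 'S9')}
    print(evaluate(links, gold))  # (1.0, 0.5)
```

Normalization lets "José García" and "JOSE Garcia" produce the same token, but because a hash only supports exact matching, a spelling variant such as "Lee" versus "Li" produces a different token and the true match is missed. That is the kind of loss a comparison against a gold-standard linkage built from the unencrypted data is designed to measure.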
Result
We showed data sharing and personal privacy protection are not mutually exclusive.
The project demonstrated that encryption could be used to link datasets reliably and efficiently without exchanging identifiable linkage files between the separate data custodians.
Project Leads
- Edward Mulrow, Senior Vice President & Director (Project Director)
- Dean Resnick, Principal Data Scientist (Principal Investigator)
- Chris Cox, Senior Fellow (Senior Staff)
- Scott Campbell, Senior Statistician (Senior Staff)