Linking Parent & Statistical Agency Data
Problem
In support of the National Secure Data Service (NSDS) Demonstration project, this project will demonstrate the use of an open-source privacy-preserving record linkage (PPRL) tool to link two disparate data sources.
The NSDS Demonstration project aims to strengthen data linkage and data access infrastructure. To inform the NSDS, the National Center for Science and Engineering Statistics (NCSES) within the U.S. National Science Foundation (NSF) aims to test the feasibility of an open-source PPRL tool. The project will develop a data sharing agreement between a federal statistical agency and its parent agency, link two disparate data sources, and create an analytic dataset that can answer questions that neither source could answer alone.
This project will provide critical insights into best practices for data linkage, interoperability, and privacy-enhancing technologies. It will also help establish standardized agreements and methodologies that can be adopted across the federal data ecosystem. Additionally, the project will evaluate the benefits and challenges of different PPRL tools based on the nature of the data being linked, ensuring that future NSDS implementations have a solid foundation for secure and effective data sharing.
Solution
NORC linked data using an open-source PPRL tool.
To demonstrate a linkage between a statistical agency and its parent agency, NORC linked the following two data sources:
- NCSES Survey of Earned Doctorates (SED)
- NSF Principal Investigator (PI) award data
Working closely with NCSES and NSF staff, NORC developed a data sharing agreement while documenting the considerations specific to an agreement between a statistical agency and its parent agency. As part of developing the agreement, NORC created a process flow that presents a suggested infrastructure and identifies responsibilities for data ownership, storage, processing, and linking. This infrastructure makes it possible to conduct PPRL and link the sources without ever exchanging direct personally identifiable information (PII).
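To make the idea of linking without exchanging direct PII concrete, here is a minimal sketch of one common PPRL pattern: each data owner turns normalized identifiers into keyed-hash tokens, and only the tokens are compared. This is an illustration under stated assumptions, not the tool or method used in this project. The field set (first name, last name, birth date), the function names, the shared key, and the sample records are all hypothetical.

```python
import hashlib
import hmac
import unicodedata


def normalize(value):
    """Lowercase, strip accents, and drop non-alphanumeric characters."""
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return "".join(ch for ch in ascii_only.lower() if ch.isalnum())


def make_token(secret_key, first, last, birth_date):
    """Return a keyed hash (HMAC-SHA256) of the normalized identifiers."""
    message = "|".join(normalize(part) for part in (first, last, birth_date))
    return hmac.new(secret_key, message.encode("utf-8"), hashlib.sha256).hexdigest()


def link(tokens_a, tokens_b):
    """Join two {token: record_id} maps on the token; no PII is compared."""
    shared = tokens_a.keys() & tokens_b.keys()
    return sorted((tokens_a[t], tokens_b[t]) for t in shared)


# Made-up records for illustration only. In this sketch the key is known to the
# two data owners and never to the party that performs the linkage.
KEY = b"example-shared-secret"
survey = {
    make_token(KEY, "María", "O'Neil", "1985-03-02"): "S1",
    make_token(KEY, "John", "Smith", "1979-11-30"): "S2",
}
awards = {
    make_token(KEY, "Maria", "ONeil", "1985-03-02"): "A7",
    make_token(KEY, "Jane", "Doe", "1990-01-15"): "A8",
}

print(link(survey, awards))  # [('S1', 'A7')]
```

In this sketch the linking party sees only tokens and record IDs, which is the separation of responsibilities a process flow like the one described above is meant to pin down. Exact tokens only match when the normalized identifiers agree character for character.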
NORC also considered both open-source and commercial PPRL software options to identify the considerations and precautions that apply when selecting software for linkage activities, such as the strengths or limitations of a particular tool given the PII available in the source data.
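One way the available PII can shape tool selection is tolerance for spelling variation. The sketch below is a generic illustration and does not describe any specific product evaluated in this project. It encodes a name as a set of Bloom-filter bit positions built from keyed hashes of its character bigrams, then compares encodings with a Dice coefficient so that near-matches score high. The filter size, number of hashes, key, and sample names are assumptions chosen for the example.

```python
import hashlib
import hmac


def bigrams(text):
    """Split a padded, lowercased string into overlapping two-character grams."""
    padded = f"_{text.lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(text, key, size=256, num_hashes=4):
    """Return the set of bit positions set by keyed hashes of each bigram."""
    positions = set()
    for gram in bigrams(text):
        for seed in range(num_hashes):
            digest = hmac.new(key, f"{seed}:{gram}".encode("utf-8"), hashlib.sha256).digest()
            positions.add(int.from_bytes(digest[:4], "big") % size)
    return positions


def dice(a, b):
    """Dice coefficient between two sets of bit positions."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


KEY = b"example-shared-secret"  # hypothetical key shared by the data owners
same = dice(bloom_encode("Katherine", KEY), bloom_encode("Katherine", KEY))
typo = dice(bloom_encode("Katherine", KEY), bloom_encode("Katherin", KEY))
other = dice(bloom_encode("Katherine", KEY), bloom_encode("Robert", KEY))
print(same, typo, other)
```

Identical names score 1.0, a one-letter variant scores well above an unrelated name, and a threshold on the score decides what counts as a match. When the source data carry clean, consistently formatted identifiers, exact tokens may be enough; when names are noisy, an encoding of this kind may be worth the added complexity.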
Result
This project is currently developing a recommended linkage strategy and guidance.
A data sharing agreement, along with guidance on selecting an appropriate PPRL tool and linkage strategy, is being delivered throughout the project lifecycle. A final methodology report will detail the PPRL tool selection and the methods for linking SED and PI data, as well as guidance and lessons learned to inform future linkages that rely on PPRL. A statistical analysis report will detail analyses of the final linked SED-PI data and what is needed to assess the feasibility of analyzing linked data in a secure environment to support evidence-based policymaking.
Project Leads
- Don Jang, Vice President (Project Director)
- Chrystine Tadler, Senior Statistician (Project Manager)
- Dean Resnick, Principal Data Scientist (Technical Lead)
- Chris Cox, Senior Fellow (Subject Matter Expert)