Linking Parent & Statistical Agency Data

Linking NCSES SED and NSF PI data to inform future linkages between a statistical agency and its parent agency
  • Client
    National Center for Science and Engineering Statistics
  • Dates
    2023 – 2025

Problem

In support of the National Secure Data Service (NSDS) Demonstration project, this project will demonstrate the use of an open-source privacy-preserving record linkage (PPRL) tool to link two disparate data sources.

The NSDS Demonstration project aims to strengthen data linkage and data access infrastructure. To inform the NSDS, the National Center for Science and Engineering Statistics (NCSES) within the U.S. National Science Foundation (NSF) aims to test the feasibility of an open-source PPRL tool. The project will develop a data sharing agreement between a federal statistical agency and its parent agency, link two disparate sources, and create an analytic dataset that can answer questions that neither source could answer alone.

This project will provide critical insights into best practices for data linkage, interoperability, and privacy-enhancing technologies. It will also help establish standardized agreements and methodologies that can be adopted across the federal data ecosystem. Additionally, the project will evaluate the benefits and challenges of different PPRL tools based on the nature of the data being linked, ensuring that future NSDS implementations have a solid foundation for secure and effective data sharing. 

Solution

NORC linked data using an open-source PPRL tool.

To demonstrate a linkage between a statistical agency and its parent agency, NORC linked the following two data sources:

  • NCSES Survey of Earned Doctorates (SED)
  • NSF Principal Investigator (PI) award data

Working closely with NCSES and NSF staff, NORC developed a data sharing agreement and documented the considerations that arise when such an agreement is made specifically between a statistical agency and its parent agency. As part of developing the agreement, NORC created a process flow that presents a suggested infrastructure and identifies responsibilities for data ownership, storage, processing, and linking. This infrastructure makes it possible to conduct PPRL and link sources without ever exchanging direct personally identifiable information (PII).
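
As an illustration of how two parties can link records without exchanging direct PII, the sketch below uses one simple PPRL approach: each party normalizes its identifiers and converts them to keyed-hash tokens (HMAC-SHA256) with a shared secret, and only the tokens are compared. This is a minimal sketch, not the tool or method used in this project. The field names, record IDs, sample records, and the `make_token`, `tokenize`, and `link` helpers are all hypothetical. Exact-match tokens also cannot tolerate typos; PPRL tools commonly use fuzzier encodings for that reason.

```python
import hashlib
import hmac
import unicodedata


def normalize(value: str) -> str:
    """Lowercase, strip accents and punctuation so formatting differences do not change the token."""
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return "".join(ch for ch in ascii_only.lower() if ch.isalnum())


def make_token(secret_key: bytes, first: str, last: str, birth_date: str) -> str:
    """Build a keyed hash over the normalized identifiers (hypothetical field choice)."""
    message = "|".join(normalize(part) for part in (first, last, birth_date))
    return hmac.new(secret_key, message.encode("utf-8"), hashlib.sha256).hexdigest()


def tokenize(records, secret_key: bytes) -> dict:
    """Run by each data owner on its own records; returns token -> record ID, with no PII."""
    return {
        make_token(secret_key, r["first"], r["last"], r["dob"]): r["id"]
        for r in records
    }


def link(tokens_a: dict, tokens_b: dict) -> list:
    """Run by the linking party, which sees only tokens and record IDs."""
    shared = tokens_a.keys() & tokens_b.keys()
    return sorted((tokens_a[t], tokens_b[t]) for t in shared)


# Assumption: the key is agreed between the data owners and withheld from the linker.
SECRET_KEY = b"example-shared-secret"

# Made-up records standing in for the two sources.
source_a = [
    {"id": "A-1", "first": "María", "last": "O'Neil", "dob": "1980-02-03"},
    {"id": "A-2", "first": "John", "last": "Smith", "dob": "1975-11-30"},
]
source_b = [
    {"id": "B-9", "first": "Maria", "last": "ONEIL", "dob": "1980-02-03"},
    {"id": "B-7", "first": "Jane", "last": "Doe", "dob": "1990-01-01"},
]

tokens_a = tokenize(source_a, SECRET_KEY)
tokens_b = tokenize(source_b, SECRET_KEY)
matches = link(tokens_a, tokens_b)
print(matches)
```

In this sketch the linking party receives only the token-to-ID maps, which mirrors the separation of ownership, processing, and linking responsibilities described above. A keyed hash is used rather than a plain hash so that someone without the key cannot rebuild tokens from a list of guessed names and birth dates.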

NORC also considered both open-source and commercial PPRL software options to identify the considerations and precautions that apply when selecting software for linkage activities, such as the strengths or limitations of a particular tool given the PII available in the source data.

Result

This project is currently in the process of developing a recommended linkage strategy and guidance.

A data sharing agreement, along with guidance on selecting an appropriate PPRL tool and linkage strategy, is being delivered over the project lifecycle. A final methodology report will detail the PPRL tool selection and the methods for linking SED and PI data, along with guidance and lessons learned to inform future linkages that rely on PPRL. A statistical analysis report will detail the analyses performed on the final linked SED-PI data and describe what is needed to assess the feasibility of analyzing linked data in a secure environment to support evidence-based policymaking.

Project Leads

"By pioneering the use of advanced privacy-protecting record linkage tools, we are creating new opportunities for data-driven insights that were previously unattainable. This PPRL project underscores our commitment to innovation in data linkage and privacy protection, setting a precedent for future collaborations between federal statistical agencies and their parent organizations."

Vice President
