Artificial Intelligence for Enhancing Data Quality, Standardization & Integration

Applying innovative data science methods to create datasets that support evidence-based decision-making
  • Client
    National Center for Science and Engineering Statistics
  • Dates
    2024 – 2026

Challenge

Informed policy decisions require high-quality data, but many data sources are inconsistent, incomplete, or difficult to use.

Data sources must go through an array of assessment, processing, and standardization steps before they can be used easily and accurately for analysis. These activities are necessary even for structured survey data, and they are more extensive still for nontraditional data sources such as administrative records, geospatial data, or sensor data.

The 2022 CHIPS and Science Act authorized a five-year demonstration project to explore the implementation of a future National Secure Data Service (NSDS), which will support evidence-based decision-making by improving access to, and the usability of, federal, state, and local government data assets. This project is part of the NSDS roadmap for developing a future toolkit that could streamline data preparation activities for the federal statistical system and promote the development of high-quality data.

Solution

NORC is exploring innovative applications of artificial intelligence to reduce the resources required to create high-quality data.

NORC’s solution begins with identifying the most promising areas for AI to streamline data preparation activities. We draw on interviews with federal statistical experts and subject matter experts in data quality, privacy, and ethics, as well as the literature and our own experience preparing high-quality data sources. Our assessment will consider the types of data that require preparation, the challenges those data sources present, existing tools, and relevant ethical or privacy concerns.

Result

Our toolkit will increase the quality and scope of data available to build evidence and support decision-making.

After identifying high-priority use cases for automation to support data standardization, integration, and quality, we are building, documenting, and packaging a toolkit that addresses these needs. Our toolkit will improve the accessibility and quality of data sources, especially nontraditional sources such as administrative records or geospatial data, which hold particular potential to shed new light on important policy questions and evidence gaps.
