Brandon Sepulvado

Pronouns: He/Him

Senior Research Methodologist
Brandon specializes in natural language processing, data infrastructure, and research on research.

Brandon is a senior research methodologist in the Methodology & Quantitative Social Sciences department at NORC at the University of Chicago. He leads work in natural language processing (NLP), research on research and science evaluation, and research data infrastructure. His NLP work applies advanced methods, such as deep learning and named-entity recognition, to generate insights from a wide range of text data, including social media, administrative records, and surveys. His expertise in data infrastructure helps clients make their data assets secure, accessible, and interoperable. As a sociologist, he draws on a deep knowledge of research on research to help clients across sectors monitor and evaluate investments in science, technology, and innovation.

Brandon has delivered research solutions that help a wide range of government and private clients gain insights into R&D investments. As Co-Principal Investigator on a project with America’s DataHub Consortium, Brandon is uncovering new insights about technology transfer among foreign-born scientists and engineers in the U.S. Brandon has worked with the ASPE Office of Science and Data Policy to understand federal investments, regulatory decisions, and research outcomes surrounding pharmaceutical products that are subject to extended intellectual property claims. Brandon has helped the National Science Foundation assess the quality of data from automated data collection processes to reduce the time and effort involved in funding program evaluations, and he works on behalf of the National Institutes of Health to understand collaboration patterns that stem from its Justice Community Opioid Innovation Network.

Brandon’s data science work focuses on NLP and data infrastructure. NLP examples include using named-entity recognition to identify new e-cigarette brands and flavors on social media and developing text classifiers to monitor commercial smokeless tobacco campaigns across social media platforms. His work has entailed the use of deep learning to build a recommender system connecting synthetic biologists with information concerning the ethical and societal implications of their research as well as the development of tools—using algorithms such as topic models and stochastic blockmodels—to process open-ended responses to large-scale surveys. Brandon is helping the NCSES to establish a research data infrastructure integrating survey, administrative, and bibliometric data about the scientific workforce. He was Principal Investigator on a team of experts across the U.S. to develop the Synthetic Biology Knowledge System, which aims to provide a single interface that transforms the way researchers access diverse types of synthetic biology data.

Brandon’s work has been published in peer-reviewed journals and conference proceedings across disciplines and has been supported by a Fulbright fellowship and by prestigious awards from the National Science Foundation, the Government of France, and the Countway Library of Medicine (Harvard University/Boston Medical Library). He currently serves as Secretary of the Washington Statistical Society and as Program Chair for the American Statistical Association’s Section on Text Analysis. In the past, he has been elected to multiple section councils of the American Sociological Association, has served as assistant editor for the American Sociological Review, and has been a member of the South Big Data Hub’s Data Science Education & Workforce working group. He regularly serves on the program committees of conferences, including the Conference on Empirical Methods in Natural Language Processing (EMNLP) and Widening NLP (WiNLP). Brandon frequently speaks around the globe on research on research, NLP, and data infrastructure.

Project Contributions

Support for the Justice Community Opioid Innovation Network

Building the evidence base for addressing opioid use disorder (OUD) stigma and access to OUD treatment

Client:

National Institute on Drug Abuse

STEM Learning Opportunities Before & After COVID-19 School Closures

Examining COVID-19’s impact on high school math and science course trajectories

Client:

National Science Foundation

Early Childhood Training and Technical Assistance Cross-System Evaluation

A first-of-its-kind evaluation to maximize the effectiveness of TTA provided to early childhood grantees

Client:

Office of Head Start and Office of Child Care in the Administration for Children and Families, U.S. Department of Health and Human Services

Developing a Supervised Model of Online COVID Vaccine Information

Examining AI biases in model development to accurately detect and classify online COVID-19 vaccine information

Client:

Amazon Web Services (AWS)

Data Usage Platform as a Federal Data Asset

User experience research and prototyping for a federal data usage platform

Client:

National Center for Science and Engineering Statistics

Data Concierge Models for a National Secure Data Service

Developing novel models and tools to assist federal data users

Client:

National Center for Science and Engineering Statistics

Curriculum & Learning Improvement Project (CLIP)

Creating an innovative data ecosystem to support classroom instruction and education research

Client:

Bill & Melinda Gates Foundation

Assessing the Effects of Smokeless Tobacco Influencer Marketing in the Rapidly Changing Media Environment

The first comprehensive study to examine the effects of social media promotion of smokeless tobacco use among rural and urban populations

Funder:

National Cancer Institute and the Food and Drug Administration

America’s DataHub Consortium

Demonstrating replicable processes for acquiring and providing secure access to linked data sources

Client:

National Center for Science and Engineering Statistics

Publications

  • Brandon Sepulvado Elected Chair of the Section on Text Analysis of the American Statistical Association

    Announcement | August 6, 2024

  • "Detecting and Mitigating Algorithmic Bias in Online Misinformation"

    Presentation | August 6, 2024

    Sepulvado, B., Burke-Garcia, A., Lerner, J.Y., Carter, C., Cutroneo, E., Tran, H. and Lafia, S.
  • "Detecting and Mitigating Algorithmic Bias in Online Misinformation"

    Presentation | May 15, 2024

    Carter, C., Sepulvado, B., Lerner, J.Y., Cutroneo, E., Tran, H., Lafia, S. and Burke-Garcia, A.
  • Enhancing Survey Methodology with LLMs.

    Presentation | May 1, 2024

    Lerner, J.Y., Sepulvado, B., Bilgen, I., Christian, L. and Huang, L.
  • Optimizing Open-Ended Questions for Natural Language Processing and Enhancing Survey Research Quality with LLMs.

    Presentation | April 1, 2024

    Sepulvado, B., Lerner, J.Y., Bilgen, I., Christian, L. and Huang, L.
  • Metadata Considerations to Ensure Interoperability for an International Network of Infrastructure Projects.

    Presentation | October 1, 2023

    Sepulvado, B.
  • An NLP-based Approach to Record Linkage.

    Presentation | October 1, 2023

    Huang, L., Sepulvado, B., Resnick, D., Taub, J. and Betancourt, B.
  • Using NLP to Achieve a More Equitable and Inclusive Survey Design?

    Presentation | September 1, 2023

    Sepulvado, B., Fordyce, E., Hansen, C., Christian, L. and du Toit, N.