
The Promise & Pitfalls of AI-Augmented Survey Research

Expert View

Author

Joshua Lerner
Research Methodologist

October 2024

Artificial intelligence (AI) tools, particularly large language models (LLMs), are transforming survey research in ways that I find exciting and full of potential.

As someone deeply invested in the intersection of AI and the social sciences, I’ve seen firsthand how these tools can streamline research and unlock new possibilities. However, we must approach these possibilities with a balanced perspective: while AI-augmented surveys and survey methods offer enormous opportunities, LLMs cannot fully capture the nuances of human behavior that survey research is designed to explore.

Because AI tools process vast amounts of text quickly and efficiently, they can help us overcome some of the traditional roadblocks in survey research. The emerging pathways discussed below have exciting potential. 


Automating Coding of Open-Ended Responses

One standout application of LLMs is their ability to automate the coding and analysis of open-ended responses. Traditionally, this work has been labor-intensive, but AI can now categorize and interpret textual data more efficiently than humans—especially at scale. Like earlier natural language processing (NLP) models, LLMs have proven remarkably successful at replicating the quality of human coders while reducing direct costs.

This has the potential to reduce the workload for researchers and accelerate the analysis of qualitative data—a significant breakthrough that can revolutionize one of the most challenging and labor-intensive survey analysis tasks. Even when benchmarked against traditional machine learning approaches, LLMs perform comparably while requiring less statistical and programming knowledge to implement.
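To make this concrete, here is a minimal sketch of what such a coding pipeline can look like. The `call_llm` helper, the codeframe, and the prompt wording are all illustrative placeholders rather than any specific production system; the key step is validating the model’s codes against a human-coded subset before trusting them at scale.

```python
# Minimal sketch: zero-shot coding of open-ended responses against a fixed
# codeframe, validated against a human-coded subset. `call_llm` is a
# hypothetical stand-in for whichever LLM API you use.
from sklearn.metrics import cohen_kappa_score

CODEFRAME = ["economy", "health care", "immigration", "environment", "other"]

def call_llm(prompt: str) -> str:
    """Hypothetical helper: send `prompt` to your LLM provider, return text."""
    raise NotImplementedError("wire this to your provider's API")

def code_response(text: str) -> str:
    prompt = (
        "Assign this survey response to exactly one category from "
        f"{CODEFRAME}. Reply with the category name only.\n\n"
        f"Response: {text}"
    )
    label = call_llm(prompt).strip().lower()
    return label if label in CODEFRAME else "other"  # guard against drift

def spot_check(responses: list[str], human_codes: list[str]) -> float:
    """Agreement between LLM and human coders on a validation sample."""
    llm_codes = [code_response(r) for r in responses]
    return cohen_kappa_score(human_codes, llm_codes)
```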

Enhancing Questionnaire Design

Generative AI can also transform how we design surveys. By analyzing large datasets and previous survey responses, LLMs can help draft more effective, pointed questions and even suggest variations to reduce bias or improve clarity. This idea, presented by multiple research teams at AAPOR, is built on the premise that LLMs can be trained to learn the principles of good questionnaire design and then guide researchers as they write questions.

While LLMs are unlikely to replace trained survey methodologists any time soon, an LLM trained on question design best practices can generate solid first-draft questions for standard topics. The goal would not be to replace survey methodologists with LLMs, but to steer people designing surveys toward best practices without requiring additional oversight: a situation where LLMs might raise the floor on research quality, not the ceiling.
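As a rough illustration of the idea, the sketch below embeds a few widely cited design principles in a system prompt and asks a model for a first draft. It uses the OpenAI Python SDK for concreteness, but any provider would work; the model name and the principles listed are placeholders, not a vetted checklist.

```python
# Illustrative sketch of LLM-assisted question drafting. The model name and
# the design principles below are placeholders for whatever a team vets.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

DESIGN_PRINCIPLES = """You are a survey methodologist. Follow these rules:
- Ask about one concept per question (no double-barreled items).
- Use neutral wording; avoid leading or loaded terms.
- Offer balanced, mutually exclusive, exhaustive response options.
- Target an 8th-grade reading level."""

def draft_question(topic: str, model: str = "gpt-4o-mini") -> str:
    resp = client.chat.completions.create(
        model=model,
        temperature=0.3,
        messages=[
            {"role": "system", "content": DESIGN_PRINCIPLES},
            {"role": "user", "content": "Draft one closed-ended survey "
             f"question, with response options, about: {topic}"},
        ],
    )
    return resp.choices[0].message.content

print(draft_question("trust in local news"))
```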

Imputing Missing Public Opinion Data

LLMs can also help impute missing responses to complex, multicategory, and even open-ended questions. By analyzing patterns in existing responses, LLMs can estimate answers that were never collected. For instance, I’ve seen LLMs predict public opinion trends—like attitudes toward same-sex marriage—using historical survey data, even from periods when such questions weren’t asked.

This ability to “fill in the gaps” has opened up new ways to study long-term trends, giving us a clearer picture of societal change over time, even with incomplete data. Having the ability to retrodict public opinion, especially when the LLMs are trained on decades of public opinion data and individual socio-demographic trends around those opinions, should also allow us to explore perspectives from underrepresented or hard-to-reach populations.  
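A much-simplified sketch of the persona-prompting version of this idea (not the exact method of any paper cited below) conditions on the socio-demographics a survey did collect and asks the model to answer the item that was never fielded. It reuses the hypothetical `call_llm` helper from the coding sketch above.

```python
# Simplified sketch of persona-conditioned imputation: build a prompt from
# the socio-demographics that were collected, then ask the model to answer
# the item that was never fielded. `call_llm` is the hypothetical helper
# defined in the open-end coding sketch earlier.
def impute_opinion(respondent: dict, question: str, options: list[str]) -> str:
    persona = ", ".join(f"{k}: {v}" for k, v in respondent.items())
    prompt = (
        f"Answer as a survey respondent with this profile: {persona}.\n"
        f"Question: {question}\n"
        f"Choose exactly one of: {options}. Reply with the option only."
    )
    return call_llm(prompt).strip()

row = {"age": 42, "region": "Midwest", "education": "bachelor's degree",
       "party": "independent"}
answer = impute_opinion(row, "Do you favor or oppose same-sex marriage?",
                        ["favor", "oppose", "no opinion"])
```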

AI-Assisted Chatbots to Improve Survey Delivery

Another exciting frontier is using AI-powered chatbots to guide respondents through surveys. This has the potential to be a game changer for improving response quality and completion rates. These chatbots can clarify questions, prompt for more detailed answers, and create a more interactive survey experience. This could help reduce respondent fatigue and improve the depth and quality of the data we collect.

Further, this development would allow surveys to be tailored more closely to specific respondent needs and limitations, potentially improving both the respondent experience during the survey and the quality and consistency of the responses provided. There are open questions about how respondents will react to the chatbots: will they view them as annoyances, the way many view automated customer service agents, or will they react more favorably? The early evidence suggests they are an improvement, but this is still a burgeoning field of study.
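As a rough sketch of what an adaptive probe might look like, the loop below asks the model to generate one neutral follow-up when an open-ended answer looks too thin. The word-count threshold and single-probe limit are arbitrary illustrations, and `call_llm` is again the hypothetical helper from earlier.

```python
# Sketch of an adaptive follow-up loop: if an open-ended answer is too thin,
# have the model generate one clarifying probe. Thresholds are illustrative.
def probe_if_needed(question: str, answer: str, max_probes: int = 1) -> list[str]:
    transcript = [answer]
    for _ in range(max_probes):
        if len(answer.split()) >= 15:   # crude "enough detail" heuristic
            break
        probe = call_llm(
            f"A respondent was asked: '{question}' and replied: '{answer}'. "
            "Write one short, neutral follow-up question that invites more "
            "detail without suggesting any particular answer."
        )
        answer = input(probe + "\n> ")  # in production: the survey front end
        transcript.append(answer)
    return transcript
```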

Fraud Detection & Data Quality

AI also has a role to play in ensuring data quality, which is one of the major challenges in large-scale surveys. LLMs can be used to detect fraudulent or lower-quality responses by identifying patterns in open-ended answers that don’t match genuine human behavior.

I think we’re only beginning to tap into this area, but it’s crucial for maintaining the integrity of survey data, especially in a digital age where responses generated by AI could skew results. Ironically, AI is part of both the potential problem and the solution here.
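LLM-based detectors are only one signal. A simpler, non-LLM heuristic that often runs alongside them is flagging near-duplicate open-ended answers, which frequently indicate bots or copy-paste fraud. A minimal version using TF-IDF cosine similarity, with an illustrative (not standard) threshold:

```python
# Flag near-duplicate open-ended answers for manual review. The 0.9
# threshold is an illustrative choice, not an established standard.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def flag_near_duplicates(answers: list[str], threshold: float = 0.9):
    tfidf = TfidfVectorizer().fit_transform(answers)
    sims = cosine_similarity(tfidf)
    flagged = set()
    n = len(answers)
    for i in range(n):
        for j in range(i + 1, n):
            if sims[i, j] >= threshold:
                flagged.update((i, j))
    return sorted(flagged)  # row indices to review by hand
```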

Recognizing the Limitations of LLMs

As excited as I am about the potential of LLMs as research tools, I believe it’s equally important to acknowledge what they cannot do—at least not yet. Several limitations have become clear as we integrate LLMs into survey research, and these are areas where we need to be cautious.

Struggles to Align with Human Opinion Patterns

While there has been some hype, especially from AI companies, around the potential for LLMs to supplement or even replace human respondents in survey research, LLMs don’t always align tightly with the nuanced patterns of human opinion, particularly when those opinions are shaped by demographic factors like race, age, or political party. The more complex the interactions between demographics and identities, the more the models default to something generically true, without the subtle inconsistencies and peculiarities that mark individual policy opinions. While AI can approximate public sentiment, it often misses the subtle shifts in opinion we see across different groups. This limits how well we can use LLMs to predict future trends in public opinion or as a testing ground for not-yet-fielded survey questions: the data they produce are not messy enough, in the way real human responses are, to give us a good baseline for comparison.
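One way to put numbers on this mismatch is to compare the answer distribution of a synthetic sample against the human benchmark within each demographic cell. A minimal sketch using total variation distance, with invented counts purely for illustration:

```python
# Quantify how far a synthetic sample's answer distribution sits from the
# human benchmark within a subgroup, using total variation distance
# (0 = identical, 1 = disjoint). The counts below are invented.
import numpy as np

def total_variation(p: np.ndarray, q: np.ndarray) -> float:
    p, q = p / p.sum(), q / q.sum()
    return 0.5 * np.abs(p - q).sum()

# Answer counts for (favor, oppose, no opinion) in one demographic cell:
human = np.array([52, 38, 10])      # observed survey responses
synthetic = np.array([70, 25, 5])   # LLM "respondents", same prompt frame
print(f"TV distance: {total_variation(human, synthetic):.3f}")
```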

Difficulty Replicating Human Language in Open-Ended Responses

Another limitation worth exploring is the specific difficulty LLMs have generating synthetic open-ended responses. I have seen proclamations that LLMs can reduce or eliminate the need for cognitive testing of open-ended questions because we can learn from the biases in the generated answers without spending resources on human oversight. But while LLMs can generate coherent, polished answers, they often lack the raw, varied, and sometimes messy nature of real human responses. In my experience, this is where the richness of qualitative data is lost: LLM-generated responses tend to be more uniform, limiting our ability to draw deep insights from the authentic, personal ways respondents express their thoughts.

Bias in Training Data

I’m particularly concerned about the risk of bias in LLM outputs. These models are trained on vast amounts of text from the internet, and as a result, they can reflect the biases present in that data. When we rely on LLMs to autogenerate any measurement, we risk perpetuating those biases, especially when it comes to underrepresented or marginalized groups. Even with fine-tuning, LLMs can struggle to accurately reflect the opinions of underrepresented groups and can carry implicit political biases. This is why, even if you’re using LLMs completely transparently in your research pipeline, extensive validation and spot-checking are necessary to ensure these biases do not cause larger problems downstream. While all LLMs will have biases based on their training, not all LLMs will have the same biases. Testing multiple different LLMs—especially using open-source LLMs as much as possible—can provide an implicit safeguard against some types of bias.
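A bare-bones version of that multi-model spot check might look like the following, where the model names are placeholders and `call_llm` is the hypothetical helper from earlier, here assumed to accept a `model` argument:

```python
# Ask several different LLMs the same item and compare answer distributions.
# Model names are placeholders; `call_llm` is the hypothetical helper from
# earlier, assumed here to take a `model` keyword selecting the backend.
from collections import Counter

MODELS = ["model-a", "model-b", "open-source-model-c"]  # placeholder names

def answer_distribution(question: str, options: list[str],
                        model: str, n: int = 100) -> Counter:
    prompt = (f"Question: {question}\nChoose one of {options}. "
              "Reply with the option only.")
    return Counter(call_llm(prompt, model=model).strip() for _ in range(n))

# Large gaps between models on the same item are a red flag that an answer
# reflects training data, not a stable estimate of opinion.
dists = {m: answer_distribution("Do you favor or oppose a carbon tax?",
                                ["favor", "oppose", "no opinion"], m)
         for m in MODELS}
```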

Ethical Considerations & Privacy

We can’t ignore the ethical implications of using AI in survey research. Predicting someone’s opinion without their explicit input raises serious questions about privacy and consent. As we integrate AI more deeply into research, it’s essential to ensure that the human element is still front and center. We need to remain transparent and ethical in how we use AI to inform surveys, particularly in high-stakes areas like political polling or market research. 

The Challenges of Replication & Reproducibility

Finally, a major concern with using LLMs in any research capacity is that they do not always lend themselves to the replication and reproducibility standards that have become the status quo in much of the social sciences. Every time a model is updated or retrained, it can change how it generates responses, making it challenging to reproduce earlier findings even with the same input data and prompts. LLMs are also sensitive to subtle variations in how questions are framed, which can lead to inconsistencies in results. At NORC, we care deeply about the replication and reproducibility of our results. All of this highlights the need for transparency and version control when using LLMs in surveys: documenting the model version and prompt structure becomes critical to ensure that results are reliable and reproducible over time.
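In practice, that documentation can be as simple as appending a provenance record for every LLM call. A minimal sketch, where the field names are one reasonable convention rather than any standard schema:

```python
# Minimal provenance record for an LLM-assisted analysis step: pin down
# everything needed to re-run or audit it later. Field names are one
# reasonable convention, not a standard schema.
import datetime
import hashlib
import json

def log_llm_call(model: str, prompt: str, temperature: float,
                 output: str, path: str = "llm_provenance.jsonl") -> None:
    record = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "model": model,                      # exact version string if available
        "temperature": temperature,
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "prompt": prompt,
        "output": output,
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
```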

A Path Forward: Integrating AI with Caution

I’m optimistic about the future of AI in surveys, but I also believe a balanced approach is necessary. LLMs are potent tools that can greatly enhance what we do. Still, they shouldn’t replace the learned expertise of trained survey methodologists and social scientists, who balance technical skills with human insight and years of practical experience.

As we continue exploring these technologies, we must remember the limitations outlined above and proceed cautiously. At the same time, we should recognize that AI models will become another important tool in the methodologist’s and researcher’s toolbox. We need extensive experimentation and rigorous testing to learn how to use these models effectively and to understand the biases they introduce.

As researchers, we need to think through some of these big questions and design research agendas around answering them. Some promising areas include understanding: 

  • How adaptive chats and surveys impact respondent behavior
  • What biases are introduced when coding open-ends with LLMs
  • Whether we can build LLMs that can generate completely synthetic respondents that capture the nuances and complexities humans are known for

Doing these things rigorously, and with an eye toward transparency, is the only way to honestly advance the state of knowledge in this field, and it is the way forward for AI and survey research.

As we can already see in AI, there will always be people interested in promising the moon with each innovation and, simultaneously, people approaching each new technology with the unmoored skepticism of modern-day Luddites. The future of AI in survey research is undoubtedly exciting, but as with any transformative technology, we need to approach it with a clear-eyed view of its strengths and weaknesses. If we can stay grounded in the real-world complexities of human behavior while leveraging AI’s efficiency and scalability, I’m confident that we can unlock new insights that benefit both researchers and the public.



Relevant Bibliography

Argyle, Lisa P., Ethan C. Busby, Nancy Fulda, Joshua R. Gubler, Christopher Rytting, and David Wingate. "Out of One, Many: Using Language Models to Simulate Human Samples." Political Analysis 31 (2023): 337-351. https://doi.org/10.1017/pan.2023.2

Bell, Kelly, Sherin Mattappallil, Sarah Kahl, and Jordan Forrest. "Cracking the Code: AI vs. Human Accuracy in Open-Ended Questions." Paper presented at the American Association for Public Opinion Research (AAPOR) Conference, Atlanta, GA, 2024. 

Bisbee, James, Joshua D. Clinton, Cassy Dorff, Brenton Kenkel, and Jennifer M. Larson. "Synthetic Replacements for Human Survey Data? The Perils of Large Language Models." Political Analysis 32 (2024): 401-416. https://doi.org/10.1017/pan.2024.5

Buskirk, Trent D., Adam Eck, and Jerry Timbrook. "The Task Is to Improve the Ask: An Experimental Approach to Developing Optimal Prompts for Generating Survey Questions from Generative AI Tools." Paper presented at the American Association for Public Opinion Research (AAPOR) Conference, Atlanta, GA, 2024. 

Christiansen, William, and Matthew Wagner. "Using Generative AI to Design Survey Vignettes in Public Opinion Research." Paper presented at the American Association for Public Opinion Research (AAPOR) Conference, Atlanta, GA, 2024. 

Geisen, Emily. "Prompting Insight: Enhancing Open-Ended Survey Responses with AI-Powered Follow-Ups." Paper presented at the American Association for Public Opinion Research (AAPOR) Conference, Atlanta, GA, 2024. 

Jaiman, Ashish. "Large Language Models [LLMs]: An Overview." Medium, 2023. 

Johnson, Edward Paul, and Carole Hubbard. "Combating AI Bots with Imagery-Powered Open Ends." Paper presented at the American Association for Public Opinion Research (AAPOR) Conference, Atlanta, GA, 2024. 

Kelley, Sarah, and Claire Kelley. "Customizing Generative AI Tools for Systematic Literature Review." Paper presented at the American Association for Public Opinion Research (AAPOR) Conference, Atlanta, GA, 2024. 

Kelley, Sarah. "ChatGPT, Can You Help Me Understand If I Am Eligible for Student Aid? Using Customized Chatbots to Code Policy Documents and Help Students Navigate Resources." Paper presented at the American Association for Public Opinion Research (AAPOR) Conference, Atlanta, GA, 2024. 

Kim, Junsol, and Byungkyu Lee. "AI-Augmented Surveys: Leveraging Large Language Models and Surveys for Opinion Prediction." University of Chicago, 2023. 

Lerner, Joshua Y., Brandon Sepulvado, Ipek Bilgen, Leah Christian, and Lilian Huang. "The Questionable Utility of LLMs for Open-Ended Question Design Research." Paper presented at the American Association for Public Opinion Research (AAPOR) Conference, Atlanta, GA, 2024. 

Link, Michael W., and Nick Bertoni. "Testing Large Language Models to Identify Themes and Sentiment in the Voice of the Respondent: Efficiencies and Cautions." Paper presented at the American Association for Public Opinion Research (AAPOR) Conference, Atlanta, GA, 2024. 

Nesho, Dritan, Luqman Osman, Jarod Kelly, Erik Green, and Mark Clipsham. "Detecting Fraud through Open-Ended Questions with Language Models." Paper presented at the American Association for Public Opinion Research (AAPOR) Conference, Atlanta, GA, 2024. 

Padgett, Zoe, Antonio Maiorino, and Sam Gutierrez. "Evaluating the Quality of Questionnaires Created with SurveyMonkey’s Build with AI." Paper presented at the American Association for Public Opinion Research (AAPOR) Conference, Atlanta, GA, 2024. 

Rogers, Benjamin, Catherine Lamoreaux, and Valerie Ryan. "Curating Themes in Open-Ended Survey Responses with AI." Paper presented at the American Association for Public Opinion Research (AAPOR) Conference, Atlanta, GA, 2024. 

Steiger, Darby, and Robyn Rapoport. "Modernization of Qualitative Research: Will AI Ever Replace Us?" Paper presented at the American Association for Public Opinion Research (AAPOR) Conference, Atlanta, GA, 2024. 

Tao, Ran, Rosalynn Yang, Gina Walejko, Yongwei Yang, and Brianna Groenhout. "Using Large Language Models (LLM) to Pretest Survey Questions." Paper presented at the American Association for Public Opinion Research (AAPOR) Conference, Atlanta, GA, 2024. 

Velez, Yamil R., and Patrick Liu. "Confronting Core Issues: A Critical Assessment of Attitude Polarization Using Tailored Experiments." American Political Science Review (2024): 1-18. https://doi.org/10.1017/S0003055424000819

von der Heyde, Leah, Anna-Carolina Haensch, and Alexander Wenz. "Vox Populi Vox AI? Using Language Models to Estimate German Public Opinion." Paper presented at the American Association for Public Opinion Research (AAPOR) Conference, Atlanta, GA, 2024. 

Wu, Patrick Y., Jonathan Nagler, Joshua A. Tucker, and Solomon Messing. "Large Language Models Can Be Used to Estimate the Latent Positions of Politicians." arXiv preprint arXiv:2303.12057 (2023).

Zhang, Yue, Yafu Li, Leyang Cui, et al. "Siren’s Song in the AI Ocean: A Survey on Hallucination in Large Language Models." arXiv preprint arXiv:2309.01219 (2023). https://arxiv.org/abs/2309.01219

Suggested Citation

Lerner, J. (2024, October 9). The Promise and Pitfalls of AI-Augmented Survey Research. [Web blog post]. NORC at the University of Chicago. Retrieved from www.norc.org.

