PUBLICATION • Beijer Discussion Paper
Speech-based measurement of polarization using text-as-data
Political polarization research depends on valid measurement of policy positions, yet established data sources (roll-call votes, elite surveys, expert judgments) are limited in temporal coverage and substantive scope. Parliamentary speech offers an alternative: abundant, publicly available text that reflects real-time political positioning across all policy debates. However, whether speech-based measures recover the same latent constructs as survey instruments remains an open empirical question, particularly given concerns about systematic reduction in observed variance in LLM-based measurement. This article validates speech-based measurement of policy positions and polarization using large language models (LLMs) to classify parliamentary speech from Swedish MPs (1998–2022) into survey response categories from Riksdagsundersökningen, a biennial elite survey. Employing retrieval-augmented generation (RAG) and multi-model validation, we assess construct validity through correlation analysis, rank-order agreement, and comparison to manual annotation. Results demonstrate strong construct validity: speech-based party–year–topic means achieve Spearman correlations exceeding ρ = 0.85 with survey benchmarks, correctly rank-order parties on most policy dimensions (Kendall’s τ > 0.78), and reproduce temporal trends in polarization. However, systematic positive bias (speech-based positions are more extreme than survey responses) and issue-specific performance variation indicate that speech and surveys measure overlapping but distinct constructs: public signaling versus private attitudes. This divergence reflects substantive political dynamics (party discipline, strategic ambiguity, audience targeting) rather than measurement failure. Speech-based polarization indices (Dalton index, bloc distance) closely track survey-based estimates, capturing known features of Swedish party competition including left-right bloc structure and issue-specific temporal dynamics.
The findings demonstrate that parliamentary speech can serve as a valid data source for measuring policy positions when appropriately validated, expanding methodological capacity for historical and comparative research where survey data are unavailable. We distinguish measurement validity (correlation with external criteria) from construct equivalence (identical latent concepts), clarifying that method divergence may reflect what is measured rather than how well it is measured.
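As an illustration of one of the polarization indices named above, the following is a minimal sketch of the Dalton index in its commonly cited form: the square root of the vote-share-weighted squared deviation of party positions from the weighted system mean, with deviations divided by half the scale range. The abstract does not state the paper's exact specification, so the 0–10 position scale, the percentage weighting, and the names `dalton_index`, `positions`, `shares`, and `half_range` are assumptions made for this example only.

```python
from math import sqrt


def dalton_index(positions, shares, half_range=5.0):
    """Dalton-style polarization index (illustrative sketch, not the paper's code).

    positions  : party positions on an assumed 0-10 scale
    shares     : party weights (vote or seat shares, any positive units)
    half_range : half the width of the position scale (5.0 for 0-10); assumption
    """
    if len(positions) != len(shares) or not positions:
        raise ValueError("positions and shares must be non-empty and equal length")
    total = sum(shares)
    if total <= 0:
        raise ValueError("shares must sum to a positive number")

    # Normalize weights to percentages so the index runs from 0 to 10.
    pct = [100.0 * s / total for s in shares]

    # Share-weighted mean position of the party system.
    mean = sum(w * p for w, p in zip(pct, positions)) / 100.0

    # Weighted sum of squared, rescaled deviations from the system mean.
    spread = sum(w * ((p - mean) / half_range) ** 2 for w, p in zip(pct, positions))
    return sqrt(spread)


# Two equally sized parties at opposite ends of the scale: maximal polarization.
maximal = dalton_index([0, 10], [50, 50])

# All parties at the same position: no polarization.
minimal = dalton_index([5, 5, 5], [1, 2, 3])
```

Under these assumptions the index is 0 when all parties share one position and 10 when two equal-sized parties sit at the scale endpoints, so speech-based and survey-based party means can be compared on the same footing by passing each set of positions with identical weights.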
Engström, G., S. Axelsson, J. Gars, S. Källman, and T. Lindahl. 2026. Beijer Discussion Paper 285: Speech-based measurement of polarization using text-as-data. Beijer Discussion Paper Series.