IRT models for expert-coded panel data
Data sets quantifying phenomena of social-scientific interest often use multiple experts to code latent concepts. While it remains standard practice to report the average score across experts, experts likely vary in both their expertise and their interpretation of question scales. As a result, the mean may be an inaccurate statistic. Item-response theory (IRT) models provide an intuitive method for taking these forms of expert disagreement into account when aggregating ordinal ratings produced by experts, but they have rarely been applied to cross-national expert-coded panel data. In this article, we investigate the utility of IRT models for aggregating expert-coded data by comparing the performance of various IRT models to the standard practice of reporting average expert codes, using both real and simulated data. Specifically, we use expert-coded cross-national panel data from the V–Dem data set to both conduct real-data comparisons and inform ecologically-motivated simulation studies. We find that IRT approaches outperform simple averages when experts vary in reliability and exhibit differential item functioning (DIF). IRT models are also generally robust even in the absence of simulated DIF or varying expert reliability. Our findings suggest that producers of cross-national data sets should adopt IRT techniques to aggregate expert-coded data of latent concepts.
*Earlier drafts presented at the 2016 MPSA Annual Convention, the 2016 IPSA World Convention and the 2016 V–Dem Latent Variable Modeling Week Conference. The authors thank Chris Fariss, Pippa Norris, Jon Polk, Shawn Treier, Carolien van Ham and Laron Williams for their comments on earlier drafts of this paper, as well as V–Dem Project members for their suggestions and assistance. This material is based upon work supported by the National Science Foundation under Grant No. SES-1423944, PI: Daniel Pemstein, by Riksbankens Jubileumsfond, Grant M13-0559:1, PI: Staffan I. Lindberg, V–Dem Institute, University of Gothenburg, Sweden; by Swedish Research Council, 2013.0166, PI: Staffan I. Lindberg, V–Dem Institute, University of Gothenburg, Sweden and Jan Teorell, Department of Political Science, Lund University, Sweden; by Knut and Alice Wallenberg Foundation to Wallenberg Academy Fellow Staffan I. Lindberg, V–Dem Institute, University of Gothenburg, Sweden; by University of Gothenburg, Grant E 2013/43; as well as by internal grants from the Vice-Chancellor’s office, the Dean of the College of Social Sciences, and the Department of Political Science at University of Gothenburg. We performed simulations and other computational tasks using resources provided by the Notre Dame Center for Research Computing (CRC) through the High Performance Computing section and the Swedish National Infrastructure for Computing (SNIC) at the National Supercomputer Centre in Sweden. We specifically acknowledge the assistance of In-Saeng Suh at CRC and Johan Raber at SNIC in facilitating our use of their respective systems.