This is the MRP hex sticker. It is based on a Mondrian painting known as Boogie Woogie.

CHANGE TO ONLINE CONFERENCE: On March 8th, following a change in Columbia’s COVID-19 policies, we decided to hold the scheduled MRP conference via Zoom rather than in person. Please accept our apologies for the inconvenience this causes.

We are excited to present two full days of invited and contributed talks and discussions. The conference will be fully online. The following times are in EDT.

Day 1 (Friday)

8:30 - 9 am The event will open at 8:30 so that attendees can check that they can join the meeting and hear the audio.

9 am Conference welcome

9:15 am Invited Talk 1
Elizabeth Tipton RCT Designs for Causal Generalization

10 am Contributed Talk 1
Benjamin Skinner Why did you go? Using multilevel regression with poststratification to understand why community college students exit early

10:20 am Coffee and morning tea break

10:50 am Invited Talk 2
Jon Zelner From person-to-person transmission events to population-level risks: MRP as a tool for maximizing the public health benefit of infectious disease data

11:35 am Contributed Talk 2
Katherine Li Multilevel Regression and Poststratification with Unknown Population Distributions of Poststratifiers

11:55 am LUNCH break

1 pm Invited Talk 3
Qixuan Chen Use of administrative records to improve survey inference: a response propensity prediction approach

1:45 pm
Lauren Kennedy and Andrew Gelman 10 things to love and hate about MRP

2:15 pm Birds of a Feather Session (separate meetings, TBA)

3 pm Approximate end time

Day 2 (Saturday)

9:25 am Contributed Talk 4 Shiro Kuriwaki and Soichiro Yamauchi Partial Pooling with Weights: Parallels and Differences between MRP and Weighting

9:45 am Contributed Talk 5 Roberto Cerina Micro Learning from Synthetic Data

10:05 am Contributed Talk 6:
Douglas Rivers Modeling elections with multiple candidates

10:25 am Coffee and morning tea break

11 am Invited Talk 5 Yajuan Si Statistical Data Integration and Inference with Multilevel Regression and Poststratification

11:45 am Contributed Talk 7 Yutao Liu (Model-based prediction using auxiliary information)

12:05 pm LUNCH break

1:05 pm Invited Talk 6 Samantha Sekar Evaluating the construct validity of MRP climate change opinion predictions

1:50 pm Contributed Talk 8
Chris Hanretty Hierarchical related regression for individual and aggregate electoral data

2:10 pm Contributed Talk 9 Lucas Leemann Improved Multilevel Regression with Post-Stratification Through Machine Learning (autoMrP)

2:30 pm Afternoon tea and coffee break

3:05 pm Invited Talk 7
Leontine Alkema Got data? Quantifying the contribution of population-period-specific information to model-based estimates in demography and global health

3:50 pm Contributed Talk 10:
Jonathan Gellar Are SMS (text message) surveys a viable form of data collection in Africa and Asia?

4:10 pm Contributed Talk 11:
Charles Margossian Laplace approximation for speeding computation of multilevel models

4:30 pm Andrew Gelman and Lauren Kennedy
Final comments and questions

Talks/events that were scheduled or accepted but could not take place due to challenges surrounding the COVID-19 pandemic:

Rohan Alexander Looks or brains? Choosing the optimal set of co-variates to satisfy competing priorities in multi-level regression with post-stratification

Jeffrey Lax Applied MRP in Political Science: Tips, Tricks, and Headaches

Christopher Skovron Assessing Differential Non-Response Over Time in Online Panel Surveys

Discussants: Jennifer Hill and Shira Mitchell

We thank you for your contributions!


Douglas Rivers

Stanford University and YouGov
MRP has been very successful in forecasting election outcomes, but the typical textbook examples (with only two choices) are too simple for real applications. For example, the number of candidates is usually more than two and other alternatives (such as being undecided or not voting) are possible. I describe prior choices and estimation for multinomial logit models of electoral choice, with applications to recent U.S. and U.K. elections.
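As a rough illustration of the multinomial logit setup described above, the sketch below maps one linear predictor per alternative to choice probabilities with a softmax. The alternative names and predictor values are hypothetical and chosen only for illustration; this is not the speaker's model, which also involves priors and estimation that are not shown here.

```python
import math


def choice_probabilities(utilities):
    """Map per-alternative linear predictors to probabilities (softmax)."""
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(utilities.values())
    exp_u = {alt: math.exp(u - m) for alt, u in utilities.items()}
    total = sum(exp_u.values())
    return {alt: e / total for alt, e in exp_u.items()}


# Hypothetical linear predictors for one poststratification cell.
# More than two candidates, plus "undecided" and "not_voting".
utilities = {
    "candidate_a": 0.4,
    "candidate_b": 0.1,
    "candidate_c": -0.8,
    "undecided": -1.2,
    "not_voting": 0.0,
}

probs = choice_probabilities(utilities)
for alt, p in probs.items():
    print(f"{alt}: {p:.3f}")
```

In an MRP workflow these per-cell probabilities would then be poststratified over the population cells, one alternative at a time.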

Yutao Liu

Yutao Liu, Andrew Gelman & Qixuan Chen
Columbia University
Survey inference can be challenged by non-representativeness of survey samples, either imperfect probability samples or non-probability samples without a probability sampling design. We consider improving survey inference with a potentially non-representative sample in the presence of high-dimensional auxiliary information, which is measured in the survey sample and also available for the population through sources such as census or administrative records. We propose Bayesian model-based predictive methods for estimating finite population totals by modeling the conditional distribution of the survey outcome using Bayesian additive regression trees (BART) and soft BART. Inspired by Little and An (2004), we also explore modified methods that estimate the propensity score for a unit to be included in the sample using probit BART and include it as a covariate in the model alongside the auxiliary variables. We show through simulation studies and a real survey that the Bayesian model-based methods using (soft) BART improve survey inference.

Improved Multilevel Regression with Post-Stratification Through Machine Learning (autoMrP)

Lucas Leemann, Philipp Broniecki & Reto Wuest
University of Zürich, Essex University, University of Bergen
Multilevel regression with post-stratification (MrP) has quickly become the gold standard for small area estimation. While the first MrP models did not include context-level information, current applications almost always make use of such data. When using MrP, researchers are faced with three problems: how to select features, how to specify the functional form, and how to regularize the model parameters. These problems are especially important with regard to features included at the context level. We propose a systematic approach to estimating MrP models that addresses these issues by employing a number of machine learning techniques. We illustrate our approach based on 89 items from public opinion surveys in the US and demonstrate that our approach outperforms a standard MrP model, in which the choice of context-level variables has been informed by a rich tradition of public opinion research.

Looks or brains? Choosing the optimal set of co-variates to satisfy competing priorities in multi-level regression with post-stratification

Monica Alexander & Rohan Alexander
University of Toronto
At the heart of multi-level regression with post-stratification (MRP) is a tension between the desires of the model and the needs of the post-stratification dataset. The model wants more coefficients. But adding more coefficients to an MRP model not only requires considering the usual bias-variance trade-off, but also the effect on the post-stratification dataset. The cell counts in the post-stratification dataset are subject to uncertainty and adding an additional coefficient to the model usually requires making the cells smaller, hence increasing the effect of that uncertainty. In this paper we examine the trade-offs between these two competing effects. Using simulated data, we explore the relationship between increasing model accuracy and decreasing post-stratification precision as additional covariates are added, and suggest a general approach to find an optimal balance between the two. We then illustrate these ideas on real data from Australia and the UK.

Chris Hanretty
Royal Holloway, University of London
Ordinarily MRP is used to estimate, at the local level, a quantity which is otherwise unknown (for example: state opinion on the death penalty). In this presentation, I look at the benefits of running MRP “in reverse”, using known aggregate outcomes to sharpen our inferences about individual level predictors of opinion. I estimate a hierarchical related regression (HRR; Jackson et al. 2008) model which combines individual level data from a national election study together with known election outcomes at the level of the electoral district. I use the model to draw inferences about the behaviour of ethnic minority voters in the 2010 UK general election. I discuss the advantages of HRR in making our estimates of the association between belonging to an ethnic minority and voting more precise. I also discuss the computational issues involved in estimating a joint individual and aggregate model.

Roberto Cerina

Roberto Cerina, Gianluca Baio, Stephen Fisher, Raymond Duch
Nuffield College - Oxford University, Department of Statistical Sciences - University College London, Trinity College - Oxford University, Nuffield College - Oxford University
During the 2019 UK General Election, a number of prominent pollsters released highly accurate MRP projections, with Survation’s model correctly calling 94.3% of constituencies. These efforts rely on large high-quality individual-level samples (typically in the neighbourhood of 100,000 respondents), which imply huge infrastructure and sampling costs. In this paper, we leverage machine-learning techniques to offer an attractive alternative which relies exclusively on publicly available data.
First, we leverage random forests and multiple imputation with chained equations to augment census-microdata with political variables from the British Election Study. Second, we reconstruct individual-level data for each publicly available opinion poll released prior to the election, by sampling from the augmented microdata and raking to the poll’s cross-tabs. Third, we train random forests with the synthetic data, tuning hyper-parameters to avoid attenuation-bias. Finally, we apply post-stratification to obtain high-quality constituency-level estimates, benchmarking against actual election outcomes.
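The raking step mentioned in this abstract can be sketched with iterative proportional fitting: record weights are rescaled, one variable at a time, until the weighted shares match published margins. The records, variable names, and target shares below are hypothetical, and this is a generic illustration rather than the authors' pipeline.

```python
def rake(records, margins, n_iter=50):
    """Iterative proportional fitting of record weights to target margins.

    records: list of dicts holding categorical fields
    margins: {variable: {level: target share}}, shares sum to 1 per variable
    Returns one weight per record; the weights sum to 1.
    """
    weights = [1.0 / len(records)] * len(records)
    for _ in range(n_iter):
        for var, targets in margins.items():
            # Current weighted share of each level of this variable.
            current = {}
            for rec, w in zip(records, weights):
                current[rec[var]] = current.get(rec[var], 0.0) + w
            # Rescale so this variable's shares match its targets exactly.
            weights = [
                w * targets[rec[var]] / current[rec[var]]
                for rec, w in zip(records, weights)
            ]
    return weights


# Hypothetical synthetic respondents drawn from microdata.
records = [
    {"age": "young", "vote": "A"},
    {"age": "young", "vote": "A"},
    {"age": "young", "vote": "B"},
    {"age": "old", "vote": "A"},
    {"age": "old", "vote": "B"},
    {"age": "old", "vote": "B"},
]

# Hypothetical poll cross-tab margins to match.
margins = {
    "age": {"young": 0.3, "old": 0.7},
    "vote": {"A": 0.55, "B": 0.45},
}

weights = rake(records, margins)
```

Raking needs every target level to appear in the records with positive weight; otherwise the rescaling step divides by zero or cannot reach the target.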

Charles Margossian

Department of Statistics, Columbia University
MRP models, like most hierarchical models, are powerful tools to describe the world but often frustrate our inference algorithms. This is due to the geometry of the posterior induced by these models, and how this geometry interacts with Markov chain Monte Carlo samplers. Much of our geometric grief comes from high-dimensional group level parameters. Fortunately, we can marginalize out these parameters using a nested Laplace approximation, which leads to fast and reliable inference. In this talk, I present a prototype in Stan whereby I couple the Laplace approximation with Hamiltonian Monte Carlo sampling. The end product allows us to do fast inference on high-dimensional structured group parameters, and use robust HMC for the hyper-parameters, which we can regularize with sophisticated priors.

Why did you go? Using multilevel regression with poststratification to understand why community college students exit early

Justin Ortagus, Benjamin Skinner & Melvin Tanner
University of Florida
Although graduating from a community college can be economically rewarding, most students leave without earning a degree, including many who have made substantial academic progress. Because students in this latter group are more likely to graduate should they return, many community colleges want to target them in re-enrollment campaigns. Yet schools often have limited understanding of the factors most critical to early exit and, subsequently, how best to support students’ return. To help fill this information gap, we partnered with five high-enrollment Florida community colleges to survey over 27,000 former students with substantial credits. We asked the students to choose from short lists of various factors that might have contributed to their premature departure. Due to a response rate of less than 10%, we use MRP to reweight the responses so that they better represent the original population of students who were sent the survey.

Assessing Differential Non-Response Over Time in Online Panel Surveys

Drew Linzer & Christopher Skovron
Civiqs, Civiqs and Northwestern Institute on Complex Systems
It is widely known that potential survey-takers respond at different rates based on a variety of personal and contextual factors. These can include demographic characteristics such as age, education, and socio-economic status, as well as more idiosyncratic, personal tendencies including strength of partisan identification, or political knowledge and interest in current affairs. Fortunately for survey researchers, many of these factors are fairly straightforward to measure and correct for when producing a sampling design. However, these approaches can break down when the correlates of self-selection interact with significant news and events of the day – often in unpredictable ways, with uncertain consequences. In this paper, we analyze five years of daily online survey response data to assess the magnitude of differential survey non-response, its major predictors, and the characteristics of those most prone to select into – or out of – polls, and under what circumstances. The data, collected using the Civiqs online survey panel, provide granular insight at both the demographic and temporal level of analysis. We show which recent political events led to changes in non-response rates, among which groups, and by how much. We assess the implications for analysts using dynamic over-time MRP methods and suggest approaches to mitigating the effect of differential non-response on results.

Are SMS (text message) surveys a viable form of data collection in Africa and Asia?

Jonathan Gellar
With the rapid expansion of mobile phone use, Short Message Service (SMS) presents an opportunity to conduct inexpensive, fast and scalable surveys. However, the use of these surveys to obtain nationally representative estimates suffers from two limitations: they are not representative of the target population, and they may contain other intrinsic biases that are induced by the survey mode. Multilevel regression with poststratification (MRP) can address the former problem, but the intrinsic biases still remain. We introduce a procedure to calibrate MRP results using a relatively small sample of face-to-face data that is known (or assumed) to be unbiased. We apply this method to the problem of estimating financial inclusion (access to formal banking systems) in eight countries in Africa and Asia. Our calibrated MRP approach is effective in replicating estimates from a larger and much more expensive gold standard survey.

Multilevel Regression and Poststratification with Unknown Population Distributions of Poststratifiers

Katherine Li and Yajuan Si
Multilevel regression and poststratification stabilizes small area estimation via hierarchical model smoothing and adjusts for the sample nonrepresentativeness by poststratifying to the population information. The utility of MRP is thus limited by the number of poststratifiers for which we have the population distribution. Incorporating poststratifiers whose population distributions are not available may improve precision and reduce bias. We propose flexible, nonparametric methods to impute a poststratifier’s values for unsampled units and integrate the imputation uncertainty with the finite population inference of interest under a systematic Bayesian framework. We present results from simulation studies to illustrate the bias and efficiency gains of imputing the unsampled poststratifiers under MRP compared to alternative approaches.
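For readers new to the poststratification step referred to here, a minimal sketch follows: cell-level model estimates are averaged using known population cell counts as weights. The cells, estimates, and counts are hypothetical. The sketch assumes the counts are known, which is exactly the requirement this abstract proposes to relax, and it does not implement the authors' imputation method.

```python
def poststratify(cell_estimates, cell_counts):
    """Population-weighted average of cell-level estimates."""
    total = sum(cell_counts.values())
    weighted_sum = sum(cell_estimates[c] * cell_counts[c] for c in cell_counts)
    return weighted_sum / total


# Hypothetical model-based estimates for each (age, education) cell,
# e.g. posterior means from a multilevel regression.
cell_estimates = {
    ("18-34", "no_degree"): 0.40,
    ("18-34", "degree"): 0.60,
    ("35+", "no_degree"): 0.30,
    ("35+", "degree"): 0.50,
}

# Hypothetical population counts for the same cells (e.g. from a census).
cell_counts = {
    ("18-34", "no_degree"): 200,
    ("18-34", "degree"): 100,
    ("35+", "no_degree"): 500,
    ("35+", "degree"): 200,
}

estimate = poststratify(cell_estimates, cell_counts)
print(f"Poststratified estimate: {estimate:.3f}")
```

Restricting both dictionaries to the cells of one area gives the corresponding small area estimate.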

Got data? Quantifying the contribution of population-period-specific information to model-based estimates in demography and global health

Leontine Alkema

Multilevel models are commonly used to produce estimates for demographic and health indicators for populations with limited data. We aim to provide a standardized approach to answer the question: To what extent is a model-based estimate of an indicator of interest informed by data for the relevant population-period as opposed to information supplied by other periods and populations and model assumptions? We propose a data weight measure to calculate the weight associated with population-period data y relative to the model-based prior estimate obtained by fitting the model to all data excluding y. In addition, we propose a data-model accordance measure which quantifies how extreme the population-period data are relative to the prior model-based prediction. Based on the combination of the two measures, we classify a model-based estimate into one of the following groups: (1) interesting case study, when the estimate is data-driven and the model-based prior and data have low accordance; (2) data driving the estimate, when the estimate is expected under the model; (3) data are limited, but there is no strong evidence that data and model are in conflict; and finally, (4) more data are needed, when the estimate is model-driven and accordance is low. We illustrate the insights obtained from the combination of both measures in the estimation of modern contraceptive use for 69 low-income countries. This is joint work with Guandong (Elliot) Yang and Krista Gile.

Partial Pooling with Weights: Parallels and Differences between MRP and Weighting

Shiro Kuriwaki and Soichiro Yamauchi

Multilevel regression and poststratification (MRP) borrows information across subpopulations and post-stratifies the resulting estimates to population distributions. However, MRP models are outcome-specific, and the assumptions for MRP to be unbiased are not well understood in practice. Here we propose a new weighting estimator drawing from both MRP and causal inference frameworks that is simpler and can perform just as well. Our approach contains a synthetic component that leverages the out-of-area sample information by weighting it to resemble the in-area distribution. Our approach therefore pools observations (as in MRP), while generating post-stratification survey weights that can be applied to any outcome in the survey (as in traditional weighting). Moreover, the method can incorporate the full set of variables in the survey (unlike both MRP and traditional weighting). In the process, we identify two theoretical assumptions for our weighting estimator to be consistent for small areas. We validate our method by estimating turnout at the congressional district level using the Cooperative Congressional Election Study, and provide an R package to implement our method.