Inverse probability weighted estimation for general missing data problems
Introduction
In this paper I extend earlier work on inverse probability weighted (IPW) M-estimation along several dimensions. One important extension is that I allow the selection probabilities to depend on selection predictors that are not fully observed. In Wooldridge (2002a), building on the framework of Robins and Rotnitzky (1995) for attrition in regression, I assumed that the variables determining selection were always observed and that the selection probabilities were estimated by binary response maximum likelihood. These assumptions exclude some interesting cases, including: (i) variable probability (VP) sampling with known retention frequencies; (ii) a censored response variable with varying censoring times, as in Koul et al. (1981); (iii) unobservability of a response variable due to censoring of a second variable, as in Lin (2000).
Extending previous results to allow more general selection mechanisms is fairly routine when interest lies in consistent estimation. My goal here is to expand the scope of a result that has appeared in a variety of settings with missing data: estimating the selection probabilities generally leads to a more efficient weighted estimator than if the known probabilities could be used. A few examples include Imbens (1992) for choice-based sampling, Robins and Rotnitzky (1995) for IPW estimation of nonlinear regression models, and Wooldridge (2002a) for general M-estimation under the Robins and Rotnitzky (1995) sampling scheme.
Having a unified setting where asymptotic efficiency is improved by using estimated selection probabilities has several advantages. First, knowing that an estimator produces narrower asymptotic confidence intervals has obvious benefits. Second, the proof of relative efficiency leads to a computationally simple estimator of the asymptotic variance for a broad class of estimation problems, including popular nonlinear models. For example, Koul et al. (1981) and Lin (2000) treat only the linear regression case, and the formulas are almost prohibitively complicated. A third benefit is that I expand the scope of models and estimation methods where one can obtain conservative inference by ignoring the first-stage estimation of the selection probabilities.
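The efficiency claim can be illustrated with a small Monte Carlo sketch. It is not the paper's proof: the design (a binary selection predictor `z`, retention probabilities 0.3 and 0.8, the outcome equation, and the function name) is entirely an assumption made for illustration. With a binary predictor, the saturated binary-response MLE of the selection probabilities is just the within-cell selection frequency, so no optimizer is needed.

```python
import numpy as np

def compare_known_vs_estimated_weights(n=500, reps=2000, seed=0):
    """Monte Carlo comparison of IPW means using known vs. estimated probabilities."""
    rng = np.random.default_rng(seed)
    p_true = np.array([0.3, 0.8])          # assumed P(s = 1 | z) for z = 0, 1
    known = np.empty(reps)
    estimated = np.empty(reps)
    for r in range(reps):
        z = rng.integers(0, 2, size=n)     # always-observed selection predictor
        y = 2.0 + z + rng.normal(size=n)   # E(y) = 2.5 in this design
        s = rng.random(n) < p_true[z]      # selection indicator
        # IPW mean using the known selection probabilities
        known[r] = np.mean(s * y / p_true[z])
        # IPW mean using estimated probabilities (cell frequencies = saturated MLE)
        p_hat = np.array([s[z == k].mean() for k in (0, 1)])
        estimated[r] = np.mean(s * y / p_hat[z])
    return known, estimated

known, estimated = compare_known_vs_estimated_weights()
print("sampling variance, known probabilities:    ", known.var())
print("sampling variance, estimated probabilities:", estimated.var())
```

In this design both estimators are centered on the population mean, and the one using estimated probabilities has the smaller sampling variance. Standard errors computed as if the probabilities were known would therefore be conservative here.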
Another innovation in this paper is my treatment of exogenous selection when some feature of a conditional distribution is correctly specified. Namely, I study the properties of the IPW M-estimator when the selection probability model is possibly misspecified. Among other things, allowing misspecified selection probabilities in the exogenous selection case leads to key insights for more robust estimation of average treatment effects (ATEs).
The remainder of the paper is organized as follows. In Section 2, I briefly introduce the underlying population minimization problem. In Section 3, I describe the selection problem and propose a class of conditional likelihoods for estimating the selection probabilities; obtain the asymptotic variance of the IPW M-estimator; show that it is more efficient to use estimated probabilities than to use the known probabilities; and provide a simple estimator of the efficient asymptotic variance matrix. Section 4 covers the case of exogenous selection, allowing the selection probability model to be misspecified. In Section 5, I provide a general discussion of the considerations when deciding whether or not to use inverse-probability weighting. I cover three examples in Section 6: (i) estimating a conditional mean function when the response variable is missing due to a censored duration; (ii) estimating an ATE with a possibly misspecified conditional mean function; and (iii) VP sampling with observed retention frequencies.
The population optimization problem and random sampling
The starting point is a population optimization problem, which essentially defines the parameters of interest. Let $w$ be an $M \times 1$ random vector taking values in $\mathcal{W} \subset \mathbb{R}^M$. Some aspect of the distribution of $w$ depends on a $P \times 1$ parameter vector, $\theta$, contained in a parameter space $\Theta \subset \mathbb{R}^P$. Let $q(w, \theta)$ denote an objective function.
Assumption 2.1. $\theta_o$ is the unique solution to the population minimization problem
$$\min_{\theta \in \Theta} E[q(w, \theta)]. \tag{2.1}$$
Often, $\theta_o$ indexes some correctly specified feature of the distribution of $w$, usually a feature of a
Nonrandom sampling and inverse probability weighting
As in Wooldridge (2002a), I characterize nonrandom sampling through a selection indicator. For any random draw $w_i$ from the population, we also draw $s_i$, a binary indicator equal to unity if observation i is used in the estimation, and zero otherwise. Typically we have in mind that all or part of $w_i$ is not observed if $s_i = 0$. We are interested in estimating $\theta_o$, the solution to (2.1).
One possibility for estimating $\theta_o$ is to use M-estimation on the observed sample. That is, we solve
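A minimal sketch of the two estimators discussed in this section, for the special case of a linear regression objective (squared residuals). Everything specific here is an assumption for illustration rather than the paper's setup: the data-generating process, the choice of a logit selection model fitted by Newton-Raphson, and the function names `fit_logit`, `weighted_ls`, and `ipw_regression_demo`. Selection depends on an always-observed proxy `z` that is correlated with the response, so the unweighted estimator on the selected sample is inconsistent, while weighting each selected observation by the inverse of its estimated selection probability recovers the population coefficients.

```python
import numpy as np

def fit_logit(Z, s, iters=25):
    """Logit MLE by Newton-Raphson; Z must include a constant column."""
    gamma = np.zeros(Z.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Z @ gamma))
        grad = Z.T @ (s - p)                        # score of the log-likelihood
        hess = (Z * (p * (1.0 - p))[:, None]).T @ Z  # minus the Hessian
        gamma = gamma + np.linalg.solve(hess, grad)
    return gamma

def weighted_ls(X, y, w):
    """Minimize sum_i w_i * (y_i - x_i'b)^2 over b."""
    Xw = X * w[:, None]
    return np.linalg.solve(Xw.T @ X, Xw.T @ y)

def ipw_regression_demo(n=100_000, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = 1.0 + 2.0 * x + rng.normal(size=n)          # population coefficients (1, 2)
    z = y + rng.normal(size=n)                      # always-observed selection predictor
    p = 1.0 / (1.0 + np.exp(-(1.0 - 0.5 * z)))      # assumed true P(s = 1 | z)
    s = (rng.random(n) < p).astype(float)           # selection indicator
    X = np.column_stack([np.ones(n), x])
    Z = np.column_stack([np.ones(n), z])
    gamma_hat = fit_logit(Z, s)                     # first stage: selection model
    p_hat = 1.0 / (1.0 + np.exp(-Z @ gamma_hat))
    # Rows with s = 0 receive zero weight, so their y values are never used.
    b_unweighted = weighted_ls(X, y, s)
    b_ipw = weighted_ls(X, y, s / p_hat)
    return b_unweighted, b_ipw, gamma_hat

b_unweighted, b_ipw, gamma_hat = ipw_regression_demo()
print("unweighted:", b_unweighted)
print("IPW:       ", b_ipw)
```

The same two-step pattern (fit the selection model, then reweight the selected observations in the objective function) carries over to nonlinear objectives by swapping `weighted_ls` for a weighted optimizer.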
Estimation under exogenous selection
It is well known that certain kinds of sample selection do not cause bias in standard, unweighted estimators. I covered the VP sampling case in Wooldridge (1999) and considered more general kinds of exogenous selection in Wooldridge (2002a). Nevertheless, in both cases I defined exogenous selection to be selection on x in the context of estimating some feature of a conditional distribution, $D(y \mid x)$. Here, I consider a more general notion of exogenous selection.
In earlier work I assumed that the
When should we use a weighted estimator?
We can use the results in Sections 3 and 4 to discuss when weighting is desirable, and when it may be undesirable. If features of an unconditional distribution, say $D(w)$, are of interest, unweighted estimators consistently estimate the parameters only if $P(s = 1 \mid w) = P(s = 1)$; that is, the data are “missing completely at random” (Rubin, 1976). Of course, consistency of the weighted estimator relies on the presence of z such
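The contrast between conditional and unconditional targets can be seen in a short simulation. This is a sketch under assumed conditions, not a result from the paper: the linear model, the logistic selection rule depending only on x, and the function name are all hypothetical. Selection on x leaves the unweighted regression of y on x consistent, but the unweighted mean of y on the selected sample is biased for E(y) because the data are not missing completely at random. Weighting by the inverse selection probability (taken as known here) corrects the mean.

```python
import numpy as np

def exogenous_selection_demo(n=200_000, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = 1.0 + 2.0 * x + rng.normal(size=n)   # E(y | x) = 1 + 2x and E(y) = 1
    p = 1.0 / (1.0 + np.exp(-x))             # assumed P(s = 1 | x): selection on x only
    s = rng.random(n) < p
    X = np.column_stack([np.ones(n), x])
    # Unweighted least squares on the selected sample (conditional mean target)
    b_unweighted = np.linalg.lstsq(X[s], y[s], rcond=None)[0]
    # Unweighted mean on the selected sample (unconditional mean target)
    mean_selected = y[s].mean()
    # Inverse probability weighted mean using the known selection probabilities
    mean_ipw = np.mean(s * y / p)
    return b_unweighted, mean_selected, mean_ipw

b_unweighted, mean_selected, mean_ipw = exogenous_selection_demo()
print("unweighted regression coefficients:", b_unweighted)
print("unweighted mean of y:", mean_selected)
print("IPW mean of y:       ", mean_ipw)
```

In this design the regression coefficients are recovered without weights, which is the situation where weighting is unnecessary for consistency, whereas the unconditional mean needs the weights.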
Missing data due to censored durations
Let y be a univariate response and x a vector of conditioning variables, and suppose we are interested in estimating $E(y \mid x)$. A random draw i from the population is denoted ($x_i$, $y_i$). Let $t_i > 0$ be a duration and let $c_i > 0$ denote a censoring time. (The case $t_i = y_i$ is allowed here.) Assume that ($x_i$, $y_i$) is observed whenever $t_i \le c_i$, so that $s_i = 1[t_i \le c_i]$. Under the assumption that $c_i$ is independent of ($x_i$, $y_i$, $t_i$),
$$P(s_i = 1 \mid x_i, y_i, t_i) = G(t_i),$$
where $G(t) \equiv P(c_i \ge t)$. In order to use inverse probability weighting, we
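A minimal sketch of weighting under censoring, with every specific labeled as an assumption: the duration equals the response, the censoring time is exponential and independent of everything else, and the censoring survivor function (the probability that the censoring time is at least t) is treated as known. In practice that function would have to be estimated, which this sketch does not attempt. An observation is used only when its duration does not exceed its censoring time, and each used observation is weighted by the inverse of the survivor function evaluated at its duration.

```python
import numpy as np

def censoring_demo(n=400_000, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(size=n)
    y = x + rng.exponential(scale=1.0, size=n)   # E(y | x) = 1 + x
    t = y                                        # assumed: duration equals the response
    c = rng.exponential(scale=2.0, size=n)       # censoring time, independent of (x, y, t)
    s = (t <= c).astype(float)                   # observed only if not censored
    G = np.exp(-0.5 * t)                         # P(c >= t) for this censoring law
    X = np.column_stack([np.ones(n), x])

    def wls(w):
        # Weighted least squares: minimize sum_i w_i * (y_i - x_i'b)^2
        Xw = X * w[:, None]
        return np.linalg.solve(Xw.T @ X, Xw.T @ y)

    b_uncensored_only = wls(s)       # unweighted, uncensored observations only
    b_ipw = wls(s / G)               # weighted by inverse selection probability
    return b_uncensored_only, b_ipw

b_uncensored_only, b_ipw = censoring_demo()
print("unweighted (uncensored only):", b_uncensored_only)
print("IPW:                         ", b_ipw)
```

Because long durations are more likely to be censored, the unweighted fit understates the intercept in this design, while the weighted fit is close to the population values (1, 1).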
Summary
This paper unifies the current literature on inverse probability weighted estimation by allowing for a fairly general class of conditional maximum likelihood estimators of the selection probabilities. The cases covered are as diverse as variable probability sampling, treatment effect estimation, and selection due to censoring. While each of these has been studied in special cases—often linear regression—the framework here allows for nonlinear models and a variety of estimation methods. In all
Acknowledgments
Two anonymous referees, an associate editor, a coeditor, Artem Prokhorov, Peter Schmidt, and numerous seminar participants provided comments that greatly improved this work.
References (28)
- et al. Quantile regression under random censoring. Journal of Econometrics (2002).
- A method of moments interpretation of sequential estimators. Economics Letters (1984).
- et al. Large sample estimation and hypothesis testing.
- Advanced Econometrics (1985).
- et al. Linear regression with censored data. Biometrika (1979).
- et al. Coarsening at random: characterizations, conjectures, and counter-examples.
- et al. Pseudo-maximum likelihood methods: theory. Econometrica (1984).
- Heckman, J.J., 1976. The common structure of statistical models of truncation, sample selection, and limited dependent...
- et al. Ignorability and coarse data. Annals of Statistics (1991).
- et al. Efficient estimation of average treatment effects using the estimated propensity score. Econometrica (2003).
- A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association.
- An efficient method of moments estimator for discrete choice models with choice-based sampling. Econometrica.
- Regression analysis with randomly right-censored data. Annals of Statistics.