Elsevier

Journal of Econometrics

Volume 141, Issue 2, December 2007, Pages 1281-1301

Inverse probability weighted estimation for general missing data problems

https://doi.org/10.1016/j.jeconom.2007.02.002

Abstract

I study inverse probability weighted M-estimation under a general missing data scheme. Examples include M-estimation with missing data due to a censored survival time, propensity score estimation of the average treatment effect in the linear exponential family, and variable probability sampling with observed retention frequencies. I extend an important result known to hold in special cases: estimating the selection probabilities is generally more efficient than if the known selection probabilities could be used in estimation. For the treatment effect case, the setup allows a general characterization of a “double robustness” result due to Scharfstein et al. [1999. Rejoinder. Journal of the American Statistical Association 94, 1135–1146].

Introduction

In this paper I extend earlier work on inverse probability weighted (IPW) M-estimation along several dimensions. One important extension is that I allow the selection probabilities to depend on selection predictors that are not fully observed. In Wooldridge (2002a), building on the framework of Robins and Rotnitzky (1995) for attrition in regression, I assumed that the variables determining selection were always observed and that the selection probabilities were estimated by binary response maximum likelihood. These assumptions exclude some interesting cases, including: (i) variable probability (VP) sampling with known retention frequencies; (ii) a censored response variable with varying censoring times, as in Koul et al. (1981); (iii) unobservability of a response variable due to censoring of a second variable, as in Lin (2000).

Extending previous results to allow more general selection mechanisms is fairly routine when interest lies in consistent estimation. My goal here is to expand the scope of a result that has appeared in a variety of settings with missing data: estimating the selection probabilities generally leads to a more efficient weighted estimator than if the known probabilities could be used. A few examples include Imbens (1992) for choice-based sampling, Robins and Rotnitzky (1995) for IPW estimation of nonlinear regression models, and Wooldridge (2002a) for general M-estimation under the Robins and Rotnitzky (1995) sampling scheme.
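The efficiency claim can be illustrated with a small Monte Carlo sketch. Everything in it is an assumption chosen for illustration rather than taken from the paper: a binary selection predictor z that is always observed, selection probabilities of 0.3 and 0.7, an outcome y = 1 + 2z + e, and the function name simulate_ipw_means. The estimated probabilities are the within-cell selection frequencies, which are the maximum likelihood estimates for a saturated binary response model in z.

```python
import numpy as np


def simulate_ipw_means(n_obs=500, n_reps=2000, seed=0):
    """Compare IPW estimates of E[y] using known vs. estimated selection probabilities.

    Hypothetical design: z is binary and always observed, y = 1 + 2z + e,
    and P(s = 1 | z) is 0.3 for z = 0 and 0.7 for z = 1, so E[y] = 2.
    """
    rng = np.random.default_rng(seed)
    p_true = np.array([0.3, 0.7])
    known = np.empty(n_reps)
    estimated = np.empty(n_reps)
    for r in range(n_reps):
        z = rng.integers(0, 2, size=n_obs)
        y = 1.0 + 2.0 * z + rng.standard_normal(n_obs)
        s = rng.random(n_obs) < p_true[z]  # selection indicator
        # Weight each selected observation by the inverse of the known probability.
        known[r] = np.mean(s * y / p_true[z])
        # Replace the known probabilities with within-cell selection frequencies.
        p_hat = np.array([s[z == 0].mean(), s[z == 1].mean()])
        estimated[r] = np.mean(s * y / p_hat[z])
    return known, estimated


known, estimated = simulate_ipw_means()
print("sampling variance, known probabilities:    ", known.var())
print("sampling variance, estimated probabilities:", estimated.var())
```

In this design both estimators are centered near the true mean of 2, and the version with estimated probabilities should show the smaller sampling variance, in line with the result described above.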

Having a unified setting where asymptotic efficiency is improved by using estimated selection probabilities has several advantages. First, knowing that an estimator produces narrower asymptotic confidence intervals has obvious benefits. Second, the proof of relative efficiency leads to a computationally simple estimator of the asymptotic variance for a broad class of estimation problems, including popular nonlinear models. For example, Koul et al. (1981) and Lin (2000) treat only the linear regression case, and the formulas are almost prohibitively complicated. A third benefit is that I expand the scope of models and estimation methods where one can obtain conservative inference by ignoring the first-stage estimation of the selection probabilities.

Another innovation in this paper is my treatment of exogenous selection when some feature of a conditional distribution is correctly specified. Namely, I study the properties of the IPW M-estimator when the selection probability model is possibly misspecified. Among other things, allowing misspecified selection probabilities in the exogenous selection case leads to key insights for more robust estimation of average treatment effects (ATEs).

The remainder of the paper is organized as follows. In Section 2, I briefly introduce the underlying population minimization problem. In Section 3, I describe the selection problem and propose a class of conditional likelihoods for estimating the selection probabilities; obtain the asymptotic variance of the IPW M-estimator; show that it is more efficient to use estimated probabilities than to use the known probabilities; and provide a simple estimator of the efficient asymptotic variance matrix. Section 4 covers the case of exogenous selection, allowing the selection probability model to be misspecified. In Section 5, I provide a general discussion of the considerations when deciding whether or not to use inverse-probability weighting. I cover three examples in Section 6: (i) estimating a conditional mean function when the response variable is missing due to a censored duration; (ii) estimating an ATE with a possibly misspecified conditional mean function; and (iii) VP sampling with observed retention frequencies.

Section snippets

The population optimization problem and random sampling

The starting point is a population optimization problem, which essentially defines the parameters of interest. Let $w$ be an $M \times 1$ random vector taking values in $\mathcal{W} \subset \mathbb{R}^M$. Some aspect of the distribution of $w$ depends on a $P \times 1$ parameter vector, $\theta$, contained in a parameter space $\Theta \subset \mathbb{R}^P$. Let $q(w, \theta)$ denote an objective function.

Assumption 2.1

$\theta_o$ is the unique solution to the population minimization problem
$$\min_{\theta \in \Theta} E[q(w, \theta)]. \tag{2.1}$$

Often, $\theta_o$ indexes some correctly specified feature of the distribution of $w$, usually a feature of a

Nonrandom sampling and inverse probability weighting

As in Wooldridge (2002a), I characterize nonrandom sampling through a selection indicator. For any random draw $w_i$ from the population, we also draw $s_i$, a binary indicator equal to unity if observation $i$ is used in the estimation, and zero otherwise. Typically we have in mind that all or part of $w_i$ is not observed if $s_i = 0$. We are interested in estimating $\theta_o$, the solution to (2.1).

One possibility for estimating $\theta_o$ is to use M-estimation on the observed sample. That is, we solve $\min_{\theta \in \Theta} N^{-1} \sum_{i=1}^{N} s_i \, q(w_i \ldots$
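A minimal sketch of the weighted counterpart for a squared-error objective, assuming the standard IPW form in which each selected observation enters with weight 1/p_i. The function name ipw_least_squares and the argument p (a vector of selection probabilities, either known or estimated in a first stage) are illustrative and do not come from the paper.

```python
import numpy as np


def ipw_least_squares(y, X, s, p):
    """Solve min over theta of N^{-1} sum_i (s_i / p_i) * (y_i - x_i @ theta)**2.

    y : (N,) responses; entries with s_i = 0 may be NaN (unobserved).
    X : (N, K) regressors.
    s : (N,) selection indicators (1 = used in estimation).
    p : (N,) selection probabilities, assumed strictly positive.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    p = np.asarray(p, dtype=float)
    keep = np.asarray(s) == 1
    # Only selected rows contribute; each is weighted by 1 / p_i.
    Xk, yk, wk = X[keep], y[keep], 1.0 / p[keep]
    # First-order condition of the weighted objective: X'WX theta = X'Wy.
    XtWX = Xk.T @ (wk[:, None] * Xk)
    XtWy = Xk.T @ (wk * yk)
    return np.linalg.solve(XtWX, XtWy)


# Usage: y = 2 + 3x exactly, with the third response unobserved.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 5.0, np.nan, 11.0])
s = np.array([1, 1, 0, 1])
p = np.array([0.5, 0.8, 0.4, 0.9])
theta_hat = ipw_least_squares(y, X, s, p)
```

Setting every p_i to the same constant reproduces the unweighted estimator on the selected sample, which is the case described in the text.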

Estimation under exogenous selection

It is well known that certain kinds of sample selection do not cause bias in standard, unweighted estimators. I covered the VP sampling case in Wooldridge (1999) and considered more general kinds of exogenous selection in Wooldridge (2002a). Nevertheless, in both cases I defined exogenous selection to be selection on x in the context of estimating some feature of a conditional distribution, D(y|x). Here, I consider a more general notion of exogenous selection.

In earlier work I assumed that the

When should we use a weighted estimator?

We can use the results in Sections 3 and 4 to discuss when weighting is desirable, and when it may be undesirable. If features of an unconditional distribution, say D(w), are of interest, unweighted estimators consistently estimate the parameters only if P(s=1|w)=P(s=1); that is, the data are “missing completely at random” (Rubin, 1976). Of course, consistency of the weighted estimator relies on the presence of z such

Missing data due to censored durations

Let $y$ be a univariate response and $x$ a vector of conditioning variables, and suppose we are interested in estimating $E(y|x)$. A random draw $i$ from the population is denoted $(x_i, y_i)$. Let $t_i > 0$ be a duration and let $c_i > 0$ denote a censoring time. (The case $t_i = y_i$ is allowed here.) Assume that $(x_i, y_i)$ is observed whenever $t_i \le c_i$, so that $s_i = 1[t_i \le c_i]$. Under the assumption that $c_i$ is independent of $(x_i, y_i, t_i)$,
$$P(s_i = 1 \mid x_i, y_i, t_i) = G(t_i),$$
where $G(t) \equiv P(c_i \ge t)$. In order to use inverse probability weighting, we
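A minimal simulation sketch of weighting by 1/G(t_i) in the simplest case, t_i = y_i, where the target is the unconditional mean of the duration. The distributions are assumptions made only for illustration: durations are exponential with mean 1, censoring times are exponential with mean 4 and independent of the durations, so G(t) = P(c ≥ t) = exp(-t/4) is known in closed form. In practice G would have to be estimated; the sketch does not attempt that step, and the function name censored_ipw_mean is hypothetical.

```python
import numpy as np


def censored_ipw_mean(n_obs=200_000, seed=0):
    """IPW and unweighted estimates of E[t] when t is seen only if t <= c.

    Hypothetical design: t ~ Exponential(mean 1), c ~ Exponential(mean 4),
    c independent of t, so G(t) = P(c >= t) = exp(-t / 4) and E[t] = 1.
    """
    rng = np.random.default_rng(seed)
    t = rng.exponential(scale=1.0, size=n_obs)  # durations
    c = rng.exponential(scale=4.0, size=n_obs)  # censoring times
    s = t <= c                                  # s_i = 1[t_i <= c_i]
    G = np.exp(-t / 4.0)                        # P(s_i = 1 | t_i) = G(t_i)
    ipw = np.mean(s * t / G)                    # weighted by 1 / G(t_i)
    unweighted = t[s].mean()                    # complete cases only
    return ipw, unweighted


ipw, unweighted = censored_ipw_mean()
print("IPW estimate of E[t]:       ", ipw)
print("unweighted estimate of E[t]:", unweighted)
```

Long durations are more likely to be censored, so the complete-case average understates the mean duration, while the weighted average restores the contribution of long durations through the larger weights 1/G(t_i).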

Summary

This paper unifies the current literature on inverse probability weighted estimation by allowing for a fairly general class of conditional maximum likelihood estimators of the selection probabilities. The cases covered are as diverse as variable probability sampling, treatment effect estimation, and selection due to censoring. While each of these has been studied in special cases—often linear regression—the framework here allows for nonlinear models and a variety of estimation methods. In all

Acknowledgments

Two anonymous referees, an associate editor, a coeditor, Artem Prokhorov, Peter Schmidt, and numerous seminar participants provided comments that greatly improved this work.

References (28)

  • B. Honoré et al. Quantile regression under random censoring. Journal of Econometrics (2002)
  • W.K. Newey. A method of moments interpretation of sequential estimators. Economics Letters (1984)
  • W.K. Newey et al. Large sample estimation and hypothesis testing
  • T. Amemiya. Advanced Econometrics (1985)
  • J. Buckley et al. Linear regression with censored data. Biometrika (1979)
  • R.D. Gill et al. Coarsening at random: characterizations, conjectures, and counter-examples
  • C.A. Gourieroux et al. Pseudo-maximum likelihood methods: theory. Econometrica (1984)
  • Heckman, J.J., 1976. The common structure of statistical models of truncation, sample selection, and limited dependent...
  • D.F. Heitjan et al. Ignorability and coarse data. Annals of Statistics (1991)
  • K. Hirano et al. Efficient estimation of average treatment effects using the estimated propensity score. Econometrica (2003)
  • D.G. Horvitz et al. A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association (1952)
  • G.W. Imbens. An efficient method of moments estimator for discrete choice models with choice-based sampling. Econometrica (1992)
  • H. Koul et al. Regression analysis with randomly right-censored data. Annals of Statistics (1981)
  • Lancaster, T., 1990. The Econometric Analysis of Transition Data. Cambridge University Press,...