
Causal Inference and Missing Data Analysis
Saturday, Aug. 26, 2023




Renmin University of China
Title: Sparse Mediation Analysis with Unmeasured Mediator-Outcome Confounding
Abstract: Causal mediation analysis aims to investigate how an intermediary factor, called a mediator, regulates the causal effect of a treatment on an outcome. With the increasing availability of measurements on a large number of potential mediators in various disciplines, methods for conducting mediation analysis with many or even high-dimensional mediators have been proposed. However, they often assume there is no unmeasured confounding between mediators and the outcome. This paper allows such confounding and provides an approach to address both identification and mediator selection problems under the structural equation modeling framework. The identification strategy involves constructing a pseudo proxy variable for unmeasured confounding based on a latent factor model for multiple mediators. Using this proxy variable, we then propose a partially penalized procedure to select important mediators that have nonzero effects on the outcome. The resultant estimates are consistent and the estimates of nonzero parameters are asymptotically normal. Simulation studies show advantageous performance of the proposed procedure over other existing methods. We finally apply our approach to genomic data and identify gene expressions that may actively mediate the effect of a genetic variant on mouse obesity.
Tsinghua University
Title: Design-Based Theory for Cluster Rerandomization
Abstract: Complete randomization balances covariates on average, but covariate imbalance often exists in finite samples. Rerandomization can ensure covariate balance in the realized experiment by discarding the undesired treatment assignments. Many field experiments in public health and social sciences assign the treatment at the cluster level due to logistical constraints or policy considerations. Moreover, they are frequently combined with rerandomization in the design stage. We use the term cluster rerandomization for a cluster-randomized experiment combined with rerandomization to balance covariates at the individual or cluster level. Existing asymptotic theory can only deal with rerandomization with treatments assigned at the individual level, leaving the theory for cluster rerandomization an open problem. To fill the gap, we provide a design-based theory for cluster rerandomization. Moreover, we compare two cluster rerandomization schemes that use prior information on the importance of the covariates: one based on the weighted Euclidean distance and the other based on the Mahalanobis distance with tiers of covariates. We demonstrate that the former dominates the latter with optimal weights and orthogonalized covariates. Last but not least, we discuss the role of covariate adjustment in the analysis stage and recommend covariate-adjusted procedures that can be conveniently implemented by least squares with the associated robust standard errors.
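Illustration (not from the talk): the accept/reject idea behind cluster rerandomization with a weighted Euclidean distance can be sketched roughly as follows. This is a minimal sketch under stated assumptions: the function names, the weights, the threshold and the simulated covariates are all hypothetical, the cluster-level covariates are assumed to be already standardized and orthogonalized, and the exact distance and acceptance rule studied by the authors may differ.

```python
import random


def diff_in_means(assign, covs):
    """Treated-minus-control difference in cluster-level covariate means."""
    treated = [x for a, x in zip(assign, covs) if a == 1]
    control = [x for a, x in zip(assign, covs) if a == 0]
    k = len(covs[0])
    return [
        sum(x[j] for x in treated) / len(treated)
        - sum(x[j] for x in control) / len(control)
        for j in range(k)
    ]


def weighted_distance(diff, weights):
    """Weighted squared Euclidean distance; larger weight = more important covariate."""
    return sum(w * d * d for w, d in zip(weights, diff))


def cluster_rerandomize(covs, n_treated, weights, threshold, rng, max_draws=10000):
    """Redraw cluster-level assignments until the imbalance is acceptable."""
    m = len(covs)
    for draw in range(1, max_draws + 1):
        # Treatment is assigned to whole clusters, never to individuals.
        treated_ids = set(rng.sample(range(m), n_treated))
        assign = [1 if i in treated_ids else 0 for i in range(m)]
        dist = weighted_distance(diff_in_means(assign, covs), weights)
        if dist <= threshold:
            return assign, dist, draw
    raise RuntimeError("no acceptable assignment found within max_draws")


# Hypothetical example: 20 clusters, 2 cluster-level covariates, 10 treated clusters.
rng = random.Random(0)
covs = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(20)]
assign, dist, draws = cluster_rerandomize(
    covs, n_treated=10, weights=[2.0, 1.0], threshold=0.1, rng=rng
)
print(assign, round(dist, 4), draws)
```

A smaller threshold enforces tighter balance at the cost of more redraws; the weights encode the prior importance of each covariate.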
East China Normal University
Title: Semiparametric Inference for Nonignorable Missing Data by Catching Covariate Marginal Information
Abstract: Nonignorable missing data problems are challenging because of the parameter identifiability issue. Existing approaches to nonignorable missing data with possibly missing responses usually fail to make full use of covariate marginal information and hence may suffer from efficiency loss. In addition to a logistic propensity score model, we assume a semiparametric proportional likelihood ratio model (SPLRM) for the completely observed data, which is as weak as possible. We find that the model parameters are identifiable in most cases and no instrument or shadow variable is needed. In the only exception, where the SPLRM is a normal model, we conduct a sensitivity analysis by making full use of the covariate marginal information. In the remaining cases, we estimate the parameters in the SPLRM by their maximum likelihood estimators (MLEs), and then use the density-ratio-model-based empirical likelihood to capture the covariate distribution information and to estimate all the remaining parameters. We show that the MLE for the target parameter is asymptotically normal and semiparametrically efficient. Our numerical results indicate that, compared with existing estimators, the proposed estimator is more reliable and more robust to model misspecification.
Peking University
Title: Introducing a Generalized Tetrad Constraint
Abstract: The tetrad constraint is widely used to test whether the associations between four observed variables can be attributed to a latent common cause. However, the classical tetrad constraint is typically designed for linear models and may fail to work in nonlinear and nonparametric settings. In this paper, we provide an extension of the classical tetrad constraint by relaxing the linearity assumption. We use the confounding bridge function to characterize the relationship between the observed variables and the latent cause, based on which we propose a generalized tetrad constraint for four observed variables to be conditionally independent given the latent common cause. We then propose a test statistic based on the generalized tetrad constraint. We illustrate the proposed approach via simulations and a real data application.
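Illustration (not from the talk): for background, the classical linear tetrad constraint that the abstract starts from can be checked numerically. In a linear one-factor model with mutually independent noise, the covariances of four observed variables satisfy s12*s34 = s13*s24 = s14*s23, so all tetrad differences vanish in the population. The sketch below simulates such a model and computes the sample tetrad differences; the loadings, noise scale and sample size are hypothetical, and this is the classical constraint only, not the generalized constraint or the test statistic proposed by the authors.

```python
import random


def covariance(x, y):
    """Unbiased sample covariance of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


def tetrad_differences(data):
    """Sample tetrad differences for four variables (data = list of 4 lists).

    Only two of the three differences are algebraically independent:
    the third equals the second minus the first.
    """
    def s(i, j):
        return covariance(data[i], data[j])

    return (
        s(0, 1) * s(2, 3) - s(0, 2) * s(1, 3),
        s(0, 1) * s(2, 3) - s(0, 3) * s(1, 2),
        s(0, 2) * s(1, 3) - s(0, 3) * s(1, 2),
    )


# Hypothetical linear one-factor model: X_i = loading_i * U + independent noise.
rng = random.Random(1)
loadings = [0.9, 0.8, 0.7, 0.6]
n = 20000
latent = [rng.gauss(0, 1) for _ in range(n)]
data = [[lam * u + rng.gauss(0, 0.5) for u in latent] for lam in loadings]

diffs = tetrad_differences(data)
print([round(d, 4) for d in diffs])  # all three should be close to zero
```

Under a nonlinear relationship between the observed variables and the latent cause, these covariance-based differences need not vanish, which is the limitation the abstract describes.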