Biostatistics Meets Machine Learning

Friday, Aug. 25, 2023


Session: Biostatistics Meets Machine Learning
Time: 3:30 p.m. — 5:00 p.m.
Location: Room 215, Wenshi Building (文史楼), Putuo Campus, East China Normal University
Session Chair: Tao Hu, Capital Normal University

Shuwei Li
Guangzhou University
Title: Factor-Augmented Transformation Models for Interval-Censored Failure Time Data
Abstract: Interval-censored failure time data frequently arise in scientific studies where each subject undergoes periodic examinations for the occurrence of the failure event of interest, so the failure time is only known to lie in a specific time interval. In addition, the collected data may include multiple correlated observed variables, leading to severe multicollinearity. This study proposes a factor-augmented transformation model to analyze interval-censored failure time data while reducing model dimensionality and avoiding the multicollinearity caused by multiple correlated covariates. We provide a joint modeling framework comprising a factor analysis model, which groups multiple observed variables into a few latent factors, and a class of semiparametric transformation models with the augmented factors, which examines the effects of these factors and other covariates on the failure event. Furthermore, we propose a nonparametric maximum likelihood estimation approach and develop a computationally stable and reliable expectation-maximization algorithm for its implementation. We establish the asymptotic properties of the proposed estimators and conduct simulation studies to assess the empirical performance of the proposed method. An application to the Alzheimer's Disease Neuroimaging Initiative study is provided. An R package, ICTransCFA, is also available for practitioners.
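As a rough illustration of what "interval-censored" means for the likelihood, the sketch below evaluates the standard contribution S(L) - S(R) for a subject whose failure time is only known to lie in (L, R]. It is not the factor-augmented transformation model from the talk and does not reflect the ICTransCFA package: the exponential survival function, the function names, and the example intervals are all assumptions made purely for illustration.

```python
import math


def exp_survival(t, rate):
    """Survival function S(t) = exp(-rate * t); S(inf) is taken to be 0.

    The exponential form is an assumption for illustration only.
    """
    return math.exp(-rate * t) if math.isfinite(t) else 0.0


def interval_censored_loglik(intervals, rate):
    """Log-likelihood for failure times known only to lie in (left, right].

    Each subject contributes log{S(left) - S(right)}. A right endpoint of
    math.inf represents a subject who had not failed by the last examination.
    """
    total = 0.0
    for left, right in intervals:
        total += math.log(exp_survival(left, rate) - exp_survival(right, rate))
    return total


# Hypothetical examination intervals for three subjects.
intervals = [(0.0, 1.0), (1.0, 2.5), (2.0, math.inf)]
loglik = interval_censored_loglik(intervals, rate=0.5)
print(loglik)
```

In a semiparametric setting the survival function would not be fixed in advance as it is here; the sketch only shows how the observed intervals enter the likelihood.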
Baosheng Liang
Peking University
Title: An Accumulative Delta-Radiomics Model Based on Cone-Beam CT for the Prediction of Radiation Pneumonitis in Thoracic Cancer Patients Treated with Radiotherapy
Abstract: This work aims to construct a cone-beam CT (CBCT)-based delta-radiomics nomogram for individualized prediction of radiation pneumonitis (RP) in thoracic cancer patients treated with radiotherapy. Weekly CBCT images of 92 thoracic cancer patients treated with image-guided radiotherapy were retrospectively included in this study. Images taken before the first treatment and in the 1st, 2nd, and 3rd weeks thereafter were collected. Novel accumulative CBCT delta-radiomics features (Delta-RFaccu) were proposed to stack the temporal changes of lung tissues over the treatment course. Significance tests of difference were performed to screen delta-radiomics features before principal component analysis (PCA). The first principal component was taken as the signature of the corresponding Delta-RFi. The predictive performance of each signature was evaluated and compared using univariate logistic regression. Finally, a multivariate logistic regression model for RP prediction was constructed by incorporating the best Delta-RFi signature and important dosimetric predictors. Calibration curves and decision curves showed accurate prediction and satisfactory clinical performance of the proposed comprehensive model.
Tao Sun
Renmin University of China
Title: Neural Network on Interval Censored Data with Application to the Prediction of Alzheimer’s Disease
Abstract: Alzheimer's disease (AD) is a progressive and polygenic disorder that affects millions of individuals each year. Given that few effective treatments for AD exist so far, it is highly desirable to develop an accurate model that predicts the full disease progression profile from an individual's genetic characteristics for early prevention and clinical management. This work uses data from all four phases of the Alzheimer's Disease Neuroimaging Initiative (ADNI) study, including 1740 individuals with 8 million genetic variants. We tackle several challenges in these data: large-scale genetic data, an interval-censored outcome due to intermittent assessments, and left truncation in one study phase (ADNIGO). Specifically, we first develop a semiparametric transformation model for interval-censored and left-truncated data and estimate its parameters through a sieve approach. Then we propose a computationally efficient generalized score test to identify variants associated with AD progression. Next, we implement a novel neural network on interval-censored data (NN-IC) to construct a prediction model using the top variants identified by the genome-wide test. Comprehensive simulation studies show that the NN-IC outperforms several existing methods in terms of prediction accuracy. Finally, we apply the NN-IC to the full ADNI data and successfully identify subgroups with differential progression risk profiles.
Da Xu
Northeast Normal University
Title: Regression Analysis of Misclassified Current Status Data with Informative Observation Times
Abstract: Misclassified current status data arise if each study subject can be observed only once and the observed status is determined by a diagnostic test with imperfect sensitivity and specificity. In this situation, another issue that may occur is that the observation time may be correlated with the failure time of interest, which is often referred to as informative censoring or informative observation times. It is well known that, in the presence of informative censoring, an analysis that ignores it could yield biased or even misleading results. In this paper, the authors consider such data and propose a frailty-based inference procedure. In particular, an EM algorithm based on Poisson latent variables is developed and the asymptotic properties of the resulting estimators are established. The numerical results show that the proposed method works well in practice, and an application to a set of real data is provided.
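To make the misclassification mechanism concrete, the sketch below computes the probability that an imperfect diagnostic test reports a positive result at a single observation time c, using the standard relation P(positive) = sensitivity × F(c) + (1 − specificity) × (1 − F(c)), where F is the failure time distribution function. It is not the frailty-based procedure from the talk and it ignores informative observation times; the exponential form of F, the function name, and the numeric values are assumptions for illustration only.

```python
import math


def prob_positive_test(c, rate, sensitivity, specificity):
    """Probability that the diagnostic test is positive at observation time c.

    F(c) = 1 - exp(-rate * c) is an assumed exponential failure time
    distribution. A true failure is detected with probability `sensitivity`;
    a subject who has not failed tests positive with probability
    1 - `specificity`.
    """
    failure_prob = 1.0 - math.exp(-rate * c)
    return sensitivity * failure_prob + (1.0 - specificity) * (1.0 - failure_prob)


# Hypothetical values: observation at time 2 with a test that has
# 90% sensitivity and 95% specificity.
p_observed = prob_positive_test(c=2.0, rate=0.5, sensitivity=0.9, specificity=0.95)
p_true = prob_positive_test(c=2.0, rate=0.5, sensitivity=1.0, specificity=1.0)
print(p_observed, p_true)
```

With a perfect test the expression reduces to F(c), the usual current status probability, which is why treating a misclassified result as the true status distorts the estimated failure time distribution.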