Recent Advances in Statistical Computation and Machine Learning

Friday, Aug. 25, 2023


Session: Recent Advances in Statistical Computation and Machine Learning
Time: 1:30 p.m. — 3:00 p.m.
Location: East China Normal University, Putuo Campus, Wenshi Building (文史楼), Room 201
Session Chair: Fang Fang, East China Normal University

Jialiang Li
National University of Singapore
Title: A Network Approach to Compute Hypervolume under ROC Manifold for Multi-Class Biomarkers
Abstract: Computing the hypervolume under the ROC manifold (HUM) is necessary to evaluate how well biomarkers discriminate among multiple disease types or diagnostic groups. However, the original definition of HUM involves multiple integrals, so multi-class ROC analysis in a medical investigation can incur a huge computational cost when the formula is implemented naively. In this work, we introduce a novel graph-based approach that computes HUM efficiently. The method avoids the time-consuming multiple summation that arises when the sample size or the number of categories is large. We conduct extensive simulation studies to demonstrate the improvement of our method over existing R packages, and we apply it to two real biomedical data sets to illustrate its use.
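For a single continuous marker and three ordered classes, the empirical HUM is the fraction of triples (one observation per class) whose values fall in the expected order. The Python sketch below contrasts the naive enumeration of all triples with a single sorted pass that returns the same count. It only illustrates why the naive formula is costly; it is not the graph-based method presented in the talk. The function names hum_naive and hum_sorted are hypothetical, and the sorted pass assumes no tied values across classes.

```python
import random
from itertools import product


def hum_naive(x1, x2, x3):
    """Fraction of triples (a, b, c) with a < b < c; cost is O(n1 * n2 * n3)."""
    hits = sum(1 for a, b, c in product(x1, x2, x3) if a < b < c)
    return hits / (len(x1) * len(x2) * len(x3))


def hum_sorted(x1, x2, x3):
    """Same fraction from one pass over the pooled, sorted sample.

    Assumption: no tied values across classes.
    """
    pooled = sorted(
        [(v, 1) for v in x1] + [(v, 2) for v in x2] + [(v, 3) for v in x3]
    )
    n_class1 = 0   # class-1 values seen so far
    n_pairs = 0    # ordered (class 1 < class 2) pairs seen so far
    n_triples = 0  # ordered (class 1 < class 2 < class 3) triples
    for _, label in pooled:
        if label == 1:
            n_class1 += 1
        elif label == 2:
            n_pairs += n_class1
        else:
            n_triples += n_pairs
    return n_triples / (len(x1) * len(x2) * len(x3))


# Simulated marker values for three diagnostic groups (illustrative only).
rng = random.Random(0)
x1 = [rng.gauss(0.0, 1.0) for _ in range(30)]
x2 = [rng.gauss(1.0, 1.0) for _ in range(30)]
x3 = [rng.gauss(2.0, 1.0) for _ in range(30)]
print(hum_naive(x1, x2, x3), hum_sorted(x1, x2, x3))
```

The sorted pass costs O(n log n) for three classes, whereas the naive count grows with the product of the class sizes.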
Xuening Zhu
Fudan University
Title: Distributed Estimation and Inference for Spatial Autoregression Model with Large Scale Networks
Abstract: The rapid growth of online network platforms generates large-scale network data, which poses great challenges for statistical analysis using the spatial autoregression (SAR) model. In this work, we develop a novel distributed estimation and statistical inference framework for the SAR model on a distributed system. We first propose a distributed network least squares approximation (DNLSA) method, which yields a one-step estimator by taking a weighted average of the local estimators on each worker. A refined two-step estimator is then designed to further reduce the estimation bias. For statistical inference, we use a random projection method to reduce the expensive communication cost. Theoretically, we show the consistency and asymptotic normality of both the one-step and two-step estimators. In addition, we provide theoretical guarantees for the distributed statistical inference procedure. The theoretical findings and computational advantages are validated by several numerical simulations implemented on the Spark system. Lastly, an experiment on the Yelp dataset further illustrates the usefulness of the proposed methodology.
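The idea of combining local estimators by a weighted average can be illustrated on a toy problem. The Python sketch below uses an ordinary one-covariate linear regression without intercept, not the SAR model, and it is not the DNLSA procedure from the talk. Each hypothetical worker returns its local least squares slope together with a weight (its local sum of squared covariates), and the weighted average then reproduces the full-sample estimate. The names local_fit and combine are assumptions made for this example.

```python
import random


def local_fit(xs, ys):
    """Return (slope estimate, weight) for y = theta * x on one worker."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / sxx, sxx


def combine(local_results):
    """Weighted average of local estimates; weights are local sums of x^2."""
    total_weight = sum(w for _, w in local_results)
    return sum(theta * w for theta, w in local_results) / total_weight


# Simulated data (illustrative only).
rng = random.Random(1)
theta_true = 2.0
xs = [rng.uniform(-1.0, 1.0) for _ in range(300)]
ys = [theta_true * x + rng.gauss(0.0, 0.1) for x in xs]

# Split the sample across three hypothetical workers.
shards = [(xs[i::3], ys[i::3]) for i in range(3)]
theta_combined = combine([local_fit(sx, sy) for sx, sy in shards])

# Full-sample estimate for comparison.
theta_global, _ = local_fit(xs, ys)
print(theta_combined, theta_global)
```

In this toy case each worker only communicates two numbers, and the combined estimate matches the full-sample one up to floating-point error. For models such as SAR, where observations are linked through the network, this exact equivalence should not be expected.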
Xiaozhou Wang
East China Normal University
Title: Some Recent Progress in Quantile Regression Analysis
Abstract: In this talk, we will introduce some research results on quantile regression and composite quantile regression models. The algorithms and theoretical properties are presented. Numerical studies are conducted to demonstrate the performance of the proposed methodologies.
Fang Fang
East China Normal University
Title: Kolmogorov-Smirnov Learning
Abstract: The Kolmogorov-Smirnov (KS) statistic has been widely used in many areas to evaluate the performance of binary classification. However, almost no classification algorithm tries to optimize it directly at the training stage, because of the computational and theoretical challenges brought by the special form of KS. In this talk, we will introduce our work on Kolmogorov-Smirnov learning, which uses KS as the target function. Several methods will be discussed, and their theoretical and empirical results will be presented.
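As an evaluation metric for binary classification, the KS statistic is the largest gap between the empirical distribution functions of the scores of the two classes. The Python sketch below computes it for a given set of scores. It covers only the evaluation step, not the KS learning methods of the talk, and the function name ks_statistic is an assumption made for this example.

```python
def ks_statistic(scores, labels):
    """Max gap between the score CDFs of positives (1) and negatives (0)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_seen = neg_seen = 0
    best = 0.0
    ordered = sorted(zip(scores, labels))
    for i, (score, label) in enumerate(ordered):
        if label == 1:
            pos_seen += 1
        else:
            neg_seen += 1
        # Only evaluate the gap after the last observation sharing this score.
        if i + 1 == len(ordered) or ordered[i + 1][0] != score:
            best = max(best, abs(pos_seen / n_pos - neg_seen / n_neg))
    return best


# Made-up scores and labels (illustrative only).
scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 0, 1, 1, 1]
print(ks_statistic(scores, labels))  # 0.75
```

The statistic depends on the scores only through their ordering, so it is piecewise constant in the model parameters. This is one way to see why optimizing it directly at the training stage is difficult.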