
Statistical Foundations for Modern Machine Learning
Friday, Aug. 25, 2023

Shanghai Jiao Tong University
Title: Applying Modern Neural Networks in Statistical Problems: Semiparametrics, Causal Inference, and Differential Equations
Abstract: In this talk, we will survey some of our recent explorations of modern neural networks in problems spanning semiparametric statistics, causal inference, and forward and inverse problems in differential equations. We will discuss the practical performance, some limited theory, and the places where the theoretical results fall short of guiding practice. Time permitting, we will discuss some directions we are pursuing to narrow the theory-practice gap.
Renmin University of China
Title: Gaussian Mixture Reduction with Composite Transportation Divergence
Abstract: Gaussian mixtures are widely used as a parametric family for approximating smooth density functions of various kinds, simplifying downstream inference tasks. They find extensive applications in density estimation, belief propagation, and Bayesian filtering. These applications often utilize finite Gaussian mixtures as initial approximations that are recursively updated. A challenge in these recursions is that the order of the Gaussian mixture grows exponentially, and the inference quickly becomes intractable. To overcome this difficulty, Gaussian mixture reduction, which approximates a high-order Gaussian mixture by one of lower order, can be used. Although existing clustering-based methods are known for their satisfactory performance and computational efficiency, their convergence properties and optimal targets remain unknown. In this work, we propose a novel optimization-based Gaussian mixture reduction method based on the composite transportation divergence (CTD). We develop a majorization-minimization algorithm for numerically computing the reduced Gaussian mixture and establish its theoretical convergence under general conditions. Furthermore, we demonstrate that many existing clustering-based methods are special cases of our approach, effectively bridging the gap between optimization-based and clustering-based techniques. Our unified framework empowers users to select the cost function in the CTD best suited to their specific application. Through extensive empirical experiments, we demonstrate the efficiency and effectiveness of the proposed method, showcasing its potential in various domains.
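To make the clustering-style reduction concrete, here is a minimal 1-D sketch of the kind of alternating assignment/barycenter iteration the abstract describes. It uses the squared 2-Wasserstein distance between univariate Gaussians as the component-wise cost (one admissible choice of cost function, not necessarily the one used in the talk); all names and details are illustrative, not the authors' implementation.

```python
import numpy as np

def reduce_mixture(w, mu, sigma, k, iters=50, seed=0):
    """Illustrative clustering-style Gaussian mixture reduction in 1-D.

    Component-wise cost: squared 2-Wasserstein distance, which for
    univariate Gaussians has the closed form (mu1-mu2)^2 + (sig1-sig2)^2.
    Alternates an assignment step (each original component to its nearest
    reduced component) with a barycenter update, an MM-style iteration.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(w), size=k, replace=False)
    rmu, rsig = mu[idx].copy(), sigma[idx].copy()
    for _ in range(iters):
        # assignment step: n x k matrix of squared-W2 costs
        cost = (mu[:, None] - rmu[None, :]) ** 2 \
             + (sigma[:, None] - rsig[None, :]) ** 2
        lab = cost.argmin(axis=1)
        # update step: weighted W2 barycenter of the assigned Gaussians
        # (in 1-D: weighted averages of the means and of the std devs)
        for j in range(k):
            m = lab == j
            if m.any():
                wj = w[m] / w[m].sum()
                rmu[j] = wj @ mu[m]
                rsig[j] = wj @ sigma[m]
    rw = np.array([w[lab == j].sum() for j in range(k)])
    return rw, rmu, rsig
```

Swapping the squared-W2 cost for other divergences between Gaussians recovers different clustering-based reduction rules, which is the unification the abstract refers to.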
Tsinghua University
Title: Towards a Mathematical Foundation of Federated Learning: A Statistical Perspective
Abstract: Federated Learning (FL) is a promising framework with great potential for privacy preservation and for lowering the computational load on the cloud. Its successful deployment faces many challenges in both theory and practice, such as data heterogeneity and client unavailability. In this talk, I will discuss the convergence and statistical efficiency of two widely adopted FL algorithms, FedAvg and FedProx, from a statistical perspective. Our analysis is based on standard nonparametric regression in a reproducing kernel Hilbert space. Additionally, we propose the concept of federation gain to quantify the impact of heterogeneity. Time permitting, FL from the perspectives of clustering and robust statistics will be discussed.
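For readers unfamiliar with FedAvg, the following is a minimal sketch of its round structure on a toy 1-D least-squares problem, not the nonparametric RKHS setting analyzed in the talk: each client runs a few local gradient steps from the current global model, and the server averages the local models weighted by sample size. All function and variable names are illustrative.

```python
import numpy as np

def fedavg(client_data, rounds=100, local_steps=5, lr=0.05):
    """Toy FedAvg on the 1-D linear model y = theta * x (illustrative).

    client_data: list of (X, y) arrays, one pair per client, possibly
    heterogeneous in distribution and size.
    """
    theta = 0.0
    n = np.array([len(y) for _, y in client_data], dtype=float)
    for _ in range(rounds):
        local = []
        for X, y in client_data:
            t = theta
            for _ in range(local_steps):
                # gradient of the local mean squared error (X t - y)^2
                grad = 2.0 * X @ (X * t - y) / len(y)
                t -= lr * grad
            local.append(t)
        # server step: sample-size-weighted average of local models
        theta = float(n @ np.array(local) / n.sum())
    return theta
```

FedProx differs by adding a proximal penalty tying each local update to the current global model; the round structure above is otherwise the same.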
Shanghai Jiao Tong University
Title: Supervised Topic Modeling: Optimal Estimation and Statistical Inference
Abstract: With the development of computer technology and the internet, increasingly large amounts of textual data are generated and collected every day. Extracting meaningful and actionable information from such vast amounts of unstructured text is a significant challenge. Driven by applications in a wide range of fields, there is an increasing need for computationally efficient statistical methods that can analyze massive amounts of textual data with theoretical guarantees. In this presentation, I will discuss supervised topic modeling, which jointly models a collection of documents and their paired side information. A bias-adjusted algorithm is developed to estimate the regression coefficients in supervised topic models under a generalized linear model formulation. I will also introduce an approach to constructing valid confidence intervals. Applications of the proposed methods reveal meaningful latent topic structures in textual data.
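As a point of reference for the regression step the abstract describes, here is a naive plug-in sketch: regress the responses on (estimated) topic proportions by least squares and form Wald confidence intervals. The talk's bias-adjusted estimator corrects for the estimation error in the topic proportions and handles general GLM links; this sketch deliberately omits both and only shows the plug-in baseline under a linear model. All names are illustrative.

```python
import numpy as np

def plugin_topic_regression(W_hat, y, z=1.959963984540054):
    """Plug-in baseline for supervised topic regression (illustrative).

    W_hat: (n_documents, n_topics) matrix of topic proportions.
    y:     (n_documents,) responses.
    Returns least-squares coefficients and 95% Wald intervals; no
    bias adjustment for estimation error in W_hat is performed.
    """
    n, K = W_hat.shape
    beta, *_ = np.linalg.lstsq(W_hat, y, rcond=None)
    resid = y - W_hat @ beta
    s2 = resid @ resid / (n - K)                # residual variance
    cov = s2 * np.linalg.inv(W_hat.T @ W_hat)   # classical OLS covariance
    se = np.sqrt(np.diag(cov))                  # standard errors
    ci = np.column_stack([beta - z * se, beta + z * se])
    return beta, ci
```

When W_hat is itself estimated from word counts, the plug-in coefficients are biased, which is exactly the gap the bias-adjusted algorithm in the talk is designed to close.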