Provable Domain Generalization via Invariant-Feature Subspace Recovery



Han Zhao (University of Illinois Urbana-Champaign)

Han Zhao is a tenure-track assistant professor in the Department of Computer Science, also affiliated with the Department of Electrical and Computer Engineering, at the University of Illinois Urbana-Champaign. Prior to joining UIUC, he was a machine learning researcher at the D. E. Shaw Group. He received his Ph.D. in Computer Science from the Machine Learning Department at Carnegie Mellon University. He works in machine learning and artificial intelligence, with a focus on trustworthy machine learning, including domain generalization, algorithmic fairness, adversarial robustness, multi-task learning, and meta-learning.



Short Abstract: Domain generalization asks for models trained on a set of training environments to perform well in unseen test environments. Recently, a series of algorithms such as Invariant Risk Minimization (IRM) has been proposed for domain generalization. However, Rosenfeld et al. (2021) showed that IRM and its extensions cannot generalize to unseen environments with fewer than d_s+1 training environments, where d_s is the dimension of the spurious-feature subspace. In this talk, I will present our recent work on achieving domain generalization with Invariant-feature Subspace Recovery (ISR). Our first algorithm, ISR-Mean, identifies the subspace spanned by invariant features from the first-order moments of the class-conditional distributions and achieves provable domain generalization with d_s+1 training environments. Our second algorithm, ISR-Cov, further reduces the required number of training environments to O(1) by using second-order moment information. Notably, unlike IRM, our algorithms bypass non-convexity issues and enjoy global convergence guarantees. I will also discuss extensions of our algorithms to general multi-class classification and regression settings. Empirically, we show that the ISR algorithms obtain superior performance compared with IRM on synthetic benchmarks. In addition, on three real-world image and text datasets, we show that both ISR algorithms can be used as simple yet effective post-processing methods to improve the worst-case accuracy of (pre-)trained models against spurious correlations and group shifts. Our code is publicly available at https://github.com/Haoxiang-Wang/ISR.
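
To make the first-order-moment idea behind ISR-Mean concrete, here is a minimal NumPy sketch. It is not the authors' implementation (see the repository linked above for that). It assumes a linear data model x = A z_c + B z_e, in which the class-conditional mean of the invariant features z_c is shared across environments while the mean of the spurious features z_e changes with the environment. Under that assumption, the per-environment means of a fixed class differ only along span(B), so the top d_s principal directions of those means estimate the spurious subspace and the orthogonal complement removes it. The function name, the matrices A and B, and all dimensions are hypothetical choices for illustration, and exact population means are used in place of sample means.

```python
import numpy as np


def isr_mean_invariant_basis(class_means, d_s):
    """Sketch of the subspace-recovery step (hypothetical helper).

    class_means: (E, d) array; row e is the mean of one fixed class in
                 training environment e.
    d_s:         assumed dimension of the spurious-feature subspace.
    Returns a (d, d - d_s) orthonormal basis of the orthogonal complement
    of the top-d_s principal directions of the environment means.
    """
    # Center across environments so only between-environment variation remains.
    centered = class_means - class_means.mean(axis=0, keepdims=True)
    # Rows of vt are right singular vectors, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=True)
    # Drop the top d_s directions (estimated spurious subspace), keep the rest.
    return vt[d_s:].T


# Synthetic check under the assumed linear model (all values are illustrative).
rng = np.random.default_rng(0)
d, d_c, d_s = 6, 2, 2
E = d_s + 1                        # number of training environments
A = rng.normal(size=(d, d_c))      # mixes invariant features into x
B = rng.normal(size=(d, d_s))      # mixes spurious features into x
mu_c = rng.normal(size=d_c)        # invariant mean, shared by all environments
mu_e = rng.normal(size=(E, d_s))   # spurious mean, one row per environment

# Exact class-conditional mean of x in each environment: A mu_c + B mu_e.
class_means = mu_c @ A.T + mu_e @ B.T      # shape (E, d)

P = isr_mean_invariant_basis(class_means, d_s)
spurious_leak = np.abs(P.T @ B).max()      # how much of span(B) survives projection
invariant_kept = np.linalg.norm(P.T @ A)   # invariant signal left after projection
print(spurious_leak, invariant_kept)
```

In this sketch, features projected as `x @ P` no longer depend on z_e, so a linear classifier fit on them across all training environments cannot exploit the spurious features. With sample means in place of the exact means above, the recovered subspace would only be approximate.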