Provably calibrating ML classifiers without distributional assumptions



Chirag Gupta (Carnegie Mellon University)

Chirag Gupta is a final-year PhD student in the Machine Learning Department at Carnegie Mellon University, working on statistical machine learning. His thesis is on distribution-free uncertainty quantification for classification and regression problems.



Short Abstract: Most ML classifiers provide probability scores for the different classes. What do these scores mean? Probabilistic classifiers are said to be calibrated if the observed frequencies of labels match the claimed/reported probabilities. While calibration in the binary classification setting has been studied since the mid-1900s, there is less clarity on the right notion of calibration for multiclass classification. In this talk, I will present recent work investigating the relationship between commonly considered notions of multiclass calibration and the calibration algorithms used to achieve them. We will discuss our proposed notion of top-label calibration and the general framework of multiclass-to-binary (M2B) calibration. We show that any M2B notion of calibration can be provably achieved, no matter how the data is distributed. I will present these calibration guarantees as well as experimental results on calibrating deep learning models, where our proposed algorithms outperform existing ones in most settings.

Code: https://github.com/aigen/df-posthoc-calibration
Main paper: https://arxiv.org/abs/2107.08353 (ICLR 2022)
Additional relevant papers: https://arxiv.org/abs/2105.04656 (ICML 2021), https://arxiv.org/abs/2006.10564 (NeurIPS 2020), https://arxiv.org/abs/2204.13087 (COLT 2022)
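
As a rough illustration of the notions above (my paraphrase; the precise statements are in the main paper): a binary predictor f is calibrated if

    P(Y = 1 | f(X)) = f(X)   almost surely,

and top-label calibration asks that the reported probability h(X) of the predicted class c(X) satisfies

    P(Y = c(X) | h(X), c(X)) = h(X)   almost surely,

that is, among all points where the model predicts class c with confidence p, the prediction should be correct a fraction p of the time.

The sketch below illustrates the flavor of a post-hoc, distribution-free recalibration scheme in the top-label (M2B) style: reduce the multiclass problem to a binary one per predicted class ("was the top prediction correct?") and apply histogram binning to the reported confidence. This is my own illustrative sketch under those assumptions, with invented function names; it is not the API of the aigen/df-posthoc-calibration repository, and the actual algorithms and guarantees are described in the papers.

# Illustrative sketch only: generic histogram-binning recalibration of the
# top-class probability, fit separately for each predicted class.
import numpy as np

def fit_top_label_binner(scores, preds, labels, n_bins=10):
    """Fit, per predicted class, a map from reported top-class probability to
    the empirical accuracy within uniform-mass bins of the calibration data."""
    binners = {}
    for c in np.unique(preds):
        mask = preds == c
        s = scores[mask]
        correct = (labels[mask] == c).astype(float)
        # uniform-mass bin edges estimated from the calibration scores
        edges = np.quantile(s, np.linspace(0, 1, n_bins + 1))
        idx = np.clip(np.searchsorted(edges, s, side="right") - 1, 0, n_bins - 1)
        bin_values = np.array([
            correct[idx == b].mean() if np.any(idx == b) else np.nan
            for b in range(n_bins)
        ])
        binners[c] = (edges, bin_values)
    return binners

def recalibrate(scores, preds, binners):
    """Replace each reported top-class probability by the empirical accuracy
    of its bin (for the corresponding predicted class)."""
    out = np.array(scores, dtype=float)
    for i, (s, c) in enumerate(zip(scores, preds)):
        if c not in binners:
            continue
        edges, values = binners[c]
        b = int(np.clip(np.searchsorted(edges, s, side="right") - 1, 0, len(values) - 1))
        if not np.isnan(values[b]):
            out[i] = values[b]
    return out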