Rigorous evaluation of machine learning models
Olivia Wiles (DeepMind)

Olivia Wiles is a Senior Researcher at DeepMind working on robustness in machine learning, focussing on how to detect and mitigate failures arising from spurious correlations and distribution shift. Prior to this, she was a PhD student at Oxford with Andrew Zisserman, studying self-supervised representations for 3D, and spent a summer at FAIR working on view synthesis with Justin Johnson, Georgia Gkioxari, and Rick Szeliski.
Short Abstract: Despite achieving super-human accuracy on benchmarks like ImageNet, machine learning models remain susceptible to a number of issues that lead to poor performance in the real world. For example, models are prone to shortcut learning and rely on spurious correlations, causing them to degrade under distribution shift. I will present two of our works that expose the fragility of machine learning models. The first introduces a framework for defining different types of distribution shift and evaluates how methods degrade under varying amounts and types of shift. The second goes beyond requiring specific datasets to investigate shifts, instead automatically surfacing human-interpretable failures in vision models in an open-ended manner. These works are steps along the path to building comprehensive evaluation tools for reliable AI.