Are These Datasets The Same? Learning Kernels for Efficient and Fair Two-sample Tests



Danica Sutherland (University of British Columbia)

Danica Sutherland is an Assistant Professor in Computer Science at the University of British Columbia, Vancouver, and holds a Canada CIFAR AI Chair at Amii. She received her PhD in 2016 from Carnegie Mellon University, and was previously research faculty at TTI-Chicago and a postdoc at UCL. Her research broadly focuses on representation learning, particularly learning “deep kernels,” with an emphasis on problems that involve understanding the differences between probability distributions.



Short Abstract: Two-sample testing asks whether two datasets are meaningfully different, or whether their differences could reasonably be attributed to chance alone. This is a vital question not only for, say, telling whether a treatment group differs from a control group, but also for asking whether production data is meaningfully different from training data. We will give an overview of a line of work on tests that answer this question by using deep learning to learn a representation for use in a kernel test, with particular focus on recent progress in effective testing with small sample sizes and on learning fair representations with and for testing.
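
To make the idea of a kernel two-sample test concrete, here is a minimal sketch, not the method presented in the talk. It assumes a fixed Gaussian kernel in place of a learned deep kernel, uses the biased estimate of the squared maximum mean discrepancy (MMD) as the test statistic, and calibrates it with a permutation test. The function names, the bandwidth value, and the number of permutations are all illustrative assumptions.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth):
    """Gaussian (RBF) kernel between two points given as sequences of floats."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd2_biased(K, idx_x, idx_y):
    """Biased estimate of squared MMD, read off a precomputed kernel matrix K."""
    def mean_block(rows, cols):
        return sum(K[i][j] for i in rows for j in cols) / (len(rows) * len(cols))

    return (mean_block(idx_x, idx_x)
            + mean_block(idx_y, idx_y)
            - 2.0 * mean_block(idx_x, idx_y))


def mmd_permutation_test(xs, ys, bandwidth=1.0, n_permutations=200, seed=0):
    """Return (statistic, p_value) for the null hypothesis that xs and ys
    come from the same distribution.

    Assumption: a fixed Gaussian kernel stands in for a learned representation.
    """
    rng = random.Random(seed)
    pooled = list(xs) + list(ys)
    n = len(xs)

    # Compute the kernel matrix once; each permutation only re-indexes it.
    K = [[gaussian_kernel(a, b, bandwidth) for b in pooled] for a in pooled]

    indices = list(range(len(pooled)))
    observed = mmd2_biased(K, indices[:n], indices[n:])

    # Under the null, group labels are exchangeable, so shuffling them
    # simulates the distribution of the statistic when nothing differs.
    at_least_as_large = 0
    for _ in range(n_permutations):
        rng.shuffle(indices)
        if mmd2_biased(K, indices[:n], indices[n:]) >= observed:
            at_least_as_large += 1

    # The +1 terms count the observed split itself, keeping the test valid.
    p_value = (at_least_as_large + 1) / (n_permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    data_rng = random.Random(1)
    training = [[data_rng.gauss(0.0, 1.0)] for _ in range(30)]
    production = [[data_rng.gauss(3.0, 1.0)] for _ in range(30)]
    stat, p = mmd_permutation_test(training, production)
    print(f"MMD^2 = {stat:.4f}, p-value = {p:.4f}")
```

A small p-value suggests the two samples differ by more than chance would explain. The choice of kernel largely determines which differences the test can detect, which is the role the abstract assigns to the learned representation.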