Understanding Pre-Training, Fine-Tuning, and Self-Training for Unsupervised Domain Adaptation



Ananya Kumar (Stanford University)

Ananya Kumar is a fourth-year PhD student at Stanford University, advised by Percy Liang and Tengyu Ma.



Short Abstract: When ML systems are deployed, they often face test examples that differ substantially from their training data. We examine three powerful ways to build ML models that are robust to these distribution shifts.

(1) Given a good pre-trained model, how should we fine-tune it for a downstream application? We explain why the standard approach of fine-tuning all model parameters can distort pre-trained representations and perform poorly out of distribution. Our theory leads to practical insights and a simple fix for the problem.

(2) Next, we show in controlled settings that contrastive pre-training is competitive with targeted unsupervised domain adaptation (UDA) methods like DANN and SENTRY. However, we find that contrastive pre-training does not learn domain-invariant features, diverging from conventional UDA intuitions. We show theoretically that contrastive pre-training can learn features that vary substantially across domains yet still generalize to the target domain, by disentangling domain and class information.

(3) Time permitting, we explain why self-training is an effective tool for dealing with gradual shifts, spurious correlations, and accuracy tradeoffs across domains.
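The "simple fix" in (1) is not spelled out in the abstract; one recipe from this line of work is linear probing followed by full fine-tuning (LP-FT): first fit only the head on frozen pre-trained features, then fine-tune everything from that initialization, so early head gradients do not distort the features. A toy NumPy sketch on a synthetic linear model (all names and data here are hypothetical illustrations, not the talk's actual experiments):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: a "pretrained" linear featurizer B and a linear head w.
d, k, n = 10, 5, 200
B = rng.normal(size=(k, d)) / np.sqrt(d)   # pre-trained featurizer
X = rng.normal(size=(n, d))
true_w = rng.normal(size=k)
y = (X @ B.T @ true_w > 0).astype(float)   # labels realizable from B's features

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

def lp_ft(X, y, B, probe_steps=500, ft_steps=500, lr=0.1):
    B = B.copy()
    w = np.zeros(B.shape[0])
    # Phase 1: linear probing -- featurizer frozen, train only the head.
    for _ in range(probe_steps):
        p = sigmoid(X @ B.T @ w)
        w -= lr * (X @ B.T).T @ (p - y) / len(y)
    # Phase 2: full fine-tuning, starting from the probed head.
    for _ in range(ft_steps):
        feats = X @ B.T
        p = sigmoid(feats @ w)
        g = (p - y) / len(y)
        w -= lr * feats.T @ g
        B -= lr * np.outer(w, g @ X)
    return B, w

B_ft, w_ft = lp_ft(X, y, B)
acc = ((sigmoid(X @ B_ft.T @ w_ft) > 0.5) == y).mean()
```

Because the head is already close to correct when phase 2 begins, the fine-tuning gradients into `B` are small, which is the mechanism the theory points to for avoiding feature distortion.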
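For context on (2), the contrastive pre-training objective referred to is the standard InfoNCE/NT-Xent loss used by SimCLR-style methods: embeddings of two augmentations of the same example are pulled together, and all other pairs in the batch act as negatives. A minimal sketch (the function name and toy embeddings are hypothetical):

```python
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    """InfoNCE loss: z1[i] and z2[i] embed two augmentations of example i."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature               # cosine similarity of all pairs
    # Cross-entropy with the matching pair (the diagonal) as the positive.
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 4))
loss_aligned = info_nce(z, z + 0.01 * rng.normal(size=z.shape))  # matched pairs
loss_mismatched = info_nce(z, z[::-1])                           # shuffled pairs
```

Nothing in this objective mentions domains at all, which is why the finding that it nonetheless competes with explicit UDA methods, without producing domain-invariant features, is surprising.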
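The gradual-shift setting in (3) can be illustrated with a toy self-training loop: a classifier fit on a labeled source domain pseudo-labels each successive unlabeled domain and refits on its own pseudo-labels, tracking a shift that would break direct source-to-target transfer. A sketch with a rotating two-Gaussian problem and a nearest-centroid classifier (all parameters hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

def sample(angle, n=300, r=3.0, noise=0.8):
    """Two Gaussian classes with means +/- mu, where mu rotates with `angle`."""
    mu = r * np.array([np.cos(angle), np.sin(angle)])
    X = np.vstack([mu + noise * rng.normal(size=(n, 2)),
                   -mu + noise * rng.normal(size=(n, 2))])
    y = np.array([0] * n + [1] * n)
    return X, y

def predict(X, centroids):
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

# Fit centroids on the labeled source domain (angle 0).
X0, y0 = sample(0.0)
centroids = np.vstack([X0[y0 == c].mean(axis=0) for c in (0, 1)])
src_centroids = centroids.copy()

# Gradual self-training: pseudo-label each intermediate domain, then refit.
for a in np.linspace(0.0, 0.75 * np.pi, 10)[1:]:
    Xt, _ = sample(a)                 # labels unavailable at adaptation time
    pseudo = predict(Xt, centroids)
    centroids = np.vstack([Xt[pseudo == c].mean(axis=0) for c in (0, 1)])

# Evaluate on the final target domain (rotated 135 degrees from the source).
X_final, y_final = sample(0.75 * np.pi)
acc_gradual = (predict(X_final, centroids) == y_final).mean()
acc_direct = (predict(X_final, src_centroids) == y_final).mean()
```

Each intermediate shift is small enough that pseudo-labels stay mostly correct, so the model tracks the drift; skipping the intermediate domains and applying the source model directly fails badly.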