On the Predictive Power of Graph Neural Networks



Weihua Hu (Stanford University)

Weihua Hu is a Ph.D. student in Computer Science at Stanford University, advised by Jure Leskovec. His research interests lie in graph representation learning, with a particular focus on graph neural networks and their applications to drug discovery, social networks, and recommender systems. He is supported by the Funai Overseas Scholarship and the Masason Foundation Fellowship. Before joining Stanford, Weihua received his Bachelor's and Master's degrees from the University of Tokyo.



Short Abstract: Graph Neural Networks (GNNs) are a class of deep learning models for making predictions on graph-structured data. Many GNN models have been proposed and have achieved promising empirical performance. However, their architectural designs have been largely ad hoc, and their theoretical understanding remains limited. Moreover, these models were developed on small graph benchmark datasets. Together, these issues have limited the development of powerful GNNs for real-world prediction tasks over graphs.
In this talk, I aim to build powerful predictive GNNs by understanding, improving, and benchmarking the predictive power of GNNs---the ability of GNNs to make accurate predictions over graphs. The talk consists of three parts.

In Part I, I present a theoretical framework for understanding the predictive power of GNNs. I focus specifically on expressive power, asking whether GNNs can express desired functions over graphs. I use the framework to provide insight into whether a given GNN is powerful enough to model the ground-truth target function underlying the data, and I propose a maximally-expressive GNN model that can provably model most functions over graphs.

Equipped with this framework for designing expressive GNN models, in Part II I move on to improving their predictive power on unseen or unlabeled data, i.e., the generalization power of GNNs. Motivated by real-world applications, I develop methods for improving generalization under two common limited-data scenarios: limited labeled data and limited edge connectivity.

Finally, in Part III, I introduce new graph benchmark datasets that resolve the issues with existing benchmarks and engage the community in improving the predictive power of GNNs. I present the Open Graph Benchmark (OGB) and OGB-LSC, a collection of challenging, realistic, and large-scale benchmark datasets for machine learning on graphs. I discuss the impact these benchmarks have had in advancing the predictive power of GNNs and conclude with future challenges in applying GNNs to real-world prediction tasks.
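To make the message-passing computation behind such expressive GNNs concrete, here is a minimal NumPy sketch of a single layer that aggregates neighbor features by summation, the injective aggregation used by maximally-expressive architectures such as GIN. This is an illustrative sketch only; the function name `gin_layer`, the `eps` parameter, and all shapes are assumptions for the example, not code from the talk.

```python
import numpy as np

def gin_layer(adj, h, W, eps=0.0):
    """One GIN-style update: h_v' = ReLU(W^T ((1 + eps) * h_v + sum_{u in N(v)} h_u)).

    adj : (n, n) 0/1 adjacency matrix
    h   : (n, d) node feature matrix
    W   : (d, d_out) learnable weight matrix
    eps : scalar weighting of a node's own feature (illustrative)
    """
    # adj @ h sums each node's neighbor features; sum aggregation is
    # injective on feature multisets, which underlies maximal expressiveness.
    agg = (1.0 + eps) * h + adj @ h
    return np.maximum(agg @ W, 0.0)  # linear transform followed by ReLU

# Tiny example: path graph 0-1-2 with scalar node features.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
h = np.array([[1.0], [2.0], [3.0]])
W = np.eye(1)  # identity weights so the aggregation is easy to read
out = gin_layer(adj, h, W)  # node 1 receives the sum of nodes 0 and 2
```

With identity weights, the middle node's output is its own feature plus the sum of its two neighbors' features, showing how sum aggregation distinguishes neighborhoods that mean- or max-pooling would conflate.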