Talk by Ioannis Mitliagkas - Neural Networks Efficiently Learn Low-Dimensional Representations with SGD

Title: Neural Networks Efficiently Learn Low-Dimensional Representations with SGD. Mostly based on the ICLR 2023 paper: https://arxiv.org/abs/2209.14863

Date/Time: Friday 8/3, 10am

Location: Room T105 in the Troias building

Abstract: We study the problem of training a two-layer neural network (NN) of arbitrary width using stochastic gradient descent (SGD), where the input $x \in \mathbb{R}^d$ is Gaussian and the target $y \in \mathbb{R}$ follows a multiple-index model with a noisy link function $g$. We prove that the first-layer weights of the NN converge to the $k$-dimensional principal subspace spanned by the vectors $u_1, \ldots, u_k$ of the true model, when online SGD with weight decay is used for training. This phenomenon has several important consequences when $k < d$. First, by employing uniform convergence on this smaller subspace, we establish a generalization error bound of $O(\sqrt{kd/T})$ after $T$ iterations of SGD, which is independent of the width of the NN. We further demonstrate that SGD-trained ReLU NNs can learn a single-index target of the form $y = f(\langle u, x \rangle) + \epsilon$ by recovering the principal direction, with a sample complexity linear in $d$ (up to log factors), where $f$ is a monotonic function with at most polynomial growth and $\epsilon$ is the noise. This is in contrast to the known $d^{\Omega(p)}$ sample requirement to learn any degree-$p$ polynomial in the kernel regime, and it shows that NNs trained with SGD can outperform the neural tangent kernel at initialization. Finally, we also provide compressibility guarantees for NNs using the approximate low-rank structure produced by SGD.
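
To make the setup concrete, below is a minimal illustrative sketch (not the authors' code) of the single-index regime described in the abstract: fresh Gaussian inputs, a noisy monotone target $y = f(\langle u, x \rangle) + \epsilon$, a two-layer ReLU network trained with online SGD plus weight decay, and a final check of how strongly the first-layer rows align with the true direction $u$. All names and hyperparameters (dimension, width, step size, weight decay, noise level, the choice of tanh as the link) are arbitrary assumptions made for the example.

```python
# Illustrative sketch only: online SGD with weight decay on a two-layer ReLU net
# for a single-index target y = f(<u, x>) + eps, followed by an alignment check
# of the first-layer weights with the true direction u. Hyperparameters are arbitrary.
import numpy as np

rng = np.random.default_rng(0)

d, width = 50, 200            # input dimension and (arbitrary) network width
T = 20000                     # online SGD steps, one fresh sample per step
lr, wd = 0.01, 1e-3           # step size and weight-decay strength
noise_std = 0.1

u = rng.normal(size=d)
u /= np.linalg.norm(u)        # true index direction
f = np.tanh                   # monotone link with at most polynomial growth

# Two-layer ReLU network: x -> a^T relu(W x); both layers trained by SGD.
W = rng.normal(size=(width, d)) / np.sqrt(d)
a = rng.normal(size=width) / np.sqrt(width)

for t in range(T):
    x = rng.normal(size=d)                     # fresh Gaussian input (online SGD)
    y = f(u @ x) + noise_std * rng.normal()    # noisy single-index target
    pre = W @ x
    h = np.maximum(pre, 0.0)                   # ReLU activations
    err = a @ h - y                            # squared-loss residual
    # Gradients of 0.5 * err^2 plus the L2 penalty (weight decay)
    grad_a = err * h + wd * a
    grad_W = err * np.outer(a * (pre > 0), x) + wd * W
    a -= lr * grad_a
    W -= lr * grad_W

# Alignment of the first-layer weights with the principal direction u:
# project each row of W onto u and compare with the full row norm.
row_norms = np.linalg.norm(W, axis=1)
proj = np.abs(W @ u)
print("mean |<w_i, u>| / ||w_i|| :", np.mean(proj / (row_norms + 1e-12)))
```

The printed ratio is close to 1 when the rows of W concentrate on the one-dimensional subspace spanned by $u$, which is the low-dimensional representation the talk is about; in the multiple-index case one would instead measure the projection onto the span of $u_1, \ldots, u_k$.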

Short Bio: Ioannis works as an associate professor in the department of Computer Science and Operations Research (DIRO) at the University of Montréal. He is also a core member of Mila, a Canada CIFAR AI Chair holder, and a part-time staff research scientist at Google DeepMind Montreal. Previously, he was a postdoctoral scholar with the departments of Statistics and Computer Science at Stanford University. He obtained his Ph.D. from the department of Electrical and Computer Engineering at The University of Texas at Austin. His research includes topics in optimization, smooth games, statistical learning and deep learning theory, and generative models.
Personal website: https://mitliagkas.github.io/