abstract: In this short course, we give an introduction to numerical methods for training deep neural networks. The course consists of three lectures.
In the first lecture, we introduce the basic notation and some examples of learning problems and then review linear models in detail. We consider linear regression and classification problems and review numerical optimization methods used for training those models. We emphasize the importance of generalization and show how to achieve it using regularization theory.
In the second lecture, we extend our discussion to nonlinear models, in particular, multi-layer perceptrons and residual neural networks. We demonstrate that even the training of a single-layer neural network leads to a challenging non-convex optimization problem, and we review some heuristics, such as Variable Projection and stochastic approximation schemes, that can effectively train nonlinear models. Finally, we demonstrate challenges associated with deep networks, such as their stability and the computational cost of training.
In the last lecture, we show that residual neural networks can be interpreted as discretizations of a nonlinear time-dependent ordinary differential equation that depends on unknown parameters, i.e., the network weights. We show how this insight has been used, e.g., to study the stability of neural networks, design new architectures, or apply established methods from optimal control to the training of ResNets. Finally, we discuss open questions and opportunities for mathematical advances in this area. Material for the course can be found online at http://www.mathcs.emory.edu/~lruthot/courses/NumDL/index.html.
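As a brief sketch of the ODE interpretation mentioned above (with illustrative symbols: features $Y_j$, step size $h$, layer function $f$, and weights $\theta_j$, none of which are fixed by the abstract itself), a residual layer can be read as one forward Euler step:

```latex
% A residual layer updates the features Y_j with a step of size h,
%   Y_{j+1} = Y_j + h f(Y_j, theta_j),
% which is the forward Euler discretization of the initial value problem below.
\[
  Y_{j+1} = Y_j + h\, f(Y_j, \theta_j)
  \quad \leftrightarrow \quad
  \frac{dY}{dt}(t) = f\bigl(Y(t), \theta(t)\bigr), \qquad Y(0) = Y_0,
\]
% where theta(t) collects the time-dependent network weights that are
% learned during training.
```

Viewing training as an optimal control problem for this ODE is what allows stability analysis and control-based training methods to be brought to bear on ResNets.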