abstract: Many machine learning methodologies, based on the minimization of the empirical risk, lead to optimization problems in which the objective function is a sum of loss functions depending on the samples of a finite training set. These problems are challenging for large-scale training sets, because computing the objective function and its gradient is too expensive. In these cases, Stochastic Gradient (SG) methods are the main approach. Many SG variants are available in the literature, based on different strategies for reducing the adverse effect of noisy gradient estimates and for defining the steplength (or learning rate) parameter. Starting from recent advances in state-of-the-art steplength rules for deterministic gradient schemes and in noise reduction strategies, we investigate possible techniques for selecting the learning rate in SG approaches. Preliminary studies on the behaviour of popular steplength selections, such as the Barzilai-Borwein rules, within the stochastic gradient framework have shown that several open questions must be addressed before effective benefits can be obtained. In this work, we investigate the possibility of making SG algorithms more robust by exploiting steplength selections based on recently proposed limited memory strategies.
We also discuss some of these open problems regarding steplength selection within other widely used stochastic optimizers that exploit momentum terms and adaptive variance techniques.