Advances of Momentum in Optimization Algorithm and Neural Architecture Design
Dr. Bao Wang, University of Utah
We will present a few recent results on leveraging momentum techniques to improve stochastic optimization and neural architecture design. First, designing deep neural networks is an art that often involves an expensive search over candidate architectures. To overcome this for recurrent neural networks (RNNs), we establish a connection between the hidden-state dynamics of an RNN and gradient descent (GD). We then integrate momentum into this framework and propose a new family of RNNs, called MomentumRNNs. We prove theoretically and demonstrate numerically that MomentumRNNs alleviate the vanishing-gradient issue in training RNNs, and we show the empirical advantage of the momentum-enhanced RNNs over baseline models. Second, we will present recent advances in adaptive momentum for accelerating stochastic gradient descent (SGD). Adaptive-momentum-assisted SGD markedly improves deep neural network training, providing both acceleration and better generalization, and significantly reduces the effort required for hyperparameter tuning.
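To make the idea of momentum-augmented recurrence concrete, below is a minimal Python sketch of one recurrent step in which the input-driven update is replaced by a momentum (velocity) state, assuming a cell of the form v_t = mu * v_{t-1} + s * U x_t and h_t = tanh(W h_{t-1} + v_t + b). The variable names, toy dimensions, and the values mu = 0.9 and s = 0.6 are illustrative assumptions, not the exact architecture or hyperparameters presented in the talk.

import numpy as np

def momentum_rnn_step(x_t, h_prev, v_prev, U, W, b, mu=0.9, s=0.6):
    # Standard RNN step:          h_t = tanh(U x_t + W h_{t-1} + b)
    # Momentum-augmented step:    v_t = mu * v_{t-1} + s * (U x_t)
    #                             h_t = tanh(W h_{t-1} + v_t + b)
    v_t = mu * v_prev + s * (U @ x_t)       # velocity (momentum) state
    h_t = np.tanh(W @ h_prev + v_t + b)     # hidden-state update
    return h_t, v_t

# Toy usage: input dimension 4, hidden dimension 8, sequence length 5.
rng = np.random.default_rng(0)
U = rng.standard_normal((8, 4)) * 0.1
W = rng.standard_normal((8, 8)) * 0.1
b = np.zeros(8)
h, v = np.zeros(8), np.zeros(8)
for t in range(5):
    x_t = rng.standard_normal(4)
    h, v = momentum_rnn_step(x_t, h, v, U, W, b)

Viewing the recurrence this way mirrors how heavy-ball momentum accelerates gradient descent: the velocity state accumulates past input contributions, which is the mechanism the talk connects to mitigating vanishing gradients.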