D. I. Ozornin – Post-graduate Student, Yandex LLC, Moscow. E-mail: email@example.com
Statistical language models based on recurrent neural networks are among the most popular approaches to language modeling today. However, training such a model remains very time-consuming despite advances in computational power. One of the bottlenecks is the output layer, whose size equals that of the vocabulary. In this paper we propose a new technique, based on Huffman codes, that reduces training time by updating only a carefully chosen subset of the weights between the hidden and output layers, so that overall quality suffers little. To keep quality at the same level, we suggest optimizing the recurrent neural network with the momentum method. Momentum is a powerful variant of stochastic gradient descent that accumulates the error gradient from one training example to the next; during each iteration, this accumulated gradient is combined with the current error gradient to tune the weights. As a result, the model can be tuned more precisely and converges in fewer epochs, which further shortens training. Finally, we analyze experiments with these two methods on two corpora of different sizes, one English and one Russian, and show that the speed-up technique combined with the momentum method yields a substantial reduction in training time at the cost of an insignificant decrease in quality. There are many other ways to optimize models of this type, which will be investigated in future papers.
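The momentum update described above can be sketched in a few lines. This is an illustrative toy (the function name, hyperparameter values, and the scalar objective are my assumptions, not the paper's implementation): the current error gradient is folded into an accumulated "velocity" term, which is then used to adjust the weights.

```python
def momentum_step(w, grad, velocity, lr=0.05, mu=0.9):
    """One momentum-SGD step: accumulate the error gradient across
    training examples, then use the accumulated term to tune the weights."""
    velocity = mu * velocity + grad   # fold current gradient into the accumulator
    w = w - lr * velocity             # weight update driven by the accumulated gradient
    return w, velocity

# Toy check: minimize f(w) = w**2, whose gradient is 2*w.
w, v = 5.0, 0.0
for _ in range(200):
    w, v = momentum_step(w, 2 * w, v)
```

The factor `mu` controls how much of the past gradient is retained; with `mu = 0` the update degenerates to plain stochastic gradient descent.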