Du Jianming – Post-graduate Student, Department IU-3, Bauman Moscow State Technical University
Modern automatic speech recognition systems usually rely on very large pronunciation dictionaries. This paper discusses a method, based on the idea of phonetic algorithms, for recognizing single words with the CMUSphinx recognition toolkit.
During recognition, the decoder scans the entire pronunciation dictionary, which lists every word together with its possible pronunciation variants written as phoneme sequences. It is therefore proposed to add a «phonetic processing» step before decoding.
This step uses simple acoustic features of the spoken word to select an abridged dictionary that contains only the words whose pronunciation is similar with respect to these features. The goal is to reduce decoding time when recognizing a single word.
Once the phonemes of a word have been recognized, «rules of similarity» between the dictionary transcriptions and the recognized phoneme sequence are used to perform a preliminary search and extract words for the abridged dictionary. A manual analysis identified 3 phonetic rules that can be used to form the abridged dictionary:
1) The length of the recognized phoneme sequence compared with the length of the word's transcription.
2) The number of vowels in the recognized phoneme sequence compared with the number of vowels in the transcription.
3) Recognition of the first consonant.
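As an illustration only, the three features named in the rules can be computed from a phoneme sequence as in the sketch below. It assumes ARPAbet phoneme symbols of the kind found in CMUdict-style dictionaries; the vowel set and the function names `strip_stress` and `phonetic_features` are hypothetical and are not taken from the paper.

```python
# Assumption: phonemes are ARPAbet symbols, optionally with a stress digit.
ARPABET_VOWELS = {
    "AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
    "EY", "IH", "IY", "OW", "OY", "UH", "UW",
}


def strip_stress(phoneme):
    """Remove a trailing stress digit, e.g. 'AH0' -> 'AH'."""
    return phoneme.rstrip("012")


def phonetic_features(phonemes):
    """Return (length, vowel count, first consonant or None) for a phoneme list."""
    symbols = [strip_stress(p).upper() for p in phonemes]
    vowel_count = sum(1 for s in symbols if s in ARPABET_VOWELS)
    # The first symbol that is not a vowel is treated as the first consonant.
    first_consonant = next((s for s in symbols if s not in ARPABET_VOWELS), None)
    return len(symbols), vowel_count, first_consonant


print(phonetic_features(["HH", "AH0", "L", "OW1"]))  # (4, 2, 'HH')
```

The same function can be applied both to a dictionary transcription and to the recognized phoneme sequence, so the two sides are described by comparable features.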
According to these rules, 3 corresponding coding tables are created. Coding produces 2 kinds of code: «codes of transcriptions» and «codes of phonemes». These codes are not compared directly; instead, the code of phonemes is translated into the code of transcriptions.
The main idea of the translation is to find the words whose codes meet the 3 requirements and to pass only these words, rather than the full dictionary, to the next decoding step of recognition.
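The selection of an abridged dictionary can be sketched as follows. This is a minimal illustration under stated assumptions, not the coding and translation tables of the paper, which are not reproduced here: the translation step is replaced by simple tolerances on length and vowel count plus an exact match of the first consonant. The names `features`, `abridged_dictionary`, `length_tolerance`, `vowel_tolerance`, and the sample dictionary are all hypothetical.

```python
# Assumption: ARPAbet symbols without stress digits, as in CMUSphinx-style dictionaries.
VOWELS = {
    "AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
    "EY", "IH", "IY", "OW", "OY", "UH", "UW",
}


def features(phonemes):
    """Return (length, vowel count, first consonant or None)."""
    vowel_count = sum(1 for p in phonemes if p in VOWELS)
    first_consonant = next((p for p in phonemes if p not in VOWELS), None)
    return len(phonemes), vowel_count, first_consonant


def abridged_dictionary(recognized, dictionary, length_tolerance=1, vowel_tolerance=1):
    """Keep only words whose transcription is close to the recognized phonemes.

    The tolerances stand in for the paper's translation between codes;
    their values here are illustrative assumptions.
    """
    r_len, r_vowels, r_first = features(recognized)
    selected = {}
    for word, transcription in dictionary.items():
        t_len, t_vowels, t_first = features(transcription)
        if abs(t_len - r_len) > length_tolerance:
            continue  # rule 1: sequence length
        if abs(t_vowels - r_vowels) > vowel_tolerance:
            continue  # rule 2: number of vowels
        if t_first != r_first:
            continue  # rule 3: first consonant
        selected[word] = transcription
    return selected


# Hypothetical toy dictionary, for illustration only.
SAMPLE_DICTIONARY = {
    "hello": ["HH", "AH", "L", "OW"],
    "yellow": ["Y", "EH", "L", "OW"],
    "help": ["HH", "EH", "L", "P"],
    "halo": ["HH", "EY", "L", "OW"],
    "hippopotamus": ["HH", "IH", "P", "AH", "P", "AA", "T", "AH", "M", "AH", "S"],
}

recognized = ["HH", "EH", "L", "OW"]
print(sorted(abridged_dictionary(recognized, SAMPLE_DICTIONARY)))
# ['halo', 'hello', 'help']
```

Only the selected words would then be handed to the decoder. Wider tolerances keep more candidates and make the filter more robust to phoneme recognition errors, at the cost of a larger abridged dictionary.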
Experiments were conducted on 2 aspects: the efficiency and the correctness of the system.
In the first experiment, 2000 words were recognized with both a classic recognition system and the modified system; the average time to recognize a single word was noticeably shorter with the modified system than with the classic one. In the accuracy experiment, the accuracy of the modified system was higher than that of the classic one.