D.A. Astashov – Undergraduate, Department «Design and manufacturing of electronic equipment», Kaluga branch of the Bauman MSTU
V.E. Drach – Ph.D.(Eng.), Associate Professor, Department «Design and manufacturing of electronic equipment», Kaluga branch of the Bauman MSTU
A.V. Rodionov – Ph.D.(Eng.), Associate Professor, Department «Computer systems and networks», Kaluga branch of the Bauman MSTU
The task of computer-aided speech recognition has been studied for a long time, but it has not yet been completely solved. This is mainly due to the complexity of the recognition process. First, each speaker is unique, and a person usually does not pause between words when speaking. Second, voices, diction, and other speech characteristics differ. Third, there is the acoustic aspect: the location of the microphone and the acoustic environment of the room. There are other nuances as well.
There are two typical approaches: with adjustment to the speaker's voice and without it. The second approach is the more difficult and the more promising one, but the size of its dictionary is small. Voice command recognition by microcontroller systems looks promising because embedded microcontroller systems are so widespread.
Known speech recognition solutions are divided into software and hardware implementations. A software implementation uses the computing power of a processor or microcontroller and relies on external servers. A hardware implementation requires a structurally complete module, such as EasyVR, MOVI, or similar.
The aim of this work is to develop an autonomous embedded system for voice command recognition based on an eight-bit microcontroller.
To solve this problem, a specialized library is used. The main advantage of the chosen algorithm is the absence of a Fourier transform in the main loop, which makes it faster than comparable solutions. Correct operation requires adjustment (calibration). Calibration uses a PC with a serial port, a microcontroller module, and the corresponding integrated development environment. A special utility is used to adjust the microphone volume.
Phonemes are tuned by dictating them into the microphone and then storing the result. Low-frequency noise can be defined as a phoneme. The algorithm takes into account «spaces» (or gaps), that is, the absence of sound while a command is being spoken. The recorded string is then checked against the predetermined commands stored in memory. If there is no exact match, the Levenshtein distance is calculated for each candidate, and the command with the shortest distance is selected.
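The matching step described above can be sketched as follows. This is a minimal illustration in Python, not the code of the library or of the embedded program itself. The command strings and the names levenshtein, match_command, and COMMANDS are hypothetical and chosen only for the example; the strings stored on the actual device and their encoding are not given here.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions, substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def match_command(recorded: str, commands: Sequence[str]) -> str:
    """Return the exact match if present, otherwise the closest command."""
    if recorded in commands:
        return recorded
    return min(commands, key=lambda c: levenshtein(recorded, c))


# Hypothetical command set (assumption): transliterated stand-ins
# for the strings that would be stored in microcontroller memory.
COMMANDS = ["vpered", "nazad", "stop"]

if __name__ == "__main__":
    print(match_command("stap", COMMANDS))
```

In this sketch a distorted input such as "stap" still resolves to "stop", because that command is only one substitution away. On an eight-bit microcontroller the same two-row scheme keeps memory use proportional to the length of one command string.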
The developed program recognizes several commands in Russian that are designed to control a robot. It can also be used in other systems where voice control is required, for example, in a «smart home».
As a result of this work, clear recognition of commands in Russian was achieved.