Radiotekhnika
Publishing house Radiotekhnika


Tel.: +7 (495) 625-9241

 

Time costs reduction of parameters selection for noise reduction algorithms in the speaker identification problem

Keywords:

G.S. Tupitsin – Ph. D. (Eng.), Department of Infocommunication and Radiophysics, P.G. Demidov Yaroslavl State University
E-mail: genichyar@genichyar.com
A.I. Topnikov – Ph. D. (Eng.), Associate Professor, Department of Infocommunication and Radiophysics, P.G. Demidov Yaroslavl State University
E-mail: topartgroup@gmail.com
A.L. Priorov – Dr. Sc. (Eng.), Associate Professor, Department of Infocommunication and Radiophysics, P.G. Demidov Yaroslavl State University
E-mail: andcat@yandex.ru


Speaker identification is becoming a highly relevant task in many fields, especially in remote security applications. Such systems are usually developed under laboratory conditions, and their performance degrades severely when an acoustic mismatch appears between the training and testing phases, for example in the presence of acoustic noise. In this case, one of the most effective ways to make the recognizer more robust is to apply noise reduction algorithms to the speech signals.
It should be noted that noise reduction algorithms that maximize the quality and intelligibility of speech signals are not always effective as preprocessing for speaker identification. Selecting the parameters of a noise reduction algorithm is also a problem, because the traditional approach to estimating speaker identification accuracy is time-consuming. Resource requirements can be reduced by using a combined speech quality measure, built from objective speech quality measures, instead of a real speaker identification system. This measure, however, uses the full set of test signals of the speech database.
In this paper, a technique for quickly estimating speaker identification accuracy is proposed. It uses a linear combination of several objective speech quality measures and a reduced set of test signals. Signal-to-noise ratio (SNR), segmental signal-to-noise ratio (segmental SNR), weighted spectral slope (WSS), log-likelihood ratio (LLR), and three measures based on the distance between mel-frequency cepstral coefficients (MFCC) were used as the speech quality measures for the proposed technique.
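As an illustration of how such an estimator might be assembled, the following minimal sketch computes two of the listed measures (SNR and segmental SNR) and fits a linear combination of measures to known accuracy values by least squares. It is not the authors' implementation: the function names, the frame length of 256 samples, the clipping of per-frame values to the range from -10 to 35 dB, and the use of ordinary least squares are all assumptions made for the example.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero and log of zero


def snr_db(clean, processed):
    """Global SNR in dB; (clean - processed) is treated as residual noise."""
    clean = np.asarray(clean, dtype=float)
    noise = clean - np.asarray(processed, dtype=float)
    return 10.0 * np.log10((np.sum(clean ** 2) + EPS) / (np.sum(noise ** 2) + EPS))


def segmental_snr_db(clean, processed, frame_len=256, lo=-10.0, hi=35.0):
    """Mean of per-frame SNR values, each clipped to [lo, hi] dB (assumed limits)."""
    clean = np.asarray(clean, dtype=float)
    processed = np.asarray(processed, dtype=float)
    n_frames = len(clean) // frame_len
    values = []
    for i in range(n_frames):
        seg = slice(i * frame_len, (i + 1) * frame_len)
        values.append(np.clip(snr_db(clean[seg], processed[seg]), lo, hi))
    return float(np.mean(values))


def fit_linear_combination(measures, accuracy):
    """Least-squares fit of accuracy ~ measures @ weights + bias.

    measures: array of shape (n_conditions, n_measures)
    accuracy: array of shape (n_conditions,)
    """
    measures = np.asarray(measures, dtype=float)
    design = np.column_stack([measures, np.ones(len(measures))])
    coef, *_ = np.linalg.lstsq(design, np.asarray(accuracy, dtype=float), rcond=None)
    return coef[:-1], float(coef[-1])


def estimate_accuracy(measures, weights, bias):
    """Apply the fitted linear combination to new measure values."""
    return np.asarray(measures, dtype=float) @ weights + bias
```

In this sketch each row of `measures` would hold the quality measures obtained for one processing condition, and the fitted weights would then be reused to score new conditions without running the identification system.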
A test signal selection algorithm is described. Four test signals were chosen from each of the speech databases used in the research. The linear correlation coefficient between the speaker identification accuracy and the proposed estimator was 0.99.
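For reference, the linear (Pearson) correlation coefficient between an estimator and the measured accuracy can be computed as in the short sketch below. The function name is hypothetical, and the sketch shows only the standard formula, not the data or code used in the study.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson linear correlation coefficient between two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()  # centre both sequences
    yc = y - y.mean()
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2)))
```

A value close to 1 means that the estimator ranks and scales the processing conditions almost exactly as the real identification accuracy does.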
An experiment on parameter selection for the two-step algorithm based on the minimum mean square error short-time spectral amplitude estimator was performed. The results indicate that the number of test speech signals can be reduced to accelerate speaker identification accuracy estimation while the reliability of the results remains relatively high.
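The parameter selection loop that such an estimator accelerates can be sketched roughly as follows. Everything here is hypothetical: `select_parameters`, the `denoise` and `score` callables, and the toy gain example stand in for a real noise reduction algorithm and a real accuracy estimator, neither of which is specified in this abstract.

```python
def select_parameters(candidates, denoise, score, test_signals):
    """Return the candidate parameter set with the highest mean score.

    candidates:   iterable of dicts of keyword arguments for `denoise`
    denoise:      callable(noisy, **params) -> processed signal
    score:        callable(clean, processed) -> estimated accuracy (higher is better)
    test_signals: list of (clean, noisy) pairs; a small reduced set keeps this fast
    """
    best_params, best_score = None, float("-inf")
    for params in candidates:
        scores = [score(clean, denoise(noisy, **params)) for clean, noisy in test_signals]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


# Toy stand-ins (assumptions for illustration only).
def toy_denoise(noisy, gain):
    """Scale the signal by a fixed gain."""
    return [gain * v for v in noisy]


def toy_score(clean, processed):
    """Negative mean squared error, so that higher means better."""
    return -sum((c - p) ** 2 for c, p in zip(clean, processed)) / len(clean)


toy_signals = [([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])]
toy_candidates = [{"gain": 0.25}, {"gain": 0.5}, {"gain": 1.0}]
best, best_value = select_parameters(toy_candidates, toy_denoise, toy_score, toy_signals)
```

The cost of the loop grows with the number of candidates times the number of test signals, which is why shrinking the test set shortens the search.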


© Radiotekhnika Publishing House, 2004-2017