Fractal Dimensions of Voice Patterns and Voice Recognition
DOI:
https://doi.org/10.32871/rmrj1402.01.07
Keywords:
Voice recognition, voice patterns, fractals, fractal dimension, speaker identification
Abstract
The popularity and convenience of electronic communications have given rise to more online transactions. Despite regular updates to safeguards, a significant number of transactions still go awry. In the hotel business, forgeries and prank calls may be problematic, but nothing is more distressing than having to settle conflicts with guests. The lack of means to recognize, identify, and verify callers exposes transactions to pranks or misunderstandings. In either case, these frustrating transactions erode goodwill, which results in the loss of future business. This study explores the use of fractal dimensions in characterizing the different facets of voice and speech dynamics. The different sinusoid samples are intended to measure the physiological and dynamic aspects of vocalization. Test results show that the differences among the volunteers in the group means of the fractal dimensions of their voice wave patterns are significant. The results also show the potential of fractal dimensions for characterizing the voice patterns of different speakers and, eventually, for voice recognition or speaker identification.
License
Copyright of the Journal belongs to the University of San Jose-Recoletos