Fractal Dimensions of Voice Patterns and Voice Recognition

Authors

  • Randy K. Salazar University of San Jose-Recoletos
  • Rosario L. Cabillada University of San Jose-Recoletos
  • Mark S. Borres University of San Jose-Recoletos
  • Anthony Jagures University of San Jose-Recoletos

DOI:

https://doi.org/10.32871/rmrj1402.01.07

Keywords:

Voice recognition, voice patterns, fractals, fractal dimension, speaker identification

Abstract

The popularity and convenience of electronic communications have given rise to more online transactions. Despite regular updates to safeguards, a significant number of these transactions go awry. In the hotel business, forgeries and prank calls are problematic, but nothing is more distressing than having to settle conflicts with guests. Without a means to recognize, identify, and verify callers, transactions are exposed to pranks and misunderstandings. In either case, these frustrating transactions erode goodwill, which results in the loss of future business. This study explores the use of fractal dimensions to characterize the different facets of voice and speech dynamics. The different sinusoid samples are intended to capture the physiological and dynamic aspects of vocalization. Test results show that the differences among the group means of the fractal dimensions of the voice wave patterns across volunteers are significant. The results also show the potential of fractal dimensions for characterizing the voice patterns of different speakers and, eventually, for voice recognition or speaker identification.
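As a rough illustration of what it means to assign a fractal dimension to a voice wave pattern, the sketch below estimates the dimension of a sampled waveform with the Higuchi method. This is an assumption made for illustration only: the abstract does not state which estimator the study used, and the function name `higuchi_fd` and the parameter `k_max` are hypothetical.

```python
import numpy as np


def higuchi_fd(signal, k_max=10):
    """Estimate the fractal dimension of a 1-D waveform (Higuchi method).

    Illustrative only: the estimator, the function name, and the default
    k_max are assumptions, not the procedure used in the study.
    """
    x = np.asarray(signal, dtype=float)
    n = x.size
    if n < 2 * k_max + 1:
        raise ValueError("signal is too short for the chosen k_max")

    log_inv_k = []
    log_length = []
    for k in range(1, k_max + 1):
        lengths = []
        for m in range(k):
            # Subsampled curve starting at offset m with step k.
            sub = x[m::k]
            n_steps = sub.size - 1
            if n_steps < 1:
                continue
            # Total vertical distance travelled along the subsampled curve.
            dist = np.abs(np.diff(sub)).sum()
            # Higuchi's normalisation for the number of steps actually used.
            norm = (n - 1) / (n_steps * k)
            lengths.append(dist * norm / k)
        log_inv_k.append(np.log(1.0 / k))
        log_length.append(np.log(np.mean(lengths)))

    # The curve length scales as k ** (-D); D is the slope of the log-log fit.
    slope, _intercept = np.polyfit(log_inv_k, log_length, 1)
    return float(slope)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 1.0, 4000)
    smooth = np.sin(2 * np.pi * 5 * t)               # smooth sinusoid
    rough = smooth + 0.5 * rng.standard_normal(t.size)  # same tone plus noise
    print("smooth sinusoid:", higuchi_fd(smooth))
    print("noisy sinusoid: ", higuchi_fd(rough))
```

Under this estimator a smooth curve yields a value close to 1, while an irregular, noise-like waveform tends toward 2. Comparing speakers would then amount to computing such a value for each recording and testing whether the per-speaker group means differ.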

Author Biographies

Randy K. Salazar, University of San Jose-Recoletos

is a mechanical engineer and Assistant Professor at the University of San Jose-Recoletos, where he teaches Mechanical Engineering. He holds a Master of Science in Management Engineering from the University of San Jose-Recoletos and is completing a master's degree in Mechanical Engineering (Dynamic Design Systems) at the University of San Carlos.

Rosario L. Cabillada, University of San Jose-Recoletos

is a faculty member of the Tourism and Hospitality Management Department of the College of Commerce. She graduated with a Bachelor of Science in Hotel and Restaurant Management from Colegio de San Jose-Recoletos in 1983. She worked for Costabella Tropical Beach Resort, Mactan, Cebu in 1984 and for Sunburst Fried Chicken Group of Restaurants, Cebu City in 1987. She joined the University of San Jose-Recoletos in 1988 as part-time faculty and has been full-time faculty from 1989 to the present. She also served by appointment as Chairperson of the Hotel and Restaurant Management Department under the College of Arts and Sciences from 1990 to 1995 and from 1999 to 2002, and under the College of Commerce, where the department was renamed the Tourism and Hospitality Management Department, from 2004 to 2008. Aside from her teaching duties, she has served as a member of the Regional Quality Assessment Team of CHED Region VII from 2008 to the present.

Mark S. Borres, University of San Jose-Recoletos

graduated with a Bachelor of Science in Mathematics, major in Pure Mathematics, from the University of the Philippines, Cebu College. Since 2009, he has worked at the University of San Jose-Recoletos as a faculty member of the College of Arts and Sciences, handling mathematics subjects across colleges, including College Algebra, Advanced Algebra, Abstract Algebra, Analytic Geometry, Euclidean Geometry, Trigonometry, Business Mathematics, Linear Programming, Mathematics of Investment, Discrete Structures, and Statistics.

Published

2014-06-30

How to Cite

Salazar, R. K., Cabillada, R. L., Borres, M. S., & Jagures, A. (2014). Fractal Dimensions of Voice Patterns and Voice Recognition. Recoletos Multidisciplinary Research Journal, 2(1). https://doi.org/10.32871/rmrj1402.01.07
