A Survey of Machine Learning Approaches for Speech Recognition | [Bahria University Journal of Information & Communication Technologies • 2017]

Author(s):

1. Bakhtiar K Kasi: Department of Computer Engineering, Balochistan University of Information Technology, Engineering & Management Sciences (BUITEMS), Quetta, Pakistan

2. Riaz Ulamin: Department of Computer Sciences, Balochistan University of Information Technology, Engineering & Management Sciences (BUITEMS), Quetta, Pakistan

3. Mumraiz Kasi: Department of Computer Sciences, Balochistan University of Information Technology, Engineering & Management Sciences (BUITEMS), Quetta, Pakistan

4. Masood Ur Rehman: Department of Computer Engineering, Balochistan University of Information Technology, Engineering & Management Sciences (BUITEMS), Quetta, Pakistan

Abstract:

- Machine learning approaches have been used for a wide range of applications in recent years. The strength of these approaches lies mainly in their ability to learn from experience. Speech recognition is an area that has gained a lot of popularity in recent years; Siri, Genie, and Cortana are some commonly used examples. The focus of these applications has been to interpret human speech as a set of basic commands for a portable device. In this paper, we present a survey of some commonly used machine learning approaches for speech recognition. While there are several variations of the speech recognition system, little is known about the challenges associated with each approach. We compare the available approaches and highlight the pros and cons of some of the most popular ones.

Page(s): 3-7

DOI: not available

Published: Journal: Bahria University Journal of Information & Communication Technologies, Volume: 10, Issue: 1, Year: 2017
