
Speech Processing and Machine Recognition of Speech

Speech processing has been an active area of research for decades. The field is highly multidisciplinary, drawing on linguistics, psychology, anatomy, mathematics, and machine learning, so newcomers may wonder which tutorial to read to catch up. Having no background in linguistics or psychology, I started with Rabiner's classic paper on HMMs [1]. However, that paper covers the HMM part of speech recognition rather than giving an overview of speech processing as a whole. Today I found a series of good video lectures that summarize what people have been doing in the area, and I hope you find them useful.
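For readers who start with Rabiner's paper, as I did, a central piece of it is the "evaluation" problem: computing the probability P(O | λ) of an observation sequence O given a model λ = (A, B, π), which the forward algorithm solves. The sketch below is a minimal Python version for a discrete-observation HMM, assuming a toy two-state model whose probabilities are made up for illustration (they do not come from the paper or the lectures). The function name `forward` and the variables `pi`, `A`, `B` are my own choices.

```python
def forward(pi, A, B, obs):
    """Return P(obs | model) for a discrete HMM using the forward algorithm.

    pi[i]   : probability of starting in state i
    A[i][j] : probability of moving from state i to state j
    B[i][k] : probability of emitting symbol k while in state i
    obs     : list of observed symbol indices
    """
    n = len(pi)
    # Initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # Induction: alpha_{t+1}(j) = [sum_i alpha_t(i) * a_ij] * b_j(o_{t+1})
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    # Termination: P(O | lambda) = sum_i alpha_T(i)
    return sum(alpha)


# Hypothetical toy model: 2 hidden states, 3 observation symbols.
pi = [0.6, 0.4]
A = [[0.7, 0.3],
     [0.4, 0.6]]
B = [[0.5, 0.4, 0.1],
     [0.1, 0.3, 0.6]]

print(round(forward(pi, A, B, [0, 1, 2]), 5))  # → 0.03628
```

This runs in O(N²T) time for N states and T observations instead of enumerating all Nᵀ state sequences. For long utterances the alpha values underflow, which is why Rabiner's tutorial also discusses scaling; this sketch omits it.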

  • Hidden Markov Model (HMM) Toolbox for Matlab by Kevin Murphy [link]: The web page links to many useful papers and provides an HMM toolbox for MATLAB.
  • JHU Summer School on Human Language Technology [link]: The web page contains many video lectures hosted on videolectures.net.
  • Here is the lecture by Hynek Hermansky; you may want to watch it first:

Introduction to Speech Processing

Hynek Hermansky

Next, for more technical details, you may want to watch this lecture:

Machine Recognition of Speech

Brian Kingsbury

Recommended Reading:

[1] A tutorial on Hidden Markov Models and selected applications in speech recognition, L. Rabiner, 1989, Proc. IEEE 77(2):257–286. [pdf]
