0% found this document useful (0 votes)

124 views

Feature Extraction Using PCA

Speech recognition is the process of converting spoken words to text. It involves preprocessing the audio signal, recognizing the words, and communicating the output to other applications. Preprocessing converts speech to digital feature vectors and includes signal processing, feature extraction, and segmentation. Recognition identifies words using techniques like template matching, acoustic-phonetic modeling, stochastic processing using Hidden Markov Models, and neural networks. Variability in speech and accuracy of recognition systems are challenges that require advanced processing methods.

Uploaded by

Kamaraj Naidu

Available Formats

Download as PPT, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

124 views

Feature Extraction Using PCA

Uploaded by

Kamaraj Naidu

Available Formats

Download as PPT, PDF, TXT or read online on Scribd

You are on page 1/ 36

Speech Recognition

Definition
Speech recognition is the process of converting an acoustic signal, captured by a microphone or a telephone, to a set of words. The recognised words can be an end in themselves, as for applications such as commands & control, data entry, and document preparation. They can also serve as the input to further linguistic processing in order to achieve speech understanding

Speech Processing
Signal processing:
Convert the audio wave into a sequence of feature vectors

Speech recognition:
Decode the sequence of feature vectors into a sequence of words

Semantic interpretation:
Determine the meaning of the recognized words

Dialog Management:
Correct errors and help get the task done

Response Generation
What words to use to maximize user understanding

Speech synthesis (Text to Speech):

Generate synthetic speech from a marked-up word string

Many kinds of Speech Recognition Systems

Speech recognition systems can be characterised by many parameters. An isolated-word (Discrete) speech recognition system requires that the speaker pauses briefly between words, whereas a continuous speech recognition system does not.

Spontaneous V Scripted
Spontaneous, speech contains disfluencies, periods of pause and restart, and is much more difficult to recognise than speech read from script.

Enrolment
Some systems require speaker enrolment, a user must provide samples of his or her speech before using them, whereas other systems are said to be speakerindependent, in that no enrolment is necessary.

Signal Variability
Speech recognition is a difficult problem, largely because of the many sources of variability associated with the signal. The acoustic realisations of phonemes, the recognition systems smallest sound units of which words are composed, are highly dependent on the context in which they appear. These phonetic variables are exemplified by the acoustic differences of the phoneme 't/'in two, true, and butter in English. At word boundaries, contextual variations can be quite dramatic, and devo andare sound like devandare in Italian.

More
Acoustic variability can result from changes in the environment as well as in the position and characteristics of the transducer. Within-speaker variability can result from changes in the speaker's physical and emotional state, speaking rate, or voice quality. Differences in socio-linguistic background, dialect, and vocal tract size and shape can contribute to across-speaker variability.

What is a speech recognition system?

Speech recognition is generally used as a human computer interface for other software. When it functions in this role, three primary tasks need be performed. Pre-processing, the conversion of spoken input into a form the recogniser can process. Recognition, the identification of what has been said. Communication, to send the recognised input to the application that requested it.

How is pre-processing performed

To understand how the first of these functions is performed, we must examine, Articulation, the production of the sound. Acoustics, the stream of the speech itself. What characterises the ability to understand spoke input, Auditory perception.

Acoustics
Articulation provides valuable information about how speech sounds are produced, but a speech recognition system cannot analyse movements of the mouth. Instead, the data source for speech recognition is the stream of speech itself. This is an analogue signal, a sound stream, and a continuous flow of sound waves and silence.

Important Features (Acoustics)

Four important features of the acoustic analysis of speech are, (Carter, 1984) Frequency, the number of vibrations per second a sound produces Amplitude, the loudness of the sound. Harmonic structure added to the fundamental frequency of a sound are other frequencies that contribute to its quality or timbre. Resonance.

Auditory perception, hearing speech.

"Phonemes tend to be abstractions that are implicitly defined by the pronunciation of the words in the language. In particular, the acoustic realisation of a phoneme may heavily depend on the acoustic context in which it occurs. This effect is usually called coarticulation", (Ney, 1994). The way a phoneme is pronounced can be affected by its position in a word, neighbouring phonemes and even the word's position in a sentence. This affect is called the co-articulation effect. The variability in the speech signal caused by coarticulation and other sources make speech analysis very difficult.

Human Hearing
The human ear can detect frequencies from 20Hz to 20,000Hz but it is most sensitive in the critical frequency range, 1000Hz to 6000Hz, (Ghitza, 1994). Recent Research has uncovered the fact that humans do not process individual frequencies. Instead, we hear groups of frequencies, such as format patterns, as cohesive units and we are capable of distinguishing them from surrounding sound patterns, (Carrell and Opie, 1992) . This capability, called auditory object formation, or auditory image formation, helps explain how humans can discern the speech of individual people at cocktail parties and separate a voice from noise over a poor telephone channel, (Markowitz, 1995).

Pre-processing Speech
Like all sounds, speech is an analogue waveform. In order for a Recognition System to perform action on speech, it must be represented in a digital manner. All noise patterns silences and co-articulation effects must be captured. This is accomplished by digital signal processing. The way the analogue speech is processed is one of the most complex elements of a Speech Recognition system.

Recognition Accuracy
To achieve high recognition accuracy the speech representation process should, (Markowitz, 1995), Include all critical data. Remove Redundancies. Remove Noise and Distortion. Avoid introducing new distortions.

Signal Representation
In statistically based automatic speech recognition, the speech waveform is sampled at a rate between 6.6 kHz and 20 kHz and processed to produce a new representation as a sequence of vectors containing values of what are generally called parameters. The vectors typically comprise between 10 and 20 parameters, and are usually computed every 10 or 20 milliseconds.

Signal Recognition Technologies

Signal Recognition methodologies fall into to four categories, most system will apply one or more in the conversion process.

Template Matching,
Template match is the oldest and least effective method. It is a form of pattern recognition. It was the dominant technology in the 1950's and 1960's. Each word or phrase in an application is stored as a template. The user input is also arranged into templates at the word level and the best match with a system template is found. Although Template matching is currently in decline as the basic approach to recognition, it has been adapted for use in word spotting applications. It also remains the primary technology applied to speaker verification, (Moore, 1982).

Acoustic-Phonetic Recognition
Acoustic-phonetic recognition functions at the phoneme level. It is an attractive approach to speech as it limits the number of representations that must be stored. In English there are about forty discernible phonemes no matter how large the vocabulary, (Markowitz, 1995). Acoustic phonetic recognition involves three steps, Feature Extraction. Segmentation and Labelling. Word-Level recognition.

Acoustic phonetic recognition supplanted template matching in the early 1970's. The successful ARPA SUR systems highlighted potential benefits of this approach. Unfortunately acoustic phonetic was at the time a poorly researched area and many of the expected advances failed to materialise.

The high degree of acoustic similarity among phonemes combined with phoneme variability resulting from the co-articulation effect and other sources create uncertainty with regard to potential phoneme labels, (Cole 1986). If these problems can be overcome, there is certainly an opportunity for this technology to play a part in future Speech Recognition system.

Stochastic Processing,
The term stochastic refers to the process of making a sequence of non-deterministic selections from among a set of alternatives. They are non-deterministic because the choices during the recognition process are governed by the characteristics of the input and not specified in advance, (Markowitz, 1995). Like template matching, stochastic processing requires the creation and storage of models of each of the items that will be recognised. It is based on a series of complex statistical or probabilistic analyses. These statistics are stored in a network-like structure called a Hidden Markov Model (HMM), (Paul, 1990).

HMM
A Hidden Markov Model is made up of states and transitions, which are shown, in the diagram. Each state represents of a HMM holds statistics for a segment of a word, which describe the value and variations that are found in the model of that word segment. The transitions allow for speech variations such as The prolonging of a word segment, this would cause several recursive transitions in the recogniser. The omission of a word segment, This would cause a transition that skips a state. Stochastic processing using Hidden Markov Models is accurate, flexible, and capable of being fully automated, (Rabiner and Juang, 1986).

Neural networks
"if speech recognition systems could learn speech knowledge automatically and represent this knowledge in a parallel distributed fashion for rapid evaluation such a system would mimic the function of the human brain, which consists of several billion simple, inaccurate and slow processors that perform reliable speech processing", (Waibel and Hampshire, 1989). An artificial neural network is a computer program, which attempt to emulate the biological functions of the Human brain. They are an excellent classification systems, and have been effective with noisy, patterned, variable data streams containing multiple, overlapping, interacting and incomplete cues, (Markowitz, 1995).

Neural networks do not require the complete specification of a problem, learning instead through exposure to large amount of example data. Neural networks comprise of an input layer, one or more hidden layers, and one output layer. The way in which the nodes and layers of a network are organised is called the networks architecture. The allure of neural networks for speech recognition lies in their superior classification abilities. Considerable effort has been directed towards development of networks to do word, syllable and phoneme classification.

Auditory Models,
The aim of auditory models to allow a Speech Recognition system to screen all noise from the signal and concentrate on the central speech pattern in a similar way to the Human Brain. Auditory modelling offers the promise of being able to develop robust Speech Recognition systems that are capable of working in difficult environments. Currently, it is purely an experimental technology.

Performance of Speech Recognitions systems

Performance of speech recognition systems is typically described in terms of word error rate, defined as: Deletion, The loss of a word within the original speech. The system outputs "A E I U" while the input was "A E I O U". Substitution, The replacement of an element of the input, such as a word, with another. The system outputs "song" while the input was "long". Insertion, The system adds an element to the input, such as a word, when no word was input. The system outputs "A E I O U" while the input was "A E I U".

Speech Recognition as Assistive Technology

Main use is as alternative Hands Free Data entry mechanism Very effective Much faster than switch access Mainstream technology Used in many applications where hands are needed for other things e.g. mobile phone while driving, in surgical theatres

People with speech impairment (Dysarthic Speech) have shown improved articulation after using SR systems especially Discrete systems

Reasons why SR may fail some people

Crowded room - Cannot have everyone talking at once Too many errors because all noises, coughs, throat clearances etc are picked up Speech not good enough to use it Not enough training Cognitive overhead too much for some people

Some links
The following are links to major speech recognition links

Carnegie Mellon Speech Demos

CMU Communicator
Call: 1-877-CMU-PLAN (268-7526), also 2685144, or x8-1084 the information is accurate; you can use it for your own travel planning
CMU Universal Speech Interface (USI)

CMU Movie Line Seems to be about apartments now

Call: (412) 268-1185

Telephone Demos
Nuance
http://www.nuance.com Banking: 1-650-847-7438 Travel Planning: 1-650-847-7427 Stock Quotes: 1-650-847-7423

SpeechWorks
http://www.speechworks.com/demos/demos.htm Banking: 1-888-729-3366 Stock Trading: 1-800-786-2571

COV_Principle_Components_2_variables. xls

MIT Spoken Language Systems Laboratory

http://www.sls.lcs.mit.edu/sls/whatwedo/applicati ons.html Travel Plans (Pegasus): 1-877-648-8255 Weather (Jupiter): 1-888-573-8255

IBM http://www-3.ibm.com/software/speech/
Mutual Funds, Name Dialing: 1-877-VIAVOICE

CT 4201304
No ratings yet
CT 4201304
26 pages
Speech Processing
No ratings yet
Speech Processing
70 pages
unit4ppttsa
No ratings yet
unit4ppttsa
19 pages
Speech Recognition Seminar
No ratings yet
Speech Recognition Seminar
19 pages
Speechrecognitionfinalpresentation 141124072610 Conversion Gate01
No ratings yet
Speechrecognitionfinalpresentation 141124072610 Conversion Gate01
30 pages
Speech Processing Unit 4 Notes
No ratings yet
Speech Processing Unit 4 Notes
16 pages
Speech Recognition
0% (1)
Speech Recognition
27 pages
Speech Recognition Using Matrix Comparison: Vishnupriya Gupta
No ratings yet
Speech Recognition Using Matrix Comparison: Vishnupriya Gupta
3 pages
Acoustic Parameters For Speaker Verification
No ratings yet
Acoustic Parameters For Speaker Verification
16 pages
Speech Recognition Using Neural Networks: A. Types of Speech Utterance
No ratings yet
Speech Recognition Using Neural Networks: A. Types of Speech Utterance
24 pages
Voice Recognition Using Matlab: Presented By: Avienash Raibole Paresh Meshram Vinayak Kolpek
100% (1)
Voice Recognition Using Matlab: Presented By: Avienash Raibole Paresh Meshram Vinayak Kolpek
18 pages
Method To Study Speech Synthesis
No ratings yet
Method To Study Speech Synthesis
43 pages
NLP Project Reportttt
No ratings yet
NLP Project Reportttt
9 pages
Speaker Recognition Publish
No ratings yet
Speaker Recognition Publish
6 pages
Chapter14_ModelsandTheories
No ratings yet
Chapter14_ModelsandTheories
55 pages
Speech Recognition Using DSP PDF
No ratings yet
Speech Recognition Using DSP PDF
32 pages
Using Gaussian Mixture: Automatic Speaker Recognition Speaker Models
No ratings yet
Using Gaussian Mixture: Automatic Speaker Recognition Speaker Models
20 pages
Speech Processing -Anu
No ratings yet
Speech Processing -Anu
78 pages
Automatic Speech Recognition
No ratings yet
Automatic Speech Recognition
30 pages
Speech Recognition Full Report
No ratings yet
Speech Recognition Full Report
11 pages
Speech Recognition UTHM
No ratings yet
Speech Recognition UTHM
30 pages
Advanced Topics in Speech Processing (IT60116) : K Sreenivasa Rao School of Information Technology IIT Kharagpur
No ratings yet
Advanced Topics in Speech Processing (IT60116) : K Sreenivasa Rao School of Information Technology IIT Kharagpur
17 pages
Automatic Speech Recognition
No ratings yet
Automatic Speech Recognition
34 pages
Digital Speech Processing
No ratings yet
Digital Speech Processing
46 pages
8.5 Multilingual Speech Processing
No ratings yet
8.5 Multilingual Speech Processing
24 pages
Term Paper ECE-300 Topic: - Speech Recognition
No ratings yet
Term Paper ECE-300 Topic: - Speech Recognition
14 pages
Introduction
No ratings yet
Introduction
9 pages
Speech Recognition (Dr. M. Sabarimalai Manikandan
No ratings yet
Speech Recognition (Dr. M. Sabarimalai Manikandan
2 pages
Speech Recognition: College Name: Guru Nanak Engineering College Authors: Shruthi Tapse
No ratings yet
Speech Recognition: College Name: Guru Nanak Engineering College Authors: Shruthi Tapse
13 pages
Speech Recognition1
No ratings yet
Speech Recognition1
24 pages
Speech Technology and Research Laboratory, SRI International, Menlo Park, CA, USA
No ratings yet
Speech Technology and Research Laboratory, SRI International, Menlo Park, CA, USA
5 pages
Gender Recognition by Speech Analysis
No ratings yet
Gender Recognition by Speech Analysis
24 pages
SPEECH
100% (1)
SPEECH
17 pages
Mca Voice Morphing Report
No ratings yet
Mca Voice Morphing Report
19 pages
Question
100% (1)
Question
17 pages
About Speaker Recognition Techology
No ratings yet
About Speaker Recognition Techology
9 pages
Speech Recognition1
100% (1)
Speech Recognition1
39 pages
Time Frequency Analysis and Wavelet Transform Tutorial Time-Frequency Analysis For Voiceprint (Speaker) Recognition
No ratings yet
Time Frequency Analysis and Wavelet Transform Tutorial Time-Frequency Analysis For Voiceprint (Speaker) Recognition
22 pages
5G Smart Manufacturing
No ratings yet
5G Smart Manufacturing
14 pages
Final PPT On Speech Processing
0% (1)
Final PPT On Speech Processing
20 pages
Module1 SSP
No ratings yet
Module1 SSP
95 pages
Digital Signal Processing: The Final
No ratings yet
Digital Signal Processing: The Final
13 pages
IT Report-1
No ratings yet
IT Report-1
14 pages
Presentation On Speech Recognition
No ratings yet
Presentation On Speech Recognition
11 pages
Speech Feature Extraction and Classification Techniques: Kamakshi and Sumanlata Gautam
No ratings yet
Speech Feature Extraction and Classification Techniques: Kamakshi and Sumanlata Gautam
3 pages
Review of Feature Extraction Techniques in Automatic Speech Recognition
100% (1)
Review of Feature Extraction Techniques in Automatic Speech Recognition
6 pages
(IJCST-V4I2P62) :Dr.V.Ajantha Devi, Ms.V.Suganya
No ratings yet
(IJCST-V4I2P62) :Dr.V.Ajantha Devi, Ms.V.Suganya
6 pages
Speech Technology
No ratings yet
Speech Technology
5 pages
Speech Recognition
No ratings yet
Speech Recognition
4 pages
Advanced Signal Processing Using Matlab
No ratings yet
Advanced Signal Processing Using Matlab
20 pages
Unit 4 Speaker Identification
No ratings yet
Unit 4 Speaker Identification
50 pages
unit 2 sound or audio system
No ratings yet
unit 2 sound or audio system
29 pages
Speech To Text Conversion STT System Using Hidden Markov Model HMM
No ratings yet
Speech To Text Conversion STT System Using Hidden Markov Model HMM
4 pages
Text and Speech CCS369-UNIT 5
No ratings yet
Text and Speech CCS369-UNIT 5
9 pages
Speechsynthesis
No ratings yet
Speechsynthesis
6 pages
Theory and Application of Digital Speech Processing by L. R. Rabiner and R. W. Schafer
No ratings yet
Theory and Application of Digital Speech Processing by L. R. Rabiner and R. W. Schafer
35 pages
Morph
No ratings yet
Morph
10 pages
Speaker Recognition: Fundamentals and Applications
From Everand
Speaker Recognition: Fundamentals and Applications
Fouad Sabry
No ratings yet
Speech Recognition: Fundamentals and Applications
From Everand
Speech Recognition: Fundamentals and Applications
Fouad Sabry
No ratings yet
Spoken Language Understanding: Systems for Extracting Semantic Information from Speech
From Everand
Spoken Language Understanding: Systems for Extracting Semantic Information from Speech
Gokhan Tur
No ratings yet
Voice Recognition
From Everand
Voice Recognition
Kai Turing
No ratings yet
Non Homogeneous Markov Chains and Systems Theory and Applications 1st Edition P C G Vassiliou - Download the ebook and explore the most detailed content
100% (3)
Non Homogeneous Markov Chains and Systems Theory and Applications 1st Edition P C G Vassiliou - Download the ebook and explore the most detailed content
78 pages
Probability and random processes 1st Edition Venkatarama Krishnan 2024 Scribd Download
100% (6)
Probability and random processes 1st Edition Venkatarama Krishnan 2024 Scribd Download
50 pages
2022 Syllabus - ISE 4th Sem Consolidated Syllabus - With - Assessment Process - 15.4.2024
No ratings yet
2022 Syllabus - ISE 4th Sem Consolidated Syllabus - With - Assessment Process - 15.4.2024
23 pages
Population Genetics Tutorial
No ratings yet
Population Genetics Tutorial
164 pages
PSR 1
No ratings yet
PSR 1
11 pages
HW 7
No ratings yet
HW 7
3 pages
Text Generation Using Markov Chain
No ratings yet
Text Generation Using Markov Chain
13 pages
שפות סימולציה- אוסף תרגילים מס' 2 - תורת התורים
No ratings yet
שפות סימולציה- אוסף תרגילים מס' 2 - תורת התורים
23 pages
Making Profit in The Stock Market Using HMMs
No ratings yet
Making Profit in The Stock Market Using HMMs
3 pages
Gardiner - Stochastic Meethods
No ratings yet
Gardiner - Stochastic Meethods
10 pages
Ai Ii Notes
No ratings yet
Ai Ii Notes
33 pages
7) Link Prediction in Dynamic Networks Using Time Aware Network Embedding and Time Series Forecasting
No ratings yet
7) Link Prediction in Dynamic Networks Using Time Aware Network Embedding and Time Series Forecasting
13 pages
A Markovian-Genetic Algorithm Model For Predicting Pavement Deterioration
No ratings yet
A Markovian-Genetic Algorithm Model For Predicting Pavement Deterioration
9 pages
Stochastic Analysis of Manpower and Business
No ratings yet
Stochastic Analysis of Manpower and Business
7 pages
103 Sept 2002 Question
No ratings yet
103 Sept 2002 Question
8 pages
Consumption
No ratings yet
Consumption
31 pages
Handbook of RAMS
No ratings yet
Handbook of RAMS
25 pages
Get Probability and Computing Randomization and Probabilistic Techniques in Algorithms and Data Analysis 2nd Edition Michael Mitzenmacher PDF ebook with Full Chapters Now
100% (1)
Get Probability and Computing Randomization and Probabilistic Techniques in Algorithms and Data Analysis 2nd Edition Michael Mitzenmacher PDF ebook with Full Chapters Now
65 pages
Node Ranking in Labeled Directed Graphs
No ratings yet
Node Ranking in Labeled Directed Graphs
10 pages
Stochastic Formulation of Ecological Models and Their Applications
No ratings yet
Stochastic Formulation of Ecological Models and Their Applications
9 pages
CSIR NET Mathematical Sciences Syllabus
No ratings yet
CSIR NET Mathematical Sciences Syllabus
4 pages
How Data-Driven Entrepreneur Analyzes Imperfect Information For Business Opportunity Evaluation
No ratings yet
How Data-Driven Entrepreneur Analyzes Imperfect Information For Business Opportunity Evaluation
14 pages
Markov Analysis-: A Transition Matrix, or Markov Matrix
No ratings yet
Markov Analysis-: A Transition Matrix, or Markov Matrix
1 page
Articles: Don't Simulate When... 10 Rules For Determining When Simulation Is Not Appropriate
No ratings yet
Articles: Don't Simulate When... 10 Rules For Determining When Simulation Is Not Appropriate
7 pages
Dujapas Jacobi NW
No ratings yet
Dujapas Jacobi NW
13 pages
CSM-B.Tech-MR22-Syllabus-Semesters
No ratings yet
CSM-B.Tech-MR22-Syllabus-Semesters
31 pages
Probability and Random Processes 4th Edition Geoffrey R. Grimmett Download PDF
100% (2)
Probability and Random Processes 4th Edition Geoffrey R. Grimmett Download PDF
66 pages
NB 11
No ratings yet
NB 11
22 pages
Chap 16 Markov Analysis
No ratings yet
Chap 16 Markov Analysis
37 pages