
Methods to Study Speech Synthesis


METHODS OF SPEECH SYNTHESIS

Faculty: Dr. Ajith Kumar U
Presenter: Aman Kumar


INTRODUCTION
• The most important use of synthetic speech is to check our analyses of speech.
• The formant pattern is important to the production and comprehension of a particular speech sound.
• The perception of vowels and consonants depends on their distinctive acoustic cues.
• Vowels are periodic, high in energy, and long in duration.
• Consonants are complex and require multiple cues.
• Speech perception varies both within and across subjects (intra- and inter-subject variability).
• To understand these variables, speech needs to be synthesized by artificial means, which lets the researcher manipulate the specific acoustic cues of interest.
Speech synthesis
• Synthetic speech is speech produced by artificial means, especially speech generated by computers or computer-controlled devices.
• Speech synthesizers appear throughout daily life, e.g. railway station announcements, toys, customer care systems, and AAC devices.
• Synthesis is useful for understanding how speech is perceived.
• Modern speech synthesis can control and manipulate any feature of speech that is important for speech production.
• However, not all features of natural speech can be manipulated individually.
Methods used for studying speech perception are:

 Pattern Playback

 Articulatory synthesis

 Parametric Synthesis

 Analysis by synthesis
Pattern Playback
• The first real speech synthesizer appeared in 1951 at Haskins
Laboratories: the PATTERN PLAYBACK.
• It is essentially a sonagraph machine working in reverse.
• "The sonagraph transforms recorded speech into a 'three-dimensional' plot,
the first two dimensions being time and frequency, the third being
intensity, represented on a grey scale. Conversely, by drawing schematic
evolutions of formant frequencies on a glass plate, and by scanning this
spectrogram along the time axis (using a set of frequency-modulated light
beams and a light collector that is fed into a loudspeaker), one can
actually hear the sound corresponding to the spectrogram." Stella (1985).
• The input is thus a hand-drawn spectrogram instead of a printed one.
ARTICULATORY SYNTHESIS

Articulatory synthesis is also a parametric approach, one which attempts to model the
physical properties of the human vocal tract.
The goals of articulatory synthesis:
• Naturalness of the model.
• Accuracy of the model in comparison with the speaker(s) on which it is based.
• Intelligibility of the model.
• Understanding and information gained from the model.
 In this method of synthesizing speech, the speech articulators (e.g. jaw, tongue,
lips) are controlled.
 The natural speech production process is modeled as accurately as possible in
articulatory synthesis.
 This is done by creating an artificial or synthetic replica of human physiology and
making it produce speech.
• The vocal tract geometry is described in one, two, or three dimensions,
depending on the articulatory synthesizer.
• In a 1-D model, the vocal tract is represented directly by the area function.
• The area function describes the variation of the cross-sectional area of the vocal
tract tube between the glottis and the mouth opening.
• The advantage of the two- and three-dimensional models is that the position
and form of the articulators can be described in a very direct and specific manner.
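As an illustration of how a 1-D area function can be used, the sketch below computes the reflection coefficient at each junction between adjacent tube sections from the change in cross-sectional area (the Kelly-Lochbaum relation). The area values are invented for illustration, not taken from any measured vocal tract.

```python
# A minimal sketch of a 1-D area-function model: the vocal tract is a
# chain of uniform tube sections, and the reflection coefficient at each
# junction follows from the change in cross-sectional area
# (the Kelly-Lochbaum relation). The section areas are illustrative.

def reflection_coefficients(areas):
    """Reflection coefficient at each junction of adjacent tube sections.

    areas: cross-sectional areas (cm^2) listed from glottis to lips.
    """
    return [(a_next - a) / (a_next + a)
            for a, a_next in zip(areas, areas[1:])]

# Illustrative area function for a schwa-like, nearly uniform tract
areas = [2.0, 2.2, 2.1, 2.3, 2.0, 2.4, 2.5]
ks = reflection_coefficients(areas)
print(ks)  # small values: little reflection in a near-uniform tube
```

A near-uniform tube gives reflection coefficients close to zero; a strong constriction (a large jump in area) would produce a coefficient near ±1.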
ARTICULATORY MODEL (Coker & Fujimura 1966)

Seven parameters are used


 Position of tongue body (X,Y)
 Lip protrusion (L)
 Lip rounding (W)
 Place and degree of tongue tip constriction (R,B)
 Degree of velar coupling (N)
In summary, an articulatory synthesizer comprises at least the following three
parts:
1. A mechanism to control the parameters during an utterance
2. A geometric description of the vocal tract based on a set of articulatory parameters
3. A model for the acoustic simulation
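The three parts above can be sketched as a minimal pipeline. Everything here is a hypothetical placeholder: the parameter names echo the Coker and Fujimura list, but the mappings are toy stand-ins, not their model.

```python
# A hedged sketch of the three components of an articulatory synthesizer:
# (1) a control model yielding articulatory parameters over time,
# (2) a geometry stage mapping them to an area function, and
# (3) an acoustic simulation stage. All values and mappings are
# illustrative placeholders, not the Coker & Fujimura (1966) model.

def control_model(t):
    """1. Parameter control: articulatory parameters at time t (illustrative)."""
    return {"X": 0.5, "Y": 0.3, "L": 0.1, "W": 0.2, "R": 0.4, "B": 0.1, "N": 0.0}

def geometry(params, n_sections=8):
    """2. Geometric description: map parameters to a crude area function."""
    base = 2.0 + params["X"] - params["Y"]   # toy mapping, for illustration only
    return [base] * n_sections

def acoustic_simulation(areas):
    """3. Acoustic model: a bare stand-in that just reports the tube count."""
    return len(areas)

areas = geometry(control_model(0.0))
print(acoustic_simulation(areas))  # 8 tube sections
```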
Advantages:

• Coarticulation patterns are reflected

• Connected speech can be produced

• Nasal and oral sounds are represented


Disadvantages:

• Complex instrumentation and computation

• Practically impossible to create sounds with 100% accuracy.

• Difficult to use by those who are not well trained.


Applications
• Designed for studying the linguistically and perceptually significant
aspects of articulatory events.
• Speech sounds for use in perceptual tests can be generated through
controlled variations in timing or position parameters,
e.g. /banana/, /bandana/, /badnana/, /baddata/.
• One purpose of this device is to study vocal-tract anatomy and dynamic
behavior.
• Investigation of detailed relationships between velar control and the
perceptual oral–nasal distinction.
Parametric synthesis
• It’s a rule based way of synthesizing speech.
• Parametric synthesis makes the use of either acoustic information (time-domain and
frequency domain) or articulatory information.
• Parametric synthesis that depends upon the acoustic information is known as signal
based (bottom-up) synthesis as it specifies acoustic properties of speech such as
formants, duration of segments and type of noise for fricatives.
• The other name for signal based synthesis is terminal analog because it attempts
to produce an analog of the terminal (acoustic) level of speech and pay little or
no attention to articulatory aspects of speech.
• Articulatory synthesis is another parametric approach which attempts to model
the physical properties of the human vocal tract. (Top- down approach).
Three types of parametric synthesis:
 Linear predictive coding
 Formant synthesis
 Synthesis by rule
 Linear predictive coding (LPC)
• It is a class of methods used to obtain a spectrum.
• LPC comes from two sources:
1. A branch of statistics
2. A branch of engineering
LPC builds on the facts that:
• Any sample in digitized speech is partly predictable from its immediate predecessors.

• Speech does not vary wildly from sample to sample.

• Hence LPC is just the hypothesis that any sample is a linear function of those that
precede it.
• LPC parameterizes the speech signal; that is, it analyses the complex, constantly
changing speech signal into a few values called "parameters", which change relatively slowly.

• The parameters which represent the signal are the frequencies & bandwidths of a set of
filters which would produce that signal, given a certain excitation.

• Since the speech signal is represented as a set of parameters, one can edit these parameters.
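The core LPC hypothesis, that each sample is a linear function of its predecessors, can be sketched directly. Here the predictor coefficients are fitted by ordinary least squares on a synthetic damped sinusoid (a single resonance, which an order-2 predictor captures exactly); real LPC analysis typically uses the autocorrelation method instead.

```python
import numpy as np

# A minimal sketch of the LPC idea: each sample is modeled as a linear
# combination of its p predecessors, with the coefficients found by least
# squares. The test signal is a synthetic damped sinusoid, i.e. a single
# resonance, for which order p = 2 is exact.

def lpc(signal, p):
    """Fit p predictor coefficients a so that s[n] ~ sum_k a[k] * s[n-1-k]."""
    X = np.column_stack(
        [signal[p - k - 1: len(signal) - k - 1] for k in range(p)]
    )
    y = signal[p:]
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    return a

n = np.arange(200)
s = 0.99 ** n * np.sin(2 * np.pi * 0.1 * n)   # one damped resonance
a = lpc(s, p=2)

# Predict each sample from its two predecessors and measure the error
pred = a[0] * s[1:-1] + a[1] * s[:-2]
err = np.max(np.abs(s[2:] - pred))
print(err)  # near zero: two coefficients capture one resonance exactly
```

The fitted coefficients encode the resonance itself (its frequency and damping), which is why LPC parameters change slowly even though the waveform oscillates rapidly.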
Merits:
• LPC represents speech as a set of parameters, hence can be edited easily.
• Variations in frequency parameters are easy to make.
• Extremely accurate estimation of Speech Parameters
• High speed of Computation
• Robust, reliable & accurate method

Demerits:
• LPC models only "resonances", and therefore has difficulty describing anti-
resonances, i.e. nasals.
 Formant synthesis
• Recreates the changing formants of speech, each specified by different
parameters that are updated every 5 ms during an utterance.

• Received a big boost in 1980 with Dennis Klatt’s publication of an elaborate synthesizer
model, complete with a computer program which synthesized speech on a laboratory
computer.
Klatt’s Model
• The basis for this model is the source-filter theory.
• There are 2 sound sources:
1.Voicing
2.Friction

These drive 2 resonating systems


1.Cascade resonator for vowels
2.Parallel resonator for fricatives
Cascade system
• In the cascade resonator, the output of the first formant resonator becomes the input to
the second formant resonator.

• The voicing source generates a train of impulses like those produced by the vocal folds.
The filters RGP, RGZ, and RGS smooth this simulated glottal waveform and shape its
spectrum.
• AV controls the amplitude of voicing.
• The source then enters the resonating system, in which RNP and RNZ represent the nasal
pole and nasal zero.
• R1 to R5 represent formants 1 through 5.
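A minimal sketch of the cascade idea: each formant is a second-order digital resonator, and the output of one resonator feeds the next, as with R1 to R5 above. The coefficient formulas follow the standard Klatt (1980) resonator; the formant frequencies, bandwidths, and sample rate are illustrative values, and the source here is a bare impulse train rather than a shaped glottal waveform.

```python
import math

# A small sketch of cascade formant synthesis: each formant is a digital
# second-order resonator, and the output of one feeds the next. The
# coefficient formulas follow the standard Klatt (1980) resonator;
# formant values are illustrative vowel-like settings.

FS = 10000  # sample rate (Hz), illustrative

def resonator(signal, f, bw):
    """Second-order resonator at centre frequency f (Hz) with bandwidth bw (Hz)."""
    c = -math.exp(-2 * math.pi * bw / FS)
    b = 2 * math.exp(-math.pi * bw / FS) * math.cos(2 * math.pi * f / FS)
    a = 1 - b - c
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y1, y2 = y, y1
    return out

# Impulse train as a crude 100 Hz voicing source (no glottal shaping)
src = [1.0 if n % (FS // 100) == 0 else 0.0 for n in range(2000)]

# Cascade: the output of each formant resonator is the input to the next
y = src
for f, bw in [(700, 60), (1200, 90), (2600, 120)]:  # /a/-like formants
    y = resonator(y, f, bw)

print(len(y))  # a vowel-like waveform, one sample per input sample
```

Because the resonators are in series, the relative formant amplitudes emerge from the filtering itself, which is the property credited to cascade synthesis in the merits below.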
Parallel system
• It models the production of fricatives, in which the noise source is higher in the
vocal tract, usually in the oral cavity, and only the part of the vocal tract in front
of the source serves as the resonator.
• MOD provides for mixing the noise and voicing source for voiced fricatives.
• LPF is a low pass filter which shapes the source spectrum.
• AH and AF control the amplitude of aspiration and friction respectively.

• Aspiration noise goes to the cascade resonator because it is generated at the larynx
and, like voicing, uses the entire vocal tract as a resonator.

• The noise source for fricatives goes through the parallel resonators, each with its own
amplitude control.

• The boxes labeled "first diff" are high-pass filters; the one at the output simulates
the emphasis given to higher frequencies as sound radiates from the lips.
Merits:-
• Cascade synthesis yields the correct relative amplitudes of the formant peaks for
vowels without any individual amplitude controls for each formant (Fant, 1956).

• It has been useful in comparative research, such as studying the effect of changing
one or more parameters within a relatively small number of syllables.
Demerits
• Difficulty in representing abrupt transitions.
• Setting parameters every 5 ms is not a practical way of meeting commercial
needs for speech synthesis.
• Lack of variation in F0 is one of the sources of the unnaturalness of the speech.
 Synthesis by rule
A few rules of thumb, well-known bits of the phonology of English and other
languages, are:

 F0 declines slowly over an utterance.

 F0 declines rapidly at the end of a declarative sentence.

 Vowels are lengthened before voiced consonants.


• If we can quantify those rules, we can automate much of the parameter setting in
synthesis.

• In this system, the user types in the sequence of phonemes in an utterance.

• Synthesizer would then start with a table of default values for each phoneme

• It would automatically tailor each of those values according to the context of each
phoneme.
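A toy sketch of this scheme: start from a table of default values per phoneme, then let rules tailor them to context, here the vowel-lengthening and F0-declination rules mentioned above. The phoneme table and rule constants are invented for illustration and come from no real system.

```python
# A toy sketch of synthesis by rule: begin with default values per
# phoneme, then apply context rules such as "lengthen a vowel before a
# voiced consonant" and "let F0 decline over the utterance". The table
# and rule values are illustrative, not from any real synthesizer.

DEFAULTS = {
    "b":  {"dur_ms": 60,  "voiced": True,  "vowel": False},
    "ae": {"dur_ms": 120, "voiced": True,  "vowel": True},
    "d":  {"dur_ms": 60,  "voiced": True,  "vowel": False},
    "t":  {"dur_ms": 60,  "voiced": False, "vowel": False},
}

def apply_rules(phonemes, f0_start=120.0, f0_slope=-5.0):
    out = []
    for i, p in enumerate(phonemes):
        seg = dict(DEFAULTS[p])
        nxt = DEFAULTS.get(phonemes[i + 1]) if i + 1 < len(phonemes) else None
        # Rule: vowels are lengthened before voiced consonants
        if seg["vowel"] and nxt and nxt["voiced"] and not nxt["vowel"]:
            seg["dur_ms"] = int(seg["dur_ms"] * 1.3)
        # Rule: F0 declines slowly over the utterance
        seg["f0"] = f0_start + f0_slope * i
        out.append((p, seg))
    return out

print(apply_rules(["b", "ae", "d"]))  # "ae" lengthened to 156 ms
print(apply_rules(["b", "ae", "t"]))  # "ae" stays at its default 120 ms
```

The same default table produces different realizations in different contexts, which is the essence of automating the parameter setting.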
For reading ordinary text, a reading machine is required.

• An example of such a synthesizer is DECtalk™. It takes ordinary spelling from a keyboard or
scanner and produces highly intelligible and reasonably natural speech as output.
Merits:
• Logan, Greene, and Pisoni (1989) studied the intelligibility of 10 synthesis-by-rule
systems and concluded that DECtalk yielded the lowest error rate and was
closest to natural speech.
• The procedure is easy.
Demerits:
• The frequency range of the system is limited to 5 kHz .
• Produces inappropriate intonation.
• Fewer variations in amplitude.
• Abrupt transitions are difficult to represent.
• Certain sounds are more aspirated than in normal speech;
e.g. the /p/ in "speech" is aspirated in the same way as the /p/ in "peach".
ANALYSIS BY SYNTHESIS
The heart of an analysis by synthesis system is a signal generator capable of
synthesizing all and only the signals to be analyzed.

 Signals generated are compared with the signals to be analyzed and a measure of
error is calculated.

 Different signals are generated until the one with the smallest error value is found.
• A true analysis by synthesis coder should synthesize all possible output speech signals
and identify the combination that gives the minimum error.
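The search loop described above can be sketched with a deliberately tiny "synthesizer": candidate signals are generated over a parameter grid, each is compared with the target, and the parameters giving the smallest error win. A plain sinusoid generator stands in here for a full speech synthesizer.

```python
import math

# A minimal sketch of analysis by synthesis: generate candidate signals
# over a parameter grid, compare each with the target signal, and keep
# the parameter value with the smallest error. The "synthesizer" is a
# bare sinusoid generator standing in for a real speech synthesizer.

def synthesize(freq, n=200, fs=8000):
    """Generate n samples of a sinusoid at freq Hz (the toy synthesizer)."""
    return [math.sin(2 * math.pi * freq * t / fs) for t in range(n)]

def error(a, b):
    """Sum of squared differences between two signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

target = synthesize(440.0)  # the signal to be analyzed

# Search candidate frequencies for the smallest synthesis error
best = min(range(100, 1000, 20), key=lambda f: error(synthesize(f), target))
print(best)  # 440
```

A true analysis-by-synthesis coder does exactly this, but over all admissible synthesizer parameter combinations rather than a single frequency axis.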
Advantages:
•Comparatively easy to use
•Coarticulation patterns are better represented
•Quality of speech is comparatively good.

Disadvantages:
•Abrupt transitions are difficult to represent
•Frequency variation is difficult to represent
REFERENCES:

• Kent, R. D., & Read, C. (1995). The acoustic analysis of speech.
• Klatt, D. (1980). Software for a cascade/parallel formant synthesizer. Journal of the
Acoustical Society of America, 67, 971–995.
• Harrington, J., & Cassidy, S. (1999). Techniques in speech acoustics. Netherlands: Kluwer
Academic Publishers.
• Flanagan, J. L. (1972). Speech analysis, synthesis and perception.
• Flanagan, J. L., & Rabiner, L. R. (1973). Speech synthesis.
• Benesty, J., Sondhi, M. M., & Huang, Y. (2006). Handbook of speech processing.
