
Speech Compression (2)


SPEECH COMPRESSION

When you speak:

Air is pushed from your lungs through your vocal tract, and out of your mouth comes speech.
For voiced sounds, your vocal cords vibrate (open and close). The rate at which the vocal cords vibrate determines the pitch of your voice.

Women and young children tend to have high pitch (fast vibration), while adult males tend to have low pitch (slow vibration).
For fricative and plosive (unvoiced) sounds, your vocal cords do not vibrate but remain open.
The shape of your vocal tract determines the sound that you make.
As you speak, your vocal tract changes its shape, producing different sounds.

The shape of the vocal tract changes relatively slowly (on the scale of 10 msec to 100 msec).

The amount of air coming from your lungs determines the loudness of your voice.
VOWELS

Diphthongs
SEMIVOWELS
A semivowel is a sound intermediate between a vowel and a consonant.
Semivowels are quite difficult to characterize; they have a vowel-like nature.
Examples: /w/, /l/, /r/ and /y/
CONSONANT
A nasal consonant is a type of consonant produced with a lowered velum in the mouth, allowing air to come out through the nose while the air is not allowed to pass through the mouth.
The air flows through the nasal tract, with sound being radiated at the nostrils.
The three nasal consonants are distinguished by the place along the oral tract at which a total constriction is made:
o For /m/ the constriction is at the lips
o For /n/ the constriction is behind the teeth
o For /ŋ/ the constriction is just forward of the velum itself
PCM
• Sample Rate
– Nyquist Criterion
– Bandwidth Limitation
• Sample Size
– Quantization Levels
– Quantization Noise / Distortion
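
As a quick illustration of these PCM parameters, here is a minimal Python sketch (the 8 kHz sample rate, 8-bit uniform quantizer, and 440 Hz test tone are illustrative choices, not taken from the slides) showing how the sample rate and sample size set the bit rate and the quantization noise:

# Minimal PCM sketch (assumed parameters: 8 kHz sampling, 8-bit uniform quantizer).
import numpy as np

fs = 8000          # samples per second (Nyquist: signal band-limited below fs/2 = 4 kHz)
bits = 8           # sample size -> 2**bits quantization levels
levels = 2 ** bits

t = np.arange(fs) / fs                    # one second of a test tone
x = 0.8 * np.sin(2 * np.pi * 440 * t)     # analog-like signal in [-1, 1]

step = 2.0 / levels                       # quantizer step size over the range [-1, 1)
xq = np.clip(np.round(x / step) * step, -1.0, 1.0 - step)   # uniform quantization

noise = x - xq                            # quantization noise / distortion
snr_db = 10 * np.log10(np.mean(x ** 2) / np.mean(noise ** 2))

print("bit rate:", fs * bits, "bits per second")   # 8000 * 8 = 64,000 bps
print("quantization SNR: %.1f dB" % snr_db)        # roughly 6 dB per bit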
Compression
• Lossy / Lossless
• Source Independent / Source Dependent
Voice Coding
(Compression)
• Waveform
• Frequency Domain
• Vocoder
Waveform Coding
• DSI - Digital Speech Interpolation
• µ-Law / A-Law
• Differential PCM
– ADPCM – Adaptive Differential PCM
• Delta Modulation
– CVSD – Continuously Variable Slope Delta modulation
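
A minimal sketch of µ-law companding, one of the waveform-coding techniques listed above; µ = 255 is the usual telephony value, and in a real codec the companded value would then be quantized to 8 bits:

# Minimal mu-law companding sketch (mu = 255).
import numpy as np

MU = 255.0

def mu_law_compress(x):
    # x is assumed to be normalized to [-1, 1]
    return np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)

def mu_law_expand(y):
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(MU)) / MU

x = np.linspace(-1, 1, 9)
y = mu_law_compress(x)          # companded value, uniformly quantized to 8 bits in practice
x_rec = mu_law_expand(y)        # expansion recovers the original (up to quantization)
print(np.allclose(x, x_rec))    # True: compress/expand are inverses before quantization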
Frequency Domain
• SBC – Sub Band Coding
Vocoder
• LPC – Linear Predictive Coding
– CELP – Code Excited Linear Prediction
– VSELP – Vector Sum Excited Linear Prediction
At the transmitter, the speech is divided into segments. Each segment is analyzed to determine an excitation signal and the parameters of the vocal tract filter.

In some of the schemes, a model for the excitation signal is transmitted to the receiver. The excitation signal is then synthesized at the receiver and used to drive the vocal tract filter.

In other schemes, the excitation signal itself is obtained using an analysis-by-synthesis approach. This signal is then used by the vocal tract filter to generate the speech signal.

In the channel vocoder, each segment of input speech is analyzed using a bank of band-pass filters called the analysis filters. The energy at the output of each filter is estimated at fixed intervals and transmitted to the receiver.
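
A minimal sketch of this analysis side of a channel vocoder; the band edges, filter order, and 20 ms frame length are illustrative assumptions, and the filters are ordinary Butterworth band-pass filters from SciPy:

# Channel-vocoder analysis sketch: band-pass filter bank + per-frame energy estimates.
# Band edges, filter order and frame size are illustrative assumptions.
import numpy as np
from scipy.signal import butter, lfilter

fs = 8000
frame = 160                                   # 20 ms frames at 8 kHz
edges = [(200, 600), (600, 1200), (1200, 2000), (2000, 3000)]   # example analysis bands

def band_energies(speech):
    energies = []
    for lo, hi in edges:
        b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        y = lfilter(b, a, speech)
        # energy of each filter output, estimated at fixed intervals (one value per frame)
        n_frames = len(y) // frame
        e = [np.sum(y[i * frame:(i + 1) * frame] ** 2) for i in range(n_frames)]
        energies.append(e)
    return np.array(energies)                 # shape (num_bands, num_frames), sent to the receiver

speech = np.random.randn(8000)                # stand-in for one second of speech
print(band_energies(speech).shape)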

Phonemes

Shannon coding theory – 1st coding theorem


Computer - PCM: 8000 samples per second
Audio / speech: 44.1 kHz (120 bits per sample for singing voice)
Bits per sample: 8
Bit rate = 8000 × 8 = 64,000 bits per second

Speech is divided into 20-30 millisecond frames.

10 LPC coefficients
Bits per coefficient: 16 bits (we can give)
10 × 16 = 160 bits
(20 bits for other parameters)

So, 180 bits per frame in total.
Number of frames in a second = 50
Total number of bits for one second = bit rate of 1 second of speech = 50 × 180 = 9000 bits
Shannon coding
Divide into phonemes – about 10 phonemes per second
128 phonemes, so we need only 7 bits each
So 1 second = 10 × 7 = 70 bits is enough
This is called the entropy of speech for English
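
The three bit rates worked out above can be checked with a few lines of arithmetic:

# Bit-rate arithmetic from the slides.
pcm_rate = 8000 * 8                       # 64,000 bits per second

lpc_frame_bits = 10 * 16 + 20             # 10 coefficients x 16 bits + 20 bits of other parameters = 180
lpc_rate = 50 * lpc_frame_bits            # 50 frames per second -> 9,000 bits per second

phoneme_rate = 10 * 7                     # ~10 phonemes/s, 128 phonemes -> 7 bits each = 70 bits per second

print(pcm_rate, lpc_rate, phoneme_rate)   # 64000 9000 70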
What is a phoneme, with examples?
A phoneme, in linguistics, is the smallest unit of speech distinguishing one word (or word element) from another, as the element p in "tap," which separates that word from "tab," "tag," and "tan." A phoneme may have more than one variant, called an allophone (q.v.), which functions as a single sound; for example, the p's of ...
Channel Vocoders

Formants are distinctive frequency components of the acoustic signal produced by speech, musical instruments or singing. The information that humans require to distinguish between speech sounds can be represented purely quantitatively by specifying peaks in the amplitude or frequency spectrum.
Linear Predictive Coder
Instead of the vocal tract being modeled by a bank of filters, in the linear predictive coder the vocal tract is modeled as a single linear filter whose output y_n is related to the input ε_n by

y_n = Σ_{i=1}^{M} a_i y_{n-i} + G ε_n

where G is called the gain of the filter. As in the case of the channel vocoder, the input to the vocal tract filter is either the output of a random noise generator or a periodic pulse generator.
At the transmitter, a segment of speech is analyzed. The parameters obtained
include a decision as to whether the segment of speech is voiced or unvoiced,
the pitch period if the segment is declared voiced, and the parameters of the
vocal tract filter.

The input speech is generally sampled at 8000 samples per second. In the
LPC-10 standard, the speech is broken into 180 sample segments,
corresponding to 22.5 milliseconds of speech per segment.
The Voiced/Unvoiced Decision

The samples of the voiced speech have larger amplitude; that is, there is more energy in the voiced speech. The unvoiced speech contains higher frequencies.
Both speech segments have average values close to zero; because of its higher-frequency content, the unvoiced speech waveform crosses the x = 0 line more often than the voiced speech sample.

The decision as to whether the speech is voiced or unvoiced is based on the energy in the segment relative to the background noise and the number of zero crossings within a specified window.

In the LPC-10 algorithm, the speech segment is first low-pass filtered using a filter with a
bandwidth of 1 kHz. The energy at the output relative to the background noise is used to obtain
a tentative decision about whether the signal in the segment should be declared voiced or
unvoiced.
The estimate of the background noise is basically the energy in the unvoiced speech segments.
This tentative decision is further refined by counting the number of zero crossings and
checking the magnitude of the coefficients of the vocal tract filter.
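
A minimal sketch of such an energy-plus-zero-crossing decision; the two thresholds and the test segments are illustrative stand-ins, not the tuned LPC-10 values:

# Voiced/unvoiced decision sketch: segment energy vs. background noise, plus zero-crossing count.
# The two thresholds below are illustrative; LPC-10 uses its own tuned values.
import numpy as np

def voiced_unvoiced(segment, noise_energy, energy_ratio_thresh=4.0, zc_thresh=60):
    energy = np.mean(segment ** 2)
    zero_crossings = np.sum(np.abs(np.diff(np.sign(segment))) > 0)
    if energy > energy_ratio_thresh * noise_energy and zero_crossings < zc_thresh:
        return "voiced"      # high energy, few zero crossings
    return "unvoiced"        # low energy and/or many zero crossings

fs = 8000
t = np.arange(180) / fs                             # one 180-sample (22.5 ms) LPC-10 segment
voiced_like = np.sin(2 * np.pi * 120 * t)           # low-frequency, high-energy segment
unvoiced_like = 0.1 * np.random.randn(180)          # noise-like, low-energy segment
print(voiced_unvoiced(voiced_like, noise_energy=0.01))    # voiced
print(voiced_unvoiced(unvoiced_like, noise_energy=0.01))  # unvoiced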
Estimating the Pitch Period

The autocorrelation of a periodic function Rxx(k) will have a maximum when k is equal to the pitch period. Coupled with the fact that the estimation of the autocorrelation function generally leads to a smoothing out of the noise, this makes the autocorrelation function a useful tool for obtaining the pitch period.
Voiced speech is not exactly periodic
When there is uncertainty about the magnitude of the maximum value, it is
difficult to select a value for the threshold. Another problem occurs because of
the interference due to other resonances in the vocal tract.
The LPC-10 algorithm uses the average magnitude difference function (AMDF), which can be used to identify the pitch period as well as the voicing condition.
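
A minimal sketch of AMDF-based pitch estimation; unlike the autocorrelation, the AMDF dips at lags equal to the pitch period, so the search looks for a minimum. The lag search range (2.5-16 ms at 8 kHz) and the synthetic test signal are illustrative assumptions:

# AMDF pitch-period sketch: the AMDF has a deep minimum at lags equal to the pitch period.
import numpy as np

def amdf_pitch(segment, min_lag=20, max_lag=130):
    n = len(segment)
    amdf = []
    for lag in range(min_lag, max_lag + 1):
        d = np.mean(np.abs(segment[lag:] - segment[:n - lag]))   # average magnitude difference at this lag
        amdf.append(d)
    return min_lag + int(np.argmin(amdf))                        # lag with the smallest average difference

fs = 8000
t = np.arange(360) / fs
pitch_period = 80                                   # 80 samples -> 100 Hz fundamental
speech = np.sin(2 * np.pi * (fs / pitch_period) * t)
print(amdf_pitch(speech))                           # close to 80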
Obtaining the Vocal Tract Filter
In linear predictive coding, the vocal tract is modeled by a linear filter with the input-output relationship shown in the equation above.

If y_n are the speech samples in that particular segment, then we want to choose the a_i to minimize the average value of e_n^2, where

e_n = y_n - Σ_{i=1}^{M} a_i y_{n-i}

One way of solving for the coefficients is the autocorrelation approach.
In order to compute the filter
coefficients of an Mth-order filter, the
Levinson-Durbin algorithm requires the
computation of all filters of order less
than M. Furthermore, during the
computation of the filter coefficients,
the algorithm generates a set of
constants k known as the reflection
coefficients, or partial correlation
(PARCOR) coefficients.
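
A minimal sketch of the autocorrelation approach with the Levinson-Durbin recursion; the 10th-order filter and the random test segment are illustrative, and the recursion also exposes the reflection (PARCOR) coefficients:

# Autocorrelation approach + Levinson-Durbin recursion for the LPC coefficients a_i.
# The recursion builds up all lower-order filters and yields the reflection (PARCOR) coefficients k.
import numpy as np

def lpc_levinson_durbin(segment, order):
    # autocorrelation values R(0) .. R(order)
    r = np.array([np.dot(segment[:len(segment) - k], segment[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    reflection = []
    for m in range(1, order + 1):
        k = -(r[m] + np.dot(a[1:m], r[m - 1:0:-1])) / err   # reflection (PARCOR) coefficient
        reflection.append(k)
        a[1:m] = a[1:m] + k * a[m - 1:0:-1]                 # update the lower-order coefficients
        a[m] = k
        err *= (1.0 - k * k)                                # prediction error shrinks at each order
    # sign convention matching the slides: y_n = sum a_i y_{n-i} + G eps_n
    return -a[1:], reflection, err

segment = np.random.randn(180)                              # stand-in for one 22.5 ms LPC-10 segment
coeffs, parcor, pred_err = lpc_levinson_durbin(segment, order=10)
print(coeffs.shape, len(parcor))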
Transmitting the Parameters

The parameters that need to be transmitted include the voicing decision, the pitch period, and the vocal tract filter parameters.
CELP (G.728)

A codebook of excitation patterns is constructed. Each entry in this codebook is an excitation sequence that consists of a few nonzero values separated by zeros.

Given a segment from the speech sequence to be encoded, the encoder obtains the vocal tract filter using the LPC analysis described previously. The encoder then excites the vocal tract filter with the entries of the codebook. The difference between the original speech segment and the synthesized speech is fed to a perceptual weighting filter, which weights the error using a perceptual weighting criterion. The codebook entry that generates the minimum average weighted error is declared to be the best match. The index of the best-match entry is sent to the receiver along with the parameters for the vocal tract filter.
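
A minimal sketch of this analysis-by-synthesis codebook search; the codebook, the vocal tract filter coefficients, and the plain squared error (standing in for the perceptual weighting filter) are illustrative simplifications, and the excitation gain used in real CELP coders is omitted:

# Analysis-by-synthesis codebook search sketch.
# Codebook, LPC coefficients, and the plain squared-error criterion are illustrative stand-ins.
import numpy as np
from scipy.signal import lfilter

frame = 40
rng = np.random.default_rng(0)

# Sparse excitation codebook: a few nonzero values separated by zeros.
codebook = np.zeros((64, frame))
for i in range(64):
    pos = rng.choice(frame, size=4, replace=False)
    codebook[i, pos] = rng.choice([-1.0, 1.0], size=4)

lpc = np.array([1.0, -0.9])                   # synthesis filter 1/A(z), illustrative coefficients
target = rng.standard_normal(frame)           # stand-in for the original speech segment

def search(target, codebook, lpc):
    best_index, best_err = 0, np.inf
    for i, excitation in enumerate(codebook):
        synthesized = lfilter([1.0], lpc, excitation)   # excite the vocal tract filter
        err = np.mean((target - synthesized) ** 2)      # error (perceptual weighting omitted)
        if err < best_err:
            best_index, best_err = i, err
    return best_index           # index sent to the receiver with the vocal tract filter parameters

print(search(target, codebook, lpc))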
Silence Compression
Silence compression provides a way to squeeze redundancy
out of sound files. The silence compression scheme is
essential for efficient voice communication systems. It allows
significant reduction of transmission bandwidth during a period
of silence.
Voice Message Silence Compression
1. Determine a threshold value that can be considered silence, even though it is not pure silence.
2. Extract the data from the sound file to be compressed and pass it through a threshold check; if a sample is below the threshold (considered to be silence), make it pure silence.
3. Apply run-length coding to the manipulated data (sketched below).
4. Store it as a compressed file.
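
A minimal sketch of these four steps; the threshold value and the (silence code, count) byte layout are illustrative assumptions, and a real coder would also need an escape for literal samples equal to the silence code:

# Silence-compression sketch: threshold check -> pure silence -> run-length coding of the silence.
SILENCE_CODE = 0xFF
THRESHOLD = 4          # 8-bit samples this close to the midpoint (128) count as silence

def compress(samples):            # samples: list of 8-bit unsigned values (0..255, midpoint 128)
    out = []
    run = 0
    for s in samples:
        if abs(s - 128) <= THRESHOLD:          # steps 1-2: below threshold -> treat as pure silence
            run += 1
            if run == 255:                     # a single count byte can only hold 255
                out += [SILENCE_CODE, 255]
                run = 0
        else:
            if run:                            # step 3: emit (silence code, run length)
                out += [SILENCE_CODE, run]
                run = 0
            out.append(s)
    if run:
        out += [SILENCE_CODE, run]
    return out                                 # step 4: store this as the compressed file

data = [128, 129, 127, 128, 200, 60, 128, 128, 128, 131]
print(compress(data))                          # [255, 4, 200, 60, 255, 4]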
What are the parameters that are used in silence compression?

- Silence compression is used in compressing sound files.

- It is equivalent to run-length coding on normal data files.

- The parameters are:

1. A threshold value: the level below which samples can be considered silence.
2. A silence code followed by a single byte, which indicates how many consecutive silence samples are present.
3. A threshold that specifies the start of a run of silence.
