
Speech File

The document discusses three experiments related to speech signal processing. Experiment 1 analyzes the time and frequency domain plots of speech sampled at different frequencies, validating that speech is bandlimited to 4 kHz. Experiment 2 examines the effect of varying bit resolution on speech quality. Experiment 3 studies the significance of using an anti-aliasing filter before resampling speech and analyzes short-term speech parameters in the time domain.

Uploaded by

Vineet Kumar
Copyright
© All Rights Reserved

EXPERIMENT # 1(a)

AIM : To compare the time-domain and frequency-domain plots of a speech signal sampled at
44.1 kHz, 16 kHz and 8 kHz respectively, and thus validate the view that a speech signal can
be bandlimited to 4 kHz.

THEORY

Speech signal processing on a digital machine requires sampling and storing the analog
speech signal produced at the output of a microphone. The sampling frequency is the
parameter that controls the sampling process, while the number of bits per sample controls
the bit resolution. The number of samples per second is more commonly termed the sampling
frequency fs. According to the sampling theorem, fs should be greater than or equal to 2 fm,
where fm is the highest frequency component in the signal. The standard sampling frequency
for the entire audio range is 44.1 kHz, since 20 kHz is the maximum audible frequency
component and some guard band is allowed.
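The fs >= 2 fm requirement can be checked numerically. The following is a small illustrative
Python sketch (separate from the MATLAB code used in these experiments) showing that a tone
above the Nyquist frequency fs/2 folds back into the band:

```python
import math

fs = 8000                       # sampling frequency (Hz)
N = 64                          # number of samples

# Tone below the Nyquist frequency fs/2 = 4 kHz: uniquely represented.
x_3k = [math.sin(2 * math.pi * 3000 * n / fs) for n in range(N)]
# Tone above Nyquist: after sampling it folds about fs/2.
x_5k = [math.sin(2 * math.pi * 5000 * n / fs) for n in range(N)]

# 5 kHz folds to 8 - 5 = 3 kHz (with a sign flip for a sine),
# so the two sample sequences are indistinguishable up to sign.
assert all(abs(a + b) < 1e-9 for a, b in zip(x_5k, x_3k))
print("5 kHz sampled at 8 kHz aliases onto 3 kHz")
```

Once sampled, nothing distinguishes the 5 kHz tone from a 3 kHz one, which is exactly why
frequencies above fs/2 must be removed or are lost.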

When telephone communication started, with bandwidth being a precious resource, the speech
signal was passed through an anti-aliasing low pass filter with cutoff frequency of 3.3 kHz
and sampled at 8 kHz sampling frequency. Thus, the speech signal collected over telephone
networks will have a message bandwidth of about 4 kHz.

Information beyond 4 kHz is eliminated in telephone-bandwidth speech. Even though this
sampling frequency is adequate for sounds like /a/ and /aa/, it severely affects fricative
sounds like /s/ and /sh/. However, information up to 4 kHz seems to be sufficient for
intelligible speech. If the sampling frequency is decreased below 8 kHz, the intelligibility
of speech degrades significantly; hence 8 kHz was chosen as the sampling frequency for
telephone communication. By listening to the three files, namely the file sampled at 44.1
kHz, the file sampled at 16 kHz and the file sampled at 8 kHz, one can observe a difference
in naturalness between the speech sampled at 8 kHz and that at the higher sampling
frequencies. Thus, wherever possible, speech sampled at 16 kHz should be used for signal
processing.

Speech sampled at 8 kHz is termed narrowband speech, and speech sampled at 16 kHz is termed
wideband speech. For practical speech signal processing, if the speech signal comes from a
telephone or mobile channel, we are left with little option but to process it at an 8 kHz
sampling frequency. Recently, methods are being developed to convert narrowband speech to
wideband speech. Alternatively, if the speech signal is available from a wideband channel,
a 16 kHz sampling frequency is suggested. In short, 16 kHz is the recommended sampling
frequency for speech.

EXPERIMENTAL SETUP :
The experiment was conducted with different sampling frequencies. The speech data
"Should we chase" is used in wav format. The MATLAB function audioread was used to read the
sound file, the Fourier transform was computed with fft, and the resample function was used
to resample the speech data at the different frequencies.
CODE :

close all
clear
clc

[x,fs] = audioread('swchase1.wav');
audioinfo('swchase1.wav')      % original recording: fs = 8 kHz
x1 = resample(x,441,80);       % upsample to 44.1 kHz
x2 = resample(x,2,1);          % upsample to 16 kHz
X  = abs(fft(x));
X1 = abs(fft(x1));
X2 = abs(fft(x2));
% frequency axes in Hz so the x-axis label is meaningful
f  = (0:length(X)-1)*fs/length(X);
f1 = (0:length(X1)-1)*44100/length(X1);
f2 = (0:length(X2)-1)*16000/length(X2);
figure
subplot(3,1,1)
plot(f,X/max(X))
title('speech at fs = 8 kHz')
xlabel('Frequency (Hz)')
subplot(3,1,2)
plot(f1,X1/max(X1))
title('speech at fs = 44.1 kHz')
xlabel('Frequency (Hz)')
subplot(3,1,3)
plot(f2,X2/max(X2))
title('speech at fs = 16 kHz')
xlabel('Frequency (Hz)')

OBSERVATION :
CONCLUSION :
From the frequency-domain plots at all three sampling frequencies, we can conclude that the
speech signal is bandlimited to about 4 kHz.
EXPERIMENT # 1(b)

AIM : To study the significance of bit resolution for the sentence "Should we chase" at
fs = 8 kHz, with R = 16 bits/sample and R = 8 bits/sample.

THEORY

After the sampling frequency, the next important parameter in the digitization of speech is
the bit resolution: the number of bits used for storing each sample. The number of bits per
sample in turn depends on the number of quantization levels used during analog-to-digital
conversion. The more quantization levels, the finer the quantization step, and hence the
better the information preserved in the digitized signal; however, more bits per sample are
then required. There is thus a tradeoff between the number of bits and the fidelity of the
representation. The effect of bit resolution can be analyzed experimentally; for this
experiment, a sampling frequency of 8 kHz is used.
Most speech signal processing applications use 16 bits/sample as the bit resolution. The
number of quantization levels is then 2^16 = 65536, which is found to be adequate for
preserving the information present in the analog speech signal. The next lower power-of-two
bit length is 8 bits, giving 2^8 = 256 quantization levels. Since this is significantly
fewer levels than in the 16-bit case, the quantized signal represents the information more
poorly.
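The effect of the quantization step can be sketched numerically. The following illustrative
Python snippet (the function name quantize and the test tone are our own, not part of the
experiment) quantizes a tone with 2^8 and 2^16 levels and compares the worst-case error:

```python
import math

def quantize(x, bits):
    """Uniform mid-rise quantization of a sample x in [-1, 1)."""
    levels = 2 ** bits
    step = 2.0 / levels              # quantization step over the [-1, 1) range
    return (math.floor(x / step) + 0.5) * step

# A short synthetic frame standing in for speech: a 200 Hz tone at 8 kHz.
frame = [0.8 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(80)]

for bits in (8, 16):
    err = max(abs(s - quantize(s, bits)) for s in frame)
    print(bits, "bits:", 2 ** bits, "levels, max error", err)

# Worst-case error is half a step, so 8 extra bits shrink it by a factor of 256.
assert max(abs(s - quantize(s, 16)) for s in frame) < \
       max(abs(s - quantize(s, 8)) for s in frame)
```

The shrinking worst-case error is the numerical counterpart of the audible quality
difference between the 16-bit and 8-bit files.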

EXPERIMENTAL SETUP :
The experiment was conducted with different bit resolutions. The speech data "Should we
chase" is used in wav format. The MATLAB function audioread was used to read the sound file,
and audiowrite was used to write the speech data at the different bit resolutions.

CODE :

close all
clear
clc

[x,fs] = audioread('swchase1.wav');
audioinfo('swchase1.wav')
audiowrite('swchase2.wav',x,fs,'BitsPerSample',8)
audioinfo('swchase2.wav')

RESULT :
Sound quality is slightly degraded by reducing the bits per sample. The file size is halved,
from 36.4 KB at 16 bits/sample to 18.2 KB at 8 bits/sample.
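The halving of the file size follows directly from the PCM storage arithmetic. A quick
Python sanity check (the sample count n is inferred from the reported 36.4 KB file and is
approximate):

```python
def pcm_data_bytes(n_samples, bits_per_sample, channels=1):
    """Raw PCM payload size in bytes, ignoring the small WAV header."""
    return n_samples * channels * bits_per_sample // 8

# Sample count implied by the reported 36.4 KB file at 16 bits/sample.
n = 18200
print(pcm_data_bytes(n, 16))   # 36400 bytes, i.e. 36.4 KB
print(pcm_data_bytes(n, 8))    # 18200 bytes, i.e. 18.2 KB
assert pcm_data_bytes(n, 16) == 2 * pcm_data_bytes(n, 8)
```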
EXPERIMENT # 1(c)

AIM : To study the significance of the anti-aliasing filter. The sentence "Should we chase",
recorded at R = 16 bits/sample and fs = 8 kHz, is resampled to 4 kHz by
i. directly taking alternate samples, and
ii. first passing through an anti-aliasing LPF (cutoff <= 2 kHz) and then taking
alternate samples.
For all three signals, time-domain and frequency-domain plots are obtained.

THEORY

An anti-aliasing filter (AAF) is a filter used before a sampler to restrict the bandwidth of
a signal so that the sampling theorem is approximately or completely satisfied over the band
of interest. Since the theorem states that unambiguous reconstruction of the signal from its
samples is possible only when the power at frequencies above the Nyquist frequency is zero,
a real anti-aliasing filter trades off between bandwidth and aliasing: a realizable filter
will typically either permit some aliasing to occur or attenuate some in-band frequencies
close to the Nyquist limit. For this reason, many practical systems sample higher than would
theoretically be required by a perfect AAF, a practice called oversampling, to ensure that
all frequencies of interest can be reconstructed.
Any part of the signal or noise that is higher than a half of the sampling rate will cause
aliasing. In order to avoid this problem, the analog signal is usually filtered by a low pass
filter prior to being sampled, and this filter is called an anti-aliasing filter.
As an example, an audio signal contains frequency components up to 20 kHz, so the Nyquist
sampling theorem requires a sampling frequency of at least 40 kHz. The anti-aliasing filter
would have a cutoff frequency of 20 kHz, but since no realizable filter is ideal, the
sampling frequencies actually used range from 44.1 kHz to 96 kHz, allowing a transition band
of at least 2 kHz.
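The same folding can be demonstrated numerically. The Python sketch below (illustrative
only; the experiment itself uses MATLAB) decimates a 3 kHz tone from 8 kHz to 4 kHz without
an anti-aliasing filter and shows that it reappears as a 1 kHz alias:

```python
import math

fs = 8000
N = 64
# A 3 kHz component, legal at fs = 8 kHz but above the new Nyquist
# frequency (2 kHz) once we drop to a 4 kHz sampling rate.
x = [math.sin(2 * math.pi * 3000 * n / fs) for n in range(N)]

# Resampling by directly taking alternate samples (no anti-aliasing filter):
y = x[::2]                                  # now effectively fs = 4 kHz

# The 3 kHz energy has not disappeared; it folds to 4 - 3 = 1 kHz.
alias = [math.sin(2 * math.pi * 1000 * n / 4000) for n in range(len(y))]
assert all(abs(a + b) < 1e-9 for a, b in zip(y, alias))
print("without an AAF, 3 kHz masquerades as 1 kHz after decimation")
```

An anti-aliasing LPF with cutoff at or below 2 kHz would have removed the 3 kHz component
before decimation, which is exactly what path (ii) of the experiment does.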

EXPERIMENTAL SETUP :
The experiment was conducted to see the significance of the anti-aliasing filter on the
speech data "Should we chase" in wav format, with fs = 8 kHz resampled to 4 kHz and a bit
resolution of 16 bits/sample. The MATLAB function audioread was used to read the sound file,
fir1 was used to design the FIR low-pass anti-aliasing filter, and audiowrite was used to
write the resampled speech data.
CODE

close all
clear
clc

[x,fs] = audioread('swchase1.wav');
audioinfo('swchase1.wav')
l = length(x);
X = abs(fftshift(fft(x)));
y = x(1:2:end);
l_y = length(y);
Y = abs(fftshift(fft(y)));
audiowrite('converted.wav',y,fs/2)
figure
subplot(2,1,1)
plot(x)
title('original speech signal in time domain')
subplot(2,1,2)
plot(-l/2:l/2-1,X/max(X))
title('original speech signal in freq domain')
figure
subplot(2,1,1)
plot(y)
title('taking alternate samples in time domain')
subplot(2,1,2)
plot(-l_y/2:l_y/2-1,Y/max(Y))
title('taking alternate samples in freq domain')
f = 2000;             % cutoff frequency in Hz
o = 40;               % order of the FIR filter
fc = f/(fs/2);        % cutoff normalized to the Nyquist frequency fs/2
b = fir1(o,fc);       % FIR low-pass filter coefficients (denominator is 1)
figure
freqz(b,1);           % filter frequency response
x_f = filter(b,1,x);  % anti-alias filtered speech
y_f = x_f(1:2:end);   % then take alternate samples
l_f = length(y_f);
Y_f = abs(fftshift(fft(y_f)));
figure
subplot(2,1,1)
plot(y_f)
title('Alternate samples of filtered Speech Signal in time domain')
subplot(2,1,2)
plot(-l_f/2:l_f/2-1,Y_f/max(Y_f))
title('Alternate samples of filtered Speech Signal in freq domain')
OBSERVATION:
RESULT: The time response after passing through the LPF is smoother. We can observe that
almost no information below the 2 kHz cutoff frequency is lost when the anti-aliasing filter
is applied before decimation.
EXPERIMENT # 3(a)

AIM : To study short-term speech parameters in the time domain: short-time energy,
zero-crossing rate and pitch period, using non-overlapping frames.

THEORY

1. Short-Time Energy: Short-time energy is a simple short-time speech measurement. It is
defined as:

E_n = (1/N) * sum over m of [x(m) w(n - m)]^2

where w(n) is the analysis window of length N.
This measurement can distinguish between voiced and unvoiced speech segments, since unvoiced
speech has significantly smaller short-time energy. For the length of the window, a
practical choice is 10-20 ms, i.e. 160-320 samples at a sampling frequency of 16 kHz. This
way the window includes a suitable number of pitch periods, so that the result is neither
too smooth nor too detailed.
The rectangular window has a smaller bandwidth than a Hamming window of the same length, so
its results are expected to be smoother. In any case, as the window length increases, the
short-time energy becomes less detailed.
For this experiment, we have used a rectangular window.
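A minimal Python sketch of this computation (illustrative, with a synthetic signal standing
in for the recorded sentence) over non-overlapping rectangular frames:

```python
import math, random

def short_time_energy(x, N):
    """E_n = (1/N) * sum of x(m)^2 over each length-N rectangular frame."""
    return [sum(s * s for s in x[i:i + N]) / N
            for i in range(0, len(x) - N + 1, N)]

random.seed(0)
fs = 8000
# Toy signal: a loud "voiced" sine followed by quiet "unvoiced" noise.
voiced = [0.8 * math.sin(2 * math.pi * 150 * n / fs) for n in range(800)]
unvoiced = [0.05 * random.uniform(-1, 1) for _ in range(800)]
E = short_time_energy(voiced + unvoiced, 200)

# Voiced frames carry far more short-time energy than unvoiced ones.
assert min(E[:4]) > max(E[4:])
print([round(e, 4) for e in E])
```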
2. Zero-Crossing Rate: The ZCR of a signal frame is the rate at which the signal changes
sign during the frame: the number of times the signal changes value, from positive to
negative and vice versa, divided by the total length of the frame. The zero-crossing rate of
each frame is calculated as:

Z_n = sum over m of |sgn(x(m)) - sgn(x(m-1))| * w(n - m), with w(n) = 1/(2N) for 0 <= n <= N-1.

This measure allows discrimination between voiced and unvoiced regions of speech, or between
speech and silence. Unvoiced speech has, in general, a higher zero-crossing rate; the ZCR of
voiced speech is lower than that of unvoiced speech. The signals in the graphs are
normalized.
STE and ZCR are calculated to extract the voiced and unvoiced parts of the speech signal.
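The ZCR computation can be sketched in Python as follows (illustrative; the synthetic
voiced/unvoiced frames stand in for real speech):

```python
import math, random

def zero_crossing_rate(frame):
    """(1/2N) * sum of |sgn(x(m)) - sgn(x(m-1))|; each sign change adds 2."""
    N = len(frame)
    def sgn(v): return 1 if v >= 0 else -1
    crossings = sum(abs(sgn(frame[m]) - sgn(frame[m - 1])) for m in range(1, N))
    return crossings / (2 * N)

random.seed(1)
fs = 8000
# Low-frequency "voiced" frame vs noisy "unvoiced" frame.
voiced = [math.sin(2 * math.pi * 150 * n / fs) for n in range(200)]
unvoiced = [random.uniform(-1, 1) for _ in range(200)]

# Unvoiced speech changes sign far more often than voiced speech.
assert zero_crossing_rate(unvoiced) > zero_crossing_rate(voiced)
print(zero_crossing_rate(voiced), zero_crossing_rate(unvoiced))
```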
3. Pitch Period: Over a short term we may treat voiced speech segments as periodic for all
practical analysis and processing. The periodicity associated with such segments is defined
as the 'pitch period T0' in the time domain and the 'pitch frequency or fundamental
frequency F0' in the frequency domain. Unless specified otherwise, the term 'pitch' refers
to the fundamental frequency F0. Pitch is an important attribute of voiced speech.
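The pitch period can be estimated by locating the peak of the autocorrelation function. An
illustrative Python sketch of this idea on a synthetic 100 Hz tone (the function name
pitch_period is ours, not from the experiment):

```python
import math

def pitch_period(x, lag_min, lag_max):
    """Lag (in samples) maximising the autocorrelation phi(a)."""
    def phi(a):
        return sum(x[m] * x[m + a] for m in range(len(x) - a))
    return max(range(lag_min, lag_max + 1), key=phi)

fs = 8000
f0 = 100                                    # true pitch: 100 Hz
x = [math.sin(2 * math.pi * f0 * n / fs) for n in range(800)]

T0 = pitch_period(x, 20, 200)               # search roughly the 40-400 Hz range
print(T0, fs / T0)                          # period in samples and F0 in Hz
assert T0 == fs // f0                       # 80 samples -> 100 Hz
```

The lag search range 20-200 samples mirrors the a = 20:200 loop used in the MATLAB code of
this experiment.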
EXPERIMENTAL SETUP :
The experiment was conducted on “Should we chase” speech data. MATLAB function
audioread was used to read the sound file. Rectangular window of length N=201 without
overlapping was used to calculate STE, ZCR and Pitch period.

CODE :

close all
clear
clc

[x,fx] = audioread('swchase1.wav');
audioinfo('swchase1.wav')
l = length(x);
N = 201; %frame size
xx = x.^2;
total_frame = l/N;
E = [];
ZCC = [];
for a = 0:N:l-N
    w = [zeros(a,1);ones(N,1);zeros(l-a-N,1)];   % rectangular window
    E = [E sum(xx.*w)/N];                        % short-time energy (w.^2 = w)
    s = x.*w;                                    % windowed frame
    zc_sum = 0;
    for b = a+1:a+N-1                            % count sign changes in the frame
        zc_sum = zc_sum + abs(sign(s(b))-sign(s(b+1)));
    end
    ZCC = [ZCC 0.5*zc_sum];
end
time = 0:N:l-N;
figure
subplot(3,1,2)
plot(time,E)
title('Energy spectrum')
subplot(3,1,1)
plot(0:l-1,x)
title('speech signal')
subplot(3,1,3)
plot(time,ZCC)
title('ZCC')

% pitch estimation via autocorrelation over lags 20-200 samples
for a = 20:200
    acc = 0;
    for b = 0:l-a-1
        acc = acc + x(b+1)*x(b+a+1);   % phi(a) = sum of x(n)*x(n+a)
    end
    phi(a-19) = acc/N;
end
figure
plot(20:200,phi)
title('pitch')
OBSERVATION:

RESULT: The short-time energy contour follows the envelope of the speech signal, and the
ZCR gives a good indication of the frequency content, separating voiced from unvoiced
regions.
EXPERIMENT # 3(b)

AIM : To study short-term speech parameters in the time domain: short-time energy,
zero-crossing rate and pitch period, using overlapping windows.

THEORY

The definitions of short-time energy, zero-crossing rate and pitch period are the same as in
Experiment 3(a); the only change here is that successive analysis frames overlap by 50%,
which yields a smoother, more finely sampled parameter contour.
EXPERIMENTAL SETUP :
The experiment was conducted on the "Should we chase" speech data. The MATLAB function
audioread was used to read the sound file. A rectangular window of length N = 200 with 50%
overlap was used to calculate the STE, ZCR and pitch period.

CODE :

close all
clear
clc

[x,fx] = audioread('swchase1.wav');
audioinfo('swchase1.wav')
l = length(x);
N = 200; %frame size
xx = x.^2;
total_frame = l/N;
E = [];
ZCC = [];
for a = 0:N/2:l-N                                % 50% overlap between frames
    w = [zeros(a,1);ones(N,1);zeros(l-a-N,1)];   % rectangular window
    E = [E sum(xx.*w)/N];                        % short-time energy (w.^2 = w)
    s = x.*w;                                    % windowed frame
    zc_sum = 0;
    for b = a+1:a+N-1                            % count sign changes in the frame
        zc_sum = zc_sum + abs(sign(s(b))-sign(s(b+1)));
    end
    ZCC = [ZCC 0.5*zc_sum];
end

figure
subplot(3,1,2)
plot(E)
title('Energy spectrum')
subplot(3,1,1)
plot(0:l-1,x)
title('speech signal')
subplot(3,1,3)
plot(ZCC)
title('ZCC')

% pitch estimation via autocorrelation over lags 20-200 samples
for a = 20:200
    acc = 0;
    for b = 0:l-a-1
        acc = acc + x(b+1)*x(b+a+1);   % phi(a) = sum of x(n)*x(n+a)
    end
    phi(a-19) = acc/N;
end
figure
plot(20:200,phi)
title('pitch')
OBSERVATION:

RESULT: Using overlapping windows gives smoother, more finely sampled estimates of the STE,
ZCR and pitch period than non-overlapping frames.
